Hardware / Cloud Requirements

Requirements for a single instance/VM

For simplicity and cost-effectiveness, you can run FormX.ai on a single server/instance. However, a single server cannot train new extractors; it can only run pre-built extractors or extractors trained on the FormX.ai SaaS platform.

Minimum Specification:

16-core Intel CPU (each core running at 2.6GHz or faster)
32GB RAM
120GB SSD
GPU is not required
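
As a rough pre-flight check, the minimum figures above can be compared against a host before installation. The sketch below is only an illustration and not part of FormX.ai: the function names are hypothetical, the host probe assumes Linux, and it does not verify CPU vendor, clock speed, or whether the disk is an SSD. Reported RAM and disk sizes are often slightly below the nominal size, so treat a near miss with some judgment.

```python
import os
import shutil

# Minimum single-instance figures taken from the list above.
MIN_CORES = 16
MIN_RAM_GB = 32
MIN_DISK_GB = 120


def check_single_instance(cores: int, ram_gb: float, disk_gb: float) -> list[str]:
    """Return a list of shortfalls; an empty list means the minimum is met."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU cores: {cores} < {MIN_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.0f}GB < {MIN_RAM_GB}GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.0f}GB < {MIN_DISK_GB}GB")
    return problems


def read_host_specs(path: str = "/") -> tuple[int, float, float]:
    """Read logical core count, RAM (GB) and disk size (GB) of the local host.

    Assumption: a Linux host, where os.sysconf exposes SC_PHYS_PAGES.
    """
    cores = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    return cores, ram_bytes / 1e9, disk_bytes / 1e9
```

Calling `check_single_instance(*read_host_specs())` on the target machine returns the list of shortfalls, if any.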

Requirements for a cluster

FormX.ai deploys on a Kubernetes cluster by default. Here are the minimum specifications:

Purpose	Number of Instances	Minimum Specification
API / Extraction Workers	3 VMs (minimum for Kubernetes)	8 vCPU, 16GB RAM, 30GB SSD
Database (PostgreSQL; using managed PostgreSQL is recommended)	1 VM, or 2 VMs + 1 witness VM (for High Availability)	4 vCPU, 8GB RAM, 64GB SSD
Self-Hosted OCR (optional: only required in disconnected environments)	2 VMs (for High Availability)	8 vCPU, 16GB RAM, 30GB SSD
Self-Hosted and Fine-Tune LLM Model (optional: only required for LLM-related features)	2 VMs (for High Availability)	4 vCPU, 16GB RAM, 100GB SSD; GPUs, one of: 1xH100 (80GB), 1xA100 (80GB), 1xA6000 (48GB), 1xL40 (48GB), 2x4090 (24GB), 2x3090 (24GB)
ML Workers for dataset generation (optional: only required for DocInfo Model Training Pipeline)	3 VMs	8 vCPU, 16GB RAM, 100GB+ SSD
ML Trainer for model training (optional: only required for DocInfo Model Training Pipeline)	1 VM (more for parallel training)	8 vCPU, 16GB RAM, 100GB+ SSD; GPU: P100 or better with 16GB+ VRAM
Storage (optional: only required for Accuracy Center, DocInfo Model Training Pipeline, or Fine-tune Self-Hosted LLM Model)	-	10GB+ (depends on image size)
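
To size a cluster, multiply each row's per-VM figures by its instance count and add up the rows you need. The sketch below does that arithmetic for the table above. The role keys are hypothetical labels chosen for this example. The database is counted as a single VM, since the table does not give a separate spec for the witness VM and the row drops out if you use managed PostgreSQL. GPUs and the Storage row are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    vms: int      # number of instances
    vcpu: int     # vCPU per VM
    ram_gb: int   # RAM per VM
    ssd_gb: int   # SSD per VM (minimum; some rows are "100GB+")


# Per-VM minimums copied from the table above; keys are illustrative labels.
ROLES = {
    "api_workers": Role(3, 8, 16, 30),
    "database": Role(1, 4, 8, 64),      # single VM, non-HA (assumption)
    "ocr": Role(2, 8, 16, 30),
    "llm": Role(2, 4, 16, 100),
    "ml_workers": Role(3, 8, 16, 100),
    "ml_trainer": Role(1, 8, 16, 100),
}


def cluster_totals(selected: list[str]) -> dict[str, int]:
    """Sum VM count, vCPU, RAM and SSD over the selected roles."""
    roles = [ROLES[name] for name in selected]
    return {
        "vms": sum(r.vms for r in roles),
        "vcpu": sum(r.vms * r.vcpu for r in roles),
        "ram_gb": sum(r.vms * r.ram_gb for r in roles),
        "ssd_gb": sum(r.vms * r.ssd_gb for r in roles),
    }
```

For example, `cluster_totals(["api_workers", "database"])` gives 4 VMs, 28 vCPU, 56GB RAM, and 154GB SSD for the smallest deployment with a self-managed database.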

Cloud Resource Inventory

For a typical cloud deployment, here is the list of cloud resources required:

Inventory	Purpose	Related Cloud Products
Kubernetes	Runs the applications, workers, and trainers	GCP GKE, Azure AKS, AWS EKS
Database	Stores the configs, audit logs, and temporary results for async requests	GCP Cloud SQL for PostgreSQL, Azure Database for PostgreSQL, AWS RDS for PostgreSQL
Image Storage	Stores the images for training (optional)	Google Cloud Storage, Azure Blob Storage, AWS S3
OCR	OCR	Google Vision API, Azure OCR
Other Software Components	Redis: caches authentication tokens; Authgear: handles authentication	Run as pods on the Kubernetes cluster