Hardware / Cloud Requirements
Requirements for a single instance/VM
For simplicity and cost-effectiveness, you can run FormX.ai on a single server/instance. However, a single server cannot train new extractors; it can only run pre-built extractors or extractors trained on the FormX.ai SaaS platform.
Minimum Specification:
- 16-core Intel CPU (each core at least 2.6GHz)
- 32GB RAM
- 120GB SSD
- GPU is not required
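Before installing, you can check a host against this specification locally. The following is a minimal Python sketch, not a FormX.ai tool: `check_single_instance` and `probe_host` are hypothetical helpers. It assumes a Linux or other Unix host and counts logical CPUs, which include hyper-threads and can therefore overstate physical cores. It does not check clock speed or whether the disk is an SSD. It also accepts RAM and disk up to 5% below nominal, because the kernel and filesystem report slightly less than the installed capacity.

```python
import os
import shutil

# Minimum single-instance specification listed above (decimal GB).
MIN_CORES = 16
MIN_RAM_GB = 32
MIN_DISK_GB = 120
GB = 1000**3


def check_single_instance(cores: int, ram_gb: float, disk_gb: float,
                          slack: float = 0.05) -> list[str]:
    """Return a list of shortfalls; an empty list means the host passes.

    RAM and disk may fall up to `slack` (5% by default) below nominal, because
    the kernel and filesystem report slightly less than the installed capacity.
    """
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU: {cores} logical CPUs < {MIN_CORES}")
    if ram_gb < MIN_RAM_GB * (1 - slack):
        problems.append(f"RAM: {ram_gb:.1f}GB < {MIN_RAM_GB}GB")
    if disk_gb < MIN_DISK_GB * (1 - slack):
        problems.append(f"Disk: {disk_gb:.1f}GB < {MIN_DISK_GB}GB")
    return problems


def probe_host(path: str = "/") -> tuple[int, float, float]:
    """Read logical CPU count, physical RAM and filesystem size (Unix only)."""
    cores = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    return cores, ram_bytes / GB, disk_bytes / GB


if __name__ == "__main__":
    issues = check_single_instance(*probe_host())
    # Clock speed (2.6GHz+) and SSD media type are not checked by this sketch.
    print("OK" if not issues else "\n".join(issues))
```

Pass the path of the volume FormX.ai will use to `probe_host` if it is not the root filesystem.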
Requirements for a cluster
By default, FormX.ai is deployed on a Kubernetes cluster. The minimum specifications are:
Purpose | Number of Instances | Minimum Specification |
---|---|---|
API / Extraction Workers | 3 VMs (minimum for Kubernetes) | 8 vCPU, 16GB RAM, 30GB SSD |
Database (PostgreSQL; managed PostgreSQL is recommended) | 1 VM, or 2 VMs + 1 witness VM (for high availability) | 4 vCPU, 8GB RAM, 64GB SSD |
Self-hosted OCR (optional: only required in disconnected environments) | 2 VMs (for high availability) | 8 vCPU, 16GB RAM, 30GB SSD |
Self-hosted and fine-tuned LLM (optional: only required for LLM-related features) | 2 VMs (for high availability) | 4 vCPU, 16GB RAM, 100GB SSD; GPU, one of: 1xH100 (80GB), 1xA100 (80GB), 1xA6000 (48GB), 1xL40 (48GB), 2x4090 (24GB), 2x3090 (24GB) |
ML workers for dataset generation (optional: only required for the DocInfo Model Training Pipeline) | 3 VMs | 8 vCPU, 16GB RAM, 100GB+ SSD |
ML trainer for model training (optional: only required for the DocInfo Model Training Pipeline) | 1 VM (more for parallel training) | 8 vCPU, 16GB RAM, 100GB+ SSD; GPU: P100 or better with 16GB+ VRAM |
Storage (optional: only required for Accuracy Center, the DocInfo Model Training Pipeline, or fine-tuning the self-hosted LLM) | - | 10GB+ (depends on image size) |
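To estimate the total VM footprint of a given combination of components, multiply each row's minimum by its instance count and sum the results. The Python sketch below assumes the specification column is per VM. It uses the smallest instance count in each row (for example, a single database VM rather than the high-availability layout, whose witness VM has no listed specification) and treats "100GB+" as 100GB. GPUs and object storage are left out. The role keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Minimum sizing for one row of the cluster table (specs assumed per VM)."""
    name: str
    vms: int
    vcpu: int
    ram_gb: int
    ssd_gb: int


# Hypothetical keys; figures copied from the table, using the smallest VM count
# per row and treating "100GB+" as 100GB. GPUs and object storage are excluded.
ROLES = {
    "api_workers": Role("API / Extraction Workers", 3, 8, 16, 30),
    "database": Role("Database (PostgreSQL)", 1, 4, 8, 64),
    "ocr": Role("Self-hosted OCR", 2, 8, 16, 30),
    "llm": Role("Self-hosted LLM", 2, 4, 16, 100),
    "ml_workers": Role("ML workers", 3, 8, 16, 100),
    "ml_trainer": Role("ML trainer", 1, 8, 16, 100),
}


def cluster_totals(selected: list[str]) -> dict[str, int]:
    """Sum VM count, vCPU, RAM and SSD across the selected roles."""
    total = {"vms": 0, "vcpu": 0, "ram_gb": 0, "ssd_gb": 0}
    for key in selected:
        role = ROLES[key]
        total["vms"] += role.vms
        total["vcpu"] += role.vms * role.vcpu
        total["ram_gb"] += role.vms * role.ram_gb
        total["ssd_gb"] += role.vms * role.ssd_gb
    return total


if __name__ == "__main__":
    # Mandatory workers plus a self-managed, non-HA database.
    print(cluster_totals(["api_workers", "database"]))
```

For example, the mandatory API / Extraction Workers alone come to 3 VMs with 24 vCPU, 48GB RAM and 90GB SSD in total. Adding a single self-managed database VM gives `{'vms': 4, 'vcpu': 28, 'ram_gb': 56, 'ssd_gb': 154}`. If you use managed PostgreSQL, leave `"database"` out of the selection.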
Cloud Resource Inventory
A typical cloud deployment requires the following cloud resources:
Resource | Purpose | Related Cloud Products |
---|---|---|
Kubernetes | Run the applications, workers, and trainers | GCP GKE, Azure AKS, AWS EKS |
Database | Store configurations, audit logs, and temporary results for async requests | GCP Cloud SQL for PostgreSQL, Azure Database for PostgreSQL, AWS RDS for PostgreSQL |
Image Storage | Store images for training (optional) | Google Cloud Storage, Azure Blob Storage, AWS S3 |
OCR | Optical character recognition | Google Vision API, Azure OCR |
Other Software Components | Redis: caches authentication tokens; Authgear: handles authentication | Run as pods on the Kubernetes cluster |