Hardware / Cloud Requirements

Requirements for a single instance/VM

For simplicity and cost-effectiveness, you can run FormX.ai on a single server or instance. However, a single server cannot train new extractors; it can only run pre-built extractors or extractors trained on the FormX.ai SaaS platform.

Minimum Specification:

  • 16-core Intel CPU (each core running at 2.6 GHz or faster)
  • 32GB RAM
  • 120GB SSD
  • GPU is not required
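
The minimum specification above can be turned into a quick pre-install check. The following is a minimal sketch, not an official FormX.ai tool: the `HostSpec` class and the `shortfalls` function are hypothetical names introduced for illustration, and the host values are supplied by hand rather than detected. It does not verify the CPU vendor or that the disk is an SSD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    """Hypothetical description of a host, filled in by the operator."""
    cpu_cores: int
    cpu_ghz: float
    ram_gb: float
    ssd_gb: float


# Minimum single-instance specification, copied from the list above.
MINIMUM = HostSpec(cpu_cores=16, cpu_ghz=2.6, ram_gb=32, ssd_gb=120)


def shortfalls(host: HostSpec, minimum: HostSpec = MINIMUM) -> list[str]:
    """Return one message per field where `host` is below `minimum`."""
    problems = []
    for field in ("cpu_cores", "cpu_ghz", "ram_gb", "ssd_gb"):
        have, need = getattr(host, field), getattr(minimum, field)
        if have < need:
            problems.append(f"{field}: have {have}, need at least {need}")
    return problems


if __name__ == "__main__":
    # Example host with enough CPU and disk but too little RAM.
    candidate = HostSpec(cpu_cores=16, cpu_ghz=3.0, ram_gb=16, ssd_gb=200)
    for line in shortfalls(candidate) or ["meets the minimum specification"]:
        print(line)
```

An empty list from `shortfalls` means every listed minimum is met; any other result names the fields that need to be upgraded.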

Requirements for a cluster

FormX.ai deploys on a Kubernetes cluster by default. The minimum specifications are:

| Purpose | Number of Instances | Minimum Specification |
|---|---|---|
| API / Extraction Workers | 3 VMs (minimal for Kubernetes) | 8 vCPU, 16GB RAM, 30GB SSD |
| Database (PostgreSQL). Using managed PostgreSQL is recommended. | 1 VM, or 2 VMs + 1 Witness VM (for High Availability) | 4 vCPU, 8GB RAM, 64GB SSD |
| Self-Hosted OCR (optional: only required in disconnected environments) | 2 VMs (for High Availability) | 8 vCPU, 16GB RAM, 30GB SSD |
| Self-Hosted and Fine-Tune LLM Model (optional: only required for LLM-related features) | 2 VMs (for High Availability) | 4 vCPU, 16GB RAM, 100GB SSD. GPUs, one of: 1xH100 (80GB), 1xA100 (80GB), 1xA6000 (48GB), 1xL40 (48GB), 2x4090 (24GB), 2x3090 (24GB) |
| ML Workers for dataset generation (optional: only required for DocInfo Model Training Pipeline) | 3 VMs | 8 vCPU, 16GB RAM, 100GB+ SSD |
| ML Trainer for model training (optional: only required for DocInfo Model Training Pipeline) | 1 VM (more for parallel training) | 8 vCPU, 16GB RAM, 100GB+ SSD. GPU: P100 or better, 16GB+ vRAM |
| Storage (optional: only required for Accuracy Center, DocInfo Model Training Pipeline, or Fine-tune Self-Hosted LLM Model) | - | 10GB+ (depends on image size) |
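
To size a cluster budget, the per-VM figures in the table above can be multiplied by the instance counts and summed. The sketch below illustrates that arithmetic under stated assumptions: `NodePool` and `totals` are hypothetical names, the database is counted as a single VM (the table gives no specification for the witness VM in the High Availability layout), "100GB+" is entered as 100, and GPUs and image storage are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    """One row of the cluster table; resource figures are per VM."""
    name: str
    vms: int
    vcpu: int
    ram_gb: int
    ssd_gb: int
    optional: bool = False


POOLS = [
    NodePool("API / Extraction Workers", 3, 8, 16, 30),
    NodePool("Database (PostgreSQL)", 1, 4, 8, 64),  # single-VM layout
    NodePool("Self-Hosted OCR", 2, 8, 16, 30, optional=True),
    NodePool("Self-Hosted and Fine-Tune LLM Model", 2, 4, 16, 100, optional=True),
    NodePool("ML Workers for dataset generation", 3, 8, 16, 100, optional=True),
    NodePool("ML Trainer for model training", 1, 8, 16, 100, optional=True),
]


def totals(pools, enabled_optional=()):
    """Sum resources over required pools plus the named optional pools."""
    selected = [p for p in pools if not p.optional or p.name in enabled_optional]
    return {
        "vms": sum(p.vms for p in selected),
        "vcpu": sum(p.vms * p.vcpu for p in selected),
        "ram_gb": sum(p.vms * p.ram_gb for p in selected),
        "ssd_gb": sum(p.vms * p.ssd_gb for p in selected),
    }


if __name__ == "__main__":
    print("required only:", totals(POOLS))
    print("with self-hosted OCR:", totals(POOLS, {"Self-Hosted OCR"}))
```

With only the required components, this comes to 4 VMs, 28 vCPU, 56GB RAM, and 154GB SSD. Each optional component adds its own VMs on top of that baseline.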

Cloud Resources Inventories

For a typical cloud deployment, the following cloud resources are required:

| Inventory | Purpose | Related Cloud Products |
|---|---|---|
| Kubernetes | Run the applications, workers, and trainers | GCP GKE, Azure AKS, AWS EKS |
| Database | Store the configs, audit logs, and temporary results for async requests | GCP Cloud SQL for PostgreSQL, Azure Database for PostgreSQL, AWS RDS for PostgreSQL |
| Image Storage | Store the images for training (optional) | Google Cloud Storage, Azure Blob Storage, AWS S3 |
| OCR | OCR | Google Vision API, Azure OCR |
| Other Software Components | Redis: caches authentication tokens. Authgear: handles authentication. | Runs as pods on the k8s cluster |
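
When planning for one provider, the inventory above can be read as a per-provider lookup. The following is a small sketch of that lookup; the `PRODUCTS` dictionary and the `inventory_for` function are hypothetical names, and the product names are copied from the inventory. An item with no listed product for a provider (the inventory lists no AWS product for OCR) is returned as `None`.

```python
# Inventory item -> provider -> product, copied from the inventory above.
PRODUCTS = {
    "Kubernetes": {"gcp": "GCP GKE", "azure": "Azure AKS", "aws": "AWS EKS"},
    "Database": {
        "gcp": "GCP Cloud SQL for PostgreSQL",
        "azure": "Azure Database for PostgreSQL",
        "aws": "AWS RDS for PostgreSQL",
    },
    "Image Storage": {
        "gcp": "Google Cloud Storage",
        "azure": "Azure Blob Storage",
        "aws": "AWS S3",
    },
    "OCR": {"gcp": "Google Vision API", "azure": "Azure OCR"},
}

PROVIDERS = {"gcp", "azure", "aws"}


def inventory_for(provider: str) -> dict:
    """Map each inventory item to the product listed for `provider`, or None."""
    key = provider.lower()
    if key not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    return {item: options.get(key) for item, options in PRODUCTS.items()}


if __name__ == "__main__":
    for item, product in inventory_for("azure").items():
        print(f"{item}: {product}")
```

Redis and Authgear are omitted from the lookup because they run as pods on the Kubernetes cluster rather than as separate cloud products.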