Hardware / Cloud Requirements

Requirements for a single instance/VM

For simplicity and cost-effectiveness, you can run FormX.ai on a single server or instance. However, a single-server deployment cannot train new extractors; it can only run pre-built extractors or extractors trained on the FormX.ai SaaS platform.

Minimum Specification:

  • 16-core Intel CPU (each core at least 2.6 GHz)
  • 32GB RAM
  • 120GB SSD
  • GPU is not required
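The single-instance minimums above can be checked mechanically before installation. The following is a minimal sketch, assuming you already know the host's core count, clock speed, RAM, and SSD size; the `HostSpec` type and the `shortfalls` helper are hypothetical names for illustration, not part of FormX.ai. The "Intel CPU" requirement is not checked here.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical description of a candidate host; field names are illustrative.
@dataclass
class HostSpec:
    cpu_cores: int
    cpu_ghz: float
    ram_gb: int
    ssd_gb: int


# Single-instance minimums from the list above (no GPU needed).
SINGLE_INSTANCE_MIN = HostSpec(cpu_cores=16, cpu_ghz=2.6, ram_gb=32, ssd_gb=120)


def shortfalls(host: HostSpec, minimum: HostSpec = SINGLE_INSTANCE_MIN) -> list[str]:
    """Return one message per requirement the host fails to meet (empty if it passes)."""
    problems = []
    if host.cpu_cores < minimum.cpu_cores:
        problems.append(f"CPU cores: {host.cpu_cores} < {minimum.cpu_cores}")
    if host.cpu_ghz < minimum.cpu_ghz:
        problems.append(f"CPU clock: {host.cpu_ghz} GHz < {minimum.cpu_ghz} GHz")
    if host.ram_gb < minimum.ram_gb:
        problems.append(f"RAM: {host.ram_gb}GB < {minimum.ram_gb}GB")
    if host.ssd_gb < minimum.ssd_gb:
        problems.append(f"SSD: {host.ssd_gb}GB < {minimum.ssd_gb}GB")
    return problems


if __name__ == "__main__":
    # A host that meets everything except disk size.
    print(shortfalls(HostSpec(cpu_cores=16, cpu_ghz=3.0, ram_gb=32, ssd_gb=100)))
```

For the example host, the only reported problem is `SSD: 100GB < 120GB`.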

Requirements for a cluster

By default, FormX.ai is deployed on a Kubernetes cluster. The minimum specifications are:

Each component is listed with the number of instances it needs and its minimum specification.

API / Extraction Workers
  • Instances: 3 VMs (the minimum for Kubernetes)
  • Minimum specification: 8 vCPU, 16GB RAM, 30GB SSD

Database (PostgreSQL)
  • Using a managed PostgreSQL service is recommended.
  • Instances: 1 VM, or 2 VMs + 1 witness VM for high availability
  • Minimum specification: 4 vCPU, 8GB RAM, 64GB SSD

Self-Hosted OCR (optional: only required in disconnected environments)
  • Instances: 2 VMs (for high availability)
  • Minimum specification: 8 vCPU, 16GB RAM, 30GB SSD

Self-Hosted and Fine-Tuned LLM (optional: only required for LLM-related features)
  • Instances: 2 VMs (for high availability)
  • Minimum specification: 4 vCPU, 16GB RAM, 100GB SSD
  • GPU, one of:
      • 1x H100 (80GB)
      • 1x A100 (80GB)
      • 1x A6000 (48GB)
      • 1x L40 (48GB)
      • 2x RTX 4090 (24GB each)
      • 2x RTX 3090 (24GB each)

ML Workers for dataset generation (optional: only required for the DocInfo Model Training Pipeline)
  • Instances: 3 VMs
  • Minimum specification: 8 vCPU, 16GB RAM, 100GB+ SSD

ML Trainer for model training (optional: only required for the DocInfo Model Training Pipeline)
  • Instances: 1 VM (add more for parallel training)
  • Minimum specification: 8 vCPU, 16GB RAM, 100GB+ SSD
  • GPU: NVIDIA P100 or better, with 16GB+ VRAM

Storage (optional: only required for Accuracy Center, the DocInfo Model Training Pipeline, or fine-tuning a self-hosted LLM)
  • Capacity: 10GB+ (depends on the size of the stored images)
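To estimate the total footprint of a cluster, multiply each component's per-instance specification by its instance count and sum the results. The sketch below does that arithmetic under stated assumptions: every instance of a component receives the full listed specification; the database is modeled as the single-VM option (high availability and the witness VM are not modeled, and a managed PostgreSQL service would remove that VM entirely); "100GB+" disks are counted at their 100GB minimum; and GPUs and shared storage are left out. Component names and the `cluster_totals` helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    instances: int
    vcpu: int     # per instance
    ram_gb: int   # per instance
    ssd_gb: int   # per instance (minimum)
    optional: bool


# Figures transcribed from the cluster table above (GPUs and storage excluded).
COMPONENTS = [
    Component("API / Extraction Workers", 3, 8, 16, 30, optional=False),
    Component("Database (PostgreSQL)", 1, 4, 8, 64, optional=False),
    Component("Self-Hosted OCR", 2, 8, 16, 30, optional=True),
    Component("Self-Hosted LLM", 2, 4, 16, 100, optional=True),
    Component("ML Workers", 3, 8, 16, 100, optional=True),
    Component("ML Trainer", 1, 8, 16, 100, optional=True),
]


def cluster_totals(enabled_optional: set[str]) -> dict[str, int]:
    """Sum VMs, vCPU, RAM, and SSD over required components plus the selected optional ones."""
    totals = {"vms": 0, "vcpu": 0, "ram_gb": 0, "ssd_gb": 0}
    for c in COMPONENTS:
        if c.optional and c.name not in enabled_optional:
            continue  # skip optional components that were not selected
        totals["vms"] += c.instances
        totals["vcpu"] += c.instances * c.vcpu
        totals["ram_gb"] += c.instances * c.ram_gb
        totals["ssd_gb"] += c.instances * c.ssd_gb
    return totals


if __name__ == "__main__":
    print(cluster_totals(set()))                # required components only
    print(cluster_totals({"Self-Hosted OCR"}))  # e.g. a disconnected environment
```

With only the required components, this gives 4 VMs, 28 vCPU, 56GB RAM, and 154GB SSD; adding Self-Hosted OCR raises it to 6 VMs, 44 vCPU, 88GB RAM, and 214GB SSD.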

Cloud Resource Inventory

A typical cloud deployment requires the following cloud resources:

Each resource is listed with its purpose and the related products on GCP, Azure, and AWS.

Kubernetes
  • Purpose: runs the applications, workers, and trainers
  • Related products: GCP GKE, Azure AKS, AWS EKS

Database
  • Purpose: stores configurations, audit logs, and temporary results for asynchronous requests
  • Related products: GCP Cloud SQL for PostgreSQL, Azure Database for PostgreSQL, AWS RDS for PostgreSQL

Image Storage (optional)
  • Purpose: stores images for training
  • Related products: Google Cloud Storage, Azure Blob Storage, AWS S3

OCR
  • Purpose: optical character recognition (OCR)
  • Related products: Google Vision API, Azure OCR

Other Software Components
  • Purpose: Redis caches authentication tokens; Authgear provides authentication
  • Related products: none; both run as pods on the Kubernetes cluster
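When planning a deployment on one provider, the inventory can be read as a lookup from resource to managed product. The following is a minimal sketch that transcribes the inventory into a dictionary; the provider keys (`"gcp"`, `"azure"`, `"aws"`) and the `bill_of_materials` helper are hypothetical. Because the inventory names no AWS product for OCR, the sketch returns `None` there rather than guessing a substitute. Redis and Authgear are omitted because they run as pods on the cluster.

```python
from __future__ import annotations

# Resource -> provider -> product, transcribed from the inventory above.
# None means the inventory lists no product for that provider.
CLOUD_PRODUCTS = {
    "Kubernetes": {"gcp": "GKE", "azure": "AKS", "aws": "EKS"},
    "Database": {
        "gcp": "Cloud SQL for PostgreSQL",
        "azure": "Azure Database for PostgreSQL",
        "aws": "RDS for PostgreSQL",
    },
    "Image Storage": {
        "gcp": "Google Cloud Storage",
        "azure": "Azure Blob Storage",
        "aws": "S3",
    },
    "OCR": {"gcp": "Google Vision API", "azure": "Azure OCR", "aws": None},
}


def bill_of_materials(provider: str, need_image_storage: bool = False) -> dict[str, str | None]:
    """Return the listed product for each resource on one provider."""
    bom = {}
    for resource, products in CLOUD_PRODUCTS.items():
        if resource == "Image Storage" and not need_image_storage:
            continue  # image storage is optional (only used for training images)
        bom[resource] = products[provider]
    return bom


if __name__ == "__main__":
    for provider in ("gcp", "azure", "aws"):
        print(provider, bill_of_materials(provider, need_image_storage=True))
```

Any `None` in the result flags a resource that the inventory does not cover for that provider and that needs a separate decision.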