FormX.ai
Document Extraction

HTTP Endpoint

POST https://worker.formextractorai.com/extract

Overview

Send the image in a POST request to the Extract API endpoint. FormX isolates the extraction fields in the uploaded image, then performs OCR on those fields.
FormX uses the form model of your choice to extract the data and returns it as JSON. Specify the form model with the form_id parameter, and include an access token with every request. Both the form ID and the access token can be obtained from the web portal dashboard.
FormX provides two modes for accessing the API, synchronous and asynchronous, selected by the async parameter. See the details below.

cURL Example

curl -X POST \
  https://worker.formextractorai.com/extract \
  -H 'Content-Type: image/jpeg' \
  -H 'X-WORKER-FORM-ID: REPLACE-YOUR-FORM-ID-HERE' \
  -H 'X-WORKER-TOKEN: REPLACE-YOUR-WORKER-TOKEN-HERE' \
  --data-binary "@/path/to/query/image.jpg"

Making the Request

To upload the image directly, send it in the request body or as multipart/form-data. To specify an image URL instead, submit it in a header or as multipart/form-data.
Most parameters can be submitted either as multipart/form-data fields or as request headers.

Using parameters in HTTP request headers

| Name | Required | Description |
| --- | --- | --- |
| Content-Type | optional | image/jpeg, image/png, or application/pdf. Required if the image is sent in the request body. |
| X-WORKER-TOKEN | required | Access token. This parameter must be sent as a header. |
| X-WORKER-FORM-ID | required | Form ID |
| X-WORKER-IMAGE-URL | optional | URL of the image; can be a JPG, PNG, or PDF file. Required if the request body is empty. |
| X-WORKER-ENCODING | optional | Encoding of the request body, either 'raw' or 'base64'. Default value: raw |
| X-WORKER-PDF-DPI | optional | DPI of the uploaded PDF file. Default value: 100 |
| X-WORKER-SHOW-CONFIDENCE | optional | Flag for showing confidence scores in the response. Default value: false |
| X-WORKER-AUTO-ADJUST-IMAGE-SIZE | optional | Flag for automatically adjusting the image size for a better extraction result; extraction takes longer when enabled. Default value: true |
| X-WORKER-ASYNC | optional | Flag for using the asynchronous mode. Default value: false |
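
As an illustration of the header parameters above, the following Python sketch builds (but does not send) an extract request with the image in the request body. The function name and its arguments are hypothetical, and it assumes that boolean flags are sent as the strings "true" and "false", which this page does not state explicitly.

```python
import base64
import urllib.request

EXTRACT_URL = "https://worker.formextractorai.com/extract"


def build_header_request(image_bytes, form_id, token, content_type="image/jpeg",
                         use_base64=False, show_confidence=False, use_async=False):
    """Build an extract request that passes its parameters as HTTP headers."""
    headers = {
        "Content-Type": content_type,   # required because the image is in the body
        "X-WORKER-TOKEN": token,
        "X-WORKER-FORM-ID": form_id,
    }
    body = image_bytes
    if use_base64:
        # Tell the API that the body is base64 text rather than raw bytes.
        headers["X-WORKER-ENCODING"] = "base64"
        body = base64.b64encode(image_bytes)
    if show_confidence:
        headers["X-WORKER-SHOW-CONFIDENCE"] = "true"  # assumed wire format
    if use_async:
        headers["X-WORKER-ASYNC"] = "true"            # assumed wire format
    return urllib.request.Request(EXTRACT_URL, data=body, headers=headers,
                                  method="POST")
```

The returned object can be passed to urllib.request.urlopen to perform the call; the response body should then decode as JSON.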

Using parameters in form data

| Name | Required | Description |
| --- | --- | --- |
| form_id | required | Form ID |
| image | optional | The image file; can be a JPG, PNG, or PDF file. Specify either this or the image_url parameter. |
| image_url | optional | URL of the image; can be a JPG, PNG, or PDF file. Specify either this or the image parameter. |
| pdf_dpi | optional | DPI of the uploaded PDF file. Default value: 100 |
| show_confidence | optional | Flag for showing confidence scores in the response. Default value: false |
| auto_adjust_image_size | optional | Flag for automatically adjusting the image size for a better extraction result; extraction takes longer when enabled. Default value: true |
| async | optional | Flag for using the asynchronous mode. Default value: false |
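
The form-data variant can be sketched with the Python standard library alone. The helper names below are hypothetical, and the multipart encoder is a minimal one written for illustration rather than anything FormX provides. The access token still travels in the X-WORKER-TOKEN header, since it must be sent as a header.

```python
import urllib.request
import uuid

EXTRACT_URL = "https://worker.formextractorai.com/extract"


def encode_multipart(fields, files, boundary=None):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict of name -> text value
    files:  dict of name -> (filename, content_bytes, content_type)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    for name, (filename, content, content_type) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_form_request(image_bytes, form_id, token, filename="image.jpg",
                       content_type="image/jpeg", **options):
    """Build an extract request that passes its parameters as form data.

    Extra keyword arguments (for example show_confidence="true") become
    additional form fields; the "true"/"false" spelling is an assumption.
    """
    fields = {"form_id": form_id, **options}
    body, body_type = encode_multipart(
        fields, {"image": (filename, image_bytes, content_type)})
    headers = {"Content-Type": body_type, "X-WORKER-TOKEN": token}
    return urllib.request.Request(EXTRACT_URL, data=body, headers=headers,
                                  method="POST")
```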

API Response

| Name | Type | Description |
| --- | --- | --- |
| status | string | "ok" on success, "failed" on failure |
| form_id | string | Form ID |
| fields | Field[] | List of extracted fields and fields in detection regions |
| auto_extraction_items | AutoExtractionItem[] | List of detected auto extraction items |
| key_values | KeyValue[] | List of detected key-value pairs |
| token_groups | TokenGroup[] | List of detected token groups |
| error | any | Present only on failure. Its shape depends on the failure, but it always contains the "code" and "message" fields. |
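
A client will usually want to turn a "failed" status into an exception. The sketch below does that using only the status and error fields described above; the ExtractionError class and the function name are hypothetical, and it assumes the response body is a JSON object.

```python
import json


class ExtractionError(Exception):
    """Raised when the API response reports status "failed"."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_extract_response(raw_json):
    """Decode a response body, raising ExtractionError on failure."""
    payload = json.loads(raw_json)
    if payload.get("status") == "failed":
        # Only "code" and "message" are documented as always present in
        # the error object, so nothing else is read from it here.
        error = payload.get("error") or {}
        raise ExtractionError(error.get("code"), error.get("message"))
    return payload
```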

Field

| Name | Type | Description |
| --- | --- | --- |
| region_id | string | Detection region ID |
| name | string | Field label |
| type | string | Field type |
| value | any | Extracted content |
| error | string | Error message, if any |
| confidence | number | Confidence score. Present only if confidence scores are enabled. If the value is a list, e.g. for fields of type name or address, the score is returned alongside each value in the list. |
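
Because each Field carries either a value or an error message, it can be convenient to separate the two. This is a minimal sketch with a hypothetical function name; it relies only on the name, value, and error properties listed above and treats a missing or empty error as success.

```python
def split_fields(response):
    """Split the response's fields into ({name: value}, {name: error})."""
    values, errors = {}, {}
    for field in response.get("fields", []):
        if field.get("error"):
            errors[field["name"]] = field["error"]
        else:
            values[field["name"]] = field.get("value")
    return values, errors
```

The field names and values used to try this out would come from your own form model.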

AutoExtractionItem

| Name | Type | Description |
| --- | --- | --- |
| name | string | Item name |
| value | any | Item value |
| confidence | number | Confidence score. Present only if confidence scores are enabled. If the value is a list, e.g. for name, address, and job_title items, the score is returned alongside each value in the list. |

KeyValue

| Name | Type | Description |
| --- | --- | --- |
| name | string | Item name |
| value | string | Item value |
| confidence | number | Confidence score. Present only if confidence scores are enabled. |

TokenGroup

| Name | Type | Description |
| --- | --- | --- |
| name | string | Token group name |
| texts | Token[] | List of detected text tokens in this group |
| images | Token[] | List of detected image tokens in this group |

Token

| Name | Type | Description |
| --- | --- | --- |
| id | string | Token id |
| value | string | Token value |
| confidence | number | Confidence score. Present only if confidence scores are enabled. |
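
To show how the TokenGroup and Token structures nest, the sketch below collects the text token values of each group. The function name and the optional min_confidence filter are hypothetical; tokens without a confidence property (the case when confidence scores are disabled) are always kept.

```python
def token_group_texts(response, min_confidence=None):
    """Map each token group name to the values of its text tokens."""
    result = {}
    for group in response.get("token_groups", []):
        kept = []
        for token in group.get("texts", []):
            score = token.get("confidence")
            # Drop a token only when a threshold is set, a score is
            # present, and the score falls below the threshold.
            if (min_confidence is not None and score is not None
                    and score < min_confidence):
                continue
            kept.append(token["value"])
        result[group["name"]] = kept
    return result
```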

Using the Asynchronous mode

If a request takes too long to complete, use the asynchronous mode to avoid a timeout. Enable it with the async parameter, sent either as the X-WORKER-ASYNC header or as the async form-data field.

Job ID

If the async job is successfully created, a 202 Accepted response will be returned with the job_id and the page_count.
202 Accepted {
  async: true,
  job_id: <string>,
  page_count: <number>,
}
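
Once the job is accepted, the job_id is all that is needed to locate the result. The following sketch turns a 202 response into the URL of the job endpoint, /extract/jobs/:job_id. The function name is hypothetical, and it assumes the response body is regular JSON with quoted keys, unlike the abbreviated notation shown above.

```python
import json
from urllib.parse import quote

JOBS_URL = "https://worker.formextractorai.com/extract/jobs/"


def job_url_from_accepted(status_code, raw_json):
    """Return (job_url, page_count) for a 202 Accepted response."""
    if status_code != 202:
        raise ValueError(f"expected 202 Accepted, got {status_code}")
    payload = json.loads(raw_json)
    # Percent-encode the id so it is always safe as a single path segment.
    job_url = JOBS_URL + quote(payload["job_id"], safe="")
    return job_url, payload.get("page_count")
```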

Getting the extraction result

GET https://worker.formextractorai.com/extract/jobs/:job_id
The extraction result can be queried by polling the Job endpoint. Send a GET request to /extract/jobs/:job_id, with the access token in the header, and repeat until the result is returned.
The extraction result is deleted 24 hours after the job completes, whether or not it has been queried.

Pending

201 Created {
  status: "pending"
}

Completed

200 OK {
  status: "OK",
  pages: [
    // structure same as the extract API
  ]
}
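
The polling described in this section can be sketched as a small loop. The function names, the polling interval, and the attempt limit are all hypothetical choices, and the sketch assumes the access token is sent in the same X-WORKER-TOKEN header used by the extract endpoint. The fetch and sleep functions are parameters so the loop can be exercised without network access.

```python
import json
import time
import urllib.request


def fetch_job(job_url, token):
    """GET the job endpoint once and return the decoded JSON body."""
    request = urllib.request.Request(job_url, headers={"X-WORKER-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_job(job_url, token, fetch=fetch_job, interval=2.0, max_attempts=30,
             sleep=time.sleep):
    """Poll until the job is no longer pending, then return the payload."""
    for _ in range(max_attempts):
        payload = fetch(job_url, token)
        if payload.get("status") != "pending":
            return payload
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"job still pending after {max_attempts} attempts")
```

Remember that results are only available for 24 hours after completion, so a client should store the returned payload rather than rely on fetching it again later.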