Document Extraction

HTTP Endpoint

POST https://worker.formextractorai.com/extract

Overview

Send the image in a POST request to the Extract API endpoint. FormX isolates the extraction fields in the uploaded image and then performs OCR on those fields.

FormX uses the form model of your choice to extract the data and returns it in JSON format. Specify the form model with the form_id parameter. An access token must also be included. Both the form ID and the access token can be obtained from the web portal dashboard.

FormX provides two modes for accessing the API, synchronous and asynchronous, selected by a parameter. See more details below.

cURL Example

curl -X POST \
https://worker.formextractorai.com/extract \
-H 'Content-Type: image/jpeg' \
-H 'X-WORKER-FORM-ID: REPLACE-YOUR-FORM-ID-HERE' \
-H 'X-WORKER-TOKEN: REPLACE-YOUR-WORKER-TOKEN-HERE' \
--data-binary "@/path/to/query/image.jpg"

Making the Request

To upload the image directly, send it in the request body or via multipart/form-data. To specify an image URL instead, submit it via a header or multipart/form-data.

Most of the parameters can be submitted either as multipart/form-data or as request headers.

Using parameters in HTTP request headers

| Name | Required | Description |
| --- | --- | --- |
| Content-Type | optional | image/jpeg, image/png, or application/pdf. Required if the image is sent in the request body. |
| X-WORKER-TOKEN | required | Access token. This parameter must be included in the header. |
| X-WORKER-FORM-ID | required | Form ID |
| X-WORKER-IMAGE-URL | optional | URL of the image; can be a JPG, PNG, or PDF file. Required if the request body is empty. |
| X-WORKER-ENCODING | optional | Encoding of the request body; allowed values are 'raw' or 'base64'. Default value: raw |
| X-WORKER-PDF-DPI | optional | DPI of the uploaded PDF file. Default value: 100 |
| X-WORKER-SHOW-CONFIDENCE | optional | Flag for showing the confidence score in the response. Default value: false |
| X-WORKER-AUTO-ADJUST-IMAGE-SIZE | optional | Flag for automatically adjusting the image size for better extraction results; extraction takes longer if enabled. Default value: true |
| X-WORKER-ASYNC | optional | Flag for using the asynchronous mode. Default value: false |
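
As an illustration of the header parameters listed above, the following is a minimal Python sketch that builds (but does not send) a raw-body request using only the standard library. The helper names extract_headers and build_extract_request are hypothetical, and sending flags as the literal string "true" is an assumption; this page does not state the exact flag format.

```python
import urllib.request

EXTRACT_URL = "https://worker.formextractorai.com/extract"


def extract_headers(form_id, token, content_type="image/jpeg",
                    show_confidence=False, use_async=False):
    """Collect the request headers for a raw-body upload."""
    headers = {
        "Content-Type": content_type,   # required when the image is in the body
        "X-WORKER-FORM-ID": form_id,
        "X-WORKER-TOKEN": token,        # must always be sent as a header
    }
    # Assumption: boolean flags are sent as the literal string "true".
    if show_confidence:
        headers["X-WORKER-SHOW-CONFIDENCE"] = "true"
    if use_async:
        headers["X-WORKER-ASYNC"] = "true"
    return headers


def build_extract_request(image_bytes, form_id, token, **options):
    """Build (but do not send) a POST request with the image as the body."""
    return urllib.request.Request(
        EXTRACT_URL,
        data=image_bytes,
        headers=extract_headers(form_id, token, **options),
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request; the optional headers are omitted when a flag keeps its default value.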

Using parameters in form data

| Name | Required | Description |
| --- | --- | --- |
| form_id | required | Form ID |
| image | optional | The image file; can be a JPG, PNG, or PDF file. Either specify this or provide the image_url parameter. |
| image_url | optional | URL of the image; can be a JPG, PNG, or PDF file. Either specify this or provide the image parameter. |
| pdf_dpi | optional | DPI of the uploaded PDF file. Default value: 100 |
| show_confidence | optional | Flag for showing the confidence score in the response. Default value: false |
| auto_adjust_image_size | optional | Flag for automatically adjusting the image size for better extraction results; extraction takes longer if enabled. Default value: true |
| async | optional | Flag for using the asynchronous mode. Default value: false |
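
For the form-data variant, the parameters above travel as multipart/form-data parts while the access token stays in the X-WORKER-TOKEN header. The sketch below encodes such a body by hand with the Python standard library; encode_multipart is a hypothetical helper, not part of any FormX SDK, and it does not escape quotes in field names or filenames.

```python
import uuid


def encode_multipart(fields, files):
    """Encode text fields and file parts as a multipart/form-data body.

    fields: {name: value}
    files:  {name: (filename, content_bytes, mime_type)}
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    for name, (filename, content, mime_type) in files.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            f'Content-Type: {mime_type}\r\n\r\n'.encode("utf-8")
            + content + b"\r\n"
        )
    # The closing delimiter ends the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

A caller would pass something like {"form_id": "..."} as fields and {"image": ("scan.jpg", data, "image/jpeg")} as files, then send the body with the returned Content-Type value alongside the X-WORKER-TOKEN header.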

API Response

| Name | Type | Description |
| --- | --- | --- |
| status | string | "ok" on success, "failed" on failure |
| form_id | string | Form ID |
| fields | Field[] | List of extracted fields and fields in detection regions |
| auto_extraction_items | AutoExtractionItem[] | List of detected auto extraction items |
| key_values | KeyValue[] | List of detected key-value pairs |
| token_groups | TokenGroup[] | List of detected token groups |
| error | any | Present only on failure. Its shape depends on the failure, but it always contains the "code" and "message" fields. |

Field

| Name | Type | Description |
| --- | --- | --- |
| region_id | string | Detection region ID |
| name | string | Field label |
| type | string | Field type |
| value | any | Extracted content |
| error | string | Error message, if any |
| confidence | number | Confidence score. Present only if the confidence score is enabled. If there is a list of values, e.g. fields with the type name or address, this is returned alongside the value in the list. |

AutoExtractionItem

| Name | Type | Description |
| --- | --- | --- |
| name | string | Item name |
| value | any | Item value |
| confidence | number | Confidence score. Present only if the confidence score is enabled. If there is a list of values, e.g. for name, address, and job_title items, this is returned alongside the value in the list. |

KeyValue

| Name | Type | Description |
| --- | --- | --- |
| name | string | Item name |
| value | string | Item value |
| confidence | number | Confidence score. Present only if the confidence score is enabled. |

TokenGroup

| Name | Type | Description |
| --- | --- | --- |
| name | string | Token group name |
| texts | Token[] | List of detected text tokens in this group |
| images | Token[] | List of detected image tokens in this group |

Token

| Name | Type | Description |
| --- | --- | --- |
| id | string | Token ID |
| value | string | Token value |
| confidence | number | Confidence score. Present only if the confidence score is enabled. |
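
To show how the response shape described above might be consumed, here is a small Python sketch that checks status and collects the extracted fields by name. The function name, the exception class, and the sample payload (including the field name "total" and its type) are hypothetical; only the key names status, fields, name, value, and error with its code and message come from this page.

```python
import json


class ExtractionError(Exception):
    """Raised when the response status is not "ok"."""


def fields_by_name(response_text):
    """Return {field name: extracted value} from an Extract API response."""
    payload = json.loads(response_text)
    if payload.get("status") != "ok":
        # "error" always carries "code" and "message"; other keys may vary.
        error = payload.get("error") or {}
        raise ExtractionError(f'{error.get("code")}: {error.get("message")}')
    return {field["name"]: field.get("value")
            for field in payload.get("fields", [])}


# Hypothetical payload, for illustration only.
sample = json.dumps({
    "status": "ok",
    "form_id": "F",
    "fields": [{"name": "total", "type": "number", "value": 12.5}],
})
result = fields_by_name(sample)
```

If two fields share the same name, the later one overwrites the earlier one in this sketch; keying on region_id instead would avoid that.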

Using the Asynchronous mode

If a request takes too long to complete, you can use the asynchronous mode to avoid a timeout. Enable it with the async parameter, either in the header or in the form data.

Job ID

If the async job is created successfully, a 202 Accepted response is returned with the job_id and the page_count.

202 Accepted {
async: true,
job_id: <string>,
page_count: <number>,
}
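
A client can tell an accepted async job apart from a synchronous result by checking the status code and the async flag. The following Python sketch assumes the response body has already been decoded into a dict; job_id_from_submission is a hypothetical helper name.

```python
def job_id_from_submission(status_code, payload):
    """Return the job ID if the submission was accepted as an async job.

    Returns None for any other response, for example a synchronous result.
    """
    if status_code == 202 and payload.get("async"):
        return payload["job_id"]
    return None
```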

Getting the extraction result

GET https://worker.formextractorai.com/extract/jobs/:job_id

To retrieve the extraction result, poll the job endpoint: send a GET request to /extract/jobs/:job_id with the access token in the header until the result is returned.

The extraction result is deleted 24 hours after the job completes, whether or not it has been queried.

Pending

201 Created {
status: "pending"
}

Completed

200 OK {
status: "OK",
pages: [
// structure same as the extract API
]
}
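
The polling loop described in this section could be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference client: it assumes the access token is sent as X-WORKER-TOKEN on the job endpoint (as for the extract endpoint), and the function names, polling interval, and attempt limit are arbitrary choices.

```python
import json
import time
import urllib.request

JOB_URL = "https://worker.formextractorai.com/extract/jobs/{job_id}"


def fetch_job(job_id, token):
    """GET the job endpoint once and return the decoded JSON body."""
    request = urllib.request.Request(
        JOB_URL.format(job_id=job_id),
        headers={"X-WORKER-TOKEN": token},  # assumed header name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_job(job_id, token, fetch=fetch_job, interval=2.0, max_attempts=30):
    """Poll until the job is no longer pending, then return its payload."""
    for _ in range(max_attempts):
        payload = fetch(job_id, token)
        if payload.get("status") != "pending":
            return payload
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still pending after {max_attempts} polls")
```

The fetch parameter exists so the loop can be exercised without network access, for example by passing a stub that returns a pending payload a few times before the completed one.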