Document Extraction

HTTP Endpoint

POST https://worker.formextractorai.com/extract

Overview

Send the document image in a POST request to the Extract API endpoint, and FormX will recognize the information in it.
FormX uses the extractor of your choice to extract the data and returns it as JSON. Specify the extractor with the form ID (the form_id parameter) and include an access token with every request. Both can be obtained from the web portal dashboard.
FormX provides two modes for accessing the API, synchronous and asynchronous; the mode is selected with a parameter. See more details below.

cURL Example

curl -X POST \
https://worker.formextractorai.com/extract \
-H 'Content-Type: image/jpeg' \
-H 'X-WORKER-FORM-ID: REPLACE-YOUR-FORM-ID-HERE' \
-H 'X-WORKER-TOKEN: REPLACE-YOUR-WORKER-TOKEN-HERE' \
--data-binary "@/path/to/query/image.jpg"

Code Examples

Examples in Python, Node.js, Go, and PHP follow, in that order.
Python

import requests

url = "https://worker.formextractorai.com/extract"
headers = {
    "X-WORKER-TOKEN": "ACCESS_TOKEN",
    "X-WORKER-FORM-ID": "FORM_ID",
    "Content-Type": "image/jpeg",
}

# Send the raw image bytes as the request body; the file is closed afterwards.
with open("FILE_PATH_TO_IMAGE", "rb") as payload:
    response = requests.post(url, headers=headers, data=payload)

print(response.text)
Node.js

var fs = require('fs');
var request = require('request');

// Read a file and return its contents as a base64-encoded string.
function base64_encode(file) {
  // readFileSync returns a Buffer, which can be encoded directly
  return fs.readFileSync(file).toString('base64');
}

var options = {
  method: 'POST',
  url: 'https://worker.formextractorai.com/extract',
  headers: {
    'X-WORKER-TOKEN': 'ACCESS_TOKEN',
    'X-WORKER-FORM-ID': 'FORM_ID',
    'Content-Type': 'image/jpeg',
    // tell the API that the request body is base64 encoded
    'X-WORKER-ENCODING': 'base64'
  },
  body: base64_encode('FILE_PATH_TO_IMAGE')
};

request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});
Go

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	url := "https://worker.formextractorai.com/extract"

	// Read the image; its raw bytes become the request body.
	image, err := os.ReadFile("FILE_PATH_TO_IMAGE")
	if err != nil {
		fmt.Println(err)
		return
	}

	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(image))
	if err != nil {
		fmt.Println(err)
		return
	}
	req.Header.Add("X-WORKER-TOKEN", "ACCESS_TOKEN")
	req.Header.Add("X-WORKER-FORM-ID", "FORM_ID")
	req.Header.Add("Content-Type", "image/jpeg")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(body))
}
PHP

<?php
$curl = curl_init();
curl_setopt_array($curl, array(
    CURLOPT_URL => 'https://worker.formextractorai.com/extract',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => '',
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 0,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => 'POST',
    // send the raw image bytes as the request body
    CURLOPT_POSTFIELDS => file_get_contents('FILE_PATH_TO_IMAGE'),
    CURLOPT_HTTPHEADER => array(
        'X-WORKER-TOKEN: ACCESS_TOKEN',
        'X-WORKER-FORM-ID: FORM_ID',
        'Content-Type: image/jpeg'
    ),
));
$response = curl_exec($curl);
curl_close($curl);
echo $response;

Postman Example

You can download the following file and import it into the Postman app to test the API. Remember to set the X-WORKER-TOKEN and X-WORKER-FORM-ID variables in the collection scope to configure your form.
FormX Document Extraction.postman_collection.json.zip (Postman Collection, 1KB)

Image Sizing Recommendations

Images submitted to the extraction API should be large enough that the text and features can be easily distinguished.
The API supports JPEG, PNG, and PDF files. An image of at least 1000x750 pixels, or 100 DPI, is recommended.
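
A local check before uploading can catch undersized images early. The sketch below reads the width and height from a PNG header using only the standard library and compares them with the 1000x750 recommendation above. The helper names are hypothetical, and treating the recommendation as orientation-independent (long side at least 1000, short side at least 750) is an assumption rather than documented API behaviour.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_LONG_SIDE = 1000   # recommended minimum from this page
MIN_SHORT_SIDE = 750


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers at bytes 16..23.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def meets_recommended_size(width: int, height: int) -> bool:
    """Check the pixel recommendation, ignoring orientation (an assumption)."""
    long_side, short_side = max(width, height), min(width, height)
    return long_side >= MIN_LONG_SIDE and short_side >= MIN_SHORT_SIDE
```

Only the first 24 bytes of the file are needed, so `png_dimensions(open(path, "rb").read(24))` is enough for the check. JPEG and PDF files need a different parser and are not covered here.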

Making the Request

If you want to upload the image directly, it can be uploaded in the request body or via multipart/form-data. If you want to specify an image URL, it can be submitted via a header or multipart/form-data.
Most of the parameters can be submitted either as multipart/form-data or as request headers.

Using parameters in HTTP request headers

| Name | Required | Description |
| --- | --- | --- |
| Content-Type | optional | image/jpeg, image/png, or application/pdf. Required if the image is sent in the request body. |
| X-WORKER-TOKEN | required | Access token. This parameter must be included in the header. |
| X-WORKER-FORM-ID | required | Form ID |
| X-WORKER-IMAGE-URL | optional | URL of the image; can be a JPG, PNG, or PDF file. Required if the request body is empty. |
| X-WORKER-ENCODING | optional | Encoding of the request body; allowed values are 'raw' and 'base64'. Default value: raw |
| X-WORKER-PDF-DPI | optional | DPI of the uploaded PDF file. Default value: 100 |
| X-WORKER-SHOW-CONFIDENCE | optional | Flag for showing the confidence score in the response. Default value: false |
| X-WORKER-AUTO-ADJUST-IMAGE-SIZE | optional | Flag for automatically adjusting the image size for a better extraction result; extraction takes longer when enabled. Default value: true |
| X-WORKER-ASYNC | optional | Flag for using the asynchronous mode. Default value: false |
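
As an illustration of how these headers combine, the following sketch builds the header set for either an image-URL request or a request-body upload. The function name is hypothetical, and sending boolean flags as the strings "true" and "false" is an assumption; this page does not state the exact encoding.

```python
from typing import Optional


def build_extract_headers(
    token: str,
    form_id: str,
    *,
    image_url: Optional[str] = None,
    content_type: Optional[str] = None,
    show_confidence: bool = False,
    use_async: bool = False,
) -> dict:
    """Build request headers for exactly one image source: a URL or a body upload."""
    if (image_url is None) == (content_type is None):
        raise ValueError("pass image_url, or content_type for a body upload, but not both")
    headers = {"X-WORKER-TOKEN": token, "X-WORKER-FORM-ID": form_id}
    if image_url is not None:
        headers["X-WORKER-IMAGE-URL"] = image_url
    else:
        headers["Content-Type"] = content_type
    # Flags are only sent when they differ from the documented defaults.
    if show_confidence:
        headers["X-WORKER-SHOW-CONFIDENCE"] = "true"  # assumed encoding
    if use_async:
        headers["X-WORKER-ASYNC"] = "true"  # assumed encoding
    return headers
```

With the requests library, an image-URL request would then look like `requests.post("https://worker.formextractorai.com/extract", headers=build_extract_headers("ACCESS_TOKEN", "FORM_ID", image_url="https://example.com/receipt.jpg"))`, where the image URL is a placeholder.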

Using parameters in form data

| Name | Required | Description |
| --- | --- | --- |
| form_id | required | Form ID |
| image | optional | The image file; can be a JPG, PNG, or PDF file. Either specify this or provide the image_url parameter. |
| image_url | optional | URL of the image; can be a JPG, PNG, or PDF file. Either specify this or provide the image parameter. |
| pdf_dpi | optional | DPI of the uploaded PDF file. Default value: 100 |
| show_confidence | optional | Flag for showing the confidence score in the response. Default value: false |
| auto_adjust_image_size | optional | Flag for automatically adjusting the image size for a better extraction result; extraction takes longer when enabled. Default value: true |
| async | optional | Flag for using the asynchronous mode. Default value: false |
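
A multipart/form-data upload might look like the sketch below, which uses the requests library as in the Python example earlier on this page. The function names are hypothetical, and encoding flags as the string "true" is an assumption. The access token stays in the X-WORKER-TOKEN header, since it must be sent as a header.

```python
EXTRACT_URL = "https://worker.formextractorai.com/extract"


def build_form_fields(form_id, *, image_url=None, pdf_dpi=None,
                      show_confidence=False, use_async=False):
    """Return the text fields of the multipart form; the file is attached separately."""
    fields = {"form_id": form_id}
    if image_url is not None:
        fields["image_url"] = image_url
    if pdf_dpi is not None:
        fields["pdf_dpi"] = str(pdf_dpi)
    if show_confidence:
        fields["show_confidence"] = "true"  # assumed encoding
    if use_async:
        fields["async"] = "true"  # assumed encoding
    return fields


def extract_from_file(token, form_id, image_path, **options):
    """Upload a local file in the 'image' form field and return the parsed JSON."""
    import requests  # third-party; imported here so build_form_fields has no dependency

    with open(image_path, "rb") as image:
        response = requests.post(
            EXTRACT_URL,
            headers={"X-WORKER-TOKEN": token},
            data=build_form_fields(form_id, **options),
            files={"image": image},
        )
    return response.json()
```

Passing both `data` and `files` makes requests encode the body as multipart/form-data and set the Content-Type header, including the boundary, automatically.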

API Response

| Name | Type | Description |
| --- | --- | --- |
| status | string | "ok" on success, "failed" on failure |
| form_id | string | Form ID |
| fields | Field[] | List of extracted fields and fields in detection regions |
| auto_extraction_items | AutoExtractionItem[] | List of detected auto extraction items |
| key_values | KeyValue[] | List of detected key-value pairs |
| token_groups | TokenGroup[] | List of detected token groups |
| error | any | Present only on failure. Its shape depends on the failure, but it always contains the "code" and "message" fields. |

Field

| Name | Type | Description |
| --- | --- | --- |
| region_id | string | Detection region ID |
| name | string | Field label |
| type | string | Field type |
| value | any | Extracted content |
| error | string | Error message, if any |
| confidence | number | Confidence score; present only if the confidence score is enabled. If the value is a list (e.g. fields with the type name or address), the confidence is returned alongside each value in the list. |
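
To show how the Field entries can be consumed, here is a minimal sketch that separates successfully extracted values from per-field errors. The function name and the sample data are hypothetical; only the key names come from the Field description above.

```python
def split_fields(fields):
    """Return (values, errors), each keyed by the field name."""
    values, errors = {}, {}
    for field in fields:
        if field.get("error"):
            # The field could not be extracted; keep its error message.
            errors[field["name"]] = field["error"]
        else:
            values[field["name"]] = field.get("value")
    return values, errors


# Hypothetical sample in the shape of the "fields" list.
sample_fields = [
    {"region_id": "r1", "name": "total", "type": "amount", "value": "12.50"},
    {"region_id": "r2", "name": "date", "type": "date", "error": "no text found"},
]
values, errors = split_fields(sample_fields)
print(values)  # {'total': '12.50'}
print(errors)  # {'date': 'no text found'}
```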

AutoExtractionItem

| Name | Type | Description |
| --- | --- | --- |
| name | string | Item name |
| value | any | Item value |
| confidence | number | Confidence score; present only if the confidence score is enabled. If the value is a list (e.g. for name, address, and job_title items), the confidence is returned alongside each value in the list. |

KeyValue

| Name | Type | Description |
| --- | --- | --- |
| name | string | Item name |
| value | string | Item value |
| confidence | number | Confidence score; present only if the confidence score is enabled |

TokenGroup

| Name | Type | Description |
| --- | --- | --- |
| name | string | Token group name |
| texts | Token[] | List of detected text tokens in this group |
| images | Token[] | List of detected image tokens in this group |

Token

| Name | Type | Description |
| --- | --- | --- |
| id | string | Token id |
| value | string | Token value |
| confidence | number | Confidence score; present only if the confidence score is enabled |
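
The nesting of TokenGroup and Token can be flattened with a few lines of code. The sketch below collects the text token values of each group; the function name and the sample data are hypothetical, and the key names follow the TokenGroup and Token descriptions above.

```python
def texts_by_group(token_groups):
    """Map each token group name to the list of its text token values."""
    return {
        group["name"]: [token["value"] for token in group.get("texts", [])]
        for group in token_groups
    }


# Hypothetical sample in the shape of the "token_groups" list.
sample_groups = [
    {
        "name": "brands",
        "texts": [{"id": "t1", "value": "Acme"}, {"id": "t2", "value": "Globex"}],
        "images": [],
    },
    {"name": "empty_group", "texts": [], "images": []},
]
grouped = texts_by_group(sample_groups)
print(grouped)  # {'brands': ['Acme', 'Globex'], 'empty_group': []}
```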

Using the Asynchronous mode

If the request takes too long to complete, you can use the asynchronous mode to avoid a timeout. Enable it with the async parameter, either in the header (X-WORKER-ASYNC) or in the form data (async).

Job ID

If the async job is successfully created, a 202 Accepted response will be returned with the job_id and the page_count.
202 Accepted {
async: true,
job_id: <string>,
page_count: <number>,
}

Getting the extraction result

GET https://worker.formextractorai.com/extract/jobs/:job_id
The extraction result can be queried by polling the Job endpoint. Send a GET request to /extract/jobs/:job_id with the access token in the header, and repeat until the result is returned.
The extraction result is deleted 24 hours after the job completes, whether or not it has been queried.

Pending

201 Created {
status: "pending"
}

Completed

200 OK {
status: "OK",
pages: [
// structure same as the extract API
]
}
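
The polling loop described above can be sketched with the standard library as follows. The function names, the polling interval, and the attempt limit are illustrative assumptions; only the endpoint, the X-WORKER-TOKEN header, and the "pending" status come from this page.

```python
import json
import time
import urllib.request

JOBS_URL = "https://worker.formextractorai.com/extract/jobs/"


def fetch_job(job_id, token):
    """GET the job endpoint once and return the decoded JSON body."""
    request = urllib.request.Request(JOBS_URL + job_id, headers={"X-WORKER-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(job_id, token, *, fetch=fetch_job, interval=2.0,
                    max_attempts=30, sleep=time.sleep):
    """Poll until the job is no longer pending, then return the response body."""
    for _ in range(max_attempts):
        body = fetch(job_id, token)
        if body.get("status") != "pending":
            return body
        sleep(interval)
    raise TimeoutError(f"job {job_id} still pending after {max_attempts} polls")
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without network access. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so callers may want to catch it.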

Troubleshooting

Error codes

If an error occurs, the endpoint returns an error response with the following structure.
{
"error": {
"code": ERROR_CODE,
"message": "ERROR_MESSAGE"
},
"status": "failed"
}
Below is a table containing the possible errors and a short description of their cause.
| Code | Message |
| --- | --- |
| 401 | Unauthorized |
| 1001 | Invalid token or Invalid argument |
| 1002 | Form ID not found |
| 1003 | Unsafe image url |
| 1004 | Cannot load image |
| 1005 | Uploaded file is too large |
| 1006 | Free quota used up, please upgrade to a paid plan to continue. |
| 1007 | Too many requests |
| 1008 | Usage reached hard limit |
| 1009 | Image dimension is too large |
| 2001 | Form not found |
| 2002 | Form not ready |
| 2003 | Query image is not match the specified form |
| 2004 | Error during extracting form info |
| 2005 | Form group not found |
| 2006 | Form group is empty |
| 2007 | Cannot recognize any text from the input image |
| 3001 | Receipt group ID not found |
| 3002 | Google Cloud Vision service account key is not set |
| 3003 | Error during extracting receipt info |
| 3004 | Error during accessing Google Vision API |
| 3005 | Azure computer vision subscription key and/or endpoint is not set |
| 4001 | Extract job not found |
| 4002 | Fail to submit extract job, please try again later |
| 5001 | Custom model ID not found |
| 5002 | Custom model not found |
| 5003 | Error during creating the sync cvat project task |
| 6001 | Endpoint is not available because Procrastinate is disabled |
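
Client code can turn the error structure into an exception, as in the sketch below. The class and function names are hypothetical. Treating codes 1007 and 4002 as worth retrying is an assumption based only on their messages, as is the expectation that the code is returned as a number.

```python
class ExtractionError(Exception):
    """Raised when a response body has status "failed"."""

    def __init__(self, code, message):
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message


# Assumption: "Too many requests" and "please try again later" suggest a retry may help.
RETRYABLE_CODES = {1007, 4002}


def raise_for_failure(body):
    """Return the body unchanged, or raise ExtractionError if it reports a failure."""
    if body.get("status") == "failed":
        error = body.get("error") or {}
        raise ExtractionError(error.get("code"), error.get("message"))
    return body


def is_retryable(error):
    """Tell whether an ExtractionError carries one of the assumed retryable codes."""
    return error.code in RETRYABLE_CODES
```

A caller would typically wrap the decoded response in `raise_for_failure(...)` and, on `ExtractionError`, check `is_retryable(err)` before backing off and trying again.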