FormX async call tutorial in NodeJS
What is an async (asynchronous) call?
An async (short for asynchronous) call is used when the extractor's result isn't returned immediately as part of the HTTP response. With a slight modification to a normal FormX HTTP call, instead of receiving the extractor's result, you'll get a `job_id` (a unique string identifier). This `job_id` acts as a key for retrieving the extractor job's result later. (In this context, *job* is synonymous with *task*, referring to the specific action you've requested FormX to perform in that HTTP request.)
When should I use an async call?
Use an async call when processing large files, such as PDFs with 4 or more pages. Normal synchronous calls may time out due to HTTP limitations. This occurs because the HTTP protocol has no way to distinguish between a server legitimately taking a long time to process a large file and a server encountering an error (such as an infinite loop). To prevent indefinite waiting, the protocol enforces a timeout, returning an HTTP timeout response after a set period. Async calls circumvent this limitation by allowing the server to process large files without being constrained by the HTTP timeout.
Technical details
As with a regular FormX extractor HTTP call, the required headers, such as `X-WORKER-EXTRACTOR-ID`, must be included, along with any optional headers you'd normally use. To enable an async call, simply set the `X-WORKER-ASYNC` header to `true` (it defaults to `false`, which is why we didn't need to specify it before). If you are using form data instead of headers, just append `async: true` to the form data in the extract HTTP request.

Reference: headers, formdata
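The steps above can be sketched as follows. This is a minimal illustration of enabling the async mode via the `X-WORKER-ASYNC` header; the token and extractor ID values are placeholders you must replace, and the request body is omitted.

```javascript
// Sketch: request options for an async extract call using headers.
// Placeholder credential values; replace with your own before use.
const options = {
  method: 'POST',
  headers: {
    'accept': 'application/json',
    'X-WORKER-TOKEN': 'replace with your worker access token',
    'X-WORKER-EXTRACTOR-ID': 'replace with your extractor ID',
    'X-WORKER-ASYNC': 'true' // defaults to 'false' when omitted
  }
  // body: attach the document to extract here (e.g. a file stream)
};

console.log(options.headers['X-WORKER-ASYNC']); // 'true'
```

Passing these options to your HTTP client of choice (e.g. `fetch`) yields the `job_id` response shown below instead of the extraction result.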
Here is a sample async extract response:
202 Accepted

```json
{
  "job_id": "<string>",
  "request_id": "<string>",
  "status": "ok"
}
```
Afterwards, repeatedly check on the job until it is done. The API for getting the async extraction result only requires the `X-WORKER-TOKEN` header for authorization and the `job_id` in the URL.
Here is a sample curl request:

```shell
curl --request GET \
  --url 'https://worker.formextractorai.com/v2/extract/jobs/{replace with job_id}' \
  --header 'X-WORKER-TOKEN: {replace with token}' \
  --header 'accept: application/json'
```
For intermediate and final HTTP responses, refer to the end of this tutorial.
A complete NodeJS sample
1. Create a folder named `async-example` (or your preferred name) for your project.
2. Open a terminal in the newly created folder.
3. Run `npm init` in the terminal. (Note: `npm` comes with the NodeJS installation.)
4. Keep pressing Enter for all prompts until you reach `test command:`.
5. For `test command:`, type `node index.js`.
6. Continue pressing Enter to finish the initialization.
7. Verify that `package.json` has been created in the folder.
8. Create a new file named `index.js` in the same folder.
9. In the terminal, run `npm install --save node-fetch@2 form-data` to install dependencies.
10. Open `index.js` in a text editor.
11. Paste the code below into `index.js`:
```javascript
const WORKER_TOKEN = 'replace with your worker access token';
const EXTRACTOR_ID = 'replace with your extractor ID';
const PDF_FILENAME = 'replace with PDF filename in the same folder as this index.js';
const ENDPOINT = 'worker.formextractorai.com'; // or 'sg-gcp.worker.formextractorai.com'
const EXTRACT_URL = `https://${ENDPOINT}/v2/extract`;
const WAIT_TIME = 1000; // time interval between get requests in milliseconds

const fs = require('fs');
const FormData = require('form-data');
const fetch = require('node-fetch');

async function performExtraction() {
  const formData = new FormData();
  formData.append('extractor_id', EXTRACTOR_ID);
  formData.append('async', 'true');
  formData.append('image', fs.createReadStream(PDF_FILENAME));

  const extractOptions = {
    method: 'POST',
    headers: {
      'accept': 'application/json',
      'X-WORKER-TOKEN': WORKER_TOKEN
    },
    body: formData
  };

  const response = await fetch(EXTRACT_URL, extractOptions);
  const json = await response.json();
  console.log('Response from async extract HTTP call:', json);

  if (json.status === 'ok') {
    return json.job_id;
  }
  throw new Error('Extraction failed');
}

async function getResult(jobID) {
  const getOptions = {
    method: 'GET',
    headers: {
      'accept': 'application/json',
      'X-WORKER-TOKEN': WORKER_TOKEN
    }
  };

  while (true) {
    const response = await fetch(`${EXTRACT_URL}/jobs/${jobID}`, getOptions);
    const json = await response.json();
    console.log('Response from async get result HTTP call:', json);

    if (json.status === 'ok') {
      return json;
    } else if (json.status !== 'pending') {
      throw new Error('Unexpected job status');
    }
    await new Promise(resolve => setTimeout(resolve, WAIT_TIME));
  }
}

async function main() {
  const jobID = await performExtraction();
  const result = await getResult(jobID);
  console.log('Final result:', result);
  // Process 'result' here
}

main();
```
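The sample polls at a fixed `WAIT_TIME` interval, which is fine for short jobs. For very large files you may prefer exponential backoff, so that early polls are quick but later ones don't hammer the server. Below is a minimal sketch of such a delay schedule; `backoffDelay` is a hypothetical helper, not part of the FormX API.

```javascript
// Sketch: exponential backoff delay schedule for polling, capped at a maximum.
// Doubles the base delay on each attempt: 1000, 2000, 4000, ... up to maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 16000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Delays for attempts 0..5: 1000, 2000, 4000, 8000, 16000, 16000
console.log([0, 1, 2, 3, 4, 5].map(a => backoffDelay(a)));
```

To use it, keep an attempt counter in the polling loop and pass `backoffDelay(attempt)` to `setTimeout` instead of the fixed `WAIT_TIME`.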
Remember to replace all the `const` values at the beginning. The PDF filename should include the `.pdf` extension, and the file should be placed in the project folder. Here is the project structure:
```
async-example/
├─ node_modules/
├─ index.js
├─ your_pdf_file.pdf
├─ package.json
└─ package-lock.json
```
Run the code with `npm run test` in the terminal. Here is the expected result:
- After the first extract HTTP call:

```
Response from async extract HTTP call: {
  status: 'ok',
  job_id: '<string>',
  request_id: '<string>'
}
```
- While the PDF is being processed, this response should be returned repeatedly:

```
Response from async get result HTTP call: { status: 'pending', job_id: '<string>' }
```
- Finally, the code will exit once the actual result is obtained:

```
Response from async get result HTTP call: {
  status: 'ok',
  metadata: {
    extractor_id: ...,
    request_id: ...,
    usage: <number of pages>,
    job_id: ...
  },
  documents: [
    {
      extractor_id: ...,
      metadata: [Object],
      data: [Object],
      detailed_data: [Object]
    },
    ...
  ]
}
```
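Once the final response arrives, the extracted fields live under each entry of `documents`. Below is a minimal sketch of consuming a result shaped like the response above; the `result` object here is fabricated for illustration, and the `total` field is a hypothetical extracted value, not a guaranteed FormX field name.

```javascript
// Hypothetical final result mirroring the response shape above.
const result = {
  status: 'ok',
  metadata: { usage: 2 },
  documents: [
    { extractor_id: 'ext-1', data: { total: '42.00' } },
    { extractor_id: 'ext-1', data: { total: '13.50' } }
  ]
};

// Each entry in `documents` corresponds to one extracted document,
// so processing the result is a matter of iterating over that array.
const totals = result.documents.map(doc => doc.data.total);
console.log(totals); // [ '42.00', '13.50' ]
```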