What is Fixed Layout Extractor?

Quick video intro

Watch 3:31 - 6:35 of the video below for a quick introduction to how to create a fixed format parser.

How it works?

By uploading a master image of the documents with a fixed layout and labelling the different areas as Anchor and Detection Region, you can extract most of the information.

Detection Region

A blue area is detection region in the form. One detection region can have multiple extraction fields, which means you're encouraged to draw a larger detection region and use the extraction AI or post-processing script to remove unwanted data.

These are the types of extraction fields supported:

Extraction Field	What it is?	Remarks
Text	Single Line or Block of English / Chinese	There is an option to remove any watermarks in the background
Barcode / QR code	Extract QR Codes or Barcodes in the area
Checkbox	Detect checkbox	Return true or false, empty boxes are considered false.
Name	Extract people's names from the text	There is an option to remove any watermarks in the background
Hong Kong Address	Extract the address from the text	Currently, only Hong Kong addresses are supported, Address from other countries can be requested via email or discussions forum.
Date/Time	Extract the Date or Time from the text
Total Amount	Extract the sum of the amount from the text
Text in paragraph	Extract a string between words	Common use cases are for letters. E.g. if there is a line of text "The total amount of the bill is $521 dollars", you can choose to extract "$521" between "the bill is" and "dollars"
Table	Extract a table	The table will be extracted and returned as an array.
Script	Write Javascript to process all text in the area and return data,	Common use cases include "fuzzy search", and "regular expression."

Anchor

An Anchor is a position keeper. It plays an essential part in warping images that will be uploaded in the future to match the master form before extraction can properly take place.

Thus, anchoring properly is extremely important to yield favourable data extraction results. Identical parts among your documents make good Anchors. It's also recommended to have at least two Anchors, preferably stretched as much as possible on the master image.

If you have only two Anchors, their location should be any of the following to boost accuracy: top and bottom; top left and bottom right; top right and bottom left; or left and right. The idea is to cover as much area as possible with the longest distance possible between the Anchors.

When you are unsatisfied with the extraction accuracy, and it appears to be related to image warping, try adding more valid Anchors. Areas that show up in every single document under the specific type of your form make a valid Anchor.

The screenshot above is an example of Anchor. The right bar has a check box that allows users to decide whether one is enabled or not. Two Anchors are marked here, where the selected one is the title of a Business Registration form, and the other top left one with the word "ORIGINAL" in it.

As mentioned above, to boost image warping accuracy, Anchors should be spread across the master form as much as possible, which is indicated by the green box in the below image. That way, FormX learns more about your form's appearance, hence can yield better extraction results. We have marked only three here for simplicity, whereas our actual Business Registration form template has much more Anchors.

Additional extraction items

You can also use various processors powered by AI, including:

Key-Value Processor to extract information present as key-value format, such as Name: Ben Cheng , "Name" would be the key, "Ben Cheng" would be the value.
Text Token Processor to detect if a text exists.
Image Token Processorto detect if an image such as logo exists.