What is Fixed Layout Extractor?

Quick video intro

Watch 3:31 - 6:35 of the video below for a quick introduction to how to create a fixed format parser.

How it works?

By uploading a master image of the documents with a fixed layout and labelling the different areas as Anchor and Detection Region, you can extract most of the information.

Detection Region

A blue area is detection region in the form. One detection region can have multiple extraction fields, which means you're encouraged to draw a larger detection region and use the extraction AI or post-processing script to remove unwanted data.

These are the types of extraction fields supported:

Extraction FieldWhat it is?Remarks
TextSingle Line or Block of English / ChineseThere is an option to remove any watermarks in the background
Barcode / QR codeExtract QR Codes or Barcodes in the area
CheckboxDetect checkboxReturn true or false, empty boxes are considered false.
NameExtract people's names from the textThere is an option to remove any watermarks in the background
Hong Kong AddressExtract the address from the textCurrently, only Hong Kong addresses are supported, Address from other countries can be requested via email or discussions forum.
Date/TimeExtract the Date or Time from the text
Total AmountExtract the sum of the amount from the text
Text in paragraphExtract a string between wordsCommon use cases are for letters. E.g. if there is a line of text "The total amount of the bill is $521 dollars", you can choose to extract "$521" between "the bill is" and "dollars"
TableExtract a tableThe table will be extracted and returned as an array.
ScriptWrite Javascript to process all text in the area and return data,Common use cases include "fuzzy search", and "regular expression."

Anchor

An Anchor is a position keeper. It plays an essential part in warping images that will be uploaded in the future to match the master form before extraction can properly take place.

Thus, anchoring properly is extremely important to yield favourable data extraction results. Identical parts among your documents make good Anchors. It's also recommended to have at least two Anchors, preferably stretched as much as possible on the master image.

If you have only two Anchors, their location should be any of the following to boost accuracy: top and bottom; top left and bottom right; top right and bottom left; or left and right. The idea is to cover as much area as possible with the longest distance possible between the Anchors.

When you are unsatisfied with the extraction accuracy, and it appears to be related to image warping, try adding more valid Anchors. Areas that show up in every single document under the specific type of your form make a valid Anchor.

The screenshot above is an example of Anchor. The right bar has a check box that allows users to decide whether one is enabled or not. Two Anchors are marked here, where the selected one is the title of a Business Registration form, and the other top left one with the word "ORIGINAL" in it.

As mentioned above, to boost image warping accuracy, Anchors should be spread across the master form as much as possible, which is indicated by the green box in the below image. That way, FormX learns more about your form's appearance, hence can yield better extraction results. We have marked only three here for simplicity, whereas our actual Business Registration form template has much more Anchors.

Additional extraction items

You can also use various processors powered by AI, including: