
Data-Extraction Quick Hands-On Example

FTForm has two operating sections. The first builds an XML file from existing input, normally print-line files sent directly from your ERP or other systems. If your system outputs its own XML file, you can largely ignore this tutorial. The remainder of this section describes how to generate XML output from your input. For this discussion we'll use a sample standard Purchase Order from QAD. Please print the document QAD-PO-with-notes, then load the program. We'll turn this input document into XML and learn how to define Data Extraction along the way.

This is the initial screen. From here, select New (where shown), set the Document Name to QAD PO Example, and select QAD-PO as the Template. This original file is at least two pages long and contains most of the possible variations, including literals between pages. When defining XML from system data, always check with an expert system user that all the data you will meet is represented. The most common mistake with FormTrap forms is running into data that the definition has no knowledge of.

Define the Master first, in the area you'll see open over the text.

Shift the pink boundaries so they cover the entire header. In this example there is a Header Comment line, so highlight to the end of the Remarks (two lines below the Remarks: literal; see "with notes") and pull the top down to the first line (P U R C H A S E     O R D E R). We need to "recognize" this page, so identify what makes it unique.

We'll use "P U R C H A S E" (identifies this document), "Page:   1" (identifies this as the first page) and "Remarks:" (confirms the document) to identify this page. Select New Field (see below) and draw the I-Bar over the text as shown.

This window opens, where you give the field a Name.

Closing the window and selecting the field shows its name (in yellow, above the field).

Right-click the field and select the option shown to make it a "Rule". This rule requires an "Exact match" with the highlighted text. All three Rules must be true for this area (i.e. the master area) to be recognized. Rule fields are shown in green rather than yellow.
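To make the "all Rules must be true" logic concrete, here is a minimal Python sketch. It is not FTForm's internal code. It assumes that an Exact match rule means a literal expected at a fixed line and column, and the class and function names (ExactMatchRule, area_recognized) are hypothetical, as are the positions in the sample page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExactMatchRule:
    """Hypothetical model of an "Exact match" rule: a literal at a fixed line and column."""
    line: int      # 0-based line within the candidate area
    column: int    # 0-based column where the literal must start
    literal: str

    def matches(self, page: list[str]) -> bool:
        if self.line >= len(page):
            return False
        text = page[self.line]
        return text[self.column:self.column + len(self.literal)] == self.literal


def area_recognized(page: list[str], rules: list[ExactMatchRule]) -> bool:
    """The area is recognized only when every one of its rules is true."""
    return all(rule.matches(page) for rule in rules)


# Hypothetical page fragments; real positions come from your own print file.
title = "P U R C H A S E     O R D E R"
page1 = [title.ljust(39) + "Page:   1", "", "Remarks:"]
page2 = [title.ljust(39) + "Page:   2", "", "Remarks:"]

master_rules = [
    ExactMatchRule(0, 0, "P U R C H A S E"),  # identifies the document
    ExactMatchRule(0, 39, "Page:   1"),       # identifies the first page
    ExactMatchRule(2, 0, "Remarks:"),         # confirms the document
]

print(area_recognized(page1, master_rules))  # True
print(area_recognized(page2, master_rules))  # False: the "Page:   1" rule fails
```

A single failing rule is enough to reject the area. This is why page 2 is not mistaken for the master page even though it shares two of the three literals.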

Double-clicking a field shows its properties. Right-clicking and selecting Rules and other properties lets you see all the fields and rules for this area (see later).

Define all of the other fields on the header and see below for the field list and the finished header record.

You can "lock into" field creation by pressing the I-Bar icon in the Field group. Press it again to leave field creation.

Now test this using the Test button at top right. You should see this:

At this point we should define what we DO NOT want. These are typical:

Second and subsequent page headers: we've defined these as Unw Page Hdr (unwanted page header). The screenshots below show selecting New area (new detail record) and pressing Show next page to move to Page 2.

Note: Areas are normally placed on top of each other, with one "active" at a time. This can be difficult to work with, so the controls below toggle to an "active only" area view:

  • This window allows selection of the active area, Unw Page Hdr in the example.
  • This button toggles between all and the active area.

Select down to and including the Detail Line Header, and set the rules. You can use anything that identifies the subsequent pages, but we recommend constants from the first and last lines.

What else should we get rid of? The only remaining "unwanted" area on this form is the Detail Heading on Page 1, so remove it as shown below:

Now we'll define the Totals. These are part of the Master, so select Master, press New (Area), and select New master. Move to the end of the data and highlight the totals.

The Text, Number, and Date icons are selected from the Parser prompt (in the Data field window). Define a Rule for the first and last lines, then define the other fields required. You should end up with this:


Order of Evaluation

Run a new test and view the XML output. This works in this case, but it will not always work. Why? Although it's not obvious from this file, some output can split the totals across more than one page. We cannot find such a "split" total until we have first removed the redundant page headers that interrupt it.
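The sketch below shows why the evaluation order matters, in simplified Python. It is not FTForm's engine. The header constants, the trailer labels (Sub-total:, Tax:, Order Total:), and the function names are all hypothetical. The totals are recognized only when their three lines sit together, and a page header printed between them hides that block until the header has been removed.

```python
from typing import Optional

HEADER_FIRST = "P U R C H A S E"   # hypothetical first-line constant of the page header
HEADER_LAST = "Item"               # hypothetical start of the Detail Line Header (last header line)
TOTAL_LABELS = ("Sub-total:", "Tax:", "Order Total:")  # hypothetical trailer labels


def remove_unwanted_headers(lines: list[str]) -> list[str]:
    """Drop each block from a HEADER_FIRST line through the next HEADER_LAST line."""
    kept: list[str] = []
    skipping = False
    for line in lines:
        if not skipping and line.startswith(HEADER_FIRST):
            skipping = True
        if skipping:
            if line.startswith(HEADER_LAST):
                skipping = False  # the header block ends with this line
        else:
            kept.append(line)
    return kept


def find_totals(lines: list[str]) -> Optional[list[str]]:
    """Return the trailer block if all total labels appear on consecutive lines."""
    n = len(TOTAL_LABELS)
    for i in range(len(lines) - n + 1):
        window = lines[i:i + n]
        if all(line.startswith(label) for line, label in zip(window, TOTAL_LABELS)):
            return window
    return None


# Totals split across a page break by the repeated page header (hypothetical layout).
printout = [
    "Sub-total:        100.00",
    "P U R C H A S E     O R D E R          Page:   2",
    "Item  Description           Qty     Price",
    "Tax:               10.00",
    "Order Total:      110.00",
]

print(find_totals(printout))                           # None: totals evaluated too early
print(find_totals(remove_unwanted_headers(printout)))  # all three trailer lines found
```

This corresponds to placing Total after Unw Page Hdr in the Evaluation Order.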

To change the Order of Evaluation, press the Evaluation Order button (Area). The required order is shown below, with Total evaluated after Unw Page Hdr.

Try the test again. You should see this, with all of the Trailer fields included as Master fields:

Next we'll generate the Product area. Select New (area) and then (new detail record), and define just the test fields first, using the slash in a date and the decimal points as tests. These tests define the record.

This shows the Rules being defined for the Product.
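FormTrap uses Regular Expressions for its expression logic, so one way to picture the two Product tests is as two regexes that must both match. The Python sketch below is only an approximation: the patterns, the function name is_product_line, and the sample lines are hypothetical. FTForm's own tests are set on the form rather than written like this.

```python
import re

# Hypothetical stand-ins for the two Product tests.
DATE_WITH_SLASH = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")  # a slash inside a date
DECIMAL_NUMBER = re.compile(r"\b\d[\d,]*\.\d+\b")             # a decimal point in a number


def is_product_line(line: str) -> bool:
    """Both tests must pass for a line to start a Product record."""
    return DATE_WITH_SLASH.search(line) is not None and DECIMAL_NUMBER.search(line) is not None


for candidate in [
    "  1  KP-100       03/15/12     10.0 EA     25.50",  # hypothetical product line
    "     Revision: B",
    "Order Total:      110.00",
]:
    print(is_product_line(candidate))  # True, then False, then False
```

Requiring both tests together is what keeps a trailer line such as "Order Total:" from being read as a product. It has a decimal point but no date.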

And here is the Product record with all fields defined. Note that this time you can see fields defined with the different Text, Numeric, and Date icons chosen from the Parser prompt (in the Data field window).
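Roughly speaking, each parser turns the raw characters of a field into a typed value. The following Python sketch is hypothetical and is not FTForm's implementation. It assumes "," as the thousands separator, "." as the decimal point, and month/day/two-digit-year dates. Adjust these to match what your ERP actually prints.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_text(raw: str) -> str:
    """Text: keep the characters, trimming the padding around the fixed-width field."""
    return raw.strip()


def parse_number(raw: str) -> Decimal:
    """Numeric: assumes ',' thousands separators and '.' as the decimal point."""
    return Decimal(raw.strip().replace(",", ""))


def parse_date(raw: str, fmt: str = "%m/%d/%y") -> date:
    """Date: assumes month/day/two-digit year; change fmt to match your system's output."""
    return datetime.strptime(raw.strip(), fmt).date()


print(parse_text("  KP-100   "))    # KP-100
print(parse_number("  1,234.50 "))  # 1234.50
print(parse_date(" 03/15/12 "))     # 2012-03-15
```

Decimal is used rather than float so that amounts keep their exact printed value.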

You set these by selecting a field, right-clicking, and choosing either Field properties (to change just this field) or Rules and other area properties (to edit any field or rule within this record).

Looking at the "with notes" file, between zero and four lines follow each Product. Here is the definition for the first (Revision) line, comprising one rule, on the literal "Revision:", and just two fields:

Define the Site line, also optional, with two fields, one of them a rule testing for "Site:".

Define the Supplier line, also optional, with two fields, one of them a rule testing for "Supplier:".

Here is the Manufacturer record, the last of the optional lines, shown with all records defined so far. It has one rule and two additional fields.

The next line required to define a product is the product name, "KICKPLATE ...". This record is always present and is defined with the rule "not empty" on its single field.

This line may be followed by optional comment lines, which are treated as one long paragraph. The individual lines are stored in a single field, joined by a space. Compare Prod Name and Product Comment below under the Options: and Field merging: prompts.
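The effect of "Repetitive" plus "Space separated" merging can be sketched in a few lines of Python. This is an illustration of the idea under stated assumptions, not FTForm's code. The function name merge_lines and the comment text are hypothetical. Once the lines are merged, the paragraph no longer depends on the original line breaks, so it can be re-wrapped to any width.

```python
import textwrap


def merge_lines(lines: list[str], separator: str = " ") -> str:
    """Store every matched comment line in one field value, joined by a space."""
    return separator.join(line.strip() for line in lines if line.strip())


comment_lines = ["      Deliver to rear dock ", "      before 10am."]  # hypothetical comment lines
product_comment = merge_lines(comment_lines)
print(product_comment)                           # Deliver to rear dock before 10am.
print(textwrap.wrap(product_comment, width=20))  # list of lines no wider than 20 characters
```

A single-line field such as Prod Name needs no merging. The comment field, by contrast, collects however many lines happen to be present.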

This is the XML for the first and last products in the file:

A comparison at this point with Version 7 FTDesign is illustrative. It took 16 different structures to define this situation (four optional lines, each either present or absent: 2 × 2 × 2 × 2 = 16 combinations). Here it takes just one structure, with each element defined only once, including the Product Comments, which can now be line-wrapped if required. The superior efficiency of FTForm8 data definition is obvious. It similarly identifies floating totals, document comments and similar structures easily.
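A small Python model can check the "one structure instead of 16" claim. It is a sketch, not FTForm's parser. The labels come from the tutorial, but the function name parse_product_block and the sample values are hypothetical. The code tries each optional line once, in order, and then requires the non-empty product name line. It then generates all 16 present/absent combinations and confirms that the same definition handles every one.

```python
from itertools import product

# Labels of the four optional lines, in the order they appear under a Product.
OPTIONAL_LABELS = ("Revision:", "Site:", "Supplier:", "Manufacturer:")


def parse_product_block(lines: list[str]) -> dict[str, str]:
    """One structure: each optional line is tried once, then the required name line."""
    record: dict[str, str] = {}
    i = 0
    for label in OPTIONAL_LABELS:
        # An optional line is recognized by its single rule: the literal label.
        if i < len(lines) and lines[i].strip().startswith(label):
            record[label.rstrip(":")] = lines[i].strip()[len(label):].strip()
            i += 1
    # The product name line is always present; its only rule is "not empty".
    if i >= len(lines) or not lines[i].strip():
        raise ValueError("product name line missing")
    record["ProdName"] = lines[i].strip()
    return record


# Hypothetical values; build every combination of optional lines present or absent.
values = {"Revision:": "B", "Site:": "10-100", "Supplier:": "S-0042", "Manufacturer:": "ACME"}
count = 0
for present in product((False, True), repeat=len(OPTIONAL_LABELS)):
    block = [f"   {label} {values[label]}" for label, keep in zip(OPTIONAL_LABELS, present) if keep]
    block.append("   KICKPLATE")
    record = parse_product_block(block)
    assert record["ProdName"] == "KICKPLATE"
    assert len(record) == 1 + sum(present)
    count += 1

print(count)  # 16
```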

Finally, we need to define the Header Comment. This is part of the Master area and is defined with tests for "empty" in the first two columns and for "not empty" in the following two columns. The comment occupies Column 3 to the end of the line. Like the Product Comments, it is defined as "Repetitive" and "Space separated", meaning each additional comment line is appended to one long block of text.
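Here is one way to express the Header Comment column tests in Python, as a hedged sketch. Columns are 1-based in the tutorial and 0-based in Python slices. The code assumes that "not empty" for columns 3 to 4 means at least one non-blank character. The function name header_comment_text and the sample lines are hypothetical.

```python
from typing import Optional


def header_comment_text(line: str) -> Optional[str]:
    """Apply the Header Comment tests and return the comment text, or None.

    Columns 1-2 (slice [0:2]) must be empty, columns 3-4 (slice [2:4]) must not be
    empty, and the comment runs from column 3 (slice [2:]) to the end of the line.
    """
    if line[0:2].strip() == "" and line[2:4].strip() != "":
        return line[2:].rstrip()
    return None


lines = [
    "Remarks:",                            # fails the "empty" test in columns 1-2
    "  Please confirm the delivery date",  # hypothetical comment lines
    "  with the buyer.",
    "",                                    # fails the "not empty" test in columns 3-4
]
matched = [text for text in map(header_comment_text, lines) if text is not None]
print(" ".join(matched))  # Please confirm the delivery date with the buyer.
```

Joining the matched lines with a space mirrors the "Repetitive" and "Space separated" settings.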

Here is the Header Comment record as defined, together with the final XML file produced.
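If you want to reproduce the general shape of such output outside FTForm, the Python sketch below uses the standard xml.etree.ElementTree module. It nests one Master element and one Product element per detail record. It does not reproduce FTForm's actual XML. Every element name here (Document, Master, Product, PONumber, ProdName and so on) is hypothetical. Note that XML element names cannot contain spaces, so a field called "Prod Name" would need renaming.

```python
import xml.etree.ElementTree as ET


def build_xml(master: dict[str, str], products: list[dict[str, str]]) -> ET.Element:
    """Hypothetical layout: one Master element, then one Product element per detail record."""
    root = ET.Element("Document")
    master_el = ET.SubElement(root, "Master")
    for name, value in master.items():
        ET.SubElement(master_el, name).text = value
    for fields in products:
        product_el = ET.SubElement(root, "Product")
        for name, value in fields.items():  # absent optional lines simply produce no element
            ET.SubElement(product_el, name).text = value
    return root


doc = build_xml(
    {"PONumber": "PO-0001", "HeaderComment": "Please confirm the delivery date with the buyer."},
    [{"ProdName": "KICKPLATE", "Revision": "B"}, {"ProdName": "KICKPLATE"}],
)
print(ET.tostring(doc, encoding="unicode"))
```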

A few comments on what you can do with RegEx

FormTrap uses Regular Expressions as its expression logic because of their power to isolate anything you might require, and because they are well standardized and documented.

The common Regular Expressions are included in the list shown by the Wizard, and you are free to add your own expressions to this list. Some expressions, such as "Equal to", "Not equal to", "Contains" and "Does not contain", contain FIXME as a placeholder. For these, select the expression, then change FIXME to whatever you want to test for (or against).
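To see how a FIXME placeholder turns into a working test, here is a Python sketch using the standard re module. The four templates are plausible regex equivalents written for this illustration only. FormTrap's Wizard list may word them differently. The names TEMPLATES, build_expression and passes are hypothetical.

```python
import re

# Hypothetical templates; FormTrap's own Wizard entries may differ.
TEMPLATES = {
    "Equal to": r"^FIXME$",
    "Not equal to": r"^(?!FIXME$).*$",
    "Contains": r"FIXME",
    "Does not contain": r"^(?!.*FIXME).*$",
}


def build_expression(kind: str, value: str):
    """Replace FIXME with the escaped test value so characters like '.' or '(' match literally."""
    return re.compile(TEMPLATES[kind].replace("FIXME", re.escape(value)))


def passes(kind: str, value: str, text: str) -> bool:
    return build_expression(kind, value).search(text) is not None


print(passes("Equal to", "Remarks:", "Remarks:"))              # True
print(passes("Not equal to", "Remarks:", "Remarks:"))          # False
print(passes("Contains", "Site:", "   Site: 10-100"))          # True
print(passes("Does not contain", "Site:", "   Site: 10-100"))  # False
```

Escaping the value matters. Without it, a test value such as "10.00" would treat the "." as "any character".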

Expressions you may want to add include ones for dual-purpose lines, such as the following (from invoices that are also credits). To trap BOTH kinds of total, we use one RegEx that matches either line.

Total Credit Amount
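As an illustration only, suppose the invoice form of the line reads "Total Amount" and the credit form reads "Total Credit Amount". That invoice wording is an assumption. In that case one way to trap both is to make the word "Credit" optional. The Python sketch below uses a hypothetical pattern and function name (DUAL_TOTAL, read_total), not FormTrap's shipped expression. It also captures the amount and reports which kind of total was found.

```python
import re

# Assumed wording: "Total Amount" on invoices, "Total Credit Amount" on credits.
DUAL_TOTAL = re.compile(r"Total (Credit )?Amount\s+(-?[\d,]+\.\d{2})")


def read_total(line: str):
    """Return ("invoice" | "credit", amount text), or None if the line is not a total."""
    match = DUAL_TOTAL.search(line)
    if match is None:
        return None
    kind = "credit" if match.group(1) else "invoice"
    return kind, match.group(2)


print(read_total("Total Amount          1,250.00"))  # ('invoice', '1,250.00')
print(read_total("Total Credit Amount     -75.50"))  # ('credit', '-75.50')
print(read_total("Total Tax                10.00"))  # None
```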

Documentation of RegEx is on the web at this address: http://regular-expressions.info/

RegEx support is not included under our Support and Upgrades contract with customers. If you have issues with RegEx or need new RegEx expressions written for you, we will do this at our then-current consulting rates, in 15-minute increments (rates for calendar 2012 are $60 per 15 minutes, in AUD or USD).
