Transformer stages do not extract data or write data to a target database. They are used to handle extracted data, perform any conversions required, and pass data to another Transformer stage or a stage that writes data to a target data table.

Transformer stages can have any number of inputs and outputs. The link from the main data input source is designated the primary input link. There can only be one primary input link, but there can be any number of reference inputs.

Input Links

The main data source is joined to the Transformer stage via the primary link, but the stage can also have any number of reference input links.

A reference link represents a table lookup. Reference links provide information that might affect the way the data is changed, but they do not supply the actual data to be changed.

Reference input columns can be designated as key fields. You can specify key expressions that are used to evaluate the key fields. The most common use for the key expression is to specify an equi-join, which is a link between a primary link column and a reference link column. For example, if your primary input data contains names and addresses, and a reference input contains names and phone numbers, the reference link name column is marked as a key field and the key expression refers to the primary link’s name column. During processing, the name in the primary input is looked up in the reference input. If the names match, the reference data is consolidated with the primary data. If the names do not match, i.e., there is no record in the reference input whose key matches the expression given, all the columns specified for the reference input are set to the null value.
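The name-and-phone lookup described above can be sketched in plain Python. This is only an illustration of the equi-join behavior, not DataStage code: the sample rows, the `lookup` function, and the use of `None` to stand in for the null value are all assumptions made for the example.

```python
# Illustrative only: plain Python standing in for a single-row reference lookup.
# All names and sample data are hypothetical.

primary_rows = [
    {"name": "Ann", "address": "1 High St"},
    {"name": "Bob", "address": "2 Low Rd"},
]

# Reference input; "name" plays the role of the key field.
reference_rows = [
    {"name": "Ann", "phone": "555-0100"},
]
reference_columns = ["phone"]  # columns supplied by the reference link


def lookup(primary, reference, key, ref_columns):
    """Equi-join each primary row to at most one reference row on `key`.

    When no reference row matches, the reference columns are set to None,
    which stands in for the null value here.
    """
    index = {row[key]: row for row in reference}
    result = []
    for row in primary:
        match = index.get(row[key])
        merged = dict(row)
        for col in ref_columns:
            merged[col] = match[col] if match is not None else None
        result.append(merged)
    return result


joined = lookup(primary_rows, reference_rows, "name", reference_columns)
```

Here the row for Ann picks up a phone number, while the row for Bob, who has no matching reference record, carries a null in the phone column.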

Where a reference link originates from a UniVerse or ODBC stage, you can look up multiple rows from the reference table. The rows are specified by a foreign key, as opposed to a primary key used for a single-row lookup.
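The difference between a single-row and a multi-row lookup can be sketched as follows. This is a hypothetical Python illustration, not the mechanism DataStage uses: the `customer` foreign key, the sample rows, and the `multi_row_lookup` helper are assumptions, and what the stage does with the matched rows afterwards is deliberately left out.

```python
# Illustrative only: a foreign key can match several reference rows,
# whereas a primary key matches at most one. Names and data are hypothetical.
from collections import defaultdict

reference_rows = [
    {"customer": "Ann", "phone": "555-0100"},
    {"customer": "Ann", "phone": "555-0101"},
    {"customer": "Bob", "phone": "555-0200"},
]


def multi_row_lookup(reference, foreign_key):
    """Group reference rows by a foreign-key column so one key yields many rows."""
    groups = defaultdict(list)
    for row in reference:
        groups[row[foreign_key]].append(row)
    return groups


by_customer = multi_row_lookup(reference_rows, "customer")
ann_phones = [row["phone"] for row in by_customer["Ann"]]
```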

Output Links

You can have any number of output links from your Transformer stage.

You may want to pass some data straight through the Transformer stage unaltered, but it’s likely that you’ll want to transform data from some input columns before outputting it from the Transformer stage.

You can specify such an operation by entering a BASIC expression or by selecting a transform to apply to the data. DataStage has many built-in transforms, or you can define your own custom transforms that are stored in the Repository and can be reused as required.

The source of an output link column is defined in that column’s Derivation cell within the Transformer Editor. You can use the Expression Editor to enter expressions or transforms in this cell. You can also simply drag an input column to an output column’s Derivation cell, to pass the data straight through the Transformer stage.
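The idea of one derivation per output column can be sketched in Python. This is not BASIC and not the Expression Editor's syntax; the column names, the sample row, and the `apply_derivations` helper are hypothetical and only show how a pass-through column sits alongside transformed ones.

```python
# Illustrative only: each output column has a derivation computed from the
# input row. Column names and derivations are hypothetical.

derivations = {
    "name": lambda row: row["name"],                     # passed straight through
    "name_upper": lambda row: row["name"].upper(),       # simple transform
    "total": lambda row: row["qty"] * row["unit_price"]  # expression over two columns
}


def apply_derivations(row, derivations):
    """Build one output row by evaluating every column's derivation."""
    return {column: derive(row) for column, derive in derivations.items()}


output_row = apply_derivations(
    {"name": "Ann", "qty": 3, "unit_price": 2.5}, derivations
)
```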

In addition to specifying derivation details for individual output columns, you can also specify constraints that operate on entire output links. A constraint is a BASIC expression that specifies criteria that data must meet before it can be passed to the output link. You can also specify a reject link, which is an output link that carries all the data not output on other links, that is, rows that have not met the criteria of any other link.

Each output link is processed in turn. If the constraint expression evaluates to TRUE for an input row, the data row is output on that link. Conversely, if a constraint expression evaluates to FALSE for an input row, the data row is not output on that link.

Constraint expressions on different links are independent. If you have more than one output link, an input row may result in a data row being output from some, none, or all of the output links.

For example, if you consider the data that comes from a paint shop, it could include information about any number of different colors. If you want to separate the colors into different files, you would set up different constraints. You could output the information about green and blue paint on LinkA, red and yellow paint on LinkB, and black paint on LinkC.

When an input row contains information about yellow paint, the LinkA constraint expression evaluates to FALSE and the row is not output on LinkA. However, the row does satisfy the constraint criterion for LinkB, so it is output on LinkB.

If the input data contains information about white paint, this does not satisfy any constraint and the data row is not output on Links A, B or C, but will be output on the reject link. The reject link is used to route data to a table or file that is a “catch-all” for rows that are not output on any other link. The table or file containing these rejects is represented by another stage in the job design.
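The paint shop routing above can be sketched in Python. This is a minimal illustration under assumptions, not DataStage's implementation: the `route` function, the `color` column, and the "Reject" key are hypothetical, and real constraints would be written as BASIC expressions.

```python
# Illustrative only: independent constraints per output link, plus a reject
# link that catches rows no other link accepted. Names are hypothetical.

constraints = {
    "LinkA": lambda row: row["color"] in ("green", "blue"),
    "LinkB": lambda row: row["color"] in ("red", "yellow"),
    "LinkC": lambda row: row["color"] == "black",
}


def route(rows, constraints):
    """Evaluate every link's constraint for each row; unmatched rows are rejected."""
    outputs = {link: [] for link in constraints}
    outputs["Reject"] = []
    for row in rows:
        matched = False
        for link, constraint in constraints.items():
            # Constraints are independent, so a row may go to several links.
            if constraint(row):
                outputs[link].append(row)
                matched = True
        if not matched:
            outputs["Reject"].append(row)
    return outputs


rows = [{"color": c} for c in ("yellow", "white", "blue", "black")]
routed = route(rows, constraints)
```

With this sample input, the yellow row lands on LinkB, the blue row on LinkA, the black row on LinkC, and the white row on the reject output.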

Before-Stage and After-Stage Routines

Because the Transformer stage is an active stage type, you can specify routines to be executed before or after the stage has processed the data. For example, you might use a before-stage routine to prepare the data before processing starts. You might use an after-stage routine to send an electronic message when the stage has finished.

Specifying the Primary Input Link

The first link to a Transformer stage is designated the primary input link by default. However, you can choose an alternative link to be the primary link if necessary. To do this:

  1. Select the current primary input link in the Diagram window.
  2. Choose Convert to Reference from the Diagram window shortcut menu.
  3. Select the reference link that you want to be the new primary input link.
  4. Choose Convert to Stream from the Diagram window shortcut menu.
