Link Partitioner Stage

The Link Partitioner stage is an active stage which takes one input and allows you to distribute partitioned rows to up to 64 output links. The stage expects the output links to use the same meta data as the input link.

Partitioning your data enables you to take advantage of a multi-processor system and have the data processed in parallel. The stage can be used in conjunction with the Link Collector stage to partition data, process it in parallel, then collect it together again before writing it to a single target. To understand the benefits fully, you need to know a little about how DataStage jobs are run as processes; see “DataStage Jobs and Processes”.
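As a rough conceptual analogy only (this is ordinary Python rather than DataStage, and every name in it is invented for illustration), the pattern is: split the rows into partitions, process each partition in parallel, then gather the results back into a single stream:

    from multiprocessing import Pool

    NUM_PARTITIONS = 4   # the Link Partitioner stage supports up to 64 output links

    def partition_round_robin(rows, n):
        """Deal rows out to n partitions in turn (the stage's default algorithm)."""
        partitions = [[] for _ in range(n)]
        for i, row in enumerate(rows):
            partitions[i % n].append(row)
        return partitions

    def transform(rows):
        """Stand-in for whatever processing happens between partitioning and collecting."""
        return [row.upper() for row in rows]

    if __name__ == "__main__":
        source_rows = ["alpha", "bravo", "charlie", "delta", "echo"]
        partitions = partition_round_robin(source_rows, NUM_PARTITIONS)
        with Pool(NUM_PARTITIONS) as pool:
            processed = pool.map(transform, partitions)            # parallel processing
        collected = [row for part in processed for row in part]    # the Link Collector's role
        print(collected)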

For a job that uses this stage to compile and run as intended on a multi-processor system, you must have inter-process buffering turned on, either at project level using the DataStage Administrator, or at job level from the Job Properties dialog box.

Before-Stage and After-Stage Subroutines

The General tab on the Stage page contains optional fields that allow you to define routines that are executed before or after the stage has processed the data.

  • Before-stage subroutine and Input Value. Contain the name (and value) of a subroutine that is executed before the stage starts to process any data. For example, you can specify a routine that prepares the data before processing starts.
  • After-stage subroutine and Input Value. Contain the name (and value) of a subroutine that is executed after the stage has processed the data. For example, you can specify a routine that sends an electronic message when the stage has finished.

Choose a routine from the drop-down list box. This list box contains all the routines defined as a Before/After Subroutine under the Routines branch in the Repository. Enter an appropriate value for the routine’s input argument in the Input Value field.

If you choose a routine that is defined in the Repository but has been edited and not yet compiled, a warning message reminds you to compile the routine when you close the Link Partitioner Stage dialog box.

A return code of 0 from the routine indicates success; any other code indicates failure and causes a fatal error when the job is run.
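As a purely illustrative sketch of this contract (real before-stage and after-stage routines are DataStage BASIC subroutines, not Python, and the routine name and log file below are invented), a routine receives the Input Value as its argument and reports success or failure through its return code:

    def after_stage_notify(input_value):
        """Hypothetical after-stage routine: record a message once the stage has finished."""
        try:
            with open("stage_notifications.log", "a") as log:    # invented log file
                log.write("Link Partitioner finished: " + input_value + "\n")
            return 0    # 0 = success
        except OSError:
            return 1    # any other code = failure, causing a fatal error when the job runs

    # A before-stage routine has the same shape; it simply runs before any rows are processed.
    print(after_stage_notify("nightly load"))                     # prints 0 on success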

If you installed or imported a job, the Before-stage subroutine or After-stage subroutine field may reference a routine that does not exist on your system. In this case, a warning message appears when you close the Link Partitioner Stage dialog box. You must install or import the “missing” routine or choose an alternative one to use.

Defining Link Partitioner Stage Properties

The Properties tab allows you to specify two properties for the Link Partitioner stage:

  • Partitioning Algorithm. Use this property to specify the method the stage uses to partition data. Choose from:
    • Round-Robin. This is the default method. Using the round-robin method the stage will write each incoming row to one of its output links in turn.
    • Random. Using this method the stage will use a random number generator to distribute incoming rows evenly across all output links.
    • Hash. Using this method the stage applies a hash function to one or more input column values to determine which output link the row is passed to.
    • Modulus. Using this method the stage applies a modulus function to an integer input column value to determine which output link the row is passed to.
  • Partitioning Key. This property is only significant where you have chosen a partitioning algorithm of Hash or Modulus. For the Hash algorithm, specify one or more column names separated by commas. These keys are concatenated and a hash function is applied to determine the destination output link. For the Modulus algorithm, specify a single column name which identifies an integer numeric column. The value of this column determines the destination output link. (The sketch after this list illustrates how each algorithm chooses an output link.)
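For reference, the following Python sketch models how each of the four algorithms could map a row to an output link. It is purely illustrative and is not DataStage’s internal code: the link count, column names, and choice of hash function (CRC-32) are assumptions made for the example.

    import random
    import zlib

    NUM_LINKS = 4   # the real stage supports up to 64 output links

    def round_robin(row_number):
        """Round-Robin (default): rows go to links 0, 1, 2, ... in turn."""
        return row_number % NUM_LINKS

    def random_link(rng):
        """Random: a random number generator spreads rows evenly over the links."""
        return rng.randrange(NUM_LINKS)

    def hash_link(row, key_columns):
        """Hash: concatenate the key column values and hash the result (CRC-32 assumed here)."""
        key = "".join(str(row[col]) for col in key_columns)
        return zlib.crc32(key.encode()) % NUM_LINKS

    def modulus_link(row, key_column):
        """Modulus: take the integer key column modulo the number of links."""
        return int(row[key_column]) % NUM_LINKS

    row = {"CUSTOMER_ID": 1234, "REGION": "EMEA"}          # invented example row
    print(round_robin(7))                                  # 7 % 4  -> link 3
    print(random_link(random.Random()))                    # a different link each run
    print(hash_link(row, ["CUSTOMER_ID", "REGION"]))       # always the same link for this key
    print(modulus_link(row, "CUSTOMER_ID"))                # 1234 % 4 -> link 2

Note that because Hash and Modulus are deterministic on the key, all rows with the same key value go to the same output link, which matters if downstream processing is key-sensitive; Round-Robin and Random simply balance the load across the links.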
Defining Link Partitioner Stage Input Data

The Link Partitioner stage can have one input link. This is where the data to be partitioned arrives.

The Inputs page has two tabs: General and Columns.

  • General. The General tab allows you to specify an optional description of the stage.
  • Columns. The Columns tab contains the column definitions for the data on the input link. This is normally populated with the meta data of the stage connected on the input side. You can also Load a column definition from the Repository, or type one in yourself (and Save it to the Repository if required). Note that the meta data on the input link must be identical to the meta data on the output links.
Defining Link Partitioner Stage Output Data

The Link Partitioner stage can have up to 64 output links. Partitioned data flows along these links. The Output Name drop-down list on the Outputs page allows you to select which of the 64 links you are looking at.

The Outputs page has two tabs: General and Columns.

  • General. The General tab allows you to specify an optional description of the stage.
  • Columns. The Columns tab contains the column definitions for the data on the output link. You can Load a column definition from the Repository, or type one in yourself (and Save it to the Repository if required). Note that the meta data on each output link must be identical to the meta data on the input link, so the meta data is identical for all the output links.
