IBM Information Server 8.X (DataStage): Parallel Transformer Stage

DataStage: What Is the Transformer Stage?

DataStage provides multiple stages for extracting data, transforming it, and loading it into a data warehouse or data mart. The stages are grouped into categories such as General, Database, Development and Debug, File, Processing, and Real Time. Each stage is also classified as either active or passive.

The Transformer stage is a processing stage.

It lets you define transformations that are applied to your data according to the given business rules.

It can have a single input link and any number of output links. It can also have a reject link that takes any row not written to an output link because of a write failure, an expression evaluation failure, or a null-handling reject.

The Transformer stage editor is divided into two areas:

1. Link area

  • Define column derivations
  • Define stage variables

2. Metadata area

  • Define column metadata for inputs and outputs.

Output links:

  1. Pass some data straight through the Transformer stage without alteration.
  2. Modify a derivation by entering a transformation expression.
  3. Specify constraints that operate on entire output links.
  4. You can also specify an otherwise link, which is an output link that carries all the data not written to any other link, that is, rows that do not meet the criteria of the other links.

A constraint is an expression that specifies the criteria a row must meet before it can be passed to the output link.
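
To make the routing rules concrete, here is a minimal Python sketch of how constraints and an otherwise link decide where a row goes. It is only a model of the behaviour described above, not DataStage expression syntax. It assumes a row is written to every link whose constraint it satisfies; the `route_row` helper, the link names, and the sample rows are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Constraint = Callable[[Row], bool]


def route_row(row: Row,
              constraints: Dict[str, Constraint],
              otherwise_link: Optional[str] = None) -> List[str]:
    """Return the names of the output links that receive this row."""
    # Assumption: a row goes to every link whose constraint it satisfies.
    targets = [name for name, test in constraints.items() if test(row)]
    # The otherwise link only receives rows that no other link accepted.
    if not targets and otherwise_link is not None:
        targets.append(otherwise_link)
    return targets


# Hypothetical links and constraints, for illustration only.
constraints = {
    "high_value": lambda r: r["amount"] >= 1000,
    "domestic": lambda r: r["country"] == "US",
}

rows = [
    {"id": 1, "amount": 1500, "country": "US"},
    {"id": 2, "amount": 20, "country": "DE"},
]

routing = {r["id"]: route_row(r, constraints, "other") for r in rows}
print(routing)
# {1: ['high_value', 'domestic'], 2: ['other']}
```

Without an otherwise link, the second row would not be written to any output link in this model.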

Reject link:

You can also specify a reject link, which takes rows that have not been written to any other link because of a write error or an expression evaluation error. It is set up outside the stage editor by adding a link and converting it to a reject link. All records that are dropped because of null handling are also written to the reject link.
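
The reject behaviour can be modelled in a similar way. The Python sketch below assumes a derivation is a plain function and treats a raised exception or a `None` result as stand-ins for the expression evaluation and null-handling failures described above. The `apply_derivations` helper and the column names are hypothetical, and the real stage's null-handling rules are more detailed than this.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Derivation = Callable[[Row], Any]


def apply_derivations(rows: List[Row],
                      derivations: Dict[str, Derivation]
                      ) -> Tuple[List[Row], List[Row]]:
    """Split rows into (written, rejected) after applying the derivations."""
    written: List[Row] = []
    rejected: List[Row] = []
    for row in rows:
        try:
            out = {col: fn(row) for col, fn in derivations.items()}
        except (TypeError, ValueError, ZeroDivisionError):
            # Expression evaluation failed: the input row goes to the reject link.
            rejected.append(row)
            continue
        if any(value is None for value in out.values()):
            # Simplified null handling: a null result also rejects the row.
            rejected.append(row)
        else:
            written.append(out)
    return written, rejected


# Hypothetical derivations, for illustration only.
derivations = {
    "id": lambda r: r["id"],
    "unit_price": lambda r: r["total"] / r["qty"],
}

rows = [
    {"id": 1, "total": 10.0, "qty": 4},
    {"id": 2, "total": 5.0, "qty": 0},    # division by zero
    {"id": 3, "total": None, "qty": 2},   # null input makes the division fail
]

written, rejected = apply_derivations(rows, derivations)
print(written)   # [{'id': 1, 'unit_price': 2.5}]
print(rejected)  # the rows with id 2 and id 3
```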

If runtime column propagation is enabled, you do not need to define column metadata for the outputs.

The Find and Replace facility lets you search for a particular string within expressions, search for column names, or locate empty expressions by expression type.

Defining derivations of output columns:

  • Use drag and drop or copy and paste to copy an input column to an output link.
  • Use the column auto-match feature to set output column derivations automatically from the matching input columns.

Column auto match

  1. From the drop-down list, choose the output link whose columns you want to match with the input link.
  2. In the Match type area, choose one of the following:
    • Location Match: sets each output column derivation to the input link column in the equivalent position.
    • Name Match: sets each output column derivation to the input link column with the matching name.
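
The difference between the two match types is easy to see in a small Python sketch. This only models the idea; the function names and column names are hypothetical, and the name match here is assumed to be an exact, case-sensitive comparison.

```python
from typing import Dict, List, Optional


def location_match(inputs: List[str],
                   outputs: List[str]) -> Dict[str, Optional[str]]:
    """Derive each output column from the input column at the same position."""
    return {out: (inputs[i] if i < len(inputs) else None)
            for i, out in enumerate(outputs)}


def name_match(inputs: List[str],
               outputs: List[str]) -> Dict[str, Optional[str]]:
    """Derive each output column from the input column with the same name."""
    available = set(inputs)
    # None marks an output column that is left without a derivation.
    return {out: (out if out in available else None) for out in outputs}


input_cols = ["cust_id", "name", "city"]
output_cols = ["name", "cust_id", "region"]

print(location_match(input_cols, output_cols))
# {'name': 'cust_id', 'cust_id': 'name', 'region': 'city'}
print(name_match(input_cols, output_cols))
# {'name': 'name', 'cust_id': 'cust_id', 'region': None}
```

In this example a location match pairs columns whose names disagree, while a name match leaves `region` without a derivation, so the right choice depends on whether the two links share column order or column names.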

Constraints and Otherwise/Log

  • Clicking the Otherwise/Log field so that a check mark appears, and leaving the Constraint field blank, makes the link catch any rows that have not met the constraints on all previous output links.
  • Clicking the Otherwise/Log field on a link that has a constraint logs the number of rows written to that link (that is, rows that satisfy the constraint) in the job log as a warning message.

In addition, you can define local stage variables, use system variables, and set partitioning methods and sort operations.
