Pages

Wednesday, April 17, 2013

Data Stage Transformer Usage Guidelines

Choosing Appropriate Stages

The parallel Transformer stage always generates “C” code which is then compiled to a parallel component. For this reason, it is important to minimize the number of transformers, and to use other stages (Copy, Filter, Switch, etc) when derivations are not needed. 
  • The Copy stage should be used instead of a Transformer for simple operations including:
  • Job Design placeholder between stages (unless the Force option =true, EE will optimize this out at runtime)
  • Renaming Columns
  • Dropping Columns
  • Default Type Conversions

Note that rename, drop (if runtime column propagation is disabled), and default type conversion can also be performed by the output mapping tab of any stage
NEVERuse the “BASIC Transformer” stage in large-volume job flows. Instead, user-defined functions and routines can expand parallel Transformer capabilities. 
Consider, if possible, implementing complex derivation expressions using regular patterns by Lookup tables instead of using a Transformer with nested derivations.
For example, the derivation expression:

If A=0,1,2,3 Then B=”X” If A=4,5,6,7 Then B=”C”
Could be implemented with a lookup table containing values for column A and corresponding values of column B.
Optimize the overall job flow design to combine derivations from multiple Transformers into a single Transformer stage when possible.
In v7 and later, the Filter and/or Switch stages can be used to separate rows into multiple output links based on SQL-like link constraint expressions. 
In v7 and later, the Modify stage can be used for non-default type conversions, null handling, and character string trimming.
 Buildops should be used instead of Transformers in the handful of scenarios where complex reusable logic is required, or where existing Transformer-based job flows do not meet performance requirements.

Transformer NULL Handling and Reject Link

When evaluating expressions for output derivations or link constraints, the Transformer will reject (through the reject link indicated by a dashed line) any row that has a NULL value used in the expression. To create a Transformer reject link in DataStage Designer, right-click on an output link and choose “Convert to Reject”.
The Transformer rejects NULL derivation results because the rules for arithmetic and string handling of NULL values are by definition undefined. For this reason, always test for null values before using a column in an expression, for example:
If ISNULL(link.col) Then… Else…
Note that if an incoming column is only used in a pass-through derivation, the Transformer will allow this row to be output. DataStage release 7 enhances this behavior by placing warnings in the log file when discards occur.

Transformer Derivation Evaluation

Output derivations are evaluated BEFORE any type conversions on the assignment. For example, the PadString function uses the length of the source type, not the target. Therefore, it is important to make sure the type conversion is done before a row reaches the Transformer.

For example, TrimLeadingTrailing(string) works only if string is a VarChar field. Thus, the incoming column must be type VarChar before it is evaluated in the Transformer.

Conditionally Aborting Jobs

The Transformer can be used to conditionally abort a job when incoming data matches a specific rule. Create a new output link that will handle rows that match the abort rule. Within the link constraints dialog box, apply the abort rule to this output link, and set the “Abort After Rows” count to the number of rows allowed before the job should be aborted .
Since the Transformer will abort the entire job flow immediately, it is possible that valid rows will not have been flushed from Sequential File (export) buffers, or committed to database tables. It is important to set the Sequential File buffer flush  or database commit parameters.

2 comments:


  1. Wonderful put up, very informative. You must continue your writing. I’m confident, you’ve a great readers’ base already!

    Jin
    www.imarksweb.org

    ReplyDelete
  2. It is good to see the best blog for DataStage
    Thanks for updating more on datastage with us.

    ReplyDelete