surrogate-key-generation in Transformer
This entry describes various ways of creating a unique counter in DataStage jobs.
A parallel job has a surrogate key stage that creates unique IDs, however it is limited in that it does not support conditional code and it may be more efficient to add a counter to an existing transformer rather than add a new stage.
In a server job there are a set of key increment routines installed in the routine SDK samples that offer a more complex counter that remembers values between job executions.
The following section outlines a transformer only technique.
svMyCounter = svMyCounter + 1
This simple counter adds 1 each time a row is processed.
The counter can be given a seed value by passing a value in as a job parameter and setting the initial value of svMyCounter to that job parameter.
In a parallel job this simple counter will create duplicate values on each node as the transformer is split into parallel instances. It can be turned into a unique counter by using special parallel macros.
Remember this method only works if your data is evenly balanced i.e. equal number of rows going through each partition. Alternative syntax is:
@INROWNUM * @NUMPARTITIONS + @PARTITIONNUM
(numpartion*rwonum)+partinnum-numpartition+1
Example:(2 partitions)
2*1+0-2+1=1
2*1+1-2+1=2
2*2+0-2+1=3
2*2+1-2+1=4
Or
Sequence numbers can be generated in Datastage using certain routines. They are
-KeyMgtGetNextVal
-KeyMgtGetNextValConn
Or
NextSurrogatekey()
Create a unique counter in datastage
This entry describes various ways of creating a unique counter in DataStage jobs.
A parallel job has a surrogate key stage that creates unique IDs, however it is limited in that it does not support conditional code and it may be more efficient to add a counter to an existing transformer rather than add a new stage.
In a server job there are a set of key increment routines installed in the routine SDK samples that offer a more complex counter that remembers values between job executions.
The following section outlines a transformer only technique.
Steps
In a DataStage job the easiest way to create a counter is within the Transformer stage with a Stage Variable.svMyCounter = svMyCounter + 1
This simple counter adds 1 each time a row is processed.
The counter can be given a seed value by passing a value in as a job parameter and setting the initial value of svMyCounter to that job parameter.
In a parallel job this simple counter will create duplicate values on each node as the transformer is split into parallel instances. It can be turned into a unique counter by using special parallel macros.
- Create a stage variable for the counter, eg. SVCounter.
- At the Stage Properties form set the Initial Value of the Stage Variable to "@PARTITIONNUM - @NUMPARTITIONS + 1".
- Set the derivation of the stage variable to "svCounter + @NUMPARTITIONS". You can embed this in an IF statement if it is a conditional counter.
Remember this method only works if your data is evenly balanced i.e. equal number of rows going through each partition. Alternative syntax is:
@INROWNUM * @NUMPARTITIONS + @PARTITIONNUM
(numpartion*rwonum)+partinnum-numpartition+1
Example:(2 partitions)
2*1+0-2+1=1
2*1+1-2+1=2
2*2+0-2+1=3
2*2+1-2+1=4
Or
Sequence numbers can be generated in Datastage using certain routines. They are
-KeyMgtGetNextVal
-KeyMgtGetNextValConn
Or
NextSurrogatekey()
While DataStage provides robust parallel processing, Informatica is more flexible for a wider variety of data sources due to its extensive connector library.
ReplyDeleteInformatica Datastage