Tuesday, September 2, 2014

Sequential file with Duplicate Records

A sequential file has 8 records with a single column. The column values, separated by spaces, are:
1 1 2 2 3 4 5 6

In a parallel job, after reading the sequential file, two more sequential files should be created: one containing the duplicate records and the other containing the records with no duplicates.
File 1 records, separated by spaces: 1 1 2 2
File 2 records, separated by spaces: 3 4 5 6
How would you do it?
 
Sol1:
1. Place a Sort stage immediately after the Sequential File stage and sort on the key column.
2. In the Sort stage, enable the Create Key Change Column property. It adds a keyChange column that is 1 for the first record of each key group and 0 for the following (duplicate) records.
3. Place a Filter or Transformer stage next and route on that flag, so that unique records go down one link and duplicates down the other (see the sketch after this list).
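Outside DataStage, the same idea can be sketched in plain Python (a hypothetical stand-in for the Sort and Transformer stages, not the actual job). One caveat: the keyChange flag alone marks only the first row of each key group, so to route whole duplicate groups together the sketch also peeks at the neighbouring row:

# Minimal sketch of Sol1, assuming a single integer key column.
def split_by_key_change(values):
    rows = sorted(values)  # Sort stage: order by the key column
    duplicates, uniques = [], []
    for i, v in enumerate(rows):
        # keyChange column: 1 for the first record of a group, 0 otherwise
        key_change = 0 if i > 0 and rows[i - 1] == v else 1
        # Transformer logic: a row belongs to a duplicate group if it is a
        # repeat (key_change == 0) or the next row repeats its key.
        has_dup = key_change == 0 or (i + 1 < len(rows) and rows[i + 1] == v)
        (duplicates if has_dup else uniques).append(v)
    return duplicates, uniques

dups, uniqs = split_by_key_change([1, 1, 2, 2, 3, 4, 5, 6])
print(dups)   # [1, 1, 2, 2]  -> File 1
print(uniqs)  # [3, 4, 5, 6]  -> File 2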

Sol2 (should verify, though):
First, take the source file and connect it to a Copy stage. One output link goes to an Aggregator stage and the other goes to a Lookup (or Join) stage. In the Aggregator stage, group on the key column and use the count function to calculate how many times each value repeats.

The Aggregator output is connected to a Filter stage, which keeps only the rows where cnt = 1 (cnt is the new count column).
The Filter output is then connected to the Lookup stage as the reference link, with Lookup Failure = Reject.

Finally, give the Lookup stage two output links: one collects the non-repeated values, and the reject link collects the repeated values.
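Again purely as illustration, here is a Python sketch of this fork-join pattern (Copy, then Aggregator count, then Filter on cnt = 1, then Lookup with reject). The function name is made up, not a DataStage API:

from collections import Counter

# Minimal sketch of Sol2, assuming a single integer key column.
def split_by_count(values):
    # Copy stage: the same stream feeds both the Aggregator and the Lookup.
    counts = Counter(values)                            # Aggregator: cnt per key
    singles = {v for v, c in counts.items() if c == 1}  # Filter: keep cnt = 1
    uniques, rejects = [], []
    for v in values:                  # Lookup against the cnt = 1 reference
        if v in singles:
            uniques.append(v)         # lookup succeeds -> non-repeated values
        else:
            rejects.append(v)         # Lookup Failure = Reject -> duplicates
    return rejects, uniques

dups, uniqs = split_by_count([1, 1, 2, 2, 3, 4, 5, 6])
print(dups)   # [1, 1, 2, 2]
print(uniqs)  # [3, 4, 5, 6]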
