Monday, September 8, 2014

Scenario: Get the max salary from a data file (Seq file)


We know that the MAX function is used to get the maximum value in a column of a table, but here you have to design a DataStage job that gets the maximum value from a sequential file.



Input Seq File:


FIRSTNME,SALARY,WORKDEPT

EILEEN, 0089750.00,E11
EVA, 0096170.00,D21
JOHN, 0080175.00,E01
CHRISTINE, 0152750.00,A00
SALLY, 0098250.00,C01
IRVING, 0072250.00,D11
THEODORE, 0086150.00,E21
MICHAEL, 0094250.00,B01



Output File:
0152750.00


Solution Design:

a) Job Design:


   The design that achieves the required output reads the data from the flat file, generates a dummy column in a Column Generator stage, aggregates on that column in an Aggregator stage, and writes the result to a sequential file.
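As a rough illustration outside DataStage, the first step of this design, reading the sequential file, can be sketched in Python. This is a minimal sketch, not DataStage code: `INPUT_TEXT` and `read_seq_file` are hypothetical names, and it assumes the file's first line holds the column names and that a space may follow each comma, as in the sample input.

```python
import csv
import io
from decimal import Decimal

# Hypothetical in-memory copy of the sample input sequential file.
INPUT_TEXT = """FIRSTNME,SALARY,WORKDEPT
EILEEN, 0089750.00,E11
EVA, 0096170.00,D21
JOHN, 0080175.00,E01
CHRISTINE, 0152750.00,A00
SALLY, 0098250.00,C01
IRVING, 0072250.00,D11
THEODORE, 0086150.00,E21
MICHAEL, 0094250.00,B01
"""


def read_seq_file(text):
    """Parse comma-delimited text whose first line holds the column names.

    skipinitialspace drops the blank that follows each comma, and SALARY is
    converted to Decimal so later comparisons are numeric rather than textual.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for record in reader:
        record["SALARY"] = Decimal(record["SALARY"])
        rows.append(record)
    return rows


rows = read_seq_file(INPUT_TEXT)
print(len(rows), rows[3]["FIRSTNME"], rows[3]["SALARY"])
```

Treating SALARY as a decimal rather than a string mirrors giving the column a numeric type in the job, which the Aggregator needs in order to calculate a maximum.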






b) Column Generator Stage:
In the Column Generator stage, we generate a column DUMMY with the value 'X' for every row read from the seq file.

To generate the value 'X' for all rows, change the extended metadata of the DUMMY column: set the generator to the Cycle algorithm with the value 'X'.

Map the SALARY and DUMMY columns to the output of the Column Generator stage.
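The effect of the Cycle generator can be sketched in Python as follows. This is a minimal sketch under an assumption: that Cycle simply hands out the listed values in order and starts over when the list runs out, so a one-value list like ['X'] puts the same value on every row. `generate_cycle_column` is a hypothetical helper, not a DataStage API.

```python
from itertools import cycle


def generate_cycle_column(rows, column, values):
    """Return copies of rows with `column` filled from `values`, repeating in order.

    Hypothetical stand-in for the Column Generator's Cycle algorithm: with a
    single value such as ["X"], every row receives that same value.
    """
    source = cycle(values)
    return [{**row, column: next(source)} for row in rows]


rows = [{"SALARY": "0089750.00"}, {"SALARY": "0096170.00"}, {"SALARY": "0080175.00"}]
with_dummy = generate_cycle_column(rows, "DUMMY", ["X"])
print([r["DUMMY"] for r in with_dummy])  # ['X', 'X', 'X']
print([r["DUMMY"] for r in generate_cycle_column(rows, "DUMMY", ["A", "B"])])  # ['A', 'B', 'A']
```

Only the single-value case matters for this job: a constant DUMMY gives the Aggregator one shared key for all rows.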
c) Aggregator Stage:

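In the Aggregator stage, we use DUMMY as the grouping key and calculate the maximum of SALARY. Because every row has the same DUMMY value, all rows fall into a single group, so the stage returns one row holding the overall max salary. Map this max salary to the output.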
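The group-then-max logic of the Aggregator stage can be sketched in Python. This is a minimal sketch, not the stage's implementation: `aggregate_max` is a hypothetical helper, and formatting the result as a zero-padded, 10-character value is an assumption chosen to match the sample output.

```python
from decimal import Decimal


def aggregate_max(rows, group_key, value_column):
    """Return {key value: max of value_column}, one entry per distinct key.

    Hypothetical stand-in for an Aggregator that groups on `group_key`
    and calculates the maximum of `value_column`.
    """
    result = {}
    for row in rows:
        key = row[group_key]
        value = row[value_column]
        if key not in result or value > result[key]:
            result[key] = value
    return result


rows = [
    {"FIRSTNME": "EILEEN", "SALARY": Decimal("0089750.00"), "DUMMY": "X"},
    {"FIRSTNME": "CHRISTINE", "SALARY": Decimal("0152750.00"), "DUMMY": "X"},
    {"FIRSTNME": "IRVING", "SALARY": Decimal("0072250.00"), "DUMMY": "X"},
]
max_by_dummy = aggregate_max(rows, "DUMMY", "SALARY")
# DUMMY is 'X' on every row, so there is exactly one group and one output row.
print(format(max_by_dummy["X"], "010.2f"))  # 0152750.00
```

Grouping on a real column such as WORKDEPT would instead return one maximum per department; the constant DUMMY key is what collapses the result to a single overall maximum.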
