Monday, September 8, 2014

Dummy Data Generation using Row Generator



By default the Row Generator stage runs sequentially, generating data in a single partition. You can, however, configure it to run in parallel. This post shows how to configure it to generate meaningful data instead of arbitrary values.



a) Job Design:

The job is a Row Generator stage linked to a Sequential File stage.








b) Row Generator Stage:

- Double-click the stage.
- On the Properties tab, set Number of Records to the number of rows you want to generate (here, 50).





- Next, click the Columns tab and define the columns you need in the output file: column name, type, length, and so on.



-- Now comes our magic step :-) We will edit the metadata of each column.


-- Double-clicking column 1 (Name) opens a column metadata window, where a lot of the column's metadata can be edited.


-- To generate meaningful data, we use the Generator properties here.
We set Algorithm (the data-generation algorithm) to Cycle, which steps through the supplied values from first to last and then starts again from the first, and we supplied a list of values (names).


-- Follow the same steps for the second column (Salary).




- Click OK. The Row Generator stage is now configured.
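The Cycle behaviour configured above can be sketched outside DataStage in Python. This is not DataStage code, only a minimal illustration that assumes each column cycles through its own value list independently; the names and salaries below are hypothetical placeholders, since the post's actual values are not shown.

```python
from itertools import cycle, islice


def cycle_column(values, num_rows):
    """Return num_rows values, repeating `values` from first to last (Cycle-style)."""
    return list(islice(cycle(values), num_rows))


# Hypothetical value lists standing in for the ones typed into the Generator properties.
names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
salaries = [25000, 30000, 35000, 40000]

NUM_ROWS = 50  # matches the row count set on the Properties tab

# Each column cycles on its own, then the columns are paired row by row.
rows = list(zip(cycle_column(names, NUM_ROWS), cycle_column(salaries, NUM_ROWS)))

for name, salary in rows[:6]:
    print(name, salary)
# Alice 25000
# Bob 30000
# Carol 35000
# Dave 40000
# Eve 25000
# Alice 30000
```

Because the two lists have different lengths (5 and 4), the Name/Salary pairs shift on each pass and the full combination only repeats every lcm(5, 4) = 20 rows, which gives a little more variety than lists of equal length would.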

c) Sequential File Stage

- Define the Sequential File stage properties, such as the output file location, the delimiter, whether to write column names, and quoting in the output file.
- Leave all the other tabs as they are.
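As a rough picture of what the Sequential File stage writes, the Python sketch below renders rows as delimited text with a header line. It assumes a comma delimiter, double quotes around string fields, and unquoted numbers; these are illustrative choices, and the file DataStage actually produces depends on the stage's Format tab settings. The function name `write_delimited` and the sample rows are hypothetical.

```python
import csv
import io


def write_delimited(rows, header, delimiter=",", quotechar='"'):
    """Render rows as delimited text with a header line (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quotechar=quotechar,
        quoting=csv.QUOTE_NONNUMERIC,  # quote strings, leave ints/floats bare
        lineterminator="\n",
    )
    writer.writerow(header)  # plays the role of "write column names"
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample rows in the Name/Salary layout used in this job.
sample_rows = [("Alice", 25000), ("Bob", 30000), ("Carol", 35000)]
text = write_delimited(sample_rows, ["Name", "Salary"])
print(text, end="")
# "Name","Salary"
# "Alice",25000
# "Bob",30000
# "Carol",35000
```

Changing `delimiter` (for example to `"|"`) mirrors changing the delimiter property on the stage.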



- Now save the job design, compile it, and run it.

The output file will look like this:

 
As we can see, the Name values repeat in a cycle again and again, and the Salary column does the same. So we now have meaningful dummy data for testing our job.
