Filter Stage:
The Filter stage is a processing stage that filters input records based on a filter condition.
It is configured by writing an expression in the where clause.
Scenario 1: Check for empty values in the customer name field. Because the data is read from a sequential file, check for an empty value rather than a null.
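The difference between an empty value and a null can be sketched outside DataStage. The snippet below is only an illustration in Python, not DataStage syntax; the sample rows and the function name `split_on_empty_name` are made up for this example. It shows that a missing field in a flat file arrives as an empty string, so the test has to compare against `''`.

```python
import csv
import io

# Hypothetical sample data: a flat file has no NULL marker,
# so the missing name on row 2 arrives as an empty string.
SAMPLE = "CUSTOMER_ID,CUSTOMER_NAME\n1,UMA\n2,\n3,RAJ\n"

def split_on_empty_name(text):
    """Return (rows with a name, rows whose name is empty)."""
    kept, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Compare against '' and not None: the reader yields '' for an empty field.
        if row["CUSTOMER_NAME"] == "":
            rejected.append(row)
        else:
            kept.append(row)
    return kept, rejected

kept, rejected = split_on_empty_name(SAMPLE)
print([r["CUSTOMER_ID"] for r in kept])      # ['1', '3']
print([r["CUSTOMER_ID"] for r in rejected])  # ['2']
```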
Scenario 2: Comparing incoming fields. Check whether the transaction date falls between STR_DT and END_DT, and filter the records accordingly.
Input Data:
CUSTOMER_ID | CUSTOMER_NAME | TRANSACTION_DT | STR_DT | END_DT |
1 | UMA | 1/1/2010 | 5/20/2010 | 12/20/2010 |
1 | UMA | 5/28/2011 | 5/20/2010 | 12/20/2010 |
Output:
CUSTOMER_ID | CUSTOMER_NAME | TRANSACTION_DT | STR_DT | END_DT |
1 | UMA | 5/28/2011 | 5/20/2010 | 12/20/2010 |
Reject:
CUSTOMER_ID | CUSTOMER_NAME | TRANSACTION_DT | STR_DT | END_DT |
1 | UMA | 1/1/2010 | 5/20/2010 | 12/20/2010 |
Partition the data on CUSTOMER_ID to make sure all rows with the same key value are processed on the same node.
[Figure: Actual reject data]
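As a rough sketch of the Scenario 2 check, here is the same "between" logic in Python. It is not DataStage code: the rows are hypothetical (they are not the rows from the tables above), the helper names are invented, and inclusive bounds are an assumption. The point it illustrates is that the fields must be compared as dates, because M/D/YYYY values compared as plain text sort incorrectly.

```python
from datetime import datetime

def parse_mdy(text):
    """Parse an M/D/YYYY string into a date object."""
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()

def in_range(row):
    """True when TRANSACTION_DT lies between STR_DT and END_DT (inclusive, by assumption)."""
    return parse_mdy(row["STR_DT"]) <= parse_mdy(row["TRANSACTION_DT"]) <= parse_mdy(row["END_DT"])

# Hypothetical rows, chosen only to show one pass and one reject.
rows = [
    {"CUSTOMER_ID": 1, "TRANSACTION_DT": "6/15/2010", "STR_DT": "5/20/2010", "END_DT": "12/20/2010"},
    {"CUSTOMER_ID": 1, "TRANSACTION_DT": "1/1/2010", "STR_DT": "5/20/2010", "END_DT": "12/20/2010"},
]

output = [r for r in rows if in_range(r)]      # rows sent to the output link
reject = [r for r in rows if not in_range(r)]  # rows sent to the reject link

print([r["TRANSACTION_DT"] for r in output])
print([r["TRANSACTION_DT"] for r in reject])
```

Comparing the raw strings would give the wrong answer here: as text, "6/15/2010" sorts after "12/20/2010" because "6" is greater than "1".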
Scenario 3: Evaluating input column data.
Example: where CUSTOMER_NAME='UMA' AND CUSTOMER_ID='1'
These three scenarios cover most uses of the Filter stage.
FILTER STAGE WITH REAL TIME EXAMPLE
The Filter stage is used to write conditions on columns.
Conditions can be written on any number of columns.
For example, suppose you have the following data:
e_id,e_name,e_sal
1,sam,2000
2,ram,2200
3,pollard,1800
4,ponting,2200
5,sachin,2200
Suppose we need to find the employees whose salary is 2200.
(In a real project there will be thousands of records at the source.)
We can use a Sequential File stage to read the data and a Filter stage to apply the condition,
and a Data Set stage to load the data into the target.
Design as follows:
Seq.File---------Filter------------DatasetFile
Open the Sequential File stage and read the data.
In the Filter stage, go to Properties and write the condition in the Where Clause as
e_sal=2200
Go to the Output page and drag and drop the columns.
Click OK.
Go to the target Data Set stage and give the file a name.
Compile and run.
You will get the required output in the target file.
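To make the expected result concrete, the following Python sketch mimics what the job does with the sample data: read delimited rows, keep those matching `e_sal=2200`, and pass them on. It is an illustration only, not DataStage code, and `filter_rows` is a name invented for this example.

```python
import csv
import io

# The sample source data from the example above.
SOURCE = """e_id,e_name,e_sal
1,sam,2000
2,ram,2200
3,pollard,1800
4,ponting,2200
5,sachin,2200
"""

def filter_rows(text, predicate):
    """Return the rows of a delimited text for which predicate(row) is true."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if predicate(row)]

# Equivalent of the where clause e_sal=2200; the field is read as text, so convert it first.
matches = filter_rows(SOURCE, lambda row: int(row["e_sal"]) == 2200)

for row in matches:
    print(row["e_id"], row["e_name"], row["e_sal"])
# 2 ram 2200
# 4 ponting 2200
# 5 sachin 2200
```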
If you want to write conditions on multiple columns:
Write the first condition in a Where Clause
and set its output link to a link order number, for example 1.
Then write another condition and set its output link to 0.
(You can find the link order numbers under the Link Ordering option.)
Design the job with two targets, one for each output link.
Compile and run.
You will get the data in both targets.
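The routing of several where clauses to numbered output links can be sketched as follows. This is a Python illustration, not DataStage code. The second condition (names starting with "p") is hypothetical, and the sketch assumes that a row is copied to every link whose condition it satisfies; whether a row may go to more than one link depends on how the stage is configured.

```python
from collections import defaultdict

# The sample data from the example above.
rows = [
    {"e_id": 1, "e_name": "sam", "e_sal": 2000},
    {"e_id": 2, "e_name": "ram", "e_sal": 2200},
    {"e_id": 3, "e_name": "pollard", "e_sal": 1800},
    {"e_id": 4, "e_name": "ponting", "e_sal": 2200},
    {"e_id": 5, "e_name": "sachin", "e_sal": 2200},
]

# (condition, output link number) pairs, one per where clause.
# The second condition is made up for this sketch.
where_clauses = [
    (lambda r: r["e_sal"] == 2200, 1),
    (lambda r: r["e_name"].startswith("p"), 0),
]

def route(rows, where_clauses):
    """Send each row to every output link whose condition it satisfies."""
    links = defaultdict(list)
    for row in rows:
        for condition, link in where_clauses:
            if condition(row):
                links[link].append(row)
    return links

links = route(rows, where_clauses)
print([r["e_id"] for r in links[1]])  # [2, 4, 5]
print([r["e_id"] for r in links[0]])  # [3, 4]
```

Row 4 satisfies both conditions, so under the assumption above it reaches both targets.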