MULTIPLE JOIN STAGES TO JOIN THREE TABLES
If we have three tables to join and we don't have same key column in all the tables to
join the tables using one join stage.
In this case we can use Multiple join stages to join the tables.
You can take sample data as below
soft_com_1
e_id,e_name,e_job,dept_no
001,james,developer,10
002,merlin,tester,20
003,jonathan,developer,10
004,morgan,tester,20
005,mary,tester,20
soft_com_2
dept_no,d_name,loc_id
10,developer,200
20,tester,300
soft_com_3
loc_id,add_1,add_2
10,melbourne,victoria
20,brisbane,queensland
Take Job Design as below
Read and load the data in three sequential files.
In first Join stage ,
Go to Properties ----Select Key column as Deptno
and you can select Join type = Inner
Drag and drop the required columns in Output
Click Ok
In Second Join Stage
Go to Properties ---- Select Key column as loc_id
and you can select Join type = Inner
Drag and Drop the required columns in the output
Click ok
Give file name to the Target file, That's it
Compile and Run the Job
You can Learn more on Join Stage with example here
join the tables using one join stage.
In this case we can use Multiple join stages to join the tables.
You can take sample data as below
soft_com_1
e_id,e_name,e_job,dept_no
001,james,developer,10
002,merlin,tester,20
003,jonathan,developer,10
004,morgan,tester,20
005,mary,tester,20
soft_com_2
dept_no,d_name,loc_id
10,developer,200
20,tester,300
soft_com_3
loc_id,add_1,add_2
10,melbourne,victoria
20,brisbane,queensland
Take Job Design as below
Read and load the data in three sequential files.
In first Join stage ,
Go to Properties ----Select Key column as Deptno
and you can select Join type = Inner
Drag and drop the required columns in Output
Click Ok
In Second Join Stage
Go to Properties ---- Select Key column as loc_id
and you can select Join type = Inner
Drag and Drop the required columns in the output
Click ok
Give file name to the Target file, That's it
Compile and Run the Job
You can Learn more on Join Stage with example here
JOIN STAGE WITHOUT COMMON KEY COLUMN
If we like to join the tables using Join stage , we need to have common key
columns in those tables. But some times we get the data without common key column.
In that case we can use column generator to create common column in both the
tables.
Read and load the data in Seq. Files
Go to Column Generator to create column and sample data.
In properties select name to create.
and Drag and Drop the columns into the target
Now Go to the Join Stage and select Key column which we have created( You can give
any name, based on business requirement you can give understandable name)
In Output Drag and Drop all required columns
Give File name to Target File. Than
Compile and Run the Job.
Sample Tables You can take as below
Table1
e_id,e_name,e_loc
100,andi,chicago
200,borny,Indiana
300,Tommy,NewYork
Table2
Bizno,Job
20,clerk
30,salesman
columns in those tables. But some times we get the data without common key column.
In that case we can use column generator to create common column in both the
tables.
Read and load the data in Seq. Files
Go to Column Generator to create column and sample data.
In properties select name to create.
and Drag and Drop the columns into the target
Now Go to the Join Stage and select Key column which we have created( You can give
any name, based on business requirement you can give understandable name)
In Output Drag and Drop all required columns
Give File name to Target File. Than
Compile and Run the Job.
Sample Tables You can take as below
Table1
e_id,e_name,e_loc
100,andi,chicago
200,borny,Indiana
300,Tommy,NewYork
Table2
Bizno,Job
20,clerk
30,salesman
INNER JOIN IN JOIN STAGE WITH EXAMPLE
If we have a Source data as below
xyz1 (Table 1 )
e_id,e_name,e_add
1,tim,la
2,sam,wsn
3,kim,mex
4,lin,ind
5,elina,chc
xyz2 (Table 2 )
e_id,address
1,los angeles
2,washington
3,mexico
4,indiana
5,chicago
We need the output as a
e_id, e_name,address
1,tim,los angeles
2,sam,washington
3,kim,meixico
4,lin,indiana
5,elina,chicago
Take job design as below
Read and Load the both the sourc tables in seq. files
And go to Join stage properties
Select Key column as e_id
JOIN Type = Inner
In Out put Column Drag and Drop Required Columns to go to output file and click ok.
Give file name for Target dataset and then
Compile and Run the Job . You will get the Required Output in the Target File.
Join stages and its types explained:
Inner Join:
LeftOuter Join:
All the records from left table and all matching records. If we dont exists in the right table it will be populated with nulls.
Right Outer Join:
All the records from right table and all matching records.
Full Outer Join:
All records and all matching records:
Difference Between lookup/join/merge
xyz1 (Table 1 )
e_id,e_name,e_add
1,tim,la
2,sam,wsn
3,kim,mex
4,lin,ind
5,elina,chc
xyz2 (Table 2 )
e_id,address
1,los angeles
2,washington
3,mexico
4,indiana
5,chicago
We need the output as a
e_id, e_name,address
1,tim,los angeles
2,sam,washington
3,kim,meixico
4,lin,indiana
5,elina,chicago
Take job design as below
Read and Load the both the sourc tables in seq. files
And go to Join stage properties
Select Key column as e_id
JOIN Type = Inner
In Out put Column Drag and Drop Required Columns to go to output file and click ok.
Give file name for Target dataset and then
Compile and Run the Job . You will get the Required Output in the Target File.
Join stages and its types explained:
Inner Join:
Say if we have duplicates in left table on key field? What will happen?
We all get all matching records. We will get all matching Duplicates all well here is the table Representation of join.
LeftOuter Join:
All the records from left table and all matching records. If we dont exists in the right table it will be populated with nulls.
Right Outer Join:
All the records from right table and all matching records.
Full Outer Join:
All records and all matching records:
Difference Between lookup/join/merge
Join stage is one of the typical stage that use to have . But mostly we will be working all complicated jobs with this Join Stage . In this blog , we got all tips and examples to get the work done easily.
ReplyDeleteThanks for sharing with us.
Hi, I have a requirement where I have a loop from which list of table names are passed as parameter to the parallel job. In the parallel job i need to perform join on 2 tables (one of the tables is being passed through loop). In this scenario since I am using RCP to get the columns from the table (passed from loop), how will I mention the key column on which join has to be performed? Any help is appreciated.
ReplyDelete