Join Stage


MULTIPLE JOIN STAGES TO JOIN THREE TABLES

If we have three tables to join and we don't have same key column in all the tables to

join the tables using one join stage.

In this case we can use Multiple join stages to join the tables.

You can take sample data as below

soft_com_1
e_id,e_name,e_job,dept_no
001,james,developer,10
002,merlin,tester,20
003,jonathan,developer,10
004,morgan,tester,20
005,mary,tester,20


soft_com_2
dept_no,d_name,loc_id
10,developer,200
20,tester,300

soft_com_3
loc_id,add_1,add_2
10,melbourne,victoria
20,brisbane,queensland



Take Job Design as below




Read and load the data in three sequential files.

In first Join stage ,

Go to Properties ----Select Key column as Deptno

and you can select Join type = Inner

Drag and drop the required columns in Output

Click Ok

In Second Join Stage

Go to Properties ---- Select Key column as loc_id

and you can select Join type = Inner

Drag and Drop the required columns in the output

Click ok

Give file name to the Target file, That's it

Compile and Run the Job

You can Learn more on Join Stage with example here



JOIN STAGE WITHOUT COMMON KEY COLUMN

If we like to join the tables using Join stage , we need to have common key

columns in those tables. But some times we get the data without common key column.

In that case we can use column generator to create common column in both the

tables.


Read and load the data in Seq. Files

Go to Column Generator to create column and sample data.

In properties select name to create.

and Drag and Drop the columns into the target

Now Go to the Join Stage and select Key column which we have created( You can give

any name, based on business requirement you can give understandable name)

In Output Drag and Drop all required columns

Give File name to Target File. Than

Compile and Run the Job.

Sample Tables You can take as below


Table1

e_id,e_name,e_loc
100,andi,chicago
200,borny,Indiana
300,Tommy,NewYork



Table2

Bizno,Job
20,clerk
30,salesman


INNER JOIN IN JOIN STAGE WITH EXAMPLE

If we have a Source data as below

xyz1 (Table 1 )

e_id,e_name,e_add
1,tim,la
2,sam,wsn
3,kim,mex
4,lin,ind
5,elina,chc



xyz2 (Table 2 )


e_id,address
1,los angeles
2,washington
3,mexico
4,indiana
5,chicago



We need the output as a

e_id, e_name,address
1,tim,los angeles
2,sam,washington
3,kim,meixico
4,lin,indiana
5,elina,chicago





Take job design as below








Read and Load the both the sourc tables in seq. files



And go to Join stage properties

Select Key column as e_id

JOIN Type = Inner

In Out put Column Drag and Drop Required Columns to go to output file and click ok.

Give file name for Target dataset and then

Compile and Run the Job . You will get the Required Output in the Target File.

Join stages and its types explained:

Inner Join:
Say if we have duplicates in left table on key field? What will happen?

We all get all matching records. We will get all matching Duplicates all well here is the table Representation of join.




LeftOuter Join:

All the records from left table and all matching records. If we dont exists in the right table it will be populated with nulls.



Right Outer Join:

All the records from right table and all matching records.

Full Outer Join:

All records and all matching records:







Difference Between lookup/join/merge

2 comments:

Unknown said...

Join stage is one of the typical stage that use to have . But mostly we will be working all complicated jobs with this Join Stage . In this blog , we got all tips and examples to get the work done easily.
Thanks for sharing with us.

Unknown said...

Hi, I have a requirement where I have a loop from which list of table names are passed as parameter to the parallel job. In the parallel job i need to perform join on 2 tables (one of the tables is being passed through loop). In this scenario since I am using RCP to get the columns from the table (passed from loop), how will I mention the key column on which join has to be performed? Any help is appreciated.