Pivot enterprise stage is a processing stage which pivots data
vertically and horizontally depending upon the requirements. There are
two types
Let’s try to understand it one by one with following example.
Step 1: Design Your Job Structure Like below.
Configure above table with input sequential stage ‘se_product_clr_det’.
Step 2: Let’s configure ‘Pivot enterprise stage’. Double click on it. Following window will pop up.
Select ‘Horizontal’ for Pivot Type from drop-down menu under ‘Properties’ tab for horizontal Pivot operation.
Step 3: Click on‘Pivot Properties’ tab. Under which we need to check box against ‘Pivot Index’. After which column of name ‘Pivot_Index’ will appear under ‘Name’ column also declare a new column of name ’Color’ as shown below.
Step 4: Now we have to mention columns to be pivoted under ‘Derivation’ against column ‘Color’. Double click on it. Following Window will pop up.
Select columns to be pivoted from ‘Available column’ pane as shown. Click ‘OK’.
Step 5: Under ‘Output’ tab, only map pivoted column as shown.
Configure output stage. Give the file path. See below image for reference.
Step 6: Compile and Run the job. Let’s see what is happen to the output.
This is how we can set multiple input columns to the single column (As here for colors).
Vertical Pivot Operation:
Here, we are going to use ‘Pivot Enterprise’ stage to vertically pivot data. We are going to set multiple input rows to a single row. The main advantage of this stage is we can use aggregation functions like avg, sum, min, max, first, last etc. for pivoted column. Let’s see how it works.
Consider an output data of Horizontal Operation as input data for the Pivot Enterprise stage. Here, we will be adding one extra column for aggregation function as shown in below table.
Let’s study for vertical pivot operation step by step.
Step 1: Design your job structure like below. Configure above table data with input sequential file ‘se_product_det’.
Step 2: Open Pivot Enterprise stage and select Pivot type as vertical under properties tab.
Step 3: Under Pivot Properties tab minimum one pivot column and one group by column. Here, we declared Product as group by column. Color and prize as Pivot columns.Lets see how to use ‘Aggregation functions’ in next step.
Step 4: On clicking Aggregation functions required for this column for particular column following window will pop up. In which we can select functions whichever required for that particular column. Here we are using ‘min’, ’max’ and ‘average’ functions with proper precision and scale for Prize column as shown.
Step 5: Now we just have to do mapping under output tab as shown below.
Step 6: compile and Run the job. Lets see what will be the output is.
Output :
2) By using a pivot stage, we can convert 10 rows into 100 columns and 100 columns into 10 rows
3) You can add more points here!!
Let me first tell you that a Pivot stage only CONVERTS COLUMNS INTO ROWS and nothing else. Some DS Professionals refer to this as NORMALIZATION. Another fact about the Pivot stage is that it's irreplaceable i.e no other stage has this functionality of converting columns into rows!!! So , that makes it unique, doesn't!!!
Let's cover how exactly it does it....
For example, lets take a file with the following fields: Item, Quantity1, Quantity2, Quantity3....
Item~Quantity1~Quantity2~Quantity3
ABC~100~1000~10000
DEF~200~2000~20000
GHI~300~3000~30000
Basically you would use a pivot stage when u need to convert those 3 Quantity fields into a single field whch contains a unique Quantity value per row...i.e. You would need the following output
Item~Quantity
ABC~100
ABC~1000
ABC~10000
DEF~200
DEF~2000
DEF~20000
GHI~300
GHI~3000
GHI~30000
How to achieve the above in Datastage???
In this case our source would be a flat file. Read it using any file stage of your choice: Sequential file stage, File set stage or Dataset stage. Specify 4 columns in the Output column derivation tab.
Now connect a Pivot stage from the Tool pallette to the above output link and create an output link for the Pivot stage itself (fr enabling the Output tab for the pivot stage).
Unlike other stages, a pivot stage doesn't use the generic GUI stage page. It has a stage page of its own. And by default the Output columns page would not have any fields. Hence, you need to manually type in the fields. In this case just type in the 2 field names : Item and Quantity. However manual typing of the columns becomes a tedious process when the number of fields is more. In this case you can use the Metadata Save - Load feature. Go the input columns tab of the pivot stage, save the table definitions and load them in the output columns tab. This is the way I use it!!!
Now, you have the following fields in the Output Column's tab...Item and Quantity....Here comes the tricky part i.e you need to specify the DERIVATION ....In case the field names of Output columns tab are same as the Input tab, you need not specify any derivation i.e in this case for the Item field, you need not specify any derivation. But if the Output columns tab has new field names, you need to specify Derivation or you would get a RUN-TIME error for free....
For our example, you need to type the Derivation for the Quantity field as
Column name Derivation
Item Item (or you can leave this blank)
Quantity Quantity1, Quantity2, Quantity3.
Just attach another file stage and view your output!!! So, objective met!!!
- Horizontal
- Vertical
Let’s try to understand it one by one with following example.
- Horizontal Pivot Operation.
Product Type
|
Color_1
|
Color_2
|
Color_3
|
Pen
|
Yellow
|
Blue
|
Green
|
Dress
|
Pink
|
Yellow
|
Purple
|
Configure above table with input sequential stage ‘se_product_clr_det’.
Step 2: Let’s configure ‘Pivot enterprise stage’. Double click on it. Following window will pop up.
Select ‘Horizontal’ for Pivot Type from drop-down menu under ‘Properties’ tab for horizontal Pivot operation.
Step 3: Click on‘Pivot Properties’ tab. Under which we need to check box against ‘Pivot Index’. After which column of name ‘Pivot_Index’ will appear under ‘Name’ column also declare a new column of name ’Color’ as shown below.
Step 4: Now we have to mention columns to be pivoted under ‘Derivation’ against column ‘Color’. Double click on it. Following Window will pop up.
Select columns to be pivoted from ‘Available column’ pane as shown. Click ‘OK’.
Step 5: Under ‘Output’ tab, only map pivoted column as shown.
Configure output stage. Give the file path. See below image for reference.
Step 6: Compile and Run the job. Let’s see what is happen to the output.
This is how we can set multiple input columns to the single column (As here for colors).
Vertical Pivot Operation:
Here, we are going to use ‘Pivot Enterprise’ stage to vertically pivot data. We are going to set multiple input rows to a single row. The main advantage of this stage is we can use aggregation functions like avg, sum, min, max, first, last etc. for pivoted column. Let’s see how it works.
Consider an output data of Horizontal Operation as input data for the Pivot Enterprise stage. Here, we will be adding one extra column for aggregation function as shown in below table.
Product
|
Color
|
Prize
|
Pen
|
Yellow
|
38
|
Pen
|
Blue
|
43
|
Pen
|
Green
|
25
|
Dress
|
Pink
|
1000
|
Dress
|
Yellow
|
695
|
Dress
|
purple
|
738
|
Step 1: Design your job structure like below. Configure above table data with input sequential file ‘se_product_det’.
Step 2: Open Pivot Enterprise stage and select Pivot type as vertical under properties tab.
Step 3: Under Pivot Properties tab minimum one pivot column and one group by column. Here, we declared Product as group by column. Color and prize as Pivot columns.Lets see how to use ‘Aggregation functions’ in next step.
Step 4: On clicking Aggregation functions required for this column for particular column following window will pop up. In which we can select functions whichever required for that particular column. Here we are using ‘min’, ’max’ and ‘average’ functions with proper precision and scale for Prize column as shown.
Step 5: Now we just have to do mapping under output tab as shown below.
Step 6: compile and Run the job. Lets see what will be the output is.
Output :
One more approach:
Many people have the following misconceptions about Pivot stage.
1) It converts rows into columns 2) By using a pivot stage, we can convert 10 rows into 100 columns and 100 columns into 10 rows
3) You can add more points here!!
Let me first tell you that a Pivot stage only CONVERTS COLUMNS INTO ROWS and nothing else. Some DS Professionals refer to this as NORMALIZATION. Another fact about the Pivot stage is that it's irreplaceable i.e no other stage has this functionality of converting columns into rows!!! So , that makes it unique, doesn't!!!
Let's cover how exactly it does it....
For example, lets take a file with the following fields: Item, Quantity1, Quantity2, Quantity3....
Item~Quantity1~Quantity2~Quantity3
ABC~100~1000~10000
DEF~200~2000~20000
GHI~300~3000~30000
Basically you would use a pivot stage when u need to convert those 3 Quantity fields into a single field whch contains a unique Quantity value per row...i.e. You would need the following output
Item~Quantity
ABC~100
ABC~1000
ABC~10000
DEF~200
DEF~2000
DEF~20000
GHI~300
GHI~3000
GHI~30000
How to achieve the above in Datastage???
In this case our source would be a flat file. Read it using any file stage of your choice: Sequential file stage, File set stage or Dataset stage. Specify 4 columns in the Output column derivation tab.
Now connect a Pivot stage from the Tool pallette to the above output link and create an output link for the Pivot stage itself (fr enabling the Output tab for the pivot stage).
Unlike other stages, a pivot stage doesn't use the generic GUI stage page. It has a stage page of its own. And by default the Output columns page would not have any fields. Hence, you need to manually type in the fields. In this case just type in the 2 field names : Item and Quantity. However manual typing of the columns becomes a tedious process when the number of fields is more. In this case you can use the Metadata Save - Load feature. Go the input columns tab of the pivot stage, save the table definitions and load them in the output columns tab. This is the way I use it!!!
Now, you have the following fields in the Output Column's tab...Item and Quantity....Here comes the tricky part i.e you need to specify the DERIVATION ....In case the field names of Output columns tab are same as the Input tab, you need not specify any derivation i.e in this case for the Item field, you need not specify any derivation. But if the Output columns tab has new field names, you need to specify Derivation or you would get a RUN-TIME error for free....
For our example, you need to type the Derivation for the Quantity field as
Column name Derivation
Item Item (or you can leave this blank)
Quantity Quantity1, Quantity2, Quantity3.
Just attach another file stage and view your output!!! So, objective met!!!
1 comment:
yo man
in vertical to horizontal
snap shots are not clear
and i think u didn't mention group by and pivot check option
just look into the pics
and rectify.
Post a Comment