Surrogate_Key_Stage



Surrogate Key  Importance:

SURROGATE KEY IN DATASTAGE

Surrogate Key is a unique identification key. It is alternative to natural key .

And in natural key, it may have alphanumeric composite key but the surrogate is

always single numeric key.

Surrogate key is used to generate key columns, for which characteristics can be

specified. The surrogate key generates sequential incremental and unique integers for a

provided start point. It can have a single input and a single output link.



WHAT IS THE IMPORTANCE OF OF SURROGATE KEY
Surrogate Key is a Primary Key for a dimensional table. ( Surrogate key is alternate to Primary Key) The most importance of using Surrogate key is not affected by the changes going on with a database.

And in Surrogate Key Duplicates are allowed, where it cant be happened in the Primary Key .

By using Surrogate key we can continue the sequence for any jobs. If any job was aborted at the n records loaded.. By using surrogate key you can continue the sequence from n+1.



Surrogate Key Generator:

The Surrogate Key Generator stage is a processing stage that generates surrogate key columns and maintains the key source.
A surrogate key is a unique primary key that is not derived from the data that it represents, therefore changes to the data will not change the primary key. In a star schema database, surrogate keys are used to join a fact table to a dimension table.
surrogate key generator stage uses:
  • Create or delete the key source before other jobs run
  • Update a state file with a range of key values
  • Generate surrogate key columns and pass them to the next stage in the job
  • View the contents of the state file
Generated keys are 64 bit integers and the key source can be stat file or database sequence.
Creating the key source
Drag the surrogate key stage from palette to parallel job canvas with no input and output links.
Double click on the surrogate key stage and click on properties tab.
Properties:
Key Source Action = create
Source Type : FlatFile or Database sequence(in this case we are using FlatFile)
When you run the job it will create an empty file.
If you want to the check the content change the View Stat File = YES and check the job log for details.
skey_genstage,0: State file /tmp/skeycutomerdim.stat is empty.
if you try to create the same file again job will abort with the following error.
skey_genstage,0: Unable to create state file /tmp/skeycutomerdim.stat: File exists.
Deleting the key source:
Updating the stat File:
To update the stat file add surrogate key stage to the job with single input link from other stage.
We use this process to update the stat file if it is corrupted or deleted.
1)open the surrogate key stage editor and go to the properties tab.
If the stat file exists we can update otherwise we can create and update it.
We are using SkeyValue parameter to update the stat file using transformer stage.
Generating Surrogate Keys:
Now we have created stat file and will generate keys using the stat key file.
Click on the surrogate keys stage and go to properties add add type a name for the surrogate key column in the Generated Output Column Name property

Go to ouput and define the mapping like below.
Rowgen we are using 10 rows and hence when we run the job we see 10 skey values in the output.
I have updated the stat file with 100 and below is the output.
If you want to generate the key value from begining you can use following property in the surrogate key stage.
  1. If the key source is a flat file, specify how keys are generated:
    • To generate keys in sequence from the highest value that was last used, set the Generate Key from Last Highest Value property to Yes. Any gaps in the key range are ignored.
    • To specify a value to initialize the key source, add the File Initial Value property to the Options group, and specify the start value for key generation.
    • To control the block size for key ranges, add the File Block Size property to the Options group, set this property toUser specified, and specify a value for the block size.
  2. If there is no input link, add the Number of Records property to the Options group, and specify how many records to generate.

2 comments:

Anitha said...

This article is really helpful for me. I am regular visitor to this blog. Share such kind of article more in future. Personally i like this article a lot and you can have a look at my services also: I was seriously search for a Salesforce training institutes in ameerpet which offer job assistance and Salesforce training institutes in Hyderabad who are providing certification material. It's worth to join Salesforce training institutes in India because of their real time projects material and 24x7 support from customer desk. You can easily find the best Salesforce training institutes in kukatpally kphb which are also a part of Pega training institutes in hyderabad. This is amazing to join Data science training institutes in ameerpet who are quire popular with Selenium training institutes in ameerpet and trending coureses like Java training institutes in ameerpet and data science related programming coures python training institutes in ameerpet If you want HCM course then this workday training institutes in ameerpet is best for you to get job on workday.

Surrogate Finder said...

Hi dear,

Thank you for this wonderful post. It is very informative and useful. I would like to share something here too. If you are facing problems in having a child and wants to be the parents of one soon then surrogacy is the best option for that. Surrogate Finder is the organisation who helps the people who are facing these kinds of probems. Surrogatefinder is a very unique database, containing a world wide registar of couples and singles thet want or would like to give the chance of birth. you wont find another site or database in the world that offers this service on such a large scale.For further inquiries please do visit the website mention in the comment below.


surrogate services