Sequential File:
The Sequential File stage is a file stage. It allows you to read data from or write
data to one or more flat files, as shown in the figure below:
By default the stage executes in parallel mode when reading multiple files, but sequentially when reading only one file.
In order to read a sequential file, DataStage needs to know the format of the file.
If you are reading a delimited file, you need to specify the delimiter on the Format tab.
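To make the difference between the two formats concrete, here is a minimal Python sketch (not DataStage code) of what a reader has to know in each case. The column names, field widths and sample records are assumptions made up for illustration only.

```python
import csv
import io

# Hypothetical three-column layout, used only for illustration.
COLUMNS = ["id", "name", "city"]
FIXED_WIDTHS = [4, 10, 8]  # assumed field widths in characters


def parse_delimited(line, delimiter=","):
    """Split one delimited record; the delimiter must be known up front."""
    row = next(csv.reader(io.StringIO(line), delimiter=delimiter))
    return dict(zip(COLUMNS, row))


def parse_fixed_width(line, widths=FIXED_WIDTHS):
    """Slice one fixed-width record; the field widths must be known up front."""
    fields, start = [], 0
    for width in widths:
        # Trailing spaces are padding, not data.
        fields.append(line[start:start + width].rstrip(" "))
        start += width
    return dict(zip(COLUMNS, fields))


delimited_row = parse_delimited("0001,Alice,Boston")
fixed_line = "0001" + "Alice".ljust(10) + "Boston".ljust(8)
fixed_row = parse_fixed_width(fixed_line)
```

Both calls return the same dictionary. The only difference is the format information the reader needs: a delimiter in one case, a list of widths in the other.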
Reading a Fixed-Width File:
Double-click the Sequential File stage and go to the Properties tab.
Source:
File: Give the file name, including the path.
Read Method: Whether to specify file names explicitly or use a file pattern.
Important Options:
First Line is Column Names: If set to True, the first line of a file contains column names on writing and is ignored on reading.
Keep File Partitions: Set to True to partition the read data set according to the organization of the input file(s).
Reject Mode: Continue to simply discard any rejected rows; Fail to stop if any row is rejected; Output to send rejected rows down a reject link.
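The three Reject Mode values can be pictured with a small Python sketch. This is only an illustration of the behaviour described above, not how DataStage implements it; the function name `read_rows` and the rule "a row is bad if it has the wrong number of fields" are assumptions.

```python
def read_rows(lines, expected_fields, reject_mode="Continue", delimiter=","):
    """Return (good_rows, rejected_lines) according to the reject mode."""
    good, rejected = [], []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) == expected_fields:
            good.append(fields)
            continue
        if reject_mode == "Fail":
            # Fail: stop as soon as one row is rejected.
            raise ValueError(f"row {number} rejected: {line!r}")
        if reject_mode == "Output":
            # Output: keep the bad row so it can go down a reject link.
            rejected.append(line)
        # Continue: the bad row is simply discarded.
    return good, rejected


sample = ["1,a", "bad", "2,b"]
continue_result = read_rows(sample, 2, reject_mode="Continue")
output_result = read_rows(sample, 2, reject_mode="Output")
```

With `reject_mode="Fail"` the same input raises an error instead of returning, which is the behaviour you want when a bad row should stop the job.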
For fixed-width files, however, you can configure the stage to behave differently:
* You can specify that single files can be read by multiple nodes. This can improve performance on cluster systems.
* You can specify that a number of readers run on a single node. This means, for example, that a single file can be partitioned as it is read.
These two options are mutually exclusive.
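One plausible way to see why fixed-length records make this kind of splitting easy: if every record has the same length, each reader can be given a byte range that starts and ends on a record boundary without scanning the file first. The Python sketch below only illustrates that idea and is not DataStage's actual implementation; `reader_byte_ranges` and its parameters are hypothetical names.

```python
def reader_byte_ranges(file_size, record_length, readers):
    """Split a file of fixed-length records into one byte range per reader."""
    if file_size % record_length != 0:
        raise ValueError("file size is not a whole number of records")
    total_records = file_size // record_length
    base, extra = divmod(total_records, readers)
    ranges, start_record = [], 0
    for reader in range(readers):
        # The first `extra` readers take one additional record each.
        count = base + (1 if reader < extra else 0)
        start = start_record * record_length
        end = (start_record + count) * record_length
        ranges.append((start, end))  # half-open range [start, end)
        start_record += count
    return ranges


ranges = reader_byte_ranges(file_size=1000, record_length=100, readers=4)
```

Here ten 100-byte records are shared between four readers as 3, 3, 2 and 2 records, and every range boundary is a multiple of the record length.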
Scenario 1:
Reading the file sequentially.
Scenario 2:
Read From Multiple Nodes = Yes
Once we add Read From Multiple Nodes = Yes, the stage executes in parallel mode by default.
If you run the job with the above configuration, it will abort with the following fatal error:
sff_SourceFile: The multinode option requires fixed length records. (This means you can use this option only to read fixed-width files.)
To fix the above issue, go to the Format tab and add the additional parameters as shown below.
The job now finishes successfully. See the DataStage monitor below for the performance improvement compared with reading from a single node.
Scenario 3:
Read a delimited file by adding the Number of Readers Per Node option instead of the multinode option to improve read performance. Once we add this option, the Sequential File stage executes in parallel mode by default.
If we are reading from and writing to a fixed-width file, it is always good practice to add the APT_STRING_PADCHAR DataStage environment variable and assign 0x20 as its default value, so that strings are padded with spaces. Otherwise DataStage pads with the null character (the DataStage default padding character).
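The effect of the pad character is easy to see at the byte level. The Python sketch below is only an illustration of space padding (0x20) versus null padding and does not call DataStage; `pad_field` is a hypothetical helper.

```python
def pad_field(value, width, pad_char=b"\x20"):
    """Right-pad an ASCII value to a fixed width with the given pad byte."""
    data = value.encode("ascii")
    if len(data) > width:
        raise ValueError("value is longer than the field width")
    return data + pad_char * (width - len(data))


space_padded = pad_field("NY", 5)                   # padded with 0x20
null_padded = pad_field("NY", 5, pad_char=b"\x00")  # padded with null bytes
```

The space-padded field reads as plain text in any editor, while the null-padded one contains non-printable bytes that downstream systems often do not expect.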
Always keep Reject Mode = Fail to make sure the DataStage job fails if we get a wrong format from the source systems.
Sequential File Best Performance Settings/Tips
7 comments:
This is so far the best resource that I have come across. Keep it up, dude. Make it a reference bible for DS and help us.
I completely agree deepaks , great effort
Wonderful resource for DS guys .. Great job keep it up
Is it possible to write Unix scripting in this Sequential File stage? If yes, please mention how and in which scenario.
Exact same copy of information at following link:
https://datastage4u.wordpress.com/2011/04/26/reading-file-using-sequential-file-stage/
You can write it in the Filter option.
Best content for datastage