
Wednesday, February 11, 2015

Parallel Routine

How to create shared C code objects and use them in DataStage parallel routines?

Hello all. In this post I will explain DataStage parallel routines and shared C code objects, and show how to create shared C code objects and use them in parallel routines.
DataStage Parallel Routine:
A parallel routine lets you call external functionality written in C from a DataStage job.
E.g. DataStage does not provide regular expression functionality, so we can write the regular expression logic in C, build a shared object from it, and use it in DataStage.
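As an illustration of the regular expression case, here is a minimal sketch of what such a C function could look like. It assumes the POSIX regex.h API is available on the server; the function name regex_match_p and its return convention are hypothetical and not part of DataStage.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical routine (name and return codes are assumptions):
 * returns 1 if str matches the POSIX extended regular expression
 * in pattern, 0 if it does not, and -1 on bad input or bad pattern. */
int regex_match_p(char *str, char *pattern)
{
    regex_t re;
    int rc;

    if (str == NULL || pattern == NULL)
        return -1;

    /* REG_NOSUB: we only need a yes/no answer, not match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, str, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

The pattern is compiled on every call, which keeps the sketch simple but may be slow for large volumes of rows.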
Steps:
1.       Create the required function in C.
Here is a simple C function to add two numbers.
No main() function is needed.
/* Returns the sum of the two input integers. */
int sum_p(int a, int b)
{
    int c = a + b;
    return c;
}
Suppose this file is saved as sum_pk.c
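Before building the shared object, it can help to check the function on its own. The sketch below is one possible way to do that; check_sum_p is a hypothetical helper that would live in a separate test file and be called from a small main(), not in sum_pk.c itself. sum_p is repeated here with its local variable declared so that the sketch compiles on its own.

```c
/* sum_p repeated so that this sketch is self-contained. */
int sum_p(int a, int b)
{
    int c = a + b;
    return c;
}

/* Hypothetical helper: runs a few checks against sum_p and
 * returns the number of checks that failed (0 means all passed). */
int check_sum_p(void)
{
    int failures = 0;

    if (sum_p(2, 3) != 5)
        failures++;
    if (sum_p(-4, 4) != 0)
        failures++;
    if (sum_p(0, 0) != 0)
        failures++;

    return failures;
}
```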

2.       Create a shared object/library from the code.
Position Independent Object:
g++34 -fpic -c sum_pk.c
g++: the GNU C++ compiler available on Unix. g++34 is the version of g++ installed on our server.
-c : compiles the code without linking and creates an object file
-fpic: generates position independent code, which is required for a shared object/library
The object file is created with the extension .o, as sum_pk.o
a)      Shared Object:
A shared object is created from the position independent object file created above.
g++34 -shared -o sum_pk.so  sum_pk.o
sum_pk.so is the shared object file created from sum_pk.o
b)      Shared library:
A shared library is also created from the position independent object file created above.
g++34 -shared -o libsum_pk.so sum_pk.o
libsum_pk.so is the shared library file created from sum_pk.o
Shared library vs. shared object:
  • Linking: a shared library file is linked to the job at run time and must be available at run time. A shared object file is linked to the job at compile time.
  • Naming: a shared library name should start with “lib” and should have “.so” as its extension, e.g. libsum_pk.so. There is no such constraint on a shared object.
  • Location: a shared library should be present in one of the predefined library paths, e.g. /opt/IBM/InformationServer/ASBNode/lib/cpp/ is the library path in our DataStage installation. There is no such constraint on a shared object.
3.       Create a parallel routine in DataStage:
  • File>New>Routines>Parallel Routine
  • Fill all the required values as:
Routine Name: Any name made up of alphanumeric characters only. Underscores are not allowed either.
External subroutine name: Name of the C function that we want to invoke
Type: External Function
Object Type: Library if you are using a shared library, or Object if you are using a shared object.
Return Type: Return type of the C function
Library path: Library name with its complete path
For a shared library the path should be
                            /opt/IBM/InformationServer/ASBNode/lib/cpp/
Go to the Arguments tab and enter the details of the input arguments that the C function/parallel routine takes.
Save your routine in the required folder.
Usage:
You need a Transformer stage to use a parallel routine external function in your job.
In the Derivation pane of any output column in the Transformer stage:
Right-click > DS_Routines > select your routine. It is inserted with placeholder arguments.
E.g. SumRoutinePK(%a%, %b%)
Replace the placeholders with the required input parameters and the routine will return the result.
SumRoutinePK(DSLink2.F1, DSLink2.F2)
