Introduction
With the release of Information Server/DataStage 11.3 a few weeks
ago, most DataStage developers are interested in knowing exactly what
new features have surfaced and how they can best be leveraged. Version
8.7 introduced the Operations Console, and version 9.1 followed suit
with the Workload Manager.
I’m afraid that DataStage developers don’t have anything too exciting to
look forward to in version 11.3. There are definitely some nifty new
features added to the suite for data governance,
metadata management, and administration, but this post reviews only
the new features in DataStage.
There might be some hidden new features or “features” which aren’t
documented. Feel free to comment below on what you think they might be.
Hierarchical Data Stage
Remember how the XML stage was introduced fairly recently for all XML
processing in DataStage? It has now been relabeled as the Hierarchical
Data stage, I suppose to reflect its ability to process other types of
hierarchical data, such as JSON, rather than being limited to XML. This
stage also has some additional functionality which wasn’t previously
available. If you are familiar with this stage (Hierarchical Data/XML),
you will know that it has various steps which are added in the Assembly
Editor to build a sequence of processing events. There are now three new
steps:
- REST – Invokes a RESTful web service
- JSON_Parser – Parses JSON content with a selected type
- JSON_Composer – Composes JSON content with a selected type
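To make the two JSON steps a little more concrete, here is a minimal Python sketch of what "parsing" and "composing" hierarchical data means in general: flattening a nested document into rows, then rebuilding the document from those rows. This is not how the Assembly Editor is configured. The sample document, the field names, and the `parse_orders` and `compose_order` helpers are all made up for illustration.

```python
import json

# Hypothetical sample document; the field names are invented for illustration.
doc = '{"order_id": 7, "items": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}]}'


def parse_orders(text):
    """Flatten one nested order document into flat rows (one row per item)."""
    order = json.loads(text)
    return [
        {"order_id": order["order_id"], "sku": item["sku"], "qty": item["qty"]}
        for item in order["items"]
    ]


def compose_order(rows):
    """Rebuild the nested document from flat rows that share one order_id."""
    order_id = rows[0]["order_id"]
    items = [{"sku": row["sku"], "qty": row["qty"]} for row in rows]
    return json.dumps({"order_id": order_id, "items": items})


rows = parse_orders(doc)       # hierarchical -> relational rows
rebuilt = compose_order(rows)  # relational rows -> hierarchical
```

The round trip is the useful part: a parser step turns a hierarchy into rows that downstream stages can work with, and a composer step goes the other way.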
Big Data File Stage
The Big Data File stage is used to read from and write to files on
Hadoop (HDFS). The Big Data File stage is now compatible with
Hortonworks 2.1, Cloudera 4.5, and InfoSphere BigInsights 3.0.
Greenplum Connector Stage
You can now use the Greenplum Connector stage to connect natively to
data stored in a Greenplum database. You can also import table
definitions through the Greenplum Connector framework.
InfoSphere Master Data Management Connector Stage
The Master Data Management Connector stage can be used to read data
from and write data to the IBM master data management solution,
InfoSphere MDM. This stage can be configured for Member read and Member
write interactions with the MDM server.
Amazon S3 Connector Stage
Amazon S3 (Simple Storage Service) is an inexpensive cloud file storage
system which is accessible through web services (REST, SOAP, and
BitTorrent). It offers scalability, high availability, and low latency
at competitive prices. The Amazon S3 Connector stage can be
used to read and write data residing in Amazon S3.
Unstructured Data Stage – Microsoft Excel (.xls and .xlsx)
The Unstructured Data stage was first introduced in DataStage v9.1,
where it was used to read Excel files through a native interface. Before
that, Excel data had to be staged as a .csv file or accessed through
ODBC. The stage can now also be used to write data to Excel files.
Sort Stage Optimization
The Sort stage now tries to optimize your DataStage sort operations
by converting length-bounded columns to variable length before the sort
and converting them back to length-bounded columns after the sort.
When a record’s actual data is smaller than the defined upper
bound, the optimization reduces disk I/O.
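A rough back-of-the-envelope sketch in Python shows why this can help. It compares the bytes needed to hold a column at its declared upper bound against the bytes needed for the actual values. The 4-byte length prefix, the 50-byte bound, and the sample values are assumptions chosen for illustration. They do not describe DataStage's actual scratch file layout.

```python
def bounded_bytes(values, max_len):
    """Bytes used if every value occupies the column's full declared bound."""
    return len(values) * max_len


def variable_bytes(values, prefix=4):
    """Bytes used if each value stores only its data plus a length prefix.

    The 4-byte prefix is an assumption, not a documented DataStage detail.
    """
    return sum(prefix + len(value.encode("utf-8")) for value in values)


# Hypothetical column declared with an upper bound of 50 bytes.
names = ["Ann", "Bartholomew", "Li"]

fixed = bounded_bytes(names, 50)  # 3 values * 50 bytes = 150
var = variable_bytes(names)       # (4 + 3) + (4 + 11) + (4 + 2) = 28
```

The wider the gap between the declared bound and the typical value, the bigger the saving. When values routinely fill the bound, the length prefix makes the variable form slightly larger.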
Improved Flexibility in Record Delimiting
The Sequential File stage now gives developers more flexibility in
how a source flat file must be delimited. A new environment variable,
APT_IMPORT_HANDLE_SHORT,
can be set to allow the import operator to read
records which do not contain all of the fields defined in the import
schema. Previously, these records were rejected by the stage. The value
assigned to each missing field depends on its data type and nullability.
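The general idea can be sketched in Python as follows. This is only a model of the behaviour described above, not the import operator itself. The schema, the `import_record` helper, and the particular fill values (null for nullable fields, 0 or an empty string otherwise) are assumptions. Check the 11.3 documentation for the values DataStage actually assigns.

```python
# Hypothetical import schema: (column name, type, nullable).
SCHEMA = [
    ("id", int, False),
    ("name", str, False),
    ("age", int, True),
    ("city", str, False),
]

# Assumed fill values for missing non-nullable fields, keyed by type.
TYPE_DEFAULTS = {int: 0, str: ""}


def import_record(line, schema=SCHEMA, delimiter=","):
    """Read one delimited record, filling fields missing from the end."""
    parts = line.split(delimiter)
    record = {}
    for index, (name, col_type, nullable) in enumerate(schema):
        if index < len(parts):
            record[name] = col_type(parts[index])  # field is present
        elif nullable:
            record[name] = None                    # missing and nullable
        else:
            record[name] = TYPE_DEFAULTS[col_type] # missing, not nullable
    return record


short = import_record("1,Ann")         # only two of the four fields
full = import_record("2,Bob,30,Oslo")  # complete record
```

Here the short record is kept rather than rejected, with `age` set to null and `city` set to an empty string under the assumed defaults.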
Operations Console/Workload Management
IBM lists the Operations Console and Workload Management as new
features in the 11.3 release documentation, even though these components
were already introduced in previous releases. Both components are
now part of the base Information Server installation, and Workload
Management is now enabled by default.