DataStage Tutorial

DataStage Interview Questions


DataStage Jobs:


admin on May 2nd, 2012

Data Stores:

For the purposes of this section, a data store is a physical piece of disk storage where data is held for a period of time. In DataStage terms, this can be either a table in a database structure or a file contained in a disk directory or catalog structure. Data held in a database structure is referred to as either a table or a view. In data warehousing, two additional subclasses of table might be used: dimension and fact. Data held in a file in a directory structure is classified according to its type, for example: Sequential File, Parallel Dataset, Lookup File Set, and so on.

The concepts of “source” and “target” can be applied in a couple of ways. Every job in a series of jobs could consider the data it gets in to be a source and the data it writes out as being a target. However, for the sake of this naming convention a source is only data that is extracted from an original system. A target is the data structures that are produced or loaded as the final result of a particular series of jobs. This is based on the purpose of the project: to move data from a source to a target.

Data stores used as temporary structures to land data between jobs, supporting restart and modularity, should use the same names in the originating job and any downstream jobs reading the structure.

 

admin on May 2nd, 2012

Sequencer Object Naming:

In a job Sequencer, links are actually messages. Precede sequencer link names with the class word msg_, followed by the type of message (for example, fail or unconditional), and then the ClassName. The following list shows some examples:

  • Reception Succeeded Message: msg_ok_Reception
  • Reception Failed Message: msg_fail_Reception
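
The message-link pattern above can be sketched as a small helper. This is only an illustration of the naming rule, not a DataStage API: the function name message_link_name and the alphanumeric check are assumptions made for the example.

```python
def message_link_name(message_type: str, class_name: str) -> str:
    """Build a sequencer link name of the form msg_<type>_<ClassName>.

    Hypothetical helper: the alphanumeric check is an assumption of this
    sketch, not a rule stated by DataStage.
    """
    if not message_type.isalnum() or not class_name.isalnum():
        raise ValueError("message type and class name must be alphanumeric")
    return f"msg_{message_type}_{class_name}"


if __name__ == "__main__":
    print(message_link_name("ok", "Reception"))    # msg_ok_Reception
    print(message_link_name("fail", "Reception"))  # msg_fail_Reception
```
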
admin on May 2nd, 2012

Stage Names:

DataStage assigns default names to stages as they are dragged onto the Designer canvas. These names are based on the type of stage (Object) and a unique number, based on the order the object was added to the flow. In a job or job sequence, stage names must be unique.

admin on May 2nd, 2012

Links:

In a DataStage job, links are objects that represent the flow of data from one stage to the next. In a job sequence, links represent the flow of a message from one activity or step to the next.

It is particularly important to establish a consistent naming convention for link names, instead of using the default DSLink# (where # is an assigned number). In the graphical Designer environment, stage editors identify links by name. Having a descriptive link name reduces the chance for errors (for example, during link ordering). Furthermore, when sharing data with external applications (for example, through job reporting), establishing standardized link names makes it easier to understand results and audit counts.

To differentiate link names from stage objects, and to identify them in captured metadata, the prefix lnk_ is used before the subject name of a link.

The following rules can be used to establish a link name:

  • The link name should define the subject of the data that is being moved.
  • For non-stream links, the link name should include the link type (reference, reject) to reinforce the visual cues of the Designer canvas: 
    • Ref for reference links (Lookup)
    • Rej for reject links (such as Lookup, Merge, Transformer, Sequential File, and Database)

The type of movement might optionally be part of the Class Word. As examples:

  • In for input
  • Out for output
  • Upd for updates
  • Ins for inserts
  • Del for deletes
  • Get for shared container inputs
  • Put for shared container output

As data is enriched through stages, the same name might be appropriate for multiple links. In this case, specify a unique link name in a particular job or job sequence by including a number. (The DataStage Designer does not require link names on stages to be unique.)

The following list provides sample link names:

  • Input Transactions: lnk_Txn_In
  • Reference Account Number Rejects: lnk_Account_Ref_Rej
  • Customer File Rejects: lnk_Customer_Rej
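
As a rough sketch, the link naming rules in this post can be expressed as a helper that assembles a name from the lnk_ prefix, a subject, and optional class words. The function name link_name is hypothetical, and the way a disambiguating number is appended (directly after the last word) is an assumption, since the text does not say where the number goes.

```python
LINK_TYPES = {"Ref", "Rej"}                                   # non-stream link types
MOVEMENTS = {"In", "Out", "Upd", "Ins", "Del", "Get", "Put"}  # optional movement words


def link_name(subject, *class_words, number=None):
    """Compose lnk_<Subject>[_<ClassWord>...][<number>].

    Hypothetical helper; appending the number directly at the end is an
    assumption of this sketch.
    """
    allowed = LINK_TYPES | MOVEMENTS
    for word in class_words:
        if word not in allowed:
            raise ValueError(f"unknown class word: {word!r}")
    name = "_".join(["lnk", subject, *class_words])
    if number is not None:
        name += str(number)
    return name


if __name__ == "__main__":
    print(link_name("Txn", "In"))              # lnk_Txn_In
    print(link_name("Account", "Ref", "Rej"))  # lnk_Account_Ref_Rej
    print(link_name("Customer", "Rej"))        # lnk_Customer_Rej
```
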

 

 

 

admin on May 2nd, 2012

Shared Containers:

Shared containers have the same naming constraints as jobs in that the name can be long but cannot contain underscores, so word capitalization must be used for readability. Shared containers might be placed anywhere in the repository tree and consideration must be given to a meaningful directory hierarchy. When a shared container is used, a character code is automatically added to that instance of its use throughout the project. It is optional as to whether you decide to change this code to something meaningful.

To differentiate between parallel shared containers and server shared containers, the following class word naming is recommended:

  • Psc = Parallel Shared Container
  • Ssc = Server Edition Shared Container

Note: Use of Server Shared Containers is discouraged in a parallel job.

Examples of Shared Container naming are as follows:

  • AuditTrailPsc (original naming as seen in the Category Directory)
  • AuditTrailPscC1 (an instance of use of the previously mentioned shared container)
  • AuditTrailPscC2 (another instance of use of the same shared container)

In the aforementioned examples, the characters C1 and C2 are automatically applied to the Shared Container stage by DataStage Designer when it is dragged onto the design canvas.
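
A naming check for the convention above might look like the following sketch. The regular expression and the function name parse_container_name are assumptions for illustration; in particular, the instance code is assumed to be the letter C followed by digits, which is all the examples show.

```python
import re

# Base name (no underscores), then the class word, then an optional
# instance code such as C1 (assumed form: "C" followed by digits).
CONTAINER_RE = re.compile(
    r"(?P<base>[A-Za-z][A-Za-z0-9]*(?P<kind>Psc|Ssc))(?P<instance>C\d+)?"
)


def parse_container_name(name):
    """Return (base, kind, instance) or None if the name breaks the convention."""
    match = CONTAINER_RE.fullmatch(name)
    if match is None:
        return None
    return match.group("base"), match.group("kind"), match.group("instance")


if __name__ == "__main__":
    for candidate in ("AuditTrailPsc", "AuditTrailPscC1", "Audit_TrailPsc"):
        print(candidate, "->", parse_container_name(candidate))
```
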

 

 

admin on May 2nd, 2012

DataStage Folder Hierarchy:

The DataStage repository is organized in a folder hierarchy, allowing related objects to be grouped together. Folder names can be long, are alphanumeric, and can contain both spaces and underscores. Therefore, directory names are word capitalized and separated by either an underscore or a space.

Information Server 8 maintains the restriction that there can only be a single object of a certain type with a given name.

Object Creation

In Information Server 8, object creation is simplified. To create a new object, right-click the target parent folder, select New, and then select the option for the desired object.

Categorization by Functional Module

For a given application or functional module, all objects can be grouped in a single top-level folder, with sub-levels for separate object types, as in Figure 3-6 on page 38. Job names must be unique in a DataStage project, not only in a folder.

Categorization by Developer

In development projects, folders might be created for each developer as their personal sandbox. That is the place where they perform unit test activities on jobs they are developing.

It is the responsibility of each developer to delete unused or obsolete code. The development manager, who is assigned the DataStage Manager role, must ensure that projects are not inflated with unused objects (such as jobs, sequences, folders, and table definitions).

Again, object names must be unique in a given project for the given object type. Two developers cannot save a copy of the same job with the same name in their individual sandbox categories. A unique job name must be given.
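
To illustrate the project-wide uniqueness rule, the sketch below scans a plain mapping of folder paths to job names and reports names that appear in more than one folder. It does not query DataStage; the folder paths and job names are hypothetical, and names are compared exactly as written, which is an assumption.

```python
from collections import defaultdict


def find_duplicate_names(folders):
    """Map each name that occurs in more than one folder to those folders.

    `folders` is a plain dict of folder path -> list of object names
    (hypothetical input, not read from a DataStage repository).
    """
    seen = defaultdict(list)
    for folder, names in folders.items():
        for name in names:
            seen[name].append(folder)
    return {name: where for name, where in seen.items() if len(where) > 1}


if __name__ == "__main__":
    sandboxes = {
        "Jobs/DeveloperA": ["LoadCustomerJob", "AuditTrailJob"],
        "Jobs/DeveloperB": ["LoadCustomerJob"],
    }
    print(find_duplicate_names(sandboxes))
```

Running a check like this on an exported object list before saving a copy would show that the second developer needs a different job name.
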

Table Definition Categories

A table definition can be stored directly underneath Table Definitions. Its data source type and data source name properties do not determine the names of parent subfolders.

When saving temporary TableDefs (usually created from output link definitions to assist with job creation), developers are prompted for the folder in the “Save Table Definition As” window. The user must pay attention to the folder location, as these objects are no longer stored in the Table Definition category by default.

Jobs and Job Sequences

Job names must begin with a letter and can contain letters, numbers, and underscores only. Because the name can be long, job and job sequence names must be descriptive and should use word capitalization to make them readable.

Jobs and job sequences are all held under the Category Directory Structure, of which the top level is the category Jobs.

A job is suffixed with the class word Job and a job sequence is suffixed with the class word Seq.

The following items are examples of job naming:

  • CodeBlockAggregationJob
  • CodeBlockProcessingSe
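
The job naming rules above (start with a letter; letters, numbers, and underscores only; suffix Job or Seq) can be checked with a short sketch such as the one below. The function name classify_job_name and the example name LoadCustomerSeq are hypothetical.

```python
import re

# Must begin with a letter; letters, digits, and underscores only.
JOB_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def classify_job_name(name):
    """Return "job" or "sequence" for a conforming name, else None."""
    if JOB_NAME_RE.fullmatch(name) is None:
        return None
    if name.endswith("Job"):
        return "job"
    if name.endswith("Seq"):
        return "sequence"
    return None  # valid characters, but missing the class-word suffix


if __name__ == "__main__":
    for candidate in ("CodeBlockAggregationJob", "LoadCustomerSeq", "9LivesJob"):
        print(candidate, "->", classify_job_name(candidate))
```
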

Jobs must be organized under category directories to provide grouping such that a directory should contain a sequence job and all the jobs that are contained in that sequence.

 

 

admin on May 2nd, 2012

Naming Conventions by Object Type:

In this section we describe the object type naming conventions.

Projects

Each DataStage Project is a standalone repository. It might have a one-to-one relationship with an organization's project of work. This factor can cause terminology issues, especially in teamwork where both business users and developers are involved.

The name of a DataStage project is limited to a maximum of 18 characters. The project name can contain alphanumeric characters and underscores.

Project names must be maintained in unison with source code control. As projects are promoted through source control, the project name should reflect the phase and the version, in the following form:

<Phase>_<ProjectName>_<version>

Project phases

Phase Name    Phase Description
Dev           Development
IT            Integration Test
UAT           User Acceptance Test
Prod          Production
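
The project naming form and its limits (18 characters at most; alphanumeric characters and underscores) can be combined in a small sketch like the following. The helper name project_name is hypothetical, and the version style used in the example ("v1") is an assumption, since the text does not prescribe a version format.

```python
import re

PHASES = ("Dev", "IT", "UAT", "Prod")  # phase names from the table above
MAX_PROJECT_NAME_LENGTH = 18


def project_name(phase, name, version):
    """Build <Phase>_<ProjectName>_<version> and enforce the stated limits."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    full = f"{phase}_{name}_{version}"
    if re.fullmatch(r"[A-Za-z0-9_]+", full) is None:
        raise ValueError("only alphanumeric characters and underscores are allowed")
    if len(full) > MAX_PROJECT_NAME_LENGTH:
        raise ValueError(f"{full!r} exceeds {MAX_PROJECT_NAME_LENGTH} characters")
    return full


if __name__ == "__main__":
    print(project_name("Dev", "Sales", "v1"))  # Dev_Sales_v1
```

Because the phase and version consume part of the 18 characters, the project name portion itself has to stay short.
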

admin on May 2nd, 2012

Documentation and Metadata Capture:

One of the major problems with any development effort, whatever tool you use, is maintaining documentation. Despite best intentions, and usually because of time constraints, documentation is often left until later or is inadequately implemented. Establishing a standard method of documentation with examples and enforcing this as part of the acceptance criteria is strongly recommended. The use of meaningful naming standards (as outlined in this section) complements these efforts.

DataStage provides the ability to document during development with the use of meaningful naming standards (as outlined in this section). Establishing standards also eases the use of external tools and processes such as InfoSphere Metadata Workbench, which can provide impact analysis, as well as documentation and auditing.

admin on April 27th, 2012

Designer Object Layout:

The effective use of naming conventions means that objects need to be spaced appropriately on the DataStage Designer canvas. For stages with multiple links, expanding the icon border can significantly improve readability. This approach takes extra effort at first, so a pattern of work needs to be identified and adopted to help development. The Snap to Grid feature of Designer can improve development speed.

When development is more or less complete, attention must be given to the layout to enhance readability before it is handed over to versioning.

Where possible, consider providing DataStage developers with higher resolution screens, as these give them more display area. This can make them more productive and makes their job designs easier to read.