DataStage Training

DataStage Jobs:


InfoSphere QualityStage

Investigate, cleanse and manage data quality

IBM® InfoSphere® QualityStage® is a foundational component for your data quality and information governance initiatives. It helps you create and maintain consistent views of key entities including customers, vendors, locations and products. This lets you investigate, cleanse and manage your data.

Use InfoSphere QualityStage to deliver quality data for your big data, business intelligence, data warehousing, application migration and master data management projects.

InfoSphere QualityStage features:

  • High quality data about core business entities – provides capabilities including data profiling, standardization, probabilistic matching and data enrichment
  • Data quality within a unified platform – delivers data quality functions as part of a complete information integration platform
  • Support for information governance – enables cross-organization data quality and capabilities necessary for your information governance policies

InfoSphere DataStage

IBM® InfoSphere® DataStage® integrates data across multiple systems using a high performance parallel framework, and it supports extended metadata management and enterprise connectivity. The scalable platform provides more flexible integration of all types of data, including big data at rest (Hadoop-based) or in motion (stream-based), on distributed and mainframe platforms.

InfoSphere DataStage provides these features and benefits:

Powerful, scalable ETL platform—supports the collection, integration and transformation of large volumes of data, with data structures ranging from simple to complex.

Support for big data and Hadoop—enables you to directly access big data on a distributed file system, and helps clients more efficiently leverage new data sources by providing JSON support and a new JDBC connector.

Near real-time data integration—as well as connectivity between data sources and applications.

Workload and business rules management—helps you optimize hardware utilization and prioritize mission-critical tasks.

Ease of use—helps improve speed, flexibility and effectiveness to build, deploy, update and manage your data integration infrastructure.

Rich support for DB2Z and DB2 for z/OS—including data load optimization for DB2Z and balanced optimization for DB2 on z/OS

IBM InfoSphere Information Server

Helps create and maintain trusted information to support strategic business initiatives including big data, point-of-impact analytics, business intelligence, data warehousing, master data management, and application consolidation and migration.

  • Enables collaboration to bridge the gap between business and IT
  • Helps align business and IT objectives
  • Provides metadata integration & data lineage insight
  • Always-on operational data integration & data quality
  • Linear scalability and infrastructure optimization
  • Broad connectivity to nearly all data sources
  • Productivity tools for organizational efficiency
  • Accelerated data integration deployments
  • Can run natively in a Hadoop cluster
admin on January 19th, 2016

Cloud Computing and IBM InfoSphere Information Server

Trusted, scalable and “Pay as you go” data integration solution running on Amazon EC2

Learn how InfoSphere software’s first Cloud offering expands the opportunities for System Integrator partners and clients wishing to leverage the new pay-as-you-go pricing model with the rapid deployment paradigm and the massive scalability offered by InfoSphere Information Server.

What we offer

IBM InfoSphere Information Server running on Amazon EC2 is a trusted, scalable and “Pay as you go” data integration solution that helps organizations derive more value from complex, heterogeneous information, wherever it resides. With its rapid deployment paradigm and massive scalability, InfoSphere Information Server on Amazon EC2 reduces the upfront time and expense involved in setting up hardware infrastructure and purchasing software licenses for data integration projects lasting 3-12 months, and maximizes the throughput of trusted information per hour, thereby saving time and money.

Whether you are looking to leverage a cloud platform with data integration for developing a proof-of-concept, on-demand usage or continuous off-premise deployment, the following scenarios can be quickly delivered:

  • Data consolidation services provided by System Integrators (SIs) to support complex application rationalization and migration projects lasting 3 to 12 months.
  • Flexible development capacity for existing enterprise IT clients using InfoSphere Information Server.
  • Ongoing data preparation for SaaS applications and Business Intelligence (BI) solutions deployed in the Cloud.

Features
The advantages of using IBM InfoSphere Information Server running on Amazon EC2 for delivery of trusted information include:

  • InfoSphere Information Server delivers an integrated ETL and Data Quality development environment – enabling developers to cleanse, transform and move data with the same tool using the same metadata, connectivity, and shared services.
  • InfoSphere DataStage is a powerful industry-leading data integration solution, built upon a common shared metadata repository with the ability to scale from discrete projects to supporting massive data delivery in batch and real-time.
  • InfoSphere QualityStage is a comprehensive data quality solution employing an industry leading probabilistic matching engine – that ensures higher quality trusted data when cleansing, standardizing, or linking any data domain across multiple, complex data sources.
  • InfoSphere Information Server has a dynamic parallel execution engine – providing design once, deploy anywhere capability that dynamically and seamlessly scales up-or-down based upon hardware configuration thereby simplifying deployment and administration.
  • Pre-installed InfoSphere Information Server version 8.1 software on Novell SUSE Linux Enterprise Server version 10 (SLES 10 SP2) for the server and Windows 2003 Server for the InfoSphere Developer client licenses – that ensures immediate usage and minimum learning curve for those with InfoSphere development experience.

Benefits

The benefits of leveraging IBM InfoSphere Information Server running on Amazon EC2 for delivery of trusted information include:

  • Flexibility and additional capacity – as new intelligence workloads expand, data integration development teams can be deployed or augmented at a moment’s notice using InfoSphere Information Server on Amazon EC2. As part of a data integration Center of Excellence (COE), any data integration asset is reusable seamlessly across projects or for subsequent enterprise or cloud-based projects.
  • Lower upfront project costs “pay for what you use” – reduce upfront investment and achieve demonstrable ROI before committing to the capital expenditure on hardware and perpetual software licenses for establishing enterprise systems.
  • Simplifies data integration and data quality challenges and maximizes throughput per hour – InfoSphere Information Server accelerates project design and the delivery of trusted data using a collaborative, model-driven design environment coupled with massive scalability of a parallel processing architecture to ensure maximum throughput of trusted data per hour.
  • Comprehensive range of Enterprise and Cloud connectivity options – in addition to Amazon’s tape/drive options for very large data transfers, InfoSphere DataStage supports the majority of Amazon-hosted data sources, such as Oracle, MySQL, DB2, flat files, XML, and web services.
admin on January 19th, 2016

Cloud Computing and IBM InfoSphere Information Server

Cloud computing is the next evolution in IT computing models following the time-tested centralized mainframe model, and the more recent client-server model. It has emerged to address the demand for reduced IT complexity, rapid deployment and consumption-based pricing models. By leveraging on-demand virtualized and dynamic IT infrastructure, cloud computing can help to transform multiple IT projects including accelerating core system renewal or outsourcing projects or business analytics and optimization investments. Whether an organization’s information-led IT transformation is on-premise, or in the cloud, or a mixture of the two, the lack of trusted information can interfere with an organization’s ability to act with speed, agility and success.

 

admin on June 4th, 2015

Data transformation and movement is the process by which source data is selected, converted, and mapped to the format required by targeted systems. The process manipulates data to bring it into compliance with business, domain, and integrity rules and with other data in the target environment. Transformation can take some of the following forms:

  • Aggregation – Consolidating or summarizing data values into a single value. Collecting daily sales data to be aggregated to the weekly level is a common example of aggregation.
  • Basic conversion – Ensuring that data types are correctly converted and mapped from source to target columns.
  • Cleansing – Resolving inconsistencies and fixing the anomalies in source data.
  • Derivation – Transforming data from multiple sources by using a complex business rule or algorithm.
  • Enrichment – Combining data from internal or external sources to provide additional meaning to the data.
  • Normalizing – Reducing the amount of redundant and potentially duplicated data.
  • Combining – The process of combining data from multiple sources via parallel Lookup, Join, or Merge operations.
  • Pivoting – Converting records in an input stream to many records in the appropriate table in the data warehouse or data mart.
  • Sorting – Grouping related records and sequencing data based on data or string values.
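
Two of the transformation forms above, aggregation and pivoting, can be illustrated with a small sketch in plain Python. This is not DataStage code and does not use any DataStage API; the function names, field names, and sample rows are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date


def aggregate_weekly(daily_sales):
    """Aggregation: sum (date, amount) rows into totals per (ISO year, ISO week)."""
    totals = defaultdict(float)
    for day, amount in daily_sales:
        iso_year, iso_week, _weekday = day.isocalendar()
        totals[(iso_year, iso_week)] += amount
    return dict(totals)


def pivot_columns_to_rows(record, key_field, value_fields):
    """Pivoting: turn one wide input record into one output record per value field."""
    return [
        {key_field: record[key_field], "measure": field, "value": record[field]}
        for field in value_fields
    ]


# Hypothetical sample data: three daily sales rows spanning two ISO weeks.
daily_sales = [
    (date(2015, 6, 1), 120.0),
    (date(2015, 6, 3), 80.0),
    (date(2015, 6, 8), 50.0),
]
weekly = aggregate_weekly(daily_sales)

# Hypothetical wide record with one column per quarter.
wide = {"store": "A", "q1": 10, "q2": 20}
tall = pivot_columns_to_rows(wide, "store", ["q1", "q2"])
```

Here `weekly` holds one total per week instead of one row per day, and `tall` holds two narrow records derived from the single wide input record.
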
admin on June 4th, 2015

In its simplest form, IBM InfoSphere DataStage performs data transformation and movement from source systems to target systems in batch and in real time. The data sources might include indexed files, sequential files, relational databases, archives, external data sources, enterprise applications, and message queues.

DataStage manages data that arrives in real time as well as data that is received on a periodic or scheduled basis. It enables companies to solve large-scale business problems with high-performance processing of massive data volumes. By leveraging the parallel processing capabilities of multiprocessor hardware platforms, DataStage can scale to satisfy the demands of ever-growing data volumes, stringent real-time requirements, and ever-shrinking batch windows.
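
The general idea of scaling by spreading records across processors can be sketched in Python as a partition, transform, and collect pattern. This is only a conceptual illustration, not how the DataStage engine is implemented; the function names, the `id` and `name` fields, and the choice of a CRC32-based hash are assumptions made for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition(records, key, n):
    """Split records into n partitions by a stable hash of the key column."""
    parts = [[] for _ in range(n)]
    for rec in records:
        idx = zlib.crc32(str(rec[key]).encode("utf-8")) % n
        parts[idx].append(rec)
    return parts


def transform(part):
    """Apply the same row-level transformation to every record in one partition."""
    return [{**rec, "name": rec["name"].strip().upper()} for rec in part]


def run_parallel(records, key, n=4):
    """Partition the input, transform each partition concurrently, then collect."""
    parts = partition(records, key, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(transform, parts))
    return [rec for part in results for rec in part]


# Hypothetical input: ten records with untidy names.
records = [{"id": i, "name": f" user{i} "} for i in range(10)]
output = run_parallel(records, "id", n=3)
```

Threads are used here only to keep the sketch short and portable; in CPython they do not give true CPU parallelism for pure-Python work, so a real workload would use separate processes or machines. The shape of the computation is the point: every partition runs the same logic independently, so adding partitions adds capacity.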

Leveraging the combined suite of IBM Information Server, DataStage can simplify the development of authoritative master data by showing where and how information is stored across source systems. DataStage can also consolidate disparate data into a single, reliable record, cleanse and standardize information, remove duplicates, and link records together across systems. This master record can be loaded into operational data stores, data warehouses, or master data applications such as IBM MDM using IBM InfoSphere DataStage.

IBM InfoSphere DataStage delivers four core capabilities:

  • Connectivity to a wide range of mainframe, legacy, and enterprise applications, databases, file formats, and external information sources.
  • Prebuilt library of more than 300 functions, including data validation rules and very complex transformations.
  • Maximum throughput using a parallel, high-performance processing architecture.
  • Enterprise-class capabilities for development, deployment, maintenance, and high-availability. It leverages metadata for analysis and maintenance. It also operates in batch, real time, or as a Web service.

The following sections briefly describe these aspects of IBM InfoSphere DataStage:

  • Data transformation
  • Jobs
  • Parallel processing

Ans. MOLAP tools use a pre-calculated data set, commonly referred to as a data cube, so MOLAP systems give fast responses. MOLAP systems are generally used for bounded problem sets. ROLAP tools are best suited to users who have “unbounded” problem sets.

ROLAP tools do not use pre-calculated data cubes. Instead, they query the standard relational database and its tables in order to answer the question, so ROLAP systems are comparatively slow.

Hybrid OLAP systems can use the capabilities of both pre-calculated cubes and relational data sources.
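
The difference between the two approaches can be sketched with Python's standard `sqlite3` module. This is a toy illustration, not a real OLAP product; the `sales` table, its columns, the sample rows, and the use of `"*"` to mean "all values" are assumptions made for the example.

```python
import sqlite3

# Hypothetical fact rows: (region, product, amount).
rows = [
    ("East", "Widget", 100),
    ("East", "Gadget", 40),
    ("West", "Widget", 70),
    ("East", "Widget", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def rolap_total(region, product):
    """ROLAP style: query the relational table each time a question is asked."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ? AND product = ?",
        (region, product),
    )
    return cur.fetchone()[0]


def build_cube(fact_rows):
    """MOLAP style: pre-calculate totals for every combination once, up front."""
    cube = {}
    for region, product, amount in fact_rows:
        # "*" stands for "all values" of that dimension.
        for key in ((region, product), (region, "*"), ("*", product), ("*", "*")):
            cube[key] = cube.get(key, 0) + amount
    return cube


cube = build_cube(rows)
east_widget_rolap = rolap_total("East", "Widget")
east_widget_molap = cube[("East", "Widget")]
```

Both paths give the same answer for a question the cube anticipates. The cube lookup is a single dictionary access, but it can only answer the combinations that were pre-calculated, whereas the SQL query scans the table and can be rewritten for any new question.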

Data in the data mart is refreshed according to the predetermined schedule established by the Project Manager. Within BusinessObjects, a document generated at a given point in time reflects the data as it existed at that time and may now be out of date. When you update a document, BusinessObjects reconnects to the database and retrieves the updated data. This is called refreshing a document.

BusinessObjects lets you update the data in a document while keeping the same presentation and formatting, either:

  1. manually
  2. automatically at specified times or intervals, or every time you open a document
  3. by sending the documents to Broadcast Agent, the BusinessObjects product that manages scheduled processing of documents

admin on June 4th, 2015

Ans: Security access levels are assigned to user profiles by the BusinessObjects Administrator using the Supervisor component of the BusinessObjects tool. Security access can be table or object specific. This allows the BusinessObjects Administrator to prohibit certain end users from viewing information of a sensitive or critical nature. Universes, and users’ rights to them, are checked against each user’s locally stored universe and security files at the beginning of the session.