Data Flow in the Data Warehouse (SAP Library

Data Flow in the Data Warehouse

The data flow in the Data Warehouse describes which objects are needed at design time and which at runtime to transfer data from a source to BI and to cleanse, consolidate, and integrate it so that it can be used for analysis, reporting, and possibly planning. There are numerous ways to design the data flow, so you can tailor it to the individual requirements of your company's processes. You can use any data sources that transfer data to BI or that allow direct access to the source data, apply simple or complex cleansing and consolidation methods, and define data repositories that correspond to the requirements of your layer architecture.

With SAP NetWeaver 7.0, the concepts and technologies for certain elements in the data flow were changed. The most important components of the new data flow are explained below, along with the changes compared to the previous data flow. To distinguish them from the new objects, the names of the previously used objects are suffixed with 3.x.

Data Flow in SAP NetWeaver 7.0

The following graphic shows the data flow in the Data Warehouse:

[Graphic: data flow in the Data Warehouse. The graphic is explained in the text that follows.]

In BI, the metadata description of the source data is modeled with DataSources. A DataSource is a set of fields that are used to extract data of a business unit from a source system and transfer it to the entry layer of the BI system or provide it for direct access.

There is a new object concept available for DataSources in BI. In BI, the DataSource is edited or created independently of 3.x objects on a unified user interface. When the DataSource is activated, the system creates a PSA table in the Persistent Staging Area (PSA), the entry layer of BI. In this way the DataSource represents a persistent object within the data flow.

Before data can be processed in BI, it has to be loaded into the PSA using an InfoPackage. In the InfoPackage, you specify the selection parameters for transferring data into the PSA. In the new data flow, InfoPackages are only used to load data into the PSA.
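As a rough illustration of the InfoPackage's role, the following conceptual Python sketch copies only those source records that match the selection parameters into a stand-in for the PSA table, without changing them. It is a model of the idea, not SAP code: `PsaTable`, `run_infopackage`, and the field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PsaTable:
    """Hypothetical stand-in for the PSA table of an activated DataSource."""
    requests: list = field(default_factory=list)


def run_infopackage(source_rows, selection, psa):
    """Copy rows that match every selection parameter into the PSA, unchanged."""
    request = [
        dict(row)
        for row in source_rows
        if all(row.get(key) == value for key, value in selection.items())
    ]
    psa.requests.append(request)  # each load is stored as its own request
    return len(request)


source_rows = [
    {"ORDER": "1001", "COUNTRY": "DE", "AMOUNT": "250.00"},
    {"ORDER": "1002", "COUNTRY": "US", "AMOUNT": "99.90"},
    {"ORDER": "1003", "COUNTRY": "DE", "AMOUNT": "12.50"},
]
psa = PsaTable()
loaded = run_infopackage(source_rows, {"COUNTRY": "DE"}, psa)
```

In this sketch the data in the PSA still has the source format; cleansing and mapping are left to later steps in the data flow.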

Using the transformation, data is copied from a source format to a target format in BI. The transformation process thus allows you to consolidate, cleanse, and integrate data. In the data flow, the transformation replaces the update and transfer rules, including transfer structure maintenance. In the transformation, the fields of a DataSource are also assigned to the InfoObjects of the BI system.
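The assignment of DataSource fields to InfoObjects, combined with cleansing, can be pictured with a small Python model. This is only a sketch under stated assumptions: the rule table `RULES`, the function `transform`, and the source and target names are illustrative and do not reflect how transformations are actually defined in BI.

```python
from decimal import Decimal

# Hypothetical rule set: target InfoObject -> (source field, cleansing function)
RULES = {
    "CUSTOMER": ("KUNNR", lambda v: v.strip().zfill(10)),
    "COUNTRY": ("LAND1", lambda v: v.strip().upper()),
    "AMOUNT": ("NETWR", lambda v: Decimal(v.replace(",", "."))),
}


def transform(record, rules=RULES):
    """Build one target record by applying each rule to its source field."""
    return {
        target: cleanse(record[source])
        for target, (source, cleanse) in rules.items()
    }


row = {"KUNNR": " 4711", "LAND1": "de ", "NETWR": "250,00"}
result = transform(row)
```

Keeping the field assignment and the cleansing logic in one rule per target field mirrors the idea that a single transformation replaces the former pair of transfer rules and update rules.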

InfoObjects are the smallest units of BI. They map information in a structured form, which is required for constructing InfoProviders.

InfoProviders are either persistent data repositories that are used in the layer architecture of the Data Warehouse or views on data. They can provide the data for analysis, reporting, and planning.

Using an InfoSource, which is optional in the new data flow, you can connect multiple sequential transformations. You therefore only require an InfoSource for complex transformations (multistep procedures).
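A multistep procedure of this kind can be sketched in Python as two transformations run one after the other, with the intermediate structure playing the role of the InfoSource. The functions and field names below are hypothetical and serve only to illustrate the sequencing.

```python
def to_infosource(record):
    """Step 1: harmonize source-specific fields into a common structure."""
    return {
        "MATERIAL": record["MATNR"].lstrip("0"),
        "QTY": int(record["MENGE"]),
    }


def to_target(record):
    """Step 2: apply logic that is shared by all sources."""
    quantity_class = "BULK" if record["QTY"] >= 100 else "SINGLE"
    return {**record, "QTY_CLASS": quantity_class}


def run_steps(record, steps):
    """Pass a record through each transformation in sequence."""
    for step in steps:
        record = step(record)
    return record


out = run_steps({"MATNR": "000042", "MENGE": "120"}, [to_infosource, to_target])
```

If a single step is enough to get from the source format to the target format, the intermediate structure adds nothing, which is why the InfoSource is optional.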

You use the data transfer process (DTP) to transfer data within BI from one persistent object to another, in accordance with certain transformations and filters. Possible sources for the data transfer include DataSources and InfoProviders; possible targets include InfoProviders and open hub destinations. For distributing data within BI and to downstream systems, the DTP replaces the InfoPackage, the Data Mart Interface (export DataSources), and the InfoSpoke.
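The interplay of source, filter, transformation, and target in a DTP can be modeled in a few lines of Python. This is a conceptual sketch only: `run_dtp`, the lists standing in for persistent objects, and the field names are assumptions made for illustration.

```python
def run_dtp(source, target, transform, keep=lambda row: True):
    """Move rows from one persistent object to another.

    Only rows accepted by the filter `keep` are transformed and written.
    """
    moved = 0
    for row in source:
        if keep(row):
            target.append(transform(row))
            moved += 1
    return moved


# Lists stand in for a PSA table (source) and an InfoProvider (target).
psa_rows = [
    {"COUNTRY": "DE", "AMOUNT": "10"},
    {"COUNTRY": "US", "AMOUNT": "20"},
    {"COUNTRY": "DE", "AMOUNT": "5"},
]
info_provider = []

count = run_dtp(
    psa_rows,
    info_provider,
    transform=lambda r: {**r, "AMOUNT": int(r["AMOUNT"])},
    keep=lambda r: r["COUNTRY"] == "DE",
)
```

Because the filter belongs to the transfer and not to the source, two transfers from the same source can supply different targets with different subsets of the data.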

You can also distribute data to other systems using an open hub destination.

In BI, process chains are used to schedule the processes associated with the data flow, including InfoPackages and data transfer processes.
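The ordering role of a process chain can be pictured as running named steps in sequence and stopping at the first failure. The Python sketch below is a simplified model with hypothetical names (`run_process_chain`, `load_psa`, `run_dtp`); real process chains offer far more than this linear case.

```python
def run_process_chain(steps):
    """Run (name, action) steps in order; stop at the first failing step."""
    log = []
    for name, action in steps:
        succeeded = action()
        log.append((name, "ok" if succeeded else "failed"))
        if not succeeded:
            break
    return log


psa, target = [], []


def load_psa():
    """Stand-in for an InfoPackage: load raw values into the PSA."""
    psa.extend([1, 2, 3])
    return True


def run_dtp():
    """Stand-in for a DTP: move data from the PSA to the target."""
    target.extend(value * 10 for value in psa)
    return bool(target)


log = run_process_chain([("InfoPackage", load_psa), ("DTP", run_dtp)])
```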

Uses and Advantages of the Data Flow with SAP NetWeaver 7.0

Use of the new DataSource permits real-time data acquisition as well as direct access to source systems of type File and DB Connect.

The data transfer process (DTP) makes the transfer processes in the data warehousing layers more transparent. You can improve the performance of the transfer processes by optimizing parallelization. With the DTP, delta processes can be separated for different targets, and filtering options can be used for the persistent objects on different levels. Error handling can also be defined for DataStore objects with the DTP. The ability to sort out incorrect records into an error stack and to write the data to a buffer after the processing steps of the DTP simplifies error handling. When you use a DTP, you can also directly access each DataSource in the SAP source system that supports the corresponding mode in the metadata (including master data and text DataSources).
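The idea of sorting incorrect records into an error stack, so that the remaining records can still be written to the target, can be sketched as follows. This is a conceptual Python model, not the DTP's actual error handling; the function names and record layout are hypothetical.

```python
def run_dtp_with_error_stack(rows, transform):
    """Write valid rows to the target and collect failing rows separately."""
    target, error_stack = [], []
    for row in rows:
        try:
            target.append(transform(row))
        except (KeyError, ValueError) as exc:
            # Keep the original row so it can be corrected and reprocessed.
            error_stack.append({"row": row, "error": str(exc)})
    return target, error_stack


def to_amount(row):
    """Convert the AMOUNT field to a number; raises ValueError if it is invalid."""
    return {"ORDER": row["ORDER"], "AMOUNT": float(row["AMOUNT"])}


rows = [
    {"ORDER": "1", "AMOUNT": "10.5"},
    {"ORDER": "2", "AMOUNT": "n/a"},
    {"ORDER": "3", "AMOUNT": "7"},
]
target, errors = run_dtp_with_error_stack(rows, to_amount)
```

In this sketch one bad record does not block the whole load: the valid records reach the target, and the incorrect one stays available for correction.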

The use of transformations simplifies the maintenance of rules for cleansing and consolidating data. Instead of two sets of rules (transfer rules and update rules), as in the past, only transformation rules are needed. You edit the transformation rules on an intuitive graphical user interface. InfoSources are no longer mandatory; they are optional and are only required for certain functions. Transformations also provide additional functions such as quantity conversion and the option to create an end routine or expert routine.

Constraints

Hierarchy DataSources, DataSources with the transfer method IDoc, and DataSources for BAPI source systems cannot be created in the new data flow, nor can they be migrated. However, 3.x DataSources can be displayed with the interfaces of the new DataSource concept and used in the new data flow to a limited extent. More information: Using Emulated 3.x DataSources.

Migration

More information about how to migrate an existing data flow with 3.x objects can be found under Migrating Existing Data Flows.