Data Storage and Data Flow

SAP NetWeaver BW offers a number of options for data storage. These include the implementation of a data warehouse or an operational data store, as well as the creation of the data stores used for analysis.

Architecture

A multi-layer architecture serves to integrate data from heterogeneous sources; to transform, consolidate, clean up, and store this data; and to stage it efficiently for analysis and interpretation in the BW system. There are two main layers: the enterprise data warehouse layer and the architected data mart layer.

The graphic below shows the steps involved in the SAP NetWeaver BW data warehousing concept:

Enterprise Data Warehouse Layer:

  • Data Acquisition Layer

    The data acquisition layer receives the data from the source and distributes it in the BW system. This layer makes it possible to fill all targets independently of one another and at different times.

  • Quality and Harmonization Layer

    In the quality and harmonization layer, the data is transformed, harmonized and then stored in DataStore objects.

  • Data Propagation Layer

    The data propagation layer supplies data to the applications. Because this should happen as quickly as possible, semantic partitioning is supported in this layer. The data is consolidated and stored in DataStore objects. The data propagation layer thus offers a consistent basis for distributing and reusing data.

  • Corporate Memory

    The corporate memory is filled regardless of how data is posted to the architected data marts. It contains a complete history of all loaded data. Among other things, it can serve as the source for reconstructions, without the need to access the sources again.

Architected Data Mart Layer:

  • Business Transformation Layer

    In the business transformation layer, the data is transformed according to business logic. In the previous layer, the data propagation layer, the data should not yet have been transformed according to business logic, so that it can still be reused. DataStore objects might be needed in this layer in order to bring together data from various DataStore objects in the data propagation layer.

  • Reporting Layer (Architected Data Marts)

    The reporting layer contains the objects that queries are run on for analysis. This layer is mainly modeled with InfoCubes, which can store their data in the BW Accelerator (BWA). To further improve performance, the InfoCubes can be semantically partitioned. Special InfoCubes also make it possible to create planning scenarios: on the basis of these InfoCubes, views of the data (aggregation levels) and methods for changing data (planning functions and planning sequences, for example) can be created. VirtualProviders allow you to access source data directly. Various composite objects (HybridProviders, InfoSets) provide you with benefits during analysis. Whether or not it is advisable to use these composite InfoProviders depends on the scenario.

  • Virtualization Layer

    To provide greater flexibility, queries should always be defined on a MultiProvider. These MultiProviders form the virtualization layer.

Operational Data Store:

An operational data store supports operational data analysis. The data is processed continually or at short intervals and read for operational analysis. The mostly uncompressed datasets in an operational data store are therefore very up-to-date, providing excellent support for operational analyses.

Data Store

When modeling the layers, various structures and objects are available for physical data storage; which ones you use depends on your requirements.

In the persistent staging area (PSA), the structure of the source data is represented by DataSources. The data of a business unit (for example, customer master data or item data of an order) for a DataSource is stored in a transparent, flat database table, the PSA table. Data storage in the persistent staging area is short- to medium-term. Since the PSA merely provides a backup of the data for the subsequent data stores, queries cannot be run at this level and the data cannot be archived.

Whereas a DataSource consists of a set of fields, the data stores in the data flow are defined by InfoObjects. The fields in the DataSource must be assigned to the InfoObjects using transformations in the SAP NetWeaver BW system. InfoObjects are thus the smallest metadata units in BW. They structure the information needed to create data stores. They are divided into key figures, characteristics and units.

  • Key figures provide the transaction data, i.e. the values to be analyzed. They can be quantities, amounts, or numbers of items, for example sales volumes or sales figures.
  • Characteristics are sorting keys, such as product, customer group, fiscal year, period, or region. They specify classification options for the dataset and are therefore reference objects for the key figures. Characteristics can contain master data in the form of attributes, texts or hierarchies. Master data is data that remains unchanged over a long period of time. The master data for a cost center, for example, contains the name (text), the person responsible (attribute), and the relevant hierarchy area (hierarchy).
  • Units such as currencies or units of measure define the context of the values of the key figures.
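The three InfoObject types above can be illustrated with a small sketch. This is not an SAP API; the class names are invented, and only the InfoObject names loosely follow BW naming conventions, as a minimal model of characteristics carrying master data (texts, attributes) and key figures referencing units:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: these classes are NOT an SAP API.
# InfoObject names ("0COSTCENTER", "0AMOUNT", "0CURRENCY") merely
# imitate BW naming conventions.

@dataclass
class Characteristic:
    name: str                                        # e.g. "0COSTCENTER"
    texts: dict = field(default_factory=dict)        # master data: texts
    attributes: dict = field(default_factory=dict)   # master data: attributes

@dataclass
class KeyFigure:
    name: str    # e.g. "0AMOUNT" (a value to be analyzed)
    unit: str    # reference to a unit InfoObject, e.g. "0CURRENCY"

# The cost-center example from the text: name (text) and
# person responsible (attribute) as master data.
cost_center = Characteristic(
    name="0COSTCENTER",
    texts={"4711": "Marketing"},
    attributes={"4711": {"responsible": "J. Doe"}},
)
revenue = KeyFigure(name="0AMOUNT", unit="0CURRENCY")

print(cost_center.texts["4711"])   # -> Marketing
```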

To ensure metadata consistency, you need to use identical InfoObjects to define the data stores in the different layers.

DataStore objects permit complete, granular (document-level) and historical data storage. As with DataSources, the data is stored in flat database tables. A DataStore object consists of a key (for example, document number, item) and a data area. The data area can contain both key figures (for example, order quantity) and characteristics (for example, order status). In addition to aggregating the data, you can also overwrite the data contents, for example to map the status changes of the order. This is particularly important with document-related structures.
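The two update behaviors just described, additive aggregation of a key figure versus overwriting a characteristic, can be sketched as follows. The keys and field names are invented for illustration; this is a conceptual model, not SAP code:

```python
# Conceptual sketch of DataStore object update modes (not SAP code):
# a key figure (quantity) is aggregated additively, while a
# characteristic in the data area (status) is overwritten.
store = {}  # key (doc_number, item) -> data record

def upsert(key, quantity, status):
    """Post one record: aggregate the key figure, overwrite the status."""
    if key in store:
        rec = store[key]
        rec["quantity"] += quantity   # key figure: additive
        rec["status"] = status        # characteristic: overwrite
    else:
        store[key] = {"quantity": quantity, "status": status}

upsert(("4711", 10), quantity=5, status="OPEN")
upsert(("4711", 10), quantity=2, status="DELIVERED")  # status change of the order

print(store[("4711", 10)])   # {'quantity': 7, 'status': 'DELIVERED'}
```

Overwriting is what makes it possible to track status changes at document level, which a purely additive store could not represent.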

Modeling of a multidimensional store is implemented using InfoCubes. An InfoCube is a set of relational tables compiled according to an enhanced star schema: a (large) fact table containing many rows holds the key figures of the InfoCube, and multiple (smaller) surrounding dimension tables contain the characteristics of the InfoCube. The characteristics represent the keys for the key figures.

Storage of the data in an InfoCube is additive. For queries on an InfoCube, the key figures are aggregated automatically (summation, minimum, or maximum) if necessary. The dimensions combine characteristics that logically belong together, such as a customer dimension consisting of the customer number, customer group and the steps of the customer hierarchy, or a product dimension consisting of the product number, product group and brand. The characteristics refer to the master data (texts or attributes of the characteristic). The facts are the key figures to be evaluated, such as revenue or sales volume.

The fact table and the dimensions are linked with one another using abstract identifying numbers (dimension IDs). As a result, the key figures of the InfoCube relate to the characteristics of the dimensions. This type of modeling is optimized for efficient data analysis. The graphic below shows the structure of an InfoCube:
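As a complement, the star-schema mechanics can be sketched in a few lines: a fact table keyed by abstract dimension IDs, a customer dimension table, and a query that aggregates the key figures by summation. All table contents are invented for illustration:

```python
# Minimal sketch of the enhanced star schema (contents invented):
# the fact table references dimension rows via abstract dimension IDs.
customer_dim = {   # DIM ID -> characteristics that logically belong together
    1: {"customer": "C100", "customer_group": "Retail"},
    2: {"customer": "C200", "customer_group": "Wholesale"},
}
fact_table = [     # each row: dimension IDs plus key figures (facts)
    {"customer_dimid": 1, "revenue": 100.0},
    {"customer_dimid": 1, "revenue": 50.0},
    {"customer_dimid": 2, "revenue": 70.0},
]

# A query by customer group joins via the dimension ID and
# aggregates the key figure automatically (here: summation).
result = {}
for row in fact_table:
    group = customer_dim[row["customer_dimid"]]["customer_group"]
    result[group] = result.get(group, 0.0) + row["revenue"]

print(result)   # {'Retail': 150.0, 'Wholesale': 70.0}
```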

You can create logical views (MultiProviders, InfoSets, HybridProviders, CompositeProviders) on the physical data stores in the form of InfoObjects, InfoCubes and DataStore objects, for example to provide data from different data stores for a common evaluation. The link is created across the common InfoObjects of the data stores.

The generic term for the physical data stores and the logical views on them is InfoProvider. The task of an InfoProvider is to provide optimized tools for data analysis, reporting and planning.

Data Flow

The data flow in the Enterprise Data Warehouse describes how the data is guided through the layers until it is finally available in the form required for the application. Data extraction and distribution can be controlled in this way and the origin of the data can be fully recorded. Data is transferred from one data store to the next using load processes. You use the InfoPackage to load the source data into the entry layer of SAP NetWeaver BW, the persistent staging area. The data transfer process (DTP) is used to load data from one physical data store within BW to the next, using the transformation rules described above. Fields/InfoObjects of the source store are assigned to InfoObjects of the target store during this process.
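The field-to-InfoObject assignment performed by a transformation can be sketched as a simple mapping. The source field names (KUNNR, NETWR) and target InfoObject names here are illustrative assumptions, not a definitive transformation definition:

```python
# Illustrative sketch of a transformation's field-to-InfoObject
# assignment. Field and InfoObject names are assumptions, chosen
# only to resemble typical SAP naming.
mapping = {
    "KUNNR": "0CUSTOMER",    # source field -> target InfoObject
    "NETWR": "0NET_VALUE",
}

def transform(source_row):
    """Apply the field-to-InfoObject assignment to one record."""
    return {infoobject: source_row[field]
            for field, infoobject in mapping.items()}

print(transform({"KUNNR": "C100", "NETWR": 99.5}))
# {'0CUSTOMER': 'C100', '0NET_VALUE': 99.5}
```

A real transformation can additionally convert, derive, or look up values per record; the sketch shows only the structural mapping step.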

You define a load process for a source/target combination, specifying the staging method described in the previous section. You can define various settings for the load process; some of them depend on the type of data and source as well as the data target. For example, you can define data selections in order to transfer relevant data only and to optimize the performance of the load process. Alternatively, you can specify whether the entire source dataset or only the new data since the last load should be loaded into the target. The latter means that data transfer processes automatically permit delta processing for each individual data target. The method of processing InfoPackages - how they are loaded into the SAP NetWeaver BW system - depends on the extraction program used.
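The difference between a full load and delta processing per data target can be sketched as follows. The request-based bookkeeping (a last-loaded pointer kept per source/target combination) is a simplification of what a data transfer process actually manages:

```python
# Simplified sketch of full vs. delta transfer between two data
# stores. A real data transfer process manages delta state per
# source/target combination; the pointer below is an illustration.
source = [
    {"request": 1, "doc": "4711", "qty": 5},
    {"request": 2, "doc": "4712", "qty": 3},
]
target = []
last_loaded_request = 0   # delta pointer for this source/target combination

def load(delta=True):
    """Transfer either only new requests (delta) or the entire source dataset."""
    global last_loaded_request
    if delta:
        new_rows = [r for r in source if r["request"] > last_loaded_request]
    else:
        new_rows = list(source)   # full load: entire source dataset
    target.extend(new_rows)
    if source:
        last_loaded_request = max(r["request"] for r in source)
    return len(new_rows)

print(load())   # first delta load transfers both requests -> 2
source.append({"request": 3, "doc": "4713", "qty": 1})
print(load())   # next delta load transfers only the new request -> 1
```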

The following figure shows a simple data flow using two InfoProviders: