Architecture of a Data Warehouse

Use

There are many different definitions of a data warehouse. However, they all favor a layer-based architecture.

Data warehousing has developed into an advanced and complex technology. For some time it was assumed that it was sufficient to store data in a star schema optimized for reporting. However, this does not adequately meet the need for consistency and flexibility in the long run. Therefore, data warehouses are now structured using a layer architecture. The different layers contain data at differing levels of granularity. We differentiate between the following layers:

  • Persistent staging area

  • Data warehouse

  • Architected data marts

  • Operational data store

Persistent Staging Area

After it is extracted from source systems, data is transferred to the entry layer of the data warehouse, the persistent staging area (PSA). In this layer, data is stored in the same form as in the source system. The transfer of data from here to the next layer incorporates quality assurance measures and the transformations and cleanup required for a uniform, integrated view of the data.
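The step from the PSA to the next layer can be illustrated with a minimal sketch. The record layout, the field names (customer, amount, posted) and the function to_warehouse are assumptions made for this example only; they are not taken from any particular product.

```python
from datetime import date

# Hypothetical raw records as they might sit in the PSA: stored exactly as
# the source system delivered them, including inconsistent formatting.
psa_rows = [
    {"customer": " acme ", "amount": "100.50", "posted": "2024-01-15"},
    {"customer": "ACME", "amount": "20", "posted": "2024-01-16"},
    {"customer": "Globex", "amount": "n/a", "posted": "2024-01-16"},
]


def to_warehouse(rows):
    """Clean and unify PSA rows; return (accepted, rejected)."""
    accepted, rejected = [], []
    for row in rows:
        try:
            accepted.append({
                # Transformation: one uniform spelling for customer names.
                "customer": row["customer"].strip().upper(),
                # Quality check: the amount must be numeric.
                "amount": float(row["amount"]),
                # Quality check: the posting date must be a valid ISO date.
                "posted": date.fromisoformat(row["posted"]),
            })
        except ValueError:
            # Rows that fail a check are set aside, not silently dropped.
            rejected.append(row)
    return accepted, rejected


warehouse_rows, rejected_rows = to_warehouse(psa_rows)
```

The PSA rows themselves are left unchanged, so a corrected transformation can be rerun against the original source data at any time.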

Data Warehouse

The result of the first transformations and cleanup is saved in the next layer, the data warehouse. This data warehouse layer offers integrated, granular, historical, stable data that has not yet been modified for a specific use and can therefore be seen as neutral. It acts as the basis for building consistent reporting structures and allows you to react flexibly to new requirements.

Architected Data Marts

The data warehouse layer supplies data to the analysis structures, most of which are multidimensional. These are also called architected data marts. This layer satisfies data analysis requirements. Data marts should not be equated with summarized or aggregated data; they too can contain highly granular structures, but these are focused on data analysis requirements alone, unlike the granular data in the data warehouse layer, which is application-neutral to ensure reusability.

The term "architected" refers to the fact that these data marts are not isolated applications but are based on a universally consistent data model. This means that master data can be reused in the form of Shared or Conformed Dimensions.
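As a rough illustration of a shared (conformed) dimension, the following sketch lets two data marts reference the same customer master data. The tables, keys, and the helper total_by_region are hypothetical and exist only for this example.

```python
# Shared (conformed) dimension: one customer master used by every data mart.
customer_dim = {
    1: {"name": "ACME", "region": "EMEA"},
    2: {"name": "Globex", "region": "APAC"},
}

# Two hypothetical data marts whose facts reference the same customer keys.
sales_facts = [
    {"customer_id": 1, "revenue": 100.0},
    {"customer_id": 2, "revenue": 50.0},
    {"customer_id": 1, "revenue": 25.0},
]
support_facts = [
    {"customer_id": 1, "tickets": 3},
    {"customer_id": 2, "tickets": 1},
]


def total_by_region(facts, measure):
    """Sum a measure per region, resolving the region via the shared dimension."""
    totals = {}
    for fact in facts:
        region = customer_dim[fact["customer_id"]]["region"]
        totals[region] = totals.get(region, 0) + fact[measure]
    return totals


revenue_by_region = total_by_region(sales_facts, "revenue")
tickets_by_region = total_by_region(support_facts, "tickets")
```

Because both marts resolve the region through the same dimension, their results can be compared region by region without any reconciliation of master data.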

Operational Data Store

As well as strategic data analysis, a data warehouse also supports operational data analysis by means of the operational data store. Data can be loaded into the operational data store continuously or at short intervals and read for operational analysis. You can also forward the data from the operational data store layer to the data warehouse layer at set times. This means that the data is stored at different levels of granularity: while the operational data store layer contains all the changes to the data, the data warehouse layer stores, for example, only the end-of-day status.
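The difference in granularity can be sketched as follows: the operational data store keeps every change, while only the last status per order and day is forwarded. The change records and the function end_of_day_status are assumptions made for this example.

```python
from datetime import datetime

# Hypothetical operational data store: every change to an order is kept.
ods_changes = [
    {"order": "A", "status": "created", "ts": datetime(2024, 1, 15, 9, 0)},
    {"order": "A", "status": "shipped", "ts": datetime(2024, 1, 15, 17, 30)},
    {"order": "B", "status": "created", "ts": datetime(2024, 1, 15, 11, 0)},
    {"order": "A", "status": "delivered", "ts": datetime(2024, 1, 16, 10, 0)},
]


def end_of_day_status(changes):
    """Return the last status per (order, day), keyed by that pair."""
    latest = {}
    # Process changes in time order so later changes overwrite earlier ones.
    for change in sorted(changes, key=lambda c: c["ts"]):
        latest[(change["order"], change["ts"].date())] = change["status"]
    return latest


# What would be forwarded to the data warehouse layer in this sketch.
snapshot = end_of_day_status(ods_changes)
```

In this sketch the intermediate status "created" of order A on the first day stays in the operational data store only; the forwarded snapshot records "shipped" for that day.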

The layer architecture of the data warehouse is largely conceptual. In reality, the boundaries between these layers are often fluid; an individual data store can play a role in two different layers. The technical implementation is always specific to the organization.