Clustering (SAP Library

Clustering

Use

Clustering allows you to save the sorted data records in the fact table of an InfoCube. Data records with the same dimension keys are saved in the same extents (related database storage unit). This prevents the same data records from being spread over a large memory area and thereby reduces the number of extents to be read upon accessing tables. This greatly increases read, write and delete access to the fact table.

Prerequisites

Currently, the function is only supported by the database platform IBM DB2 Universal Database for UNIX and Windows. You can use partitioning to improve the performance of other databases. For more information, see Partitioning.

Features

Two types of clustering are available: index clustering and multidimensional clustering (MDC).

Index Clustering

Index clustering organizes the data records of a fact table according to the sort sequence of an index. There is a linear organization according to the values of the index field.

If a data record cannot be inserted according to sorting because the relevant extent is already full, the data record is inserted into an empty extent at the end of the table. Sorting is not guaranteed and can degenerate during many insert and delete operations. Reorganizing the table restores the sorting and releases memory space that is no longer required.

The clustering index of an F fact table is, by default, the secondary index in the time dimension. The clustering index of an E fact table is, by default, the acting primary index (P index).

As of release SAP BW 2.0, index clustering is standard for all InfoCubes and aggregates.

Multidimensional Clustering (MDC)

Multidimensional clustering organizes the data records of a fact table, according to one or more fields of your choice. The selected fields are also indicated as MDC dimensions. Only data records with the same values in the MDC dimensions are saved in an extent. In the context of MDC, an extent is called a block. This always guarantees sorting. Reorganizing the table is not necessary during many insert and delete operations.

Block indexes from within the database, instead of the default secondary indexes, are created for the selected fields. Block indexes link to extents instead of data record numbers and are therefore much smaller. They save memory space and can be searched through more quickly. This particularly improves performance of table queries that are restricted to these fields.

You can select the key fields of the time dimension and all customer dimensions of an InfoCube as an MDC dimension. You cannot select the key field of the package dimension; it is automatically added to the MDC dimensions in the F fact table.

You can also select a time characteristic instead of the time dimension. In this case, the fact table has an extra field. This contains the SID values of the time characteristic. Only the time characteristics Calendar Month (0CALMONTH) and Fiscal Year/Period (0FISCPER) are currently supported. The time characteristic must be contained in the InfoCube. If you select the Fiscal Year/Period (0FISCPER) characteristic, the Fiscal Year Variant (0FISCVARNT) characteristic must be set to a constant.

The clustering is applied to all the aggregates of the InfoCube. If an aggregate does not contain an MDC dimension of the InfoCube, or if all the InfoObjects of an MDC dimension are created as line item dimensions in the aggregate, the clustering of aggregates is created with the remaining MDC dimensions. Index clustering is used for the aggregate if it does not contain any MDC dimensions of the InfoCube, or if it only contains MDC dimensions.

Multidimensional clustering was introduced in Release SAP NetWeaver 2004s and can be set up separately for each InfoCube.

For procedures, see Definition of Clustering.