Clustering

Use

Clustering allows you to save sorted data records in the fact table of an InfoCube. Data records with the same dimension keys are saved in the same extents (related database storage units). This means that data records with the same keys are not spread across a large storage area, which reduces the number of extents that the system has to read when it accesses the table. This greatly accelerates read, write and delete access to the fact table.

Prerequisites

Currently, clustering is supported only on the database platform IBM DB2 for Linux, UNIX, and Windows. On other databases, you can use partitioning to improve performance. For more information, see Partitioning.

Features

Two types of clustering are available: index clustering and multidimensional clustering (MDC).

Index Clustering

Index clustering organizes the data records of a fact table according to the sort sequence of an index. Organization is linear and corresponds to the values of the index field.

If a data record cannot be inserted in accordance with the sort sequence because the relevant extent is already full, the data record is inserted into an empty extent at the end of the table. For this reason, the system cannot guarantee that the sort sequence is always correct, particularly if you perform many insert and delete operations. Reorganizing the table restores the sort sequence and frees up storage space that is no longer required.
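The overflow behavior described above can be illustrated with a small simulation. This is a minimal sketch, not the database's actual algorithm: the extent capacity, the integer keys, and the function names insert_record, is_sorted and reorganize are assumptions made for illustration only.

```python
from bisect import insort

EXTENT_CAPACITY = 4  # hypothetical number of records per extent


def insert_record(extents, key):
    """Insert key into the extent that covers its sort position if it has room.

    If that extent is full, the record goes into a new extent at the end.
    """
    for extent in extents:
        if extent[-1] >= key:  # first extent whose largest key covers the new key
            if len(extent) < EXTENT_CAPACITY:
                insort(extent, key)  # keeps the extent sorted internally
                return
            break  # the relevant extent is full
    else:
        # The key is larger than every stored key: use the last extent if it has room.
        if extents and len(extents[-1]) < EXTENT_CAPACITY:
            extents[-1].append(key)
            return
    extents.append([key])  # new extent at the end of the table


def is_sorted(extents):
    """Check whether reading the extents in order yields the index sort sequence."""
    flat = [key for extent in extents for key in extent]
    return flat == sorted(flat)


def reorganize(extents):
    """Rebuild the table so that all records are stored in sort sequence again."""
    flat = sorted(key for extent in extents for key in extent)
    return [flat[i:i + EXTENT_CAPACITY] for i in range(0, len(flat), EXTENT_CAPACITY)]


table = [[1, 2, 3, 4], [10, 11, 12, 13]]
insert_record(table, 5)           # the extent covering key 5 is full
print(table, is_sorted(table))    # key 5 now sits at the end, out of sequence
table = reorganize(table)
print(table, is_sorted(table))    # sort sequence restored
```

In this model, a single insert into a full extent is enough to break the sort sequence, which is why periodic reorganization is needed with index clustering.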

The clustering index of an F fact table is, by default, the secondary index on the time dimension. The clustering index of an E fact table is, by default, the acting primary index (P index).

As of release SAP NetWeaver BW 2.0, index clustering is standard for all InfoCubes and aggregates.

Multidimensional Clustering (MDC)

Multidimensional clustering organizes the data records of a fact table in accordance with one or more fields that you define freely. The selected fields are referred to as MDC dimensions. An extent contains only data records that have the same values in the MDC dimensions. In the context of MDC, an extent is called a block. The system can always guarantee that the sort sequence is correct. Reorganizing the table is not necessary, even with many insert and delete operations.
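The block organization can be pictured with a toy model in which every combination of MDC dimension values owns its own list of blocks. This is only a sketch under stated assumptions: the class MdcTable, the block capacity, and the field names calmonth_sid and region_dim are hypothetical and do not correspond to real database or BW objects.

```python
from collections import defaultdict

BLOCK_CAPACITY = 2  # hypothetical number of records per block


class MdcTable:
    """Toy fact table in which each block holds records of exactly one cell."""

    def __init__(self, mdc_dimensions):
        self.mdc_dimensions = tuple(mdc_dimensions)
        self.blocks = defaultdict(list)  # cell key -> list of blocks

    def cell_key(self, record):
        """Return the combination of MDC dimension values of a record."""
        return tuple(record[dim] for dim in self.mdc_dimensions)

    def insert(self, record):
        """Store the record in a block of its cell, allocating a new block if needed."""
        blocks = self.blocks[self.cell_key(record)]
        if not blocks or len(blocks[-1]) >= BLOCK_CAPACITY:
            blocks.append([])  # new block reserved for this cell only
        blocks[-1].append(record)

    def blocks_are_homogeneous(self):
        """Check that no block mixes records with different MDC dimension values."""
        return all(
            self.cell_key(record) == key
            for key, blocks in self.blocks.items()
            for block in blocks
            for record in block
        )


table = MdcTable(["calmonth_sid", "region_dim"])  # hypothetical field names
for amount in (10, 20, 30):
    table.insert({"calmonth_sid": 202401, "region_dim": 1, "amount": amount})
table.insert({"calmonth_sid": 202402, "region_dim": 1, "amount": 40})
print(len(table.blocks[(202401, 1)]), table.blocks_are_homogeneous())
```

Because a record can only ever land in a block of its own cell, the clustering cannot degrade through inserts, which matches the statement that no reorganization is required.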

For the selected fields, the database creates its own block indexes instead of the default secondary indexes. Block indexes point to extents rather than to individual data records and are therefore much smaller. They save storage space, and the system can search through them more quickly. This accelerates table accesses that are restricted to these fields.
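The size difference can be made concrete by counting index entries. The following sketch assumes a simplified model in which a record-level index holds one entry per data record and a block index holds one entry per block; the record counts, the block capacity and the function name count_index_entries are illustrative assumptions, not measured values.

```python
import math


def count_index_entries(records_per_value, block_capacity):
    """Compare entry counts of a record-level index and a block index.

    records_per_value maps each value of the indexed field to the number of
    data records with that value.
    """
    record_entries = sum(records_per_value.values())
    block_entries = sum(
        math.ceil(count / block_capacity) for count in records_per_value.values()
    )
    return record_entries, block_entries


# Hypothetical distribution: records per calendar month, 100 records per block.
record_entries, block_entries = count_index_entries({202401: 1000, 202402: 250}, 100)
print(record_entries, block_entries)  # 1250 13
```

In this model the block index shrinks roughly by the factor of the block capacity, as long as most blocks are well filled.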

You can select the key fields of the time dimension or any customer-defined dimensions of an InfoCube as an MDC dimension. You cannot select the key field of the package dimension; it is automatically added to the MDC dimensions in the F fact table.

You can also select a time characteristic instead of the time dimension. In this case, the fact table has an extra field. This contains the SID values of the time characteristic. Currently only the time characteristics Calendar Month (0CALMONTH) and Fiscal Year/Period (0FISCPER) are supported. The time characteristic must be contained in the InfoCube. If you select the Fiscal Year/Period (0FISCPER) characteristic, a constant must be set for the Fiscal Year Variant (0FISCVARNT) characteristic.
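The restrictions on selecting a time characteristic can be summarized as a simple check. This is a sketch of the rules stated above and not an SAP API: the function check_mdc_time_characteristic and its parameters are hypothetical, and the constant value K4 in the usage example is only a placeholder.

```python
SUPPORTED_TIME_CHARACTERISTICS = {"0CALMONTH", "0FISCPER"}


def check_mdc_time_characteristic(characteristic, infocube_characteristics, constants):
    """Return a list of rule violations; an empty list means the selection is valid.

    infocube_characteristics: set of characteristics contained in the InfoCube.
    constants: dict mapping a characteristic to the constant set for it.
    """
    problems = []
    if characteristic not in SUPPORTED_TIME_CHARACTERISTICS:
        problems.append(f"{characteristic} is not supported as an MDC dimension")
    if characteristic not in infocube_characteristics:
        problems.append(f"{characteristic} is not contained in the InfoCube")
    if characteristic == "0FISCPER" and "0FISCVARNT" not in constants:
        problems.append("0FISCPER requires a constant for 0FISCVARNT")
    return problems


cube = {"0CALMONTH", "0FISCPER", "0FISCVARNT"}
print(check_mdc_time_characteristic("0CALMONTH", cube, {}))
print(check_mdc_time_characteristic("0FISCPER", cube, {}))
print(check_mdc_time_characteristic("0FISCPER", cube, {"0FISCVARNT": "K4"}))
```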

Clustering is applied to all the aggregates of the InfoCube. If an aggregate does not contain an MDC dimension of the InfoCube, or if all the InfoObjects of an MDC dimension are created as line item dimensions in the aggregate, the aggregate is clustered using the remaining MDC dimensions. Index clustering is used for the aggregate if the aggregate does not contain any MDC dimensions of the InfoCube, or if it only contains MDC dimensions.

Multidimensional clustering was introduced in Release SAP NetWeaver 7.0 and can be set up separately for each InfoCube.

For procedures, see Definition of Clustering.