
Checking Data

Use

A complete, error-free data basis is decisive for the results of an analysis process. With the APD, you build up this data basis step by step and can check each individual processing step. For each step in the analysis process, the APD lets you display the data and calculate intermediate results; for some nodes, you can also analyze the quality of the data. You can check the data in this way even before you execute the analysis process.

Functions

Display data

Using the Display Data function in the context menu for a node, you can display the data contained in a data source in a table. If an intermediate result was already calculated, it is displayed.

Basic statistics

Using the Display Basic Statistics function in the context menu for a node, you can display statistics for the selected fields. These include histograms, distribution and frequency calculations, and simple statistical key figures such as arithmetic means, standard deviations, and correlations.

The information displayed depends on the value type of the field, which is either discrete (DST) or continuous (CNT):

·        Discrete means that the field has a countable number of values. This applies to almost all characteristics with a check table. For characteristics with a large number of values, such as Business Partner, a report on each single value does not make much sense.

Basic statistics for discrete fields: A frequency table of the most frequent values is displayed.

·        Continuous means that the field has an undefined number of values. A typical example is the key figure revenue.

Basic statistics for continuous fields: A frequency table of the most frequent values is displayed. A value distribution in intervals, the average value, the standard deviation (based on the population) and additional figures are displayed. You can see how these figures are calculated in Formulas for the Calculation of Statistics.
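The two kinds of basic statistics can be illustrated with a short Python sketch. This is not the APD implementation: the function names, the number of intervals, and the equal-width interval scheme are assumptions made for illustration only. The one detail taken from the text above is that the standard deviation is based on the population, so the sum of squared deviations is divided by n rather than n - 1.

```python
import math
from collections import Counter


def discrete_stats(values, top=5):
    """Frequency table of the most frequent values (hypothetical helper)."""
    return Counter(values).most_common(top)


def continuous_stats(values, intervals=4):
    """Mean, population standard deviation, and counts per interval.

    Hypothetical helper: equal-width intervals are an assumption,
    not necessarily how the APD forms its intervals.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation: divide by n, not n - 1.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 if all values are identical.
    width = (hi - lo) / intervals or 1.0
    counts = [0] * intervals
    for v in values:
        # The maximum value is placed in the last interval.
        idx = min(int((v - lo) / width), intervals - 1)
        counts[idx] += 1
    return {"mean": mean, "std": std, "interval_counts": counts}


colors = ["red", "blue", "red", "green", "red", "blue"]
revenue = [2, 4, 4, 4, 5, 5, 7, 9]
print(discrete_stats(colors, top=2))
print(continuous_stats(revenue))
```

For the sample revenue values, the mean is 5 and the population standard deviation is 2; a sample-based standard deviation (dividing by n - 1) would give a slightly larger figure.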

You can select the value type for each selected field. However, the system always suggests a reasonable value type: continuous for numeric fields and discrete for non-numeric fields.

Example

The Color field, with values such as red and blue, receives the value type discrete as the proposal. The Environment field, with values between 0 and 1000, receives the value type continuous as the proposal. If Gender is coded as an integer (1 for male, 2 for female, 0 for unknown), you should change the suggested value type from continuous to discrete, because calculating the average value does not make sense. If you have chosen continuous for a non-numeric field, the system automatically changes the value type to discrete during execution.
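The proposal rule and the automatic fallback described in this example can be sketched as follows. The function names are hypothetical and the numeric check is a simplification; only the rule itself (continuous for numeric fields, discrete for non-numeric fields, and a forced switch to discrete when continuous was chosen for a non-numeric field) comes from the text.

```python
DISCRETE, CONTINUOUS = "DST", "CNT"


def propose_value_type(values):
    """Propose CNT for numeric fields and DST for all others (hypothetical helper)."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return CONTINUOUS if numeric else DISCRETE


def effective_value_type(values, chosen):
    """Value type actually used at execution time (hypothetical helper).

    A non-numeric field cannot be treated as continuous,
    so that choice is changed to discrete.
    """
    if chosen == CONTINUOUS and propose_value_type(values) == DISCRETE:
        return DISCRETE
    return chosen


color = ["red", "blue", "red"]
gender = [1, 2, 0, 1]  # integer codes, so the proposal is continuous

print(propose_value_type(color))               # proposal for a text field
print(propose_value_type(gender))              # proposal for an integer-coded field
print(effective_value_type(gender, DISCRETE))  # manual override is kept
print(effective_value_type(color, CONTINUOUS)) # changed to discrete
```

The Gender case shows why the proposal is only a starting point: the codes are numeric, so the rule suggests continuous, and the change to discrete has to be made manually.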

Intermediate result

Using the Calculate Intermediate Result function in the context menu for a node, you can calculate the data up to this node. The result is saved in a temporary database table. This is helpful, for example, if you want to try out different options after this node while modeling the analysis process. Intermediate results also help to optimize performance when you execute the analysis process with large amounts of data. If an intermediate result is available for a node, this is indicated by an icon. The intermediate result becomes invalid and is no longer displayed if the node is changed. You can also delete the intermediate result yourself if it is no longer current.
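The life cycle of an intermediate result (calculated on request, reused while the node is unchanged, discarded when the node changes) resembles a simple cache with invalidation. The following Python sketch models that behavior only; the Node class and its methods are hypothetical and do not correspond to any APD interface, and the result is held in memory here rather than in a temporary database table.

```python
class Node:
    """Hypothetical model of a node that can hold an intermediate result."""

    def __init__(self, name, compute):
        self.name = name
        self._compute = compute
        self._intermediate = None

    def has_intermediate_result(self):
        # Corresponds to the icon shown on the node.
        return self._intermediate is not None

    def calculate_intermediate_result(self, rows):
        # Calculate the data up to this node and keep the result.
        self._intermediate = self._compute(rows)
        return self._intermediate

    def change(self, compute):
        # Changing the node invalidates the stored result.
        self._compute = compute
        self._intermediate = None


node = Node("filter", lambda rows: [r for r in rows if r > 10])
print(node.calculate_intermediate_result([5, 20, 30]))
print(node.has_intermediate_result())

node.change(lambda rows: rows)
print(node.has_intermediate_result())
```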

Calculation summary

After you have executed the analysis process, you can display additional information about the calculation of the data using the Calculate Calculation Summary function in the context menu for a node. This information is available only for data mining methods. Depending on the type of transformation, it consists of statistical data, probability information, or similar, and helps you to evaluate the quality of the data.

Notes

In order to display data and statistics, you need authorization for the simulate activity (48) in authorization object RSANPR.

The Display Data and Display Basic Statistics functions perform the complete calculation up to the specified node with all data. This can lead to a short dump because the maximum permitted runtime for the dialog process is exceeded. In this case, create an intermediate result in the background for the selected node, and start the simulation again once the intermediate result has been calculated.

For larger volumes of data, a short dump can also occur due to memory overflow. In this case, choose Goto → Performance Settings and deselect the Process Data in Memory indicator. This indicator specifies whether the data is held in main memory during the analysis process or stored temporarily in the database. The indicator is set by default, which means that the data is processed in main memory. This setting is ideal for small amounts of data. With larger volumes of data, the program can terminate when the data no longer fits in main memory. If you deselect the indicator, the data is stored in temporary database tables during the analysis process, which reduces the main memory requirement. The names of the generated tables begin with /BIC/000AP.

Tips for processing large amounts of data:

·        Optimize performance by inserting a filter directly after the data source.

·        Test with mass data: For testing, insert a filter that selects a subset of the data. With this restricted volume of data, choose Display Data or Display Basic Statistics. Before you execute the analysis process, delete the conditions in the filter.

·        If the calculation takes too long, you can end the simulation by choosing Cancel Transaction in the Windows system menu of the new window.

 
