Special Settings for Regression Analysis (SAP Library)

Linear Regression

With the function type Linear Regression, the system trains the scoring function using data with known target values. You must set the value type of the target field to continuous, and at least one of the other model fields must also be continuous. The system defines a separate linear function for each combination of discrete model field values that occurs in the training data. For example, if the model contains, in addition to the continuous fields, the discrete fields "Gender" and "Region", which take the values "m"/"f" and "North"/"Center"/"South" respectively in the training data, then the system defines a separate linear function for each of the combinations (m, North), (m, Center), (m, South), (f, North), (f, Center), and (f, South) for which training data exists.

To exclude combinations with too few data records, use the model parameter Minimum Number of Records. If you set this parameter to 100, for example, and there are 200 training records with (m, North) and 50 with (m, Center), then a linear regression is performed for (m, North) but not for (m, Center), since the latter falls below the minimum number. All data records with (m, Center) therefore fall outside the domain of the trained function. If you select the indicator Skip input outside of trained domain, no score value is calculated for such data records. If you do not select this indicator, the default score value is assigned to them.
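The per-combination training and the handling of records outside the trained domain can be illustrated with a small sketch. It assumes a single continuous input field for simplicity, and the names `train_per_combination`, `score_record`, `min_records` (standing in for Minimum Number of Records), and `skip_outside_domain` (standing in for Skip input outside of trained domain) are hypothetical; this is not how SAP implements the function internally.

```python
from collections import defaultdict

# Hypothetical record layout: (discrete_key, x, y), where discrete_key is the
# combination of discrete field values, e.g. ("m", "North"), and x is a single
# continuous input field (a simplification of the multi-field case).

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # constant x: fall back to a flat line
    return slope, mean_y - slope * mean_x

def train_per_combination(records, min_records):
    """Fit one line per discrete combination with at least min_records rows."""
    groups = defaultdict(list)
    for key, x, y in records:
        groups[key].append((x, y))
    model = {}
    for key, points in groups.items():
        if len(points) < min_records:
            continue  # combination stays outside the trained domain
        xs, ys = zip(*points)
        model[key] = fit_line(xs, ys)
    return model

def score_record(model, key, x, skip_outside_domain, default_score):
    """Return a score, None (record skipped), or the default score."""
    if key not in model:
        return None if skip_outside_domain else default_score
    slope, intercept = model[key]
    return intercept + slope * x
```

With 200 (m, North) records and 50 (m, Center) records and `min_records=100`, only (m, North) receives a fitted line; scoring an (m, Center) record returns `None` or the default score, depending on the indicator.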

For discrete fields, the model field parameters let you specify whether the system considers all values, only specific values, or only the most frequent values. For continuous fields, you can either enter both limits of the value range explicitly or have them determined automatically by choosing the option Complete Data Range. With the automatic option, the limits are obtained by rounding the minimum and maximum values of the field in the training data. When the function is later applied to other data, values outside this range are treated as outliers.
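The following sketch shows one way these field settings could behave. The documentation does not say how the limits are rounded, so this assumes the minimum is rounded down and the maximum rounded up to a multiple of a hypothetical `step`, which guarantees that all training values lie inside the range. The mode names `"all"`, `"specific"`, and `"most_frequent"` and all function names are likewise hypothetical.

```python
import math
from collections import Counter

def considered_values(values, mode, specific=None, top_n=None):
    """Select the discrete values a model field takes into account."""
    if mode == "all":
        return set(values)
    if mode == "specific":
        return set(specific or ())
    if mode == "most_frequent":
        # Counter.most_common returns (value, count) pairs, most frequent first
        return {v for v, _ in Counter(values).most_common(top_n)}
    raise ValueError(f"unknown mode: {mode}")

def complete_data_range(values, step=1.0):
    """Assumed rounding: min rounded down, max rounded up to a multiple of step."""
    lo = math.floor(min(values) / step) * step
    hi = math.ceil(max(values) / step) * step
    return lo, hi

def is_outlier(x, lo, hi):
    """Values outside the trained range are treated as outliers."""
    return x < lo or x > hi
```

For training values 3.2, 7.8, and 5.0, this yields the range (3.0, 8.0), so a later input of 9.1 would count as an outlier.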

Nonlinear Regression

With the function type Nonlinear Regression (using multilinear splines), the system defines a separate multilinear spline function for each combination of discrete model field values occurring in the training data.
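To make "multilinear spline" concrete, the sketch below evaluates a bilinear spline, the two-field case, from values stored at the nodes of an interval grid. It assumes a tensor-product grid over two continuous fields; within each grid cell the function is linear in each field separately. In the model, one such grid would exist per discrete value combination. The function names are hypothetical and not part of any SAP API.

```python
from bisect import bisect_right

def locate(knots, v):
    """Index i of the interval [knots[i], knots[i+1]] containing v (clamped)."""
    i = bisect_right(knots, v) - 1
    return min(max(i, 0), len(knots) - 2)

def bilinear(x_knots, y_knots, values, x, y):
    """Evaluate a bilinear spline; values[i][j] is the value at (x_knots[i], y_knots[j])."""
    i = locate(x_knots, x)
    j = locate(y_knots, y)
    # Relative position of (x, y) inside its grid cell, each in [0, 1]
    tx = (x - x_knots[i]) / (x_knots[i + 1] - x_knots[i])
    ty = (y - y_knots[j]) / (y_knots[j + 1] - y_knots[j])
    return ((1 - tx) * (1 - ty) * values[i][j]
            + tx * (1 - ty) * values[i + 1][j]
            + (1 - tx) * ty * values[i][j + 1]
            + tx * ty * values[i + 1][j + 1])
```

Because each cell is bilinear, a function such as x * y sampled at the nodes is reproduced exactly inside every cell.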

As with linear regression, you must set the value type of the target field and of at least one other model field to continuous. To prevent the function from overfitting areas of the training data with a low data density, you can use the model parameter Smoothing Factor. The greater the smoothing factor, the more the function smooths out areas with a low data density.
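SAP does not document how Smoothing Factor enters the fit. As a stand-in, this sketch uses a common technique: a penalized least-squares fit of a one-dimensional linear spline, where a penalty on the second differences of the node values is weighted by a `smoothing` parameter. Where data are dense, the data term dominates; where they are sparse, the penalty dominates and the curve straightens, which mirrors the behavior described above. All names are hypothetical, and with `smoothing=0` every interval must contain data or the system becomes singular.

```python
from bisect import bisect_right

def hat_row(knots, x):
    """Values of the piecewise-linear (hat) basis functions at x."""
    row = [0.0] * len(knots)
    i = min(max(bisect_right(knots, x) - 1, 0), len(knots) - 2)
    t = (x - knots[i]) / (knots[i + 1] - knots[i])
    row[i], row[i + 1] = 1.0 - t, t
    return row

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_smoothed_spline(knots, xs, ys, smoothing):
    """Node values c minimizing sum (y - B c)^2 + smoothing * sum (c[j-1] - 2c[j] + c[j+1])^2."""
    k = len(knots)
    a = [[0.0] * k for _ in range(k)]  # accumulates B^T B + smoothing * D^T D
    b = [0.0] * k                      # accumulates B^T y
    for x, y in zip(xs, ys):
        row = hat_row(knots, x)
        for p in range(k):
            b[p] += row[p] * y
            for q in range(k):
                a[p][q] += row[p] * row[q]
    for j in range(1, k - 1):          # one second-difference row per interior node
        d = {j - 1: 1.0, j: -2.0, j + 1: 1.0}
        for p, dp in d.items():
            for q, dq in d.items():
                a[p][q] += smoothing * dp * dq
    return solve(a, b)
```

For V-shaped data y = |x - 5| on knots 0, 5, 10, `smoothing=0` recovers the node values 5, 0, 5 exactly, while a large smoothing value pulls the three node values toward a straight line.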

As with linear regression, you can specify for discrete model fields whether the system considers all values, only specific values, or only the most frequent values. For continuous model fields, you must split the value range into intervals. You can have the two outer interval limits determined automatically or enter them explicitly, and then specify the desired number of equal-sized intervals within those limits. Alternatively, you can define the interval boundaries within the outer limits explicitly.
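The two ways of defining intervals can be sketched as follows. The function names are hypothetical; the sketch only shows how interval boundaries follow from the outer limits and either a number of equal-sized intervals or an explicit list of inner boundaries.

```python
def equal_intervals(lo, hi, n):
    """Boundaries of n equal-sized intervals between the outer limits lo and hi."""
    if n < 1 or hi <= lo:
        raise ValueError("need n >= 1 and lo < hi")
    width = (hi - lo) / n
    # Append hi directly so the last boundary is exact despite rounding
    return [lo + i * width for i in range(n)] + [hi]

def explicit_intervals(lo, hi, inner):
    """Boundaries from explicitly chosen inner points, checked against the outer limits."""
    bounds = [lo] + sorted(inner) + [hi]
    if not all(a < b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("inner boundaries must be distinct and lie strictly between lo and hi")
    return bounds
```

For example, `equal_intervals(0, 10, 4)` gives the boundaries 0, 2.5, 5, 7.5, 10, while `explicit_intervals(0, 10, [7, 2])` gives 0, 2, 7, 10.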

The greater the number of intervals, the more closely the function can adapt to nonlinear data. At the same time, more intervals mean more processing effort. Each additional model field also increases the computational complexity much more sharply than with linear regression. For this reason, nonlinear regression is more restricted in the number of model fields it can practically handle.
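A rough count of coefficients shows why the growth is so much steeper. This sketch assumes that a linear function has one coefficient per continuous field plus an intercept, and that a multilinear spline stores one value per node of a tensor-product grid, that is, the product of (intervals + 1) over the continuous fields. SAP does not document its internal parameterization, so these counts are illustrative only, and both apply per discrete value combination.

```python
from math import prod

def linear_coefficients(n_continuous):
    """Slope per continuous field plus one intercept."""
    return n_continuous + 1

def multilinear_node_values(intervals_per_field):
    """Assumed tensor-product grid: (intervals + 1) nodes per continuous field."""
    return prod(k + 1 for k in intervals_per_field)
```

With three continuous fields and 10 intervals each, this gives 4 coefficients for linear regression against 11 × 11 × 11 = 1331 node values for the spline; across the six Gender/Region combinations from the example above, that is 24 versus 7986.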