Condition Index — hanaml.ConditionIndex • hana.ml.r

hanaml.ConditionIndex is an R wrapper for the SAP HANA PAL Condition Index algorithm.

hanaml.ConditionIndex(
  data,
  key,
  features = NULL,
  scaling = NULL,
  intercept = NULL,
  thread.ratio = NULL
)

Arguments

data: DataFrame
DataFrame containing the data.
key: character
Name of the ID column.
features: character or list of characters, optional
Names of the feature columns.
If not provided, defaults to all non-key columns of data.
scaling: logical, optional
Specifies whether or not to scale the input data to have unit variance before the analysis.
Defaults to TRUE.
intercept: logical, optional
Specifies whether or not to include an intercept term in the calculation.
Defaults to TRUE.
thread.ratio: double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

Value

Returns a list of two DataFrames:

DataFrame 1
Condition index results, structured as follows:
- COMPONENT_ID, principal component ID.
- EIGENVALUE, eigenvalue.
- CONDITION_INDEX, condition index.
- FEATURES, variance decomposition proportion for each variable (one column per feature, named after the feature).
- INTERCEPT, variance decomposition proportion for the intercept term.
DataFrame 2
Distinct values results. This table is empty if no collinearity problem has been detected.
Structured as follows:
- STAT_NAME: names of the values, including the condition number and the names of the variables involved in the collinearity problem.
- STAT_VALUE: value corresponding to each name.

Details

The condition index is used to detect collinearity among independent variables that are later used as predictors in a multiple linear regression model.
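
The quantities reported in the first result table can be approximated in base R without a HANA connection. The following is a minimal sketch, not PAL's implementation. It assumes that the intercept enters as a column of ones, that every column is scaled to unit length without centering, and that the condition index of a component is the square root of the largest eigenvalue divided by that component's eigenvalue. The last assumption agrees with the example on this page, where sqrt(1.996669e+01 / 2.073585e-02) is about 31.03. The names X, Z, cond.index and props are illustrative only.

```r
# Example data from this page (the ID column is not a feature).
X <- cbind(
  X1 = c(12, 12, 12, 13, 14),
  X2 = c(52, 57, 54, 52, 54),
  X3 = c(20, 25, 21, 21, 24),
  X4 = c(44, 45, 45, 46, 46)
)

# Assumption: the intercept is modelled as a column of ones.
Z <- cbind(X, INTERCEPT = 1)

# Assumption: each column is scaled to unit length, without centering.
Z <- sweep(Z, 2, sqrt(colSums(Z^2)), "/")

# The eigenvalues of Z'Z are the squared singular values of Z.
s <- svd(Z)
eig <- s$d^2

# Condition index per component: sqrt(largest eigenvalue / eigenvalue).
cond.index <- sqrt(max(eig) / eig)

# Variance decomposition proportions:
# phi[j, k] = v[j, k]^2 / d[k]^2, then each variable's row is normalised to sum to 1.
phi <- sweep(s$v^2, 2, s$d^2, "/")
props <- t(phi / rowSums(phi))   # rows: components, columns: variables
colnames(props) <- colnames(Z)

print(round(cond.index, 2))
print(round(props, 4))
```

Because the indices are ratios of eigenvalues, rescaling all columns by the same factor changes the eigenvalues but leaves the condition indices and the variance decomposition proportions unchanged.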

Examples

Input DataFrame data:


> data$Collect()
  ID X1 X2 X3 X4
1  1 12 52 20 44
2  2 12 57 25 45
3  3 12 54 21 45
4  4 13 52 21 46
5  5 14 54 24 46

Call the ConditionIndex function:


> ci <- hanaml.ConditionIndex(data, key="ID", thread.ratio=0.1)

Output:


> ci[[1]]$Collect()
  COMPONENT_ID      EIGENVALUE CONDITION_INDEX             X1            X2
1       Comp_1    1.996669e+01         1.00000   1.185761e-05  1.556872e-06
2       Comp_2    2.073585e-02        31.03074   8.776374e-03  2.098206e-04
3       Comp_3    1.226013e-02        40.35575   5.347198e-02  2.570866e-03
4       Comp_4    2.295285e-04       294.94070   2.056656e-01  1.522431e-02
5       Comp_5    8.639595e-05       480.73565   7.320742e-01  9.819934e-01
                X3          X4    INTERCEPT
1    9.911148e-06 3.175778e-06 2.173805e-06
2    3.106275e-02 1.251087e-03 9.070816e-04
3    5.314573e-03 6.389341e-04 2.710487e-03
4    6.578588e-03 9.311208e-01 2.468621e-01
5    9.570342e-01 6.698598e-02 7.495182e-01
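
One way to read this table is sketched below. The cutoffs are assumptions taken from a common rule of thumb (a condition index above 30, with two or more variance decomposition proportions above 0.5 in the same component) and are not documented here as the criteria PAL uses. To keep the sketch runnable without a HANA connection, the values printed above are copied into a local data.frame named res; in a live session res would instead be the result of ci[[1]]$Collect(). The names ci.cutoff, prop.cutoff, flagged and involved are illustrative.

```r
# Values copied from the example output above.
res <- data.frame(
  COMPONENT_ID    = paste0("Comp_", 1:5),
  CONDITION_INDEX = c(1.00000, 31.03074, 40.35575, 294.94070, 480.73565),
  X1        = c(1.185761e-05, 8.776374e-03, 5.347198e-02, 2.056656e-01, 7.320742e-01),
  X2        = c(1.556872e-06, 2.098206e-04, 2.570866e-03, 1.522431e-02, 9.819934e-01),
  X3        = c(9.911148e-06, 3.106275e-02, 5.314573e-03, 6.578588e-03, 9.570342e-01),
  X4        = c(3.175778e-06, 1.251087e-03, 6.389341e-04, 9.311208e-01, 6.698598e-02),
  INTERCEPT = c(2.173805e-06, 9.070816e-04, 2.710487e-03, 2.468621e-01, 7.495182e-01),
  stringsAsFactors = FALSE
)

# Assumed rule-of-thumb cutoffs (not PAL's documented criteria).
ci.cutoff   <- 30
prop.cutoff <- 0.5

vars <- c("X1", "X2", "X3", "X4", "INTERCEPT")

# Components whose condition index exceeds the cutoff.
flagged <- res[res$CONDITION_INDEX > ci.cutoff, ]

# For each flagged component, list the variables with a large proportion.
involved <- lapply(seq_len(nrow(flagged)), function(i) {
  vars[unlist(flagged[i, vars]) > prop.cutoff]
})
names(involved) <- as.character(flagged$COMPONENT_ID)

# Keep only components in which at least two variables are involved.
involved <- involved[lengths(involved) >= 2]
print(involved)
```

With these cutoffs only Comp_5 remains, involving X1, X2, X3 and INTERCEPT. Comp_4 also has a large condition index, but only X4 exceeds the proportion cutoff there.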