hanaml.ConditionIndex is an R wrapper for the SAP HANA PAL Condition Index function.
hanaml.ConditionIndex( data, key, features = NULL, scaling = NULL, intercept = NULL, thread.ratio = NULL )
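| Argument | Description |
|---|---|
| data | DataFrame. The input data. |
| key | character. Name of the ID column. |
| features | character, optional. Names of the feature columns. If NULL, all non-key columns are used (as in the example below). |
| scaling | logical, optional. Whether to scale the variables before computing the condition index. |
| intercept | logical, optional. Whether to include an intercept term. If NULL, an intercept is included (see the INTERCEPT column in the example output). |
| thread.ratio | double, optional. Proportion of available threads to use, in the range [0, 1]. |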
Returns a list of two DataFrames:
DataFrame 1
Condition index results, structured as follows:
COMPONENT_ID: principal component ID.
EIGENVALUE: eigenvalue of the component.
CONDITION_INDEX: condition index of the component.
One column per feature, named after the feature (X1 to X4 in the example): variance decomposition proportion of that variable.
INTERCEPT: variance decomposition proportion of the intercept term.
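For reference, the standard Belsley-Kuh-Welsch definitions of these columns are given below. That PAL uses exactly these definitions is an assumption, although the CONDITION_INDEX values in the example output do satisfy the first formula. Here $\lambda_1 \ge \cdots \ge \lambda_p$ are the eigenvalues of $X^\top X$ for the (optionally scaled) design matrix $X$ including the intercept column, $v_{jk}$ is entry $j$ of the eigenvector for $\lambda_k$, and $\pi_{kj}$ is the value in row (component) $k$, column (variable) $j$.

$$
\eta_k = \sqrt{\frac{\lambda_1}{\lambda_k}}, \qquad
\pi_{kj} = \frac{v_{jk}^2 / \lambda_k}{\sum_{l=1}^{p} v_{jl}^2 / \lambda_l}, \qquad
\sum_{k=1}^{p} \pi_{kj} = 1 .
$$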
DataFrame 2
Collinearity statistics, structured as follows:
STAT_NAME: name of the statistic; the entries are the condition number and the names of the variables involved in the collinearity problem.
STAT_VALUE: value corresponding to each name.
This table is empty if no collinearity problem is detected.
The condition index is used to detect collinearity among independent variables that are later used as predictors in a multiple linear regression model.
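To make the columns of DataFrame 1 concrete, here is a minimal base-R sketch of the classical Belsley-Kuh-Welsch condition index diagnostic. It is not the PAL implementation: `condition_index_sketch` is a hypothetical helper, PAL is assumed to follow the same definitions, and "scaling" is assumed to mean dividing every column (including the intercept) by its Euclidean norm. Because PAL's exact scaling is not documented here, the EIGENVALUE column may differ from PAL's output.

```r
# Hypothetical helper (not part of hana.ml): Belsley-Kuh-Welsch condition
# indices and variance decomposition proportions, assuming PAL's definitions match.
condition_index_sketch <- function(X, intercept = TRUE, scaling = TRUE) {
  X <- as.matrix(X)
  # Append a constant column for the intercept, last as in PAL's output
  if (intercept) X <- cbind(X, INTERCEPT = 1)
  # Assumed scaling: divide each column by its Euclidean norm (unit length)
  if (scaling) X <- sweep(X, 2, sqrt(colSums(X^2)), "/")
  # Eigen-decomposition of X'X; eigen() returns eigenvalues in decreasing order
  e <- eigen(crossprod(X), symmetric = TRUE)
  lambda <- e$values
  # Condition index of component k: sqrt(largest eigenvalue / k-th eigenvalue)
  cond_index <- sqrt(lambda[1] / lambda)
  # phi[j, k] = v[j, k]^2 / lambda[k]; each variable's row is normalized to sum to 1
  phi <- sweep(e$vectors^2, 2, lambda, "/")
  vdp <- t(phi / rowSums(phi))  # rows: components, columns: variables
  colnames(vdp) <- colnames(X)
  data.frame(COMPONENT_ID = paste0("Comp_", seq_along(lambda)),
             EIGENVALUE = lambda,
             CONDITION_INDEX = cond_index,
             vdp, check.names = FALSE)
}

# Feature columns from the example input below (ID column dropped)
X <- data.frame(X1 = c(12, 12, 12, 13, 14),
                X2 = c(52, 57, 54, 52, 54),
                X3 = c(20, 25, 21, 21, 24),
                X4 = c(44, 45, 45, 46, 46))
res <- condition_index_sketch(X)
print(res)
```

The sketch runs locally without a HANA connection, which makes it convenient for checking how a change in scaling or in the intercept setting affects the diagnostic before calling the PAL function.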
Input DataFrame data:
> data$Collect()
  ID X1 X2 X3 X4
1  1 12 52 20 44
2  2 12 57 25 45
3  3 12 54 21 45
4  4 13 52 21 46
5  5 14 54 24 46
Call the ConditionIndex function:
> ci <- hanaml.ConditionIndex(data, key = "ID", thread.ratio = 0.1)
Output:
> ci[[1]]$Collect()
COMPONENT_ID EIGENVALUE CONDITION_INDEX X1 X2
1 Comp_1 1.996669e+01 1.00000 1.185761e-05 1.556872e-06
2 Comp_2 2.073585e-02 31.03074 8.776374e-03 2.098206e-04
3 Comp_3 1.226013e-02 40.35575 5.347198e-02 2.570866e-03
4 Comp_4 2.295285e-04 294.94070 2.056656e-01 1.522431e-02
5 Comp_5 8.639595e-05 480.73565 7.320742e-01 9.819934e-01
X3 X4 INTERCEPT
1 9.911148e-06 3.175778e-06 2.173805e-06
2 3.106275e-02 1.251087e-03 9.070816e-04
3 5.314573e-03 6.389341e-04 2.710487e-03
4 6.578588e-03 9.311208e-01 2.468621e-01
5 9.570342e-01 6.698598e-02 7.495182e-01
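DataFrame 2 is not printed in the example. As a rough illustration of how a condition number and the variables involved in a collinearity problem can be read from DataFrame 1, the sketch below applies common Belsley-Kuh-Welsch rules of thumb: a component signals collinearity when its condition index exceeds 30 and at least two variables have a variance decomposition proportion above 0.5 on it. The helper name `flag_collinearity`, both thresholds, and treating INTERCEPT as a variable are assumptions, not PAL's documented behavior, so PAL's DataFrame 2 may list different variables.

```r
# Hypothetical helper (not part of hana.ml): flag collinear variables from a
# DataFrame-1-shaped data.frame using assumed rule-of-thumb thresholds.
flag_collinearity <- function(ci_df, ci_threshold = 30, vdp_threshold = 0.5) {
  vdp_cols <- setdiff(names(ci_df), c("COMPONENT_ID", "EIGENVALUE", "CONDITION_INDEX"))
  involved <- character(0)
  for (k in seq_len(nrow(ci_df))) {
    if (ci_df$CONDITION_INDEX[k] <= ci_threshold) next
    # Variables with a high variance decomposition proportion on component k
    high <- vdp_cols[unlist(ci_df[k, vdp_cols]) > vdp_threshold]
    if (length(high) >= 2) involved <- union(involved, high)
  }
  # Condition number taken as the largest condition index
  list(condition_number = max(ci_df$CONDITION_INDEX), variables = involved)
}

# Values copied from ci[[1]]$Collect() in the example output
ci1 <- data.frame(
  COMPONENT_ID = paste0("Comp_", 1:5),
  EIGENVALUE = c(1.996669e+01, 2.073585e-02, 1.226013e-02, 2.295285e-04, 8.639595e-05),
  CONDITION_INDEX = c(1.00000, 31.03074, 40.35575, 294.94070, 480.73565),
  X1 = c(1.185761e-05, 8.776374e-03, 5.347198e-02, 2.056656e-01, 7.320742e-01),
  X2 = c(1.556872e-06, 2.098206e-04, 2.570866e-03, 1.522431e-02, 9.819934e-01),
  X3 = c(9.911148e-06, 3.106275e-02, 5.314573e-03, 6.578588e-03, 9.570342e-01),
  X4 = c(3.175778e-06, 1.251087e-03, 6.389341e-04, 9.311208e-01, 6.698598e-02),
  INTERCEPT = c(2.173805e-06, 9.070816e-04, 2.710487e-03, 2.468621e-01, 7.495182e-01)
)
res <- flag_collinearity(ci1)
# condition_number is 480.73565; variables are X1, X2, X3 and INTERCEPT
print(res)
```

Under these thresholds only Comp_5 qualifies: Comp_4 has a large condition index but only X4 loads heavily on it, so the two-variable rule excludes it.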