hanaml.ConditionIndex is an R wrapper for SAP HANA PAL Condition Index.

hanaml.ConditionIndex(
  data,
  key,
  features = NULL,
  scaling = NULL,
  intercept = NULL,
  thread.ratio = NULL
)

Arguments

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

features

character or list of characters, optional
Names of the feature columns.
If not provided, it defaults to all non-key, non-label columns of data.

scaling

logical, optional
Specifies whether or not to scale the input data to have unit variance before the analysis.
Defaults to TRUE.

intercept

logical, optional
Specifies whether or not to include the intercept term in the calculation.
Defaults to TRUE.

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

Value

Returns a list of two DataFrames:

  • DataFrame 1
    Condition index results, structured as follows:

    • COMPONENT_ID, principal component ID.

    • EIGENVALUE, eigenvalue.

    • CONDITION_INDEX, condition index.

    • FEATURES, variance decomposition proportion for each variable, one column per feature (X1 to X4 in the example below).

    • INTERCEPT, variance decomposition proportion for the intercept term.

  • DataFrame 2
    Collinearity statistics, structured as follows.
    This table is empty if no collinearity problem has been detected.

    • STAT_NAME, name of the statistic: the condition number, or the name of a variable involved in the collinearity problem.

    • STAT_VALUE, value of the corresponding statistic.

Details

The condition index is used to detect collinearity among independent variables that are later used as predictors in a multiple linear regression model.
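
To make the diagnostic concrete, the following base R sketch computes condition indices and variance decomposition proportions locally, following the textbook eigenvalue approach. It is not PAL's implementation: the function name condition_index_sketch and the unit-length column scaling are assumptions made for illustration. Condition indices are ratios of eigenvalues, so they do not change when all columns are rescaled by a common factor, but the eigenvalues themselves depend on the scaling convention and need not match the EIGENVALUE column returned by PAL.

```r
# Example data from this page (ID column omitted).
X <- cbind(
  X1 = c(12, 12, 12, 13, 14),
  X2 = c(52, 57, 54, 52, 54),
  X3 = c(20, 25, 21, 21, 24),
  X4 = c(44, 45, 45, 46, 46)
)

# Hypothetical helper, not part of hana.ml.r.
condition_index_sketch <- function(X, intercept = TRUE) {
  if (intercept) X <- cbind(X, INTERCEPT = 1)
  # Assumption: scale each column to unit Euclidean length, without centering.
  Z <- sweep(X, 2, sqrt(colSums(X^2)), "/")
  eig <- eigen(crossprod(Z), symmetric = TRUE)
  lambda <- eig$values            # eigenvalues, in decreasing order
  V <- eig$vectors                # rows: variables, columns: components
  # Condition index of component k: sqrt(largest eigenvalue / eigenvalue k).
  ci <- sqrt(lambda[1] / lambda)
  # phi[j, k] = V[j, k]^2 / lambda[k]; normalise each variable's row to sum to 1.
  phi <- sweep(V^2, 2, lambda, "/")
  prop <- t(phi / rowSums(phi))   # rows: components, columns: variables
  colnames(prop) <- colnames(X)
  data.frame(
    COMPONENT_ID = paste0("Comp_", seq_along(lambda)),
    EIGENVALUE = lambda,
    CONDITION_INDEX = ci,
    prop,
    check.names = FALSE
  )
}

res <- condition_index_sketch(X)
print(res)
```

Each variable's column of proportions sums to 1 across components, which is the same layout as the first DataFrame described under Value.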

Examples

Input DataFrame data:


> data$Collect()
  ID X1 X2 X3 X4
1  1 12 52 20 44
2  2 12 57 25 45
3  3 12 54 21 45
4  4 13 52 21 46
5  5 14 54 24 46

Call ConditionIndex function:


> ci <- hanaml.ConditionIndex(data, key="ID", thread.ratio=0.1)

Output:


> ci[[1]]$Collect()
  COMPONENT_ID      EIGENVALUE CONDITION_INDEX             X1            X2
1       Comp_1    1.996669e+01         1.00000   1.185761e-05  1.556872e-06
2       Comp_2    2.073585e-02        31.03074   8.776374e-03  2.098206e-04
3       Comp_3    1.226013e-02        40.35575   5.347198e-02  2.570866e-03
4       Comp_4    2.295285e-04       294.94070   2.056656e-01  1.522431e-02
5       Comp_5    8.639595e-05       480.73565   7.320742e-01  9.819934e-01
                X3          X4    INTERCEPT
1    9.911148e-06 3.175778e-06 2.173805e-06
2    3.106275e-02 1.251087e-03 9.070816e-04
3    5.314573e-03 6.389341e-04 2.710487e-03
4    6.578588e-03 9.311208e-01 2.468621e-01
5    9.570342e-01 6.698598e-02 7.495182e-01
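
One way to read this output is to look for components with a large condition index on which two or more variables have a large variance decomposition proportion. The sketch below applies that reading to a local copy of the values shown above. The helper flag_collinear and the cutoffs (condition index above 30, proportion above 0.5) are assumptions based on common rules of thumb, not values documented for PAL.

```r
# Local copy of the condition index output shown above.
ci_tbl <- data.frame(
  COMPONENT_ID = paste0("Comp_", 1:5),
  CONDITION_INDEX = c(1.00000, 31.03074, 40.35575, 294.94070, 480.73565),
  X1 = c(1.185761e-05, 8.776374e-03, 5.347198e-02, 2.056656e-01, 7.320742e-01),
  X2 = c(1.556872e-06, 2.098206e-04, 2.570866e-03, 1.522431e-02, 9.819934e-01),
  X3 = c(9.911148e-06, 3.106275e-02, 5.314573e-03, 6.578588e-03, 9.570342e-01),
  X4 = c(3.175778e-06, 1.251087e-03, 6.389341e-04, 9.311208e-01, 6.698598e-02),
  INTERCEPT = c(2.173805e-06, 9.070816e-04, 2.710487e-03, 2.468621e-01, 7.495182e-01),
  stringsAsFactors = FALSE
)

# Hypothetical helper; the cutoffs are rule-of-thumb assumptions.
flag_collinear <- function(tbl, ci_cutoff = 30, prop_cutoff = 0.5) {
  vars <- setdiff(names(tbl), c("COMPONENT_ID", "EIGENVALUE", "CONDITION_INDEX"))
  # Keep only components whose condition index exceeds the cutoff.
  high <- tbl[tbl$CONDITION_INDEX > ci_cutoff, , drop = FALSE]
  # For each such component, list the variables with a large proportion.
  out <- lapply(seq_len(nrow(high)), function(i) {
    vars[unlist(high[i, vars]) > prop_cutoff]
  })
  names(out) <- as.character(high$COMPONENT_ID)
  # Collinearity needs at least two variables loading on the same component.
  out[lengths(out) >= 2]
}

flagged <- flag_collinear(ci_tbl)
print(flagged)
```

With these cutoffs only Comp_5 is flagged, with X1, X2, X3 and INTERCEPT involved. Comp_4 also has a high condition index, but only X4 exceeds the proportion cutoff there.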