hanaml.VarianceTest is an R wrapper for SAP HANA PAL Variance Test.

hanaml.VarianceTest(data, key, sigma.num, thread.ratio = NULL, data.col = NULL)

Arguments

data

DataFrame
DataFrame containing the data.

key

character
Name of the ID column.

sigma.num

double
Multiplier for sigma. Values that lie more than sigma.num standard deviations from the mean are marked as out of range.

thread.ratio

double, optional
Controls the proportion of available threads that can be used by this function.
The value range is from 0 to 1, where 0 indicates a single thread, and 1 indicates all available threads.
Values between 0 and 1 will use up to that percentage of available threads. Values outside this range are ignored.
Defaults to 0.

data.col

character, optional
Name of the raw data column in the data.
If not specified, the first non-ID column is taken as data.col.

Value

Returns a list of two DataFrames:

  • DataFrame 1
    Outlier test results, structured as follows:

    • DATA_ID: ID column, named as in the input DataFrame.

    • IS_OUT_OF_RANGE: 0 -> in bounds, 1 -> out of bounds.

  • DataFrame 2
    Statistic results, structured as follows:

    • STAT_NAME: statistic name.

    • STAT_VALUE: statistic value.

Details

Variance Test identifies outliers among n numeric values x_i (1 <= i <= n) using the mean and the standard deviation (sigma) of those values. A value is marked as out of range when it falls outside the interval from mean - sigma.num * sigma to mean + sigma.num * sigma.
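The check described above can be sketched in base R without a HANA connection. This is only an illustration of the mean and sigma bounds, not the PAL implementation: the function name variance_test_sketch and the input values are hypothetical, and the use of the sample standard deviation (sd(), with an n - 1 denominator) is an assumption.

```r
# Base-R sketch of a mean +/- k * sigma outlier check.
# Hypothetical helper; it does not call SAP HANA or PAL.
variance_test_sketch <- function(x, sigma_num) {
  mu <- mean(x)
  sigma <- sd(x)  # assumption: sample standard deviation (n - 1 denominator)
  lower <- mu - sigma_num * sigma
  upper <- mu + sigma_num * sigma
  data.frame(
    ID = seq_along(x) - 1L,  # zero-based IDs, mirroring the layout of the example data
    IS_OUT_OF_RANGE = as.integer(x < lower | x > upper)
  )
}

# Hypothetical values, not the data from the Examples section.
x <- c(rep(c(24, 25, 26), times = 6), 25, 103)
res <- variance_test_sketch(x, sigma_num = 3.0)
res[res$IS_OUT_OF_RANGE == 1, ]
```

With these made-up values only the last row (value 103) falls outside the bounds. Note that a single extreme value also inflates the mean and sigma it is tested against, so a large sigma.num on a small sample may flag nothing.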

Examples

Input DataFrame data:


> data$Collect()
    ID  X
1   0  25
2   1  20
3   2  23
4   3  29
5   4  26
...
18 17  23
19 18  25
20 19 103

Call the function:


> vt <- hanaml.VarianceTest(data, key = "ID", sigma.num = 3.0)

Output:


> vt[[1]]$Collect()
   ID IS_OUT_OF_RANGE
1   0               0
2   1               0
3   2               0
...
18 17               0
19 18               0
20 19               1