hanaml.VarianceTest.Rd
hanaml.VarianceTest is an R wrapper for the SAP HANA PAL Variance Test.
hanaml.VarianceTest(data, key, sigma.num, thread.ratio = NULL, data.col = NULL)
data
DataFrame
DataFrame containing the data.
key
character
Name of the ID column.
sigma.num
double
Multiplier for sigma.
thread.ratio
double, optional
Controls the proportion of available threads that can be used by this
function.
The value range is from 0 to 1, where 0 indicates a single thread
and 1 indicates all available threads.
Values between 0 and 1 use up to that percentage of available threads.
Values outside this range are ignored.
Defaults to 0.
data.col
character, optional
Name of the raw data column in data.
If not specified, the first non-ID column is taken as data.col.
Returns a list of two DataFrames:
DataFrame 1
Sampling results, structured as follows:
DATA_ID: name as shown in input DataFrame.
IS_OUT_OF_RANGE: 0 -> in bounds, 1 -> out of bounds.
DataFrame 2
Statistic results, structured as follows:
STAT_NAME: statistic name.
STAT_VALUE: statistic value.
Variance Test identifies outliers among n numeric values x_i (1 <= i <= n) using the mean and the standard deviation (sigma) of those values. A value is marked as out of range when it lies more than sigma.num standard deviations away from the mean.
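The bounds check behind the test can be sketched locally in plain R, without a HANA connection. This is only an illustration under stated assumptions: `variance_test_local` is a hypothetical helper (not part of the package), the input vector is made-up data, and the use of the sample standard deviation (`sd()`, n - 1 denominator) is an assumption about how PAL computes sigma.

```r
# Hypothetical local helper for illustration; not part of the hanaml API.
variance_test_local <- function(x, sigma.num) {
  mu <- mean(x)
  sigma <- sd(x)  # sample standard deviation (n - 1); assumed convention
  lower <- mu - sigma.num * sigma
  upper <- mu + sigma.num * sigma
  data.frame(
    ID = seq_along(x) - 1L,  # zero-based IDs, mirroring the example input
    IS_OUT_OF_RANGE = as.integer(x < lower | x > upper)
  )
}

# Made-up data: 19 values close together and one extreme value at the end.
x <- c(10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
       12, 10, 11, 13, 12, 11, 10, 12, 11, 60)

res <- variance_test_local(x, sigma.num = 3.0)
res[res$IS_OUT_OF_RANGE == 1, ]
```

With this input only the last value (ID 19) falls outside the three-sigma band. A smaller sigma.num narrows the band and flags more points.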
Input DataFrame data:
> data$Collect()
ID X
1 0 25
2 1 20
3 2 23
4 3 29
5 4 26
...
18 17 23
19 18 25
20 19 103
Call the function:
> vt <- hanaml.VarianceTest(data, key = "ID", sigma.num = 3.0)
Output:
> vt[[1]]$Collect()
ID IS_OUT_OF_RANGE
1 0 0
2 1 0
3 2 0
...
18 17 0
19 18 0
20 19 1