hanaml.GrubbsTest.Rd
hanaml.GrubbsTest is an R wrapper for the SAP HANA PAL Grubbs' Test.
hanaml.GrubbsTest(data, key, col = NULL, method = NULL, alpha = NULL)
data
DataFrame
DataFrame containing the data points, structured as follows:
SOURCE_ID : INTEGER
RAW_DATA : INTEGER or DOUBLE
key
character
Name of the ID column of data.
col
character, optional
Name of the data column to be tested.
If not provided, it defaults to the non-key columns of data.
method
{"two.sided", "one.sided.min", "one.sided.max", "iter.two.sided"}, optional
Specifies which variant of the test to run. The available methods are:
"two.sided" : use the two-sided test.
"one.sided.min" : use the one-sided test for the minimum value.
"one.sided.max" : use the one-sided test for the maximum value.
"iter.two.sided" : perform the two-sided test iteratively to detect multiple outliers.
Defaults to "two.sided".
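For reference, the statistic behind each single-pass method is commonly written as below. These are the textbook definitions, assumed here rather than taken from the PAL documentation. The symbol \bar{x} is the sample mean and s the sample standard deviation of the n tested values x_1, ..., x_n.

```latex
\begin{aligned}
G_{\text{two.sided}}     &= \frac{\max_{i=1,\dots,n} \lvert x_i - \bar{x} \rvert}{s}, \\
G_{\text{one.sided.min}} &= \frac{\bar{x} - \min_{i} x_i}{s}, \\
G_{\text{one.sided.max}} &= \frac{\max_{i} x_i - \bar{x}}{s}.
\end{aligned}
```

Under the same assumption, "iter.two.sided" would re-evaluate the two-sided statistic on the remaining points after each detected outlier is removed.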
alpha
double, optional
Significance level at which the algorithm rejects the hypothesis that
there are no outliers in the given data set.
Defaults to 0.05.
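How alpha enters the decision can be written as a critical value for the test statistic G (the suspect point's deviation from the sample mean, divided by the sample standard deviation). The form below is the standard textbook one. It is an assumption about what PAL computes, not a statement from its documentation.

```latex
G_{\text{crit}} = \frac{n-1}{\sqrt{n}} \sqrt{\frac{t^{2}}{n - 2 + t^{2}}},
\qquad
t = t_{p,\,n-2},
\qquad
p =
\begin{cases}
\alpha / (2n) & \text{two-sided test} \\
\alpha / n    & \text{one-sided test}
\end{cases}
```

Here t_{p, n-2} is the upper-tail critical value of Student's t distribution with n - 2 degrees of freedom at tail probability p. The hypothesis of no outliers is rejected when G exceeds G_crit, so a larger alpha lowers the threshold and flags points more readily. For n = 11 and alpha = 0.2 in the one-sided case this evaluates to roughly 1.91.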
Returns a list of DataFrames.
DataFrame 1
Detected outliers, structured as follows:
SOURCE_ID : ID of the outlier data point.
RAW_DATA : the corresponding value.
DataFrame 2
Statistical information of the tests, structured as follows:
SOURCE_ID : ID of the outlier data point.
STAT_NAME : Name of the statistic.
STAT_VALUE : Value of the statistic.
Grubbs' test detects a single outlier in a data set that is assumed to be
Gaussian (normally) distributed.
It can be applied iteratively to detect multiple outliers.
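The iterative use can be sketched in base R, without a HANA connection. The helpers grubbs_crit_two_sided and grubbs_iterative are hypothetical names written for this illustration. They use the textbook two-sided critical value, and PAL's internal procedure may differ.

```r
# Hypothetical helper: textbook two-sided critical value for n points.
grubbs_crit_two_sided <- function(n, alpha) {
  t2 <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)^2
  (n - 1) / sqrt(n) * sqrt(t2 / (n - 2 + t2))
}

# Hypothetical helper: repeat the two-sided test, removing the most extreme
# point each round, until no point exceeds the critical value.
# Returns the positions (in x) of the flagged points, in order of removal.
grubbs_iterative <- function(x, alpha = 0.05) {
  idx <- seq_along(x)        # positions of the points still under test
  outliers <- integer(0)
  while (length(idx) > 2) {  # the t quantile needs n - 2 >= 1
    vals <- x[idx]
    s <- sd(vals)
    if (s == 0) break        # all remaining values identical: nothing to flag
    dev <- abs(vals - mean(vals))
    worst <- which.max(dev)  # candidate: largest absolute deviation
    g <- dev[worst] / s
    if (g <= grubbs_crit_two_sided(length(vals), alpha)) break
    outliers <- c(outliers, idx[worst])
    idx <- idx[-worst]
  }
  outliers
}
```

For example, grubbs_iterative(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)) flags position 10 (the value 100) and then stops, because the remaining nine values pass the test.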
> data$Collect()
    ID       VAL
1  100  4.254843
2  200  0.135000
3  300 11.072257
4  400 14.797838
5  500 12.125133
6  600 14.265839
7  700  7.731352
8  800  6.856739
9  900 15.094403
10 101  8.149382
11 201  9.160144
Call the function:
> result <- hanaml.GrubbsTest(data=data,
                              key = "ID",
                              method = "one.sided.min",
                              alpha = 0.2)
Results:
> result[[1]]$Collect()
   ID   VAL
1 200 0.135
> result[[2]]$Collect()
   ID                STAT_NAME STAT_VALUE
1 200                     MEAN  9.4220845
2 200 STANDARD_SAMPLE_VARIANCE  4.6759352
3 200                        T  1.9102192
4 200                        G  1.9861448
5 200                        U  0.5660752
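The MEAN, G and U rows can be cross-checked locally in base R from the VAL column shown in the example. This is a sketch for verification only and does not call PAL. The variable names are illustrative, and the reading of U as a ratio of sums of squared deviations is inferred from the numbers, not from the PAL documentation.

```r
# Values copied from the VAL column of the example data.
val <- c(4.254843, 0.135000, 11.072257, 14.797838, 12.125133, 14.265839,
         7.731352, 6.856739, 15.094403, 8.149382, 9.160144)

m <- mean(val)
s <- sd(val)                   # sample standard deviation (n - 1 divisor)
g <- (m - min(val)) / s        # one-sided statistic for the minimum value

# Sum of squared deviations without the suspect point, relative to the
# sum over all points (assumed meaning of U).
suspect <- which.min(val)
ss_all <- sum((val - m)^2)
ss_without <- sum((val[-suspect] - mean(val[-suspect]))^2)
u <- ss_without / ss_all

print(round(c(MEAN = m, SD = s, G = g, U = u), 4))
```

The computed sd(val) agrees with the row labeled STANDARD_SAMPLE_VARIANCE, which suggests that row reports the sample standard deviation and not the variance. Since G (about 1.986) exceeds the T row (about 1.910), the point with ID 200 is reported as an outlier.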