hanaml.VarianceTest {hana.ml.r}    R Documentation

Variance Test

Description

Variance Test identifies outliers among n numeric values x_1, ..., x_n using the mean and the standard deviation (sigma) of those n values.
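As a formula, the flag for each value can be sketched as follows. This assumes the usual mean ± k·sigma acceptance band, with k given by the sigma.num argument and a strict comparison at the boundary. This page does not say whether sigma is the sample (n − 1) or population (n) standard deviation, so the sample form below is an assumption.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)^2}, \qquad
\mathrm{IS\_OUT\_OF\_RANGE}_i =
\begin{cases}
1 & \text{if } |x_i - \mu| > k\,\sigma,\\
0 & \text{otherwise,}
\end{cases}
\quad k = \texttt{sigma.num}
```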

Usage

hanaml.VarianceTest(conn.context,
                    data,
                    key,
                    sigma.num,
                    thread.ratio = NULL,
                    data.col = NULL)

Arguments

conn.context

ConnectionContext
Database connection object.

data

DataFrame
Dataset used for the variance test.

key

character
Name of the ID column in data.

sigma.num

double
Multiplier for sigma (the standard deviation) that sets the width of the accepted range around the mean.

thread.ratio

numeric, optional
Specifies the ratio of the total number of threads that this function can use. The value range is from 0 to 1, where 0 means using only 1 thread and 1 means using at most all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to 0.

data.col

character, optional
Name of the raw data column in data.
If not specified, the first non-ID column is used as the data column for the variance test.

Value

Examples

## Not run: 
Input DataFrame data for the variance test:
> data$Collect()
   ID   X
1   0  25
2   1  20
3   2  23
4   3  29
5   4  26
...
18 17  23
19 18  25
20 19 103

Do variance test for the input data:
> vt <- hanaml.VarianceTest(conn.context, data, key = "ID", sigma.num = 3.0)
Expected output:
> vt[[2]]$Collect()
   ID IS_OUT_OF_RANGE
1   0               0
2   1               0
3   2               0
4   3               0
5   4               0
...
18 17               0
19 18               0
20 19               1

## End(Not run)
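To sanity-check results without a HANA connection, the rule can be reproduced locally in base R. This is a minimal sketch, not the PAL implementation. The helper variance_test_local() is hypothetical. It assumes the sample standard deviation returned by sd() and a strict comparison against mean ± sigma.num × sd. The ten-row data set is made up for illustration.

```r
# Hypothetical local re-implementation of the variance-test rule (base R only).
# Assumptions: sample standard deviation via sd(); a value is out of range
# when it lies strictly outside mean +/- sigma.num * sd.
variance_test_local <- function(df, key, sigma.num, data.col = NULL) {
  # Mirror the documented default: use the first non-ID column
  if (is.null(data.col)) {
    data.col <- setdiff(names(df), key)[1]
  }
  x <- df[[data.col]]
  mu <- mean(x)
  s <- sd(x)
  lower <- mu - sigma.num * s
  upper <- mu + sigma.num * s
  # 1 = out of range, 0 = within range
  flag <- as.integer(x < lower | x > upper)
  result <- setNames(data.frame(df[[key]], flag), c(key, "IS_OUT_OF_RANGE"))
  list(result = result, stats = c(mean = mu, sd = s))
}

# Made-up example data: nine values near 10 and one obvious outlier
df <- data.frame(ID = 0:9, X = c(10, 11, 9, 10, 12, 10, 9, 11, 10, 50))
vt <- variance_test_local(df, key = "ID", sigma.num = 2)
print(vt$result)
```

With sigma.num = 3 the same ten rows flag nothing. When the sample standard deviation is used, no value in a sample of n can lie more than (n − 1)/√n standard deviations from the mean (about 2.85 for n = 10), so small inputs may need a smaller multiplier.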

[Package hana.ml.r version 1.0.8 Index]