variance_test
- hana_ml.algorithms.pal.preprocessing.variance_test(data, sigma_num, thread_ratio=None, key=None, data_col=None)
Variance Test identifies outliers in a set of n numeric values {xi}, 1 <= i <= n, using the mean and the standard deviation of those n values.
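To make the idea concrete, here is a minimal plain-Python sketch of the test. It is not the PAL implementation. It assumes that a value is out of range when it falls outside mean ± sigma_num × standard deviation, and that the sample standard deviation (n - 1 denominator) is used. The function name variance_test_sketch is hypothetical.

```python
from statistics import mean, stdev


def variance_test_sketch(values, sigma_num):
    """Return 0 (in bounds) or 1 (out of bounds) for each value."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation (n - 1)
    lower = mu - sigma_num * sigma
    upper = mu + sigma_num * sigma
    return [0 if lower <= x <= upper else 1 for x in values]


values = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 50.0]
flags = variance_test_sketch(values, sigma_num=2.0)
print(flags)
```

Because the outlier itself inflates both the mean and the standard deviation, a small data set may need a smaller sigma_num than the customary 3.0 before an extreme value is flagged.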
- Parameters:
- data : DataFrame
DataFrame containing the data.
- sigma_num : float
Multiplier for sigma.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function determines the number of threads heuristically.
Defaults to 0.
- key : str, optional
Name of the ID column in data.
If key is not specified, then:
if data is indexed by a single column, key defaults to that index column;
otherwise, it defaults to the first column of data.
- data_col : str, optional
Name of the raw data column in the dataframe.
If not specified, defaults to the last column of data.
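The default rules for key and data_col can be sketched in plain Python as follows. This is only an illustration of the documented behaviour, not library code: resolve_defaults is a hypothetical helper, columns stands for the column names of data, and index is assumed to be a string when data is indexed by a single column.

```python
def resolve_defaults(columns, index=None, key=None, data_col=None):
    """Hypothetical helper mirroring the documented defaults."""
    if key is None:
        # Assumption: a single index column is represented as a string.
        key = index if isinstance(index, str) else columns[0]
    if data_col is None:
        # The raw data column falls back to the last column.
        data_col = columns[-1]
    return key, data_col


cols = ["ID", "TS", "X"]
no_index = resolve_defaults(cols)                 # key falls back to the first column
with_index = resolve_defaults(cols, index="TS")   # key falls back to the index column
print(no_index, with_index)
```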
- Returns:
- DataFrame
Outlier test results, structured as follows:
DATA_ID: name as shown in input DataFrame.
IS_OUT_OF_RANGE: 0 -> in bounds, 1 -> out of bounds.
- DataFrame
Statistic results, structured as follows:
STAT_NAME: statistic name.
STAT_VALUE: statistic value.
Examples
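>>> from hana_ml.algorithms.pal.preprocessing import variance_test
>>> res, stats = variance_test(data=df, sigma_num=3.0)  # df: an existing hana_ml DataFrame
>>> res.collect()
>>> stats.collect()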