iqr
- hana_ml.algorithms.pal.stats.iqr(data, key, col=None, multiplier=None)
Performs the inter-quartile range (IQR) test to find the outliers of the data. The inter-quartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1) of the data. Data points will be marked as outliers if they fall outside the range from Q1 - multiplier * IQR to Q3 + multiplier * IQR.
- Parameters:
- data : DataFrame
DataFrame containing the data.
- key : str
Name of the ID column.
- col : str, optional
Name of the data column that needs to be tested.
If not given, it defaults to the first non-ID column.
- multiplier : float, optional
The multiplier used to calculate the value range during the IQR test.
Upper-bound = Q3 + multiplier * IQR,
Lower-bound = Q1 - multiplier * IQR,
where Q1 is the 25th percentile and Q3 is the 75th percentile.
Defaults to 1.5.
- Returns:
- DataFrames
Test results, structured as follows:
ID column, with same name and type as data's ID column.
IS_OUT_OF_RANGE, type INTEGER, containing the test results from the IQR test that determine whether each data sample is in the range or not:
0: a value is in the range.
1: a value is out of range.
Statistical outputs, including Upper-bound and Lower-bound from the IQR test, structured as follows:
STAT_NAME, type NVARCHAR(256), statistics name.
STAT_VALUE, type DOUBLE, statistics value.
Examples
Original data:
>>> df.collect()
     ID   VAL
0    P1  10.0
1    P2  11.0
...
13  P14  13.0
14  P15  12.0
Perform the IQR test:
>>> res, stat = iqr(data=df, key='ID', col='VAL', multiplier=1.5)
>>> res.collect()
     ID  IS_OUT_OF_RANGE
0    P1                0
1    P2                0
...
13  P14                0
14  P15                0
>>> stat.collect()
        STAT_NAME  STAT_VALUE
0  lower quartile        10.0
1  upper quartile        12.0
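For intuition, the range rule described above can be sketched in plain Python, without a HANA connection. This is not how PAL computes the result: the helper name iqr_flags and the sample values are hypothetical, and the quartiles are taken with the standard library's linear-interpolation ("inclusive") method, which is an assumption and may differ from the quantile definition PAL uses.

```python
from statistics import quantiles


def iqr_flags(values, multiplier=1.5):
    """Illustrative IQR rule: return (flags, lower_bound, upper_bound).

    flags[i] is 0 when values[i] lies inside [lower_bound, upper_bound]
    and 1 otherwise, mirroring the IS_OUT_OF_RANGE column.
    """
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    # PAL's own quantile definition may give slightly different Q1 and Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1  # the inter-quartile range
    lower_bound = q1 - multiplier * spread
    upper_bound = q3 + multiplier * spread
    flags = [0 if lower_bound <= v <= upper_bound else 1 for v in values]
    return flags, lower_bound, upper_bound


# Hypothetical sample values, not the data from the example above.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
flags, lower_bound, upper_bound = iqr_flags(sample, multiplier=1.5)
print(flags, lower_bound, upper_bound)
```

With this sample, Q1 is 3.0 and Q3 is 7.0, so the IQR is 4.0 and the accepted range is -3.0 to 13.0; only the last value falls outside it. A larger multiplier widens the range and flags fewer points.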