iqr
hana_ml.algorithms.pal.stats.iqr(data, key, col=None, multiplier=None)
Performs the inter-quartile range (IQR) test to find the outliers of the data. The IQR is the difference between the third quartile (Q3) and the first quartile (Q1) of the data. Data points are marked as outliers if they fall outside the range from Q1 - multiplier * IQR to Q3 + multiplier * IQR.

Parameters:
- data (DataFrame): DataFrame containing the data.
- key (str): Name of the ID column.
- col (str, optional): Name of the data column to be tested. If not given, it defaults to the first non-ID column.
- multiplier (float, optional): The multiplier used to calculate the value range during the IQR test:
  - Upper-bound = Q3 + multiplier * IQR
  - Lower-bound = Q1 - multiplier * IQR

  where Q1 is the 25th percentile and Q3 is the 75th percentile. Defaults to 1.5.
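The bound calculation described above can be sketched in plain Python. This is only an illustration of the rule, not the PAL implementation: the helper names `iqr_bounds` and `flag_outliers`, the sample values, and the choice of `method="inclusive"` for the quartiles are all assumptions, and PAL may compute percentiles with a different convention.

```python
from statistics import quantiles


def iqr_bounds(values, multiplier=1.5):
    """Return (lower_bound, upper_bound) for the IQR rule (hypothetical helper)."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    # method="inclusive" is an assumption; PAL may use another convention.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr_value = q3 - q1
    return q1 - multiplier * iqr_value, q3 + multiplier * iqr_value


def flag_outliers(values, multiplier=1.5):
    """Return 1 for values outside the bounds and 0 otherwise (hypothetical helper)."""
    lower, upper = iqr_bounds(values, multiplier)
    return [0 if lower <= v <= upper else 1 for v in values]


# Made-up sample values, not the data from the example in this page.
sample = [10.0, 11.0, 9.0, 12.0, 40.0, 8.0, 10.0, 13.0, 11.0]
bounds = iqr_bounds(sample)
flags = flag_outliers(sample)
print(bounds)
print(flags)
```

With this sample, Q1 is 10.0 and Q3 is 12.0, so the IQR is 2.0 and only the value 40.0 falls outside the resulting range. A larger multiplier widens the range and flags fewer points.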
Returns:
- DataFrames
  - Test results, structured as follows:
    - ID column, with the same name and type as data's ID column.
    - IS_OUT_OF_RANGE, type INTEGER, the result of the IQR test indicating whether each data sample is in the range:
      - 0: the value is in the range.
      - 1: the value is out of range.
  - Statistical outputs, including Upper-bound and Lower-bound from the IQR test, structured as follows:
    - STAT_NAME, type NVARCHAR(256), statistics name.
    - STAT_VALUE, type DOUBLE, statistics value.
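Once the two results have been fetched to the client, they can be post-processed as ordinary rows. The sketch below assumes the rows are already available as plain tuples (for instance, taken from the pandas objects that `collect()` returns); the helper names `outlier_ids` and `stats_to_dict` and the result rows are hypothetical.

```python
def outlier_ids(result_rows):
    """Return the IDs whose IS_OUT_OF_RANGE flag is 1 (hypothetical helper)."""
    return [row_id for row_id, flag in result_rows if flag == 1]


def stats_to_dict(stat_rows):
    """Map each STAT_NAME to its STAT_VALUE (hypothetical helper)."""
    return {name: value for name, value in stat_rows}


# Hypothetical (ID, IS_OUT_OF_RANGE) rows, made up for illustration.
result_rows = [("A1", 0), ("A2", 0), ("A3", 1), ("A4", 0)]

# (STAT_NAME, STAT_VALUE) rows in the shape described above.
stat_rows = [("lower quartile", 10.0), ("upper quartile", 12.0)]

flagged = outlier_ids(result_rows)
stats = stats_to_dict(stat_rows)
print(flagged)
print(stats["upper quartile"] - stats["lower quartile"])
```

Keeping the statistics in a dictionary makes it easy to look up a single value by its STAT_NAME without scanning the rows again.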
Examples
Original data:
>>> df.collect()
     ID   VAL
0    P1  10.0
1    P2  11.0
...
13  P14  13.0
14  P15  12.0
Perform the IQR test:
>>> res, stat = iqr(data=df, key='ID', col='VAL', multiplier=1.5)
>>> res.collect()
     ID  IS_OUT_OF_RANGE
0    P1                0
1    P2                0
...
13  P14                0
14  P15                0
>>> stat.collect()
        STAT_NAME  STAT_VALUE
0  lower quartile        10.0
1  upper quartile        12.0