wilcoxon

hana_ml.algorithms.pal.stats.wilcoxon(data, col=None, mu=None, test_type=None, correction=None)

Perform a one-sample or paired two-sample non-parametric test (the Wilcoxon signed-rank test) to check whether the median of the data differs from a specified value.

Parameters:
data : DataFrame

DataFrame containing the data.

col : str or list of str, optional

Name(s) of the data column(s) to be tested.

If not given, the input DataFrame must have only one or two columns.

mu : float, optional

The location mu0 for the one sample test. It does not affect the two-sample test.

Defaults to 0.

test_type : {'two_sides', 'less', 'greater'}, optional

Specifies the type of the alternative hypothesis.

Defaults to 'two_sides'.

correction : bool, optional

Controls whether or not to include the continuity correction in the p-value calculation.

Defaults to True.

Returns:
DataFrame

Test results, structured as follows:

  • STAT_NAME column, name of the statistic.

  • STAT_VALUE column, value of the statistic.

Examples

Original data:

>>> df.collect()
      X
0    85
1    65
2    20
3    56
4    30
5    46
6    83
7    33
8    89
9    72
10   51
11   76
12   68
13   82
14   27
15   59
16   69
17   40
18   64
19    8

Perform the Wilcoxon signed-rank test:

>>> res = wilcoxon(df, mu=40, test_type='two_sides', correction=True)

Result:

>>> res.collect()
     STAT_NAME  STAT_VALUE
0    statistic  158.5
1    p-value    0.011228240845317039
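
Cross-checking the result locally:

The values above can be reproduced without a database connection using a minimal pure-Python sketch of the textbook signed-rank procedure. It is not the PAL implementation. The helper name signed_rank_sketch is hypothetical, and the sketch assumes that zero differences are discarded, tied absolute differences receive average ranks, and the p-value comes from the normal approximation with a tie-corrected variance. Its mu, test_type and correction arguments only mirror the parameters documented above.

```python
import math


def signed_rank_sketch(values, mu=0.0, test_type="two_sides", correction=True):
    """Textbook Wilcoxon signed-rank test using the normal approximation."""
    # Differences from the hypothesised location; exact zeros are discarded.
    diffs = [v - mu for v in values if v != mu]
    n = len(diffs)

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t  # contribution of this tie group to the variance
        i = j + 1

    # Test statistic: sum of the ranks of the positive differences.
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)

    # Mean and tie-corrected variance of the statistic under the null hypothesis.
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48

    # Continuity correction: move the statistic 0.5 towards the null mean.
    shift = 0.0
    if correction:
        if test_type == "greater":
            shift = 0.5
        elif test_type == "less":
            shift = -0.5
        else:
            shift = math.copysign(0.5, w_plus - mean) if w_plus != mean else 0.0
    z = (w_plus - mean - shift) / math.sqrt(var)

    # Tail probabilities of the standard normal distribution via erfc.
    if test_type == "greater":
        p = 0.5 * math.erfc(z / math.sqrt(2))
    elif test_type == "less":
        p = 0.5 * math.erfc(-z / math.sqrt(2))
    else:
        p = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p


# Column X from the example above.
x = [85, 65, 20, 56, 30, 46, 83, 33, 89, 72,
     51, 76, 68, 82, 27, 59, 69, 40, 64, 8]

stat, p_value = signed_rank_sketch(x, mu=40, test_type="two_sides", correction=True)
print(stat, p_value)  # statistic is 158.5; p-value is close to the documented 0.0112
```

Under these assumptions the sketch agrees with the statistic and p-value shown in the result table, which suggests (but does not confirm) that the documented example uses the same conventions. For paired data, the same procedure would be applied to the per-row differences of the two columns.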