median_test_1samp

hana_ml.algorithms.pal.stats.median_test_1samp(data, col=None, mu=None, test_type=None, confidence_interval=None, thread_ratio=None)

Performs a one-sample non-parametric test (the sign test) to check whether the median of the data differs from a user-specified value.

Parameters
data : DataFrame

DataFrame containing the data.

col : str, optional

Name of the data column that needs to be tested.

If not given, it defaults to the first column.

mu : float, optional

The hypothesized median of the data (reported as m0 in the results). It is only used by the one-sample test.

Defaults to 0.

test_type : {'two_sides', 'less', 'greater'}, optional

Specifies the type of the alternative hypothesis.

Defaults to 'two_sides'.

confidence_interval : float, optional

Confidence level of the interval computed for the estimated median.

Defaults to 0.95.

thread_ratio : float, optional

Specifies the ratio of the total number of threads that can be used by this function.

The value ranges from 0 to 1, where 0 means using only 1 thread and 1 means using at most all the currently available threads.

Values outside this range are ignored, and the function heuristically determines the number of threads to use.

Defaults to 0.

Returns
DataFrame

Test results, structured as follows:

  • STAT_NAME column, name of the statistic.

  • STAT_VALUE column, value of the statistic.
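Because the result is a long-format table with one statistic per row, it is often convenient to turn it into a name-to-value mapping before reading individual statistics. The following is a minimal sketch, not part of hana_ml: `stats_to_dict` is a hypothetical helper, and the rows are copied by hand from the Examples section. With a collected pandas DataFrame, the rows could instead come from `res.collect().itertuples(index=False)`.

```python
def stats_to_dict(rows):
    """Map each STAT_NAME to its STAT_VALUE (hypothetical helper)."""
    return {name: value for name, value in rows}


# Rows copied by hand from the example output; in practice they would
# come from the collected result of median_test_1samp.
rows = [
    ("total number", 20.0),
    ("number smaller than m0", 5.0),
    ("number larger than m0", 14.0),
    ("estimated median", 61.5),
    ("CI for estimated median, lower bound", 27.0),
    ("CI for estimated median, upper bound", 83.0),
    ("sign test p-value", 0.066457),
]

stats = stats_to_dict(rows)
p_value = stats["sign test p-value"]

# Compare against a significance level chosen by the user (0.05 here).
reject_null = p_value < 0.05
print(p_value, reject_null)
```

With these values the p-value is above 0.05, so the hypothesis that the median equals 40 would not be rejected at that level.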

Examples

Original data:

>>> df.collect()
     X
0    85
1    65
2    20
3    56
4    30
5    46
6    83
7    33
8    89
9    72
10   51
11   76
12   68
13   82
14   27
15   59
16   69
17   40
18   64
19   8

Perform the one-sample median test:

>>> from hana_ml.algorithms.pal.stats import median_test_1samp
>>> res = median_test_1samp(df, mu=40, test_type='two_sides')

Result:

>>> res.collect()
                              STAT_NAME  STAT_VALUE
0                          total number   20.000000
1                number smaller than m0    5.000000
2                 number larger than m0   14.000000
3                      estimated median   61.500000
4  CI for estimated median, lower bound   27.000000
5  CI for estimated median, upper bound   83.000000
6                     sign test p-value    0.066457
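To make the numbers above easier to interpret, here is a plain-Python sketch of a two-sided sign test on the same data. It is an illustration, not the PAL implementation: `sign_test_two_sided` is a hypothetical helper, and the use of a continuity-corrected normal approximation is an assumption, chosen because it happens to reproduce the p-value shown above (an exact binomial computation would give about 0.064 instead). The confidence interval bounds are not reproduced here.

```python
import math
import statistics


def sign_test_two_sided(values, mu):
    """Two-sided sign test via a continuity-corrected normal approximation.

    Assumption: this approximation is used only because it matches the
    example output; it is not confirmed to be what PAL does internally.
    """
    n_less = sum(1 for v in values if v < mu)
    n_greater = sum(1 for v in values if v > mu)
    n = n_less + n_greater  # observations equal to mu are dropped
    if n == 0:
        return n_less, n_greater, 1.0
    k = min(n_less, n_greater)
    # Under H0 the count is Binomial(n, 0.5): mean n/2, std sqrt(n)/2.
    z = (k + 0.5 - n / 2) / (math.sqrt(n) / 2)
    z = min(z, 0.0)
    # Two-sided p-value 2 * Phi(z), written with the complementary
    # error function: 2 * Phi(z) == erfc(-z / sqrt(2)).
    p_value = min(1.0, math.erfc(-z / math.sqrt(2)))
    return n_less, n_greater, p_value


data = [85, 65, 20, 56, 30, 46, 83, 33, 89, 72,
        51, 76, 68, 82, 27, 59, 69, 40, 64, 8]

n_less, n_greater, p_value = sign_test_two_sided(data, mu=40)
print(n_less, n_greater)        # 5 14
print(statistics.median(data))  # 61.5
print(round(p_value, 4))        # 0.0665
```

The one observation equal to 40 is excluded from both counts, which is why the smaller and larger counts sum to 19 rather than 20.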