median_test_1samp

hana_ml.algorithms.pal.stats.median_test_1samp(data, col=None, mu=None, test_type=None, confidence_interval=None, thread_ratio=None)

Performs a one-sample non-parametric test to check whether the median of the data differs from a user-specified value.

Parameters:
data : DataFrame

DataFrame containing the data.

col : str, optional

Name of the data column that needs to be tested.

If not given, it defaults to the first column.

mu : float, optional

The hypothesized median that the data is tested against.

Defaults to 0.

test_type : {'two_sides', 'less', 'greater'}, optional

Specifies the alternative hypothesis type.

Defaults to 'two_sides'.

confidence_interval : float, optional

Confidence level of the interval for the estimated median.

Defaults to 0.95.

thread_ratio : float, optional

Proportion of available threads to use, from 0 to 1. A value of 0 uses a single thread, while 1 uses all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.

Defaults to 0.

Returns:
DataFrame

Test results, structured as follows:

  • STAT_NAME column, name of the statistic.

  • STAT_VALUE column, value of the statistic.
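
Because the result is a two-column name/value table, it is often convenient to flatten it into a dictionary before reading individual statistics. The following is a minimal sketch, not part of hana_ml: `stats_to_dict` is a hypothetical helper, and the rows are only those visible in the example output on this page (the elided rows are left out). The 0.05 threshold is an illustrative assumption.

```python
def stats_to_dict(rows):
    """Map each STAT_NAME to its STAT_VALUE.

    `rows` is any iterable of (name, value) pairs. This helper is
    hypothetical and only illustrates reading the result layout.
    """
    return {str(name): float(value) for name, value in rows}


# Rows copied from the example output on this page; elided rows are omitted.
rows = [
    ("total number", 20.0),
    ("CI for estimated median, upper bound", 83.0),
    ("sign test p-value", 0.066457),
]

stats = stats_to_dict(rows)

# Illustrative decision rule at an assumed 5% significance level.
reject_h0 = stats["sign test p-value"] < 0.05
```

With a collected result (a pandas DataFrame), the same helper could presumably be fed with `res.collect().itertuples(index=False)`.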

Examples

Original data:

>>> df.collect()
     X
0    85
1    65
...
18   64
19   8

Perform the one-sample median test:

>>> from hana_ml.algorithms.pal.stats import median_test_1samp
>>> res = median_test_1samp(data=df, mu=40, test_type='two_sides')

Result:

>>> res.collect()
                              STAT_NAME  STAT_VALUE
0                          total number   20.000000
...
5  CI for estimated median, upper bound   83.000000
6                     sign test p-value    0.066457
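
To illustrate what a sign test p-value measures, here is a minimal pure-Python sketch of the textbook exact sign test. It is not the PAL implementation and may not reproduce the value above: dropping observations equal to `mu` and using the exact binomial distribution are assumptions of this sketch, and `sign_test_p_value` is a hypothetical name. The `test_type` values mirror the parameter described above.

```python
from math import comb


def sign_test_p_value(values, mu=0.0, test_type="two_sides"):
    """Exact sign test p-value for H0: median == mu (illustrative sketch)."""
    values = list(values)
    # Observations equal to mu carry no sign information and are dropped
    # (a common textbook convention, assumed here).
    n_pos = sum(1 for v in values if v > mu)
    n_neg = sum(1 for v in values if v < mu)
    n = n_pos + n_neg
    if n == 0:
        return 1.0  # no evidence either way

    def binom_cdf(k):
        # P(X <= k) for X ~ Binomial(n, 0.5); an empty range gives 0.
        return sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n

    if test_type == "greater":
        # H1: median > mu, so many positive signs are evidence against H0.
        return 1.0 - binom_cdf(n_pos - 1)  # P(X >= n_pos)
    if test_type == "less":
        # H1: median < mu, so few positive signs are evidence against H0.
        return binom_cdf(n_pos)  # P(X <= n_pos)
    if test_type == "two_sides":
        # Double the smaller tail and cap at 1.
        return min(1.0, 2.0 * binom_cdf(min(n_pos, n_neg)))
    raise ValueError("test_type must be 'two_sides', 'less' or 'greater'")
```

For instance, ten observations that all lie above `mu` give a two-sided p-value of 2/1024, since only the most extreme outcome in each tail is at least as unusual as the observed one.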