median_test_1samp
- hana_ml.algorithms.pal.stats.median_test_1samp(data, col=None, mu=None, test_type=None, confidence_interval=None, thread_ratio=None)
Perform a one-sample non-parametric test to check whether the median of the data differs from a user-specified value.
- Parameters
- data : DataFrame
DataFrame containing the data.
- col : str, optional
Name of the data column that needs to be tested.
If not given, it defaults to the first column.
- mu : float, optional
The hypothesized median that the data is tested against.
Defaults to 0.
- test_type : {'two_sides', 'less', 'greater'}, optional
Specifies the alternative hypothesis type.
Defaults to "two_sides".
- confidence_interval : float, optional
Confidence level of the interval for the estimated median.
Defaults to 0.95.
- thread_ratio : float, optional
Specifies the ratio of the total number of threads that can be used by this function.
The value ranges from 0 to 1, where 0 means using only 1 thread and 1 means using at most all currently available threads.
Values outside this range are ignored, and the function determines the number of threads heuristically.
Defaults to 0.
- Returns
- DataFrame
Test results, structured as follows:
STAT_NAME column, name of the statistic.
STAT_VALUE column, value of the statistic.
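Because the result is in long format, with one row per statistic, a lookup by STAT_NAME is usually the first step after collecting it. The sketch below models the collected rows as plain (name, value) pairs, with values copied from the example in this entry. The significance level `alpha = 0.05` is an assumption made for illustration, not a library default.

```python
# Rows as (STAT_NAME, STAT_VALUE) pairs, copied from the example output.
# With a collected pandas result, similar pairs could be obtained via
# res.collect().itertuples(index=False, name=None).
rows = [
    ("total number", 20.0),
    ("number smaller than m0", 5.0),
    ("number larger than m0", 14.0),
    ("estimated median", 61.5),
    ("CI for estimated median, lower bound", 27.0),
    ("CI for estimated median, upper bound", 83.0),
    ("sign test p-value", 0.066457),
]

# Turn the long-format rows into a name -> value mapping.
stats = dict(rows)

alpha = 0.05  # assumed significance level, chosen for illustration
p_value = stats["sign test p-value"]
reject_null = p_value < alpha

print("estimated median:", stats["estimated median"])
print("reject null hypothesis at alpha =", alpha, ":", reject_null)
```

With these values the p-value is above 0.05, so the null hypothesis is not rejected at that level.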
Examples
Original data:
>>> df.collect()
     X
0   85
1   65
2   20
3   56
4   30
5   46
6   83
7   33
8   89
9   72
10  51
11  76
12  68
13  82
14  27
15  59
16  69
17  40
18  64
19   8
Perform the one-sample median test:
>>> from hana_ml.algorithms.pal.stats import median_test_1samp
>>> res = median_test_1samp(df, mu=40, test_type='two_sides')
Result:
>>> res.collect()
                              STAT_NAME  STAT_VALUE
0                          total number   20.000000
1                number smaller than m0    5.000000
2                 number larger than m0   14.000000
3                      estimated median   61.500000
4  CI for estimated median, lower bound   27.000000
5  CI for estimated median, upper bound   83.000000
6                     sign test p-value    0.066457
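To make the counts and the p-value above easier to interpret, here is a minimal pure-Python sketch of a sign test on the same 20 values. It is an illustration only and not the PAL implementation. The helper names (`sign_counts`, `exact_p_value`, `normal_approx_two_sided`) are hypothetical, and two behaviours are assumptions: values equal to `mu` are dropped, and "less" means the alternative "median < mu". A two-sided normal approximation with continuity correction gives roughly 0.0665 on this data, close to the value reported above. Whether PAL actually computes it that way is also an assumption.

```python
import math
from statistics import median

values = [85, 65, 20, 56, 30, 46, 83, 33, 89, 72,
          51, 76, 68, 82, 27, 59, 69, 40, 64, 8]


def sign_counts(data, mu):
    """Count values strictly below and strictly above mu (ties are dropped)."""
    n_less = sum(1 for v in data if v < mu)
    n_greater = sum(1 for v in data if v > mu)
    return n_less, n_greater


def exact_p_value(n_less, n_greater, test_type="two_sides"):
    """Exact binomial sign test p-value under H0: P(value > mu) = 0.5."""
    n = n_less + n_greater

    def lower_tail(k):
        # P(K <= k) for K ~ Binomial(n, 0.5)
        return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n

    if test_type == "less":       # assumed alternative: median < mu
        return lower_tail(n_greater)
    if test_type == "greater":    # assumed alternative: median > mu
        return lower_tail(n_less)
    if test_type == "two_sides":
        return min(1.0, 2 * lower_tail(min(n_less, n_greater)))
    raise ValueError(f"unknown test_type: {test_type!r}")


def normal_approx_two_sided(n_less, n_greater):
    """Two-sided p-value via normal approximation with continuity correction."""
    n = n_less + n_greater
    k = min(n_less, n_greater)
    z = (k + 0.5 - n / 2) / math.sqrt(n / 4)
    # 2 * Phi(z) for z <= 0 equals erfc(-z / sqrt(2)); cap at 1.
    return min(1.0, math.erfc(-z / math.sqrt(2)))


n_less, n_greater = sign_counts(values, mu=40)
print("smaller than mu:", n_less, "larger than mu:", n_greater)
print("sample median:", median(values))
print("exact two-sided p-value:", exact_p_value(n_less, n_greater))
print("normal-approximation p-value:", normal_approx_two_sided(n_less, n_greater))
```

The exact binomial p-value (about 0.0636) and the approximation differ slightly for a sample this small, which is worth keeping in mind when comparing results across tools.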