ks_test

hana_ml.algorithms.pal.stats.ks_test(data, distribution_name=None, distribution_parameter=None, test_type=None)

This function performs a one-sample or two-sample Kolmogorov-Smirnov test for goodness of fit.

Parameters:
data : DataFrame

DataFrame containing the data.

distribution_name : str, optional

Name of the reference distribution for the one-sample test. If not provided, the first two columns of data are used to perform a two-sample test. Valid values:

  • 'beta'

  • 'cauchy'

  • 'chi_square'

  • 'exponential'

  • 'gamma'

  • 'lognormal'

  • 'normal'

  • 'student_t'

  • 'uniform'

  • 'weibull'

distribution_parameter : dict, optional

Parameters of the given distribution, passed as a dict keyed by parameter name. For example:

  • beta: {'shape1' : 0.5, 'shape2' : 0.5}

  • cauchy: {'location' : 0, 'scale' : 1}

  • chi_square: {'degrees_of_freedom' : 1}

  • exponential: {'rate' : 1}

  • gamma: {'shape' : 1, 'scale' : 1}

  • lognormal: {'location' : 0, 'scale' : 1}

  • normal: {'mean' : 0, 'sd' : 1}

  • student_t: {'degrees_of_freedom' : 1}

  • uniform: {'min' : 0, 'max' : 1}

  • weibull: {'shape' : 1, 'scale' : 1}

test_type : {'two-sided', 'less', 'greater'}, optional

Defines the null and alternative hypotheses.

Defaults to 'two-sided'.

Returns:
DataFrame

Returned statistics, structured as follows:

  • STAT_NAME: name of the statistic (the KS statistic or the p-value).

  • STAT_VALUE: value of the statistic.

Examples

Input data:

>>> df.collect()
     DATA
0    0.58
1    0.42
2    0.52
3    0.33
4    0.43
5    0.23
6    0.58
7    0.76
8    0.53
9    0.64

Perform a one-sample test against the uniform distribution on [0, 1]:

>>> res = ks_test(data=df,
                  distribution_name='uniform',
                  distribution_parameter={'min':0, 'max':1})
>>> res.collect()
               NAME    VALUE
0      KS statistic     0.26
1           p-value   0.4466
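
For intuition about what the KS statistic measures, the following is a minimal pure-Python sketch, not PAL's implementation. The helper names ks_one_sample, ks_two_sample and uniform_cdf are hypothetical and exist only for this illustration. The one-sample statistic is the largest gap between the empirical CDF of the sample and the reference CDF; the two-sample statistic is the largest gap between the two empirical CDFs. The p-value is not computed here.

```python
from bisect import bisect_right


def ks_one_sample(sample, cdf):
    """Return (D_plus, D_minus, D) for a sample against a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    # D+: largest amount by which the empirical CDF exceeds the reference CDF
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # D-: largest amount by which the reference CDF exceeds the empirical CDF
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus, d_minus, max(d_plus, d_minus)


def ks_two_sample(a, b):
    """Return the two-sided statistic max |ECDF_a(x) - ECDF_b(x)|."""
    sa, sb = sorted(a), sorted(b)
    # The gap between two step functions can only peak at an observed value
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def uniform_cdf(x, lo=0.0, hi=1.0):
    """CDF of the uniform distribution on [lo, hi]."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


# Same ten values as the input DataFrame in the example above
data = [0.58, 0.42, 0.52, 0.33, 0.43, 0.23, 0.58, 0.76, 0.53, 0.64]
d_plus, d_minus, d = ks_one_sample(data, uniform_cdf)
print(round(d_plus, 2), round(d_minus, 2), round(d, 2))  # 0.26 0.23 0.26
```

The two-sided value D = 0.26 agrees with the KS statistic in the result above. One-sided tests are conventionally based on D+ or D-; how PAL maps these to 'less' and 'greater' is not stated on this page.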