ks_test

hana_ml.algorithms.pal.stats.ks_test(data, distribution_name=None, distribution_parameter=None, test_type=None)

This function performs a one-sample or two-sample Kolmogorov-Smirnov test for goodness of fit.

Parameters:

data : DataFrame

DataFrame containing the data.

distribution_name : str, optional

The name of the distribution to test against. If not provided, the first two columns of data are used to perform a two-sample test. Valid options:

'beta'

'cauchy'

'chi_square'

'exponential'

'gamma'

'lognormal'

'normal'

'student_t'

'uniform'

'weibull'
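To illustrate what the two-sample case measures when no distribution name is given, here is a minimal pure-Python sketch of the two-sample KS statistic. It is not the PAL implementation; the helper names and the two sample columns are hypothetical and exist only for this illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function of x."""
    ordered = sorted(sample)
    n = len(ordered)
    # bisect_right counts how many observations are <= x.
    return lambda x: bisect_right(ordered, x) / n


def two_sample_ks_statistic(first, second):
    """Largest absolute gap between the two empirical CDFs."""
    f1, f2 = ecdf(first), ecdf(second)
    # The gap only changes at observed values, so the pooled sample suffices.
    return max(abs(f1(x) - f2(x)) for x in list(first) + list(second))


# Hypothetical values standing in for the first two columns of `data`.
col_a = [1.0, 2.0, 3.0, 4.0]
col_b = [3.0, 4.0, 5.0, 6.0]
d_two_sample = two_sample_ks_statistic(col_a, col_b)
print(d_two_sample)
```

A statistic near 0 means the two empirical distributions almost coincide, while a value near 1 means they barely overlap.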

distribution_parameter : dict, optional

The parameters of the given distribution, as a dict keyed by parameter name. The expected keys for each distribution are shown below:

beta: {'shape1' : 0.5, 'shape2' : 0.5}

cauchy: {'location' : 0, 'scale' : 1}

chi_square: {'degrees_of_freedom' : 1}

exponential: {'rate' : 1}

gamma: {'shape' : 1, 'scale' : 1}

lognormal: {'location' : 0, 'scale' : 1}

normal: {'mean' : 0, 'sd' : 1}

student_t: {'degrees_of_freedom' : 1}

uniform: {'min' : 0, 'max' : 1}

weibull: {'shape' : 1, 'scale' : 1}
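As a rough illustration of how such a parameter dict pins down the reference distribution, the sketch below turns a name and a dict into a CDF for three of the listed distributions. The function `reference_cdf` is hypothetical and not part of hana_ml, and it assumes that 'sd' denotes the standard deviation and 'rate' the exponential rate parameter.

```python
import math


def reference_cdf(distribution_name, distribution_parameter):
    """Build a CDF from a name and parameter dict (illustrative subset only)."""
    p = distribution_parameter
    if distribution_name == 'uniform':
        lo, hi = p['min'], p['max']
        # Clamp to [0, 1] so values outside [min, max] are handled.
        return lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo)))
    if distribution_name == 'exponential':
        rate = p['rate']
        return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    if distribution_name == 'normal':
        mean, sd = p['mean'], p['sd']  # assumption: 'sd' is the standard deviation
        return lambda x: 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
    raise ValueError('distribution not covered by this sketch: ' + distribution_name)


uniform_cdf = reference_cdf('uniform', {'min': 0, 'max': 1})
normal_cdf = reference_cdf('normal', {'mean': 0, 'sd': 1})
print(uniform_cdf(0.5), normal_cdf(0.0))
```

The one-sample test compares the empirical CDF of the data against a function of this kind.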

test_type : {'two-sided', 'less', 'greater'}, optional

Defines the null and alternative hypotheses.

Defaults to 'two-sided'.
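For intuition about the one-sided alternatives, the sketch below computes the two directional KS statistics for a one-sample test: D+ (the empirical CDF lies above the reference CDF) and D- (it lies below). The two-sided statistic is the larger of the two. Which of 'less' and 'greater' corresponds to which direction in PAL is not stated here, so the code deliberately avoids that mapping; the sample values and function names are hypothetical.

```python
def one_sided_ks_statistics(sample, cdf):
    """Return (D+, D-) for a one-sample KS test against `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    # D+: largest amount by which the empirical CDF exceeds the reference CDF.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(ordered))
    # D-: largest amount by which the reference CDF exceeds the empirical CDF.
    d_minus = max(cdf(x) - i / n for i, x in enumerate(ordered))
    return d_plus, d_minus


def unit_uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


sample = [0.1, 0.2, 0.3, 0.9]  # hypothetical observations
d_plus, d_minus = one_sided_ks_statistics(sample, unit_uniform_cdf)
d_two_sided = max(d_plus, d_minus)
print(d_plus, d_minus, d_two_sided)
```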

Returns:

DataFrame

Returned statistics, structured as follows:

STAT_NAME: name of the statistic, including the KS statistic and the p-value.

STAT_VALUE: value of the statistic.

Examples

Input data:

>>> df.collect()
     DATA
  0.58
  0.42
  0.52
  0.33
  0.43
  0.23
  0.58
  0.76
  0.53
  0.64

Call the function:

>>> res = ks_test(data=df,
                  distribution_name='uniform',
                  distribution_parameter={'min':0, 'max':1})
>>> res.collect()
               NAME    VALUE
0      KS statistic     0.26
1           p-value   0.4466
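The KS statistic reported above can be cross-checked by hand. The following sketch recomputes it in plain Python for the same ten values against the uniform distribution on [0, 1]; it does not call PAL and does not attempt to reproduce the p-value.

```python
# Values from the DATA column of the example input.
data_column = [0.58, 0.42, 0.52, 0.33, 0.43, 0.23, 0.58, 0.76, 0.53, 0.64]

ordered = sorted(data_column)
n = len(ordered)

# For uniform(0, 1) the reference CDF is F(x) = x on [0, 1].
# Gap just after each jump of the empirical CDF (empirical above reference).
gaps_above = [(i + 1) / n - x for i, x in enumerate(ordered)]
# Gap just before each jump of the empirical CDF (reference above empirical).
gaps_below = [x - i / n for i, x in enumerate(ordered)]

ks_statistic = max(max(gaps_above), max(gaps_below))
print(round(ks_statistic, 4))  # 0.26
```

The maximum gap occurs at the ninth ordered value, 0.64, where the empirical CDF has reached 0.9.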