ks_test

hana_ml.algorithms.pal.stats.ks_test(data, distribution_name=None, distribution_parameter=None, test_type=None)

Performs a one-sample or two-sample Kolmogorov-Smirnov test for goodness of fit.
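For reference, the one-sample statistic is D = sup_x |F_n(x) - F(x)|, where F_n is the empirical CDF of the sample and F is the hypothesized CDF. The plain-Python sketch below computes D for illustration only. It is not the SAP HANA PAL implementation that ks_test calls, it produces no p-value, and the function names are hypothetical.

```python
# Illustration only: a local computation of the one-sample KS statistic.
# This is not the SAP HANA PAL code that ks_test runs, and no p-value is computed.
from typing import Callable, Sequence


def ks_statistic_one_sample(sample: Sequence[float],
                            cdf: Callable[[float], float]) -> float:
    """Return D = sup_x |F_n(x) - F(x)|, F_n being the empirical CDF of sample."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (i - 1)/n to i/n at x, so both sides of the jump are checked.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform01_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1] (min=0, max=1)."""
    return min(max(x, 0.0), 1.0)


# Toy sample tested against uniform(0, 1).
print(ks_statistic_one_sample([0.1, 0.4, 0.7, 0.9], uniform01_cdf))  # close to 0.2
```

The supremum only needs to be checked at the sorted sample points, because F_n is a step function and F is non-decreasing.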

Parameters:

data : DataFrame

HANA DataFrame containing the data.

distribution_name : str, optional

The distribution name. If not provided, the first two columns of data are used to perform a two-sample test. Supported values:

'beta'
'cauchy'
'chi_square'
'exponential'
'gamma'
'lognormal'
'normal'
'student_t'
'uniform'
'weibull'
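When distribution_name is omitted, the first two columns are compared as two samples, and the statistic is D = sup_x |F_a(x) - F_b(x)| between their empirical CDFs. A minimal plain-Python sketch of that statistic follows. ks_statistic_two_sample is a hypothetical name, the computation runs locally rather than in PAL, and no p-value is produced.

```python
# Illustration only: a local computation of the two-sample KS statistic,
# not the SAP HANA PAL implementation used by ks_test.
from bisect import bisect_right
from typing import Sequence


def ks_statistic_two_sample(a: Sequence[float], b: Sequence[float]) -> float:
    """Return D = sup_x |F_a(x) - F_b(x)| for the empirical CDFs of a and b."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both empirical CDFs only change at observed values, so evaluating them
    # at every point of the pooled sample is enough to find the supremum.
    for t in xs + ys:
        fa = bisect_right(xs, t) / n  # fraction of a that is <= t
        fb = bisect_right(ys, t) / m  # fraction of b that is <= t
        d = max(d, abs(fa - fb))
    return d


print(ks_statistic_two_sample([1, 3], [2, 4]))  # 0.5
```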

distribution_parameter : dict, optional

The parameters of the distribution given by distribution_name, as a dict keyed by parameter name. The keys for each distribution are:

beta: {'shape1' : 0.5, 'shape2' : 0.5}
cauchy: {'location' : 0, 'scale' : 1}
chi_square: {'degrees_of_freedom' : 1}
exponential: {'rate' : 1}
gamma: {'shape' : 1, 'scale' : 1}
lognormal: {'location' : 0, 'scale' : 1}
normal: {'mean' : 0, 'sd' : 1}
student_t: {'degrees_of_freedom' : 1}
uniform: {'min' : 0, 'max' : 1}
weibull: {'shape' : 1, 'scale' : 1}
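To show what the parameter keys stand for, the sketch below turns a few of the dicts listed above into cumulative distribution functions using only the Python standard library. make_cdf is a hypothetical helper, not part of hana_ml. It covers only normal, uniform, exponential and cauchy, using the usual textbook parameterizations, which are assumed rather than confirmed to match PAL's.

```python
# Hypothetical helper (not part of hana_ml): maps a distribution_parameter-style
# dict to a CDF, assuming standard textbook parameterizations.
import math
from statistics import NormalDist
from typing import Callable, Dict


def make_cdf(distribution_name: str,
             distribution_parameter: Dict[str, float]) -> Callable[[float], float]:
    p = distribution_parameter
    if distribution_name == 'normal':
        return NormalDist(mu=p['mean'], sigma=p['sd']).cdf
    if distribution_name == 'uniform':
        lo, hi = p['min'], p['max']
        return lambda x: min(max((x - lo) / (hi - lo), 0.0), 1.0)
    if distribution_name == 'exponential':
        rate = p['rate']
        return lambda x: (1.0 - math.exp(-rate * x)) if x > 0 else 0.0
    if distribution_name == 'cauchy':
        loc, scale = p['location'], p['scale']
        return lambda x: 0.5 + math.atan((x - loc) / scale) / math.pi
    raise ValueError(f'not covered by this sketch: {distribution_name}')


print(make_cdf('normal', {'mean': 0, 'sd': 1})(0.0))  # 0.5
```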

test_type : {'two-sided', 'less', 'greater'}, optional

Defines the null and alternative hypotheses.

Defaults to 'two-sided'.
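Each alternative corresponds to a different statistic. Assuming ks_test follows the convention of R's ks.test (an assumption, not stated in this reference), 'greater' uses D+ = sup_x (F_n(x) - F(x)), 'less' uses D- = sup_x (F(x) - F_n(x)), and 'two-sided' uses the larger of the two. A plain-Python sketch with a hypothetical function name:

```python
# Illustration only, assuming the R ks.test convention for one-sided tests;
# this is not the SAP HANA PAL implementation and computes no p-values.
from typing import Callable, Dict, Sequence


def ks_statistics_by_test_type(sample: Sequence[float],
                               cdf: Callable[[float], float]) -> Dict[str, float]:
    xs = sorted(sample)
    n = len(xs)
    # D+ : largest amount by which the empirical CDF rises above F.
    d_plus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))
    # D- : largest amount by which F rises above the empirical CDF.
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return {'greater': d_plus, 'less': d_minus, 'two-sided': max(d_plus, d_minus)}


print(ks_statistics_by_test_type([0.1, 0.4, 0.7, 0.9], lambda x: min(max(x, 0.0), 1.0)))
```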

Returns:

DataFrame: Statistics of the Kolmogorov-Smirnov test.

Examples

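>>> from hana_ml.algorithms.pal.stats import ks_test
>>> # df is assumed to be an existing HANA DataFrame holding the sample
>>> res = ks_test(data=df,
...               distribution_name='uniform',
...               distribution_parameter={'min': 0, 'max': 1})
>>> res.collect()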