ks_test

hana_ml.algorithms.pal.stats.ks_test(data, distribution_name=None, distribution_parameter=None, test_type=None)

Performs a one-sample or two-sample Kolmogorov-Smirnov test for goodness of fit.

Parameters:
data : DataFrame

HANA DataFrame containing the data.

distribution_name : str, optional

Name of the reference distribution for the one-sample test. If not provided, the first two columns of data are used to perform a two-sample test. Supported values:

  • 'beta'

  • 'cauchy'

  • 'chi_square'

  • 'exponential'

  • 'gamma'

  • 'lognormal'

  • 'normal'

  • 'student_t'

  • 'uniform'

  • 'weibull'

distribution_parameter : dict, optional

Parameters of the distribution named in distribution_name, given as a dict keyed by parameter name. The keys for each distribution are shown below with example values:

  • beta: {'shape1' : 0.5, 'shape2' : 0.5}

  • cauchy: {'location' : 0, 'scale' : 1}

  • chi_square: {'degrees_of_freedom' : 1}

  • exponential: {'rate' : 1}

  • gamma: {'shape' : 1, 'scale' : 1}

  • lognormal: {'location' : 0, 'scale' : 1}

  • normal: {'mean' : 0, 'sd' : 1}

  • student_t: {'degrees_of_freedom' : 1}

  • uniform: {'min' : 0, 'max' : 1}

  • weibull: {'shape' : 1, 'scale' : 1}
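
To make the role of distribution_name and distribution_parameter concrete, here is a minimal pure-Python sketch of the one-sample Kolmogorov-Smirnov distance against a uniform distribution. It does not call PAL or HANA; the helper name ks_distance_uniform is hypothetical, and the statistics PAL actually returns may be computed or reported differently.

```python
def ks_distance_uniform(sample, params):
    """Hypothetical helper: largest gap between the empirical CDF of
    `sample` and the CDF of Uniform(params['min'], params['max'])."""
    lo, hi = params['min'], params['max']
    xs = sorted(sample)
    n = len(xs)

    def cdf(x):
        # CDF of the uniform distribution on [lo, hi]
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)

    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so check the gap on both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


distance = ks_distance_uniform([0.25, 0.5, 0.75], {'min': 0, 'max': 1})
```

The dict passed here uses the same keys as the 'uniform' entry above, which is the shape ks_test expects for distribution_parameter.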

test_type : {'two-sided', 'less', 'greater'}, optional

Defines the null and alternative hypotheses.

Defaults to 'two-sided'.

Returns:
DataFrame

Statistics.

Examples

>>> from hana_ml.algorithms.pal.stats import ks_test
>>> res = ks_test(data=df,
...               distribution_name='uniform',
...               distribution_parameter={'min': 0, 'max': 1})
>>> res.collect()
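
When distribution_name is omitted, the test compares two samples instead of a sample and a reference distribution. The following is a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov distance, intended only to illustrate the idea. It does not call PAL; the helper name ks_distance_two_sample is hypothetical, and PAL's output format is not reproduced here.

```python
from bisect import bisect_right


def ks_distance_two_sample(a, b):
    """Hypothetical helper: largest gap between the empirical CDFs
    of two samples `a` and `b`."""
    xs, ys = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so check each one.
    for v in xs + ys:
        fa = bisect_right(xs, v) / len(xs)  # fraction of a that is <= v
        fb = bisect_right(ys, v) / len(ys)  # fraction of b that is <= v
        d = max(d, abs(fa - fb))
    return d


separated = ks_distance_two_sample([1, 2, 3], [4, 5, 6])
identical = ks_distance_two_sample([1, 2, 3], [1, 2, 3])
```

A distance of 1.0 means the two samples do not overlap at all, while 0.0 means their empirical distributions coincide.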