distribution_fit

hana_ml.algorithms.pal.stats.distribution_fit(data, distr_type, optimal_method=None, censored=False)

This function fits a probability distribution to a variable, using a series of measurements of that variable. Some probability distributions fit the observed data more closely than others, so the quality of the fit depends on the distribution chosen through distr_type.
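Fitting here means estimating a distribution's parameters from the measurements. For several of the families accepted by distr_type, the maximum-likelihood estimates have closed forms. The local sketch below illustrates them. It is not PAL's implementation (PAL runs in the database), and the helper name closed_form_mle and the parameter keys it returns (LAMBDA, MEAN, SD, MIN, MAX) are hypothetical, so they need not match the NAME column PAL produces. Gamma and Weibull have no closed-form MLE and need an iterative solver.

```python
import math
from statistics import fmean


def closed_form_mle(values, distr_type):
    """Closed-form maximum-likelihood estimates for a few distribution families.

    Illustrative sketch only: the returned parameter names are hypothetical
    and are not guaranteed to match PAL's output.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = fmean(values)
    if distr_type == "exponential":
        # Rate parameter: reciprocal of the sample mean.
        return {"LAMBDA": 1.0 / mean}
    if distr_type == "poisson":
        # The Poisson MLE of lambda is the sample mean.
        return {"LAMBDA": mean}
    if distr_type == "normal":
        # The MLE of the variance divides by n, not n - 1.
        var = sum((x - mean) ** 2 for x in values) / n
        return {"MEAN": mean, "SD": math.sqrt(var)}
    if distr_type == "uniform":
        # Support endpoints are the sample minimum and maximum.
        return {"MIN": min(values), "MAX": max(values)}
    # gamma and weibull require an iterative solver.
    raise ValueError(f"no closed-form MLE for {distr_type!r}")
```

For example, closed_form_mle([1.0, 3.0], "normal") gives a mean of 2.0 and a standard deviation of 1.0.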

Parameters:

data : DataFrame

DataFrame containing the data.

distr_type : {'exponential', 'gamma', 'normal', 'poisson', 'uniform', 'weibull'}

Specifies the type of distribution to fit.

optimal_method : {'maximum_likelihood', 'median_rank'}, optional

Specifies the estimation method.

Defaults to 'median_rank' when distr_type is 'weibull' and to 'maximum_likelihood' otherwise.

censored : bool, optional

Specifies whether the data is censored.

Only valid when distr_type is 'weibull'.

Defaults to False.
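For Weibull fits, optimal_method defaults to 'median_rank'. Median-rank regression is commonly done by assigning each sorted observation an approximate cumulative probability and fitting a straight line on Weibull probability scales, since ln(-ln(1 - F)) = k ln t - k ln(scale). The sketch below assumes Bernard's approximation F_i = (i - 0.3) / (n + 0.4) and least squares of y on ln t. Both are common choices, but PAL's exact variant is not documented here. The helper weibull_median_rank is hypothetical and handles complete (uncensored) data only.

```python
import math


def weibull_median_rank(values):
    """Estimate Weibull (scale, shape) by median-rank regression.

    Assumptions: Bernard's approximation F_i = (i - 0.3) / (n + 0.4) for the
    median ranks, and least squares of y = ln(-ln(1 - F)) on x = ln(t).
    Uncensored data only; PAL's internal variant may differ.
    """
    t = sorted(values)
    n = len(t)
    if n < 2 or t[0] <= 0:
        raise ValueError("need at least two positive observations")
    xs = [math.log(v) for v in t]
    # Median rank of the i-th smallest observation (1-based i).
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    ys = [math.log(-math.log(1.0 - f)) for f in ranks]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all observations are equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    shape = sxy / sxx                      # slope of the fitted line is the shape
    intercept = y_bar - shape * x_bar      # intercept equals -shape * ln(scale)
    scale = math.exp(-intercept / shape)
    return scale, shape
```

Because the regression is fitted on a linearized scale, its estimates generally differ somewhat from the maximum-likelihood ones, which is why optimal_method is exposed as a choice.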

Returns:

DataFrames

Returns two DataFrames, in this order.

Fitting results, structured as follows:

NAME: name of the distribution parameter.

VALUE: value of the distribution parameter.

Fitting statistics, structured as follows:

STAT_NAME: name of the statistic.

STAT_VALUE: value of the statistic.
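Both returned DataFrames are two-column name/value tables, so it can be convenient to turn them into a Python dict after calling collect(). The example output below mixes a distribution name ('WEIBULL') with numeric estimates in the VALUE column. The column may therefore be stored as text, although that is an assumption here. The hypothetical helper rows_to_dict converts numeric-looking entries to float and keeps everything else unchanged. It accepts any iterable of (name, value) pairs, such as res.collect().itertuples(index=False).

```python
def rows_to_dict(rows):
    """Turn (name, value) rows into a dict, converting numeric entries to float.

    `rows` can be any iterable of 2-tuples, for example
    res.collect().itertuples(index=False) on a fitting-results DataFrame.
    """
    out = {}
    for name, value in rows:
        try:
            out[name] = float(value)
        except (TypeError, ValueError):
            # Keep non-numeric entries such as a distribution name as-is.
            out[name] = value
    return out
```

The same helper works for the statistics DataFrame, whose columns are STAT_NAME and STAT_VALUE.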

Examples

Original data:

>>> df.collect()
    DATA
    71.0
    83.0
    92.0
   104.0
   120.0
   134.0
   138.0
   146.0
   181.0
   191.0
   206.0
   226.0
   276.0
   283.0
   291.0
   332.0
   351.0
   401.0
   466.0

Fit a Weibull distribution by maximum likelihood:

>>> from hana_ml.algorithms.pal.stats import distribution_fit
>>> res, stats = distribution_fit(df, distr_type='weibull', optimal_method='maximum_likelihood')
>>> res.collect()
               NAME    VALUE
0  DISTRIBUTIONNAME  WEIBULL
1             SCALE    244.4
2             SHAPE  2.06698
>>> stats.collect()
Empty DataFrame
Columns: [STAT_NAME, STAT_VALUE]
Index: []
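To see what the maximum-likelihood Weibull fit in this example computes, the local sketch below solves the standard Weibull likelihood equations for complete (uncensored) data. The profile score equation for the shape k, sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0, is increasing in k, so it can be solved by bisection. The scale then follows in closed form as (mean(x^k))^(1/k). The helper weibull_mle and its bisection solver are illustrative assumptions, not PAL's implementation.

```python
import math


def weibull_mle(values, tol=1e-10):
    """Maximum-likelihood Weibull fit for complete (uncensored) data.

    Solves the profile score equation for the shape k by bisection, then
    computes the scale in closed form. Returns (scale, shape).
    """
    if len(values) < 2 or min(values) <= 0:
        raise ValueError("need at least two positive observations")
    n = len(values)
    x_max = max(values)
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / n
    if max(logs) - mean_log < 1e-12:
        raise ValueError("all observations are equal; shape is unbounded")
    # Divide by x_max so y**k never overflows; the weighted ratio is unchanged.
    ys = [v / x_max for v in values]

    def score(k):
        w = [y ** k for y in ys]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:  # score increases with k: widen until it changes sign
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    # scale**k = mean(x**k), rewritten in terms of the rescaled values y.
    scale = x_max * (sum(y ** shape for y in ys) / n) ** (1.0 / shape)
    return scale, shape
```

Applied to the 19 values in the example, the output can be compared with the SCALE and SHAPE rows above. Small differences can come from solver tolerances or from details of PAL's implementation that are not documented here.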