distribution_fit

hana_ml.algorithms.pal.stats.distribution_fit(data, distr_type, optimal_method=None, censored=False)

Fits a probability distribution to a variable from a series of measurements of that variable. Many probability distributions exist, and some fit the observed data more closely than others.

Parameters:
data : DataFrame

DataFrame containing the data.

distr_type : {'exponential', 'gamma', 'normal', 'poisson', 'uniform', 'weibull'}

Specifies the type of distribution to fit.

optimal_method : {'maximum_likelihood', 'median_rank'}, optional

Specifies the estimation method.

Defaults to 'median_rank' when distr_type is 'weibull', 'maximum_likelihood' otherwise.

censored : bool, optional

Specifies whether the data is censored or not. Only valid when distr_type is 'weibull'.

Defaults to False.
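
To illustrate what the 'median_rank' option of optimal_method refers to, the following is a minimal pure-Python sketch of median rank regression for a Weibull distribution. It is not the PAL implementation: the helper name weibull_median_rank_fit, the use of Bernard's approximation for the median ranks, and the synthetic sample are all assumptions made for illustration, and PAL may compute ranks or handle censoring differently.

```python
import math
import random


def weibull_median_rank_fit(samples):
    """Estimate Weibull (shape, scale) by median rank regression.

    Hypothetical helper for illustration only; not part of hana_ml.
    """
    xs = sorted(samples)
    n = len(xs)
    if n < 2 or xs[0] <= 0 or xs[0] == xs[-1]:
        raise ValueError("need at least two distinct, strictly positive samples")

    # Bernard's approximation to the median rank of the i-th order statistic
    # (an assumption; other rank formulas exist).
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

    # Linearise the Weibull CDF F(x) = 1 - exp(-(x / scale) ** shape):
    #   ln(-ln(1 - F)) = shape * ln(x) - shape * ln(scale)
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(1.0 - f)) for f in ranks]

    # Ordinary least squares of v on u: the slope is the shape estimate.
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    sxx = sum((a - u_mean) ** 2 for a in u)
    sxy = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    shape = sxy / sxx
    intercept = v_mean - shape * u_mean
    scale = math.exp(-intercept / shape)
    return shape, scale


# Synthetic data with known parameters (scale=2.0, shape=1.5).
rng = random.Random(0)
data = [rng.weibullvariate(2.0, 1.5) for _ in range(5000)]
shape, scale = weibull_median_rank_fit(data)
print(f"shape ~ {shape:.2f}, scale ~ {scale:.2f}")
```

Maximum likelihood instead chooses the parameters that maximise the likelihood of the observed sample. For the Weibull distribution this has no closed form and is solved numerically.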

Returns:
DataFrames

DataFrame 1 : fitting results, structured as follows:

  • NAME: name of distribution parameters.

  • VALUE: value of distribution parameters.

DataFrame 2 : fitting statistics.

Examples

>>> from hana_ml.algorithms.pal.stats import distribution_fit
>>> res, stats = distribution_fit(data=df,
...                               distr_type='weibull',
...                               optimal_method='maximum_likelihood')
>>> res.collect()
>>> stats.collect()