distribution_fit
- hana_ml.algorithms.pal.stats.distribution_fit(data, distr_type, optimal_method=None, censored=False)
This algorithm fits a probability distribution to a variable based on a series of measurements of that variable. Of the many candidate probability distributions, some describe the observed variable more closely than others.
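For distribution types whose maximum-likelihood estimates have a closed form, fitting reduces to simple sample statistics. The following pure-Python sketch shows this for four of the supported types. It is an independent illustration, not PAL's implementation. The function names and the parameter keys (RATE, MEAN, SD, LAMBDA, MIN, MAX) are hypothetical and are not necessarily the NAME values PAL returns. The gamma and Weibull estimates have no closed form and need an iterative solver.

```python
import math
from statistics import fmean

# Hypothetical helpers: closed-form maximum-likelihood estimates.
# Parameter keys are illustrative only; PAL may name or parametrize them differently.

def fit_exponential(xs):
    """MLE of the exponential rate: 1 / sample mean."""
    return {"RATE": 1.0 / fmean(xs)}

def fit_normal(xs):
    """MLE of the normal mean and standard deviation (divides by n, not n - 1)."""
    mu = fmean(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return {"MEAN": mu, "SD": sigma}

def fit_poisson(xs):
    """MLE of the Poisson mean: the sample mean."""
    return {"LAMBDA": fmean(xs)}

def fit_uniform(xs):
    """MLE of the uniform bounds: the sample minimum and maximum."""
    return {"MIN": min(xs), "MAX": max(xs)}

print(fit_normal([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))  # {'MEAN': 5.0, 'SD': 2.0}
```

Note that the normal MLE of the standard deviation divides by n, so it is slightly smaller than the usual unbiased sample estimate.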
- Parameters
- data : DataFrame
DataFrame containing the data.
- distr_type : {'exponential', 'gamma', 'normal', 'poisson', 'uniform', 'weibull'}
Specify the type of distribution to fit.
- optimal_method : {'maximum_likelihood', 'median_rank'}, optional
Specifies the estimation method.
Defaults to 'median_rank' when distr_type is 'weibull', and 'maximum_likelihood' otherwise.
- censored : bool, optional
Specifies whether data is censored or not. Only valid when distr_type is 'weibull'.
Defaults to False.
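The 'median_rank' method for Weibull fitting is commonly done by linearizing the Weibull CDF, ln(-ln(1 - F)) = k ln(t) - k ln(scale), and regressing on estimated median ranks. The sketch below shows that general technique for uncensored data. It is not taken from PAL: the choice of Bernard's approximation F_i = (i - 0.3) / (n + 0.4), the regression of y on x, and the helper name `weibull_median_rank` are all assumptions.

```python
import math

def weibull_median_rank(xs):
    """Hypothetical helper: Weibull (shape, scale) by median rank regression.

    Assumptions (not from the PAL docs): uncensored positive data, Bernard's
    approximation for the median ranks, and ordinary least squares of
    y = ln(-ln(1 - F)) on x = ln(t).
    """
    ts = sorted(xs)
    n = len(ts)
    px = [math.log(t) for t in ts]
    py = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mx = sum(px) / n
    my = sum(py) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(px, py))
    sxx = sum((a - mx) ** 2 for a in px)
    shape = sxy / sxx                # slope of the Weibull plot
    intercept = my - shape * mx      # equals -shape * ln(scale)
    scale = math.exp(-intercept / shape)
    return shape, scale
```

Censored data (censored=True) would need adjusted ranks, which this sketch does not handle.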
- Returns
- Two DataFrames
The first holds the fitting results, structured as follows:
NAME: name of the distribution parameter.
VALUE: value of the distribution parameter.
The second holds the fitting statistics, structured as follows:
STAT_NAME: name of the statistic.
STAT_VALUE: value of the statistic.
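Because the fitting results come back as NAME/VALUE rows, a typical next step is to turn them into a parameter dictionary and evaluate the fitted distribution. The sketch below does this for a Weibull fit using the standard CDF F(x) = 1 - exp(-(x / scale) ** shape). The helper names are hypothetical, and it assumes the rows are available as (NAME, VALUE) pairs, for example via zip(df['NAME'], df['VALUE']) on the pandas DataFrame returned by res.collect(). VALUE may hold strings, so numeric entries are converted with float().

```python
import math

def params_from_rows(rows):
    """Hypothetical helper: (NAME, VALUE) pairs -> dict, numeric values as float."""
    params = {}
    for name, value in rows:
        try:
            params[name] = float(value)
        except (TypeError, ValueError):
            params[name] = value  # e.g. DISTRIBUTIONNAME stays a string
    return params

def weibull_cdf(x, shape, scale):
    """Weibull CDF: 1 - exp(-(x / scale) ** shape) for x >= 0, else 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

# Rows mirroring the example output further below.
rows = [("DISTRIBUTIONNAME", "WEIBULL"), ("SCALE", "244.4"), ("SHAPE", "2.06698")]
p = params_from_rows(rows)
prob_le_200 = weibull_cdf(200.0, p["SHAPE"], p["SCALE"])  # P(measurement <= 200)
```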
Examples
Original data:
>>> df.collect()
     DATA
0    71.0
1    83.0
2    92.0
3   104.0
4   120.0
5   134.0
6   138.0
7   146.0
8   181.0
9   191.0
10  206.0
11  226.0
12  276.0
13  283.0
14  291.0
15  332.0
16  351.0
17  401.0
18  466.0
Perform the function:
>>> res, stats = distribution_fit(df, distr_type='weibull', optimal_method='maximum_likelihood')
>>> res.collect()
               NAME    VALUE
0  DISTRIBUTIONNAME  WEIBULL
1             SCALE    244.4
2             SHAPE  2.06698
>>> stats.collect()
Empty DataFrame
Columns: [STAT_NAME, STAT_VALUE]
Index: []
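To see roughly where the SCALE and SHAPE values come from, the following pure-Python sketch fits a two-parameter Weibull distribution to the same 19 measurements by maximum likelihood. It solves the standard profile-likelihood equation for the shape by bisection and then computes the scale in closed form. This is an independent illustration, not the PAL code path. It assumes uncensored, positive data with at least two distinct values, and `weibull_mle` is a hypothetical helper name. On this data it should land close to the values shown above, although small numerical differences from PAL are possible.

```python
import math

def weibull_mle(xs, lo=1e-3, hi=100.0, tol=1e-10):
    """Hypothetical helper: maximum-likelihood Weibull (shape, scale).

    Solves  sum(w_i * ln x_i) / sum(w_i) - 1/k - mean(ln x_i) = 0,  w_i = x_i ** k,
    for the shape k by bisection (the left-hand side increases with k),
    then sets scale = mean(x_i ** k) ** (1/k).
    """
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / n
    m = max(xs)
    rel = [math.log(x / m) for x in xs]  # <= 0, so exp(k * r) never overflows

    def g(k):
        w = [math.exp(k * r) for r in rel]  # proportional to x_i ** k
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    scale = m * (sum(math.exp(k * r) for r in rel) / n) ** (1.0 / k)
    return k, scale

data = [71.0, 83.0, 92.0, 104.0, 120.0, 134.0, 138.0, 146.0, 181.0, 191.0,
        206.0, 226.0, 276.0, 283.0, 291.0, 332.0, 351.0, 401.0, 466.0]
shape, scale = weibull_mle(data)
```

Weighting by (x / max(x)) ** k instead of x ** k leaves the ratios unchanged while keeping the sums in floating-point range for large k.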