cdf
- hana_ml.algorithms.pal.stats.cdf(data, distr_info, col=None, complementary=False)
Evaluates the cumulative distribution function (CDF), or the complementary cumulative distribution function (CCDF), of a given probability distribution at each value x of a variable.
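For reference, the two quantities can be written down directly for a normal distribution: CDF(x) = P(X <= x) and CCDF(x) = P(X > x) = 1 - CDF(x). The stdlib-only sketch below illustrates the math only; it is not how PAL computes these values. The helper names `normal_cdf` and `normal_ccdf` are hypothetical, and the `mean`/`variance` parameterization mirrors the normal-distribution example given for `distr_info`.

```python
import math

def normal_cdf(x: float, mean: float = 0.0, variance: float = 1.0) -> float:
    """P(X <= x) for X ~ Normal(mean, variance). Hypothetical helper, not part of hana_ml."""
    z = (x - mean) / math.sqrt(2.0 * variance)
    # 0.5 * erfc(-z) equals 0.5 * (1 + erf(z)), the standard normal CDF form.
    return 0.5 * math.erfc(-z)

def normal_ccdf(x: float, mean: float = 0.0, variance: float = 1.0) -> float:
    """P(X > x) = 1 - P(X <= x), computed with erfc to keep precision in the upper tail."""
    z = (x - mean) / math.sqrt(2.0 * variance)
    return 0.5 * math.erfc(z)

for x in (-1.0, 0.0, 1.96):
    print(f"x={x:5.2f}  CDF={normal_cdf(x):.4f}  CCDF={normal_ccdf(x):.4f}")
```

Computing the complement with `erfc` rather than as `1 - normal_cdf(x)` avoids cancellation when the CDF is close to 1: `normal_ccdf(10.0)` stays positive, while `1 - normal_cdf(10.0)` rounds to 0.0.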
- Parameters:
- data : DataFrame
DataFrame containing the values at which the CDF (or CCDF) is evaluated.
- distr_info : dict
A Python dictionary specifying the distribution name and its parameters. Supported distributions are uniform, normal, weibull and gamma, for example:
{'name':'normal', 'mean':0, 'variance':1.0}
{'name':'uniform', 'min':0.0, 'max':1.0}
{'name':'weibull', 'shape':1.0, 'scale':1.0}
{'name':'gamma', 'shape':1.0, 'scale':1.0}
The parameter values may be changed for any of the supported distributions listed above.
- col : str, optional
Name of the column in data to be processed. If not given, data must contain exactly one column.
- complementary : bool, optional
False: evaluate the CDF, P(X <= x).
True: evaluate the complementary CDF (CCDF), P(X > x) = 1 - CDF(x).
Defaults to False.
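To make the `distr_info` formats concrete, the sketch below evaluates the CDF or CCDF locally from a dictionary shaped like the examples above, using the standard closed forms for the uniform, normal and Weibull cases and a power series for the gamma case. `cdf_from_distr_info` and `_reg_lower_gamma` are hypothetical helpers for offline checking, not part of hana_ml. The sketch assumes that `scale` for weibull and gamma is a scale parameter (not a rate), as the key name suggests, and that `variance` is the variance rather than the standard deviation.

```python
import math

def _reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series.

    Adequate for the moderate arguments in this sketch; a production version
    would switch to a continued fraction when x is large relative to a.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a  # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while abs(term) > abs(total) * 1e-16 and n < 10_000:
        n += 1
        term *= x / (a + n)
        total += term
    # Multiply by x^a * e^(-x) / Gamma(a), computed in log space.
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def cdf_from_distr_info(x: float, distr_info: dict, complementary: bool = False) -> float:
    """Evaluate P(X <= x), or P(X > x) if complementary, for a distr_info-style dict."""
    name = distr_info["name"].lower()
    if name == "uniform":
        lo, hi = distr_info["min"], distr_info["max"]
        p = min(max((x - lo) / (hi - lo), 0.0), 1.0)
    elif name == "normal":
        z = (x - distr_info["mean"]) / math.sqrt(2.0 * distr_info["variance"])
        p = 0.5 * math.erfc(-z)
    elif name == "weibull":
        k, lam = distr_info["shape"], distr_info["scale"]
        p = 0.0 if x <= 0 else -math.expm1(-((x / lam) ** k))
    elif name == "gamma":
        k, theta = distr_info["shape"], distr_info["scale"]
        p = _reg_lower_gamma(k, x / theta)
    else:
        raise ValueError(f"unsupported distribution: {distr_info['name']!r}")
    return 1.0 - p if complementary else p


examples = [
    {'name': 'normal', 'mean': 0, 'variance': 1.0},
    {'name': 'uniform', 'min': 0.0, 'max': 1.0},
    {'name': 'weibull', 'shape': 1.0, 'scale': 1.0},
    {'name': 'gamma', 'shape': 1.0, 'scale': 1.0},
]
for info in examples:
    print(info['name'], cdf_from_distr_info(0.5, info), cdf_from_distr_info(0.5, info, True))
```

With shape 1.0 and scale 1.0, both the Weibull and the gamma distributions reduce to the exponential distribution, which makes those two entries a convenient sanity check against 1 - exp(-x).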
- Returns:
- DataFrame
CDF (or CCDF) results: the processed values together with a PROBABILITY column, as in the example below.
Examples
Input DataFrame and distribution parameters:
>>> df.collect()
   DATACOL
0     37.4
1    277.9
2    463.2
>>> distr_info = {'name':'weibull', 'shape':2.11995, 'scale':277.698}
Apply the cdf function:
>>> res = cdf(data=df, distr_info=distr_info)
>>> res.collect()
   DATACOL  PROBABILITY
0     37.4     0.014160
1    277.9     0.632688
2    463.2     0.948094
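The example output can be cross-checked against the closed-form Weibull CDF, F(x) = 1 - exp(-(x/scale)^shape) for x >= 0. The short sketch below recomputes the three probabilities locally with the shape and scale used in the example; `weibull_cdf` is a hypothetical helper for illustration, not a hana_ml function.

```python
import math

# Shape and scale taken from the Weibull example above.
SHAPE, SCALE = 2.11995, 277.698

def weibull_cdf(x: float, shape: float, scale: float) -> float:
    """F(x) = 1 - exp(-(x / scale) ** shape) for x >= 0, and 0 for x < 0."""
    if x <= 0.0:
        return 0.0
    # -expm1(-t) equals 1 - exp(-t) but stays accurate when t is small.
    return -math.expm1(-((x / scale) ** shape))

# Documented PROBABILITY values from res.collect() above.
documented = {37.4: 0.014160, 277.9: 0.632688, 463.2: 0.948094}
for x, p_doc in documented.items():
    print(f"x={x:>6}  local={weibull_cdf(x, SHAPE, SCALE):.6f}  documented={p_doc:.6f}")
```

The locally recomputed values agree with the documented PROBABILITY column to about six decimal places, which is consistent with the standard two-parameter Weibull form.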