cdf

hana_ml.algorithms.pal.stats.cdf(data, distr_info, col=None, complementary=False)

For a given probability distribution, this function evaluates the cumulative distribution function (CDF), P(X <= x), or the complementary cumulative distribution function (CCDF), P(X > x), at each value x in the input data.

Parameters:
data : DataFrame

DataFrame containing the data.

distr_info : dict

A Python dictionary that contains the distribution name and its parameters. Supported distributions are uniform, normal, weibull and gamma. For example:

  • {'name':'normal', 'mean':0, 'variance':1.0}.

  • {'name':'uniform', 'min':0.0, 'max':1.0}.

  • {'name':'weibull', 'shape':1.0, 'scale':1.0}.

  • {'name':'gamma', 'shape':1.0, 'scale':1.0}.

You may change the parameter values for any of the supported distributions listed above.
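
As a minimal sketch, the dictionary can be checked locally before it is passed to cdf. The helper check_distr_info and the REQUIRED_KEYS table below are hypothetical and not part of hana_ml; the key sets are taken only from the four examples above, on the assumption that each example lists every key its distribution expects.

```python
# Hypothetical helper, not part of hana_ml: checks that a distr_info
# dictionary uses the keys shown in the documented examples.
REQUIRED_KEYS = {
    'normal': {'mean', 'variance'},
    'uniform': {'min', 'max'},
    'weibull': {'shape', 'scale'},
    'gamma': {'shape', 'scale'},
}


def check_distr_info(distr_info):
    """Return distr_info unchanged, or raise ValueError if it looks malformed."""
    name = distr_info.get('name')
    if name not in REQUIRED_KEYS:
        raise ValueError("unsupported distribution: %r" % (name,))
    missing = REQUIRED_KEYS[name] - set(distr_info)
    if missing:
        raise ValueError("missing parameters for %s: %s" % (name, sorted(missing)))
    return distr_info


# A Weibull specification with illustrative parameter values.
distr_info = check_distr_info({'name': 'weibull', 'shape': 2.11995, 'scale': 277.698})
```

The check runs entirely on the client, so a misspelled key is reported before any call reaches the database.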

col : str, optional

Name of the column in the DataFrame to be processed. If not given, the input DataFrame data should have only one column.

complementary : bool, optional

Whether to evaluate the CDF or the CCDF:

  • False: 'cdf'.

  • True: 'ccdf'.

Defaults to False.
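
The two settings are related by CCDF(x) = 1 - CDF(x). The following is a minimal local sketch of that relationship for the standard normal distribution, using only Python's statistics module rather than hana_ml or PAL. The function local_cdf is an illustrative name. Note that statistics.NormalDist takes a standard deviation, whereas the distr_info example for the normal distribution specifies a variance, so the square root is taken first.

```python
from math import sqrt
from statistics import NormalDist

# Same shape as the documented example for the normal distribution.
distr_info = {'name': 'normal', 'mean': 0, 'variance': 1.0}

# NormalDist expects sigma (standard deviation), not the variance.
dist = NormalDist(mu=distr_info['mean'], sigma=sqrt(distr_info['variance']))


def local_cdf(x, complementary=False):
    """Local stand-in: return P(X <= x), or P(X > x) when complementary is True."""
    p = dist.cdf(x)
    return 1.0 - p if complementary else p


p_lower = local_cdf(1.0)                      # lower-tail probability
p_upper = local_cdf(1.0, complementary=True)  # upper-tail probability
print(p_lower, p_upper, p_lower + p_upper)
```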

Returns:
DataFrame

CDF results.

Examples

Original data:

>>> df.collect()
    DATACOL
0     37.4
1    277.9
2    463.2
>>> df_distri.collect()
               NAME    VALUE
0  DistributionName  Weibull
1             Shape  2.11995
2             Scale  277.698

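Apply the cdf function, passing the distribution shown above as a dictionary:

>>> distr_info = {'name': 'weibull', 'shape': 2.11995, 'scale': 277.698}
>>> res = cdf(df, distr_info)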
>>> res.collect()
   DATACOL  PROBABILITY
0     37.4     0.014160
1    277.9     0.632688
2    463.2     0.948094
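
The reported probabilities can be cross-checked locally. The sketch below assumes the standard two-parameter Weibull CDF, 1 - exp(-(x/scale)^shape); that parametrization is an assumption, since it is not stated above. The helper weibull_cdf is illustrative and not part of hana_ml.

```python
from math import exp

# Parameters and results copied from the example above.
shape, scale = 2.11995, 277.698
reported = {37.4: 0.014160, 277.9: 0.632688, 463.2: 0.948094}


def weibull_cdf(x, shape, scale):
    """Standard two-parameter Weibull CDF (assumed parametrization)."""
    if x < 0:
        return 0.0
    return 1.0 - exp(-((x / scale) ** shape))


# Compare the local evaluation with each reported PROBABILITY value.
local = {x: weibull_cdf(x, shape, scale) for x in reported}
for x, p in reported.items():
    print(x, round(local[x], 6), p)
```

Small differences in the last digits are possible, because the displayed shape and scale values may themselves be rounded.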