quantile

hana_ml.algorithms.pal.stats.quantile(data, distr_info, col=None, complementary=False)

This algorithm evaluates the inverse of the cumulative distribution function (CDF) or the inverse of the complementary cumulative distribution function (CCDF) for a given probability p and probability distribution.
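As a rough illustration of what "inverse of the CDF" and "inverse of the CCDF" mean, the sketch below uses Python's standard-library `statistics.NormalDist` rather than hana_ml or PAL. It is only a local analogy, assuming a standard normal distribution, and the variable names are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

p = 0.975

# Inverse CDF: the value x such that P(X <= x) = p.
x = std_normal.inv_cdf(p)

# Inverse CCDF: the value x such that P(X > x) = p,
# which is the inverse CDF evaluated at 1 - p.
x_ccdf = std_normal.inv_cdf(1 - p)

# Round trip: applying the CDF to the quantile recovers p.
round_trip = std_normal.cdf(x)

print(x, x_ccdf, round_trip)
```

For a symmetric distribution such as the standard normal, the two quantiles differ only in sign.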

Parameters:

data : DataFrame

DataFrame containing the probability values at which the quantile is evaluated.

distr_info : dict

A Python dictionary that contains the distribution name and its parameters. Supported distributions are uniform, normal, weibull and gamma. For example:

{'name':'normal', 'mean':0, 'variance':1.0}.

{'name':'uniform', 'min':0.0, 'max':1.0}.

{'name':'weibull', 'shape':1.0, 'scale':1.0}.

{'name':'gamma', 'shape':1.0, 'scale':1.0}.

The parameter values may be changed for any of the supported distributions listed above.

col : str, optional

Name of the column in the data frame that needs to be processed.

If not given, it defaults to the first column.

complementary : bool, optional

Specifies which function is inverted:

False: the CDF.
True: the CCDF.

Defaults to False.
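To make the roles of `distr_info` and `complementary` concrete, here is a minimal local sketch for the uniform and normal cases only. `local_quantile` is a hypothetical helper, not part of hana_ml, and it does not reproduce the PAL implementation. It assumes that `'variance'` is the squared standard deviation and that the inverse CCDF at p equals the inverse CDF at 1 - p.

```python
from math import sqrt
from statistics import NormalDist


def local_quantile(p, distr_info, complementary=False):
    """Hypothetical local reference, not the PAL implementation."""
    # The inverse CCDF at p equals the inverse CDF at 1 - p.
    q = 1.0 - p if complementary else p
    name = distr_info['name'].lower()
    if name == 'uniform':
        lo, hi = distr_info['min'], distr_info['max']
        return lo + q * (hi - lo)
    if name == 'normal':
        # Assumption: 'variance' is sigma squared, so sigma is its square root.
        sigma = sqrt(distr_info['variance'])
        return NormalDist(distr_info['mean'], sigma).inv_cdf(q)
    raise ValueError(f"distribution not covered by this sketch: {name}")


uniform = {'name': 'uniform', 'min': 0.0, 'max': 4.0}
print(local_quantile(0.25, uniform))                      # 1.0
print(local_quantile(0.25, uniform, complementary=True))  # 3.0
print(local_quantile(0.5, {'name': 'normal', 'mean': 2.0, 'variance': 9.0}))
```

The gamma distribution is left out because its quantile has no closed form in the standard library.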

Returns:

DataFrame: Quantile results, with the input probabilities and the corresponding quantile values.

Examples

Original data:

>>> df.collect()
  DATACOL
0    0.3
1    0.5
2  0.632
3    0.8
>>> distr_info = {'name': 'weibull', 'shape': 2.11995, 'scale': 277.698}

Apply the quantile function:

>>> res = quantile(df, distr_info)
>>> res.collect()
   DATACOL      QUANTILE
0     0.3     170.755854
1     0.5     233.608506
2   0.632     277.655075
3     0.8     347.586495
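The values in the QUANTILE column can be cross-checked locally with the closed-form Weibull quantile, scale * (-ln(1 - p)) ** (1 / shape). The sketch below is an independent check in plain Python, not a call into hana_ml. `weibull_quantile` is a hypothetical helper, and its `complementary` argument assumes the inverse CCDF is the value x with P(X > x) = p.

```python
from math import log


def weibull_quantile(p, shape, scale, complementary=False):
    """Closed-form Weibull quantile; a local check, not the PAL call."""
    # CDF: F(x) = 1 - exp(-(x / scale) ** shape). The upper-tail mass at the
    # quantile is 1 - p for the inverse CDF and p for the inverse CCDF.
    tail = p if complementary else 1.0 - p
    return scale * (-log(tail)) ** (1.0 / shape)


for p in (0.3, 0.5, 0.632, 0.8):
    print(p, round(weibull_quantile(p, shape=2.11995, scale=277.698), 3))
```

With the shape and scale used in the example, the printed values should agree with the QUANTILE column to about three decimal places.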