OnlineBCPD

class hana_ml.algorithms.pal.tsa.changepoint.OnlineBCPD(alpha=None, beta=None, kappa=None, mu=None, lamb=None, threshold=None, delay=None, prune=None, massive=False, group_params=None)

Online Bayesian Change-point detection.

Parameters:
alpha : float, optional

Parameter of t-distribution.

Defaults to 0.1.

beta : float, optional

Parameter of t-distribution.

Defaults to 0.01.

kappa : float, optional

Parameter of t-distribution.

Defaults to 1.0.

mu : float, optional

Parameter of t-distribution.

Defaults to 0.0.
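
The following is a minimal sketch of how alpha, beta, kappa and mu are commonly used in Bayesian online change-point detection, assuming a Normal-Gamma prior with a Student-t predictive distribution. This page does not document PAL's internal formulas, so the function names and the update rule are assumptions. The ALPHA and KAPPA columns of the model output in the Examples section (0.1, 0.6, 1.1, ... and 1.0, 2.0, 3.0, ...) are consistent with this update.

```python
import math


def update_normal_gamma(alpha, beta, kappa, mu, x):
    """One conjugate Normal-Gamma update for a single observation x.

    Assumed standard update rule, not taken from the PAL documentation.
    """
    new_mu = (kappa * mu + x) / (kappa + 1.0)
    new_kappa = kappa + 1.0
    new_alpha = alpha + 0.5
    new_beta = beta + kappa * (x - mu) ** 2 / (2.0 * (kappa + 1.0))
    return new_alpha, new_beta, new_kappa, new_mu


def predictive_t_params(alpha, beta, kappa, mu):
    """Degrees of freedom, location and scale of the Student-t predictive."""
    df = 2.0 * alpha
    loc = mu
    scale = math.sqrt(beta * (kappa + 1.0) / (alpha * kappa))
    return df, loc, scale


# Start from the documented defaults and absorb one hypothetical observation.
alpha, beta, kappa, mu = update_normal_gamma(0.1, 0.01, 1.0, 0.0, x=10.0)
print(alpha, beta, kappa, mu)
```

Each observation adds 0.5 to alpha and 1 to kappa, so larger starting values make the prior harder to move.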

lamb : float, optional

Parameter of constant hazard function.

Defaults to 250.0.
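
As a rough illustration, a constant hazard function is usually defined as 1 / lamb, the prior probability that a change point occurs at any given time step. The sketch below assumes that definition; the function names are hypothetical. With the default lamb of 250.0 the hazard is 0.004, which matches the PROB value in the first row of the model output in the Examples section.

```python
def constant_hazard(lamb=250.0):
    """Prior probability of a change point at each time step (assumed 1 / lamb)."""
    return 1.0 / lamb


def survival_probability(n_steps, lamb=250.0):
    """Prior probability that no change point occurs within n_steps time steps."""
    return (1.0 - constant_hazard(lamb)) ** n_steps


print(constant_hazard())          # hazard for the default lamb
print(survival_probability(250))  # chance of a 250-step run without a change
```

Under this assumption, a larger lamb means change points are expected less often a priori.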

threshold : float, optional

Threshold for determining a change point:

  • 0: Return the change-point probability for every time step.

  • Between 0 and 1: Return only the time steps whose change-point probability is above the threshold.

Defaults to 0.0.
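
The two threshold modes can be sketched in plain Python as follows. This is only an illustration of the documented behaviour on a hypothetical list of probabilities; the helper name is an assumption and the actual filtering happens inside PAL.

```python
def apply_threshold(probabilities, threshold=0.0):
    """Mimic the documented threshold behaviour on a list of probabilities.

    Returns (time_step, probability) pairs.
    """
    if threshold == 0:
        # threshold 0: report the probability for every time step
        return list(enumerate(probabilities))
    # otherwise: keep only the time steps strictly above the threshold
    return [(t, p) for t, p in enumerate(probabilities) if p > threshold]


probs = [0.01, 0.7, 0.2, 0.95]  # hypothetical change-point probabilities
print(apply_threshold(probs, 0.5))
```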

delay : int, optional

Number of subsequent time steps to observe before deciding whether the current time step is a change point.

Defaults to 3.

prune : bool, optional

Whether to reduce the size of the model table after every run:

  • False: Do not prune.

  • True: Prune.

Defaults to False.

massive : bool, optional

Specifies whether or not to use the massive mode of OnlineBCPD.

  • True : massive mode.

  • False : single mode.

For parameter setting in massive mode, you can use either group_params (see the example below) or the original parameters. The original parameters apply to all groups. However, if you define any parameters for a group in group_params, none of the original parameter values apply to that group.

An example is as follows:

>>> obcpd = OnlineBCPD(massive=True,
                       threshold=0.5,
                       group_params={'Group_1': {'threshold': 0.8, 'prune': False}})
>>> res = obcpd.fit_predict(data=df,
                            key='ID',
                            endog='y',
                            group_key='GROUP_ID')

In this example, because 'threshold' is set in group_params for Group_1, the 'threshold' value passed directly to OnlineBCPD is not applicable to Group_1.

Defaults to False.

group_params : dict, optional

If massive mode is activated (massive is True), the input data for OnlineBCPD is divided into groups, and different parameters can be applied to each group.

An example is as follows:

>>> obcpd = OnlineBCPD(massive=True,
                       group_params={'Group_1': {'threshold': 0.5, 'prune': False},
                                     'Group_2': {'threshold': 0.8, 'prune': True}})
>>> res = obcpd.fit_predict(data=df,
                            key='ID',
                            endog='y',
                            group_key='GROUP_ID')

Valid only when massive is True and defaults to None.
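
The precedence rule described for massive mode can be sketched as below. This is a plain-Python illustration of the documented rule, not the library's implementation: the helper name is hypothetical, and it assumes that a group listed in group_params ignores every parameter set outside group_params.

```python
def resolve_group_params(group_id, common_params, group_params=None):
    """Return the parameters that would apply to one group.

    A group with its own entry in group_params uses only that entry;
    every other group uses the common (original) parameters.
    """
    group_params = group_params or {}
    if group_id in group_params:
        return dict(group_params[group_id])
    return dict(common_params)


common = {'threshold': 0.5, 'delay': 5}          # hypothetical settings
per_group = {'Group_1': {'threshold': 0.8}}

print(resolve_group_params('Group_1', common, per_group))
print(resolve_group_params('Group_2', common, per_group))
```

Under this reading, Group_1 does not inherit delay=5 from the common settings, so any parameter a group needs should be repeated in its own entry.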

Examples

Input Data:

>>> df.collect()
   ID        VAL
0   0   9.926943
1   1   9.262971
2   2   9.715766
3   3   9.944334
4   4   9.577682
5   5  10.036977
6   6   9.513112
7   7  10.233246
8   8  10.159134
9   9   9.759518
.......

Create an OnlineBCPD instance:

>>> obcpd = OnlineBCPD(alpha=0.1,
                       beta=0.01,
                       kappa=1.0,
                       mu=0.0,
                       delay=5,
                       threshold=0.5,
                       prune=True)

Invoke fit_predict():

>>> model, cp = obcpd.fit_predict(data=df, model=None)

Output:

>>> print(model.head(5).collect())
   ID  ALPHA        BETA  KAPPA         MU          PROB
0   0    0.1    0.010000    1.0   0.000000  4.000000e-03
1   1    0.6   71.013179    2.0   8.426338  6.478577e-05
2   2    1.1   86.966340    3.0  10.732357  7.634862e-06
3   3    1.6  100.514641    4.0  12.235038  1.540977e-06
4   4    2.1  107.197565    5.0  13.052529  3.733699e-07
>>> print(cp.collect())
   ID  POSITION  PROBABILITY
0   0        58     0.989308
1   1       249     0.991023
2   2       402     0.994154
3   3       539     0.981004
4   4       668     0.994708
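
Once the change points have been collected to the client, the POSITION column can be turned into segments. The sketch below is a small post-processing helper, not part of hana_ml. It assumes that each change point starts a new segment and uses a hypothetical series length of 800, since the length of the example series is not shown here.

```python
def segments_from_positions(positions, n_points):
    """Split the range [0, n_points) into half-open segments at each change point."""
    bounds = [0] + sorted(positions) + [n_points]
    return [(start, end)
            for start, end in zip(bounds[:-1], bounds[1:])
            if end > start]


positions = [58, 249, 402, 539, 668]  # POSITION values from the output above
print(segments_from_positions(positions, n_points=800))
```
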

Attributes:
model_ : DataFrame

Model.

error_msg_ : DataFrame

Error message. Only valid if massive is True when initializing an 'OnlineBCPD' instance.

Methods

fit_predict(data[, key, endog, model, group_key])

Detects change-points of the input data.

get_model_metrics()

Get the model metrics.

get_score_metrics()

Get the score metrics.

get_stats()

Gets the statistics.

get_model_metrics()

Get the model metrics.

Returns:
DataFrame

The model metrics.

get_score_metrics()

Get the score metrics.

Returns:
DataFrame

The score metrics.

fit_predict(data, key=None, endog=None, model=None, group_key=None)

Detects change-points of the input data.

Parameters:
data : DataFrame

Input time-series data for change-point detection.

key : str, optional

Column name for the time stamp of the input time-series data.

If the index of data is a single column, key defaults to that index column. Otherwise, key defaults to the first column of data.

endog : str, optional

Column name for the value of the input time-series data. Defaults to the first non-key column.

model : DataFrame, optional

The model for change point detection.

Defaults to self.model_ (the default value of self.model_ is None).

group_key : str, optional

Name of the column that contains the group IDs. The data type can be INT or NVARCHAR/VARCHAR. If the data type is INT, only the parameters set in group_params are valid.

This parameter is only valid when massive is True.

Defaults to the first column of data if data has no index columns. Otherwise, defaults to the first index column.

Returns:
DataFrame 1

Model.

DataFrame 2

The detected change points.
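
Because fit_predict both accepts and returns a model, detection can be continued across successive batches of data. The helper below is a hedged sketch of that pattern: detect_in_batches is a hypothetical name, and it relies only on the fit_predict signature and the two return values documented above for single mode.

```python
def detect_in_batches(detector, batches, key=None, endog=None):
    """Run change-point detection batch by batch, carrying the model forward."""
    model = None  # no model exists before the first batch
    change_points = []
    for batch in batches:
        # fit_predict returns the updated model and the detected change points
        model, cp = detector.fit_predict(data=batch, key=key,
                                         endog=endog, model=model)
        change_points.append(cp)
    return model, change_points
```

With a real connection, detector would be an OnlineBCPD instance and each batch a hana_ml DataFrame holding the newly arrived rows.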

get_stats()

Gets the statistics.

Returns:
DataFrame

Statistics.

Inherited Methods from PALBase

In addition to the methods described above, the OnlineBCPD class inherits methods from the PALBase class. Please refer to PAL Base for more details.