OnlineBCPD
- class hana_ml.algorithms.pal.tsa.changepoint.OnlineBCPD(alpha=None, beta=None, kappa=None, mu=None, lamb=None, threshold=None, delay=None, prune=None, massive=False, group_params=None)
Online Bayesian Change-point detection.
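The following is a minimal pure-Python sketch of the kind of run-length recursion used in Bayesian online change-point detection (Adams and MacKay). It is not PAL's implementation. It assumes three things:

- alpha, beta, kappa and mu are Normal-Gamma prior hyperparameters whose posterior predictive is a Student-t.
- The hazard is the constant 1/lamb.
- A segment is reported as starting at a position when, `delay` steps later, the probability that the last `delay` observations form a new run exceeds `threshold`.

How PAL maps `delay` and `threshold` to its POSITION and PROBABILITY output is also an assumption here. All function names are hypothetical.

```python
import math


def student_t_logpdf(x, mu, kappa, alpha, beta):
    """Log posterior-predictive density of x under Normal-Gamma(mu, kappa, alpha, beta).

    The predictive is a Student-t with 2*alpha degrees of freedom, location mu
    and squared scale beta * (kappa + 1) / (alpha * kappa).
    """
    nu = 2.0 * alpha
    scale2 = beta * (kappa + 1.0) / (alpha * kappa)
    z2 = (x - mu) ** 2 / (nu * scale2)
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi * scale2)
            - (nu + 1.0) / 2.0 * math.log1p(z2))


def normal_gamma_update(mu, kappa, alpha, beta, x):
    """Conjugate update of the hyperparameters with one observation x."""
    new_mu = (kappa * mu + x) / (kappa + 1.0)
    new_beta = beta + kappa * (x - mu) ** 2 / (2.0 * (kappa + 1.0))
    return new_mu, kappa + 1.0, alpha + 0.5, new_beta


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def bocpd_sketch(data, alpha=0.1, beta=0.01, kappa=1.0, mu=0.0,
                 lamb=250.0, delay=3, threshold=0.5):
    """Return (position, probability) pairs for likely segment starts (hypothetical helper)."""
    log_h = -math.log(lamb)               # constant hazard H = 1 / lamb
    log_1mh = math.log1p(-1.0 / lamb)
    prior = (mu, kappa, alpha, beta)
    log_r = [0.0]                          # log P(run length r | data so far)
    params = [prior]                       # hyperparameters for each run length
    found = []
    for t, x in enumerate(data):
        log_pred = [student_t_logpdf(x, *p) for p in params]
        # Either the current run grows by one, or it ends (change point).
        growth = [lr + lp + log_1mh for lr, lp in zip(log_r, log_pred)]
        change = logsumexp([lr + lp + log_h for lr, lp in zip(log_r, log_pred)])
        joint = [change] + growth
        log_z = logsumexp(joint)
        log_r = [v - log_z for v in joint]
        params = [prior] + [normal_gamma_update(*p, x) for p in params]
        # Run length `delay` means the last `delay` points form a new run,
        # i.e. a run that started at position t - delay + 1.
        if t >= delay:
            prob = math.exp(log_r[delay])
            if prob > threshold:
                found.append((t - delay + 1, prob))
    return found
```

Each step keeps one probability per possible run length, so the cost of this naive version grows with the length of the series. Dropping run lengths with negligible probability is the usual way to bound that cost; it may be related to what `prune` does, though that is an assumption.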
- Parameters:
- alpha : float, optional
Parameter of the t-distribution.
Defaults to 0.1.
- beta : float, optional
Parameter of the t-distribution.
Defaults to 0.01.
- kappa : float, optional
Parameter of the t-distribution.
Defaults to 1.0.
- mu : float, optional
Parameter of the t-distribution.
Defaults to 0.0.
- lamb : float, optional
Parameter of the constant hazard function.
Defaults to 250.0.
- threshold : float, optional
Threshold to determine a change point:
0: Return the change-point probability for every time step.
Between 0 and 1: Only return the time steps whose change-point probability is above the threshold.
Defaults to 0.0.
- delay : int, optional
Number of incoming time steps used to determine whether the current time step is a change point.
Defaults to 3.
- prune : bool, optional
Whether to reduce the size of the model table after every run:
False: Do not prune.
True: Prune.
Defaults to False.
- massive : bool, optional
Specifies whether or not to use massive mode.
True : massive mode.
False : single mode.
For parameter setting in massive mode, you can use either group_params (see the example below) or the original parameters. The original parameters apply to all groups. However, if you define any parameter for a group, none of the original parameter settings apply to that group.
An example is as follows:
>>> obcpd = OnlineBCPD(massive=True, threshold=2, group_params= {'Group_1': {'threshold':10, 'prune' :False}})
>>> res = obcpd.fit_predict(data=df, key='ID', endog='y', group_key='GROUP_ID')
In this example, because 'threshold' is set in group_params for Group_1, the original setting threshold=2 is not applicable to Group_1.
Defaults to False.
- group_params : dict, optional
If massive mode is activated (massive is True), the input data is divided into groups, and different parameters can be applied to each group. An example is as follows:
>>> obcpd = OnlineBCPD(massive=True, group_params= {'Group_1': {'threshold':10, 'prune' :False}, 'Group_2': {'threshold':10, 'prune' :True}})
>>> res = obcpd.fit_predict(data=df, key='ID', endog='y', group_key='GROUP_ID')
Valid only when massive is True. Defaults to None.
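The precedence rule described under massive can be made concrete with a small pure-Python sketch. `effective_params` and `DOCUMENTED_DEFAULTS` are hypothetical names, not part of hana_ml. The defaults are the ones listed in the parameter descriptions above. The sketch assumes that a group present in group_params falls back to those defaults, rather than to the top-level arguments, for anything it does not set itself.

```python
# Defaults as listed in the parameter descriptions above.
DOCUMENTED_DEFAULTS = {
    "alpha": 0.1, "beta": 0.01, "kappa": 1.0, "mu": 0.0,
    "lamb": 250.0, "threshold": 0.0, "delay": 3, "prune": False,
}


def effective_params(group, top_level, group_params=None):
    """Parameters that would apply to `group` under the documented rule (hypothetical helper).

    `top_level` holds only the arguments explicitly passed to the constructor.
    """
    if group_params and group in group_params:
        # Any per-group setting discards all top-level settings for that group.
        return {**DOCUMENTED_DEFAULTS, **group_params[group]}
    # Groups without their own entry use the top-level settings.
    return {**DOCUMENTED_DEFAULTS, **top_level}
```

For the massive example above, `effective_params('Group_1', {'threshold': 2}, {'Group_1': {'threshold': 10, 'prune': False}})` yields a threshold of 10, while any other group gets a threshold of 2.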
Examples
Input Data:
>>> df.collect()
   ID        VAL
0   0   9.926943
1   1   9.262971
2   2   9.715766
3   3   9.944334
4   4   9.577682
5   5  10.036977
6   6   9.513112
7   7  10.233246
8   8  10.159134
9   9   9.759518
.......
Create an OnlineBCPD instance:
>>> obcpd = OnlineBCPD(alpha=0.1, beta=0.01, kappa=1.0, mu=0.0, delay=5, threshold=0.5, prune=True)
Invoke fit_predict():
>>> model, cp = obcpd.fit_predict(data=df, model=None)
Output:
>>> print(model.head(5).collect())
   ID  ALPHA        BETA  KAPPA         MU          PROB
0   0    0.1    0.010000    1.0   0.000000  4.000000e-03
1   1    0.6   71.013179    2.0   8.426338  6.478577e-05
2   2    1.1   86.966340    3.0  10.732357  7.634862e-06
3   3    1.6  100.514641    4.0  12.235038  1.540977e-06
4   4    2.1  107.197565    5.0  13.052529  3.733699e-07
>>> print(cp.collect())
   ID  POSITION  PROBABILITY
0   0        58     0.989308
1   1       249     0.991023
2   2       402     0.994154
3   3       539     0.981004
4   4       668     0.994708
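The ALPHA, BETA, KAPPA and MU columns in the model output above appear to follow the standard conjugate Normal-Gamma update: ALPHA grows by 0.5 and KAPPA by 1 from one row to the next. The sketch below is a consistency check under that assumption, not a description of PAL's internals. For each pair of adjacent rows, it recovers the observation implied by the MU update and checks that the same observation reproduces the next BETA. The helper names are hypothetical.

```python
# Rows 0-4 of the example model table, as (ALPHA, BETA, KAPPA, MU).
ROWS = [
    (0.1, 0.010000, 1.0, 0.000000),
    (0.6, 71.013179, 2.0, 8.426338),
    (1.1, 86.966340, 3.0, 10.732357),
    (1.6, 100.514641, 4.0, 12.235038),
    (2.1, 107.197565, 5.0, 13.052529),
]


def normal_gamma_update(alpha, beta, kappa, mu, x):
    """Standard conjugate Normal-Gamma update with one observation x."""
    return (
        alpha + 0.5,
        beta + kappa * (x - mu) ** 2 / (2.0 * (kappa + 1.0)),
        kappa + 1.0,
        (kappa * mu + x) / (kappa + 1.0),
    )


def implied_observation(prev, nxt):
    """Invert the MU update: the x that takes prev's MU to nxt's MU."""
    _, _, kappa, mu = prev
    _, _, new_kappa, new_mu = nxt
    return new_kappa * new_mu - kappa * mu


def max_deviation(rows):
    """Largest gap between each row and the update of the row before it."""
    worst = 0.0
    for prev, nxt in zip(rows, rows[1:]):
        predicted = normal_gamma_update(*prev, implied_observation(prev, nxt))
        worst = max(worst, max(abs(p - q) for p, q in zip(predicted, nxt)))
    return worst


print(max_deviation(ROWS))  # prints a value below 1e-4
```

Row 0 holds exactly the prior values passed to the constructor. Its PROB of 4.0e-03 equals 1/lamb for the default lamb of 250, which matches what a constant hazard gives for run length 0 in the usual formulation.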
- Attributes:
- model_ : DataFrame
Model.
- error_msg_ : DataFrame
Error message. Only valid if massive is True when initializing an 'OnlineBCPD' instance.
Methods
fit_predict(data[, key, endog, model, group_key]): Detects change-points of the input data.
get_model_metrics(): Get the model metrics.
get_score_metrics(): Get the score metrics.
get_stats(): Gets the statistics.
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
- fit_predict(data, key=None, endog=None, model=None, group_key=None)
Detects change-points of the input data.
- Parameters:
- data : DataFrame
Input time-series data for change-point detection.
- key : str, optional
Column name for the time stamp of the input time-series data.
If key is not provided and the index of data is not set or is not a single column, key defaults to the first column of data.
If the index of data is set to a single column, key defaults to that index column.
- endog : str, optional
Column name for the value of the input time-series data. Defaults to the first non-key column.
- model : DataFrame, optional
The model for change point detection.
Defaults to self.model_ (the default value of self.model_ is None).
- group_key : str, optional
Column name for the group key. The data type can be INT or NVARCHAR/VARCHAR. If the data type is INT, only parameters set in group_params are valid.
This parameter is only valid when massive is True. Defaults to the first column of data if the index columns of data are not provided; otherwise, defaults to the first index column.
- Returns:
- DataFrame 1
Model.
- DataFrame 2
The detected change points.
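The defaulting rules for key, endog and group_key described above can be summarized as a small sketch. The helper names are hypothetical and not part of hana_ml. The sketch assumes the index is represented as None, a single column name (str), or a list of column names, and it encodes only the rules written in the parameter descriptions.

```python
def resolve_key(columns, index=None, key=None):
    """Default for fit_predict's `key` as documented (hypothetical helper)."""
    if key is not None:
        return key
    if isinstance(index, str):
        return index                  # index set to a single column
    return columns[0]                 # no index, or a multi-column index


def resolve_endog(columns, key, endog=None):
    """Default for `endog`: the first column that is not the key."""
    if endog is not None:
        return endog
    return next(c for c in columns if c != key)


def resolve_group_key(columns, index=None, group_key=None):
    """Default for `group_key` (massive mode only)."""
    if group_key is not None:
        return group_key
    if index:                          # index columns provided
        return index if isinstance(index, str) else index[0]
    return columns[0]
```

For the example data with columns ID and VAL and no index, these rules give key='ID' and endog='VAL'.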
- get_stats()
Gets the statistics.
- Returns:
- DataFrame
Statistics.
Inherited Methods from PALBase
Besides the methods mentioned above, the OnlineBCPD class also inherits methods from the PALBase class; please refer to PAL Base for more details.