OnlineBCPD
- class hana_ml.algorithms.pal.tsa.changepoint.OnlineBCPD(alpha=None, beta=None, kappa=None, mu=None, lamb=None, threshold=None, delay=None, prune=None, massive=False, group_params=None)
Online Bayesian Change-point detection.
- Parameters:
- alpha : float, optional
Parameter of t-distribution.
Defaults to 0.1.
- beta : float, optional
Parameter of t-distribution.
Defaults to 0.01.
- kappa : float, optional
Parameter of t-distribution.
Defaults to 1.0.
- mu : float, optional
Parameter of t-distribution.
Defaults to 0.0.
- lamb : float, optional
Parameter of constant hazard function.
Defaults to 250.0.
- threshold : float, optional
Threshold to determine a change point:
0: Return the probability of change point for every time step.
0~1: Only return the time steps whose change-point probability is above the threshold.
Defaults to 0.0.
- delay : int, optional
Number of incoming time steps to determine whether the current time step is a change point.
Defaults to 3.
- prune : bool, optional
Reduce the size of the model table after every run:
False: Do not prune.
True: Prune.
Defaults to False.
- massive : bool, optional
Specifies whether or not to use massive mode of OnlineBCPD.
True: Massive mode.
False: Single mode.
For parameter setting in massive mode, you can use either group_params (see the example below) or the original parameters. Original parameters apply to all groups. However, if you define any parameters for a group in group_params, none of the original parameter settings apply to that group. An example is as follows:
>>> obcpd = OnlineBCPD(massive=True, threshold=2, group_params={'Group_1': {'threshold': 10, 'prune': False}})
>>> res = obcpd.fit_predict(data=df, key='ID', endog='y', group_key='GROUP_ID')
In this example, because parameters are defined in group_params for Group_1, the original setting threshold=2 does not apply to Group_1.
Defaults to False.
- group_params : dict, optional
If massive mode is activated (massive is True), input data shall be divided into different groups with different parameters applied.
An example is as follows:
>>> obcpd = OnlineBCPD(massive=True, group_params={'Group_1': {'threshold': 10, 'prune': False}, 'Group_2': {'threshold': 10, 'prune': True}})
>>> res = obcpd.fit_predict(data=df, key='ID', endog='y', group_key='GROUP_ID')
Valid only when massive is True and defaults to None.
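The precedence rule described for massive mode can be sketched in plain Python. The helper name effective_params is hypothetical and is not part of hana_ml; it only mirrors the documented behaviour that a group with its own entry in group_params ignores all original parameter settings.

```python
def effective_params(group, original_params, group_params=None):
    """Return the settings that would apply to one group (illustration only)."""
    group_params = group_params or {}
    if group in group_params:
        # The group has its own settings: original parameters are ignored.
        return dict(group_params[group])
    # No group-specific settings: original parameters apply.
    return dict(original_params)


original = {'threshold': 2}
per_group = {'Group_1': {'threshold': 10, 'prune': False}}

group_1 = effective_params('Group_1', original, per_group)
group_2 = effective_params('Group_2', original, per_group)
print(group_1)
print(group_2)
```

Under this reading, Group_1 uses only its own settings, while any group without an entry in group_params falls back to the original parameters.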
- Attributes:
- model_ : DataFrame
Model.
- error_msg_ : DataFrame
Error message. Only valid if massive is True when initializing an OnlineBCPD instance.
Methods
fit_predict(data[, key, endog, model, group_key]) Detects change-points of the input data.
get_stats() Gets the statistics.
Examples
Input Data:
>>> df.collect()
   ID        VAL
0   0   9.926943
1   1   9.262971
2   2   9.715766
3   3   9.944334
4   4   9.577682
5   5  10.036977
6   6   9.513112
7   7  10.233246
8   8  10.159134
9   9   9.759518
.......
Create an OnlineBCPD instance:
>>> obcpd = OnlineBCPD(alpha=0.1, beta=0.01, kappa=1.0, mu=0.0, delay=5, threshold=0.5, prune=True)
Invoke fit_predict():
>>> model, cp = obcpd.fit_predict(data=df, model=None)
Output:
>>> print(model.head(5).collect())
   ID  ALPHA        BETA  KAPPA         MU          PROB
0   0    0.1    0.010000    1.0   0.000000  4.000000e-03
1   1    0.6   71.013179    2.0   8.426338  6.478577e-05
2   2    1.1   86.966340    3.0  10.732357  7.634862e-06
3   3    1.6  100.514641    4.0  12.235038  1.540977e-06
4   4    2.1  107.197565    5.0  13.052529  3.733699e-07
>>> print(cp.collect())
   ID  POSITION  PROBABILITY
0   0        58     0.989308
1   1       249     0.991023
2   2       402     0.994154
3   3       539     0.981004
4   4       668     0.994708
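The ALPHA, BETA, KAPPA and MU columns in the sample model output appear numerically consistent with the Normal-Gamma conjugate update commonly used in Bayesian online change-point detection, and PROB in row 0 equals 1/250, that is, one over the default lamb. The sketch below illustrates that reading; it is an assumption drawn from the sample numbers, not a documented description of the PAL implementation. The function names and the observation x are hypothetical.

```python
def constant_hazard(lamb):
    """Constant hazard: the same prior change-point probability at every step."""
    return 1.0 / lamb


def update_posterior(alpha, beta, kappa, mu, x):
    """One Normal-Gamma conjugate update after observing a single value x."""
    new_alpha = alpha + 0.5
    new_beta = beta + kappa * (x - mu) ** 2 / (2.0 * (kappa + 1.0))
    new_kappa = kappa + 1.0
    new_mu = (kappa * mu + x) / (kappa + 1.0)
    return new_alpha, new_beta, new_kappa, new_mu


# Defaults from the parameter list: alpha=0.1, beta=0.01, kappa=1.0, mu=0.0.
prior = (0.1, 0.01, 1.0, 0.0)

# Hypothetical observation, back-computed from MU in row 1 of the sample
# model output (8.426338 * 2). It is not a value shown in the input data.
x = 16.852676

row1 = update_posterior(*prior, x)
print(row1)                    # compare with row 1 of the sample model output
print(constant_hazard(250.0))  # compare with PROB in row 0
```

Each row of the model table would then hold the posterior parameters for one candidate run length, with row 0 carrying the unchanged prior.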
- fit_predict(data, key=None, endog=None, model=None, group_key=None)
Detects change-points of the input data.
- Parameters:
- data : DataFrame
Input time-series data for change-point detection.
- key : str, optional
Column name for time-stamp of the input time-series data.
If key is not provided, it defaults to the index column of data when the index is set as a single column.
Otherwise (the index of data is not set or is not a single column), it defaults to the first column of data.
- endog : str, optional
Column name for the value of the input time-series data. Defaults to the first non-key column.
- model : DataFrame, optional
The model for change point detection.
Defaults to self.model_ (the default value of self.model_ is None).
- group_key : str, optional
The name of the group column. The data type can be INT or NVARCHAR/VARCHAR. This parameter is only valid when massive is True.
Defaults to the first column of data if the index of data is not provided. Otherwise, defaults to the first of the index columns.
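The default rule for key can be sketched with plain Python values. The helper default_key is hypothetical and works on column-name lists rather than on hana_ml DataFrames; it only restates the documented fallback order.

```python
def default_key(columns, index=None, key=None):
    """Pick the time-stamp column following the documented fallback order."""
    if key is not None:
        return key  # an explicit key always wins
    if isinstance(index, str):
        return index  # index set as a single column
    if isinstance(index, (list, tuple)) and len(index) == 1:
        return index[0]  # a one-element index is treated as a single column
    # Index not set, or made of several columns: use the first data column.
    return columns[0]


cols = ['ID', 'VAL']
print(default_key(cols))                      # no index, no key
print(default_key(cols, index='VAL'))         # single index column
print(default_key(cols, index=['VAL', 'ID'])) # multi-column index
```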
- Returns:
- A tuple of DataFrames:
DataFrame 1
Model.
DataFrame 2
The detected change points.
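Because fit_predict accepts a model and returns an updated one, data can be processed batch by batch. The loop below is a minimal sketch of that pattern for single mode. The helper detect_in_batches and the batches argument are hypothetical; detector is assumed to be an OnlineBCPD instance whose fit_predict returns the (model, change points) tuple described above.

```python
def detect_in_batches(detector, batches, key=None, endog=None):
    """Feed batches in order, carrying the model from one call to the next."""
    model = None  # no model yet before the first batch
    change_points = []
    for batch in batches:
        # Pass the previous model in and keep the updated one that comes back.
        model, cp = detector.fit_predict(data=batch, key=key,
                                         endog=endog, model=model)
        change_points.append(cp)
    return model, change_points
```

Passing the model explicitly keeps the data flow visible; since model defaults to self.model_, omitting it may have the same effect.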
- get_stats()
Gets the statistics.
- Returns:
- DataFrame
Statistics.
Inherited Methods from PALBase
Besides the methods mentioned above, the OnlineBCPD class also inherits methods from the PALBase class. Please refer to PAL Base for more details.