CPD
- class hana_ml.algorithms.pal.tsa.changepoint.CPD(cost=None, penalty=None, solver=None, lamb=None, min_size=None, min_sep=None, max_k=None, dispersion=None, lamb_range=None, max_iter=None, range_penalty=None, value_penalty=None)
Change-point detection (CPDetection) methods aim at detecting multiple abrupt changes, such as changes in mean, variance or distribution, in observed time-series data.
- Parameters:
- cost : {'normal_mse', 'normal_rbf', 'normal_mhlb', 'normal_mv', 'linear', 'gamma', 'poisson', 'exponential', 'normal_m', 'negbinomial'}, optional
The cost function for change-point detection.
Defaults to 'normal_mse'.
- penalty : {'aic', 'bic', 'mbic', 'oracle', 'custom'}, optional
The penalty function for change-point detection.
Defaults to 'aic' if solver is 'pruneddp', 'pelt' or 'opt', and to 'custom' if solver is 'adppelt'.
- solver : {'pelt', 'opt', 'adppelt', 'pruneddp'}, optional
Method for finding change-points of given data, cost and penalty. Each solver supports different cost and penalty functions.
For cost functions, 'pelt', 'opt' and 'adppelt' support the following eight: 'normal_mse', 'normal_rbf', 'normal_mhlb', 'normal_mv', 'linear', 'gamma', 'poisson', 'exponential', while 'pruneddp' supports the following four: 'poisson', 'exponential', 'normal_m', 'negbinomial'.
For penalty functions, 'pruneddp' supports all penalties, 'pelt' and 'opt' support the following three: 'aic', 'bic', 'custom', while 'adppelt' supports only 'custom'.
Defaults to 'pelt'.
- lamb : float, optional
Assigned weight of the penalty with respect to the cost function, i.e. the penalization factor. It can be seen as a trade-off between speed and accuracy of the detection algorithm: a small value (usually less than 0.1) dramatically improves efficiency.
Defaults to 0.02. Valid only when solver is 'pelt' or 'adppelt'.
- min_size : int, optional
The minimum length, counted from the very beginning of the series, within which no change can occur. Valid only when solver is 'opt', 'pelt' or 'adppelt'. Defaults to 2.
- min_sep : int, optional
The minimum separation between consecutive change-points. Defaults to 1. Valid only when solver is 'opt', 'pelt' or 'adppelt'.
- max_k : int, optional
The maximum number of change-points to detect. If the given value is less than 1, this number is determined automatically from the input data.
Defaults to 0. Valid only when solver is 'pruneddp'.
- dispersion : float, optional
Dispersion coefficient for the Gamma and negative binomial distributions. Valid only when cost is 'gamma' or 'negbinomial'.
Defaults to 1.0.
- lamb_range : list of two numerical (float or int) values, optional (deprecated)
User-defined range of the penalty. Valid only when solver is 'adppelt'. Deprecated; use range_penalty instead.
- max_iter : int, optional
Maximum number of iterations for searching for the best penalty. Valid only when solver is 'adppelt'. Defaults to 40.
- range_penalty : list of two numerical values, optional
User-defined range of the penalty. Valid only when solver is 'adppelt' and value_penalty is not provided. Defaults to [0.01, 100].
- value_penalty : float, optional
Value of the user-defined penalty. Valid when penalty is 'custom' or solver is 'adppelt'. No default value.
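The parameter descriptions above spread the solver, cost and penalty support rules across several entries. As a minimal sketch, they can be collected into lookup tables and checked before calling CPD. `check_cpd_args` and the table names are hypothetical, not part of hana_ml; the sketch only restates the rules documented here and does not call PAL.

```python
# Hypothetical helper, not part of hana_ml: it restates the documented CPD
# support rules so argument combinations can be checked up front.
PELT_OPT_COSTS = {"normal_mse", "normal_rbf", "normal_mhlb", "normal_mv",
                  "linear", "gamma", "poisson", "exponential"}

SUPPORTED_COSTS = {
    "pelt": PELT_OPT_COSTS,
    "opt": PELT_OPT_COSTS,
    "adppelt": PELT_OPT_COSTS,
    "pruneddp": {"poisson", "exponential", "normal_m", "negbinomial"},
}

SUPPORTED_PENALTIES = {
    "pelt": {"aic", "bic", "custom"},
    "opt": {"aic", "bic", "custom"},
    "adppelt": {"custom"},
    "pruneddp": {"aic", "bic", "mbic", "oracle", "custom"},
}

# Solvers for which each tuning parameter is documented as valid.
PARAM_SOLVERS = {
    "lamb": {"pelt", "adppelt"},
    "min_size": {"opt", "pelt", "adppelt"},
    "min_sep": {"opt", "pelt", "adppelt"},
    "max_k": {"pruneddp"},
    "max_iter": {"adppelt"},
    "range_penalty": {"adppelt"},
}


def check_cpd_args(solver="pelt", cost="normal_mse", penalty=None, **params):
    """Return a list of problems; an empty list means the combination is consistent."""
    if solver not in SUPPORTED_COSTS:
        return [f"unknown solver {solver!r}"]
    if penalty is None:
        # Documented default: 'custom' for 'adppelt', 'aic' for the other solvers.
        penalty = "custom" if solver == "adppelt" else "aic"

    problems = []
    if cost not in SUPPORTED_COSTS[solver]:
        problems.append(f"cost {cost!r} is not supported by solver {solver!r}")
    if penalty not in SUPPORTED_PENALTIES[solver]:
        problems.append(f"penalty {penalty!r} is not supported by solver {solver!r}")

    # Only parameters the caller actually set are checked.
    given = {name: value for name, value in params.items() if value is not None}
    for name in given:
        allowed = PARAM_SOLVERS.get(name)
        if allowed is not None and solver not in allowed:
            problems.append(f"{name} is only valid for solver in {sorted(allowed)}")
    if "dispersion" in given and cost not in {"gamma", "negbinomial"}:
        problems.append("dispersion is only valid for cost 'gamma' or 'negbinomial'")
    if "range_penalty" in given and "value_penalty" in given:
        problems.append("range_penalty is only valid when value_penalty is not provided")
    if "lamb_range" in given:
        problems.append("lamb_range is deprecated; use range_penalty instead")
    return problems
```

For example, `check_cpd_args(solver="adppelt", penalty="aic")` returns `["penalty 'aic' is not supported by solver 'adppelt'"]`, while the combination used in the example below (`solver='pelt'`, `cost='normal_mse'`, `penalty='aic'`, `lamb=0.02`) returns an empty list.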
Examples
>>> from hana_ml.algorithms.pal.tsa.changepoint import CPD
>>> cpd = CPD(solver='pelt',
...           cost='normal_mse',
...           penalty='aic',
...           lamb=0.02)
Perform fit_predict() and check the results:
>>> cp = cpd.fit_predict(data=df)
>>> cp.collect()
>>> cpd.stats_.collect()
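The example above delegates the search to PAL. For intuition about what a penalized solver such as 'pelt' optimizes, here is a minimal textbook PELT (Killick, Fearnhead and Eckley, 2012) in pure Python. It assumes a squared-error segment cost (sum of squared deviations from the segment mean) and a constant penalty `beta` charged per change-point. This is not PAL's implementation: `pelt_mse` and `beta` are hypothetical, `beta` is not claimed to equal lamb or value_penalty, and min_size/min_sep are not modeled.

```python
def pelt_mse(y, beta):
    """Return change-point indices of y using textbook PELT with a squared-error cost."""
    n = len(y)
    # Prefix sums of y and y**2 give each segment's cost in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(a, b):
        # Sum of squared deviations of y[a:b] from its own mean.
        m = b - a
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / m

    F = [0.0] * (n + 1)   # F[t]: best penalized cost of y[:t]
    F[0] = -beta          # cancels the penalty charged for the first segment
    last = [0] * (n + 1)  # last[t]: start of the final segment in the optimum for y[:t]
    candidates = [0]
    for t in range(1, n + 1):
        F[t], last[t] = min((F[s] + cost(s, t) + beta, s) for s in candidates)
        # PELT pruning with K = 0: drop candidates that can never be optimal again.
        candidates = [s for s in candidates if F[s] + cost(s, t) <= F[t]]
        candidates.append(t)

    # Backtrack from the end of the series to recover the change-points.
    cps, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)
```

With `y = [0.0] * 5 + [10.0] * 5`, `pelt_mse(y, beta=1.0)` returns `[5]`, while a very large `beta` such as `1000.0` returns `[]`: a larger penalty trades goodness of fit for fewer change-points. Pruning with K = 0 is valid here because splitting a segment never increases its sum of squared deviations.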
- Attributes:
- stats_ : DataFrame
Statistics.
Methods
fit_predict(data[, key, features]): Detect change-points in the input data.
get_model_metrics(): Get the model metrics.
get_score_metrics(): Get the score metrics.
- fit_predict(data, key=None, features=None)
Detect change-points in the input data.
- Parameters:
- data : DataFrame
Input time-series data for change-point detection.
- key : str, optional
Column name for the time-stamp of the input time-series data.
If key is not provided and the index of data is either not set or not a single column, key defaults to the first column of data.
If key is not provided and the index of data is set as a single column, key defaults to that index column.
- features : str or a list of str, optional
Column name(s) for the value(s) of the input time-series data.
- Returns:
- DataFrame
The detected change-points of the input time-series data.
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
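The default-key rule described for fit_predict() can be restated as a small pure-Python function. `resolve_key` is hypothetical, not a hana_ml function: it takes the column names and the index as plain Python values rather than a hana_ml DataFrame, and treating a one-element list as a single index column is an assumption.

```python
def resolve_key(columns, index=None, key=None):
    """Hypothetical restatement of the documented default for fit_predict(key=...).

    columns: column names of the input data, in order.
    index:   the data's index as a column name, a list of names, or None.
    key:     the value passed to fit_predict(), or None.
    """
    if key is not None:
        # An explicitly provided key is used as-is.
        return key
    if isinstance(index, str):
        # Index set as a single column: that column is the default key.
        return index
    if isinstance(index, (list, tuple)) and len(index) == 1:
        # Assumption: a one-element list also counts as a single index column.
        return index[0]
    # No index, or a multi-column index: fall back to the first column.
    return columns[0]
```

For instance, `resolve_key(["VAL", "TS"], index="TS")` returns `"TS"`, whereas `resolve_key(["ID", "VAL"])` falls back to `"ID"`.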
Inherited Methods from PALBase
In addition to the methods above, the CPD class inherits methods from the PALBase class; see PAL Base for details.