CPD
- class hana_ml.algorithms.pal.tsa.changepoint.CPD(cost=None, penalty=None, solver=None, lamb=None, min_size=None, min_sep=None, max_k=None, dispersion=None, lamb_range=None, max_iter=None, range_penalty=None, value_penalty=None)
Change-point detection (CPDetection) methods aim to detect multiple abrupt changes, such as a change in mean, variance or distribution, in observed time-series data.
- Parameters:
- cost : {'normal_mse', 'normal_rbf', 'normal_mhlb', 'normal_mv', 'linear', 'gamma', 'poisson', 'exponential', 'normal_m', 'negbinomial'}, optional
The cost function for change-point detection.
Defaults to 'normal_mse'.
- penalty : {'aic', 'bic', 'mbic', 'oracle', 'custom'}, optional
The penalty function for change-point detection.
Defaults to (1) 'aic' if solver is 'pruneddp', 'pelt' or 'opt', (2) 'custom' if solver is 'adppelt'.
- solver : {'pelt', 'opt', 'adppelt', 'pruneddp'}, optional
Method for finding change-points of given data, cost and penalty. Each solver supports different cost and penalty functions.
For cost functions, 'pelt', 'opt' and 'adppelt' support the following eight: 'normal_mse', 'normal_rbf', 'normal_mhlb', 'normal_mv', 'linear', 'gamma', 'poisson', 'exponential'; while 'pruneddp' supports the following four: 'poisson', 'exponential', 'normal_m', 'negbinomial'.
For penalty functions, 'pruneddp' supports all penalties, 'pelt' and 'opt' support the following three: 'aic', 'bic', 'custom', while 'adppelt' only supports the 'custom' penalty.
Defaults to 'pelt'.
- lamb : float, optional
Assigned weight of the penalty w.r.t. the cost function, i.e. the penalization factor. It can be seen as a trade-off between speed and accuracy of the detection algorithm. A small value (usually less than 0.1) will dramatically improve efficiency.
Defaults to 0.02. Valid only when solver is 'pelt' or 'adppelt'.
- min_size : int, optional
The minimal length from the very beginning within which a change would not happen. Valid only when solver is 'opt', 'pelt' or 'adppelt'.
Defaults to 2.
- min_sep : int, optional
The minimal length of separation between consecutive change-points. Valid only when solver is 'opt', 'pelt' or 'adppelt'.
Defaults to 1.
- max_k : int, optional
The maximum number of change-points to be detected. If the given value is less than 1, this number would be determined automatically from the input data.
Defaults to 0. Valid only when solver is 'pruneddp'.
- dispersion : float, optional
Dispersion coefficient for Gamma and negative binomial distribution. Valid only when cost is 'gamma' or 'negbinomial'.
Defaults to 1.0.
- lamb_range : list of two numerical (float and int) values, optional (deprecated)
User-defined range of penalty. Valid only when solver is 'adppelt'.
Deprecated, please use range_penalty instead.
- max_iter : int, optional
Maximum number of iterations for searching the best penalty. Valid only when solver is 'adppelt'.
Defaults to 40.
- range_penalty : list of two numerical values, optional
User-defined range of penalty. Valid only when solver is 'adppelt' and value_penalty is not provided.
Defaults to [0.01, 100].
- value_penalty : float, optional
Value of user-defined penalty. Valid when penalty is 'custom' or solver is 'adppelt'.
No default value.
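The solver, cost and penalty combinations described above can be checked before a CPD object is built. The following is a minimal sketch in plain Python; the tables are transcribed from the parameter descriptions on this page rather than read from PAL, and `check_cpd_args` is a hypothetical helper, not part of hana_ml.

```python
# Hypothetical pre-flight check; the tables below are transcribed from the
# parameter descriptions on this page, not queried from PAL itself.
PELT_FAMILY_COSTS = frozenset({
    "normal_mse", "normal_rbf", "normal_mhlb", "normal_mv",
    "linear", "gamma", "poisson", "exponential",
})

SUPPORTED_COSTS = {
    "pelt": PELT_FAMILY_COSTS,
    "opt": PELT_FAMILY_COSTS,
    "adppelt": PELT_FAMILY_COSTS,
    "pruneddp": frozenset({"poisson", "exponential", "normal_m", "negbinomial"}),
}

SUPPORTED_PENALTIES = {
    "pelt": frozenset({"aic", "bic", "custom"}),
    "opt": frozenset({"aic", "bic", "custom"}),
    "adppelt": frozenset({"custom"}),
    "pruneddp": frozenset({"aic", "bic", "mbic", "oracle", "custom"}),
}


def check_cpd_args(solver="pelt", cost="normal_mse", penalty=None):
    """Return a list of human-readable problems (empty if none were found)."""
    if solver not in SUPPORTED_COSTS:
        return [f"unknown solver {solver!r}"]
    if penalty is None:
        # Documented default: 'custom' for 'adppelt', otherwise 'aic'.
        penalty = "custom" if solver == "adppelt" else "aic"
    problems = []
    if cost not in SUPPORTED_COSTS[solver]:
        problems.append(f"cost {cost!r} is not listed for solver {solver!r}")
    if penalty not in SUPPORTED_PENALTIES[solver]:
        problems.append(f"penalty {penalty!r} is not listed for solver {solver!r}")
    return problems
```

Under these tables, `check_cpd_args(solver='pruneddp')` reports a problem because the default cost 'normal_mse' is not among the four costs listed for 'pruneddp', which suggests setting cost explicitly with that solver. How PAL itself reacts to such a combination is not stated here.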
- Attributes:
- stats_ : DataFrame
Statistics.
Methods
fit_predict(data[, key, features]) : Detecting change-points of the input data.
Examples
>>> from hana_ml.algorithms.pal.tsa.changepoint import CPD
>>> cpd = CPD(solver='pelt',
...           cost='normal_mse',
...           penalty='aic',
...           lamb=0.02)
Perform fit_predict() and check the results:
>>> cp = cpd.fit_predict(data=df)
>>> cp.collect()
>>> cpd.stats_.collect()
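To build intuition for how a cost function and a penalty interact, the sketch below runs a toy optimal-partitioning search in plain Python: each segment is charged its squared deviation from the segment mean, and every additional segment adds a fixed penalty. This is an illustration only, not PAL's implementation of any solver; `detect_change_points` and its `penalty` argument are hypothetical names, and the reported indices (0-based start of each new segment) may not match the convention of the DataFrame returned by fit_predict().

```python
def detect_change_points(values, penalty):
    """Toy penalized search: squared-error segment cost plus a fixed penalty
    per segment. Returns 0-based indices where a new segment starts."""
    n = len(values)
    # Prefix sums give each segment's cost in O(1).
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def seg_cost(i, j):
        # Sum of squared deviations from the mean over values[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] * (n + 1)   # best[t]: minimal penalized cost of values[:t]
    last = [0] * (n + 1)     # last[t]: start index of the final segment
    best[0] = -penalty       # so that the first segment is not penalized
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[s] + seg_cost(s, t) + penalty, s) for s in range(t)
        )

    # Walk back through the stored segment starts.
    change_points = []
    t = n
    while last[t] > 0:
        t = last[t]
        change_points.append(t)
    return sorted(change_points)
```

For a series of ten 0.0 values followed by ten 5.0 values and a penalty of 1.0, the search returns `[10]`; a constant series returns an empty list. Raising the penalty makes additional change-points more expensive, so fewer are reported.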
- fit_predict(data, key=None, features=None)
Detecting change-points of the input data.
- Parameters:
- data : DataFrame
Input time-series data for change-point detection.
- key : str, optional
Column name for the time-stamp of the input time-series data.
If key is not provided, it defaults to the index column of data when the index is set as a single column; otherwise it defaults to the first column of data.
- features : str or a list of str, optional
Column name(s) for the value(s) of the input time-series data.
- Returns:
- DataFrame
The detected change-points of the input time-series data.
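The default rule for key described above can be written out as a small function. This is a sketch of the documented rule only, not hana_ml code; `resolve_key` is a hypothetical helper, and it assumes that a single-column index is represented as a plain string while anything else (None or a list of names) counts as "not a single column".

```python
def resolve_key(columns, index=None, key=None):
    """Mirror the documented default for `key` (illustrative only).

    columns: column names of the input data, in order
    index:   index setting of the data (assumed: str for a single column)
    key:     explicit key passed by the caller, if any
    """
    if key is not None:
        return key            # an explicit key always wins
    if isinstance(index, str):
        return index          # single-column index becomes the key
    return columns[0]         # otherwise fall back to the first column
```

For example, `resolve_key(['ts', 'y'])` gives `'ts'`, and `resolve_key(['y', 'ts'], index='ts')` also gives `'ts'` because the single-column index takes precedence over column order.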
Inherited Methods from PALBase
Besides the methods mentioned above, the CPD class also inherits methods from the PALBase class. Please refer to PAL Base for more details.