CPD

class hana_ml.algorithms.pal.tsa.changepoint.CPD(cost=None, penalty=None, solver=None, lamb=None, min_size=None, min_sep=None, max_k=None, dispersion=None, lamb_range=None, max_iter=None, range_penalty=None, value_penalty=None)

Change-point detection (CPDetection) methods aim to detect multiple abrupt changes, such as changes in mean, variance or distribution, in observed time-series data.

Parameters:
cost{'normal_mse', 'normal_rbf', 'normal_mhlb', 'normal_mv', 'linear', 'gamma', 'poisson', 'exponential', 'normal_m', 'negbinomial'}, optional

The cost function for change-point detection.

Defaults to 'normal_mse'.

penalty{'aic', 'bic', 'mbic', 'oracle', 'custom'}, optional

The penalty function for change-point detection.

Defaults to 'aic' if solver is 'pruneddp', 'pelt' or 'opt', and to 'custom' if solver is 'adppelt'.
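To make the interplay of cost and penalty concrete, here is a minimal pure-Python sketch of penalized change-point detection for shifts in the mean. It uses a squared-error segment cost and a plain exhaustive dynamic program with a fixed penalty per change-point. It is an illustration of the general idea only, not PAL's implementation, and the names `segment_cost` and `detect_mean_shifts` are hypothetical.

```python
from itertools import accumulate


def segment_cost(prefix, prefix_sq, start, end):
    """Sum of squared deviations from the mean over values[start:end]."""
    n = end - start
    total = prefix[end] - prefix[start]
    total_sq = prefix_sq[end] - prefix_sq[start]
    return total_sq - total * total / n


def detect_mean_shifts(values, penalty):
    """Return the indices where a new segment starts (illustrative only).

    Minimizes: sum of segment costs + penalty * number of change-points.
    """
    n = len(values)
    prefix = [0.0] + list(accumulate(values))
    prefix_sq = [0.0] + list(accumulate(v * v for v in values))

    # best[t] is the minimal penalized cost of values[:t];
    # starting at -penalty means the first segment is not penalized.
    best = [0.0] * (n + 1)
    best[0] = -penalty
    last = [0] * (n + 1)  # start index of the final segment of values[:t]

    for end in range(1, n + 1):
        candidates = [
            (best[start] + segment_cost(prefix, prefix_sq, start, end) + penalty, start)
            for start in range(end)
        ]
        best[end], last[end] = min(candidates)

    # Walk back through the stored segment starts to recover change-points.
    change_points = []
    pos = n
    while pos > 0:
        pos = last[pos]
        if pos > 0:
            change_points.append(pos)
    return sorted(change_points)


series = [0.0] * 10 + [5.0] * 10
found = detect_mean_shifts(series, penalty=1.0)
```

A larger penalty makes every additional change-point more expensive, so fewer are reported; with a sufficiently large penalty the sketch returns an empty list for the same series.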

solver{'pelt', 'opt', 'adppelt', 'pruneddp'}, optional

Method for finding change-points of given data, cost and penalty. Each solver supports different cost and penalty functions.

    1. For cost functions, 'pelt', 'opt' and 'adppelt' support the following eight: 'normal_mse', 'normal_rbf', 'normal_mhlb', 'normal_mv', 'linear', 'gamma', 'poisson', 'exponential'; 'pruneddp' supports the following four: 'poisson', 'exponential', 'normal_m', 'negbinomial'.

    2. For penalty functions, 'pruneddp' supports all penalties; 'pelt' and 'opt' support the following three: 'aic', 'bic', 'custom'; 'adppelt' supports only the 'custom' penalty.

Defaults to 'pelt'.
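The compatibility rules above can be written down as a small lookup table, which is handy for checking a configuration before sending it to the database. The sketch below is a hypothetical client-side helper, not part of hana_ml; it reads the 'adppelt' restriction as limiting that solver to the 'custom' penalty, which is an interpretation of the text above.

```python
# Cost functions shared by 'pelt', 'opt' and 'adppelt' (from the list above).
PELT_FAMILY_COSTS = {
    "normal_mse", "normal_rbf", "normal_mhlb", "normal_mv",
    "linear", "gamma", "poisson", "exponential",
}

SUPPORTED_COSTS = {
    "pelt": PELT_FAMILY_COSTS,
    "opt": PELT_FAMILY_COSTS,
    "adppelt": PELT_FAMILY_COSTS,
    "pruneddp": {"poisson", "exponential", "normal_m", "negbinomial"},
}

SUPPORTED_PENALTIES = {
    "pelt": {"aic", "bic", "custom"},
    "opt": {"aic", "bic", "custom"},
    # Assumption: 'adppelt' accepts only the 'custom' penalty.
    "adppelt": {"custom"},
    "pruneddp": {"aic", "bic", "mbic", "oracle", "custom"},
}


def check_combination(solver, cost, penalty):
    """Return a list of problems; an empty list means the combination is documented as supported."""
    if solver not in SUPPORTED_COSTS:
        return [f"unknown solver {solver!r}"]
    problems = []
    if cost not in SUPPORTED_COSTS[solver]:
        problems.append(f"cost {cost!r} is not supported by solver {solver!r}")
    if penalty not in SUPPORTED_PENALTIES[solver]:
        problems.append(f"penalty {penalty!r} is not supported by solver {solver!r}")
    return problems


ok = check_combination("pelt", "normal_mse", "aic")
bad_cost = check_combination("pruneddp", "normal_mse", "aic")
```

Returning a list of messages rather than raising on the first mismatch lets a caller report every problem in one pass.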

lambfloat, optional

Assigned weight of the penalty w.r.t. the cost function, i.e. penalization factor. It can be seen as a trade-off between speed and accuracy of running the detection algorithm. A small value (usually less than 0.1) will dramatically improve the efficiency.

Defaults to 0.02, and valid only when solver is 'pelt' or 'adppelt'.

min_sizeint, optional

The minimal length, counted from the very beginning of the series, within which no change can occur. Valid only when solver is 'opt', 'pelt' or 'adppelt'.

Defaults to 2.

min_sepint, optional

The minimal length of separation between consecutive change-points. Defaults to 1, valid only when solver is 'opt', 'pelt' or 'adppelt'.

max_kint, optional

The maximum number of change-points to be detected. If the given value is less than 1, this number would be determined automatically from the input data.

Defaults to 0, valid only when solver is 'pruneddp'.

dispersionfloat, optional

Dispersion coefficient for the Gamma and negative binomial distributions. Valid only when cost is 'gamma' or 'negbinomial'.

Defaults to 1.0.

lamb_rangelist of two numerical (float or int) values, optional (deprecated)

User-defined range of penalty. Only valid when solver is 'adppelt'.

Deprecated; use range_penalty instead.

max_iterint, optional

Maximum number of iterations for searching the best penalty. Valid only when solver is 'adppelt'.

Defaults to 40.

range_penaltylist of two numerical values, optional

User-defined range of penalty. Valid only when solver is 'adppelt' and value_penalty is not provided.

Defaults to [0.01, 100].

value_penaltyfloat, optional

Value of user-defined penalty. Valid when penalty is 'custom' or solver is 'adppelt'.

No default value.
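Several parameters above are documented as valid only for particular solvers. The following sketch collects those statements into a table and reports which supplied parameters fall outside the chosen solver's documented scope. The helper name `inapplicable_params` is hypothetical, and the sketch makes no claim about whether PAL ignores such parameters or rejects them.

```python
# Solver restrictions transcribed from the parameter descriptions above.
VALID_SOLVERS = {
    "lamb": {"pelt", "adppelt"},
    "min_size": {"opt", "pelt", "adppelt"},
    "min_sep": {"opt", "pelt", "adppelt"},
    "max_k": {"pruneddp"},
    "lamb_range": {"adppelt"},
    "max_iter": {"adppelt"},
    "range_penalty": {"adppelt"},
}


def inapplicable_params(solver="pelt", **params):
    """Return the names of supplied parameters not documented as valid for `solver`."""
    return sorted(
        name
        for name, value in params.items()
        if value is not None
        and name in VALID_SOLVERS
        and solver not in VALID_SOLVERS[name]
    )


flagged = inapplicable_params("pruneddp", lamb=0.02, max_k=3)
```

Parameters left as None are skipped, mirroring the constructor's convention that None means "not set".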

Examples

>>> from hana_ml.algorithms.pal.tsa.changepoint import CPD
>>> cpd = CPD(solver='pelt',
...           cost='normal_mse',
...           penalty='aic',
...           lamb=0.02)

Call fit_predict() on the input DataFrame df and check the results:

>>> cp = cpd.fit_predict(data=df)
>>> cp.collect()
>>> cpd.stats_.collect()

Attributes:
stats_DataFrame

Statistics.

Methods

fit_predict(data[, key, features])

Detect change-points in the input data.

get_model_metrics()

Get the model metrics.

get_score_metrics()

Get the score metrics.

fit_predict(data, key=None, features=None)

Detect change-points in the input data.

Parameters:
dataDataFrame

Input time-series data for change-point detection.

keystr, optional

Column name for time-stamp of the input time-series data.

If key is not provided, it defaults to the index column of data when the index is set to a single column; otherwise it defaults to the first column of data.
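The default rule for key can be summarized in a few lines of plain Python. This is a sketch of the documented behavior only, not the library's code: the function `resolve_key` is hypothetical, and treating a one-element list as "a single column" is an assumption.

```python
def resolve_key(columns, index=None, key=None):
    """Pick the time-stamp column following the documented defaults."""
    if key is not None:
        return key  # an explicit key always wins
    if isinstance(index, str):
        return index  # index set as a single column
    if isinstance(index, (list, tuple)) and len(index) == 1:
        return index[0]  # assumption: a one-element list counts as single
    return columns[0]  # no usable index: fall back to the first column


default_key = resolve_key(["ID", "VAL"])
```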

featuresstr or a list of str, optional

Column name(s) for the value(s) of the input time-series data.

Returns:
DataFrame

The detected change-points of the input time-series data.

get_model_metrics()

Get the model metrics.

Returns:
DataFrame

The model metrics.

get_score_metrics()

Get the score metrics.

Returns:
DataFrame

The score metrics.

Inherited Methods from PALBase

Besides the methods described above, the CPD class also inherits methods from the PALBase class. Please refer to PAL Base for more details.