TimeSeriesClassification

class hana_ml.algorithms.pal.tsa.classification.TimeSeriesClassification(classification_method='LogisticRegression', transform_method='MiniRocket', **kwargs)

Time series classification. This class currently supports the RandOm Convolutional KErnel Transform (ROCKET) family. Where many other time series classification algorithms reach high accuracy only at considerable computational cost, ROCKET achieves comparable accuracy with a fraction of the computational expense by transforming the input time series with random convolutional kernels. A simple classifier trained on the transformed features is then enough to reach state-of-the-art accuracy; this implementation uses multi-class logistic regression. Instead of the original ROCKET algorithm, two variants are provided for transforming the input time series: MiniRocket and MultiRocket. MiniRocket is even faster than ROCKET while maintaining essentially the same accuracy. MultiRocket additionally takes the first-order difference of the input time series and increases the diversity of the generated features, which improves accuracy at an extra cost in time and space. Both univariate and multivariate time series are supported.
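As a rough illustration of the transform described above, the following pure-Python sketch convolves a series with a few random, dilated kernels and summarizes each output by its proportion of positive values (PPV), a pooling operator used across the ROCKET family. It is a simplified stand-in in the spirit of the original ROCKET, not the PAL implementation: MiniRocket and MultiRocket use a fixed kernel set and further pooling operators that are not reproduced here. All function names, the kernel settings, and the sample series are assumptions made for the example.

```python
import random


def first_difference(series):
    """First-order difference, the extra representation MultiRocket works on."""
    return [b - a for a, b in zip(series, series[1:])]


def convolve(series, weights, dilation):
    """Unpadded dilated convolution of a 1-D series with a kernel."""
    span = (len(weights) - 1) * dilation
    return [
        sum(w * series[i + j * dilation] for j, w in enumerate(weights))
        for i in range(len(series) - span)
    ]


def ppv(values, bias):
    """Proportion of positive values: fraction of outputs above the bias."""
    return sum(v > bias for v in values) / len(values)


def random_kernel_features(series, num_kernels=8, kernel_length=3, seed=1):
    """Map one series to num_kernels PPV features (illustrative settings)."""
    rng = random.Random(seed)
    features = []
    for _ in range(num_kernels):
        weights = [rng.gauss(0.0, 1.0) for _ in range(kernel_length)]
        mean = sum(weights) / kernel_length
        weights = [w - mean for w in weights]  # zero-mean kernel
        # Largest dilation that still leaves at least one output value.
        max_dilation = (len(series) - 1) // (kernel_length - 1)
        dilation = rng.randint(1, max_dilation)
        output = convolve(series, weights, dilation)
        bias = rng.choice(output)  # simplification: bias drawn from the output
        features.append(ppv(output, bias))
    return features


series = [1.6 - 0.07 * t for t in range(16)]  # made-up series of length 16
features = random_kernel_features(series)
diff_features = random_kernel_features(first_difference(series))
print(len(features), len(diff_features))
```

The resulting feature vectors are what a linear classifier such as logistic regression would then be trained on.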

Parameters:

classification_method : str, optional

The only option is "LogisticRegression".

Defaults to "LogisticRegression".

transform_method : str, optional

The options are "MiniRocket" and "MultiRocket".

Defaults to "MiniRocket".

**kwargs : keyword arguments

Arbitrary keyword arguments. Refer to the corresponding algorithm for the supported key-value pairs.

For "MiniRocket"/"MultiRocket":

num_features : int, optional

Number of transformed features for each time series.

Defaults to 9996 when transform_method is "MiniRocket", 49728 when transform_method is "MultiRocket".

data_dim : int, optional

Dimensionality of the multivariate time series.

1 means univariate time series and others for multivariate. Cannot be smaller than 1.

Defaults to 1.

random_seed : int, optional

Seed for the random number generator. 0 indicates that the machine time is used as the seed.

Defaults to 0.
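The defaults listed above can be summarized in a small sketch. The helper below is hypothetical and not part of hana_ml; it only mirrors the documented defaults and the documented constraint on data_dim, so the way it reports errors is an assumption.

```python
# Defaults copied from the parameter descriptions above.
DEFAULT_NUM_FEATURES = {"MiniRocket": 9996, "MultiRocket": 49728}


def resolve_rocket_kwargs(transform_method="MiniRocket", num_features=None,
                          data_dim=1, random_seed=0):
    """Hypothetical helper: fill in documented defaults and check data_dim."""
    if transform_method not in DEFAULT_NUM_FEATURES:
        raise ValueError(f"unsupported transform_method: {transform_method!r}")
    if data_dim < 1:
        raise ValueError("data_dim cannot be smaller than 1")
    if num_features is None:
        # The default depends on the chosen transform.
        num_features = DEFAULT_NUM_FEATURES[transform_method]
    return {
        "transform_method": transform_method,
        "num_features": num_features,
        "data_dim": data_dim,
        "random_seed": random_seed,  # 0 means "seed from machine time"
    }


print(resolve_rocket_kwargs("MultiRocket", data_dim=8, random_seed=1))
```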

Examples

Example 1: Univariate time series fitted and transformed by MiniRocket

Input DataFrame:

>>> df.collect()
    RECORD_ID  VAL_1  VAL_2  VAL_3  VAL_4  VAL_5  VAL_6  ...  VAL_10  VAL_11  VAL_12  VAL_13  VAL_14  VAL_15  VAL_16
0           0  1.598  1.599  1.571  1.550  1.507  1.434  ...   1.117   1.024   0.926   0.828   0.739   0.643   0.556
1           1  1.701  1.671  1.619  1.547  1.475  1.391  ...   1.070   0.985   0.899   0.816   0.733   0.658   0.581
...
11         11  1.652  1.665  1.656  1.623  1.571  1.499  ...   1.155   1.058   0.973   0.877   0.797   0.704   0.609

The DataFrame of time series labels:

>>> label_df.collect()
    DATA_ID LABEL
0         0     A
1         1     B
...
11       11     A

Create an instance of TimeSeriesClassification:

>>> tsc = TimeSeriesClassification(classification_method="LogisticRegression",
                                   transform_method="MiniRocket")

Perform fit():

>>> tsc.fit(data=df, label=label_df)

Output:

>>> tsc.model_.collect()
      ID                                      MODEL_CONTENT
0     -1                                         MiniRocket
1      0  {"SERIES_LENGTH":16,"NUM_CHANNELS":1,"BIAS_SIZ...
2      1  3005121315,1.685720499622002,2.819106917236017...
..   ...                                                ...
121  120  90682684,0.05522285367663122,0.0,0.0,0.0,0.0,0...
>>> tsc.statistics_.collect()
                   STAT_NAME   STAT_VALUE
0  MINIROCKET_TRANSFORM_TIME       0.010s
1              TRAINING_TIME       0.043s
2          TRAINING_ACCURACY            1
3               TRAINING_OBJ  6.45594e-14
4              TRAINING_ITER           56

Perform predict():

>>> result = tsc.predict(data=df)
>>> result.collect()
    ID CLASS  PROBABILITY
0    0     A          1.0
1    1     B          1.0
...
11  11     A          1.0

Example 2: Multivariate time series (with dimensionality 8) fitted and transformed by MultiRocket

Input DataFrame:

>>> df.collect()
    RECORD_ID  VAL_1  VAL_2  VAL_3  VAL_4  VAL_5  VAL_6  ...  VAL_10  VAL_11  VAL_12  VAL_13  VAL_14  VAL_15  VAL_16
0           0  1.645  1.646  1.621  1.585  1.540  1.470  ...   1.161   1.070   0.980   0.893   0.798   0.705   0.620
1           1  1.704  1.705  1.706  1.680  1.632  1.560  ...   1.186   1.090   0.994   0.895   0.799   0.702   0.605
...
31         31  1.708  1.663  1.595  1.504  1.411  1.318  ...   0.951   0.861   0.794   0.704   0.614   0.529   0.446

The DataFrame of time series labels:

>>> label_df.collect()
   DATA_ID LABEL
0        0     A
1        1     B
2        2     C
3        3     A

Create an instance of TimeSeriesClassification:

>>> tscm = TimeSeriesClassification(classification_method="LogisticRegression",
                                    transform_method="MultiRocket",
                                    data_dim=8,
                                    random_seed=1)

Perform fit():

>>> tscm.fit(data=df, label=label_df)

Output:

>>> tscm.model_.collect()
      ID                                      MODEL_CONTENT
0     -1                                        MultiRocket
1      0  {"SERIES_LENGTH":16,"NUM_CHANNELS":8,"BIAS_SIZ...
2      1  HANNELS":[6]},{"ID":77,"CHANNELS":[1,4,7,6,5]}...
...
537  536  99,-0.000006703700582543153,0.0,-0.00132556403...

>>> tscm.statistics_.collect()
                    STAT_NAME   STAT_VALUE
0  MULTIROCKET_TRANSFORM_TIME       0.005s
1               TRAINING_TIME       0.147s
2           TRAINING_ACCURACY            1
3                TRAINING_OBJ  7.96585e-14
4               TRAINING_ITER           48

Perform predict():

>>> tscm.predict(data=df_predict).collect()
   ID CLASS  PROBABILITY
0   0     A          1.0
1   1     B          1.0
2   2     C          1.0
3   3     A          1.0

Attributes:

model_ : DataFrame

Model content.

statistics_ : DataFrame

Statistics.

forecast_ : DataFrame

Forecast values.

Methods

`fit`(data[, label, key, thread_ratio])	Fit the model to the training dataset.
`get_model_metrics`()	Get the model metrics.
`get_score_metrics`()	Get the score metrics.
`predict`(data[, key, thread_ratio])	Predicts the class of each time series based on the fitted model.

fit(data, label=None, key=None, thread_ratio=None)

Fit the model to the training dataset.

Parameters:

data : DataFrame

Input data. For univariate time series, each row represents one time series. For multivariate time series, a fixed number of consecutive rows forms one time series; that number is set by the parameter data_dim when initializing a TimeSeriesClassification instance.

label : DataFrame, optional

The labels of the time series. label is mandatory when classification_method is "LogisticRegression" and transform_method is "MiniRocket" or "MultiRocket".

key : str, optional

The ID column.

Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.

thread_ratio : float, optional

Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function determines the number of threads heuristically.

Defaults to 1.0.

Returns:

A fitted object of class "TimeSeriesClassification".
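The row layout expected for data can be pictured with a short sketch: with data_dim=8, every 8 consecutive rows belong to one multivariate series, so the 32 rows of Example 2 form 4 series, matching its 4 labels. The helper name and the requirement that the row count be a multiple of data_dim are assumptions of this sketch, and the rows carry only record IDs because the example values are elided.

```python
def group_rows(rows, data_dim=1):
    """Split consecutive rows into series of data_dim channels each."""
    if data_dim < 1:
        raise ValueError("data_dim cannot be smaller than 1")
    if len(rows) % data_dim != 0:
        # Assumption of this sketch: no partial series at the end.
        raise ValueError("row count must be a multiple of data_dim")
    return [rows[i:i + data_dim] for i in range(0, len(rows), data_dim)]


# Placeholder rows holding only RECORD_ID 0..31, as in Example 2.
rows = [[record_id] for record_id in range(32)]
series = group_rows(rows, data_dim=8)
print(len(series))       # number of multivariate series
print(series[1][0])      # first row (channel) of the second series
```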

get_model_metrics()

Get the model metrics.

Returns:

DataFrame: The model metrics.

get_score_metrics()

Get the score metrics.

Returns:

DataFrame: The score metrics.

predict(data, key=None, thread_ratio=None)

Predicts the class of each time series based on the fitted model.

Parameters:

data : DataFrame

Input data. For univariate time series, each row represents one time series, while for multivariate time series, a fixed number of consecutive rows forms one time series, and that number is designated by the parameter data_dim when initializing a TimeSeriesClassification instance.

key : str, optional

The ID column.

Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.

thread_ratio : float, optional

Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function determines the number of threads heuristically.

Defaults to 1.0.

Returns:

DataFrame: Prediction.
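The thread_ratio rule shared by fit() and predict() can be restated as a small decision function. This is only a reading aid built from the parameter description; the function is hypothetical, and how PAL maps an intermediate ratio to an actual thread count is not specified here.

```python
def describe_thread_ratio(thread_ratio):
    """Restate the documented meaning of a numeric thread_ratio value."""
    if not 0 <= thread_ratio <= 1:
        # Out-of-range values are ignored; the thread count is chosen heuristically.
        return "heuristic"
    if thread_ratio == 0:
        return "single thread"
    if thread_ratio == 1:
        return "all available threads"
    return f"{thread_ratio:.0%} of available threads"


for value in (0, 0.5, 1, 1.5):
    print(value, "->", describe_thread_ratio(value))
```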

Inherited Methods from PALBase

Besides the methods described above, the TimeSeriesClassification class also inherits methods from the PALBase class. Please refer to PAL Base for more details.