TimeSeriesClassification
- class hana_ml.algorithms.pal.tsa.classification.TimeSeriesClassification(classification_method='LogisticRegression', transform_method='MiniRocket', **kwargs)
Time series classification. This class currently supports the RandOm Convolutional KErnel Transform (ROCKET). Unlike other time series classification algorithms that attain excellent accuracy, ROCKET achieves comparable accuracy at a fraction of the computational expense by transforming the input time series with random convolutional kernels. A simple classifier trained on the transformed features is then enough to reach state-of-the-art accuracy; this implementation uses multi-class logistic regression. Instead of the original ROCKET algorithm, two variants are provided to transform the input time series: MiniRocket and MultiRocket. MiniRocket is even faster than ROCKET while maintaining essentially the same accuracy. MultiRocket additionally uses the first-order difference of the input time series and increases the diversity of the generated features, which improves accuracy at the cost of extra time and space. This class supports both univariate and multivariate time series.
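The idea described above can be illustrated with a toy sketch in plain Python. It is not the PAL implementation: the kernel length, weight distribution, bias range, and all function names are illustrative assumptions. It shows only the general pattern of convolving a series with random kernels, pooling each output with the proportion of positive values (PPV), and optionally repeating the process on the first-order difference as MultiRocket does.

```python
import random

def first_difference(series):
    """First-order difference of a series (the extra representation MultiRocket uses)."""
    return [b - a for a, b in zip(series, series[1:])]

def random_kernel(rng, length=3):
    """Draw one random kernel: mean-centred weights plus a bias (illustrative choices)."""
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean = sum(weights) / length
    weights = [w - mean for w in weights]
    bias = rng.uniform(-1.0, 1.0)
    return weights, bias

def convolve(series, weights, bias):
    """Valid (unpadded) 1-D convolution of the series with one kernel."""
    k = len(weights)
    return [
        sum(w * x for w, x in zip(weights, series[i:i + k])) + bias
        for i in range(len(series) - k + 1)
    ]

def ppv(values):
    """Proportion of positive values in a convolution output."""
    return sum(v > 0 for v in values) / len(values)

def transform(series, num_kernels=8, seed=1):
    """Map one series to num_kernels PPV features using seeded random kernels."""
    rng = random.Random(seed)
    kernels = [random_kernel(rng) for _ in range(num_kernels)]
    return [ppv(convolve(series, w, b)) for w, b in kernels]

# First six values of the first row in Example 1 below.
series = [1.598, 1.599, 1.571, 1.550, 1.507, 1.434]
features = transform(series)                         # features from the raw series
diff_features = transform(first_difference(series))  # features from the differenced series
```

Each feature lies between 0 and 1, and a fixed seed makes the transform reproducible. A linear classifier such as logistic regression would then be trained on these feature vectors.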
- Parameters:
- classification_method : str, optional
The only option is "LogisticRegression".
Defaults to "LogisticRegression".
- transform_method : str, optional
The options are "MiniRocket" and "MultiRocket".
Defaults to "MiniRocket".
- **kwargs : keyword arguments
Arbitrary keyword arguments. Please refer to the corresponding algorithm for the parameters' key-value pairs.
For "MiniRocket"/"MultiRocket":
num_features : int, optional
Number of transformed features for each time series.
Defaults to 9996 when transform_method is "MiniRocket", 49728 when transform_method is "MultiRocket".
data_dim : int, optional
Dimensionality of the multivariate time series.
1 means a univariate time series; larger values mean a multivariate time series. Cannot be smaller than 1.
Defaults to 1.
random_seed : int, optional
0 indicates that the machine time is used as the seed.
Defaults to 0.
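As a rough illustration of how these keyword arguments fit together, the sketch below collects them into a dictionary with the documented defaults. The helper `build_rocket_kwargs` is hypothetical and not part of hana_ml; the class applies its own defaults, so a helper like this would only serve to make the chosen values explicit on the client side.

```python
# Default feature counts quoted from the parameter descriptions above.
DEFAULT_NUM_FEATURES = {"MiniRocket": 9996, "MultiRocket": 49728}

def build_rocket_kwargs(transform_method="MiniRocket", num_features=None,
                        data_dim=1, random_seed=0):
    """Hypothetical helper: validate the arguments and fill in documented defaults."""
    if transform_method not in DEFAULT_NUM_FEATURES:
        raise ValueError(f"unsupported transform_method: {transform_method!r}")
    if data_dim < 1:
        raise ValueError("data_dim cannot be smaller than 1")
    if num_features is None:
        num_features = DEFAULT_NUM_FEATURES[transform_method]
    return {
        "transform_method": transform_method,
        "num_features": num_features,
        "data_dim": data_dim,
        "random_seed": random_seed,  # 0 means the machine time is used as the seed
    }

params = build_rocket_kwargs("MultiRocket", data_dim=8, random_seed=1)
# The dictionary could then be unpacked into the constructor, for example:
# TimeSeriesClassification(classification_method="LogisticRegression", **params)
```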
Examples
Example 1: Univariate time series fitted and transformed by MiniRocket
Input DataFrame:
>>> df.collect()
    RECORD_ID  VAL_1  VAL_2  VAL_3  VAL_4  VAL_5  VAL_6  ...  VAL_10  VAL_11  VAL_12  VAL_13  VAL_14  VAL_15  VAL_16
0           0  1.598  1.599  1.571  1.550  1.507  1.434  ...   1.117   1.024   0.926   0.828   0.739   0.643   0.556
1           1  1.701  1.671  1.619  1.547  1.475  1.391  ...   1.070   0.985   0.899   0.816   0.733   0.658   0.581
...
11         11  1.652  1.665  1.656  1.623  1.571  1.499  ...   1.155   1.058   0.973   0.877   0.797   0.704   0.609
The label DataFrame of the time series:
>>> label_df.collect()
    DATA_ID LABEL
0         0     A
1         1     B
...
11       11     A
Create an instance of TimeSeriesClassification:
>>> tsc = TimeSeriesClassification(classification_method="LogisticRegression", transform_method="MiniRocket")
Perform fit():
>>> tsc.fit(data=df, label=label_df)
Output:
>>> tsc.model_.collect()
      ID                                      MODEL_CONTENT
0     -1                                         MiniRocket
1      0  {"SERIES_LENGTH":16,"NUM_CHANNELS":1,"BIAS_SIZ...
2      1  3005121315,1.685720499622002,2.819106917236017...
..   ...                                                ...
121  120  90682684,0.05522285367663122,0.0,0.0,0.0,0.0,0...
>>> tsc.statistics_.collect()
                   STAT_NAME   STAT_VALUE
0  MINIROCKET_TRANSFORM_TIME       0.010s
1              TRAINING_TIME       0.043s
2          TRAINING_ACCURACY            1
3               TRAINING_OBJ  6.45594e-14
4              TRAINING_ITER           56
Perform predict():
>>> result = tsc.predict(data=df)
>>> result.collect()
    ID CLASS  PROBABILITY
0    0     A          1.0
1    1     B          1.0
...
11  11     A          1.0
Example 2: Multivariate time series (with dimensionality 8) fitted and transformed by MultiRocket
Input DataFrame:
>>> df.collect()
    RECORD_ID  VAL_1  VAL_2  VAL_3  VAL_4  VAL_5  VAL_6  ...  VAL_10  VAL_11  VAL_12  VAL_13  VAL_14  VAL_15  VAL_16
0           0  1.645  1.646  1.621  1.585  1.540  1.470  ...   1.161   1.070   0.980   0.893   0.798   0.705   0.620
1           1  1.704  1.705  1.706  1.680  1.632  1.560  ...   1.186   1.090   0.994   0.895   0.799   0.702   0.605
...
31         31  1.708  1.663  1.595  1.504  1.411  1.318  ...   0.951   0.861   0.794   0.704   0.614   0.529   0.446
The label DataFrame of the time series:
>>> label_df.collect()
   DATA_ID LABEL
0        0     A
1        1     B
2        2     C
3        3     A
Create an instance of TimeSeriesClassification:
>>> tscm = TimeSeriesClassification(classification_method="LogisticRegression", transform_method="MultiRocket", data_dim=8, random_seed=1)
Perform fit():
>>> tscm.fit(data=df, label=label_df)
Output:
>>> tscm.model_.collect()
      ID                                      MODEL_CONTENT
0     -1                                        MultiRocket
1      0  {"SERIES_LENGTH":16,"NUM_CHANNELS":8,"BIAS_SIZ...
2      1  HANNELS":[6]},{"ID":77,"CHANNELS":[1,4,7,6,5]}...
...
537  536  99,-0.000006703700582543153,0.0,-0.00132556403...
>>> tscm.statistics_.collect()
                    STAT_NAME   STAT_VALUE
0  MULTIROCKET_TRANSFORM_TIME       0.005s
1               TRAINING_TIME       0.147s
2           TRAINING_ACCURACY            1
3                TRAINING_OBJ  7.96585e-14
4               TRAINING_ITER           48
Perform predict():
>>> tscm.predict(data=df_predict).collect()
   ID CLASS  PROBABILITY
0   0     A          1.0
1   1     B          1.0
2   2     C          1.0
3   3     A          1.0
- Attributes:
- model_ : DataFrame
Model content.
- statistics_ : DataFrame
Statistics.
- forecast_ : DataFrame
Forecast values.
Methods
fit(data[, label, key, thread_ratio]): Fit the model to the training dataset.
get_model_metrics(): Get the model metrics.
get_score_metrics(): Get the score metrics.
predict(data[, key, thread_ratio]): Predicts the class of each time series based on the fitted model.
- fit(data, label=None, key=None, thread_ratio=None)
Fit the model to the training dataset.
- Parameters:
- data : DataFrame
Input data. For univariate time series, each row represents one time series. For multivariate time series, a fixed number of consecutive rows forms one time series, and that number is designated by the parameter data_dim when initializing a TimeSeriesClassification instance.
- label : DataFrame, optional
The label of time series. If classification_method is "LogisticRegression" and transform_method is "MiniRocket"/"MultiRocket", label is a mandatory parameter.
- key : str, optional
The ID column.
Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to 1.0.
- Returns:
- A fitted object of class "TimeSeriesClassification".
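The row layout for multivariate input can be pictured with a small plain-Python sketch. The helper `group_rows` is hypothetical, and the requirement that the row count be a multiple of data_dim is an assumption made for this illustration. The shape mirrors Example 2 (32 rows, data_dim=8), with placeholder values.

```python
def group_rows(rows, data_dim):
    """Split consecutive rows into series of data_dim channels each."""
    if data_dim < 1:
        raise ValueError("data_dim cannot be smaller than 1")
    if len(rows) % data_dim != 0:
        # Assumption for this sketch: incomplete trailing series are rejected.
        raise ValueError("row count must be a multiple of data_dim")
    return [rows[i:i + data_dim] for i in range(0, len(rows), data_dim)]

# 32 rows of 16 values each; the values are placeholders, not real data.
rows = [[float(r)] * 16 for r in range(32)]
series = group_rows(rows, data_dim=8)
print(len(series), len(series[0]), len(series[0][0]))  # 4 8 16
```

With data_dim=8 the 32 rows form 4 time series, which matches the 4 labels in Example 2. With data_dim=1 every row is its own series.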
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
- predict(data, key=None, thread_ratio=None)
Predicts the class of each time series based on the fitted model.
- Parameters:
- data : DataFrame
Input data. For univariate time series, each row represents one time series, while for multivariate time series, a fixed number of consecutive rows forms one time series, and that number is designated by the parameter data_dim when initializing a TimeSeriesClassification instance.
- key : str, optional
The ID column.
Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
Defaults to 1.0.
- Returns:
- DataFrame
Prediction.
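The prediction result pairs each time series with a class and a probability. As a general illustration of multi-class logistic regression (not necessarily the exact computation inside PAL), the sketch below turns per-class linear scores into probabilities with a softmax and reports the most likely class. The scores, class names, and function names are made up for the example.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(scores, classes):
    """Return the most probable class together with its probability."""
    probs = softmax(scores)
    best = max(range(len(classes)), key=probs.__getitem__)
    return classes[best], probs[best]

# Hypothetical scores for one transformed time series.
label, prob = predict_class([2.0, 0.5, -1.0], ["A", "B", "C"])
```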
Inherited Methods from PALBase
Besides the methods mentioned above, the TimeSeriesClassification class also inherits methods from the PALBase class. Please refer to PAL Base for more details.