TimeSeriesClassification
- class hana_ml.algorithms.pal.tsa.classification.TimeSeriesClassification(classification_method='LogisticRegression', transform_method='MiniRocket', **kwargs)
Time series classification.
- Parameters
- classification_methodstr, optional
The options is "LogisticRegression".
Defaults to "LogisticRegression".
- transform_methodstr, optional
The options are "MiniRocket" and "MultiRocket".
Defaults to "MiniRocket".
- **kwargskeyword arguments
Arbitrary keyword arguments and please referred to the responding algorithm for the parameters' key-value pair.
For "MiniRocket"/ "MultiRocket":
- num_featuresint, optional, Number of transformed features for each time series.
Defaults to 9996 when
transform_method
= "MiniRocket", 49728 whentransform_method
= "MultiRocket".
- data_dimint, optional
Dimensionality of the multivariate time series. 1 means univariate time series and others for multivariate. Cannot be smaller than 1. Defaults to 1.
- random_seedint, optional
0 indicates using machine time as seed. Defaults to 0.
Examples
Example 1: Univariate time series fitted and transformed by MiniRocket Input dataframe is df:
>>> df.collect() RECORD_ID VAL_1 VAL_2 VAL_3 VAL_4 VAL_5 VAL_6 ... VAL_10 VAL_11 VAL_12 VAL_13 VAL_14 VAL_15 VAL_16 0 0 1.598 1.599 1.571 1.550 1.507 1.434 ... 1.117 1.024 0.926 0.828 0.739 0.643 0.556 1 1 1.701 1.671 1.619 1.547 1.475 1.391 ... 1.070 0.985 0.899 0.816 0.733 0.658 0.581 2 2 1.722 1.695 1.657 1.606 1.512 1.414 ... 1.015 0.920 0.828 0.740 0.658 0.586 0.501 3 3 1.726 1.660 1.573 1.496 1.409 1.332 ... 0.987 0.901 0.815 0.730 0.644 0.558 0.484 4 4 1.779 1.761 1.703 1.611 1.492 1.369 ... 0.900 0.786 0.679 0.580 0.502 0.415 0.333 5 5 1.800 1.743 1.686 1.633 1.532 1.423 ... 0.979 0.872 0.767 0.664 0.561 0.453 0.355 6 6 1.749 1.727 1.659 1.560 1.457 1.355 ... 0.961 0.864 0.771 0.682 0.595 0.513 0.427 7 7 1.348 1.237 1.129 1.022 0.939 0.847 ... 0.474 0.388 0.306 0.218 0.133 0.061 0.009 8 8 1.696 1.634 1.596 1.507 1.414 1.323 ... 1.048 0.966 0.890 0.805 0.719 0.632 0.553 9 9 1.723 1.713 1.665 1.587 1.495 1.404 ... 1.041 0.955 0.870 0.787 0.706 0.622 0.547 10 10 1.614 1.574 1.557 1.521 1.460 1.406 ... 1.045 0.957 0.862 0.771 0.681 0.587 0.497 11 11 1.652 1.665 1.656 1.623 1.571 1.499 ... 1.155 1.058 0.973 0.877 0.797 0.704 0.609
The Dataframe of label of time series:
>>> label_df.collect() DATA_ID LABEL 0 0 A 1 1 B 2 2 C 3 3 A 4 4 B 5 5 C 6 6 A 7 7 B 8 8 C 9 9 B 10 10 C 11 11 A
Create an instance of TimeSeriesClassification:
>>> tsc = TimeSeriesClassification(classification_method="LogisticRegression", transform_method = "MiniRocket", random_seed=1)
Performing fit() on the given dataframe:
>>> tsc.fit(data=df)
Output:
>>> tsc.model_.collect() ID MODEL_CONTENT 0 -1 MiniRocket 1 0 {"SERIES_LENGTH":16,"NUM_CHANNELS":1,"BIAS_SIZ... 2 1 3005121315,1.685720499622002,2.819106917236017... 3 2 00610192183,1.4931236298379538,-4.462113103585... 4 3 9374860881,-6.2434692203217339,0.6595998500205... .. ... ... 117 116 0.0,-0.17856812090682684,0.05522285367663122,0... 118 117 0800557345022,0.3662488788249087,0.0,-0.062115... 119 118 41608,0.0,0.0,0.0,0.0,0.06350326307975242,0.77... 120 119 53600703,0.0,-0.7431206589182244,0.72227213245... 121 120 90682684,0.05522285367663122,0.0,0.0,0.0,0.0,0...
>> tsc.statistics_.collect()
STAT_NAME STAT_VALUE
0 MINIROCKET_TRANSFORM_TIME 0.010s 1 TRAINING_TIME 0.043s 2 TRAINING_ACCURACY 1 3 TRAINING_OBJ 6.45594e-14 4 TRAINING_ITER 56
Make a prediction:
>>> result = tsc.predict(data=df) >>> result.collect() ID CLASS PROBABILITY 0 0 A 1.0 1 1 B 1.0 2 2 C 1.0 3 3 A 1.0 4 4 B 1.0 5 5 C 1.0 6 6 A 1.0 7 7 B 1.0 8 8 C 1.0 9 9 B 1.0 10 10 C 1.0 11 11 A 1.0
Example 2: Multivariate time series (with dimensionality 8) fitted and transformed by MultiRocket Input dataframe is df:
>>> df.collect() RECORD_ID VAL_1 VAL_2 VAL_3 VAL_4 VAL_5 VAL_6 ... VAL_10 VAL_11 VAL_12 VAL_13 VAL_14 VAL_15 VAL_16 0 0 1.645 1.646 1.621 1.585 1.540 1.470 ... 1.161 1.070 0.980 0.893 0.798 0.705 0.620 1 1 1.704 1.705 1.706 1.680 1.632 1.560 ... 1.186 1.090 0.994 0.895 0.799 0.702 0.605 2 2 1.699 1.666 1.621 1.538 1.454 1.357 ... 0.979 0.885 0.793 0.706 0.623 0.541 0.460 3 3 1.709 1.663 1.580 1.497 1.413 1.330 ... 0.997 0.913 0.831 0.748 0.665 0.582 0.509 4 4 1.687 1.688 1.674 1.619 1.531 1.439 ... 1.069 0.977 0.900 0.810 0.722 0.644 0.557 ...... 27 27 1.697 1.665 1.590 1.508 1.424 1.341 ... 1.009 0.926 0.844 0.760 0.678 0.595 0.513 28 28 1.406 1.320 1.234 1.148 1.063 0.978 ... 0.642 0.558 0.477 0.396 0.314 0.234 0.153 29 29 1.592 1.593 1.571 1.551 1.527 1.475 ... 1.160 1.058 0.956 0.859 0.763 0.668 0.574 30 30 1.688 1.648 1.570 1.490 1.408 1.327 ... 1.011 0.930 0.849 0.768 0.687 0.606 0.524 31 31 1.708 1.663 1.595 1.504 1.411 1.318 ... 0.951 0.861 0.794 0.704 0.614 0.529 0.446
The Dataframe of label of time series:
>>> label_df.collect() DATA_ID LABEL 0 0 A 1 1 B 2 2 C 3 3 A
Create an instance of TimeSeriesClassification:
>>> tscm = TimeSeriesClassification(classification_method="LogisticRegression", transform_method = "MultiRocket", data_dim=8, random_seed=1)
Performing fit() on the given dataframe:
>>> tscm.fit(data=df)
Output:
>>> tscm.model_.collect() ID MODEL_CONTENT 0 -1 MultiRocket 1 0 {"SERIES_LENGTH":16,"NUM_CHANNELS":8,"BIAS_SIZ... 2 1 HANNELS":[6]},{"ID":77,"CHANNELS":[1,4,7,6,5]}... 3 2 340878522815215,7.959895076819708,5.8147048859... 4 3 944001223,-18.05915327183857,9.197905923784694... .. ... ... 533 532 .007475253911975343,0.0,-0.0004262211051778646... 534 533 4830497,-0.000050048412881394739,0.0,0.0000110... 535 534 -0.00244554242820342,0.0037744973091916455,0.0... 536 535 97,0.0,-0.00013759337618238812,0.0000837962572... 537 536 99,-0.000006703700582543153,0.0,-0.00132556403...
>>> tscm.statistics_.collect() STAT_NAME STAT_VALUE 0 MULTIROCKET_TRANSFORM_TIME 0.005s 1 TRAINING_TIME 0.147s 2 TRAINING_ACCURACY 1 3 TRAINING_OBJ 7.96585e-14 4 TRAINING_ITER 48
Make a prediction:
>>> result = tscm.predict(data=df) >>> result.collect() ID CLASS PROBABILITY 0 0 A 1.0 1 1 B 1.0 2 2 C 1.0 3 3 A 1.0
- Attributes
- model_DataFrame
Trained model content.
- statistics_DataFrame
Names and values of statistics.
- forecast_DataFrame
Forecast values.
Methods
fit
(data[, label, key, thread_ratio])Trains a time series classification model with given time series and labels.
predict
(data[, key, thread_ratio])Predicts the classes of given time series.
- fit(data, label=None, key=None, thread_ratio=None)
Trains a time series classification model with given time series and labels.
- Parameters
- dataDataFrame
Input data. When transform_method="MiniRocket", for univariate time series, each row represents one time series. when transform_method="MultiRocket", for multivariate time series , a fixed number of consecutive rows forms one time series, and that number is designated by the parameter
data_dim
when initialize a TimeSeriesClassification instance.- labelDataFrame, optional
The label of time series. If classification_method = "LogisticRegression" and transform_method is "MiniRocket"/"MultiRocket", label is a mandatory parameter.
- keystr, optional
The ID column.
Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
- thread_ratiofloat, optional
Controls the proportion of available threads to use. The ratio of available threads.
0: single thread.
0~1: percentage.
Others: heuristically determined.
Defaults to 1.0.
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- predict(data, key=None, thread_ratio=None)
Predicts the classes of given time series.
- Parameters
- dataDataFrame
Input data.
For univariate time series, each row represents one time series, while for multivariate time series, a fixed number of consecutive rows forms one time series, and that number is designated by the parameter
data_dim
when transform_method = "rocket" is used.- keystr, optional
The ID column.
Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
- thread_ratiofloat, optional
Controls the proportion of available threads to use. The ratio of available threads.
0: single thread.
0~1: percentage.
Others: heuristically determined.
Defaults to 1.0.
- Returns
- DataFrame
Prediction.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.