LTSF

class hana_ml.algorithms.pal.tsa.ltsf.LTSF(batch_size=None, num_epochs=None, random_seed=None, network_type=None, adjust_learning_rate=None, learning_rate=None, num_levels=None, kernel_size=None, hidden_expansion=None, position_encoding=None, dropout_prob=None)

Long-term time series forecasting (LTSF) is a specialized area of predictive analysis that focuses on forecasting over long horizons. Traditional algorithms can predict values in the near future, but their performance deteriorates greatly on long-term forecasts. With the help of deep learning, this function implements a novel neural network architecture to achieve state-of-the-art performance within the PAL family.

Parameters:

network_type: str, optional

The type of network:

'NLinear'.
'DLinear'.
'XLinear'.
'SCINet'.

Defaults to 'NLinear'.

batch_size: int, optional

The number of training samples processed in one iteration.

Defaults to 8.

num_epochs: int, optional

The number of training epochs.

Defaults to 1.

random_seed: int, optional

The seed for random number generation. 0 indicates that the machine time is used as the seed.

Defaults to 0.

adjust_learning_rate: bool, optional

Halves the learning rate after every epoch.

False: Do not use.
True: Use.

Defaults to True.

learning_rate: float, optional

The initial learning rate for Adam optimizer.

Defaults to 0.005.

num_levels: int, optional

The number of levels in the network architecture. This parameter is valid when network_type is 'SCINet'.

Note that if warm_start = True in fit(), then this parameter is not valid.

Defaults to 2.

kernel_size: int, optional

Kernel size of Conv1d layer. This parameter is valid when network_type is 'SCINet'.

Note that if warm_start = True in fit(), then this parameter is not valid.

Defaults to 3.

hidden_expansion: int, optional

Expands the input channel size of Conv1d layer. This parameter is valid when network_type is 'SCINet'.

Note that if warm_start = True in fit(), then this parameter is not valid.

Defaults to 3.

position_encoding: bool, optional

Position encoding adds extra positional embeddings to the training series.

False: Do not use.
True: Use.

This parameter is valid when network_type is 'SCINet'.

Defaults to True.

dropout_prob: float, optional

Dropout probability of Dropout layer. This parameter is valid when network_type is 'SCINet'.

Defaults to 0.05.
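
Several of the parameters above only take effect when network_type is 'SCINet'. As a minimal sketch, the hypothetical helper below (ltsf_init_kwargs is not part of hana_ml) builds the keyword arguments for the constructor and drops the SCINet-only settings for the other network types. Whether LTSF itself ignores or rejects such settings is not stated here, so the filtering is only a convenience under that uncertainty.

```python
# Parameters documented above as valid only when network_type is 'SCINet'.
SCINET_ONLY = ("num_levels", "kernel_size", "hidden_expansion",
               "position_encoding", "dropout_prob")


def ltsf_init_kwargs(network_type="NLinear", **params):
    """Hypothetical helper: keyword arguments for LTSF(...).

    SCINet-only parameters are dropped unless network_type is 'SCINet'.
    """
    kwargs = {"network_type": network_type}
    for name, value in params.items():
        if name in SCINET_ONLY and network_type != "SCINet":
            continue  # not valid for this network type, so leave it out
        kwargs[name] = value
    return kwargs


print(ltsf_init_kwargs("DLinear", batch_size=8, kernel_size=3))
print(ltsf_init_kwargs("SCINet", batch_size=8, kernel_size=3))
```

The resulting dictionary could then be passed on as LTSF(**kwargs) in a session where hana_ml is available.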

Examples

Given an input DataFrame df_fit, create an instance of LTSF:

>>> ltsf = LTSF(batch_size = 8,
                num_epochs = 2,
                adjust_learning_rate = True,
                learning_rate = 0.005,
                random_seed = 1)
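
To make the effect of adjust_learning_rate concrete, the following sketch lists the learning rate that each epoch would use for the settings above. It is plain Python rather than a hana_ml call, and it assumes that the halving is applied once at the end of each epoch, so the first epoch runs at the initial rate. The helper name learning_rate_schedule is hypothetical.

```python
def learning_rate_schedule(initial_rate=0.005, num_epochs=1,
                           adjust_learning_rate=True):
    """Hypothetical helper: learning rate used in each epoch.

    Assumption: the rate is halved at the end of every epoch, so the
    first epoch uses the initial rate unchanged.
    """
    rates = []
    rate = initial_rate
    for _ in range(num_epochs):
        rates.append(rate)
        if adjust_learning_rate:
            rate /= 2.0
    return rates


print(learning_rate_schedule(0.005, num_epochs=2))  # [0.005, 0.0025]
```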

Perform fit():

>>> ltsf.fit(data=df_fit,
             train_length=32,
             forecast_length=16,
             key="TIME_STAMP",
             endog="TARGET",
             exog=["FEAT1", "FEAT2", "FEAT3", "FEAT4"])
>>> ltsf.loss_.collect()
    EPOCH          BATCH      LOSS
0       1              0  1.177407
1       1              1  0.925078
...
12      2              5  0.571699
13      2  epoch average  0.618181
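
The loss table mixes per-batch rows with one summary row per epoch. The sketch below shows one way to pull out only the summary rows once the table has been collected on the client. It works on plain (EPOCH, BATCH, LOSS) tuples and assumes that the summary rows are marked with the string 'epoch average' in the BATCH column, as in the output above. The helper name epoch_average_losses is hypothetical.

```python
def epoch_average_losses(rows):
    """Hypothetical helper: map epoch number to its average loss.

    rows is an iterable of (EPOCH, BATCH, LOSS) tuples. Assumption: the
    summary row of an epoch carries 'epoch average' in the BATCH column.
    """
    averages = {}
    for epoch, batch, loss in rows:
        if batch == "epoch average":
            averages[int(epoch)] = float(loss)
    return averages


# The rows visible in the truncated output above.
rows = [
    (1, 0, 1.177407),
    (1, 1, 0.925078),
    (2, 5, 0.571699),
    (2, "epoch average", 0.618181),
]
print(epoch_average_losses(rows))  # {2: 0.618181}
```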

Given an input DataFrame df_predict, perform predict():

>>> result = ltsf.predict(data=df_predict)
>>> result.collect()
   ID  FORECAST
1   0  52.28396
2   1  57.03466
...
16 15  69.33713

Continuous training is also supported and is controlled by the warm_start parameter of fit(). The model that training continues from is the model_ attribute of the LTSF object. You can also use load_model() to load a trained model for continuous training.

>>> ltsf.num_epochs = 2
>>> ltsf.learning_rate = 0.002
>>> ltsf.fit(data=df_fit,
             key="TIME_STAMP",
             endog="TARGET",
             exog=["FEAT1", "FEAT2", "FEAT3", "FEAT4"],
             warm_start=True)
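
The warm-start steps above can be wrapped in a small function. The sketch below is hypothetical (continue_training is not part of hana_ml) and relies only on what this page shows: the num_epochs and learning_rate attributes, the model_ attribute, and fit() with warm_start=True. It assumes that model_ is missing or None until a model has been trained or loaded.

```python
def continue_training(model, data, key, endog, exog=None,
                      num_epochs=None, learning_rate=None):
    """Hypothetical helper: one more warm-start round on an LTSF object."""
    # Assumption: model_ is absent or None before fit() or load_model().
    if getattr(model, "model_", None) is None:
        raise ValueError("no trained model; call fit() or load_model() first")
    if num_epochs is not None:
        model.num_epochs = num_epochs
    if learning_rate is not None:
        model.learning_rate = learning_rate
    # train_length and forecast_length are left out on purpose: they are
    # not valid when warm_start is True.
    return model.fit(data=data, key=key, endog=endog, exog=exog,
                     warm_start=True)
```

With the objects from the example, the call would be continue_training(ltsf, df_fit, "TIME_STAMP", "TARGET", exog=["FEAT1", "FEAT2", "FEAT3", "FEAT4"], num_epochs=2, learning_rate=0.002).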

Attributes:

model_: DataFrame

Model content.

loss_: DataFrame

Training loss information, reported for each batch (identified by its batch ID) and as an average over the batches of each epoch.

explainer_: DataFrame

The explanations with decomposition of exogenous variables. This attribute is available only when show_explainer=True in the predict() function and network_type is 'XLinear'.

permutation_importance_: DataFrame

The importance of exogenous variables as determined by permutation importance analysis. This attribute is available only after get_permutation_importance() has been invoked on a trained model. It is structured as follows:

1st column : PAIR, measure name.
2nd column : NAME, exogenous regressor name.
3rd column : VALUE, the importance of the exogenous regressor.

Methods

`fit`(data[, train_length, forecast_length, ...])	Fit the model to the training dataset.
`get_model_metrics`()	Get the model metrics.
`get_permutation_importance`(data[, model, ...])	Please see Permutation Feature Importance for Time Series for details.
`get_score_metrics`()	Get the score metrics.
`predict`(data[, key, endog, allow_new_index, ...])	Generates time series forecasts based on the fitted model.

fit(data, train_length=None, forecast_length=None, key=None, endog=None, exog=None, warm_start=False)

Fit the model to the training dataset.

Parameters:

data: DataFrame

Input data.

train_length: int

Length of the training series fed into the network.

Note that if warm_start = True, then this parameter is not valid.

forecast_length: int

Length of predictions.

The constraint is that train_length + forecast_length <= data.count().

Note that if warm_start = True, then this parameter is not valid.

key: str, optional

The timestamp column of data. The type of key column should be INTEGER, TIMESTAMP, DATE or SECONDDATE.

Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.

endog: str, optional

The endogenous variable, i.e. target time series. The type of endog column could be INTEGER, DOUBLE or DECIMAL(p,s).

Defaults to the first non-key column.

exog: str or a list of str, optional

An optional array of exogenous variables. The type of exog column could be INTEGER, DOUBLE or DECIMAL(p,s).

Defaults to None. Please set this parameter explicitly if you have exogenous variables.

warm_start: bool, optional

When set to True, reuses the model_ attribute of the current object to continue training. A pretrained model can be loaded with the load_model() method. When set to False, a new model is trained.

Defaults to False.

Returns:

A fitted object of class "LTSF".
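
The length constraint on train_length and forecast_length can be checked on the client before fit() is called. The following is a minimal sketch in plain Python. The helper name check_fit_lengths is hypothetical, and num_rows is assumed to come from data.count().

```python
def check_fit_lengths(train_length, forecast_length, num_rows):
    """Hypothetical helper: enforce the documented constraint
    train_length + forecast_length <= data.count()."""
    needed = train_length + forecast_length
    if needed > num_rows:
        raise ValueError(
            "train_length + forecast_length = %d exceeds the %d rows "
            "of the training data" % (needed, num_rows)
        )


# Values from the example: 32 + 16 = 48 rows are needed at minimum.
check_fit_lengths(train_length=32, forecast_length=16, num_rows=48)
```

The check is only meaningful for a fresh fit, since both lengths are not valid when warm_start is True.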

predict(data, key=None, endog=None, allow_new_index=True, show_explainer=False, reference_dict=None)

Generates time series forecasts based on the fitted model. The number of rows of input predict data must be equal to the value of train_length during training and the length of predictions is equal to the value of forecast_length.
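
Because the input must contain exactly train_length rows, a longer history has to be cut down before predict() is called. The sketch below illustrates this on a plain Python list of rows. It assumes that the rows are sorted by the key in ascending order and that the most recent window is the one to forecast from. The helper name last_window is hypothetical, and in practice the same selection would be made on the hana_ml DataFrame.

```python
def last_window(rows, train_length):
    """Hypothetical helper: the most recent train_length rows.

    Assumption: rows are sorted by the key column in ascending order.
    """
    if train_length <= 0:
        raise ValueError("train_length must be positive")
    if len(rows) < train_length:
        raise ValueError(
            "need %d rows for prediction, got %d" % (train_length, len(rows))
        )
    return rows[-train_length:]


history = list(range(40))           # stand-in for 40 time-ordered rows
window = last_window(history, 32)   # keeps rows 8 through 39
print(len(window))                  # 32
```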

Parameters:

data: DataFrame

Input data for making forecasts.

Formally, data should contain an ID column, the target time series and the exogenous features specified in the training phase (i.e. endog and exog in the fit() function), but no other columns.

The length of data must be equal to the value of parameter train_length in fit().

key: str, optional

Name of the ID column.

Mandatory if data is not indexed, or the index of data contains multiple columns.

Defaults to the single index column of data if not provided.

endog: str, optional

The endogenous variable, i.e. target time series. The type of endog column could be INTEGER, DOUBLE or DECIMAL(p,s).

Defaults to the first non-key column of data.

allow_new_index: bool, optional

Indicates whether a new index column is allowed in the result.

True: return the result with a new integer or timestamp index column.
False: return the result with an index column starting from 0.

Defaults to True.

show_explainer: bool, optional

Indicates whether to invoke the LTSF with explanations function in predict().

If True, the contribution of each exogenous variable, as a value and as a percentage, is shown in an attribute called explainer_ of the LTSF instance.

Only valid when network_type is 'XLinear'.

Defaults to False.

reference_dict: dict, optional

Defines the reference value of an exogenous variable. The type of the reference value needs to be the same as the type of the exogenous variable.

Only valid when show_explainer is True.

Defaults to the average value of the exogenous variable in the training data if not provided.

Returns:

DataFrame 1

Forecasted values, structured as follows:

ID: type INTEGER, timestamp.
VALUE: type DOUBLE, forecast value.

DataFrame 2 (optional)

The explanations with decomposition of exogenous variables. Only valid if show_explainer is True and network_type is 'XLinear'.

get_permutation_importance(data, model=None, key=None, endog=None, exog=None, repeat_time=None, random_state=None, thread_ratio=None, partition_ratio=None, regressor_top_k=None, accuracy_measure=None, ignore_zero=None)

Please see Permutation Feature Importance for Time Series for details.

get_model_metrics()

Get the model metrics.

Returns:

DataFrame: The model metrics.

get_score_metrics()

Get the score metrics.

Returns:

DataFrame: The score metrics.

Inherited Methods from PALBase

Besides the methods mentioned above, the LTSF class also inherits methods from the PALBase class. Please refer to PAL Base for more details.