LTSF
- class hana_ml.algorithms.pal.tsa.ltsf.LTSF(num_levels=None, kernel_size=None, hidden_expansion=None, batch_size=None, num_epochs=None, random_seed=None, position_encoding=None, adjust_learning_rate=None, learning_rate=None, dropout_prob=None)
Long-Term Series Forecasting (LTSF).
Traditional algorithms can predict values in the near future, but their performance deteriorates greatly in long-term series forecasting. This class uses deep learning, implementing a novel neural network architecture that aims at state-of-the-art long-term forecasting performance among the PAL family.
- Parameters
- num_levels: int, optional
Number of levels in the network architecture.
Note that if warm_start = True in fit(), then this parameter is not valid.
Defaults to 2.
- kernel_size: int, optional
Kernel size of Conv1d layer.
Note that if warm_start = True in fit(), then this parameter is not valid.
Defaults to 3.
- hidden_expansion: int, optional
Expands the input channel size of Conv1d layer.
Note that if warm_start = True in fit(), then this parameter is not valid.
Defaults to 3.
- batch_size: int, optional
Number of pieces of data for training in one iteration.
Defaults to 8.
- num_epochs: int, optional
Number of training epochs.
Defaults to 1.
- random_seed: int, optional
Seed for random number generation. 0 indicates using machine time as seed.
Defaults to 0.
- position_encoding: bool, optional
Position encoding adds extra positional embeddings to the training series.
False: Do not use.
True: Use.
Defaults to True.
- adjust_learning_rate: bool, optional
Halves the learning rate after every epoch.
False: Do not use.
True: Use.
Defaults to True.
- learning_rate: float, optional
Initial learning rate for Adam optimizer.
Defaults to 0.005.
- dropout_prob: float, optional
Dropout probability of Dropout layer.
Defaults to 0.05.
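The effect of adjust_learning_rate can be sketched in plain Python. This is only an illustration of the halving rule described above, not the PAL implementation; the helper name learning_rate_schedule is hypothetical and does not exist in hana_ml.

```python
def learning_rate_schedule(learning_rate=0.005, num_epochs=1, adjust_learning_rate=True):
    """Return the learning rate in effect for each epoch.

    Hypothetical helper: it only mirrors the documented rule that the
    learning rate is halved after every epoch when adjust_learning_rate
    is True, and stays constant otherwise.
    """
    rates = []
    current = learning_rate
    for _ in range(num_epochs):
        rates.append(current)
        if adjust_learning_rate:
            current = current / 2  # halve after every epoch
    return rates


if __name__ == "__main__":
    for epoch, rate in enumerate(learning_rate_schedule(0.005, num_epochs=3), start=1):
        print(f"epoch {epoch}: learning rate {rate}")
```

With the default learning_rate of 0.005, the rate shrinks quickly over many epochs, which is worth keeping in mind when choosing num_epochs.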
Examples
Given an input DataFrame df_fit, create an instance of LTSF:
>>> ltsf = LTSF(num_levels = 2, kernel_size = 3, hidden_expansion = 3, batch_size = 8, num_epochs = 2, position_encoding = True, adjust_learning_rate = True, learning_rate = 0.005, dropout_prob = 0.2, random_seed = 1)
Perform fit() on the given DataFrame:
>>> ltsf.fit(data=df_fit, train_length=32, forecast_length=16, key="TIME_STAMP", endog="TARGET", exog=["FEAT1", "FEAT2", "FEAT3", "FEAT4"])
>>> ltsf.loss_.collect()
    EPOCH          BATCH      LOSS
0       1              0  1.177407
1       1              1  0.925078
2       1              2  0.798042
3       1              3  0.712275
4       1              4  0.702966
5       1              5  0.703366
6       1  epoch average  0.836522
7       2              0  0.664331
8       2              1  0.608385
9       2              2  0.614841
10      2              3  0.626234
11      2              4  0.623597
12      2              5  0.571699
13      2  epoch average  0.618181
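In the sample loss_ output above, each "epoch average" row matches the arithmetic mean of that epoch's per-batch losses. The short check below reproduces this from the printed numbers. It is an observation about this sample output, not a documented guarantee, and epoch_averages is a hypothetical helper name.

```python
from statistics import mean

# Per-batch losses copied from the sample loss_ output, keyed by epoch.
batch_losses = {
    1: [1.177407, 0.925078, 0.798042, 0.712275, 0.702966, 0.703366],
    2: [0.664331, 0.608385, 0.614841, 0.626234, 0.623597, 0.571699],
}


def epoch_averages(losses_by_epoch):
    """Return the mean batch loss for every epoch (hypothetical helper)."""
    return {epoch: mean(losses) for epoch, losses in losses_by_epoch.items()}


if __name__ == "__main__":
    for epoch, average in epoch_averages(batch_losses).items():
        print(f"epoch {epoch}: average loss {average:.6f}")
```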
Given an input DataFrame df_predict, perform predict():
>>> result = ltsf.predict(data=df_predict)
>>> result.collect()
    ID   FORECAST
1    0   52.28396
2    1   57.03466
3    2   69.49162
4    3   68.06987
5    4   40.43507
6    5   55.53528
7    6   54.17256
8    7   39.32336
9    8   25.51410
10   9  102.11331
11  10  134.10745
12  11   48.32333
13  12   46.47223
14  13   72.44048
15  14   65.29192
16  15   69.33713
Continuous training is also supported and is controlled by the parameter warm_start. The model used in the training is the model_ attribute of the "LTSF" object. You can also use load_model() to load a trained model for continuous training.
>>> ltsf.num_epochs = 1
>>> ltsf.dropout_prob = 0.2
>>> ltsf.learning_rate = 0.002
>>> ltsf.fit(df_fit, key="TIME_STAMP", endog="TARGET", exog=["FEAT1", "FEAT2", "FEAT3", "FEAT4"], warm_start=True)
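As a reading aid for continuous training, the sketch below separates the settings that this page marks as "not valid" under warm_start = True from all others. It is plain Python with hypothetical names (split_warm_start_settings and the two constant sets); it makes no claim about how PAL treats the remaining settings internally.

```python
# Settings that this page marks as "not valid" when warm_start = True.
NOT_VALID_INIT_PARAMS = {"num_levels", "kernel_size", "hidden_expansion"}
NOT_VALID_FIT_PARAMS = {"train_length", "forecast_length"}


def split_warm_start_settings(settings):
    """Split settings into (kept, not_valid) for a warm-start fit.

    Hypothetical helper: 'not_valid' holds the settings documented as
    not valid under warm_start = True; 'kept' holds everything else.
    """
    flagged = NOT_VALID_INIT_PARAMS | NOT_VALID_FIT_PARAMS
    kept = {name: value for name, value in settings.items() if name not in flagged}
    not_valid = {name: value for name, value in settings.items() if name in flagged}
    return kept, not_valid


if __name__ == "__main__":
    kept, not_valid = split_warm_start_settings(
        {"num_epochs": 1, "dropout_prob": 0.2, "learning_rate": 0.002,
         "num_levels": 3, "train_length": 64}
    )
    print("kept:", kept)
    print("not valid under warm_start:", not_valid)
```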
- Attributes
- model_: DataFrame
Trained model content.
- loss_: DataFrame
Training loss information: the loss of each batch, identified by batch ID, and the average loss of each epoch.
Methods
fit(data[, train_length, forecast_length, ...])    Train a LTSF model with given parameters.
predict(data[, key, endog, allow_new_index])    Makes time series forecast based on a LTSF model.
- fit(data, train_length=None, forecast_length=None, key=None, endog=None, exog=None, warm_start=False)
Train a LTSF model with given parameters.
- Parameters
- data: DataFrame
Input data.
- train_length: int
Length of training series inputted to the network.
The constraint is that train_length \(\geq 2^{\text{num_levels}}\), where num_levels is specified in the initialization function; in the meantime, it cannot be larger than the length of data.
Note that if warm_start = True, then this parameter is not valid.
- forecast_length: int
Length of predictions.
The constraint is that train_length + forecast_length <= data.count().
Note that if warm_start = True, then this parameter is not valid.
- key: str, optional
The timestamp column of data. The type of key column should be INTEGER, TIMESTAMP, DATE or SECONDDATE.
Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
- endog: str, optional
The endogenous variable, i.e. target time series. The type of endog column could be INTEGER, DOUBLE or DECIMAL(p,s).
Defaults to the first non-key column.
- exog: str or a list of str, optional
An optional array of exogenous variables. The type of exog column could be INTEGER, DOUBLE or DECIMAL(p,s).
Defaults to None. Please set this parameter explicitly if you have exogenous variables.
- warm_start: bool, optional
When set to True, reuse the model_ of the current object to continue training the model. The load_model() method can be used to load a pretrained model.
Defaults to False.
- Returns
- A fitted object of class "LTSF".
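The length constraints on train_length and forecast_length can be checked before calling fit(). The following is a minimal plain-Python sketch of those checks; check_fit_lengths is a hypothetical helper, and n_rows stands for the value of data.count().

```python
def check_fit_lengths(n_rows, train_length, forecast_length, num_levels=2):
    """Return a list of violated constraints (empty if all hold).

    Hypothetical helper mirroring the documented constraints:
      * train_length >= 2 ** num_levels
      * train_length <= number of rows in data
      * train_length + forecast_length <= number of rows in data
    """
    problems = []
    if train_length < 2 ** num_levels:
        problems.append(
            f"train_length={train_length} is smaller than 2**num_levels={2 ** num_levels}"
        )
    if train_length > n_rows:
        problems.append(
            f"train_length={train_length} exceeds the number of rows ({n_rows})"
        )
    if train_length + forecast_length > n_rows:
        problems.append(
            f"train_length + forecast_length={train_length + forecast_length} "
            f"exceeds the number of rows ({n_rows})"
        )
    return problems


if __name__ == "__main__":
    # Values from the example above; the row count of 100 is an assumption.
    print(check_fit_lengths(n_rows=100, train_length=32, forecast_length=16, num_levels=2))
```

Returning a list of messages rather than raising on the first failure lets a caller report every violated constraint at once.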
- predict(data, key=None, endog=None, allow_new_index=True)
Makes time series forecast based on a LTSF model. The number of rows of input predict data must be equal to the value of train_length during training, and the length of predictions is equal to the value of forecast_length.
- Parameters
- data: DataFrame
Input data for making forecasts.
Formally, data should contain an ID column, the target time series and the exogenous features specified in the training phase (i.e. endog and exog in the fit() function), but no other columns.
The length of data must be equal to the value of parameter train_length in fit().
- key: str, optional
Name of the ID column.
Mandatory if data is not indexed, or the index of data contains multiple columns.
Defaults to the single index column of data if not provided.
- endog: str, optional
The endogenous variable, i.e. target time series. The type of endog column could be INTEGER, DOUBLE or DECIMAL(p,s).
Defaults to the first non-key column of data.
- allow_new_index: bool, optional
Indicates whether a new index column is allowed in the result.
True: return the result with new integer or timestamp index column.
False: return the result with index column starting from 0.
Defaults to True.
- Returns
- DataFrame
Forecasted values, structured as follows:
ID, type INTEGER, timestamp.
VALUE, type DOUBLE, forecast value.
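The input requirements of predict() (exactly train_length rows, and only the ID, endog and exog columns used in training) can be verified up front. The sketch below does this in plain Python on a list of column names and a row count; check_predict_input is a hypothetical helper, not part of hana_ml.

```python
def check_predict_input(columns, n_rows, train_length, key, endog, exog=None):
    """Return a list of problems with the predict input (empty if none).

    Hypothetical helper mirroring the documented requirements:
      * the number of rows equals train_length used in fit()
      * the columns are exactly the key, endog and exog columns
    """
    if exog is None:
        exog = []
    elif isinstance(exog, str):
        exog = [exog]  # exog may be a single column name

    expected = {key, endog, *exog}
    actual = set(columns)
    problems = []
    if n_rows != train_length:
        problems.append(f"expected {train_length} rows, got {n_rows}")
    missing = sorted(expected - actual)
    if missing:
        problems.append(f"missing columns: {missing}")
    extra = sorted(actual - expected)
    if extra:
        problems.append(f"unexpected columns: {extra}")
    return problems


if __name__ == "__main__":
    cols = ["TIME_STAMP", "TARGET", "FEAT1", "FEAT2", "FEAT3", "FEAT4"]
    print(check_predict_input(cols, n_rows=32, train_length=32, key="TIME_STAMP",
                              endog="TARGET", exog=["FEAT1", "FEAT2", "FEAT3", "FEAT4"]))
```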
- property fit_hdbprocedure
Returns the generated hdbprocedure for fit.
- property predict_hdbprocedure
Returns the generated hdbprocedure for predict.
Inherited Methods from PALBase
Besides the methods mentioned above, the LTSF class also inherits methods from the PALBase class. Please refer to PAL Base for more details.