LSTM

class hana_ml.algorithms.pal.tsa.lstm.LSTM(learning_rate=None, gru=None, batch_size=None, time_dim=None, hidden_dim=None, num_layers=None, max_iter=None, interval=None, optimizer_type=None, stateful=None, bidirectional=None)

Long short-term memory (LSTM).

Parameters:

learning_rate : float, optional

Learning rate for gradient descent.

Defaults to 0.01.

gru : {'gru', 'lstm'}, optional

Choose GRU or LSTM.

Defaults to 'lstm'.

batch_size : int, optional

Number of training samples processed in one iteration.

Defaults to 32.

time_dim : int, optional

Number of time steps in each sequence used to train the LSTM/GRU and then to make time series predictions.

The value must be smaller than the length of the input time series minus 1.

Defaults to 16.
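
The bound on time_dim can be checked on the client before fitting. The following is a minimal sketch, assuming the length of the series is already known; the helper name check_time_dim and the lower bound of 1 are assumptions made for illustration and are not part of hana_ml.

```python
def check_time_dim(time_dim: int, series_length: int) -> bool:
    """Return True if time_dim respects the documented upper bound.

    Documented rule: time_dim must be smaller than the length of the
    input time series minus 1. The lower bound (time_dim >= 1) is an
    assumption added here, not taken from the documentation.
    """
    return 1 <= time_dim < series_length - 1


print(check_time_dim(16, 100))  # default time_dim on a 100-point series
print(check_time_dim(16, 17))   # 16 is not smaller than 17 - 1
```

A series therefore needs at least time_dim + 2 points for the default configuration to be valid.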

hidden_dim : int, optional

Number of hidden neurons in LSTM/GRU unit.

Defaults to 128.

num_layers : int, optional

Number of layers in LSTM/GRU unit.

Defaults to 1.

max_iter : int, optional

Number of batches of data on which the LSTM/GRU is trained.

Defaults to 1000.

interval : int, optional

Outputs the average loss over every INTERVAL iterations.

Defaults to 100.

optimizer_type : {'SGD', 'RMSprop', 'Adam', 'Adagrad'}, optional

Choose the optimizer.

Defaults to 'Adam'.

stateful : bool, optional

If True, enables stateful LSTM/GRU.

Defaults to True.

bidirectional : bool, optional

If True, uses BiLSTM/BiGRU. Otherwise, uses LSTM/GRU.

Defaults to False.

Examples

Input dataframe df:

>>> df.head(3).collect()
    TIMESTAMP  SERIES
0          0    20.7
1          1    17.9
2          2    18.8

Create LSTM model:

>>> from hana_ml.algorithms.pal.tsa.lstm import LSTM
>>> lstm = LSTM(gru='lstm',
...             bidirectional=False,
...             time_dim=16,
...             max_iter=1000,
...             learning_rate=0.01,
...             batch_size=32,
...             hidden_dim=128,
...             num_layers=1,
...             interval=1,
...             stateful=False,
...             optimizer_type='Adam')

Perform fit on the given data:

>>> lstm.fit(df)

Perform predict on the fitted model:

>>> res = lstm.predict(df_predict)

Expected output:

>>> res.head(3).collect()
   ID      VALUE                                        REASON_CODE
0   0  11.673560  [{"attr":"T=0","pct":28.926935203430372,"val":...
1   1  14.057195  [{"attr":"T=3","pct":24.729787064691735,"val":...
2   2  15.119411  [{"attr":"T=2","pct":41.616207151605458,"val":...

Attributes:

loss_ : DataFrame

LOSS.

model_ : DataFrame

Model content.

Methods

`build_report`()	Generate time series report.
`fit`(data[, key, endog, exog])	Generates LSTM models with given parameters.
`generate_html_report`([filename])	Display function.
`generate_notebook_iframe_report`()	Display function.
`predict`(data[, top_k_attributions])	Makes time series forecast based on the LSTM model.

fit(data, key=None, endog=None, exog=None)

Generates LSTM models with given parameters.

Parameters:

data : DataFrame

Input data, structured as follows.

The 1st column: index/timestamp, type INTEGER.

The 2nd column: time-series value, type INTEGER, DOUBLE, or DECIMAL(p,s).

Other columns: external data (regressors), type INTEGER, DOUBLE, DECIMAL(p,s), VARCHAR or NVARCHAR.

key : str, optional

The timestamp column of data. The type of key column is INTEGER.

Defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.

endog : str, optional

The endogenous variable, i.e. the time series. The type of endog column is INTEGER, DOUBLE, or DECIMAL(p, s).

Defaults to the first non-key column of data if not provided.

exog : str or a list of str, optional

An optional array of exogenous variables. The type of exog column is INTEGER, DOUBLE, or DECIMAL(p, s).

Defaults to None. Please set this parameter explicitly if you have exogenous variables.

Returns:

A fitted object of class "LSTM".
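
The documented defaults for key, endog and exog can be summarised in a few lines of plain Python. This is a minimal sketch of the rules as stated above, not the library's implementation; the function name resolve_columns, its index argument, and the column name TEMP are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple, Union


def resolve_columns(
    columns: Sequence[str],
    key: Optional[str] = None,
    endog: Optional[str] = None,
    exog: Optional[Union[str, Sequence[str]]] = None,
    index: Optional[str] = None,
) -> Tuple[str, str, List[str]]:
    """Apply the documented defaults to a list of column names."""
    # key: the index column if the data has one, else the first column.
    if key is None:
        key = index if index is not None else columns[0]
    # endog: the first column that is not the key.
    if endog is None:
        endog = next(c for c in columns if c != key)
    # exog: defaults to None, so no column is used unless set explicitly.
    if exog is None:
        exog_list: List[str] = []
    elif isinstance(exog, str):
        exog_list = [exog]
    else:
        exog_list = list(exog)
    return key, endog, exog_list


print(resolve_columns(["TIMESTAMP", "SERIES"]))
print(resolve_columns(["TIMESTAMP", "SERIES", "TEMP"], exog="TEMP"))
```

Note that in this sketch an extra column such as TEMP is ignored unless it is named in exog, which mirrors the advice to set exog explicitly.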

predict(data, top_k_attributions=None)

Makes time series forecast based on the LSTM model.

Parameters:

data : DataFrame

Data for prediction. Every row in the data should contain one record for prediction, i.e. it should be structured as follows:

First column: Record ID, type INTEGER.

Other columns: Time-series and external data values, arranged in time order.

The number of columns, excluding the first ID column, must equal time_dim * (M-1), where M is the number of columns of the input data in the training phase.
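
The row layout described above can be illustrated with plain Python lists. This is a minimal sketch: the helper make_predict_rows is hypothetical, the sample values are made up, and the ordering of values within a row (all values of the first time step, then all values of the next) is an assumption, since the documentation only states that values are arranged in time order.

```python
from typing import List, Sequence


def make_predict_rows(values: Sequence[Sequence[float]], time_dim: int) -> List[list]:
    """Build one prediction record per sliding window of length time_dim.

    values holds one entry per time step; each entry contains the M-1
    non-key values (series value plus any external data) for that step.
    Each returned row is [ID, v(t), ..., v(t + time_dim - 1)], flattened.
    """
    rows = []
    for record_id, start in enumerate(range(len(values) - time_dim + 1)):
        window = values[start:start + time_dim]
        flat = [x for step in window for x in step]
        rows.append([record_id] + flat)
    return rows


# Univariate case: training data had M = 2 columns (key + series).
series = [[20.7], [17.9], [18.8], [14.6], [15.8]]
rows = make_predict_rows(series, time_dim=3)
print(rows[0])

# One external regressor: M = 3, so each row carries time_dim * 2 values.
multi = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
multi_rows = make_predict_rows(multi, time_dim=2)
print(multi_rows)
```

In both cases the number of value columns per row is time_dim * (M-1): 3 for the univariate example and 4 for the example with one regressor.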

top_k_attributions : int, optional

Specifies the number of features with highest attributions to output.

Defaults to 10 or 0 depending on the SAP HANA version.

Returns:

DataFrame

The aggregated forecast values, structured as follows:

ID, type INTEGER, timestamp.

VALUE, type DOUBLE, forecast value.

REASON_CODE, type NCLOB, sorted SHAP values for test data at each time step.
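
Once collected, a REASON_CODE value can be decoded with the standard json module. This is a minimal sketch: the string below is a made-up, complete example, and only the keys attr, pct and val are taken from the truncated output shown in the Examples section. The helper top_attributions is hypothetical.

```python
import json
from typing import List, Tuple

# Hypothetical REASON_CODE content; the numbers are illustrative only.
reason_code = (
    '[{"attr":"T=0","pct":60.0,"val":1.5},'
    '{"attr":"T=3","pct":40.0,"val":-1.0}]'
)


def top_attributions(reason_code: str, k: int = 1) -> List[Tuple[str, float]]:
    """Return the k entries with the largest pct as (attr, pct) pairs."""
    entries = json.loads(reason_code)
    ranked = sorted(entries, key=lambda e: e["pct"], reverse=True)
    return [(e["attr"], e["pct"]) for e in ranked[:k]]


print(top_attributions(reason_code, k=1))
```

The explicit sort makes the helper independent of the order in which entries are stored.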

build_report(): Generate time series report.

generate_html_report(filename=None): Display function.

generate_notebook_iframe_report(): Display function.

property fit_hdbprocedure: Returns the generated hdbprocedure for fit.

property predict_hdbprocedure: Returns the generated hdbprocedure for predict.

Inherited Methods from PALBase

Besides the methods mentioned above, the LSTM class also inherits methods from the PALBase class. Please refer to PAL Base for more details.