AdditiveModelForecast

class hana_ml.algorithms.pal.tsa.additive_model_forecast.AdditiveModelForecast(growth=None, logistic_growth_capacity=None, seasonality_mode=None, seasonality=None, num_changepoints=None, changepoint_range=None, regressor=None, changepoints=None, yearly_seasonality=None, weekly_seasonality=None, daily_seasonality=None, seasonality_prior_scale=None, holiday_prior_scale=None, changepoint_prior_scale=None, massive=False, group_params=None)

Additive Model Time Series Analysis (AMTSA) uses an additive model to forecast time series data. It effectively handles data with strong seasonal effects and is adaptable to shifts in historical trends. AMTSA uses a decomposable time series model with three main components: trend, seasonality, and holidays or events.

Parameters:

growth{'linear', 'logistic'}, optional

Specifies a trend, which could be either linear or logistic.

Defaults to 'linear'.

logistic_growth_capacityfloat, optional

Specifies the carrying capacity for logistic growth. Mandatory and valid only when growth is 'logistic'.

No default value.

seasonality_mode{'additive', 'multiplicative'}, optional

Mode for seasonality.

Defaults to 'additive'.

seasonalitystr or a list of str, optional

Adds seasonality to the model in JSON format. Each entry includes:

NAME

PERIOD

FOURIER_ORDER

PRIOR_SCALE, optional

MODE, optional

Each str is a JSON object such as '{ "NAME": "MONTHLY", "PERIOD":30, "FOURIER_ORDER":5 }'. FOURIER_ORDER determines how quickly the seasonality can change. PRIOR_SCALE controls the amount of regularization. No seasonality will be added to the model if this parameter is not provided.

No default value.
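The JSON strings can be generated with Python's json module instead of being written by hand. The following is a minimal sketch: the helper name make_seasonality and the QUARTERLY entry are hypothetical, and only the keys listed above are used.

```python
import json


def make_seasonality(name, period, fourier_order, prior_scale=None, mode=None):
    """Return one seasonality entry as a JSON string (hypothetical helper)."""
    entry = {"NAME": name, "PERIOD": period, "FOURIER_ORDER": fourier_order}
    # PRIOR_SCALE and MODE are optional, so they are added only when given.
    if prior_scale is not None:
        entry["PRIOR_SCALE"] = prior_scale
    if mode is not None:
        entry["MODE"] = mode
    return json.dumps(entry)


seasonality = [
    make_seasonality("MONTHLY", 30, 5),
    # Illustrative values only; choose PERIOD and FOURIER_ORDER for your data.
    make_seasonality("QUARTERLY", 91, 3, prior_scale=5, mode="additive"),
]
print(seasonality[0])
```

The resulting list can be passed as seasonality=seasonality when creating the AdditiveModelForecast instance.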

num_changepointsint, optional

The number of potential changepoints. Not effective if changepoints is provided.

Defaults to 25 if not provided.

changepoint_rangefloat, optional

Proportion of history in which trend changepoints will be estimated. Not effective if changepoints is provided.

Defaults to 0.8.

regressora list of str, optional

Specifies the regressors in JSON format. Each entry includes:

NAME

PRIOR_SCALE

STANDARDIZE

MODE: "additive" or "multiplicative".

Each str is a JSON object such as '{ "NAME": "X1", "PRIOR_SCALE":4, "MODE": "additive" }'. PRIOR_SCALE controls the amount of regularization; STANDARDIZE specifies whether or not the regressor is standardized.

No default value.
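When several exogenous columns share the same settings, the regressor list can be built in one pass. This sketch assumes two hypothetical column names, X1 and X2, and reuses the keys and values from the example above.

```python
import json

exog_columns = ["X1", "X2"]  # hypothetical exogenous column names

# One JSON string per regressor, all with the same prior scale and mode.
regressor = [
    json.dumps({"NAME": col, "PRIOR_SCALE": 4, "MODE": "additive"})
    for col in exog_columns
]
print(regressor)
```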

changepointslist of str, optional

Specifies a list of changepoints in timestamp format, such as ['2019-01-01 00:00:00', '2019-02-04 00:00:00'].

No default value.
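If the changepoints are held as Python datetime objects, they can be converted to the expected string format with strftime. A short sketch, with dates chosen to match the example above:

```python
from datetime import datetime

# Changepoint dates held as datetime objects (illustrative values).
dates = [datetime(2019, 1, 1), datetime(2019, 2, 4)]

# Format each date as 'YYYY-MM-DD HH:MM:SS'.
changepoints = [d.strftime("%Y-%m-%d %H:%M:%S") for d in dates]
print(changepoints)  # ['2019-01-01 00:00:00', '2019-02-04 00:00:00']
```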

yearly_seasonality{'auto', 'false', 'true'}, optional

Specifies whether or not to fit yearly seasonality.

'false' and 'true' correspond to their logical meaning, while 'auto' means the setting is determined automatically from the input data.

Defaults to 'auto'.

weekly_seasonality{'auto', 'false', 'true'}, optional

Specifies whether or not to fit the weekly seasonality.

'auto' means automatically determined from input data.

Defaults to 'auto'.

daily_seasonality{'auto', 'false', 'true'}, optional

Specifies whether or not to fit the daily seasonality.

'auto' means automatically determined from input data.

Defaults to 'auto'.

seasonality_prior_scalefloat, optional

Parameter modulating the strength of the seasonality model.

Defaults to 10.

holiday_prior_scalefloat, optional

Parameter modulating the strength of the holiday components model.

Defaults to 10.

changepoint_prior_scalefloat, optional

Parameter modulating the flexibility of the automatic changepoint selection.

Defaults to 0.05.

massivebool, optional

Specifies whether or not to activate the massive mode.

True : massive mode.
False : single mode.

In massive mode, parameters can be set through group_params, through the original parameters, or both. Original parameters apply to all groups. However, if any parameter is defined for a group in group_params, none of the original parameter settings apply to that group.

For example, if seasonality_mode is set in group_params for Group_1 and changepoint_prior_scale is set as an original parameter, the changepoint_prior_scale setting does not apply to Group_1.

Defaults to False.

group_paramsdict, optional

If the massive mode is activated (massive is True), input data is divided into different groups with different parameters applied.

Each key of the dict is a group ID and each value is a dict of parameter settings for that group.

Valid only when massive is True. Defaults to None.
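The interaction between massive, the original parameters, and group_params can be sketched as follows. The group ID 'Group_1' and the value 0.06 are illustrative, and effective_params is a plain-Python model of the documented rule, not hana_ml code. The estimator is created inside a function so that the import runs only where hana_ml is installed.

```python
# Original parameters: used by every group that has no entry in group_params.
common_params = {"changepoint_prior_scale": 0.06}

# Per-group settings; 'Group_1' is a hypothetical group ID.
group_params = {"Group_1": {"seasonality_mode": "additive"}}


def effective_params(group_id):
    """Model the documented rule: a group's own entry replaces the originals."""
    return group_params.get(group_id, common_params)


def build_model():
    """Create the estimator (requires hana_ml, hence the local import)."""
    from hana_ml.algorithms.pal.tsa.additive_model_forecast import (
        AdditiveModelForecast,
    )
    return AdditiveModelForecast(
        massive=True, group_params=group_params, **common_params
    )


print(effective_params("Group_1"))  # {'seasonality_mode': 'additive'}
print(effective_params("Group_2"))  # {'changepoint_prior_scale': 0.06}
```

Here Group_1 uses only its own seasonality_mode setting, while any other group falls back to changepoint_prior_scale.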

References

Seasonalities in Additive Model Forecast

Examples

Input DataFrame df_fit:

>>> df_fit.head(5).collect()
        ts         y
2007-12-10  9.590761
2007-12-11  8.519590
2007-12-12  8.183677
2007-12-13  8.072467
2007-12-14  7.893572

Create an Additive Model Forecast model:

>>> from hana_ml.algorithms.pal.tsa import additive_model_forecast
>>> amf = additive_model_forecast.AdditiveModelForecast(growth='linear')

Perform fit():

>>> amf.fit(data=df_fit)

Output:

>>> amf.model_.collect()
   ROW_INDEX                                      MODEL_CONTENT
0          0  {"GROWTH":"linear","FLOOR":0.0,"SEASONALITY_MO...

Perform predict():

Input DataFrame df_predict:

>>> df_predict.head(5).collect()
        ts    y
2008-03-09  0.0
2008-03-10  0.0
2008-03-11  0.0
2008-03-12  0.0
2008-03-13  0.0

>>> result = amf.predict(data=df_predict)

Output:

>>> result.collect()
        ts      YHAT  YHAT_LOWER  YHAT_UPPER
2008-03-09  7.676880    6.930349    8.461546
2008-03-10  8.147574    7.387315    8.969112
2008-03-11  7.410452    6.630115    8.195562
2008-03-12  7.198807    6.412776    7.977391
2008-03-13  7.087702    6.310826    7.837083

To see the decomposition of the forecast, set show_explainer=True:

>>> result = amf.predict(data=df_predict,
                         show_explainer=True,
                         decompose_seasonality=False,
                         decompose_holiday=False)

Show the attribute explainer_:

>>> amf.explainer_.head(5).collect()
            ts     TREND                                SEASONAL HOLIDAY EXOGENOUS
 2008-03-09  7.432172   {"seasonalities":0.24470822257259804}      {}        {}
 2008-03-10  7.390030     {"seasonalities":0.757544365973254}      {}        {}
 2008-03-11  7.347887   {"seasonalities":0.06256440574150749}      {}        {}
 2008-03-12  7.305745  {"seasonalities":-0.10693834906369426}      {}        {}
 2008-03-13  7.263603  {"seasonalities":-0.17590059499681369}      {}        {}
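As a consistency check on the sample output, the SEASONAL column can be parsed as JSON and added to TREND. With the default additive seasonality mode and empty HOLIDAY and EXOGENOUS columns, the sum should reproduce YHAT up to display rounding. This is a reading of the values shown above, not a statement about PAL internals.

```python
import json

# First two rows copied from the predict() and explainer_ outputs shown above.
rows = [
    {"TREND": 7.432172, "SEASONAL": '{"seasonalities":0.24470822257259804}', "YHAT": 7.676880},
    {"TREND": 7.390030, "SEASONAL": '{"seasonalities":0.757544365973254}', "YHAT": 8.147574},
]

for row in rows:
    # SEASONAL is a JSON object; sum its values to get the total seasonal effect.
    seasonal = sum(json.loads(row["SEASONAL"]).values())
    row["RECOMPOSED"] = row["TREND"] + seasonal
    print(round(row["RECOMPOSED"], 6), row["YHAT"])
```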

Attributes:

model_DataFrame

Model content.

explainer_DataFrame

The decomposition of trend, seasonal, holiday and exogenous variables.

In single mode, this attribute is populated only when show_explainer=True is passed to predict().
In massive mode, this attribute is always populated.

error_msg_DataFrame

Error message. Only valid if massive is True when initializing an 'AdditiveModelForecast' instance.

permutation_importance_DataFrame

The importance of exogenous variables as determined by permutation importance analysis. This attribute is available only after get_permutation_importance() has been called on a trained model. It is structured as follows:

1st column : PAIR, measure name.
2nd column : NAME, exogenous regressor name.
3rd column : VALUE, the importance of the exogenous regressor.

Methods

`fit`(data[, key, endog, exog, holiday, ...])	Fit the model to the training dataset.
`get_model_metrics`()	Get the model metrics.
`get_permutation_importance`(data[, model, ...])	Please see Permutation Feature Importance for Time Series for details.
`get_score_metrics`()	Get the score metrics.
`make_future_dataframe`([data, key, ...])	Create a new dataframe for time series prediction.
`predict`(data[, key, exog, group_key, ...])	Generates time series forecasts based on the fitted model.

fit(data, key=None, endog=None, exog=None, holiday=None, group_key=None, group_params=None, categorical_variable=None)

Fit the model to the training dataset.

Parameters:

dataDataFrame

Input data. The structure is as follows.

The first column: index (ID), type TIMESTAMP, SECONDDATE or DATE.
The second column: raw data, type INTEGER or DECIMAL(p,s).
Other columns: external data, type INTEGER, DOUBLE or DECIMAL(p,s).

keystr, optional

The timestamp column of data. The type of key column is TIMESTAMP, SECONDDATE, or DATE.

In the single mode, defaults to the first column of data if the index column of data is not provided; otherwise, defaults to the index column of data.

In the massive mode, defaults to the first non-group-key column of data if the index of data is not provided; otherwise, defaults to the second index column of data, with the first index column taken as group_key.

endogstr, optional

The endogenous variable, i.e. time series. The type of endog column is INTEGER, DOUBLE, or DECIMAL(p, s).

In single mode, defaults to the first non-key column.
In massive mode, defaults to the first non group_key, non key column.
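The single-mode defaults for key and endog can be modeled in a few lines of plain Python. This is a sketch of the rules as described above, with a hypothetical function name and column names; it is not the hana_ml implementation.

```python
def resolve_key_and_endog(columns, key=None, endog=None, index=None):
    """Return (key, endog) following the single-mode defaults described above."""
    if key is None:
        # Prefer the DataFrame index column when one is set, else the first column.
        key = index if index is not None else columns[0]
    if endog is None:
        # First column that is not the key.
        endog = next(col for col in columns if col != key)
    return key, endog


print(resolve_key_and_endog(["ts", "y", "X1"]))        # ('ts', 'y')
print(resolve_key_and_endog(["y", "ts"], index="ts"))  # ('ts', 'y')
```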

exogstr or a list of str, optional

An optional array of exogenous variables. The type of exog column is INTEGER, DOUBLE, or DECIMAL(p, s).

Defaults to None. Please set this parameter explicitly if you have exogenous variables.

holidayDataFrame, optional

Input holiday data. The structure is as follows.

1st column : timestamp/key, TIMESTAMP, SECONDDATE, DATE
2nd column : holiday name, VARCHAR, NVARCHAR
3rd column : lower window of holiday, less than 0, INTEGER, optional
4th column : upper window of holiday, greater than 0, INTEGER, optional

If massive is True, the structure of input holiday data is as follows:

1st column : group_key, INTEGER, VARCHAR or NVARCHAR
2nd column : timestamp/key, TIMESTAMP, SECONDDATE, DATE
3rd column : holiday name, VARCHAR, NVARCHAR
4th column : lower window of holiday, less than 0, INTEGER, optional
5th column : upper window of holiday, greater than 0, INTEGER, optional

Defaults to None.
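The column order for both layouts can be illustrated with plain tuples. The holiday names, dates, and the group ID 'Group_1' are made up for this sketch, and loading the rows into a HANA DataFrame is not shown.

```python
from datetime import date

# Single-mode rows: (timestamp, holiday name, lower window, upper window).
holidays = [
    (date(2007, 12, 25), "Christmas", -1, 1),
    (date(2008, 1, 1), "New Year", -1, 1),
]

# Massive-mode rows put the group key first; 'Group_1' is a made-up group ID.
massive_holidays = [("Group_1",) + row for row in holidays]

# Check the window sign convention stated above.
windows_ok = all(lower < 0 < upper for _, _, lower, upper in holidays)
print(massive_holidays[0], windows_ok)
```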

group_keystr, optional

The column of group_key. Data type can be INT or NVARCHAR/VARCHAR. If data type is INT, only parameters set in the group_params are valid.

This parameter is only valid when self.massive is True.

Defaults to the first column of data if the index of data is not provided; otherwise, defaults to the first index column.

group_paramsdict, optional

If massive mode is activated (massive is True), input data is divided into different groups with different parameters applied.

Each key of the dict is a group ID and each value is a dict of parameter settings for that group.

Valid only when self.massive is True.

Defaults to None.

categorical_variablestr or list of str, optional

Specifies the INTEGER columns that should be treated as categorical.

Other INTEGER columns will be treated as continuous.

Defaults to None.

Returns:

A fitted object of class "AdditiveModelForecast".

make_future_dataframe(data=None, key=None, group_key=None, periods=1)

Create a new dataframe for time series prediction.

Parameters:

dataDataFrame, optional

The training data that contains the index.

Defaults to the data used in the fit().

keystr, optional

The index defined in the training data.

Defaults to the key specified in fit(), the value in data.index, or PAL's default key column position.

group_keystr, optional

Specify the group id column.

This parameter is only valid when massive is True.

Defaults to the specified group_key in fit() or the first column of the dataframe.

periodsint, optional

The number of rows created in the predict dataframe.

Defaults to 1.

Returns:

DataFrame
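A typical use is to build the future frame from a fitted model and forecast on it. The wrapper below is a sketch: the name forecast_next is hypothetical, and it assumes the returned frame can be passed directly to predict().

```python
def forecast_next(amf, periods=7):
    """Forecast `periods` steps beyond the training data of a fitted model."""
    # With data omitted, the frame is built from the data used in fit().
    future = amf.make_future_dataframe(periods=periods)
    return amf.predict(data=future)
```

Given a fitted instance amf, forecast_next(amf, periods=5) would return the forecast for the next five rows.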

predict(data, key=None, exog=None, group_key=None, group_params=None, logistic_growth_capacity=None, interval_width=None, uncertainty_samples=None, show_explainer=False, decompose_seasonality=None, decompose_holiday=None)

Generates time series forecasts based on the fitted model.

Parameters:

dataDataFrame, optional

Index and exogenous variables for forecast. The structure is as follows.

First column: Index (ID), type TIMESTAMP, SECONDDATE or DATE.

Second column: Placeholder column for forecast values, type DOUBLE or DECIMAL(p,s).

Other columns : external data, type INTEGER, DOUBLE or DECIMAL(p,s).

If massive is True, the structure of data is as follows:

First column: Group_key, type INTEGER, VARCHAR or NVARCHAR.

Second column: Index (ID), type TIMESTAMP, SECONDDATE or DATE.

Third column : Placeholder column for forecast values, type DOUBLE or DECIMAL(p,s).

Other columns: external data, type INTEGER, DOUBLE or DECIMAL(p,s).

keystr, optional

The timestamp column of data. The data type of key column should be TIMESTAMP, DATE or SECONDDATE.

In single mode, defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.

In massive mode, defaults to the first non-group-key column of data if the index of data is not provided; otherwise, defaults to the second index column of data, with the first index column taken as group_key.

group_keystr, optional

The column of group_key. Data type can be INT or NVARCHAR/VARCHAR. If data type is INT, only parameters set in the group_params are valid.

This parameter is only valid when massive is True.

Defaults to the first column of data if the index of data is not provided; otherwise, defaults to the first index column.

group_paramsdict, optional

If massive mode is activated (massive is True), input data is divided into different groups with different parameters applied.

Each key of the dict is a group ID and each value is a dict of parameter settings for that group.

Valid only when massive is True. Defaults to None.

logistic_growth_capacity: float, optional

Specifies the carrying capacity for logistic growth. Mandatory and valid only when growth is 'logistic'.

Defaults to None.

interval_widthfloat, optional

Width of the uncertainty intervals.

Defaults to 0.8.

uncertainty_samplesint, optional

Number of simulated draws used to estimate uncertainty intervals.

Defaults to 1000.
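To make the roles of interval_width and uncertainty_samples concrete, the sketch below derives bounds from a set of simulated draws. It assumes the bounds are symmetric percentiles of the draws, which is a common convention for additive models but is not confirmed here for PAL. The draws themselves are made-up random numbers.

```python
import random


def interval_from_draws(draws, interval_width=0.8):
    """Return (lower, upper) bounds covering `interval_width` of the draws."""
    tail = (1.0 - interval_width) / 2.0  # probability mass left in each tail
    ordered = sorted(draws)
    last = len(ordered) - 1
    lower = ordered[int(tail * last)]
    upper = ordered[int((1.0 - tail) * last)]
    return lower, upper


random.seed(0)
# 1000 simulated forecast values around a made-up point forecast of 7.7.
draws = [random.gauss(7.7, 0.6) for _ in range(1000)]
lower, upper = interval_from_draws(draws, interval_width=0.8)
print(lower < 7.7 < upper)
```

Under this assumption, a larger interval_width widens the band, and more draws make the estimated bounds less noisy.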

show_explainerbool, optional

Indicates whether to invoke AdditiveModelForecast with explanations in predict(). If True, the contributions of trend, seasonal, holiday and exogenous variables are stored in the explainer_ attribute of the AdditiveModelForecast instance.

Defaults to False.

decompose_seasonalitybool, optional

Specifies whether or not seasonal component will be decomposed. Valid only when show_explainer is True.

Defaults to False.

decompose_holidaybool, optional

Specifies whether or not holiday component will be decomposed. Valid only when show_explainer is True.

Defaults to False.

Returns:

DataFrame 1

Forecasted values, structured as follows:

ID, type timestamp.
YHAT, type DOUBLE, forecast value.
YHAT_LOWER, type DOUBLE, lower bound of confidence region.
YHAT_UPPER, type DOUBLE, upper bound of confidence region.

DataFrame 2

The decomposition of trend, seasonal, holiday and exogenous variables.

DataFrame 3 (optional)

Error message. Only valid if massive is True when initializing an 'AdditiveModelForecast' instance.

get_model_metrics()

Get the model metrics.

Returns:

DataFrame: The model metrics.

get_permutation_importance(data, model=None, key=None, endog=None, exog=None, repeat_time=None, random_state=None, thread_ratio=None, partition_ratio=None, regressor_top_k=None, accuracy_measure=None, ignore_zero=None)

Please see Permutation Feature Importance for Time Series for details.

get_score_metrics()

Get the score metrics.

Returns:

DataFrame: The score metrics.

Inherited Methods from PALBase

In addition to the methods described above, the AdditiveModelForecast class inherits methods from the PALBase class. Please refer to PAL Base for more details.