AdditiveModelForecast
- class hana_ml.algorithms.pal.tsa.additive_model_forecast.AdditiveModelForecast(growth=None, logistic_growth_capacity=None, seasonality_mode=None, seasonality=None, num_changepoints=None, changepoint_range=None, regressor=None, changepoints=None, yearly_seasonality=None, weekly_seasonality=None, daily_seasonality=None, seasonality_prior_scale=None, holiday_prior_scale=None, changepoint_prior_scale=None, massive=False, group_params=None)
Additive Model Time Series Analysis (AMTSA) uses an additive model to forecast time series data. It effectively handles data with strong seasonal effects and is adaptable to shifts in historical trends. AMTSA uses a decomposable time series model with three main components: trend, seasonality, and holidays or events.
- Parameters:
- growth{'linear', 'logistic'}, optional
Specifies a trend, which could be either linear or logistic.
Defaults to 'linear'.
- logistic_growth_capacityfloat, optional
Specifies the carrying capacity for logistic growth. Mandatory and valid only when growth is 'logistic'.
No default value.
- seasonality_mode{'additive', 'multiplicative'}, optional
Mode for seasonality.
Defaults to 'additive'.
- seasonalitystr or a list of str, optional
Adds seasonality to the model. Each seasonality is given in JSON format and includes:
NAME
PERIOD
FOURIER_ORDER
PRIOR_SCALE, optional
MODE, optional
Each str is in json format such as '{ "NAME": "MONTHLY", "PERIOD":30, "FOURIER_ORDER":5 }'. FOURIER_ORDER determines how quickly the seasonality can change. PRIOR_SCALE controls the amount of regularization. No seasonality will be added to the model if this parameter is not provided.
No default value.
- num_changepointsint, optional
The number of potential changepoints. Not effective if changepoints is provided.
Defaults to 25.
- changepoint_rangefloat, optional
Proportion of history in which trend changepoints will be estimated. Not effective if changepoints is provided.
Defaults to 0.8.
- regressora list of str, optional
Specifies the regressors. Each regressor includes:
NAME
PRIOR_SCALE
STANDARDIZE
MODE: 'additive' or 'multiplicative'.
Each str is in JSON format, such as '{ "NAME": "X1", "PRIOR_SCALE":4, "MODE": "additive" }'. PRIOR_SCALE controls the amount of regularization; STANDARDIZE specifies whether or not the regressor is standardized.
No default value.
- changepointslist of str, optional
Specifies a list of changepoints in timestamp format, such as ['2019-01-01 00:00:00', '2019-02-04 00:00:00'].
No default value.
- yearly_seasonality{'auto', 'false', 'true'}, optional
Specifies whether or not to fit yearly seasonality.
'false' and 'true' carry their logical meaning, while 'auto' means the setting is determined automatically from the input data.
Defaults to 'auto'.
- weekly_seasonality{'auto', 'false', 'true'}, optional
Specifies whether or not to fit the weekly seasonality.
'auto' means automatically determined from input data.
Defaults to 'auto'.
- daily_seasonality{'auto', 'false', 'true'}, optional
Specifies whether or not to fit the daily seasonality.
'auto' means automatically determined from input data.
Defaults to 'auto'.
- seasonality_prior_scalefloat, optional
Parameter modulating the strength of the seasonality model.
Defaults to 10.
- holiday_prior_scalefloat, optional
Parameter modulating the strength of the holiday components model.
Defaults to 10.
- changepoint_prior_scalefloat, optional
Parameter modulating the flexibility of the automatic changepoint selection.
Defaults to 0.05.
- massivebool, optional
Specifies whether or not to activate the massive mode.
True : massive mode.
False : single mode.
In massive mode, parameters can be set either through group_params or through the original parameters. Original parameters apply to all groups. However, if any parameter is defined for a group in group_params, none of the original parameter settings apply to that group.
For example, if seasonality_mode is set in group_params for Group_1, a changepoint_prior_scale value given through the original parameters is not applicable to Group_1.
Defaults to False.
- group_paramsdict, optional
If massive mode is activated (massive is True), input data is divided into different groups with different parameters applied.
Valid only when massive is True. Defaults to None.
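The constructor arguments described above can be assembled as in the following minimal sketch. It builds the JSON strings for seasonality and regressor with the standard json module and models the massive-mode rule for group_params in a small helper. The helper effective_params, the group name Group_2, the value 0.06 and the dict-of-dicts shape of group_params are assumptions made for illustration; they are not taken from hana_ml. The final call is left as a comment because it needs hana_ml installed.

```python
import json

# One JSON string per custom seasonality (keys as listed in the parameter description).
seasonality = [
    json.dumps({"NAME": "MONTHLY", "PERIOD": 30, "FOURIER_ORDER": 5}),
]

# One JSON string per regressor; "X1" is the regressor name used in the description.
regressor = [
    json.dumps({"NAME": "X1", "PRIOR_SCALE": 4, "MODE": "additive"}),
]

# Original parameters: apply to every group without its own entry (illustrative value).
original_params = {"changepoint_prior_scale": 0.06}

# Assumed shape: a dict keyed by group ID, each value a dict of parameters.
group_params = {"Group_1": {"seasonality_mode": "additive"}}


def effective_params(original, per_group, group_id):
    """Hypothetical helper modelling the documented massive-mode rule.

    A group that defines any parameter of its own ignores all original
    parameter settings; every other group receives the original ones.
    """
    if per_group.get(group_id):
        return dict(per_group[group_id])
    return dict(original)


print(effective_params(original_params, group_params, "Group_1"))
print(effective_params(original_params, group_params, "Group_2"))

# With hana_ml installed, the values would be passed along these lines:
# amf = AdditiveModelForecast(massive=True, group_params=group_params,
#                             **original_params)
```

Under this rule Group_1 keeps only its seasonality_mode setting, while Group_2 falls back to the original changepoint_prior_scale.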
References
Seasonalities in Additive Model Forecast
Examples
Input DataFrame df_fit:
>>> df_fit.head(5).collect()
        ts         y
2007-12-10  9.590761
2007-12-11  8.519590
2007-12-12  8.183677
2007-12-13  8.072467
2007-12-14  7.893572
Create an Additive Model Forecast model:
>>> amf = additive_model_forecast.AdditiveModelForecast(growth='linear')
Perform fit():
>>> amf.fit(data=df_fit)
Output:
>>> amf.model_.collect()
   ROW_INDEX                                      MODEL_CONTENT
0          0  {"GROWTH":"linear","FLOOR":0.0,"SEASONALITY_MO...
Perform predict():
Input DataFrame df_predict:
>>> df_predict.head(5).collect()
           ts    y
0  2008-03-09  0.0
1  2008-03-10  0.0
2  2008-03-11  0.0
3  2008-03-12  0.0
4  2008-03-13  0.0
>>> result = amf.predict(data=df_predict)
Output:
>>> result.collect()
           ts      YHAT  YHAT_LOWER  YHAT_UPPER
0  2008-03-09  7.676880    6.930349    8.461546
1  2008-03-10  8.147574    7.387315    8.969112
2  2008-03-11  7.410452    6.630115    8.195562
3  2008-03-12  7.198807    6.412776    7.977391
4  2008-03-13  7.087702    6.310826    7.837083
To see the decomposition of the prediction result, set show_explainer=True:
>>> result = amf.predict(data=df_predict, show_explainer=True, decompose_seasonality=False, decompose_holiday=False)
Show the attribute explainer_:
>>> amf.explainer_.head(5).collect()
           ts     TREND                                SEASONAL HOLIDAY EXOGENOUS
0  2008-03-09  7.432172   {"seasonalities":0.24470822257259804}      {}        {}
1  2008-03-10  7.390030     {"seasonalities":0.757544365973254}      {}        {}
2  2008-03-11  7.347887   {"seasonalities":0.06256440574150749}      {}        {}
3  2008-03-12  7.305745  {"seasonalities":-0.10693834906369426}      {}        {}
4  2008-03-13  7.263603  {"seasonalities":-0.17590059499681369}      {}        {}
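The explainer_ output can be cross-checked against the forecast. The sketch below assumes the default additive seasonality mode and empty HOLIDAY and EXOGENOUS columns, as in the output shown above, so that YHAT should equal TREND plus the seasonal contributions. The helper recompose is hypothetical; the numbers are copied from the two outputs in this example.

```python
import json

# Rows copied from the explainer_ and predict outputs: (ts, TREND, SEASONAL, YHAT).
rows = [
    ("2008-03-09", 7.432172, '{"seasonalities":0.24470822257259804}', 7.676880),
    ("2008-03-10", 7.390030, '{"seasonalities":0.757544365973254}', 8.147574),
    ("2008-03-11", 7.347887, '{"seasonalities":0.06256440574150749}', 7.410452),
]


def recompose(trend, seasonal_json):
    """Add the seasonal contributions to the trend.

    Assumes additive mode with no holiday or exogenous terms.
    """
    seasonal = json.loads(seasonal_json)
    return trend + sum(seasonal.values())


for ts, trend, seasonal_json, yhat in rows:
    rebuilt = recompose(trend, seasonal_json)
    # Compare with the reported forecast, allowing for display rounding.
    print(ts, round(rebuilt, 6), abs(rebuilt - yhat) < 1e-5)
```

With decompose_seasonality=True the SEASONAL JSON would presumably carry several keys; summing over all values keeps the helper usable in that case as well.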
- Attributes:
- model_DataFrame
Model content.
- explainer_DataFrame
The decomposition of trend, seasonal, holiday and exogenous variables.
In single mode, only contains a value when show_explainer=True in the predict() function.
In massive mode, this attribute always contains a value.
- error_msg_DataFrame
Error message. Only valid if massive is True when initializing an 'AdditiveModelForecast' instance.
- permutation_importance_DataFrame
The importance of exogenous variables as determined by permutation importance analysis. This attribute only appears after get_permutation_importance() has been invoked on a trained model. It is structured as follows:
1st column : PAIR, measure name.
2nd column : NAME, exogenous regressor name.
3rd column : VALUE, the importance of the exogenous regressor.
Methods
fit(data[, key, endog, exog, holiday, ...]): Fit the model to the training dataset.
get_model_metrics(): Get the model metrics.
get_permutation_importance(data[, model, ...]): Please see Permutation Feature Importance for Time Series for details.
get_score_metrics(): Get the score metrics.
make_future_dataframe([data, key, ...]): Create a new dataframe for time series prediction.
predict(data[, key, exog, group_key, ...]): Generates time series forecasts based on the fitted model.
- fit(data, key=None, endog=None, exog=None, holiday=None, group_key=None, group_params=None, categorical_variable=None)
Fit the model to the training dataset.
- Parameters:
- dataDataFrame
Input data. The structure is as follows.
The first column: index (ID), type TIMESTAMP, SECONDDATE or DATE.
The second column: raw data, type INTEGER or DECIMAL(p,s).
Other columns: external data, type INTEGER, DOUBLE or DECIMAL(p,s).
- keystr, optional
The timestamp column of data. The type of key column is TIMESTAMP, SECONDDATE, or DATE.
In the single mode, defaults to the first column of data if the index column of data is not provided; otherwise, defaults to the index column of data.
In the massive mode, defaults to the first non-group-key column of data if the index columns of data are not provided; otherwise, defaults to the second index column of data, the first index column being group_key.
- endogstr, optional
The endogenous variable, i.e. time series. The type of endog column is INTEGER, DOUBLE, or DECIMAL(p, s).
In single mode, defaults to the first non-key column.
In massive mode, defaults to the first non group_key, non key column.
- exogstr or a list of str, optional
An optional array of exogenous variables. The type of exog column is INTEGER, DOUBLE, or DECIMAL(p, s).
Defaults to None. Please set this parameter explicitly if you have exogenous variables.
- holidayDataFrame, optional
Input holiday data. The structure is as follows.
1st column : timestamp/key, TIMESTAMP, SECONDDATE, DATE
2nd column : holiday name, VARCHAR, NVARCHAR
3rd column : lower window of holiday, less than 0, INTEGER, optional
4th column : upper window of holiday, greater than 0, INTEGER, optional
If massive is True, the structure of input holiday data is as follows:
1st column : group_key, INTEGER, VARCHAR or NVARCHAR
2nd column : timestamp/key, TIMESTAMP, SECONDDATE, DATE
3rd column : holiday name, VARCHAR, NVARCHAR
4th column : lower window of holiday, less than 0, INTEGER, optional
5th column : upper window of holiday, greater than 0, INTEGER, optional
Defaults to None.
- group_keystr, optional
The column of group_key. Data type can be INT or NVARCHAR/VARCHAR. If data type is INT, only parameters set in the group_params are valid.
This parameter is only valid when self.massive is True.
Defaults to the first column of data if the index columns of data are not provided. Otherwise, defaults to the first index column.
- group_paramsdict, optional
If massive mode is activated (massive is True), input data is divided into different groups with different parameters applied.
Valid only when self.massive is True.
Defaults to None.
- categorical_variablestr or list of str, optional
Specifies INTEGER columns that should be treated as categorical.
Other INTEGER columns will be treated as continuous.
Defaults to None.
- Returns:
- A fitted object of class "AdditiveModelForecast".
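The default column choices described for key, endog and group_key can be summarized in a short sketch. It covers only the case where no index is set on the DataFrame and works on a plain list of column names. The helper resolve_columns and the column names grp and x1 are hypothetical; this is an illustration of the documented defaults, not the library's code. exog is deliberately left out, since it defaults to None and has to be set explicitly.

```python
def resolve_columns(columns, key=None, endog=None, group_key=None, massive=False):
    """Hypothetical helper: apply the documented defaults when no index is set."""
    remaining = list(columns)
    if massive:
        # Massive mode: group_key defaults to the first column of data.
        group_key = group_key or remaining[0]
        remaining.remove(group_key)
    else:
        # group_key is only valid in massive mode.
        group_key = None
    # key defaults to the first column that is not the group key.
    key = key or remaining[0]
    remaining.remove(key)
    # endog defaults to the first column that is neither group key nor key.
    endog = endog or remaining[0]
    return {"group_key": group_key, "key": key, "endog": endog}


print(resolve_columns(["ts", "y", "x1"]))
print(resolve_columns(["grp", "ts", "y", "x1"], massive=True))
```

Passing key, endog and group_key explicitly avoids depending on column order at all, which is usually the safer choice when exogenous columns are present.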
- make_future_dataframe(data=None, key=None, group_key=None, periods=1)
Create a new dataframe for time series prediction.
- Parameters:
- dataDataFrame, optional
The training data that contains the index.
Defaults to the data used in the fit().
- keystr, optional
The index defined in the training data.
Defaults to the specified key in fit() or the value in data.index or the PAL's default key column position.
- group_keystr, optional
Specify the group id column.
This parameter is only valid when massive is True.
Defaults to the specified group_key in fit() or the first column of the dataframe.
- periodsint, optional
The number of rows created in the predict dataframe.
Defaults to 1.
- Returns:
- DataFrame
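As a rough illustration of what periods means, the sketch below generates the timestamps that would follow the last training timestamp and pairs each with a placeholder value, mirroring the layout of df_predict in the Examples. The helper future_rows, the daily step and the start date are assumptions made for illustration; how make_future_dataframe derives the step and which columns it returns is not stated in this section.

```python
from datetime import date, timedelta


def future_rows(last_date, periods=1, step=timedelta(days=1), placeholder=0.0):
    """Hypothetical helper: `periods` rows following `last_date`.

    Each row is (timestamp, placeholder), the layout predict() expects
    for its index column and forecast placeholder column.
    """
    return [(last_date + step * (i + 1), placeholder) for i in range(periods)]


# Illustrative last training date; not taken from the example data.
for row in future_rows(date(2008, 3, 8), periods=3):
    print(row)
```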
- predict(data, key=None, exog=None, group_key=None, group_params=None, logistic_growth_capacity=None, interval_width=None, uncertainty_samples=None, show_explainer=False, decompose_seasonality=None, decompose_holiday=None)
Generates time series forecasts based on the fitted model.
- Parameters:
- dataDataFrame, optional
Index and exogenous variables for forecast. The structure is as follows.
First column: Index (ID), type TIMESTAMP, SECONDDATE or DATE.
Second column: Placeholder column for forecast values, type DOUBLE or DECIMAL(p,s).
Other columns : external data, type INTEGER, DOUBLE or DECIMAL(p,s).
If massive is True, the structure of data is as follows:
First column: Group_key, type INTEGER, VARCHAR or NVARCHAR.
Second column: Index (ID), type TIMESTAMP, SECONDDATE or DATE.
Third column : Placeholder column for forecast values, type DOUBLE or DECIMAL(p,s).
Other columns: external data, type INTEGER, DOUBLE or DECIMAL(p,s).
- keystr, optional
The timestamp column of data. The data type of key column should be TIMESTAMP, DATE or SECONDDATE.
In single mode, defaults to the first column of data if the index column of data is not provided. Otherwise, defaults to the index column of data.
In massive mode, defaults to the first non-group-key column of data if the index columns of data are not provided; otherwise, defaults to the second index column of data, the first index column being group_key.
- group_keystr, optional
The column of group_key. Data type can be INT or NVARCHAR/VARCHAR. If data type is INT, only parameters set in the group_params are valid.
This parameter is only valid when massive is True.
Defaults to the first column of data if the index columns of data are not provided. Otherwise, defaults to the first index column.
- group_paramsdict, optional
If massive mode is activated (massive is True), input data is divided into different groups with different parameters applied.
Valid only when massive is True. Defaults to None.
- logistic_growth_capacityfloat, optional
Specifies the carrying capacity for logistic growth. Mandatory and valid only when growth is 'logistic'.
Defaults to None.
- interval_widthfloat, optional
Width of the uncertainty intervals.
Defaults to 0.8.
- uncertainty_samplesint, optional
Number of simulated draws used to estimate uncertainty intervals.
Defaults to 1000.
- show_explainerbool, optional
Indicates whether to invoke AdditiveModelForecast with explanations during prediction. If True, the contributions of trend, seasonal, holiday and exogenous variables are shown in an attribute called explainer_ of the AdditiveModelForecast instance.
Defaults to False.
- decompose_seasonalitybool, optional
Specifies whether or not the seasonal component will be decomposed. Valid only when show_explainer is True.
Defaults to False.
- decompose_holidaybool, optional
Specifies whether or not the holiday component will be decomposed. Valid only when show_explainer is True.
Defaults to False.
- Returns:
- DataFrame 1
Forecasted values, structured as follows:
ID, type timestamp.
YHAT, type DOUBLE, forecast value.
YHAT_LOWER, type DOUBLE, lower bound of confidence region.
YHAT_UPPER, type DOUBLE, upper bound of confidence region.
- DataFrame 2
The decomposition of trend, seasonal, holiday and exogenous variables.
- DataFrame 3 (optional)
Error message. Only valid if massive is True when initializing an 'AdditiveModelForecast' instance.
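The relationship between interval_width, uncertainty_samples and the YHAT_LOWER / YHAT_UPPER columns can be pictured with a small sketch. It assumes that the bounds are central quantiles of the simulated draws, which is a common approach for this family of models; this section does not spell out the exact procedure used by PAL, so the helper central_interval is hypothetical and only illustrates the idea.

```python
def central_interval(draws, interval_width=0.8):
    """Hypothetical helper: central interval covering `interval_width` of the draws.

    With interval_width=0.8 the bounds sit near the 10th and 90th percentiles.
    """
    ordered = sorted(draws)
    lower_q = (1.0 - interval_width) / 2.0
    upper_q = 1.0 - lower_q
    last = len(ordered) - 1
    # Nearest-rank lookup; real implementations may interpolate instead.
    lower = ordered[round(lower_q * last)]
    upper = ordered[round(upper_q * last)]
    return lower, upper


# Deterministic stand-in for simulated forecast draws at one timestamp.
draws = list(range(1001))
print(central_interval(draws, interval_width=0.8))
```

A larger interval_width widens the band, and more draws make the estimated bounds less noisy at the cost of extra computation.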
- get_model_metrics()
Get the model metrics.
- Returns:
- DataFrame
The model metrics.
- get_permutation_importance(data, model=None, key=None, endog=None, exog=None, repeat_time=None, random_state=None, thread_ratio=None, partition_ratio=None, regressor_top_k=None, accuracy_measure=None, ignore_zero=None)
Please see Permutation Feature Importance for Time Series for details.
- get_score_metrics()
Get the score metrics.
- Returns:
- DataFrame
The score metrics.
Inherited Methods from PALBase
Besides the methods mentioned above, the AdditiveModelForecast class also inherits methods from the PALBase class. Please refer to PAL Base for more details.