intermittent_forecast
- hana_ml.algorithms.pal.tsa.intermittent_forecast.intermittent_forecast(data, key=None, endog=None, p=None, q=None, forecast_num=None, optimizer=None, method=None, grid_size=None, optimize_step=None, accuracy_measure=None, ignore_zero=None, expost_flag=None, thread_ratio=None, iter_count=None, random_state=None, penalty=None)
Intermittent Time Series Forecast (ITSF) is a forecast strategy for products with intermittent demand.
Differences from the Croston method, which uses constant weights:
ITSF applies exponential weights to the estimate, so more recent data points receive greater weight.
ITSF does not require initial values for the non-zero demands or for the time intervals between non-zero demands.
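To illustrate the first difference, the following minimal sketch contrasts constant weights with exponentially decaying weights over a short demand history. It is a generic illustration only: the helper names, the decay factor `alpha`, and the sample demand values are assumptions made for this example, and it does not reproduce the weighting scheme PAL uses internally.

```python
def constant_weights(n):
    """Equal weight for each of the n observations."""
    return [1.0 / n] * n


def exponential_weights(n, alpha=0.5):
    """Normalized weights that grow toward the most recent observation.

    alpha is a hypothetical decay factor in (0, 1); index n - 1 is the
    newest data point, index 0 the oldest.
    """
    raw = [(1.0 - alpha) ** (n - 1 - i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_estimate(values, weights):
    """Weighted average of values under the given weights."""
    return sum(v * w for v, w in zip(values, weights))


# Made-up intermittent demand history, oldest first (illustrative only).
demand = [0.0, 2.0, 0.0, 0.0, 4.0]

flat = constant_weights(len(demand))
decayed = exponential_weights(len(demand))

print(weighted_estimate(demand, flat))     # plain mean of the history
print(weighted_estimate(demand, decayed))  # recent demand counts for more
```

With exponential weights the recent demand of 4.0 dominates the estimate, whereas constant weights treat it the same as the oldest observation.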
- Parameters:
- data : DataFrame
Input data containing the time series to analyze.
- key : str, optional
Specifies the ID (representing time-order) column of data.
Required if a single ID column cannot be inferred from the index of data.
If there is a single column name in the index of data, then key defaults to that column; otherwise key is mandatory.
- endog : str, optional
Specifies the name of the column for intermittent demand values.
Defaults to the 1st non-key column of data.
- p : int, optional
The smoothing parameter for demand, where:
-1 : optimize this parameter automatically.
positive integer in [1, n] : use the specified value for smoothing.
The specified value cannot exceed the length of the time series under analysis.
Defaults to -1.
- q : int, optional
The smoothing parameter for the time intervals between intermittent demands, where:
-1 : optimize this parameter automatically.
positive integer in [1, P] : use the specified value.
Defaults to -1.
- forecast_num : int, optional
Forecast length, that is, the number of values to forecast. When set to 1, the algorithm forecasts only one value.
Defaults to 1.
- optimizer : {'lbfgsb', 'brute', 'sim_annealing'}, optional
Specifies the optimization algorithm for automatically identifying parameters p and q.
'lbfgsb' : Bounded Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGSB) method with parameters p and q initialized by the default scheme.
'brute' : Brute method, LBFGSB with parameters p and q initialized by grid search.
'sim_annealing' : Simulated annealing method.
Defaults to 'lbfgsb'.
- method : str, optional
Specifies the method (or mode) for the output:
'sporadic': Use the sporadic method.
'constant': Use the constant method.
Defaults to 'constant'.
- grid_size : int, optional
Specifies the number of steps from the start point to the length of data for grid search.
Only valid when optimizer is set to 'brute'.
Defaults to 20.
- optimize_step : float, optional
Specifies the minimum step for each iteration of the LBFGSB method.
Defaults to 0.001.
- accuracy_measure : str or list of str, optional
The metric used to quantify how well a model fits the input data. Options: 'mse', 'rmse', 'mae', 'mape', 'smape', 'mase'.
Defaults to 'mse'.
Note
Specify a measure name if you want the corresponding measure value to be reflected in the output statistics (the second DataFrame in the return value).
- ignore_zero : bool, optional
False : Uses zero values in the input dataset when calculating 'mape'.
True : Ignores zero values in the input dataset when calculating 'mape'.
Only valid when accuracy_measure is 'mape'.
Defaults to False.
- expost_flag : bool, optional
False : Outputs only the forecast values, without the expost forecast.
True : Outputs both the expost forecast and the forecast values.
Defaults to True.
- thread_ratio : float, optional
Adjusts the percentage of available threads to use, from 0 to 1. A value of 0 indicates the use of a single thread, while 1 implies the use of all currently available threads. Values outside this range are ignored, in which case the function heuristically determines the number of threads to use.
Defaults to 0.
- iter_count : int, optional
A positive integer that controls the number of iterations of the simulated annealing method.
Defaults to 1000.
- random_state : int, optional
Specifies the seed for the random number generator. Valid for the simulated annealing method.
Defaults to 1.
- penalty : float, optional
A penalty is applied to the cost function to avoid over-fitting.
Defaults to 1.0.
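Several of the parameters above only take effect in combination with others (for example, grid_size with optimizer 'brute'). The sketch below collects those documented constraints into a small pre-flight check. `check_itsf_args` is a hypothetical helper written for this illustration, not part of hana_ml, and it reflects only the constraints stated in the parameter descriptions above. Treating a list of accuracy measures that contains 'mape' as satisfying the ignore_zero condition is an assumption.

```python
# Option sets copied from the parameter descriptions above.
VALID_OPTIMIZERS = {"lbfgsb", "brute", "sim_annealing"}
VALID_METHODS = {"sporadic", "constant"}
VALID_MEASURES = {"mse", "rmse", "mae", "mape", "smape", "mase"}


def check_itsf_args(optimizer=None, method=None, grid_size=None,
                    accuracy_measure=None, ignore_zero=None,
                    thread_ratio=None, random_state=None):
    """Hypothetical helper (not part of hana_ml).

    Returns a list of notes, one per documented constraint that the
    given keyword arguments do not satisfy. An empty list means no
    issue was found.
    """
    notes = []
    # Fall back to the documented defaults when nothing is passed.
    effective_optimizer = "lbfgsb" if optimizer is None else optimizer
    if effective_optimizer not in VALID_OPTIMIZERS:
        notes.append("optimizer is not one of the documented options")
    if method is not None and method not in VALID_METHODS:
        notes.append("method is not one of the documented options")

    if accuracy_measure is None:
        measures = ["mse"]
    elif isinstance(accuracy_measure, str):
        measures = [accuracy_measure]
    else:
        measures = list(accuracy_measure)
    if any(m not in VALID_MEASURES for m in measures):
        notes.append("accuracy_measure contains an undocumented option")

    if grid_size is not None and effective_optimizer != "brute":
        notes.append("grid_size is only valid when optimizer is 'brute'")
    if ignore_zero is not None and "mape" not in measures:
        notes.append("ignore_zero is only valid when accuracy_measure is 'mape'")
    if random_state is not None and effective_optimizer != "sim_annealing":
        notes.append("random_state is documented for 'sim_annealing'")
    if thread_ratio is not None and not 0 <= thread_ratio <= 1:
        notes.append("thread_ratio outside [0, 1] is ignored")
    return notes
```

For instance, `check_itsf_args(optimizer='lbfgsb', grid_size=20)` returns a single note, because grid_size has no documented effect unless optimizer is 'brute'.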
- Returns:
- A tuple of two DataFrames
- - 1st DataFrame : Forecast values.
- - 2nd DataFrame : Related statistics.
Examples
>>> forecasts, stats = intermittent_forecast(data=df, p=3, forecast_num=3,
...                                          optimizer='brute', grid_size=20,
...                                          optimize_step=0.011, expost_flag=False,
...                                          accuracy_measure='mse', ignore_zero=False,
...                                          thread_ratio=0.5)
>>> forecasts.collect()
>>> stats.collect()
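To make the accuracy_measure and ignore_zero parameters more concrete, here is a minimal sketch of two of the listed measures using their textbook definitions. The function names and sample values are assumptions for illustration. PAL's exact formulas, and in particular how it treats zero values when ignore_zero is False, are not stated in this section and may differ.

```python
def mse(actual, forecast):
    """Mean squared error between two equally long sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast, ignore_zero=False):
    """Mean absolute percentage error, returned as a fraction.

    With ignore_zero=True, points whose actual value is 0 are skipped.
    With ignore_zero=False, a zero actual value makes the textbook
    formula undefined, so this sketch raises ValueError (an assumption;
    PAL's handling may differ).
    """
    pairs = list(zip(actual, forecast))
    if ignore_zero:
        pairs = [(a, f) for a, f in pairs if a != 0]
    if not pairs:
        raise ValueError("no usable observations")
    if any(a == 0 for a, _ in pairs):
        raise ValueError("zero actual value; MAPE is undefined")
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Made-up values (illustrative only); the middle period has zero demand.
actual = [2, 0, 4]
forecast = [1, 1, 5]

print(mse(actual, forecast))
print(mape(actual, forecast, ignore_zero=True))
```

Because intermittent demand series contain many zeros by nature, the ignore_zero choice can change a percentage-based measure substantially, while 'mse' is unaffected by it.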