intermittent_forecast

hana_ml.algorithms.pal.tsa.intermittent_forecast.intermittent_forecast(data, key=None, endog=None, p=None, q=None, forecast_num=None, optimizer=None, method=None, grid_size=None, optimize_step=None, accuracy_measure=None, ignore_zero=None, expost_flag=None, thread_ratio=None, iter_count=None, random_state=None, penalty=None)

This function is a wrapper for PAL Intermittent Time Series Forecast (ITSF), which is a new forecast strategy for products with intermittent demand.

Differences from the Croston method, which uses constant weights:

ITSF applies exponential weights to the estimates, so the more recent the data, the greater its weight.

ITSF does not need initial values for the non-zero demands or for the time intervals between non-zero demands.
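
The contrast between constant and exponential weighting can be illustrated with a minimal sketch. This is not PAL's ITSF formula, which is not given here; the helper names and the `decay` factor are hypothetical and serve only to show that newer observations receive larger weights under an exponential scheme.

```python
def constant_weights(n):
    """Equal weight for each of the n observations."""
    return [1.0 / n] * n


def exponential_weights(n, decay=0.5):
    """Normalized weights that shrink geometrically with age.

    Index 0 is the oldest observation, index n - 1 the most recent.
    `decay` is a hypothetical illustration parameter, not a PAL setting.
    """
    raw = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(values, weights):
    """Weighted average of values under the given weights."""
    return sum(v * w for v, w in zip(values, weights))


# Non-zero demands, ordered from oldest to newest.
demands = [1.0, 4.0, 5.0, 3.0]
const_estimate = weighted_mean(demands, constant_weights(len(demands)))
exp_estimate = weighted_mean(demands, exponential_weights(len(demands)))
```

With exponential weights the estimate is pulled toward the latest demands, whereas constant weights treat every observation alike.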

Parameters:

data : DataFrame

Data that contains the time series to be analyzed.

key : str, optional

Specifies the ID (representing time order) column of data.

If there is a single column name in the index of data, then key defaults to that column; otherwise key is mandatory.

endog : str, optional

Specifies the name of the column that holds the intermittent demand values.

Defaults to the first non-key column of data.

p : int, optional

The smoothing parameter for demand, where:

-1 : optimizes this parameter automatically.

positive integers : the specified value for smoothing, which cannot exceed the length of the time series for analysis.

Defaults to -1.

q : int, optional

The smoothing parameter for the time intervals between intermittent demands, where:

-1 : optimizes this parameter automatically.

positive integers : the specified value for smoothing, which cannot exceed the value of p.

Defaults to -1.
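
The constraints on p and q described above can be checked on the client side before the call. The following is a minimal sketch: `check_smoothing_params` and `n_obs` are hypothetical names that are not part of hana_ml, and skipping the q-versus-p comparison when p is -1 is an assumption, since the text does not say how that case is handled.

```python
def check_smoothing_params(p, q, n_obs):
    """Validate p and q against the documented constraints.

    p, q  : -1 (automatic) or a positive integer.
    n_obs : length of the time series for analysis.
    Raises ValueError when a documented constraint is violated.
    """
    if p != -1:
        if p < 1:
            raise ValueError("p must be -1 or a positive integer")
        if p > n_obs:
            raise ValueError("p cannot exceed the length of the time series")
    if q != -1:
        if q < 1:
            raise ValueError("q must be -1 or a positive integer")
        # Assumption: the q <= p rule is only checked when p is given explicitly.
        if p != -1 and q > p:
            raise ValueError("q cannot exceed the value of p")


# Example: p=3 with automatic q on an 11-point series passes the checks.
check_smoothing_params(p=3, q=-1, n_obs=11)
```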

forecast_num : int, optional

Number of values to be forecast.

When it is set to 1, the algorithm only forecasts one value.

Defaults to 1.

optimizer : {'lbfgsb', 'brute', 'sim_annealing'}, optional

Specifies the optimization algorithm for automatically identifying parameters p and q.

'lbfgsb' : Bounded Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGSB) method with parameters p and q initialized by the default scheme.

'brute' : Brute method, that is, LBFGSB with parameters p and q initialized by grid search.

'sim_annealing' : Simulated annealing method.

Defaults to 'lbfgsb'.

method : str, optional

Specifies the method (or mode) for the output:

'sporadic': Use the sporadic method.

'constant': Use the constant method.

Defaults to 'constant'.

grid_size : int, optional

Specifies the number of steps from the start point to the length of data for grid search.

Only valid when optimizer is set to 'brute'.

Defaults to 20.

optimize_step : float, optional

Specifies the minimum step for each iteration of the LBFGSB method.

Defaults to 0.001.

accuracy_measure : str or list of str, optional

The metric to quantify how well a model fits input data. Options: 'mse', 'rmse', 'mae', 'mape', 'smape', 'mase'.

Defaults to 'mse'.

Note

Specify a measure name if you want the corresponding measure value to appear in the output statistics (the second DataFrame in the return).
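
For orientation, three of the listed measures can be written out in a few lines. This is a minimal sketch using the textbook definitions of MSE, RMSE and MAE; the function names are hypothetical, and whether PAL computes them over exactly the same points is not stated here.

```python
import math


def mse(actual, fitted):
    """Mean squared error between actual and fitted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)


def rmse(actual, fitted):
    """Root mean squared error: the square root of MSE."""
    return math.sqrt(mse(actual, fitted))


def mae(actual, fitted):
    """Mean absolute error between actual and fitted values."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)


# Hypothetical actual and fitted values, for illustration only.
actual = [0.0, 1.0, 4.0]
fitted = [1.0, 1.0, 2.0]
scores = {"mse": mse(actual, fitted),
          "rmse": rmse(actual, fitted),
          "mae": mae(actual, fitted)}
```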

ignore_zero : bool, optional

False: Uses zero values in the input dataset when calculating 'mape'.
True: Ignores zero values in the input dataset when calculating 'mape'.

Only valid when accuracy_measure is 'mape'.

Defaults to False.
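
Intermittent series contain many zeros, and the percentage error is undefined where the actual value is zero, which is why this switch matters for 'mape'. The sketch below is a hypothetical helper, not PAL's implementation: it uses the textbook MAPE definition, and returning infinity when a zero is kept is an assumption made for illustration, since the handling of zeros with ignore_zero=False is not specified here.

```python
import math


def mape(actual, fitted, ignore_zero=False):
    """Mean absolute percentage error, as a fraction rather than a percent.

    With ignore_zero=True, points whose actual value is zero are skipped.
    With ignore_zero=False, a zero actual value makes the result infinite
    (assumption for this sketch; PAL's behavior may differ).
    """
    terms = []
    for a, f in zip(actual, fitted):
        if a == 0:
            if ignore_zero:
                continue
            return math.inf
        terms.append(abs((a - f) / a))
    if not terms:
        raise ValueError("no non-zero actual values to evaluate")
    return sum(terms) / len(terms)


# Hypothetical actual and fitted values, for illustration only.
actual = [0.0, 1.0, 4.0]
fitted = [1.0, 1.0, 2.0]
mape_skipping_zeros = mape(actual, fitted, ignore_zero=True)
mape_with_zeros = mape(actual, fitted, ignore_zero=False)
```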

expost_flag : bool, optional

False: Does not output the expost forecast, and just outputs the forecast values.
True: Outputs the expost forecast and the forecast values.

Defaults to True.

thread_ratio : float, optional

Specifies the ratio of available threads to use for performing ITSF.

0 : single thread.

0~1 : the percentage of available threads to use.

Defaults to 0.

iter_count : int, optional

A positive integer that controls the number of iterations of the simulated annealing method.

Defaults to 1000.

random_state : int, optional

Specifies the seed to initialize the random number generator. It can be set to 0 or a positive value.

0: Uses the system time

others: Uses the specified seed

penalty : float, optional

A penalty is applied to the cost function to avoid over-fitting.

Defaults to 1.6 for the sporadic mode and 0.4 for the constant mode.

Returns:

A tuple of two DataFrames:

The first DataFrame stores the forecast values.
The second DataFrame stores related statistics.

Examples

Time-series data for intermittent forecast:

>>> data.collect()
    ID  RAWDATA
  1      0.0
  2      1.0
  3      4.0
  4      0.0
  5      0.0
  6      0.0
  7      5.0
  8      3.0
  9      0.0
 10      0.0
 11      0.0

Apply intermittent forecast to the given time-series data:

>>> forecasts, stats = intermittent_forecast(data=data, p=3, forecast_num=3,
...                                          optimizer='brute', grid_size=20,
...                                          optimize_step=0.011, expost_flag=False,
...                                          accuracy_measure='mse', ignore_zero=False,
...                                          thread_ratio=0.5)

Check the output DataFrames:

>>> forecasts.collect()
   ID   RAWDATA
0  12  2.831169
1  13  2.831169
2  14  2.831169
>>> stats.collect()
       STAT_NAME  STAT_VALUE
0            MSE   10.650383
1    LAST_DEMAND    3.892857
2  LAST_INTERVAL    0.000000
3          OPT_P    3.000000
4          OPT_Q    0.000000
5      OPT_STATE    0.000000
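
The parameters p and q smooth, respectively, the demand values and the time intervals between intermittent demands. As a rough illustration of those two quantities, the sketch below extracts them from the example series in plain Python. The helper name is hypothetical, and measuring an interval as the difference between the IDs of consecutive non-zero demands is an assumption; how PAL defines intervals internally is not stated here.

```python
def nonzero_demands_and_intervals(ids, values):
    """Split a series into its non-zero demands and the gaps between them.

    Returns (demands, intervals), where intervals[i] is the ID difference
    between the (i+1)-th and the i-th non-zero demand.
    """
    demand_ids = [i for i, v in zip(ids, values) if v != 0]
    demands = [v for v in values if v != 0]
    intervals = [b - a for a, b in zip(demand_ids, demand_ids[1:])]
    return demands, intervals


# The ID and RAWDATA columns of the example data above.
ids = list(range(1, 12))
raw = [0.0, 1.0, 4.0, 0.0, 0.0, 0.0, 5.0, 3.0, 0.0, 0.0, 0.0]
demands, intervals = nonzero_demands_and_intervals(ids, raw)
```

Only four of the eleven observations carry demand, with irregular gaps between them, which is the situation intermittent forecasting is intended for.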