accuracy_measure

hana_ml.algorithms.pal.tsa.accuracy_measure.accuracy_measure(data, evaluation_metric=None, ignore_zero=None, alpha1=None, alpha2=None, massive=False, group_params=None)

Measures are used to check the accuracy of the forecast made by PAL algorithms.

Parameters:
data : DataFrame

Input data. In single mode:

  • If data contains 2 columns:

    • 1st column : actual data.

    • 2nd column : forecasted data.

  • If data contains 3 columns:

    • 1st column : ID.

    • 2nd column : actual data.

    • 3rd column : forecasted data.

In massive mode (when massive is True):

  • If data contains 3 columns:

    • 1st column : Group ID.

    • 2nd column : actual data.

    • 3rd column : forecasted data.

  • If data contains 4 columns:

    • 1st column : Group ID.

    • 2nd column : ID.

    • 3rd column : actual data.

    • 4th column : forecasted data.

evaluation_metric : str or a list of str

Specifies the accuracy measure name(s), with valid options listed as follows:

  • 'mpe': mean percentage error (MPE)

  • 'mse': mean square error (MSE)

  • 'rmse': root mean square error (RMSE)

  • 'et': error total (ET)

  • 'mad': mean absolute deviation (MAD)

  • 'mase': out-of-sample mean absolute scaled error (MASE)

  • 'wmape': weighted mean absolute percentage error (WMAPE)

  • 'smape': symmetric mean absolute percentage error (SMAPE)

  • 'mape': mean absolute percentage error (MAPE)

  • 'spec': stock-keeping-oriented prediction error costs (SPEC)

Note

In single mode, if evaluation_metric is specified as 'spec' or contains 'spec' as one of its elements, then data must have 3 columns (i.e. contain an ID column). Similarly, in massive mode, data must have 4 columns (i.e. contain both a Group ID column and an ID column).
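To make the definitions concrete, the simpler measures can be sketched in plain Python as follows. This mirrors the standard textbook formulas, not PAL's implementation; MASE and SPEC are omitted because they additionally require in-sample history and the cost parameters alpha1/alpha2, respectively:

```python
import math

def accuracy_measures(actual, forecast):
    """Illustrative reference formulas for some of the measures above
    (standard definitions, not PAL's implementation)."""
    err = [a - f for a, f in zip(actual, forecast)]
    n = len(err)
    et = sum(err)                                    # error total (ET)
    mad = sum(abs(e) for e in err) / n               # mean absolute deviation
    mse = sum(e * e for e in err) / n                # mean square error
    rmse = math.sqrt(mse)                            # root mean square error
    mpe = sum(e / a for e, a in zip(err, actual)) / n
    mape = sum(abs(e) / abs(a) for e, a in zip(err, actual)) / n
    # SMAPE divides by the mean of |actual| and |forecast|
    smape = sum(2 * abs(e) / (abs(a) + abs(f))
                for e, a, f in zip(err, actual, forecast)) / n
    # WMAPE weights the absolute errors by total absolute actuals
    wmape = sum(abs(e) for e in err) / sum(abs(a) for a in actual)
    return {'ET': et, 'MAD': mad, 'MSE': mse, 'RMSE': rmse,
            'MPE': mpe, 'MAPE': mape, 'SMAPE': smape, 'WMAPE': wmape}
```

Applied to the ACTUAL and FORECAST columns of the example at the end of this page, these formulas reproduce the reported values (ET = 412, MAD = 83.5, MSE = 8614, RMSE ≈ 92.811637, and so on).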

ignore_zero : bool, optional

Specifies whether or not to ignore zero values in data when calculating MPE or MAPE.

Valid only when 'mpe' or 'mape' is specified/included in evaluation_metric.

Defaults to False, i.e. use the zero values in data when calculating MPE or MAPE.
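The intent of ignore_zero can be illustrated with a plain-Python sketch (hypothetical data, not PAL's implementation): a zero actual value makes the |error|/|actual| term undefined, so such points are skipped when the flag is set.

```python
def mape(actual, forecast, ignore_zero=False):
    """Illustrative MAPE with optional skipping of zero actuals
    (mirrors the intent of ignore_zero; not PAL's implementation)."""
    if ignore_zero:
        pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    else:
        # A zero actual here raises ZeroDivisionError in this sketch.
        pairs = list(zip(actual, forecast))
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

# With a zero actual, only the two nonzero points contribute,
# giving the mean of 10/100 and 10/200, i.e. approximately 0.075:
result = mape([0, 100, 200], [10, 110, 190], ignore_zero=True)
```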

alpha1 : float, optional

Specifies the unit opportunity cost parameter in the SPEC measure; must be no less than 0.

Valid only when 'spec' is specified/included in evaluation_metric.

Defaults to 0.5.

alpha2 : float, optional

Specifies the unit stock-keeping cost parameter in the SPEC measure; must be no less than 0.

Valid only when 'spec' is specified/included in evaluation_metric.

Defaults to 0.5.

massive : bool, optional

Specifies whether or not to use massive mode.

  • True : massive mode.

  • False : single mode.

For parameter settings in massive mode, you can use either group_params or the original parameters; values set through the original parameters apply to all groups. However, once any parameter of a group is set in group_params, the original parameter settings no longer apply to that group. For example, if alpha1 and evaluation_metric are set in group_params for Group_1, then any values of alpha2 and evaluation_metric set through the original parameters are not applicable to Group_1.

Defaults to False.

group_params : dict, optional

If massive mode is activated (massive is True), the input data for accuracy_measure is divided into different groups, with different parameters applied to each. This parameter specifies the parameter values for the different groups in a dict, where each key corresponds to a group ID and each value is a dict of parameter-value assignments for that group.

Valid only when massive is True and defaults to None.
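As a sketch of the expected structure (the group IDs and parameter choices here are hypothetical), group_params could look like:

```python
# Hypothetical per-group parameter assignments: each key is a group ID
# appearing in the data's Group ID column, each value a dict of
# accuracy_measure parameter names to values for that group.
group_params = {
    'Group_1': {'evaluation_metric': ['mse', 'rmse'], 'alpha1': 0.3},
    'Group_2': {'evaluation_metric': ['mape'], 'ignore_zero': True},
}
```

Groups not listed in group_params fall back to the values passed through the original parameters, as described under massive above.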

Returns:
DataFrame 1

Result of the forecast accuracy measurement, structured as follows:

  • STAT_NAME: Name of accuracy measures.

  • STAT_VALUE: Value of accuracy measures.

DataFrame 2 (optional)

Error messages. Returned only if massive is True.

Examples

Input data df:

>>> df.collect()
    ACTUAL  FORECAST
0   1130.0    1270.0
1   2410.0    2340.0
2   2210.0    2310.0
3   2500.0    2340.0
4   2432.0    2348.0
5   1980.0    1890.0
6   2045.0    2100.0
7   2340.0    2231.0
8   2460.0    2401.0
9   2350.0    2310.0
10  2345.0    2340.0
11  2650.0    2560.0

Perform accuracy measurement:

>>> res = accuracy_measure(data=df,
                           evaluation_metric=['mse', 'rmse', 'mpe', 'et',
                                              'mad', 'mase', 'wmape', 'smape',
                                              'mape'])
>>> res.collect()
  STAT_NAME   STAT_VALUE
0        ET   412.000000
1       MAD    83.500000
2      MAPE     0.041063
3      MASE     0.287931
4       MPE     0.008390
5       MSE  8614.000000
6      RMSE    92.811637
7     SMAPE     0.040876
8     WMAPE     0.037316