interval_quality

hana_ml.algorithms.pal.stats.interval_quality(data, significance_level, score_type=None, ave_abs_error=None, percent=None, check_consistency=None, thread_ratio=None)

Provides a method to evaluate the quality of interval forecasts using the interval score, which is defined as:

\[score_{\alpha} = (U-L) + \frac{2}{\alpha}(L-V)1_{V<L} + \frac{2}{\alpha}(V-U)1_{V>U}\]

where V denotes the true value, U and L are the upper and lower bounds of an individual prediction interval, respectively, \(\alpha\) is the significance level of the prediction interval (i.e. 1-\(\alpha\) is the nominal coverage of the prediction interval), and \(1_{*}\) is the indicator function for condition *.
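The formula can be checked by hand with a few lines of plain Python. The following is a minimal sketch for a single record, not the PAL implementation; the function name `interval_score` is hypothetical, and the variable names borrow the component names reported in the REASON column, with the mapping of `lower_score` and `upper_score` to the two penalty terms being an assumption.

```python
def interval_score(true_value, lower, upper, alpha):
    """Classical interval score for one prediction interval (illustrative only)."""
    # Width of the interval: narrower intervals score better.
    dispersion = upper - lower
    # Penalty when the true value falls below the lower bound (V < L).
    lower_score = (2.0 / alpha) * (lower - true_value) if true_value < lower else 0.0
    # Penalty when the true value falls above the upper bound (V > U).
    upper_score = (2.0 / alpha) * (true_value - upper) if true_value > upper else 0.0
    return dispersion + lower_score + upper_score


# True value inside the interval: only the width contributes.
print(interval_score(5.0, 1.0, 10.0, alpha=0.1))  # 9.0
# True value 1 unit below the lower bound: width 9 plus penalty (2 / 0.1) * 1.
print(interval_score(0.0, 1.0, 10.0, alpha=0.1))  # 29.0
```

A smaller \(\alpha\) makes each miss more expensive, which is why scores are only comparable at the same significance level.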

The computed quality measurements are most useful for comparing prediction intervals produced by different approaches on the same testing dataset, under the same significance (or equivalently, coverage) level.

Parameters:

data : DataFrame

Input data for the prediction interval quality evaluation.

This DataFrame must be structured as follows:

1st column : type INT, VARCHAR or NVARCHAR, record ID.
2nd column : type DOUBLE or DECIMAL(p,s), true value.
3rd column : type DOUBLE or DECIMAL(p,s), lower bound of the prediction interval.
4th column : type DOUBLE or DECIMAL(p,s), upper bound of the prediction interval.

significance_level : float

Specifies the significance level of the prediction intervals, i.e. the nominal probability that the true value falls outside the prediction interval.

score_type : {'classical', 'msis'}, optional

Specifies the type of the interval score.

'classical': classical interval score
'msis': mean-scaled interval score

Defaults to 'classical'.

ave_abs_error : float, optional

Specifies the factor by which the classical interval score is divided to obtain the mean-scaled interval score. This parameter is valid only when score_type is specified as 'msis'.

Defaults to 1.0.
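The relationship between the two score types can be sketched in plain Python. This is an illustration only: the function name `mean_scaled_interval_score` is hypothetical, and applying the scaling per record is an assumption, since the description above does not state at which stage PAL divides by `ave_abs_error`.

```python
def mean_scaled_interval_score(true_value, lower, upper, alpha, ave_abs_error=1.0):
    """Classical interval score divided by a scaling factor (illustrative only)."""
    # Classical interval score: width plus a penalty if the true value is missed.
    score = upper - lower
    if true_value < lower:
        score += (2.0 / alpha) * (lower - true_value)
    elif true_value > upper:
        score += (2.0 / alpha) * (true_value - upper)
    # Scaling step: with the default factor of 1.0 the score is unchanged.
    return score / ave_abs_error


print(mean_scaled_interval_score(5.0, 1.0, 10.0, alpha=0.1))                     # 9.0
print(mean_scaled_interval_score(5.0, 1.0, 10.0, alpha=0.1, ave_abs_error=2.0))  # 4.5
```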

percent : bool, optional

Specifies whether or not to output the result table in percentage format.

True : Output the result in percentage format.
False : Do not output the result in percentage format.

Defaults to False.

check_consistency : bool, optional

Specifies whether or not to check the consistency of the prediction intervals.

True : Check the consistency of prediction intervals.
False : Do not check the consistency of prediction intervals.

Defaults to True.

Returns:

DataFrames

DataFrame 1 : interval score result, structured as follows:

1st column : Same name and dtype as the 1st column of data.
2nd column : SCORE, type DOUBLE, the interval score.
3rd column : REASON, type NVARCHAR, explaining the interval score in terms of its dispersion, upper_score, and lower_score components, in JSON format.

DataFrame 2 : statistics for interval score.

Examples

>>> df.collect()
   ID  TRUE  LOWER  UPPER
0   0   5.0    1.0   10.0
...
9   9  14.0   10.0   19.0
>>> res, stat = interval_quality(data=df,
                                 significance_level=0.1,
                                 score_type='classical',
                                 percent=False,
                                 check_consistency=True)
>>> res.collect()
   ID  SCORE                                             REASON
0   0    9.0  {"dispersion":9.0,"lower_score":0.0,"upper_sco...
1   1    9.0  {"dispersion":9.0,"lower_score":0.0,"upper_sco...
...
8   8    9.0  {"dispersion":9.0,"lower_score":0.0,"upper_sco...
9   9    9.0  {"dispersion":9.0,"lower_score":0.0,"upper_sco...
>>> stat.collect()
     STAT_NAME STAT_VALUE
0  TOTAL_SCORE        110
1     COVERAGE        0.9
2          ACD          0
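The statistics above can be reproduced locally under a few stated assumptions. The sketch below uses hypothetical rows (not the elided rows of the example) and assumes that TOTAL_SCORE is the sum of the per-record scores, COVERAGE is the fraction of true values lying inside their intervals, and ACD is the difference between that empirical coverage and the nominal coverage 1-\(\alpha\). These readings are consistent with the numbers shown but are not confirmed by the description above.

```python
ALPHA = 0.1

# Hypothetical (true, lower, upper) rows: nine covered records of width 9
# and one record whose true value lies 1 unit below its lower bound.
rows = [(float(i + 5), float(i + 1), float(i + 10)) for i in range(9)]
rows.append((9.0, 10.0, 19.0))


def classical_score(v, lo, up, alpha):
    """Classical interval score for one record (illustrative only)."""
    score = up - lo
    if v < lo:
        score += (2.0 / alpha) * (lo - v)
    elif v > up:
        score += (2.0 / alpha) * (v - up)
    return score


scores = [classical_score(v, lo, up, ALPHA) for v, lo, up in rows]

total_score = sum(scores)                                       # assumed TOTAL_SCORE
coverage = sum(lo <= v <= up for v, lo, up in rows) / len(rows)  # assumed COVERAGE
acd = coverage - (1.0 - ALPHA)                                   # assumed ACD

print(total_score, coverage, acd)
```

With these rows the nine covered records contribute 9.0 each and the single miss contributes 29.0, so the total is 110 and the empirical coverage of 0.9 matches the nominal coverage.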