interval_quality
- hana_ml.algorithms.pal.stats.interval_quality(data, significance_level, score_type=None, ave_abs_error=None, percent=None, check_consistency=None, thread_ratio=None)
Provides a method to evaluate the quality of interval forecasts. The interval score is defined as:
\[score_{\alpha} = (U-L) + \frac{2}{\alpha}(L-V)1_{V<L} + \frac{2}{\alpha}(V-U)1_{V>U}\]
where V denotes the true value, U and L are the upper and lower bounds of an individual prediction interval respectively, \(\alpha\) is the significance level of the prediction interval (i.e. 1-\(\alpha\) is the nominal coverage of the prediction interval), and \(1_{*}\) is the indicator function for condition *.
The resulting scores are best suited to comparing prediction intervals produced by different approaches on the same test dataset and at the same significance (or equivalently, coverage) level.
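As a minimal sketch of the formula above, the following plain-Python function evaluates the score for a single interval. It is an illustration only, not the PAL implementation; the function name `interval_score` and its argument names are hypothetical.

```python
def interval_score(true_value, lower, upper, alpha):
    """Interval score for one prediction interval (illustrative sketch)."""
    # Width of the interval: narrower intervals score better.
    width = upper - lower
    # Penalty when the true value falls below the lower bound.
    lower_penalty = (2.0 / alpha) * (lower - true_value) if true_value < lower else 0.0
    # Penalty when the true value falls above the upper bound.
    upper_penalty = (2.0 / alpha) * (true_value - upper) if true_value > upper else 0.0
    return width + lower_penalty + upper_penalty


# True value inside the interval: only the width contributes.
covered = interval_score(5.0, 1.0, 10.0, alpha=0.1)
# True value below the lower bound: width plus a scaled miss distance.
missed_low = interval_score(0.5, 1.0, 10.0, alpha=0.1)
# True value above the upper bound.
missed_high = interval_score(12.0, 1.0, 10.0, alpha=0.1)
```

A smaller \(\alpha\) makes each miss more expensive, so a lower score indicates a narrower interval that still covers the true value.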
- Parameters:
- data : DataFrame
Input data for the prediction interval quality evaluation.
This DataFrame must be structured as follows:
1st column : type INT, VARCHAR or NVARCHAR, record ID.
2nd column : type DOUBLE or DECIMAL(p,s), true value.
3rd column : type DOUBLE or DECIMAL(p,s), lower bound of the prediction interval.
4th column : type DOUBLE or DECIMAL(p,s), upper bound of the prediction interval.
- significance_level : float
Specifies the significance level of the prediction intervals, i.e. the nominal probability that the true value falls outside the prediction interval.
- score_type : {'classical', 'msis'}, optional
Specifies the type of the interval score.
'classical': classical interval score
'msis': mean-scaled interval score
Defaults to 'classical'.
- ave_abs_error : float, optional
Specifies the factor by which the classical interval score is divided to obtain the mean-scaled interval score. This parameter is valid only when score_type is specified as 'msis'.
Defaults to 1.0.
- percent : bool, optional
Specifies whether or not to output the result table in percentage format.
True : Output the result in percentage format.
False : Do not output the result in percentage format.
Defaults to False.
- check_consistency : bool, optional
Specifies whether or not to check the consistency of the prediction intervals.
True : Check the consistency of prediction intervals.
False : Do not check the consistency of prediction intervals.
Defaults to True.
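The relationship between the two score types described for score_type and ave_abs_error can be sketched as follows. This is a hedged illustration based only on the parameter descriptions above, not the PAL implementation; `mean_scaled_interval_score` is a hypothetical name.

```python
def mean_scaled_interval_score(true_value, lower, upper, alpha, ave_abs_error=1.0):
    """Classical interval score divided by ave_abs_error (illustrative sketch)."""
    classical = upper - lower
    if true_value < lower:
        classical += (2.0 / alpha) * (lower - true_value)
    elif true_value > upper:
        classical += (2.0 / alpha) * (true_value - upper)
    # Assumption: the scaling is a plain division of the classical score.
    return classical / ave_abs_error


# With the default factor of 1.0 the result equals the classical score.
unscaled = mean_scaled_interval_score(12.0, 1.0, 10.0, alpha=0.1)
# A factor of 2.0 halves the score.
scaled = mean_scaled_interval_score(12.0, 1.0, 10.0, alpha=0.1, ave_abs_error=2.0)
```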
- Returns:
- DataFrames
DataFrame 1 : interval score result, structured as follows:
1st column : Same name and dtype as the 1st column of data.
2nd column : SCORE, type DOUBLE, the interval score.
3rd column : REASON, type NVARCHAR, explaining the interval score by dispersion, upper_score, and lower_score components, in JSON format.
DataFrame 2 : statistics for interval score.
Examples
>>> df.collect()
   ID  TRUE  LOWER  UPPER
0   0   5.0    1.0   10.0
...
9   9  14.0   10.0   19.0
>>> res, stat = interval_quality(data=df, significance_level=0.1, score_type='classical', percent=False, check_consistency=True)
>>> res.collect()
   ID  SCORE                                             REASON
0   0    9.0  {"dispersion":9.0,"lower_score":0.0,"upper_sco...
1   1    9.0  {"dispersion":9.0,"lower_score":0.0,"upper_sco...
...
8   8    9.0  {"dispersion":9.0,"lower_score":0.0,"upper_sco...
9   9    9.0  {"dispersion":9.0,"lower_score":0.0,"upper_sco...
>>> stat.collect()
     STAT_NAME  STAT_VALUE
0  TOTAL_SCORE         110
1     COVERAGE         0.9
2          ACD           0
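The SCORE and REASON values for the two visible input rows (ID 0 and ID 9) can be cross-checked by hand. The sketch below is plain Python with a hypothetical helper, `score_breakdown`. The mapping of the formula's three terms onto the dispersion, lower_score, and upper_score keys is an assumption inferred from the key names, not something the source states.

```python
import json


def score_breakdown(true_value, lower, upper, alpha):
    """Split the interval score into its three terms (illustrative sketch)."""
    # Assumed mapping: dispersion = U - L, lower_score = penalty for V < L,
    # upper_score = penalty for V > U.
    dispersion = upper - lower
    lower_score = (2.0 / alpha) * (lower - true_value) if true_value < lower else 0.0
    upper_score = (2.0 / alpha) * (true_value - upper) if true_value > upper else 0.0
    reason = {
        "dispersion": dispersion,
        "lower_score": lower_score,
        "upper_score": upper_score,
    }
    return dispersion + lower_score + upper_score, reason


# Row ID 0 from the example: TRUE=5.0, LOWER=1.0, UPPER=10.0.
score_0, reason_0 = score_breakdown(5.0, 1.0, 10.0, alpha=0.1)
# Row ID 9 from the example: TRUE=14.0, LOWER=10.0, UPPER=19.0.
score_9, reason_9 = score_breakdown(14.0, 10.0, 19.0, alpha=0.1)

print(score_0, json.dumps(reason_0))
print(score_9, json.dumps(reason_9))
```

In both rows the true value lies inside the interval, so the score reduces to the interval width of 9.0, which agrees with the SCORE column shown above.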