Skip to contents

Proper Scoring Rule to score quantile predictions, following Gneiting and Raftery (2007). Smaller values are better.

The score is computed as

$$ \textrm{score} = (\textrm{upper} - \textrm{lower}) + \frac{2}{\alpha}(\textrm{lower} - \textrm{true\_value}) * \mathbf{1}(\textrm{true\_value} < \textrm{lower}) + \frac{2}{\alpha}(\textrm{true\_value} - \textrm{upper}) * \mathbf{1}(\textrm{true\_value} > \textrm{upper}) $$ where \(\mathbf{1}()\) is the indicator function and indicates how much is outside the prediction interval. \(\alpha\) is the decimal value that indicates how much is outside the prediction interval.

To improve usability, the user is asked to provide an interval range in percentage terms, i.e. interval_range = 90 (percent) for a 90 percent prediction interval. Correspondingly, the user would have to provide the 5% and 95% quantiles (the corresponding alpha would then be 0.1). No specific distribution is assumed, but the range has to be symmetric (i.e you can't use the 0.1 quantile as the lower bound and the 0.7 quantile as the upper). Non-symmetric quantiles can be scored using the function quantile_score().

Usage

interval_score(
  true_values,
  lower,
  upper,
  interval_range,
  weigh = TRUE,
  separate_results = FALSE
)

Arguments

true_values

A vector with the true observed values of size n

lower

vector of size n with the prediction for the lower quantile of the given range

upper

vector of size n with the prediction for the upper quantile of the given range

interval_range

the range of the prediction intervals. i.e. if you're forecasting the 0.05 and 0.95 quantile, the interval_range would be 90. Can be either a single number or a vector of size n, if the range changes for different forecasts to be scored. This corresponds to (100-alpha)/100 in Gneiting and Raftery (2007). Internally, the range will be transformed to alpha.

weigh

if TRUE, weigh the score by alpha / 2, so it can be averaged into an interval score that, in the limit, corresponds to CRPS. Alpha is the decimal value that represents how much is outside a central prediction interval (e.g. for a 90 percent central prediction interval, alpha is 0.1) Default: TRUE.

separate_results

if TRUE (default is FALSE), then the separate parts of the interval score (dispersion penalty, penalties for over- and under-prediction get returned as separate elements of a list). If you want a data.frame instead, simply call as.data.frame() on the output.

Value

vector with the scoring values, or a list with separate entries if separate_results is TRUE.

References

Strictly Proper Scoring Rules, Prediction,and Estimation, Tilmann Gneiting and Adrian E. Raftery, 2007, Journal of the American Statistical Association, Volume 102, 2007 - Issue 477

Evaluating epidemic forecasts in an interval format, Johannes Bracher, Evan L. Ray, Tilmann Gneiting and Nicholas G. Reich, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008618 # nolint

Examples

true_values <- rnorm(30, mean = 1:30)
interval_range <- rep(90, 30)
alpha <- (100 - interval_range) / 100
lower <- qnorm(alpha / 2, rnorm(30, mean = 1:30))
upper <- qnorm((1 - alpha / 2), rnorm(30, mean = 1:30))

interval_score(
  true_values = true_values,
  lower = lower,
  upper = upper,
  interval_range = interval_range
)
#>  [1] 0.12152111 0.30674862 0.61630650 0.31762968 0.53419475 0.13568351
#>  [7] 0.12787033 0.22441248 0.08377166 0.29999300 0.15130816 0.14739603
#> [13] 0.09337709 0.35942924 0.22186556 0.18394683 0.23957059 0.15963205
#> [19] 0.15330573 0.24674441 1.35733969 0.23488440 0.51859724 1.08285596
#> [25] 0.12168661 0.17193263 0.18634977 0.22495856 0.13530807 0.13661409

# gives a warning, as the interval_range should likely be 50 instead of 0.5
interval_score(true_value = 4, upper = 2, lower = 8, interval_range = 0.5)
#> Warning: Found interval ranges between 0 and 1. Are you sure that's right? An interval range of 0.5 e.g. implies a (49.75%, 50.25%) prediction interval. If you want to score a (25%, 75%) prediction interval, set interval_range = 50.
#> This warning is displayed once per session.
#> [1] 3.015

# example with missing values and separate results
interval_score(
  true_values = c(true_values, NA),
  lower = c(lower, NA),
  upper = c(NA, upper),
  separate_results = TRUE,
  interval_range = 90
)
#> $interval_score
#>  [1]         NA 1.52162591 0.64317756 0.20700373 0.88495702 0.13104189
#>  [7] 0.12284631 0.12560968 0.11982092 0.72877424 0.18193755 0.11929032
#> [13] 0.07761187 0.12917548 0.25629706 0.15241987 0.14835180 0.23454363
#> [19] 1.38269068 0.16095675 0.27343909 2.96528255 0.50406232 0.93629287
#> [25] 0.17754129 0.19969567 0.65548273 0.16266840 0.11212462 0.06803927
#> [31]         NA
#> 
#> $dispersion
#>  [1]         NA 0.10751404 0.15171168 0.20700373 0.17981061 0.13104189
#>  [7] 0.12284631 0.12560968 0.11982092 0.07395199 0.18193755 0.11929032
#> [13] 0.07761187 0.12917548 0.25629706 0.15241987 0.14835180 0.23454363
#> [19] 0.08308640 0.16095675 0.08982921 0.01757765 0.16908912 0.11396869
#> [25] 0.17754129 0.19969567 0.07432238 0.16266840 0.11212462 0.06803927
#> [31]         NA
#> 
#> $underprediction
#>  [1]        NA 1.4141119 0.0000000 0.0000000 0.7051464 0.0000000 0.0000000
#>  [8] 0.0000000 0.0000000 0.6548223 0.0000000 0.0000000 0.0000000 0.0000000
#> [15] 0.0000000 0.0000000 0.0000000 0.0000000 1.2996043 0.0000000 0.1836099
#> [22] 2.9477049 0.0000000 0.0000000 0.0000000 0.0000000 0.5811604 0.0000000
#> [29] 0.0000000 0.0000000        NA
#> 
#> $overprediction
#>  [1] 0.0000000 0.0000000 0.4914659 0.0000000 0.0000000 0.0000000 0.0000000
#>  [8] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> [15] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> [22] 0.0000000 0.3349732 0.8223242 0.0000000 0.0000000 0.0000000 0.0000000
#> [29] 0.0000000 0.0000000        NA
#>