score() applies a selection of scoring metrics to a forecast object (a data.table with forecasts and observations; see as_forecast()).
score() is a generic that dispatches to different methods depending on the class of the input data.
See the Forecast types and input formats section for more information on forecast types and input formats. For additional help and examples, check out the Getting Started Vignette as well as the paper Evaluating Forecasts with scoringutils in R.
Usage
score(forecast, metrics, ...)
# S3 method for class 'forecast_binary'
score(forecast, metrics = get_metrics(forecast), ...)
# S3 method for class 'forecast_nominal'
score(forecast, metrics = get_metrics(forecast), ...)
# S3 method for class 'forecast_point'
score(forecast, metrics = get_metrics(forecast), ...)
# S3 method for class 'forecast_sample'
score(forecast, metrics = get_metrics(forecast), ...)
# S3 method for class 'forecast_quantile'
score(forecast, metrics = get_metrics(forecast), ...)
Arguments
- forecast
A forecast object (a validated data.table with predicted and observed values, see as_forecast()).
- metrics
A named list of scoring functions. Names will be used as column names in the output. See get_metrics() for more information on the default metrics used. See the Customising metrics section below for information on how to pass custom arguments to scoring functions.
- ...
Currently unused. You cannot pass additional arguments to scoring functions via `...`. See the Customising metrics section below for details on how to use purrr::partial() to pass arguments to individual metrics.
Value
An object of class scores. This object is a data.table with unsummarised scores (one score per forecast) and has an additional attribute metrics with the names of the metrics used for scoring. See summarise_scores() for information on how to summarise scores.
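As a rough illustration of the returned object, the sketch below scores a small hand-made point forecast and inspects its class and its metrics attribute. The data frame toy_point and its column target_date are hypothetical and exist only for this example.

```r
library(scoringutils)

# Hypothetical toy data: two models, two target dates, one row per forecast.
toy_point <- data.frame(
  model = c("a", "a", "b", "b"),
  target_date = as.Date(
    c("2024-01-06", "2024-01-13", "2024-01-06", "2024-01-13")
  ),
  observed = c(10, 12, 10, 12),
  predicted = c(11, 15, 9, 12)
)

scores <- score(as_forecast_point(toy_point))

class(scores)            # includes "scores" and "data.table"
attr(scores, "metrics")  # names of the metrics that were applied
```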
Details
Customising metrics
If you want to pass arguments to a scoring function, you need to change the scoring function itself, e.g. via purrr::partial(), and pass an updated list of functions with your custom metric to the metrics argument in score().
For example, to use interval_coverage() with interval_range = 90, you would define a new function, e.g.
interval_coverage_90 <- purrr::partial(interval_coverage, interval_range = 90)
and pass this new function to metrics in score().
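Put together, the steps above might look like the following sketch. It assumes the bundled example_quantile data and pairs the custom metric with wis purely for illustration; any other named list of scoring functions would work the same way.

```r
library(scoringutils)

# Fix interval_range at 90 so the metric can be called without extra arguments.
interval_coverage_90 <- purrr::partial(interval_coverage, interval_range = 90)

forecast <- as_forecast_quantile(example_quantile)

# The names of the list become the column names of the output.
scores <- score(
  forecast,
  metrics = list(wis = wis, interval_coverage_90 = interval_coverage_90)
)

names(scores)
```

Because metrics replaces the defaults rather than adding to them, only the metrics named in the list appear in the output.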
Note that if you want to pass a variable as an argument, you can unquote it with !! to make sure the value is evaluated only once when the function is created. Consider the following example:
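A minimal sketch of this behaviour, assuming purrr is installed. The helper scale_by() and the variable multiplier are hypothetical names used only for illustration; they are not part of scoringutils.

```r
library(purrr)

# Hypothetical helper standing in for a scoring function with an extra argument.
scale_by <- function(x, times) x * times

multiplier <- 2

# Without !!, `multiplier` is looked up each time the new function is called.
lazy <- partial(scale_by, times = multiplier)

# With !!, the current value (2) is captured when the function is created.
fixed <- partial(scale_by, times = !!multiplier)

multiplier <- 10

lazy(1)   # uses the updated value of multiplier
fixed(1)  # still uses the value captured at creation
```

Unquoting is the safer choice when the variable may be changed or removed before score() is called.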
Forecast types and input formats
Various different forecast types / forecast formats are supported. At the moment, those are:
- point forecasts
- binary forecasts ("soft binary classification")
- nominal forecasts ("soft classification with multiple unordered classes")
- probabilistic forecasts in a quantile-based format (a forecast is represented as a set of predictive quantiles)
- probabilistic forecasts in a sample-based format (a forecast is represented as a set of predictive samples)
Forecast types are determined based on the columns present in the input data. Here is an overview of the required format for each forecast type:
All forecast types require a data.frame or similar with columns observed, predicted, and model.
Point forecasts require a column observed of type numeric and a column predicted of type numeric.
Binary forecasts require a column observed of type factor with exactly two levels and a column predicted of type numeric with probabilities, corresponding to the probability that observed is equal to the second factor level. See details here for more information.
Nominal forecasts require a column observed of type factor with N levels (where N is the number of possible outcomes), a column predicted of type numeric with probabilities (which sum to one across all possible outcomes), and a column predicted_label of type factor with N levels, denoting the outcome for which a probability is given. Forecasts must be complete, i.e. there must be a probability assigned to every possible outcome.
Quantile-based forecasts require a column observed of type numeric, a column predicted of type numeric, and a column quantile_level of type numeric with quantile levels (between 0 and 1).
Sample-based forecasts require a column observed of type numeric, a column predicted of type numeric, and a column sample_id of type numeric with sample indices.
For more information see the vignettes and the example data (example_quantile, example_sample_continuous, example_sample_discrete, example_point, example_binary, and example_nominal).
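To make the column requirements concrete, here is a small sketch of a binary forecast built by hand. The data frame toy_binary and the column target_date are hypothetical; observed is a two-level factor and predicted holds the probability of the second level ("1").

```r
library(scoringutils)

# Hypothetical toy data: one model, four target dates, one row per forecast.
toy_binary <- data.frame(
  model = "a",
  target_date = as.Date("2024-01-06") + 7 * 0:3,
  observed = factor(c("0", "1", "1", "0"), levels = c("0", "1")),
  predicted = c(0.1, 0.8, 0.6, 0.3)  # probability that observed == "1"
)

forecast <- as_forecast_binary(toy_binary)
scores <- score(forecast)

scores
```

The same pattern applies to the other forecast types: build a table with the required columns, then pass it through the matching as_forecast_*() function before scoring.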
Forecast unit
In order to score forecasts, scoringutils needs to know which of the rows of the data belong together and jointly form a single forecast. This is easy e.g. for point forecasts, where there is one row per forecast. For quantile or sample-based forecasts, however, there are multiple rows that belong to a single forecast.
The forecast unit, or unit of a single forecast, is then described by the combination of columns that uniquely identify a single forecast. For example, we could have forecasts made by different models in various locations at different time points, each for several weeks into the future. The forecast unit could then be described as
forecast_unit = c("model", "location", "forecast_date", "forecast_horizon").
scoringutils automatically tries to determine the unit of a single forecast. It uses all existing columns for this, which means that no columns unrelated to the forecast unit may be present. As a very simplistic example, if you had an additional column, "even", that is one if the row number is even and zero otherwise, then this would break scoring, because scoringutils would treat this column as part of the forecast unit.
In order to avoid issues, we recommend setting the forecast unit explicitly, usually through the forecast_unit argument in the as_forecast() functions. This will drop unneeded columns, while making sure that all necessary, 'protected columns' like "predicted" or "observed" are retained.
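The "even" scenario described above can be sketched as follows. The data frame toy_quantile and the column target_date are hypothetical. Setting forecast_unit explicitly is expected to drop the unrelated even column, so that the three quantile rows of each model are treated as one forecast.

```r
library(scoringutils)

# Hypothetical toy data: two models, each with three predictive quantiles.
toy_quantile <- data.frame(
  model = rep(c("a", "b"), each = 3),
  target_date = as.Date("2024-01-06"),
  quantile_level = rep(c(0.25, 0.5, 0.75), times = 2),
  observed = 10,
  predicted = rep(c(9, 10, 11), times = 2)
)

# An unrelated column that would otherwise be treated as part of the forecast unit.
toy_quantile$even <- as.integer(seq_len(nrow(toy_quantile)) %% 2 == 0)

forecast <- as_forecast_quantile(
  toy_quantile,
  forecast_unit = c("model", "target_date")
)

get_forecast_unit(forecast)  # the columns that identify a single forecast
"even" %in% names(forecast)  # the unrelated column has been dropped
```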
References
Bosse NI, Gruson H, Cori A, van Leeuwen E, Funk S, Abbott S (2022) Evaluating Forecasts with scoringutils in R. doi:10.48550/arXiv.2205.07090
Author
Nikos Bosse nikosbosse@gmail.com
Examples
library(magrittr) # pipe operator
validated <- as_forecast_quantile(example_quantile)
#> ℹ Some rows containing NA values may be removed. This is fine if not
#> unexpected.
score(validated) %>%
summarise_scores(by = c("model", "target_type"))
#> model target_type wis overprediction underprediction
#> <char> <char> <num> <num> <num>
#> 1: EuroCOVIDhub-ensemble Cases 17943.82383 10043.121943 4237.177310
#> 2: EuroCOVIDhub-baseline Cases 28483.57465 14096.100883 10284.972826
#> 3: epiforecasts-EpiNow2 Cases 20831.55662 11906.823030 3260.355639
#> 4: EuroCOVIDhub-ensemble Deaths 41.42249 7.138247 4.103261
#> 5: EuroCOVIDhub-baseline Deaths 159.40387 65.899117 2.098505
#> 6: UMass-MechBayes Deaths 52.65195 8.978601 16.800951
#> 7: epiforecasts-EpiNow2 Deaths 66.64282 18.892583 15.893314
#> dispersion bias interval_coverage_50 interval_coverage_90 ae_median
#> <num> <num> <num> <num> <num>
#> 1: 3663.52458 -0.05640625 0.3906250 0.8046875 24101.07031
#> 2: 4102.50094 0.09796875 0.3281250 0.8203125 38473.60156
#> 3: 5664.37795 -0.07890625 0.4687500 0.7890625 27923.81250
#> 4: 30.18099 0.07265625 0.8750000 1.0000000 53.13281
#> 5: 91.40625 0.33906250 0.6640625 1.0000000 233.25781
#> 6: 26.87239 -0.02234375 0.4609375 0.8750000 78.47656
#> 7: 31.85692 -0.00512605 0.4201681 0.9075630 104.74790
# set forecast unit manually (to avoid issues with scoringutils trying to
# determine the forecast unit automatically)
example_quantile %>%
as_forecast_quantile(
forecast_unit = c(
"location", "target_end_date", "target_type", "horizon", "model"
)
) %>%
score()
#> ℹ Some rows containing NA values may be removed. This is fine if not
#> unexpected.
#> location target_end_date target_type horizon model
#> <char> <Date> <char> <num> <char>
#> 1: DE 2021-05-08 Cases 1 EuroCOVIDhub-ensemble
#> 2: DE 2021-05-08 Cases 1 EuroCOVIDhub-baseline
#> 3: DE 2021-05-08 Cases 1 epiforecasts-EpiNow2
#> 4: DE 2021-05-08 Deaths 1 EuroCOVIDhub-ensemble
#> 5: DE 2021-05-08 Deaths 1 EuroCOVIDhub-baseline
#> ---
#> 883: IT 2021-07-24 Deaths 2 EuroCOVIDhub-baseline
#> 884: IT 2021-07-24 Deaths 3 UMass-MechBayes
#> 885: IT 2021-07-24 Deaths 2 UMass-MechBayes
#> 886: IT 2021-07-24 Deaths 3 epiforecasts-EpiNow2
#> 887: IT 2021-07-24 Deaths 2 epiforecasts-EpiNow2
#> wis overprediction underprediction dispersion bias
#> <num> <num> <num> <num> <num>
#> 1: 7990.854783 2.549870e+03 0.0000000 5440.985217 0.50
#> 2: 16925.046957 1.527583e+04 0.0000000 1649.220870 0.95
#> 3: 25395.960870 1.722226e+04 0.0000000 8173.700000 0.90
#> 4: 53.880000 0.000000e+00 0.6086957 53.271304 -0.10
#> 5: 46.793043 2.130435e+00 0.0000000 44.662609 0.30
#> ---
#> 883: 80.336957 3.608696e+00 0.0000000 76.728261 0.20
#> 884: 4.881739 4.347826e-02 0.0000000 4.838261 0.10
#> 885: 25.581739 1.782609e+01 0.0000000 7.755652 0.80
#> 886: 19.762609 5.478261e+00 0.0000000 14.284348 0.50
#> 887: 66.161739 4.060870e+01 0.0000000 25.553043 0.90
#> interval_coverage_50 interval_coverage_90 ae_median
#> <lgcl> <lgcl> <num>
#> 1: TRUE TRUE 12271
#> 2: FALSE FALSE 25620
#> 3: FALSE TRUE 44192
#> 4: TRUE TRUE 14
#> 5: TRUE TRUE 15
#> ---
#> 883: TRUE TRUE 53
#> 884: TRUE TRUE 1
#> 885: FALSE TRUE 46
#> 886: TRUE TRUE 26
#> 887: FALSE TRUE 108
# forecast formats with different metrics
if (FALSE) { # \dontrun{
score(as_forecast_binary(example_binary))
score(as_forecast_quantile(example_quantile))
score(as_forecast_point(example_point))
score(as_forecast_sample(example_sample_discrete))
score(as_forecast_sample(example_sample_continuous))
} # }