Summarise scores as produced by score()
Usage
summarise_scores(
  scores,
  by = NULL,
  across = NULL,
  fun = mean,
  relative_skill = FALSE,
  relative_skill_metric = "auto",
  metric = deprecated(),
  baseline = NULL,
  ...
)

summarize_scores(
  scores,
  by = NULL,
  across = NULL,
  fun = mean,
  relative_skill = FALSE,
  relative_skill_metric = "auto",
  metric = deprecated(),
  baseline = NULL,
  ...
)
Arguments
- scores
A data.table of scores as produced by score().
- by
character vector with column names to summarise scores by. Default is NULL, meaning that the only summary that takes place is summarising over samples or quantiles (in the case of quantile-based forecasts), such that there is one score per forecast as defined by the unit of a single forecast (rather than one score for every sample or quantile). The unit of a single forecast is determined by the columns present in the input data that do not correspond to a metric produced by score(); these columns indicate a grouping of forecasts (for example, there may be one forecast per day, location and model). Adding additional, unrelated columns may alter results in an unpredictable way.
- across
character vector with column names from the vector of variables that define the unit of a single forecast (see above) to summarise scores across (meaning that the specified columns will be dropped). This is an alternative to specifying by directly. If NULL (default), then by will be used, or inferred internally if it is also not specified. Only one of across and by may be used at a time.
- fun
a function used for summarising scores. Default is mean.
- relative_skill
logical, whether or not to compute relative performance between models based on pairwise comparisons. If TRUE (default is FALSE), then a column called 'model' must be present in the input data. For more information on the computation of relative skill, see pairwise_comparison(). Relative skill will be calculated for the aggregation level specified in by.
- relative_skill_metric
character with the name of the metric for which a relative skill shall be computed. If equal to 'auto' (the default), then this will be either interval score, CRPS or Brier score (depending on which of these is available in the input data).
- metric
- baseline
character string with the name of a model. If a baseline is given, then a scaled relative skill with respect to the baseline will be returned. By default (NULL), relative skill will not be scaled with respect to a baseline model.
- ...
additional parameters that can be passed to the summary function provided to fun. For more information see the documentation of the respective function.
Examples
data.table::setDTthreads(2) # restricts number of cores used on CRAN
library(magrittr) # pipe operator
scores <- score(example_continuous)
#> The following messages were produced when checking inputs:
#> 1. 144 values for `prediction` are NA in the data provided and the corresponding rows were removed. This may indicate a problem if unexpected.
summarise_scores(scores)
#> location location_name target_end_date target_type forecast_date
#> 1: DE Germany 2021-05-08 Cases 2021-05-03
#> 2: DE Germany 2021-05-08 Cases 2021-05-03
#> 3: DE Germany 2021-05-08 Cases 2021-05-03
#> 4: DE Germany 2021-05-08 Deaths 2021-05-03
#> 5: DE Germany 2021-05-08 Deaths 2021-05-03
#> ---
#> 883: IT Italy 2021-07-24 Deaths 2021-07-12
#> 884: IT Italy 2021-07-24 Deaths 2021-07-05
#> 885: IT Italy 2021-07-24 Deaths 2021-07-12
#> 886: IT Italy 2021-07-24 Deaths 2021-07-05
#> 887: IT Italy 2021-07-24 Deaths 2021-07-12
#> model horizon mad bias dss crps
#> 1: EuroCOVIDhub-ensemble 1 17641.24334 0.55 20.601386 7482.975177
#> 2: EuroCOVIDhub-baseline 1 19341.68942 0.95 22.443370 20371.250988
#> 3: epiforecasts-EpiNow2 1 32348.79978 0.80 22.310568 24810.424753
#> 4: EuroCOVIDhub-ensemble 1 267.13585 0.20 11.112388 67.510511
#> 5: EuroCOVIDhub-baseline 1 397.09371 -0.05 11.541040 86.462930
#> ---
#> 883: EuroCOVIDhub-baseline 2 168.03093 0.45 12.913561 66.515150
#> 884: UMass-MechBayes 3 24.76457 0.10 6.353679 6.616381
#> 885: UMass-MechBayes 2 39.96542 0.80 8.765716 29.446723
#> 886: epiforecasts-EpiNow2 3 107.93293 0.10 10.283832 28.863742
#> 887: epiforecasts-EpiNow2 2 93.54245 0.65 10.563729 49.998906
#> log_score ae_median se_mean
#> 1: 10.989568 9413.839675 2.303980e+08
#> 2: 11.941135 29378.807513 9.128538e+08
#> 3: 12.007007 36512.810659 1.603609e+09
#> 4: 6.516954 77.077318 8.593954e+03
#> 5: 6.913879 27.429626 2.123067e+03
#> ---
#> 883: 6.067282 58.606302 6.065218e+04
#> 884: 4.242600 5.071742 6.043161e+01
#> 885: 5.321013 42.271396 2.153122e+03
#> 886: 5.709216 7.760566 3.212150e+03
#> 887: 5.736238 73.670518 1.263027e+04
# summarise over samples or quantiles to get one score per forecast
scores <- score(example_quantile)
#> The following messages were produced when checking inputs:
#> 1. 144 values for `prediction` are NA in the data provided and the corresponding rows were removed. This may indicate a problem if unexpected.
summarise_scores(scores)
#> location target_end_date target_type location_name forecast_date
#> 1: DE 2021-05-08 Cases Germany 2021-05-03
#> 2: DE 2021-05-08 Cases Germany 2021-05-03
#> 3: DE 2021-05-08 Cases Germany 2021-05-03
#> 4: DE 2021-05-08 Deaths Germany 2021-05-03
#> 5: DE 2021-05-08 Deaths Germany 2021-05-03
#> ---
#> 883: IT 2021-07-24 Deaths Italy 2021-07-05
#> 884: IT 2021-07-24 Deaths Italy 2021-07-12
#> 885: IT 2021-07-24 Deaths Italy 2021-07-12
#> 886: IT 2021-07-24 Deaths Italy 2021-07-12
#> 887: IT 2021-07-24 Deaths Italy 2021-07-12
#> model horizon interval_score dispersion underprediction
#> 1: EuroCOVIDhub-baseline 1 16925.04696 1649.220870 0.0000000
#> 2: EuroCOVIDhub-ensemble 1 7990.85478 5440.985217 0.0000000
#> 3: epiforecasts-EpiNow2 1 25395.96087 8173.700000 0.0000000
#> 4: EuroCOVIDhub-baseline 1 46.79304 44.662609 0.0000000
#> 5: EuroCOVIDhub-ensemble 1 53.88000 53.271304 0.6086957
#> ---
#> 883: epiforecasts-EpiNow2 3 19.76261 14.284348 0.0000000
#> 884: EuroCOVIDhub-baseline 2 80.33696 76.728261 0.0000000
#> 885: EuroCOVIDhub-ensemble 2 18.65870 13.354348 0.0000000
#> 886: UMass-MechBayes 2 25.58174 7.755652 0.0000000
#> 887: epiforecasts-EpiNow2 2 66.16174 25.553043 0.0000000
#> overprediction coverage_deviation bias ae_median
#> 1: 15275.826087 -0.38521739 0.95 25620
#> 2: 2549.869565 0.04956522 0.50 12271
#> 3: 17222.260870 -0.29826087 0.90 44192
#> 4: 2.130435 0.22347826 0.30 15
#> 5: 0.000000 0.39739130 -0.10 14
#> ---
#> 883: 5.478261 0.04956522 0.50 26
#> 884: 3.608696 0.31043478 0.20 53
#> 885: 5.304348 0.13652174 0.40 30
#> 886: 17.826087 -0.21130435 0.80 46
#> 887: 40.608696 -0.29826087 0.90 108
# get scores by model
summarise_scores(scores, by = "model")
#> model interval_score dispersion underprediction
#> 1: EuroCOVIDhub-baseline 14321.48926 2096.95360 5143.53567
#> 2: EuroCOVIDhub-ensemble 8992.62316 1846.85278 2120.64029
#> 3: epiforecasts-EpiNow2 10827.40786 2950.73422 1697.23411
#> 4: UMass-MechBayes 52.65195 26.87239 16.80095
#> overprediction coverage_deviation bias ae_median
#> 1: 7081.000000 0.00201087 0.21851562 19353.42969
#> 2: 5025.130095 0.04871603 0.00812500 12077.10156
#> 3: 6179.439535 -0.05516986 -0.04336032 14521.10526
#> 4: 8.978601 -0.02312500 -0.02234375 78.47656
# get scores by model and target type
summarise_scores(scores, by = c("model", "target_type"))
#> model target_type interval_score dispersion underprediction
#> 1: EuroCOVIDhub-baseline Cases 28483.57465 4102.50094 10284.972826
#> 2: EuroCOVIDhub-ensemble Cases 17943.82383 3663.52458 4237.177310
#> 3: epiforecasts-EpiNow2 Cases 20831.55662 5664.37795 3260.355639
#> 4: EuroCOVIDhub-baseline Deaths 159.40387 91.40625 2.098505
#> 5: EuroCOVIDhub-ensemble Deaths 41.42249 30.18099 4.103261
#> 6: UMass-MechBayes Deaths 52.65195 26.87239 16.800951
#> 7: epiforecasts-EpiNow2 Deaths 66.64282 31.85692 15.893314
#> overprediction coverage_deviation bias ae_median
#> 1: 14096.100883 -0.11211957 0.09796875 38473.60156
#> 2: 10043.121943 -0.09785326 -0.05640625 24101.07031
#> 3: 11906.823030 -0.06660326 -0.07890625 27923.81250
#> 4: 65.899117 0.11614130 0.33906250 233.25781
#> 5: 7.138247 0.19528533 0.07265625 53.13281
#> 6: 8.978601 -0.02312500 -0.02234375 78.47656
#> 7: 18.892583 -0.04287176 -0.00512605 104.74790
# Get scores summarised across horizon, forecast date, and target end date
summarise_scores(
  scores,
  across = c("horizon", "forecast_date", "target_end_date")
)
#> location target_type location_name model interval_score
#> 1: DE Cases Germany EuroCOVIDhub-baseline 14506.65500
#> 2: DE Cases Germany EuroCOVIDhub-ensemble 6286.66495
#> 3: DE Cases Germany epiforecasts-EpiNow2 11684.72865
#> 4: DE Deaths Germany EuroCOVIDhub-baseline 155.91235
#> 5: DE Deaths Germany EuroCOVIDhub-ensemble 44.46077
#> 6: DE Deaths Germany UMass-MechBayes 68.91583
#> 7: DE Deaths Germany epiforecasts-EpiNow2 93.33921
#> 8: FR Cases France EuroCOVIDhub-baseline 54147.68308
#> 9: FR Cases France EuroCOVIDhub-ensemble 44537.04769
#> 10: FR Cases France epiforecasts-EpiNow2 50141.70361
#> 11: FR Deaths France EuroCOVIDhub-baseline 187.63842
#> 12: FR Deaths France EuroCOVIDhub-ensemble 56.70677
#> 13: FR Deaths France UMass-MechBayes 74.17250
#> 14: FR Deaths France epiforecasts-EpiNow2 96.69760
#> 15: GB Cases United Kingdom EuroCOVIDhub-baseline 36032.52912
#> 16: GB Cases United Kingdom EuroCOVIDhub-ensemble 16010.55516
#> 17: GB Cases United Kingdom epiforecasts-EpiNow2 18303.21825
#> 18: GB Deaths United Kingdom EuroCOVIDhub-baseline 95.17583
#> 19: GB Deaths United Kingdom EuroCOVIDhub-ensemble 20.04649
#> 20: GB Deaths United Kingdom UMass-MechBayes 33.45330
#> 21: GB Deaths United Kingdom epiforecasts-EpiNow2 25.77830
#> 22: IT Cases Italy EuroCOVIDhub-baseline 9247.43141
#> 23: IT Cases Italy EuroCOVIDhub-ensemble 4941.02753
#> 24: IT Cases Italy epiforecasts-EpiNow2 3196.57595
#> 25: IT Deaths Italy EuroCOVIDhub-baseline 198.88887
#> 26: IT Deaths Italy EuroCOVIDhub-ensemble 44.47594
#> 27: IT Deaths Italy UMass-MechBayes 34.06615
#> 28: IT Deaths Italy epiforecasts-EpiNow2 59.20908
#> location target_type location_name model interval_score
#> dispersion underprediction overprediction coverage_deviation bias
#> 1: 2923.63054 1.147921e+02 1.146823e+04 -0.08630435 0.49062500
#> 2: 2286.83071 4.768886e+02 3.522946e+03 -0.06184783 0.17375000
#> 3: 3382.48817 9.200027e+02 7.382238e+03 -0.14608696 0.13375000
#> 4: 78.26697 0.000000e+00 7.764538e+01 0.02510870 0.52812500
#> 5: 35.27327 4.533967e+00 4.653533e+00 0.23842391 -0.06875000
#> 6: 31.75822 3.715761e+01 0.000000e+00 -0.07271739 -0.63312500
#> 7: 44.37726 3.803397e+01 1.092799e+01 -0.06184783 -0.45937500
#> 8: 6294.16678 8.578344e+03 3.927517e+04 -0.13521739 0.16250000
#> 9: 4834.10747 6.054179e+03 3.364876e+04 -0.13521739 0.02281250
#> 10: 7852.48758 7.281398e+03 3.500782e+04 -0.06456522 -0.05718750
#> 11: 101.19685 0.000000e+00 8.644158e+01 0.07945652 0.46562500
#> 12: 41.62796 7.942935e+00 7.135870e+00 0.20445652 0.06562500
#> 13: 40.59505 2.091984e+01 1.265761e+01 0.05635870 0.08812500
#> 14: 60.31008 1.061815e+01 2.576938e+01 0.02310019 -0.07173913
#> 15: 4300.45982 3.166997e+04 6.210326e+01 -0.18141304 -0.63156250
#> 16: 6016.72500 8.891671e+03 1.102159e+03 -0.09445652 -0.50937500
#> 17: 10391.44923 3.617322e+03 4.294447e+03 0.01695652 -0.28125000
#> 18: 86.38914 8.266304e+00 5.203804e-01 0.29956522 -0.10625000
#> 19: 16.20003 1.241848e+00 2.604620e+00 0.19358696 0.15937500
#> 20: 14.34596 4.891304e-02 1.905842e+01 -0.12978261 0.64468750
#> 21: 13.71580 1.086957e-02 1.205163e+01 -0.01701087 0.54875000
#> 22: 2891.74663 7.767894e+02 5.578895e+03 -0.04554348 0.37031250
#> 23: 1516.43514 1.525970e+03 1.898622e+03 -0.09989130 0.08718750
#> 24: 1031.08682 1.222700e+03 9.427894e+02 -0.07271739 -0.11093750
#> 25: 99.77202 1.277174e-01 9.898913e+01 0.06043478 0.46875000
#> 26: 27.62268 2.694293e+00 1.415897e+01 0.14467391 0.13437500
#> 27: 20.79034 9.077446e+00 4.198370e+00 0.05364130 -0.18906250
#> 28: 17.02701 1.342663e+01 2.875543e+01 -0.09717391 -0.05687500
#> dispersion underprediction overprediction coverage_deviation bias
#> ae_median
#> 1: 21109.81250
#> 2: 9949.12500
#> 3: 18315.09375
#> 4: 261.03125
#> 5: 51.93750
#> 6: 124.50000
#> 7: 154.62500
#> 8: 67726.87500
#> 9: 52432.93750
#> 10: 60549.84375
#> 11: 293.59375
#> 12: 72.12500
#> 13: 88.53125
#> 14: 143.00000
#> 15: 49934.03125
#> 16: 26422.84375
#> 17: 27901.06250
#> 18: 59.03125
#> 19: 22.96875
#> 20: 55.34375
#> 21: 40.62500
#> 22: 15123.68750
#> 23: 7599.37500
#> 24: 4929.25000
#> 25: 319.37500
#> 26: 65.50000
#> 27: 45.53125
#> 28: 91.50000
#> ae_median
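# the `across` call above drops the listed columns from the unit of a single
# forecast; naming the remaining columns in `by` should give the same grouping
# (a sketch, assuming `scores` has no forecast-unit columns beyond those
# shown in the output above)
summarise_scores(
  scores,
  by = c("location", "target_type", "location_name", "model")
)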
# get standard deviation
summarise_scores(scores, by = "model", fun = sd)
#> model interval_score dispersion underprediction
#> 1: EuroCOVIDhub-baseline 43157.488 2682.7722 18417.2012
#> 2: EuroCOVIDhub-ensemble 38788.740 3863.3379 7179.5954
#> 3: epiforecasts-EpiNow2 42478.208 6698.7470 7629.9180
#> 4: UMass-MechBayes 42.813 25.2045 30.8519
#> overprediction coverage_deviation bias ae_median
#> 1: 38723.6131 0.2589550 0.5628605 50186.94967
#> 2: 36978.7730 0.2726646 0.5707267 42500.79359
#> 3: 38832.5598 0.2838865 0.6632842 49800.66917
#> 4: 18.3498 0.2705116 0.6457779 74.79904
# round digits
summarise_scores(scores, by = "model") %>%
  summarise_scores(fun = signif, digits = 2)
#> model interval_score dispersion underprediction
#> 1: EuroCOVIDhub-baseline 14000 2100 5100
#> 2: EuroCOVIDhub-ensemble 9000 1800 2100
#> 3: epiforecasts-EpiNow2 11000 3000 1700
#> 4: UMass-MechBayes 53 27 17
#> overprediction coverage_deviation bias ae_median
#> 1: 7100 0.002 0.2200 19000
#> 2: 5000 0.049 0.0081 12000
#> 3: 6200 -0.055 -0.0430 15000
#> 4: 9 -0.023 -0.0220 78
# get quantiles of scores
# make sure to aggregate over ranges first
summarise_scores(
  scores,
  by = "model", fun = quantile,
  probs = c(0.25, 0.5, 0.75)
)
#> model interval_score dispersion underprediction
#> 1: EuroCOVIDhub-baseline 130.23793 87.81804 0.00000000
#> 2: EuroCOVIDhub-baseline 934.41870 163.28739 0.00000000
#> 3: EuroCOVIDhub-baseline 10795.61967 3476.52750 4.42391304
#> 4: EuroCOVIDhub-ensemble 28.33707 23.67087 0.00000000
#> 5: EuroCOVIDhub-ensemble 300.30739 168.95543 0.00000000
#> 6: EuroCOVIDhub-ensemble 5532.28565 2030.41870 44.10869565
#> 7: epiforecasts-EpiNow2 50.94935 16.90326 0.00000000
#> 8: epiforecasts-EpiNow2 442.81696 152.12565 0.08695652
#> 9: epiforecasts-EpiNow2 5902.85957 2033.24370 155.76086957
#> 10: UMass-MechBayes 18.49435 7.73837 0.00000000
#> 11: UMass-MechBayes 46.05413 17.15413 0.02173913
#> 12: UMass-MechBayes 73.27087 38.04620 18.41304348
#> overprediction coverage_deviation bias ae_median
#> 1: 0.0000000 -0.1243478 -0.20 166.00
#> 2: 54.1304348 -0.0373913 0.40 763.50
#> 3: 496.7065217 0.2234783 0.60 17709.00
#> 4: 0.0000000 -0.1243478 -0.40 27.00
#> 5: 0.2826087 0.1365217 0.10 277.00
#> 6: 43.5217391 0.2452174 0.40 8100.00
#> 7: 0.0000000 -0.2982609 -0.70 68.00
#> 8: 0.0000000 -0.0373913 -0.10 404.00
#> 9: 45.6956522 0.1365217 0.60 7177.00
#> 10: 0.0000000 -0.2113043 -0.60 24.75
#> 11: 0.0000000 -0.0373913 -0.05 52.00
#> 12: 10.2500000 0.2234783 0.60 106.00
# get ranges
# summarise_scores(scores, by = "range")
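# get relative skill by model, based on pairwise comparisons
# (a sketch only, assuming the version of summarise_scores() documented above;
# with relative_skill_metric = "auto" and quantile-based scores, the interval
# score is expected to be used; see pairwise_comparison() for details)
summarise_scores(scores, by = "model", relative_skill = TRUE)
# scale relative skill with respect to a baseline model; the baseline must be
# one of the values in the 'model' column
summarise_scores(
  scores,
  by = "model", relative_skill = TRUE,
  baseline = "EuroCOVIDhub-baseline"
)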