
Summarise scores as produced by score()

Usage

summarise_scores(
  scores,
  by = NULL,
  across = NULL,
  fun = mean,
  relative_skill = FALSE,
  relative_skill_metric = "auto",
  metric = deprecated(),
  baseline = NULL,
  ...
)

summarize_scores(
  scores,
  by = NULL,
  across = NULL,
  fun = mean,
  relative_skill = FALSE,
  relative_skill_metric = "auto",
  metric = deprecated(),
  baseline = NULL,
  ...
)

Arguments

scores

A data.table of scores as produced by score().

by

character vector with column names to summarise scores by. Default is NULL, meaning that the only summary that takes place is summarising over samples or quantiles (in the case of quantile-based forecasts), such that there is one score per forecast as defined by the unit of a single forecast (rather than one score for every sample or quantile). The unit of a single forecast is determined by the columns present in the input data that do not correspond to a metric produced by score(); these columns indicate a grouping of forecasts (for example, there may be one forecast per day, location and model). Adding additional, unrelated columns may alter results in an unpredictable way.
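
As a rough illustration of the default behaviour (by = NULL), the base R sketch below collapses quantile-level rows into one score per forecast. It is not the package's implementation: the toy data, the helper names (scores_toy, metric_cols, forecast_unit) and the assumption that the quantile column is excluded from the unit of a single forecast are all made up for this example.

```r
# Toy quantile-level scores: two forecasts with three quantile rows each.
# The values and column layout are illustrative, not package output.
scores_toy <- data.frame(
  model = rep(c("A", "B"), each = 3),
  location = "DE",
  quantile = rep(c(0.25, 0.5, 0.75), times = 2),
  interval_score = c(1, 2, 3, 4, 6, 8)
)

metric_cols <- "interval_score"

# Assumption: the unit of a single forecast is every column that is
# neither a metric nor the quantile level.
forecast_unit <- setdiff(names(scores_toy), c(metric_cols, "quantile"))

# One row per forecast, averaging the metric over quantiles.
per_forecast <- aggregate(
  scores_toy[metric_cols],
  by = scores_toy[forecast_unit],
  FUN = mean
)
per_forecast
```

Adding an unrelated column to scores_toy would change forecast_unit in this sketch, which is one way to see why extra columns can alter the result.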

across

character vector with column names from the vector of variables that define the unit of a single forecast (see above) to summarise scores across (meaning that the specified columns will be dropped). This is an alternative to specifying by directly. If NULL (default), then by will be used; if by is not specified either, it will be inferred internally. Only one of across and by may be used at a time.
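
The relationship between across and by can be sketched with plain set arithmetic. The snippet assumes the forecast unit columns of the quantile example shown under Examples; the variable names are hypothetical, and the package may derive the grouping differently internally.

```r
# Columns assumed to define the unit of a single forecast
# (taken from the quantile example output on this page).
forecast_unit <- c(
  "location", "target_end_date", "target_type", "location_name",
  "forecast_date", "model", "horizon"
)

# Columns to summarise across, i.e. to drop from the grouping.
across <- c("horizon", "forecast_date", "target_end_date")

# The grouping that remains once the `across` columns are removed.
by_equivalent <- setdiff(forecast_unit, across)
by_equivalent
```

Under these assumptions, summarising across the three columns amounts to grouping by location, target_type, location_name and model.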

fun

a function used for summarising scores. Default is mean.

relative_skill

logical, whether or not to compute relative performance between models based on pairwise comparisons. If TRUE (default is FALSE), then a column called 'model' must be present in the input data. For more information on the computation of relative skill, see pairwise_comparison(). Relative skill will be calculated for the aggregation level specified in by.

relative_skill_metric

character with the name of the metric for which a relative skill shall be computed. If equal to 'auto' (the default), then this will be either interval score, CRPS or Brier score (depending on which of these is available in the input data).
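
One way to picture the 'auto' rule is a lookup that returns the first candidate metric found among the score columns. This is only a sketch: pick_metric() is a hypothetical helper, the order of preference is an assumption, and the column name "brier_score" is assumed (only interval_score and crps appear in the output on this page).

```r
# Hypothetical helper: choose the first candidate metric that is present.
# The candidate order and the name "brier_score" are assumptions.
pick_metric <- function(score_cols,
                        candidates = c("interval_score", "crps", "brier_score")) {
  available <- intersect(candidates, score_cols)
  if (length(available) == 0) {
    stop("none of the candidate metrics is available")
  }
  available[[1]]
}

chosen <- pick_metric(c("model", "crps", "log_score"))
chosen
```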

metric

Deprecated in 1.1.0. Use relative_skill_metric instead.

baseline

character string with the name of a model. If a baseline is given, then a scaled relative skill with respect to the baseline will be returned. By default (NULL), relative skill will not be scaled with respect to a baseline model.
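
To make the idea of scaling concrete, the sketch below assumes that scaling means dividing every model's relative skill by the baseline's relative skill, so that the baseline ends up at 1. The numbers and model names are invented, and the exact computation should be checked against pairwise_comparison().

```r
# Invented relative skill values, not real results.
relative_skill <- c(model_a = 0.8, model_b = 1.0, baseline_model = 1.25)
baseline <- "baseline_model"

# Assumption: scaled relative skill = relative skill / baseline's value.
scaled_rel_skill <- relative_skill / relative_skill[[baseline]]
scaled_rel_skill
```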

...

additional parameters that can be passed to the summary function provided to fun. For more information see the documentation of the respective function.
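
The forwarding of ... to fun follows the usual R pattern, which the base R sketch below illustrates. summarise_toy() is a hypothetical stand-in that works on a plain numeric vector; it is not part of the package.

```r
# Hypothetical stand-in: apply `fun` to `x`, forwarding extra arguments.
summarise_toy <- function(x, fun = mean, ...) {
  fun(x, ...)
}

# na.rm is passed through to mean().
mean_without_na <- summarise_toy(c(1, NA, 3), na.rm = TRUE)

# digits is passed through to signif().
rounded <- summarise_toy(14321.489, fun = signif, digits = 2)
```

The same mechanism is what lets the examples pass digits = 2 to signif and probs to quantile.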

Examples

# \dontshow{
  data.table::setDTthreads(2) # restricts number of cores used on CRAN
# }
library(magrittr) # pipe operator

scores <- score(example_continuous)
#> The following messages were produced when checking inputs:
#> 1.  144 values for `prediction` are NA in the data provided and the corresponding rows were removed. This may indicate a problem if unexpected.
summarise_scores(scores)
#>      location location_name target_end_date target_type forecast_date
#>   1:       DE       Germany      2021-05-08       Cases    2021-05-03
#>   2:       DE       Germany      2021-05-08       Cases    2021-05-03
#>   3:       DE       Germany      2021-05-08       Cases    2021-05-03
#>   4:       DE       Germany      2021-05-08      Deaths    2021-05-03
#>   5:       DE       Germany      2021-05-08      Deaths    2021-05-03
#>  ---                                                                 
#> 883:       IT         Italy      2021-07-24      Deaths    2021-07-12
#> 884:       IT         Italy      2021-07-24      Deaths    2021-07-05
#> 885:       IT         Italy      2021-07-24      Deaths    2021-07-12
#> 886:       IT         Italy      2021-07-24      Deaths    2021-07-05
#> 887:       IT         Italy      2021-07-24      Deaths    2021-07-12
#>                      model horizon         mad  bias       dss         crps
#>   1: EuroCOVIDhub-ensemble       1 17641.24334  0.55 20.601386  7482.975177
#>   2: EuroCOVIDhub-baseline       1 19341.68942  0.95 22.443370 20371.250988
#>   3:  epiforecasts-EpiNow2       1 32348.79978  0.80 22.310568 24810.424753
#>   4: EuroCOVIDhub-ensemble       1   267.13585  0.20 11.112388    67.510511
#>   5: EuroCOVIDhub-baseline       1   397.09371 -0.05 11.541040    86.462930
#>  ---                                                                       
#> 883: EuroCOVIDhub-baseline       2   168.03093  0.45 12.913561    66.515150
#> 884:       UMass-MechBayes       3    24.76457  0.10  6.353679     6.616381
#> 885:       UMass-MechBayes       2    39.96542  0.80  8.765716    29.446723
#> 886:  epiforecasts-EpiNow2       3   107.93293  0.10 10.283832    28.863742
#> 887:  epiforecasts-EpiNow2       2    93.54245  0.65 10.563729    49.998906
#>      log_score    ae_median      se_mean
#>   1: 10.989568  9413.839675 2.303980e+08
#>   2: 11.941135 29378.807513 9.128538e+08
#>   3: 12.007007 36512.810659 1.603609e+09
#>   4:  6.516954    77.077318 8.593954e+03
#>   5:  6.913879    27.429626 2.123067e+03
#>  ---                                    
#> 883:  6.067282    58.606302 6.065218e+04
#> 884:  4.242600     5.071742 6.043161e+01
#> 885:  5.321013    42.271396 2.153122e+03
#> 886:  5.709216     7.760566 3.212150e+03
#> 887:  5.736238    73.670518 1.263027e+04


# summarise over samples or quantiles to get one score per forecast
scores <- score(example_quantile)
#> The following messages were produced when checking inputs:
#> 1.  144 values for `prediction` are NA in the data provided and the corresponding rows were removed. This may indicate a problem if unexpected.
summarise_scores(scores)
#>      location target_end_date target_type location_name forecast_date
#>   1:       DE      2021-05-08       Cases       Germany    2021-05-03
#>   2:       DE      2021-05-08       Cases       Germany    2021-05-03
#>   3:       DE      2021-05-08       Cases       Germany    2021-05-03
#>   4:       DE      2021-05-08      Deaths       Germany    2021-05-03
#>   5:       DE      2021-05-08      Deaths       Germany    2021-05-03
#>  ---                                                                 
#> 883:       IT      2021-07-24      Deaths         Italy    2021-07-05
#> 884:       IT      2021-07-24      Deaths         Italy    2021-07-12
#> 885:       IT      2021-07-24      Deaths         Italy    2021-07-12
#> 886:       IT      2021-07-24      Deaths         Italy    2021-07-12
#> 887:       IT      2021-07-24      Deaths         Italy    2021-07-12
#>                      model horizon interval_score  dispersion underprediction
#>   1: EuroCOVIDhub-baseline       1    16925.04696 1649.220870       0.0000000
#>   2: EuroCOVIDhub-ensemble       1     7990.85478 5440.985217       0.0000000
#>   3:  epiforecasts-EpiNow2       1    25395.96087 8173.700000       0.0000000
#>   4: EuroCOVIDhub-baseline       1       46.79304   44.662609       0.0000000
#>   5: EuroCOVIDhub-ensemble       1       53.88000   53.271304       0.6086957
#>  ---                                                                         
#> 883:  epiforecasts-EpiNow2       3       19.76261   14.284348       0.0000000
#> 884: EuroCOVIDhub-baseline       2       80.33696   76.728261       0.0000000
#> 885: EuroCOVIDhub-ensemble       2       18.65870   13.354348       0.0000000
#> 886:       UMass-MechBayes       2       25.58174    7.755652       0.0000000
#> 887:  epiforecasts-EpiNow2       2       66.16174   25.553043       0.0000000
#>      overprediction coverage_deviation  bias ae_median
#>   1:   15275.826087        -0.38521739  0.95     25620
#>   2:    2549.869565         0.04956522  0.50     12271
#>   3:   17222.260870        -0.29826087  0.90     44192
#>   4:       2.130435         0.22347826  0.30        15
#>   5:       0.000000         0.39739130 -0.10        14
#>  ---                                                  
#> 883:       5.478261         0.04956522  0.50        26
#> 884:       3.608696         0.31043478  0.20        53
#> 885:       5.304348         0.13652174  0.40        30
#> 886:      17.826087        -0.21130435  0.80        46
#> 887:      40.608696        -0.29826087  0.90       108

# get scores by model
summarise_scores(scores, by = "model")
#>                    model interval_score dispersion underprediction
#> 1: EuroCOVIDhub-baseline    14321.48926 2096.95360      5143.53567
#> 2: EuroCOVIDhub-ensemble     8992.62316 1846.85278      2120.64029
#> 3:  epiforecasts-EpiNow2    10827.40786 2950.73422      1697.23411
#> 4:       UMass-MechBayes       52.65195   26.87239        16.80095
#>    overprediction coverage_deviation        bias   ae_median
#> 1:    7081.000000         0.00201087  0.21851562 19353.42969
#> 2:    5025.130095         0.04871603  0.00812500 12077.10156
#> 3:    6179.439535        -0.05516986 -0.04336032 14521.10526
#> 4:       8.978601        -0.02312500 -0.02234375    78.47656

# get scores by model and target type
summarise_scores(scores, by = c("model", "target_type"))
#>                    model target_type interval_score dispersion underprediction
#> 1: EuroCOVIDhub-baseline       Cases    28483.57465 4102.50094    10284.972826
#> 2: EuroCOVIDhub-ensemble       Cases    17943.82383 3663.52458     4237.177310
#> 3:  epiforecasts-EpiNow2       Cases    20831.55662 5664.37795     3260.355639
#> 4: EuroCOVIDhub-baseline      Deaths      159.40387   91.40625        2.098505
#> 5: EuroCOVIDhub-ensemble      Deaths       41.42249   30.18099        4.103261
#> 6:       UMass-MechBayes      Deaths       52.65195   26.87239       16.800951
#> 7:  epiforecasts-EpiNow2      Deaths       66.64282   31.85692       15.893314
#>    overprediction coverage_deviation        bias   ae_median
#> 1:   14096.100883        -0.11211957  0.09796875 38473.60156
#> 2:   10043.121943        -0.09785326 -0.05640625 24101.07031
#> 3:   11906.823030        -0.06660326 -0.07890625 27923.81250
#> 4:      65.899117         0.11614130  0.33906250   233.25781
#> 5:       7.138247         0.19528533  0.07265625    53.13281
#> 6:       8.978601        -0.02312500 -0.02234375    78.47656
#> 7:      18.892583        -0.04287176 -0.00512605   104.74790

# get scores summarised across horizon, forecast date, and target end date
summarise_scores(
  scores, across = c("horizon", "forecast_date", "target_end_date")
)
#>     location target_type  location_name                 model interval_score
#>  1:       DE       Cases        Germany EuroCOVIDhub-baseline    14506.65500
#>  2:       DE       Cases        Germany EuroCOVIDhub-ensemble     6286.66495
#>  3:       DE       Cases        Germany  epiforecasts-EpiNow2    11684.72865
#>  4:       DE      Deaths        Germany EuroCOVIDhub-baseline      155.91235
#>  5:       DE      Deaths        Germany EuroCOVIDhub-ensemble       44.46077
#>  6:       DE      Deaths        Germany       UMass-MechBayes       68.91583
#>  7:       DE      Deaths        Germany  epiforecasts-EpiNow2       93.33921
#>  8:       FR       Cases         France EuroCOVIDhub-baseline    54147.68308
#>  9:       FR       Cases         France EuroCOVIDhub-ensemble    44537.04769
#> 10:       FR       Cases         France  epiforecasts-EpiNow2    50141.70361
#> 11:       FR      Deaths         France EuroCOVIDhub-baseline      187.63842
#> 12:       FR      Deaths         France EuroCOVIDhub-ensemble       56.70677
#> 13:       FR      Deaths         France       UMass-MechBayes       74.17250
#> 14:       FR      Deaths         France  epiforecasts-EpiNow2       96.69760
#> 15:       GB       Cases United Kingdom EuroCOVIDhub-baseline    36032.52912
#> 16:       GB       Cases United Kingdom EuroCOVIDhub-ensemble    16010.55516
#> 17:       GB       Cases United Kingdom  epiforecasts-EpiNow2    18303.21825
#> 18:       GB      Deaths United Kingdom EuroCOVIDhub-baseline       95.17583
#> 19:       GB      Deaths United Kingdom EuroCOVIDhub-ensemble       20.04649
#> 20:       GB      Deaths United Kingdom       UMass-MechBayes       33.45330
#> 21:       GB      Deaths United Kingdom  epiforecasts-EpiNow2       25.77830
#> 22:       IT       Cases          Italy EuroCOVIDhub-baseline     9247.43141
#> 23:       IT       Cases          Italy EuroCOVIDhub-ensemble     4941.02753
#> 24:       IT       Cases          Italy  epiforecasts-EpiNow2     3196.57595
#> 25:       IT      Deaths          Italy EuroCOVIDhub-baseline      198.88887
#> 26:       IT      Deaths          Italy EuroCOVIDhub-ensemble       44.47594
#> 27:       IT      Deaths          Italy       UMass-MechBayes       34.06615
#> 28:       IT      Deaths          Italy  epiforecasts-EpiNow2       59.20908
#>     location target_type  location_name                 model interval_score
#>      dispersion underprediction overprediction coverage_deviation        bias
#>  1:  2923.63054    1.147921e+02   1.146823e+04        -0.08630435  0.49062500
#>  2:  2286.83071    4.768886e+02   3.522946e+03        -0.06184783  0.17375000
#>  3:  3382.48817    9.200027e+02   7.382238e+03        -0.14608696  0.13375000
#>  4:    78.26697    0.000000e+00   7.764538e+01         0.02510870  0.52812500
#>  5:    35.27327    4.533967e+00   4.653533e+00         0.23842391 -0.06875000
#>  6:    31.75822    3.715761e+01   0.000000e+00        -0.07271739 -0.63312500
#>  7:    44.37726    3.803397e+01   1.092799e+01        -0.06184783 -0.45937500
#>  8:  6294.16678    8.578344e+03   3.927517e+04        -0.13521739  0.16250000
#>  9:  4834.10747    6.054179e+03   3.364876e+04        -0.13521739  0.02281250
#> 10:  7852.48758    7.281398e+03   3.500782e+04        -0.06456522 -0.05718750
#> 11:   101.19685    0.000000e+00   8.644158e+01         0.07945652  0.46562500
#> 12:    41.62796    7.942935e+00   7.135870e+00         0.20445652  0.06562500
#> 13:    40.59505    2.091984e+01   1.265761e+01         0.05635870  0.08812500
#> 14:    60.31008    1.061815e+01   2.576938e+01         0.02310019 -0.07173913
#> 15:  4300.45982    3.166997e+04   6.210326e+01        -0.18141304 -0.63156250
#> 16:  6016.72500    8.891671e+03   1.102159e+03        -0.09445652 -0.50937500
#> 17: 10391.44923    3.617322e+03   4.294447e+03         0.01695652 -0.28125000
#> 18:    86.38914    8.266304e+00   5.203804e-01         0.29956522 -0.10625000
#> 19:    16.20003    1.241848e+00   2.604620e+00         0.19358696  0.15937500
#> 20:    14.34596    4.891304e-02   1.905842e+01        -0.12978261  0.64468750
#> 21:    13.71580    1.086957e-02   1.205163e+01        -0.01701087  0.54875000
#> 22:  2891.74663    7.767894e+02   5.578895e+03        -0.04554348  0.37031250
#> 23:  1516.43514    1.525970e+03   1.898622e+03        -0.09989130  0.08718750
#> 24:  1031.08682    1.222700e+03   9.427894e+02        -0.07271739 -0.11093750
#> 25:    99.77202    1.277174e-01   9.898913e+01         0.06043478  0.46875000
#> 26:    27.62268    2.694293e+00   1.415897e+01         0.14467391  0.13437500
#> 27:    20.79034    9.077446e+00   4.198370e+00         0.05364130 -0.18906250
#> 28:    17.02701    1.342663e+01   2.875543e+01        -0.09717391 -0.05687500
#>      dispersion underprediction overprediction coverage_deviation        bias
#>       ae_median
#>  1: 21109.81250
#>  2:  9949.12500
#>  3: 18315.09375
#>  4:   261.03125
#>  5:    51.93750
#>  6:   124.50000
#>  7:   154.62500
#>  8: 67726.87500
#>  9: 52432.93750
#> 10: 60549.84375
#> 11:   293.59375
#> 12:    72.12500
#> 13:    88.53125
#> 14:   143.00000
#> 15: 49934.03125
#> 16: 26422.84375
#> 17: 27901.06250
#> 18:    59.03125
#> 19:    22.96875
#> 20:    55.34375
#> 21:    40.62500
#> 22: 15123.68750
#> 23:  7599.37500
#> 24:  4929.25000
#> 25:   319.37500
#> 26:    65.50000
#> 27:    45.53125
#> 28:    91.50000
#>       ae_median

# get standard deviation
summarise_scores(scores, by = "model", fun = sd)
#>                    model interval_score dispersion underprediction
#> 1: EuroCOVIDhub-baseline      43157.488  2682.7722      18417.2012
#> 2: EuroCOVIDhub-ensemble      38788.740  3863.3379       7179.5954
#> 3:  epiforecasts-EpiNow2      42478.208  6698.7470       7629.9180
#> 4:       UMass-MechBayes         42.813    25.2045         30.8519
#>    overprediction coverage_deviation      bias   ae_median
#> 1:     38723.6131          0.2589550 0.5628605 50186.94967
#> 2:     36978.7730          0.2726646 0.5707267 42500.79359
#> 3:     38832.5598          0.2838865 0.6632842 49800.66917
#> 4:        18.3498          0.2705116 0.6457779    74.79904

# round digits
summarise_scores(scores, by = "model") %>%
  summarise_scores(fun = signif, digits = 2)
#>                    model interval_score dispersion underprediction
#> 1: EuroCOVIDhub-baseline          14000       2100            5100
#> 2: EuroCOVIDhub-ensemble           9000       1800            2100
#> 3:  epiforecasts-EpiNow2          11000       3000            1700
#> 4:       UMass-MechBayes             53         27              17
#>    overprediction coverage_deviation    bias ae_median
#> 1:           7100              0.002  0.2200     19000
#> 2:           5000              0.049  0.0081     12000
#> 3:           6200             -0.055 -0.0430     15000
#> 4:              9             -0.023 -0.0220        78

# get quantiles of scores
# make sure to aggregate over ranges first
summarise_scores(scores,
  by = "model", fun = quantile,
  probs = c(0.25, 0.5, 0.75)
)
#>                     model interval_score dispersion underprediction
#>  1: EuroCOVIDhub-baseline      130.23793   87.81804      0.00000000
#>  2: EuroCOVIDhub-baseline      934.41870  163.28739      0.00000000
#>  3: EuroCOVIDhub-baseline    10795.61967 3476.52750      4.42391304
#>  4: EuroCOVIDhub-ensemble       28.33707   23.67087      0.00000000
#>  5: EuroCOVIDhub-ensemble      300.30739  168.95543      0.00000000
#>  6: EuroCOVIDhub-ensemble     5532.28565 2030.41870     44.10869565
#>  7:  epiforecasts-EpiNow2       50.94935   16.90326      0.00000000
#>  8:  epiforecasts-EpiNow2      442.81696  152.12565      0.08695652
#>  9:  epiforecasts-EpiNow2     5902.85957 2033.24370    155.76086957
#> 10:       UMass-MechBayes       18.49435    7.73837      0.00000000
#> 11:       UMass-MechBayes       46.05413   17.15413      0.02173913
#> 12:       UMass-MechBayes       73.27087   38.04620     18.41304348
#>     overprediction coverage_deviation  bias ae_median
#>  1:      0.0000000         -0.1243478 -0.20    166.00
#>  2:     54.1304348         -0.0373913  0.40    763.50
#>  3:    496.7065217          0.2234783  0.60  17709.00
#>  4:      0.0000000         -0.1243478 -0.40     27.00
#>  5:      0.2826087          0.1365217  0.10    277.00
#>  6:     43.5217391          0.2452174  0.40   8100.00
#>  7:      0.0000000         -0.2982609 -0.70     68.00
#>  8:      0.0000000         -0.0373913 -0.10    404.00
#>  9:     45.6956522          0.1365217  0.60   7177.00
#> 10:      0.0000000         -0.2113043 -0.60     24.75
#> 11:      0.0000000         -0.0373913 -0.05     52.00
#> 12:     10.2500000          0.2234783  0.60    106.00

# get ranges
# summarise_scores(scores, by = "range")