This is our evaluation of forecasting models and individual forecasters who have contributed forecasts of Covid-19 case and death numbers in Germany and Poland. These forecasts were submitted to the German and Polish Forecast Hub each week. We currently submit an unweighted mean ensemble to the Forecast Hub, but this is likely to change, and we will transition to submitting forecasts to the European Forecast Hub. The evaluations are our own and not authorised by the German Forecast Hub team. We cannot rule out mistakes, and the analyses are subject to change.
Note that forecasts have now shifted to the European Forecast Hub. You can find our evaluation for these forecasts here.
If you have questions or want to give feedback, please create an issue on our GitHub repository.
You can register to become a forecaster here.
Forecaster ranking
Here is an overall ranking of all forecasters, based on relative skill. Relative skill is calculated from all pairwise comparisons between forecasters in terms of the weighted interval score (WIS); see below for a more detailed explanation of the scoring metrics used. ‘Overall’ shows the complete ranking, ‘latest’ only spans the last 5-6 weeks of data, and ‘detailed’ represents the full data set that you can download for your own analysis.
The following metrics are used:
- Relative skill is a metric based on the weighted interval score (WIS) that uses a ‘pairwise comparison tournament’. All pairs of forecasters are compared against each other in terms of the WIS. For each pair, the mean score of both models on the set of common targets for which both have made a prediction is calculated to obtain a mean score ratio. The relative skill of a model is the geometric mean of its mean score ratios. Smaller values are better, and a value smaller than one means that the model beats the average forecasting model.
- The weighted interval score is a proper scoring rule (meaning you can’t cheat it: forecasting your true belief optimises your expected score) suited to scoring forecasts in an interval format. It has three components: sharpness, underprediction and overprediction. Sharpness is the width of your prediction interval. Over- and underprediction only come into play if the prediction interval does not cover the true value. They are based on the absolute difference between the true value and the nearer bound of your prediction interval (the lower bound if your forecast is too high, the upper bound if it is too low).
- coverage deviation is the average difference between empirical and nominal interval coverage. Say your 50 percent prediction interval covers only 20 percent of all true values; then your coverage deviation is 0.2 - 0.5 = -0.3. The coverage deviation value in the table is calculated by averaging over the coverage deviations of all prediction intervals. If the value is negative, you have covered less than you should. If it is positive, then your forecasts could be a little more confident.
- bias is a measure between -1 and 1 that expresses your tendency to underpredict (-1) or overpredict (1). In contrast to the over- and underprediction components of the WIS, it is bounded between -1 and 1 and cannot go to infinity, and is therefore less susceptible to outliers.
- aem is the absolute error of your median forecasts. A high aem means your median forecasts tend to be far away from the true values.
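The WIS described above can be sketched in a few lines. This is a minimal illustration following the standard interval score formulation (as in Bracher et al.), in which the penalty terms are scaled by 2/alpha and the intervals are weighted by alpha/2; these scalings are part of that formulation rather than spelled out in the text above.

```python
def interval_score(lower, upper, alpha, y):
    """Score a single central (1 - alpha) prediction interval for true value y.

    Returns (total, sharpness, underprediction, overprediction).
    At most one of the two penalty terms can be non-zero.
    """
    sharpness = upper - lower
    overprediction = (2 / alpha) * max(lower - y, 0)   # true value below the interval
    underprediction = (2 / alpha) * max(y - upper, 0)  # true value above the interval
    return (sharpness + underprediction + overprediction,
            sharpness, underprediction, overprediction)


def weighted_interval_score(median, intervals, y):
    """WIS over K intervals given as (lower, upper, alpha) tuples,
    combined with the absolute error of the median forecast."""
    k = len(intervals)
    total = 0.5 * abs(y - median)
    for lower, upper, alpha in intervals:
        total += (alpha / 2) * interval_score(lower, upper, alpha, y)[0]
    return total / (k + 0.5)
```

For example, a single 50% interval (10, 20) with median 15 scored against a true value of 25 is missed from above: the sharpness component is 10, the underprediction penalty is 20, and the resulting WIS is (0.5 · 10 + 0.25 · 30) / 1.5 ≈ 8.33.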
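The pairwise comparison tournament behind relative skill can likewise be sketched. This is an illustration of the mechanism as described above, not the Forecast Hub's exact implementation; the input maps each model to the WIS it achieved on each forecast target, and all model names and scores here are made up.

```python
from itertools import combinations
from math import prod


def relative_skill(scores):
    """Geometric mean of pairwise mean score ratios.

    scores: dict mapping model name -> dict mapping target -> WIS.
    Each pair of models is compared only on targets both have forecast.
    """
    ratios = {model: [] for model in scores}
    for a, b in combinations(scores, 2):
        shared = scores[a].keys() & scores[b].keys()
        if not shared:
            continue  # no common targets, pair cannot be compared
        mean_a = sum(scores[a][t] for t in shared) / len(shared)
        mean_b = sum(scores[b][t] for t in shared) / len(shared)
        ratios[a].append(mean_a / mean_b)
        ratios[b].append(mean_b / mean_a)
    # geometric mean of each model's mean score ratios
    return {m: prod(r) ** (1 / len(r)) for m, r in ratios.items() if r}
```

With two hypothetical models where "A" scores twice as high (i.e. twice as badly) as "B" on every shared target, this returns a relative skill of 2.0 for "A" and 0.5 for "B".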
Forecast visualisation
This is a visualisation of all forecasts made so far.
[Interactive forecast plots for the weeks of 2021-04-05, 2021-03-29, 2021-03-22, 2021-03-15 and 2021-03-08, each showing case and death forecasts for Germany and Poland.]