This project is under active development. Current results are preliminary and may change as more data is collected or as the methodology is improved.

Background

COVID-19 hospitalisations in Germany are released by date of positive test rather than by date of admission. This has some advantages for surveillance, as these data are closer to the date of infection and so are easier to link to underlying transmission dynamics and public health interventions. However, when released in this way the latest data are right-censored, meaning that final hospitalisations for a given day are initially underreported. This issue is common in data sets used for the surveillance of infectious diseases and can lead to delayed or biased decision making. Fortunately, when data from a series of days are available we can estimate the level of censoring and provide estimates of final hospitalisations adjusted for this underreporting, with appropriate uncertainty. This is usually known as a nowcast.
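To make the idea of adjusting for underreporting concrete, here is a minimal sketch in Python. It assumes the reporting delay distribution is already known, which is not the case in practice, and it returns point estimates without uncertainty. The delay distribution, the counts, and the function names are all hypothetical and are not taken from this project.

```python
# Hypothetical delay distribution: share of final hospitalisations reported
# 0, 1, 2 and 3 days after the date of positive test (sums to 1).
delay_pmf = [0.4, 0.3, 0.2, 0.1]


def cumulative(pmf):
    """Return the running total of a probability mass function."""
    total, out = 0.0, []
    for p in pmf:
        total += p
        out.append(total)
    return out


def naive_nowcast(observed, pmf):
    """Scale up counts reported so far by the share expected to be reported.

    observed[i] is the count reported so far for test date i, ordered from
    oldest to newest, with the last entry being the most recent date.
    """
    cdf = cumulative(pmf)
    n = len(observed)
    adjusted = []
    for i, count in enumerate(observed):
        days_elapsed = n - 1 - i
        # Dates older than the maximum delay are treated as fully reported.
        share_reported = cdf[min(days_elapsed, len(cdf) - 1)]
        adjusted.append(count / share_reported)
    return adjusted


# Hypothetical counts reported so far for four consecutive test dates.
observed = [100, 90, 70, 40]
nowcast = naive_nowcast(observed, delay_pmf)
```

With these made-up inputs each adjusted value is close to 100, even though the most recent date has only 40 hospitalisations reported so far. The models evaluated here go further by estimating the delay distribution and quantifying uncertainty rather than assuming the delays are known.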

In this work, we aim to evaluate a series of novel semi-parametric nowcasting model formulations in real-time and to provide an example workflow so that others can do the same. We use German COVID-19 hospitalisations by date of positive test at the national level, both overall and by age group, and at the state level. This project is part of a wider collaboration assessing a range of nowcasting methods whilst providing an ensemble nowcast of COVID-19 hospital admissions in Germany by date of positive test. This ensemble should be used for any policy-related work rather than the nowcasts provided in this repository. See here for more on this nowcasting collaboration.

Methods

We follow the approach of Höhle and an der Heiden[1] and consider the distribution of notifications by date of positive test and reporting delay, conditional on the final observed count for each target dataset. This results in an estimation process where expected hospitalisations by date of positive test are estimated jointly with the delay distribution for each date of positive test. When combined this gives the expected final hospitalisations, and once an observation model has been assumed (in our case a negative binomial model) the estimated hospitalisations by date of report can be recovered. Aggregating these estimates then gives estimated hospitalisations by date of positive test adjusted for right censoring.
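The decomposition described above can be sketched as follows in Python. This is an illustration only: the expected final counts and the delay distribution are treated as known, whereas the models here estimate them jointly, and the negative binomial observation model is left out. All names and numbers are hypothetical.

```python
def expected_triangle(final_expected, delay_pmf):
    """Expected notifications by test date (rows) and reporting delay (columns).

    Each cell is the expected final count for that test date multiplied by
    the probability of being reported with that delay.
    """
    return [[lam * p for p in delay_pmf] for lam in final_expected]


def nowcast_expected(observed_triangle, final_expected, delay_pmf):
    """Sum the observed cells and fill unobserved cells with expectations.

    observed_triangle[t][d] holds the count for test date t reported with
    delay d, or None where that report date is still in the future.
    """
    n = len(observed_triangle)
    expected = expected_triangle(final_expected, delay_pmf)
    totals = []
    for t, row in enumerate(observed_triangle):
        max_delay_seen = n - 1 - t
        total = 0.0
        for d in range(len(delay_pmf)):
            if d <= max_delay_seen:
                total += row[d]
            else:
                total += expected[t][d]
        totals.append(total)
    return totals


# Hypothetical inputs: three test dates and delays of 0, 1 and 2 days.
final_expected = [100.0, 100.0, 100.0]
delay_pmf = [0.5, 0.3, 0.2]
observed_triangle = [
    [52, 29, 21],      # oldest date, fully reported
    [48, 31, None],    # delay 2 not yet reported
    [50, None, None],  # only same-day reports available
]
totals = nowcast_expected(observed_triangle, final_expected, delay_pmf)
```

The oldest date keeps its fully observed total, while more recent dates combine what has been reported so far with the expected contribution of the delays that have not yet been observed.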

We explore two primary models and submit nowcasts from these models to the nowcasting hub. The first is fit independently to each data set by age and location. Hospitalisations are modelled using a random walk on the log scale. Reporting delays are modelled parametrically using a lognormal distribution, with the log mean and log standard deviation each modelled using a weekly random walk with a pooled standard deviation and a random effect for the day of the week (introduced on the 6th of December 2021), with public holidays assumed to be reported like Sundays. Report date effects are also modelled using a random effect for day of the week, again with public holidays assumed to be reported like Sundays.

The second model is fit jointly to age groups but is otherwise structured in the same way as the first (unpooled) model, with three exceptions: report day of the week effects and the observation overdispersion are shared across age groups; age groups have a random intercept for both the log mean and the log standard deviation of the reporting delay distribution; and there is no random effect for reference day of the week.

We also consider a series of pooled models which sequentially include the features of our most complex model. These are: fitting age groups jointly, day of the week reporting effects, a random intercept for each age group, and a random walk by positive test week shared across age groups.
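As an illustration of the parametric reporting delay described above, the following Python sketch turns a lognormal distribution into a daily probability mass function and lets its log mean drift from week to week. The discretisation scheme, the maximum delay, and all parameter values are assumptions made for this example and may differ from what epinowcast does internally.

```python
import math


def lognormal_cdf(x, logmean, logsd):
    """Cumulative distribution function of a lognormal distribution."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - logmean) / (logsd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def discretised_lognormal_pmf(logmean, logsd, max_delay):
    """Probability of each whole-day delay from 0 to max_delay.

    The mass for delay d is P(d <= X < d + 1), renormalised so that the
    probabilities sum to 1 over the delays considered (an assumed scheme).
    """
    cdf = [lognormal_cdf(d, logmean, logsd) for d in range(max_delay + 2)]
    pmf = [cdf[d + 1] - cdf[d] for d in range(max_delay + 1)]
    total = sum(pmf)
    return [p / total for p in pmf]


def random_walk(start, steps):
    """Cumulative sum of steps from a starting value."""
    values = [start]
    for step in steps:
        values.append(values[-1] + step)
    return values


def mean_delay(pmf):
    """Mean delay in days implied by a daily probability mass function."""
    return sum(d * p for d, p in enumerate(pmf))


# Hypothetical weekly log means: week 1 at 1.0, then steps of +0.1 and -0.05.
weekly_logmeans = random_walk(1.0, [0.1, -0.05])
weekly_pmfs = [discretised_lognormal_pmf(m, 0.5, 40) for m in weekly_logmeans]
```

In a fitted model the random walk steps would be parameters with a prior rather than fixed numbers. Here they are fixed so that the effect of a changing log mean on the delay distribution is easy to inspect.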

We evaluate these models first visually across a range of nowcasting dates and then quantitatively using proper scoring rules[2] on both the natural and log scales (corresponding to absolute and relative performance), aggregating scores first across all targets and then stratifying in turn by age group, nowcast horizon, date of positive test, and date of report. We also explore other aspects of our models' performance by highlighting models that have problematic fitting diagnostics and summarising the estimation time for each model. We provide a report of this evaluation that is updated in real-time as new data and nowcasts become available.
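To show how scoring on the natural and log scales differs, here is a minimal Python sketch using a sample-based continuous ranked probability score (CRPS), which is one example of a proper scoring rule. The choice of CRPS, the offset of 1 used before taking logs, and the numbers are assumptions for illustration. The evaluation itself uses the scoringutils R package[2].

```python
import math


def crps_samples(samples, observed):
    """Sample-based CRPS: mean |x - y| minus half the mean |x - x'|."""
    n = len(samples)
    accuracy = sum(abs(s - observed) for s in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (2 * n * n)
    return accuracy - spread


def crps_log_scale(samples, observed, offset=1.0):
    """CRPS after log-transforming samples and the observation.

    The offset (an assumption of this sketch) keeps zero counts finite.
    """
    log_samples = [math.log(s + offset) for s in samples]
    return crps_samples(log_samples, math.log(observed + offset))


# Two hypothetical point nowcasts that each overshoot by a factor of two.
small_natural = crps_samples([20.0], 10.0)
large_natural = crps_samples([200.0], 100.0)
small_log = crps_log_scale([20.0], 10.0)
large_log = crps_log_scale([200.0], 100.0)
```

On the natural scale the larger target is penalised ten times more heavily, whereas on the log scale the two scores are similar because the relative error is the same. This is why the two scales are read as absolute and relative performance respectively.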

All models are implemented using the epinowcast R package[3]. The nowcasting and evaluation pipeline is implemented using the targets R package[4]. All input data, interim data, and output data are available and should also be fully reproducible from the provided code. Please see the resources section for details. Further details on our methodology are included in our paper.

Resources

Document: Purpose
Summary: A summary of this work.
Paper: The academic paper write up of this work.
Supplementary information: The supplementary information for the write up of this work.
Real-time model evaluation: A report visualising and evaluating nowcasts from the various model configurations considered here in real-time.
Real-time method evaluation: A report visualising and evaluating nowcasts from the various methods (from this project and other groups) submitted to the Germany nowcasting hub in real-time.
Project README: Overarching project README. Includes links to resources, a summary of key files, and reproducibility information.
Analysis pipeline: The targets based analysis pipeline.
Analysis archive: An archived version of the _targets directory. Download using get_targets_archive().
Data: Documentation for input data and summarised output from the analysis.
bin: Documentation for orchestration of nowcast estimation, publishing, and archiving.
News: Dated development notes.
epinowcast: The documentation for epinowcast, the R package used to implement the models evaluated here. See this case study for a simplified version of this analysis.
Germany nowcasting hub: The homepage (containing a dashboard and information) for the Germany nowcasting hub project to which nowcasts from this evaluation are submitted along with others produced by independent groups.

References

1. Höhle, M., & Heiden, M. an der. (2014). Bayesian nowcasting during the STEC O104:H4 outbreak in Germany, 2011. Biometrics, 70(4), 993–1002. https://doi.org/10.1111/biom.12194
2. Bosse, N. (2020). Scoringutils: A collection of proper scoring rules and metrics to assess predictions. https://github.com/epiforecasts/scoringutils
3. Abbott, S. (2021). Epinowcast: Hierarchical nowcasting of right censored epidemiological counts. Zenodo. https://doi.org/10.5281/zenodo.5637165
4. Landau, W. M. (2021). The targets R package: A dynamic make-like function-oriented pipeline toolkit for reproducibility and high-performance computing. Journal of Open Source Software, 6(57), 2959. https://doi.org/10.21105/joss.02959
 

Developed by Sam Abbott as a member of EpiForecasts