Quick start
In the following section we give an overview of the simple use case
for epinow()
and regional_epinow()
.
The first step to using the package is to load it as follows.
Reporting delays, incubation period and generation time
Distributions can be supplied in two ways. First, one can supply
delay data to estimate_delay()
, where a subsampled
bootstrapped lognormal will be fit to account for uncertainty in the
observed data without being biased by changes in incidence (see
?EpiNow2::estimate_delay()
).
Second, one can specify predetermined delays with uncertainty using
the distribution functions such as Gamma
or
Lognormal
. An arbitrary number of delay distributions are
supported in dist_spec()
with a common use case being an
incubation period followed by a reporting delay. For more information on
specifying distributions see (see
?EpiNow2::Distributions
).
For example if data on the delay between onset and infection was
available we could fit a distribution to it, using
estimate_delay()
, with appropriate uncertainty as follows
(note this is a synthetic example),
reporting_delay <- estimate_delay(
rlnorm(1000, log(2), 1),
max_value = 14, bootstraps = 1
)
If data was not available we could instead specify an informed
estimate of the likely delay using the distribution functions
Gamma
or LogNormal
. To demonstrate, we choose
a lognormal distribution with mean 2, standard deviation 1 and a maximum
of 10. This is just an example and unlikely to apply in any
particular use case.
reporting_delay <- LogNormal(mean = 2, sd = 1, max = 10)
reporting_delay
#> - lognormal distribution (max: 10):
#> meanlog:
#> 0.58
#> sdlog:
#> 0.47
For the rest of this vignette, we will use inbuilt example literature estimates for the incubation period and generation time of Covid-19 (see here for the code that generates these estimates). These distributions are unlikely to be applicable for your use case. We strongly recommend investigating what might be the best distributions to use in any given use case.
example_generation_time
#> - gamma distribution (max: 14):
#> shape:
#> - normal distribution:
#> mean:
#> 1.4
#> sd:
#> 0.48
#> rate:
#> - normal distribution:
#> mean:
#> 0.38
#> sd:
#> 0.25
example_incubation_period
#> - lognormal distribution (max: 14):
#> meanlog:
#> - normal distribution:
#> mean:
#> 1.6
#> sd:
#> 0.064
#> sdlog:
#> - normal distribution:
#> mean:
#> 0.42
#> sd:
#> 0.069
Users can also pass a non-parametric delay distribution vector using
the NonParametric
option for both the generation interval
and reporting delays. It is important to note that if doing so, both
delay distributions are 0-indexed, meaning the first element corresponds
to the probability mass at day 0 of an individual’s infection. Because
the discretised renewal equation doesn’t support mass on day 0, the
generation interval should be passed in as a 0-indexed vector with a
mass of zero on day 0.
example_non_parametric_gi <- NonParametric(pmf = c(0, 0.3, 0.5, 0.2))
example_non_parametric_delay <- NonParametric(pmf = c(0.01, 0.1, 0.5, 0.3, 0.09))
These distributions are passed to downstream functions in the same way that the parametric distributions are.
Now, to the functions.
epinow()
This function represents the core functionality of the package and includes results reporting, plotting, and optional saving. It requires a data frame of cases by date of report and the distributions defined above.
Load example case data from EpiNow2.
reported_cases <- example_confirmed[1:60]
head(reported_cases)
#> date confirm
#> <Date> <num>
#> 1: 2020-02-22 14
#> 2: 2020-02-23 62
#> 3: 2020-02-24 53
#> 4: 2020-02-25 97
#> 5: 2020-02-26 93
#> 6: 2020-02-27 78
Estimate cases by date of infection, the time-varying reproduction
number, the rate of growth, and forecast these estimates into the future
by 7 days. Summarise the posterior and return a summary table and plots
for reporting purposes. If a target_folder
is supplied
results can be internally saved (with the option to also turn off
explicit returning of results). Here we use the default model
parameterisation that prioritises real-time performance over run-time or
other considerations. For other formulations see the documentation for
estimate_infections()
.
estimates <- epinow(
data = reported_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(example_incubation_period + reporting_delay),
rt = rt_opts(prior = list(mean = 2, sd = 0.2)),
stan = stan_opts(cores = 4, control = list(adapt_delta = 0.99)),
verbose = interactive()
)
names(estimates)
#> [1] "estimates" "estimated_reported_cases"
#> [3] "summary" "plots"
#> [5] "timing"
Both summary measures and posterior samples are returned for all
parameters in an easily explored format which can be accessed using
summary
. The default is to return a summary table of
estimates for key parameters at the latest date partially supported by
data.
measure | estimate |
---|---|
New infections per day | 2235 (1258 – 4079) |
Expected change in daily reports | Likely decreasing |
Effective reproduction no. | 0.89 (0.69 – 1.1) |
Rate of growth | -0.029 (-0.1 – 0.049) |
Doubling/halving time (days) | -24 (14 – -6.9) |
Summarised parameter estimates can also easily be returned, either filtered for a single parameter or for all parameters.
head(summary(estimates, type = "parameters", params = "R"))
#> date variable strat type median mean sd lower_90
#> <Date> <char> <char> <char> <num> <num> <num> <num>
#> 1: 2020-02-22 R <NA> estimate 2.295630 2.300011 0.14226384 2.072498
#> 2: 2020-02-23 R <NA> estimate 2.254414 2.260309 0.12775375 2.058350
#> 3: 2020-02-24 R <NA> estimate 2.213859 2.218421 0.11582310 2.039173
#> 4: 2020-02-25 R <NA> estimate 2.167609 2.174463 0.10645004 2.011138
#> 5: 2020-02-26 R <NA> estimate 2.119388 2.128587 0.09939624 1.977938
#> 6: 2020-02-27 R <NA> estimate 2.072939 2.080978 0.09424020 1.942162
#> lower_50 lower_20 upper_20 upper_50 upper_90
#> <num> <num> <num> <num> <num>
#> 1: 2.199171 2.257930 2.331979 2.393862 2.536167
#> 2: 2.170840 2.220883 2.287825 2.346032 2.475402
#> 3: 2.137291 2.182338 2.242451 2.297515 2.410503
#> 4: 2.099378 2.138842 2.196990 2.246167 2.354895
#> 5: 2.058774 2.094525 2.147410 2.192443 2.298576
#> 6: 2.014358 2.047728 2.098028 2.142222 2.244473
Reported cases are returned in a separate data frame in order to streamline the reporting of forecasts and for model evaluation.
head(summary(estimates, output = "estimated_reported_cases"))
#> date type median mean sd lower_90 lower_50 lower_20
#> <Date> <char> <num> <num> <num> <num> <num> <num>
#> 1: 2020-02-22 gp_rt 75 77.7545 22.45421 46 62.00 70
#> 2: 2020-02-23 gp_rt 87 89.3255 24.15807 53 73.00 82
#> 3: 2020-02-24 gp_rt 86 88.5850 23.75963 54 72.00 81
#> 4: 2020-02-25 gp_rt 80 82.0745 22.36889 49 66.00 75
#> 5: 2020-02-26 gp_rt 81 82.8155 22.64243 50 67.75 76
#> 6: 2020-02-27 gp_rt 107 110.9325 29.73030 66 90.00 101
#> upper_20 upper_50 upper_90
#> <num> <num> <num>
#> 1: 81 91 118
#> 2: 93 104 132
#> 3: 92 103 132
#> 4: 86 96 120
#> 5: 86 96 124
#> 6: 115 129 163
A range of plots are returned (with the single summary plot shown
below). These plots can also be generated using the following
plot
method.
plot(estimates)
regional_epinow()
The regional_epinow()
function runs the
epinow()
function across multiple regions in an efficient
manner.
Define cases in multiple regions delineated by the region variable.
reported_cases <- data.table::rbindlist(list(
data.table::copy(reported_cases)[, region := "testland"],
reported_cases[, region := "realland"]
))
head(reported_cases)
#> date confirm region
#> <Date> <num> <char>
#> 1: 2020-02-22 14 testland
#> 2: 2020-02-23 62 testland
#> 3: 2020-02-24 53 testland
#> 4: 2020-02-25 97 testland
#> 5: 2020-02-26 93 testland
#> 6: 2020-02-27 78 testland
Calling regional_epinow()
runs the epinow()
on each region in turn (or in parallel depending on the settings used).
Here we switch to using a weekly random walk rather than the full
Gaussian process model giving us piecewise constant estimates by
week.
estimates <- regional_epinow(
data = reported_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(example_incubation_period + reporting_delay),
rt = rt_opts(prior = list(mean = 2, sd = 0.2), rw = 7),
gp = NULL,
stan = stan_opts(cores = 4, warmup = 250, samples = 1000)
)
#> INFO [2024-10-15 17:53:24] Producing following optional outputs: regions, summary, samples, plots, latest
#> INFO [2024-10-15 17:53:24] Reporting estimates using data up to: 2020-04-21
#> INFO [2024-10-15 17:53:24] No target directory specified so returning output
#> INFO [2024-10-15 17:53:24] Producing estimates for: testland, realland
#> INFO [2024-10-15 17:53:24] Regions excluded: none
#> INFO [2024-10-15 17:53:46] Completed estimates for: testland
#> INFO [2024-10-15 17:54:09] Completed estimates for: realland
#> INFO [2024-10-15 17:54:09] Completed regional estimates
#> INFO [2024-10-15 17:54:09] Regions with estimates: 2
#> INFO [2024-10-15 17:54:09] Regions with runtime errors: 0
#> INFO [2024-10-15 17:54:09] Producing summary
#> INFO [2024-10-15 17:54:09] No summary directory specified so returning summary output
#> INFO [2024-10-15 17:54:10] No target directory specified so returning timings
Results from each region are stored in a regional
list
with across region summary measures and plots stored in a
summary
list. All results can be set to be internally saved
by setting the target_folder
and summary_dir
arguments. Each region can be estimated in parallel using the
future package (when in most scenarios cores
should be set to 1). For routine use each MCMC chain can also be run in
parallel (with future
= TRUE) with a time out
(max_execution_time
) allowing for partial results to be
returned if a subset of chains is running longer than expected. See the
documentation for the future package for details on
nested futures.
Summary measures that are returned include a table formatted for reporting (along with raw results for further processing). Futures updated will extend the S3 methods used above to smooth access to this output.
knitr::kable(estimates$summary$summarised_results$table)
Region | New infections per day | Expected change in daily reports | Effective reproduction no. | Rate of growth | Doubling/halving time (days) |
---|---|---|---|---|---|
realland | 2107 (1035 – 4584) | Likely decreasing | 0.87 (0.62 – 1.2) | -0.037 (-0.11 – 0.049) | -19 (14 – -6.2) |
testland | 2084 (1062 – 4392) | Likely decreasing | 0.85 (0.61 – 1.2) | -0.04 (-0.11 – 0.048) | -18 (15 – -6.4) |
A range of plots are again returned (with the single summary plot shown below).
estimates$summary$summary_plot