Examples: estimate_infections()
Source:vignettes/estimate_infections_options.Rmd
estimate_infections_options.Rmd
The estimate_infections()
function encodes a range of
different model options. In this vignette we apply some of these to the
example data provided with the EpiNow2 package, highlighting
differences in inference results and run times. It is not meant as a
comprehensive exploration of all the functionality in the package, but
intended to give users a flavour of the kind of model options that exist
for reproduction number estimation and forecasting within the package,
and the differences in computational speed between them. For
mathematical detail on the model please consult the model definition vignette, and for a
more general description of the use of the function, the estimate_infections
workflow vignette.
Set up
We first load the EpiNow2 package and also the rstan package that we will use later to show the differences in run times between different model options.
library("EpiNow2")
library("rstan")
#> Loading required package: StanHeaders
#>
#> rstan version 2.32.6 (Stan version 2.32.2)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores()).
#> To avoid recompilation of unchanged Stan programs, we recommend calling
#> rstan_options(auto_write = TRUE)
#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
#> change `threads_per_chain` option:
#> rstan_options(threads_per_chain = 1)
In this examples we set the number of cores to use to 4 but the optimal value here will depend on the computing resources available.
options(mc.cores = 4)
Data
We will use an example data set that is included in the package, representing an outbreak of COVID-19 with an initial rapid increase followed by decreasing incidence.
library("ggplot2")
reported_cases <- example_confirmed[1:60]
ggplot(reported_cases, aes(x = date, y = confirm)) +
geom_col() +
theme_minimal() +
xlab("Date") +
ylab("Cases")
Parameters
Before running the model we need to decide on some parameter values, in particular any delays, the generation time, and a prior on the initial reproduction number.
Delays: incubation period and reporting delay
Delays reflect the time that passes between infection and reporting, if these exist. In this example, we assume two delays, an incubation period (i.e. delay from infection to symptom onset) and a reporting delay (i.e. the delay from symptom onset to being recorded as a symptomatic case). These delays are usually not the same for everyone and are instead characterised by a distribution. For the incubation period we use an example from the literature that is included in the package.
example_incubation_period
#> - lognormal distribution (max: 14):
#> meanlog:
#> - normal distribution:
#> mean:
#> 1.6
#> sd:
#> 0.064
#> sdlog:
#> - normal distribution:
#> mean:
#> 0.42
#> sd:
#> 0.069
For the reporting delay, we use a lognormal distribution with mean of
2 days and standard deviation of 1 day. Note that the mean and standard
deviation must be converted to the log scale, which can be done using
the convert_log_logmean()
function.
reporting_delay <- LogNormal(mean = 2, sd = 1, max = 10)
reporting_delay
#> - lognormal distribution (max: 10):
#> meanlog:
#> 0.58
#> sdlog:
#> 0.47
EpiNow2 provides a feature that allows us to combine these delays into one by summing them up
delay <- example_incubation_period + reporting_delay
delay
#> Composite distribution:
#> - lognormal distribution (max: 14):
#> meanlog:
#> - normal distribution:
#> mean:
#> 1.6
#> sd:
#> 0.064
#> sdlog:
#> - normal distribution:
#> mean:
#> 0.42
#> sd:
#> 0.069
#> - lognormal distribution (max: 10):
#> meanlog:
#> 0.58
#> sdlog:
#> 0.47
Generation time
If we want to estimate the reproduction number we need to provide a distribution of generation times. Here again we use an example from the literature that is included with the package.
example_generation_time
#> - gamma distribution (max: 14):
#> shape:
#> - normal distribution:
#> mean:
#> 1.4
#> sd:
#> 0.48
#> rate:
#> - normal distribution:
#> mean:
#> 0.38
#> sd:
#> 0.25
Initial reproduction number
Lastly we need to choose a prior for the initial value of the reproduction number. This is assumed by the model to be normally distributed and we can set the mean and the standard deviation. We decide to set the mean to 2 and the standard deviation to 1.
rt_prior <- list(mean = 2, sd = 0.1)
Running the model
We are now ready to run the model and will in the following show a number of possible options for doing so.
Default options
By default the model uses a renewal equation for infections and a Gaussian Process prior for the reproduction number. Putting all the data and parameters together and tweaking the Gaussian Process to have a shorter length scale prior than the default we run.
def <- estimate_infections(reported_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(delay),
rt = rt_opts(prior = rt_prior)
)
#> Warning: There were 11 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
# summarise results
summary(def)
#> measure estimate
#> <char> <char>
#> 1: New infections per day 2237 (956 -- 4859)
#> 2: Expected change in daily reports Likely decreasing
#> 3: Effective reproduction no. 0.88 (0.58 -- 1.3)
#> 4: Rate of growth -0.034 (-0.16 -- 0.085)
#> 5: Doubling/halving time (days) -21 (8.2 -- -4.4)
# elapsed time (in seconds)
get_elapsed_time(def$fit)
#> warmup sample
#> chain:1 22.546 17.871
#> chain:2 43.610 35.665
#> chain:3 21.929 18.197
#> chain:4 18.185 17.063
# summary plot
plot(def)
Reducing the accuracy of the approximate Gaussian Process
To speed up the calculation of the Gaussian Process we could decrease its accuracy, e.g. decrease the proportion of time points to use as basis functions from the default of 0.2 to 0.1.
agp <- estimate_infections(reported_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(delay),
rt = rt_opts(prior = rt_prior),
gp = gp_opts(basis_prop = 0.1)
)
#> Warning: There were 19 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
# summarise results
summary(agp)
#> measure estimate
#> <char> <char>
#> 1: New infections per day 2229 (1060 -- 4985)
#> 2: Expected change in daily reports Likely decreasing
#> 3: Effective reproduction no. 0.88 (0.61 -- 1.2)
#> 4: Rate of growth -0.036 (-0.15 -- 0.076)
#> 5: Doubling/halving time (days) -19 (9.1 -- -4.8)
# elapsed time (in seconds)
get_elapsed_time(agp$fit)
#> warmup sample
#> chain:1 14.264 17.067
#> chain:2 14.501 16.719
#> chain:3 17.512 24.243
#> chain:4 16.937 17.436
# summary plot
plot(agp)
Adjusting for future susceptible depletion
We might want to adjust for future susceptible depletion. Here, we do so by setting the population to 1000000 and projecting the reproduction number from the latest estimate (rather than the default, which fixes the reproduction number to an earlier time point based on the given reporting delays). Note that this only affects the forecasts and is done using a crude adjustment (see the model definition).
dep <- estimate_infections(reported_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(delay),
rt = rt_opts(
prior = rt_prior,
pop = 1000000, future = "latest"
)
)
#> Warning: There were 9 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
# summarise results
summary(dep)
#> measure estimate
#> <char> <char>
#> 1: New infections per day 2219 (934 -- 4735)
#> 2: Expected change in daily reports Likely decreasing
#> 3: Effective reproduction no. 0.88 (0.57 -- 1.2)
#> 4: Rate of growth -0.034 (-0.16 -- 0.076)
#> 5: Doubling/halving time (days) -20 (9.1 -- -4.3)
# elapsed time (in seconds)
get_elapsed_time(dep$fit)
#> warmup sample
#> chain:1 24.086 18.080
#> chain:2 25.069 31.131
#> chain:3 24.617 25.153
#> chain:4 21.874 18.766
# summary plot
plot(dep)
Adjusting for truncation of the most recent data
We might further want to adjust for right-truncation of recent data estimated using the estimate_truncation model. Here, instead of doing so we assume that we know about truncation with mean of 1/2 day, sd 1/2 day, following a lognormal distribution and with a maximum of three days.
trunc_dist <- LogNormal(
mean = Normal(0.5, 0.1),
sd = Normal(0.5, 0.1),
max = 3
)
#> Warning in new_dist_spec(params, "lognormal"): Uncertain lognormal distribution
#> specified in terms of parameters that are not the "natural" parameters of the
#> distribution (meanlog, sdlog). Converting using a crude and very approximate
#> method that is likely to produce biased results. If possible, it is preferable
#> to specify the distribution directly in terms of the natural parameters.
trunc_dist
#> - lognormal distribution (max: 3):
#> meanlog:
#> - normal distribution:
#> mean:
#> -1
#> sd:
#> 0.14
#> sdlog:
#> - normal distribution:
#> mean:
#> 0.83
#> sd:
#> 0.13
We can then use this in the esimtate_infections()
function using the truncation
option.
trunc <- estimate_infections(reported_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(delay),
truncation = trunc_opts(trunc_dist),
rt = rt_opts(prior = rt_prior)
)
#> Error in vapply(delays[parametric], attr, "weight_prior", FUN.VALUE = logical(1)): values must be length 1,
#> but FUN(X[[3]]) result is length 0
# summarise results
summary(trunc)
#> Error in object[[i]]: object of type 'builtin' is not subsettable
# elapsed time (in seconds)
get_elapsed_time(trunc$fit)
#> Error in (function (cond) : error in evaluating the argument 'object' in selecting a method for function 'get_elapsed_time': object of type 'builtin' is not subsettable
# summary plot
plot(trunc)
Projecting the reproduction number with the Gaussian Process
Instead of keeping the reproduction number fixed from a certain time point we might want to extrapolate the Gaussian Process into the future. This will lead to wider uncertainty, and the researcher should check whether this or fixing the reproduction number from an earlier is desirable.
project_rt <- estimate_infections(reported_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(delay),
rt = rt_opts(
prior = rt_prior, future = "project"
)
)
#> Warning: There were 8 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
# summarise results
summary(project_rt)
#> measure estimate
#> <char> <char>
#> 1: New infections per day 2200 (1010 -- 4815)
#> 2: Expected change in daily reports Likely decreasing
#> 3: Effective reproduction no. 0.87 (0.59 -- 1.2)
#> 4: Rate of growth -0.036 (-0.15 -- 0.08)
#> 5: Doubling/halving time (days) -19 (8.6 -- -4.5)
# elapsed time (in seconds)
get_elapsed_time(project_rt$fit)
#> warmup sample
#> chain:1 23.881 34.602
#> chain:2 18.612 18.302
#> chain:3 26.161 28.837
#> chain:4 21.156 29.987
# summary plot
plot(project_rt)
Fixed reproduction number
We might want to estimate a fixed reproduction number, i.e. assume that it does not change.
fixed <- estimate_infections(reported_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(delay),
gp = NULL
)
# summarise results
summary(fixed)
#> measure estimate
#> <char> <char>
#> 1: New infections per day 16780 (9407 -- 30551)
#> 2: Expected change in daily reports Increasing
#> 3: Effective reproduction no. 1.2 (1.1 -- 1.3)
#> 4: Rate of growth 0.05 (0.035 -- 0.065)
#> 5: Doubling/halving time (days) 14 (11 -- 20)
# elapsed time (in seconds)
get_elapsed_time(fixed$fit)
#> warmup sample
#> chain:1 1.270 0.951
#> chain:2 1.109 0.804
#> chain:3 1.151 0.893
#> chain:4 1.242 0.926
# summary plot
plot(fixed)
Breakpoints
Instead of assuming the reproduction number varies freely or is
fixed, we can assume that it is fixed but with breakpoints. This can be
done by adding a breakpoint
column to the reported case
data set. e.g. if we think that the reproduction number was constant but
would like to allow it to change on the 16th of March 2020 we would
define a new case data set using
bp_cases <- data.table::copy(reported_cases)
bp_cases <- bp_cases[,
breakpoint := ifelse(date == as.Date("2020-03-16"), 1, 0)
]
We then use this instead of reported_cases
in the
estimate_infections()
function:
bkp <- estimate_infections(bp_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(delay),
rt = rt_opts(prior = rt_prior),
gp = NULL
)
# summarise results
summary(bkp)
#> measure estimate
#> <char> <char>
#> 1: New infections per day 2356 (1932 -- 2898)
#> 2: Expected change in daily reports Decreasing
#> 3: Effective reproduction no. 0.89 (0.86 -- 0.92)
#> 4: Rate of growth -0.027 (-0.034 -- -0.02)
#> 5: Doubling/halving time (days) -25 (-35 -- -20)
# elapsed time (in seconds)
get_elapsed_time(bkp$fit)
#> warmup sample
#> chain:1 1.819 2.313
#> chain:2 1.927 2.225
#> chain:3 2.038 3.406
#> chain:4 2.024 2.143
# summary plot
plot(bkp)
Weekly random walk
Instead of a smooth Gaussian Process we might want the reproduction
number to change step-wise, e.g. every week. This can be achieved using
the rw
option which defines the length of the time step in
a random walk that the reproduction number is assumed to follow.
rw <- estimate_infections(reported_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(delay),
rt = rt_opts(prior = rt_prior, rw = 7),
gp = NULL
)
# summarise results
summary(rw)
#> measure estimate
#> <char> <char>
#> 1: New infections per day 2136 (1056 -- 4315)
#> 2: Expected change in daily reports Likely decreasing
#> 3: Effective reproduction no. 0.87 (0.63 -- 1.2)
#> 4: Rate of growth -0.035 (-0.11 -- 0.044)
#> 5: Doubling/halving time (days) -20 (16 -- -6.3)
# elapsed time (in seconds)
get_elapsed_time(rw$fit)
#> warmup sample
#> chain:1 4.182 6.196
#> chain:2 4.202 6.631
#> chain:3 4.999 7.652
#> chain:4 4.127 4.635
# summary plot
plot(rw)
No delays
Whilst EpiNow2 allows the user to specify delays, it can also run directly on the data as does e.g. the EpiEstim package.
no_delay <- estimate_infections(
reported_cases,
generation_time = gt_opts(example_generation_time)
)
#> Warning: There were 32 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
# summarise results
summary(no_delay)
#> measure estimate
#> <char> <char>
#> 1: New infections per day 2759 (2325 -- 3311)
#> 2: Expected change in daily reports Decreasing
#> 3: Effective reproduction no. 0.87 (0.76 -- 0.98)
#> 4: Rate of growth -0.042 (-0.098 -- 0.0045)
#> 5: Doubling/halving time (days) -17 (150 -- -7.1)
# elapsed time (in seconds)
get_elapsed_time(no_delay$fit)
#> warmup sample
#> chain:1 34.378 41.249
#> chain:2 24.455 27.261
#> chain:3 28.171 43.757
#> chain:4 32.226 33.559
# summary plot
plot(no_delay)
Non-parametric infection model
The package also includes a non-parametric infection model. This runs much faster but does not use the renewal equation to generate infections. Because of this none of the options defining the behaviour of the reproduction number are available in this case, limiting user choice and model generality. It also means that the model is questionable for forecasting, which is why were here set the predictive horizon to 0.
non_parametric <- estimate_infections(reported_cases,
generation_time = gt_opts(example_generation_time),
delays = delay_opts(delay),
rt = NULL,
backcalc = backcalc_opts(),
horizon = 0
)
# summarise results
summary(non_parametric)
#> measure estimate
#> <char> <char>
#> 1: New infections per day 2539 (2066 -- 3083)
#> 2: Expected change in daily reports Decreasing
#> 3: Effective reproduction no. 0.92 (0.8 -- 0.99)
#> 4: Rate of growth -0.023 (-0.044 -- -0.0015)
#> 5: Doubling/halving time (days) -30 (-460 -- -16)
# elapsed time (in seconds)
get_elapsed_time(non_parametric$fit)
#> warmup sample
#> chain:1 1.631 0.451
#> chain:2 1.862 0.363
#> chain:3 1.969 0.454
#> chain:4 2.113 0.539
# summary plot
plot(non_parametric)