This function uses a non-parametric approach to reconstruct cases by date of infection from reported cases. It can optionally then estimate the time-varying reproduction number and the rate of growth.

estimate_infections(
  reported_cases,
  model,
  samples = 1000,
  stan_args,
  method = "exact",
  family = "negbin",
  generation_time,
  delays,
  horizon = 7,
  gp = list(basis_prop = 0.3, boundary_scale = 2, lengthscale_mean = 0, lengthscale_sd
    = 2),
  rt_prior = list(mean = 1, sd = 1),
  estimate_rt = TRUE,
  estimate_week_eff = TRUE,
  estimate_breakpoints = FALSE,
  stationary = FALSE,
  fixed = FALSE,
  fixed_future_rt = FALSE,
  burn_in = 0,
  prior_smoothing_window = 7,
  future = FALSE,
  max_execution_time = Inf,
  return_fit = FALSE,
  verbose = FALSE
)

Arguments

reported_cases

A data frame of confirmed cases (confirm) by date (date). confirm must be integer and date must be in date format.

model

A compiled stan model. By default uses the internal package model.

samples

Numeric, defaults to 1000. Number of samples post warmup.

stan_args

A list of stan arguments to be passed to rstan::sampling or rstan::vb (when using the "exact" or "approximate" method). For method = approximate an additional argument trials indicates the number of attempts to make using variational inference before returning an error (as stochastic failure is possible). The default for this is 5.

method

A character string defaults to "exact". Also accepts "approximate". Indicates the fitting method to be used this can either be "exact" (NUTs sampling) or "approximate" (variational inference). The exact approach returns samples from the posterior whilst the approximate method returns approximate samples. The approximate method is likely to return results several order of magnitudes faster than the exact method.

family

A character string indicating the reporting model to use. Defaults to negative binomial ("negbin") with poisson ("poisson") also supported.

generation_time

A list containing the mean, standard deviation of the mean (mean_sd), standard deviation (sd), standard deviation of the standard deviation and the maximum allowed value for the generation time (assuming a gamma distribution).

delays

A list of delays (i.e incubation period/reporting delay) between infection and report. Each list entry must also be a list containing the mean, standard deviation of the mean (mean_sd), standard deviation (sd), standard deviation of the standard deviation and the maximum allowed value for the that delay (assuming a lognormal distribution with all parameters excepting the max allowed value on the log scale).

horizon

Numeric, defaults to 7. Number of days into the future to forecast.

gp

List controlling the Gaussian process approximation. Must contain the basis_prop (number of basis functions based on scaling the time points) which defaults to 0.3 and must be between 0 and 1 (increasing this increases the accuracy of the approximation and the cost of additional compute. Must also contain the boundary_scale (multiplied by half the range of the input time series). Increasing this increases the accuracy of the approximation at the cost of additional compute. See here: https://arxiv.org/abs/2004.11408 for more information on setting these parameters. Can optionally also contain the lengthscale_mean and lengthscale_sd. If these are specified this will override the defaults of 0 and 2 (normal distributed truncated at zero).

rt_prior

A list contain the mean and standard deviation (sd) of the gamma distributed prior for Rt. By default this is assumed to be mean 1 with a standard deviation of 1.

estimate_rt

Logical, defaults TRUE. Should Rt be estimated when imputing infections.

estimate_week_eff

Logical, defaults TRUE. Should weekly reporting effects be estimated.

estimate_breakpoints

Logical, defaults to FALSE. Should breakpoints in Rt be estimated. If true then reported_cases must contain a breakpoint variable that is 1 on the dates with breakpoints and otherwise 0. Breakpoints are fit jointly with a global non-parametric effect and so represent a conservative estimate of breakpoint changes.

stationary

Logical, defaults to FALSE. Should Rt be estimated with a global mean. When estimating Rt this should substantially improve run times but will revert to the global average for real time and forecasted estimates. This setting is most appropriate when estimating historic Rt or when combined with breakpoints.

fixed

Logical, defaults to FALSE. If TRUE then a Gaussian process is not used and Rt is assumed to be constant over time (apart from any manually included breakpoints). If estimate_rt is FALSE then this reduces the backcalculation to a simple mean shift. This option can be used to produce a null model estimate, to produce a single Rt estimate for a short timeseries or as part of a wider analysis on the impact of interventions.

fixed_future_rt

Logical, defaults to FALSE. IF TRUE then the estimated Rt from the last time point with data is used for all future time points without data.

burn_in

Numeric, defaults to 0. The number of initial estimates to discard. This argument may be used to reduce spurious findings when running estimate_infections on a partial timeseries (as the earliest estimates will not be informed by all cases that occurred only those supplied to estimate_infections). The combined delays used will inform the appropriate length of this burn in but 7 days is likely a sensible starting point.

prior_smoothing_window

Numeric defaults to 7. The number of days over which to take a rolling average for the prior based on reported cases.

future

Logical, defaults to FALSE. Should stan chains be run in parallel using future. This allows users to have chains fail gracefully (i.e when combined with max_execution_time). Should be combined with a call to future::plan

max_execution_time

Numeric, defaults to Inf. If set will kill off processing of each chain if not finished within the specified timeout. When more than 2 chains finish successfully estimates will still be returned. If less than 2 chains return within the allowed time then estimation will fail with an informative error.

return_fit

Logical, defaults to FALSE. Should the fitted stan model be returned.

verbose

Logical, defaults to FALSE. Should verbose progress messages be printed.

Examples

# \donttest{ # Get example case counts reported_cases <- EpiNow2::example_confirmed[1:50] # Add a dummy breakpoint (used only when optionally estimating breakpoints) reported_cases <- reported_cases[, breakpoint := data.table::fifelse(date == as.Date("2020-03-16"), 1, 0)] # Set up example generation time generation_time <- get_generation_time(disease = "SARS-CoV-2", source = "ganyani") # Set delays between infection and case report incubation_period <- get_incubation_period(disease = "SARS-CoV-2", source = "lauer") reporting_delay <- list(mean = log(5), mean_sd = log(2), sd = log(2), sd_sd = log(1.5), max = 30) # Run model with default settings def <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1))) # Plot output plots <- report_plots(summarised_estimates = def$summarised, reported = reported_cases) plots$summary
# Run the model using the approximate method (variational inference) approx <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), method = "approximate") # Plot output plots <- report_plots(summarised_estimates = approx$summarised, reported = reported_cases) plots$summary
# Run the model with default settings using the future backend ## (combine with a call to future::plan to make this parallel). def_future <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1))) plots <- report_plots(summarised_estimates = def_future$summarised, reported = reported_cases) plots$summary
# Run model with Rt fixed into the future fixed_rt <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1)), fixed_future_rt = TRUE) # Plot output plots <- report_plots(summarised_estimates = fixed_rt$summarised, reported = reported_cases) plots$summary
# Run the model with default settings on a later snapshot of # data (use burn_in here to remove the first week of # estimates that may be impacted by this most). snapshot_cases <- EpiNow2::example_confirmed[80:130] snapshot <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1)), burn_in = 7) # Plot output plots <- report_plots(summarised_estimates = snapshot$summarised, reported = snapshot_cases) plots$summary
## Run model with stationary Rt assumption (likely to provide biased real-time estimates) stat <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1)), stationary = TRUE) # Plot output plots <- report_plots(summarised_estimates = stat$summarised, reported = reported_cases) plots$summary
# Run model with fixed Rt assumption fixed <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1)), fixed = TRUE) # Plot output plots <- report_plots(summarised_estimates = fixed$summarised, reported = reported_cases) plots$summary
# Run model with no delays no_delay <- estimate_infections(reported_cases, generation_time = generation_time, stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1))) # Plot output plots <- report_plots(summarised_estimates = no_delay$summarised, reported = reported_cases) plots$summary
# Run model with breakpoints bkp <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1)), estimate_breakpoints = TRUE) # Plot output plots <- report_plots(summarised_estimates = bkp$summarised, reported = reported_cases) plots$summary
# Run model with breakpoints but with constrained non-linear change over time # This formulation may increase the apparent effect of the breakpoint but needs to be tested using # model fit criteria (i.e LFO). cbkp <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), gp = list(basis_prop = 0.3, boundary_scale = 2, lengthscale_mean = 20, lengthscale_sd = 1), stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1)), estimate_breakpoints = TRUE) # Plot output plots <- report_plots(summarised_estimates = cbkp$summarised, reported = reported_cases) plots$summary
# Pull out breakpoint summary cbkp$summarised[variable == "breakpoints"]
#> date variable strat type bottom top lower upper #> 1: <NA> breakpoints 1 <NA> 0.803358 1.10491 0.879774 0.9998565 #> central_lower central_upper median mean sd #> 1: 0.9333897 0.977326 0.9445308 0.9512294 0.09313243
# Run model with breakpoints but otherwise static Rt # This formulation may increase the apparent effect of the breakpoint but needs to be tested using # model fit criteria (i.e LFO). fbkp <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1)), estimate_breakpoints = TRUE, fixed = TRUE) # Plot output plots <- report_plots(summarised_estimates = fbkp$summarised, reported = reported_cases) plots$summary
# Pull out breakpoint summary fbkp$summarised[variable == "breakpoints"]
#> date variable strat type bottom top lower upper #> 1: <NA> breakpoints 1 <NA> 0.5176252 0.6102732 0.5371678 0.5731114 #> central_lower central_upper median mean sd #> 1: 0.5554257 0.5675351 0.5606624 0.562662 0.02905336
# Run model without Rt estimation (just backcalculation) backcalc <- estimate_infections(reported_cases, generation_time = generation_time, delays = list(incubation_period, reporting_delay), stan_args = list(warmup = 200, cores = ifelse(interactive(), 4, 1)), estimate_rt = FALSE) # plot just infections as report_plots does not support the backcalculation only model plot_estimates(estimate = backcalc$summarised[variable == "infections"], reported = reported_cases, ylab = "Cases")
# }