
[Stable] Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be passed to the truncation argument in regional_epinow(), epinow(), and estimate_infections() to adjust for truncated data and propagate the uncertainty associated with data truncation into the estimates.

See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

  1. The truncation distribution is assumed to be a discretised log-normal distribution with a mean and standard deviation that are informed by the data.

  2. The data set with the latest observations is adjusted for truncation using the truncation distribution.

  3. Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.
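Steps 1 to 3 can be sketched in base R for fixed parameter values. This is a minimal, deterministic illustration, not the package's stan model. Several details are assumptions made for this example: the discretisation rule (differencing the log-normal CDF at integer days and renormalising over delays 0 to max), the premise that all snapshots start on the same date, and the helper names `discretised_lognormal()` and `reported_fraction()`. The parameter values roughly match the estimates in the Examples section, and the counts are made up.

```r
# Sketch of steps 1-3 (deterministic part only, no sampling).
# Assumption: the PMF over delays 0..max_delay is F(d + 1) - F(d) for the
# log-normal CDF F, renormalised to sum to one.
discretised_lognormal <- function(meanlog, sdlog, max_delay) {
  pmf <- plnorm(seq(1, max_delay + 1), meanlog, sdlog) -
    plnorm(seq(0, max_delay), meanlog, sdlog)
  pmf / sum(pmf)
}

# Fraction of eventual reports already observed for each date of a snapshot
# (oldest first). The snapshot's last date (delay 0) has only pmf[1] reported;
# dates at least max_delay days old are treated as complete.
reported_fraction <- function(n_dates, pmf) {
  days_before_end <- seq(n_dates - 1, 0)
  cumsum(pmf)[pmin(days_before_end, length(pmf) - 1) + 1]
}

# Illustrative parameters and toy counts, both made up for this sketch.
pmf <- discretised_lognormal(meanlog = 0.9, sdlog = 0.6, max_delay = 10)
latest <- c(10, 12, 15, 20, 22, 25, 30, 28, 26, 20, 15, 8)

# Step 2: scale the latest snapshot up to truncation-adjusted counts.
adjusted <- latest / reported_fraction(length(latest), pmf)

# Step 3: recreate an earlier snapshot covering the first 8 dates by
# re-applying truncation relative to that snapshot's own last date.
earlier_recreated <- adjusted[1:8] * reported_fraction(8, pmf)
```

In the fitted model these steps are evaluated for every posterior draw of the truncation parameters, and each recreated snapshot is compared with the snapshot that was actually observed.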

This model is then fit using stan with standard normal (or half-normal) priors for the mean, the standard deviation, one over the square root of the overdispersion, and the additive noise term.
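The observation model from the paragraph above can be sketched as a log-likelihood in base R. The argument names `sigma` and `inv_sqrt_phi` are hypothetical, the priors are left out, and this is an illustration rather than the stan code the package uses. R's `dnbinom()` with `size` and `mu` gives the mean-dispersion form of the negative binomial (variance `mu + mu^2 / size`).

```r
# Sketch of the observation model; argument names are hypothetical and this
# is not the package's stan code. Priors are omitted.
truncation_loglik <- function(observed, reconstructed, sigma, inv_sqrt_phi) {
  # Recover the overdispersion from its 1 / sqrt(phi) parameterisation.
  phi <- 1 / inv_sqrt_phi^2
  # The additive noise term keeps the mean above zero, so a date whose
  # reconstructed count is zero can still explain a positive observation.
  mu <- reconstructed + sigma
  sum(dnbinom(observed, size = phi, mu = mu, log = TRUE))
}

ll <- truncation_loglik(
  observed = c(0, 3, 5), reconstructed = c(0, 2.5, 6),
  sigma = 0.5, inv_sqrt_phi = 0.3
)
```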

This approach assumes that:

  • Current truncation is related to past truncation.

  • Truncation is a multiplicative scaling of underlying reported cases.

  • Truncation is log-normally distributed.

Usage

estimate_truncation(
  data,
  truncation = trunc_opts(LogNormal(meanlog = Normal(0, 1), sdlog = Normal(1, 1), max = 10)),
  model = NULL,
  stan = stan_opts(),
  CrIs = c(0.2, 0.5, 0.9),
  filter_leading_zeros = FALSE,
  zero_threshold = Inf,
  weigh_delay_priors = FALSE,
  verbose = TRUE,
  ...,
  obs
)

Arguments

data

A list of <data.frame>s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.
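The sketch below shows the expected input in minimal form. It assumes that "a complete vector of dates" means consecutive daily dates with no gaps. The list holds two toy snapshots of the same series, taken two days apart. `make_snapshot()` and `is_valid_snapshot()` are hypothetical helpers written for this illustration, not package functions, and the package performs its own input checks.

```r
# Hypothetical helper: one snapshot as a data.frame with `date` and `confirm`.
make_snapshot <- function(start, counts) {
  data.frame(
    date = seq(as.Date(start), by = "day", length.out = length(counts)),
    confirm = counts
  )
}

# Two toy snapshots of the same series; the second was taken two days later.
snapshots <- list(
  make_snapshot("2020-03-01", c(1, 3, 4, 2, 1)),
  make_snapshot("2020-03-01", c(1, 3, 5, 6, 4, 3, 1))
)

# Hypothetical check: required columns are present and the dates form an
# unbroken daily sequence (assumed meaning of "complete").
is_valid_snapshot <- function(df) {
  all(c("date", "confirm") %in% names(df)) &&
    inherits(df$date, "Date") &&
    is.numeric(df$confirm) &&
    all(as.numeric(diff(df$date)) == 1)
}

stopifnot(all(vapply(snapshots, is_valid_snapshot, logical(1))))
```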

truncation

A call to trunc_opts() defining the truncation distribution to be estimated, including the priors on its parameters and its maximum delay. The default (see Usage) is a log-normal distribution with a maximum of 10, a Normal(0, 1) prior on meanlog, and a Normal(1, 1) prior on sdlog.

model

A compiled stan model to override the default model. May be useful for package developers or those developing extensions.

stan

A list of stan options as generated by stan_opts(). Defaults to stan_opts(). Can be used to override data, init, and verbose settings if desired.

CrIs

Numeric vector of credible intervals to calculate.

filter_leading_zeros

Logical, defaults to FALSE. Should zeros at the start of the time series be filtered out?

zero_threshold

[Experimental] Numeric, defaults to Inf. Indicates whether detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold, the zero is replaced using fill.
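The two zero-handling options can be sketched in base R as follows. Both functions are hypothetical illustrations, not the package's implementation. The trailing seven-day window (including the current day) is an assumption. The sketch also stops at flagging the zeros that would be replaced and does not reproduce the fill step.

```r
# Hypothetical sketch of filter_leading_zeros: drop rows before the first
# positive count.
drop_leading_zeros <- function(df) {
  first_nonzero <- which(df$confirm > 0)[1]
  if (is.na(first_nonzero)) return(df[0, ])
  df[first_nonzero:nrow(df), ]
}

# Hypothetical sketch of zero_threshold: flag zeros whose trailing 7-day
# average (assumed window, current day included) exceeds the threshold.
flag_suspect_zeros <- function(confirm, zero_threshold) {
  avg7 <- as.numeric(stats::filter(confirm, rep(1 / 7, 7), sides = 1))
  confirm == 0 & !is.na(avg7) & avg7 > zero_threshold
}

confirm <- c(0, 0, 5, 20, 25, 30, 28, 0, 27, 31)
df <- data.frame(
  date = seq(as.Date("2020-03-01"), by = "day", length.out = length(confirm)),
  confirm = confirm
)
trimmed <- drop_leading_zeros(df)            # starts at the first 5
suspect <- flag_suspect_zeros(confirm, 10)   # only the zero on day 8 is flagged
```

With the default zero_threshold = Inf no zero is ever flagged, which is consistent with the option being inactive unless a finite threshold is supplied.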

weigh_delay_priors

Deprecated; use the weight_prior option in trunc_opts() instead.

verbose

Logical. Should model fitting progress be returned?

...

Additional parameters to pass to rstan::sampling().

obs

Deprecated; use data instead.

Value

An <estimate_truncation> object containing:

  • observations: The input data (list of <data.frame>s).

  • args: A list of arguments used for fitting (stan data).

  • fit: The stan fit object.

Use get_delays() to extract the estimated truncation distribution as a <dist_spec>, which can be passed to the truncation argument of epinow(), regional_epinow(), and estimate_infections().

Use get_predictions() to extract truncation-adjusted estimates that can be compared to the observed data.

S3 methods available: summary.estimate_truncation(), plot.estimate_truncation(), get_samples.estimate_truncation(), get_delays(), get_predictions.estimate_truncation().

Examples

# \donttest{
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# fit model to example data
# See ?example_truncated for more details
est <- estimate_truncation(example_truncated,
  verbose = interactive(),
  chains = 2, iter = 2000
)

# extract the estimated truncation distribution
get_delay(est, "truncation")
#> - lognormal distribution (max: 10):
#>   meanlog:
#>     - normal distribution:
#>       mean:
#>         0.9
#>       sd:
#>         0.005
#>   sdlog:
#>     - normal distribution:
#>       mean:
#>         0.6
#>       sd:
#>         0.007
# summarise the truncation distribution parameters
summary(est)
#> Truncation distribution: lognormal (max: 10) 
#> 
#> Parameter estimates:
#>    variable    median      mean          sd  lower_90  lower_50  lower_20
#>      <char>     <num>     <num>       <num>     <num>     <num>     <num>
#> 1:  meanlog 0.9006507 0.9006794 0.005389990 0.8918413 0.8969591 0.8992442
#> 2:    sdlog 0.5995325 0.5995243 0.006772229 0.5883754 0.5949558 0.5978502
#>     upper_20  upper_50  upper_90
#>        <num>     <num>     <num>
#> 1: 0.9021545 0.9043472 0.9094739
#> 2: 0.6012994 0.6041179 0.6106544
# validation plot of observations vs estimates
plot(est)
#> Ignoring unknown labels:
#>  fill : "Type"


# Pass the truncation distribution to `epinow()`.
# Note, we're using the last snapshot as the observed data as it contains
# all the previous snapshots. Also, we're using the default options for
# illustrative purposes only.
out <- epinow(
  data = example_truncated[[5]],
  generation_time = generation_time_opts(example_generation_time),
  truncation = trunc_opts(get_delay(est, "truncation"))
)
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2 logs to the console and:
#> /tmp/RtmplTgdXf/regional-epinow/2020-04-21.log.
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2.epinow logs to the console and:
#> /tmp/RtmplTgdXf/epinow/2020-04-21.log.
plot(out)

options(old_opts)
# }