Skip to contents

[Stable] Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used in regional_epinow(), epinow(), and estimate_infections() to adjust for truncated data. See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

  1. The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.

  2. The data set with the latest observations is adjusted for truncation using the truncation distribution.

  3. Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.

This model is then fit using stan with standard normal, or half normal, prior for the mean, standard deviation, 1 over the square root of the overdispersion and additive noise term.

This approach assumes that:

  • Current truncation is related to past truncation.

  • Truncation is a multiplicative scaling of underlying reported cases.

  • Truncation is log normally distributed.

Usage

estimate_truncation(
  obs,
  max_truncation,
  trunc_max = 10,
  trunc_dist = "lognormal",
  truncation = dist_spec(mean = 0, sd = 0, mean_sd = 1, sd_sd = 1, max = 10),
  model = NULL,
  CrIs = c(0.2, 0.5, 0.9),
  weigh_delay_priors = FALSE,
  verbose = TRUE,
  ...
)

Arguments

obs

A list of <data.frame>s each containing a date variable and a confirm (integer) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.

max_truncation

Deprecated; use truncation instead.

trunc_max

Deprecated; use truncation instead.

trunc_dist

Deprecated; use truncation instead.

truncation

A call to trunc_opts() defining the truncation of observed data. Defaults to trunc_opts(). See estimate_truncation() for an approach to estimating truncation from data.

model

A compiled stan model to override the default model. May be useful for package developers or those developing extensions.

CrIs

Numeric vector of credible intervals to calculate.

weigh_delay_priors

Logical. If TRUE, all delay distribution priors will be weighted by the number of observation data points, in doing so approximately placing an independent prior at each time step and usually preventing the posteriors from shifting. If FALSE (default), no weight will be applied, i.e. delay distributions will be treated as a single parameters.

verbose

Logical, should model fitting progress be returned.

...

Additional parameters to pass to rstan::sampling().

Value

A list containing: the summary parameters of the truncation distribution (dist), the estimated CMF of the truncation distribution (cmf, can be used to adjusted new data), a <data.frame> containing the observed truncated data, latest observed data and the adjusted for truncation observations (obs), a <data.frame> containing the last observed data (last_obs, useful for plotting and validation), the data used for fitting (data) and the fit object (fit).

Author

Sam Abbott

Sebastian Funk

Examples

# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# get example case counts
reported_cases <- example_confirmed[1:60]

# define example truncation distribution (note not integer adjusted)
trunc <- dist_spec(
  mean = convert_to_logmean(3, 2),
  mean_sd = 0.1,
  sd = convert_to_logsd(3, 2),
  sd_sd = 0.1,
  max = 10
)

# apply truncation to example data
construct_truncation <- function(index, cases, dist) {
  set.seed(index)
  if (dist$dist == 0) {
    dfunc <- dlnorm
  } else {
    dfunc <- dgamma
  }
  cmf <- cumsum(
    dfunc(
      1:(dist$max + 1),
      rnorm(1, dist$mean_mean, dist$mean_sd),
      rnorm(1, dist$sd_mean, dist$sd_sd)
    )
  )
  cmf <- cmf / cmf[dist$max + 1]
  cmf <- rev(cmf)[-1]
  trunc_cases <- data.table::copy(cases)[1:(.N - index)]
  trunc_cases[
    (.N - length(cmf) + 1):.N, confirm := as.integer(confirm * cmf)
  ]
  return(trunc_cases)
}
example_data <- purrr::map(c(20, 15, 10, 0),
  construct_truncation,
  cases = reported_cases,
  dist = trunc
)

# fit model to example data
est <- estimate_truncation(example_data,
  verbose = interactive(),
  chains = 2, iter = 2000
)

# summary of the distribution
est$dist
#> 
#>   Uncertain lognormal distribution with (untruncated) logmean 0.091 (SD 0.14) and logSD 1.5 (SD 0.32)
#> 
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#>     index      mean      se_mean           sd  lower_90  lower_50  lower_20
#>  1:     1 1.0000000 2.775896e-18 1.212176e-16 1.0000000 1.0000000 1.0000000
#>  2:     2 0.9921205 8.568034e-05 2.788720e-03 0.9872529 0.9902813 0.9914826
#>  3:     3 0.9824764 1.780379e-04 5.844760e-03 0.9724098 0.9785708 0.9810700
#>  4:     4 0.9704462 2.768270e-04 9.182233e-03 0.9547612 0.9642469 0.9681107
#>  5:     5 0.9550902 3.808010e-04 1.279491e-02 0.9336809 0.9463992 0.9516969
#>  6:     6 0.9349227 4.866058e-04 1.662972e-02 0.9069603 0.9234404 0.9303761
#>  7:     7 0.9074571 5.864466e-04 2.053122e-02 0.8735577 0.8932890 0.9018734
#>  8:     8 0.8682170 6.632202e-04 2.413223e-02 0.8281059 0.8513502 0.8615467
#>  9:     9 0.8083018 6.966849e-04 2.672062e-02 0.7651308 0.7895987 0.8013815
#> 10:    10 0.7072803 6.909915e-04 2.809480e-02 0.6624083 0.6882540 0.6997003
#> 11:    11 0.5062614 1.052861e-03 4.019634e-02 0.4412611 0.4787906 0.4957663
#>        median  upper_20  upper_50  upper_90
#>  1: 1.0000000 1.0000000 1.0000000 1.0000000
#>  2: 0.9922574 0.9929528 0.9941407 0.9964270
#>  3: 0.9827461 0.9841967 0.9867302 0.9915651
#>  4: 0.9706790 0.9730265 0.9769897 0.9848748
#>  5: 0.9552511 0.9585042 0.9640421 0.9755044
#>  6: 0.9349803 0.9391924 0.9464087 0.9617624
#>  7: 0.9075028 0.9125452 0.9214741 0.9411957
#>  8: 0.8678338 0.8738377 0.8843173 0.9087683
#>  9: 0.8076227 0.8144067 0.8256789 0.8521582
#> 10: 0.7064820 0.7135470 0.7254033 0.7524153
#> 11: 0.5040530 0.5149058 0.5316911 0.5748976
# observations linked to truncation adjusted estimates
print(est$obs)
#>           date confirm last_confirm report_date mean se_mean  sd lower_90
#>  1: 2020-03-24    4764         4789  2020-04-01 5285       0  31     5237
#>  2: 2020-03-25    5193         5249  2020-04-01 5259       1  49     5182
#>  3: 2020-03-26    5104         5210  2020-04-01 6206       2  83     6075
#>  4: 2020-03-27    5927         6153  2020-04-01 5960       3 106     5792
#>  5: 2020-03-28    5571         5959  2020-04-01 5837       3 131     5625
#>  6: 2020-03-29    5295         5974  2020-04-01 4826       3 133     4607
#>  7: 2020-03-30    4187         5217  2020-04-01 3306       2 108     3133
#>  8: 2020-03-31    2670         4050  2020-04-01 2377       2  93     2231
#>  9: 2020-04-01    1679         4053  2020-04-01    0      NA   0        0
#> 10: 2020-03-29    5970         5974  2020-04-06 6076       1  36     6020
#> 11: 2020-03-30    5209         5217  2020-04-06 5368       1  50     5288
#> 12: 2020-03-31    4036         4050  2020-04-06 4226       1  56     4137
#> 13: 2020-04-01    4021         4053  2020-04-06 4302       2  76     4180
#> 14: 2020-04-02    4698         4782  2020-04-06 5179       3 117     4991
#> 15: 2020-04-03    4485         4668  2020-04-06 5169       4 143     4935
#> 16: 2020-04-04    4183         4585  2020-04-06 5180       4 170     4908
#> 17: 2020-04-05    3864         4805  2020-04-06 5471       5 216     5135
#> 18: 2020-04-06    2417         4316  2020-04-06 4804       9 383     4204
#> 19: 2020-04-03    4653         4668  2020-04-11 4736       0  28     4692
#> 20: 2020-04-04    4555         4585  2020-04-11 4694       1  44     4624
#> 21: 2020-04-05    4742         4805  2020-04-11 4965       1  66     4861
#> 22: 2020-04-06    4209         4316  2020-04-11 4503       2  80     4376
#> 23: 2020-04-07    3433         3599  2020-04-11 3785       2  85     3647
#> 24: 2020-04-08    2781         3039  2020-04-11 3205       2  88     3060
#> 25: 2020-04-09    3237         3836  2020-04-11 4009       3 131     3798
#> 26: 2020-04-10    2994         4204  2020-04-11 4239       4 167     3979
#> 27: 2020-04-11    1846         3951  2020-04-11 3669       7 292     3211
#> 28: 2020-04-13    4074         4074  2020-04-21 4146       0  24     4108
#> 29: 2020-04-14    3124         3124  2020-04-21 3219       0  30     3171
#> 30: 2020-04-15    2920         2920  2020-04-21 3057       1  41     2993
#> 31: 2020-04-16    2581         2581  2020-04-21 2761       1  49     2683
#> 32: 2020-04-17    3565         3565  2020-04-21 3930       2  88     3787
#> 33: 2020-04-18    3129         3129  2020-04-21 3606       2  99     3443
#> 34: 2020-04-19    2846         2846  2020-04-21 3524       3 115     3339
#> 35: 2020-04-20    2052         2052  2020-04-21 2905       2 114     2727
#> 36: 2020-04-21     962          962  2020-04-21 1912       3 152     1673
#>           date confirm last_confirm report_date mean se_mean  sd lower_90
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
#>  1:     5262     5276   5284     5293     5306     5340
#>  2:     5224     5245   5258     5272     5293     5345
#>  3:     6148     6183   6204     6227     6262     6347
#>  4:     5886     5931   5958     5987     6032     6142
#>  5:     5746     5802   5834     5871     5927     6061
#>  6:     4734     4791   4824     4859     4918     5056
#>  7:     3233     3278   3305     3331     3381     3489
#>  8:     2314     2353   2376     2399     2439     2534
#>  9:        0        0      0        0        0        0
#> 10:     6050     6065   6074     6085     6100     6139
#> 11:     5331     5353   5366     5380     5402     5455
#> 12:     4186     4210   4225     4240     4264     4322
#> 13:     4248     4281   4300     4321     4354     4433
#> 14:     5098     5148   5176     5209     5259     5378
#> 15:     5071     5132   5168     5205     5268     5415
#> 16:     5066     5136   5179     5219     5297     5467
#> 17:     5326     5415   5469     5522     5614     5833
#> 18:     4545     4694   4795     4875     5048     5477
#> 19:     4715     4727   4734     4742     4754     4785
#> 20:     4662     4681   4692     4705     4723     4770
#> 21:     4918     4947   4964     4982     5010     5078
#> 22:     4447     4481   4501     4523     4557     4640
#> 23:     3725     3762   3782     3806     3843     3929
#> 24:     3144     3182   3204     3227     3266     3358
#> 25:     3920     3974   4008     4039     4099     4230
#> 26:     4127     4195   4237     4278     4350     4519
#> 27:     3471     3585   3662     3723     3855     4183
#> 28:     4128     4139   4145     4152     4163     4189
#> 29:     3197     3210   3218     3226     3239     3272
#> 30:     3028     3046   3056     3068     3085     3127
#> 31:     2727     2748   2760     2774     2794     2845
#> 32:     3868     3906   3928     3952     3990     4081
#> 33:     3538     3580   3605     3631     3675     3778
#> 34:     3446     3494   3523     3551     3604     3719
#> 35:     2828     2875   2904     2932     2981     3097
#> 36:     1809     1868   1908     1940     2009     2180
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)


options(old_opts)