Skip to contents

[Stable] Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used in regional_epinow, epinow, and estimate_infections to adjust for truncated data. See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

  1. The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.

  2. The data set with the latest observations is adjusted for truncation using the truncation distribution.

  3. Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.

This model is then fit using stan with standard normal, or half normal, prior for the mean, standard deviation, 1 over the square root of the overdispersion and additive noise term.

This approach assumes that:

  • Current truncation is related to past truncation.

  • Truncation is a multiplicative scaling of underlying reported cases.

  • Truncation is log normally distributed.

Usage

estimate_truncation(
  obs,
  max_truncation,
  trunc_max = 10,
  trunc_dist = "lognormal",
  truncation = dist_spec(mean = 0, sd = 0, mean_sd = 1, sd_sd = 1, max = 10),
  model = NULL,
  CrIs = c(0.2, 0.5, 0.9),
  weigh_delay_priors = FALSE,
  verbose = TRUE,
  ...
)

Arguments

obs

A list of data frames each containing a date variable and a confirm (integer) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.

max_truncation

Deprecated; use truncation instead.

trunc_max

Deprecated; use truncation instead.

trunc_dist

Deprecated; use truncation instead.

truncation

A call to trunc_opts() defining the truncation of observed data. Defaults to trunc_opts(). See estimate_truncation() for an approach to estimating truncation from data.

model

A compiled stan model to override the default model. May be useful for package developers or those developing extensions.

CrIs

Numeric vector of credible intervals to calculate.

weigh_delay_priors

Logical. If TRUE, all delay distribution priors will be weighted by the number of observation data points, in doing so approximately placing an independent prior at each time step and usually preventing the posteriors from shifting. If FALSE (default), no weight will be applied, i.e. delay distributions will be treated as a single parameters.

verbose

Logical, should model fitting progress be returned.

...

Additional parameters to pass to rstan::sampling.

Value

A list containing: the summary parameters of the truncation distribution (dist), the estimated CMF of the truncation distribution (cmf, can be used to adjusted new data), a data frame containing the observed truncated data, latest observed data and the adjusted for truncation observations (obs), a data frame containing the last observed data (last_obs, useful for plotting and validation), the data used for fitting (data) and the fit object (fit).

Author

Sam Abbott

Sebastian Funk

Examples

# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# get example case counts
reported_cases <- example_confirmed[1:60]

# define example truncation distribution (note not integer adjusted)
trunc <- dist_spec(
  mean = convert_to_logmean(3, 2),
  mean_sd = 0.1,
  sd = convert_to_logsd(3, 2),
  sd_sd = 0.1,
  max = 10
)

# apply truncation to example data
construct_truncation <- function(index, cases, dist) {
  set.seed(index)
  if (dist$dist == 0) {
    dfunc <- dlnorm
  } else {
    dfunc <- dgamma
  }
  cmf <- cumsum(
    dfunc(
      1:(dist$max + 1),
      rnorm(1, dist$mean_mean, dist$mean_sd),
      rnorm(1, dist$sd_mean, dist$sd_sd)
    )
  )
  cmf <- cmf / cmf[dist$max + 1]
  cmf <- rev(cmf)[-1]
  trunc_cases <- data.table::copy(cases)[1:(.N - index)]
  trunc_cases[
    (.N - length(cmf) + 1):.N, confirm := as.integer(confirm * cmf)
  ]
  return(trunc_cases)
}
example_data <- purrr::map(c(20, 15, 10, 0),
  construct_truncation,
  cases = reported_cases,
  dist = trunc
)

# fit model to example data
est <- estimate_truncation(example_data,
  verbose = interactive(),
  chains = 2, iter = 2000
)

# summary of the distribution
est$dist
#> 
#>   Uncertain lognormal distribution with (untruncated) logmean 0.74 (SD 0.049) and logSD 0.84 (SD 0.057)
#> 
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#>     index      mean      se_mean           sd  lower_90  lower_50  lower_20
#>  1:     1 1.0000000 2.977565e-18 1.279969e-16 1.0000000 1.0000000 1.0000000
#>  2:     2 0.9897790 9.852952e-05 2.561621e-03 0.9853582 0.9881719 0.9892414
#>  3:     3 0.9754393 2.164045e-04 5.635502e-03 0.9657961 0.9718691 0.9742069
#>  4:     4 0.9548505 3.552062e-04 9.267472e-03 0.9391303 0.9488744 0.9527539
#>  5:     5 0.9244992 5.129389e-04 1.341384e-02 0.9023601 0.9156591 0.9213726
#>  6:     6 0.8783954 6.780371e-04 1.779213e-02 0.8492902 0.8662863 0.8739658
#>  7:     7 0.8059975 8.155125e-04 2.154708e-02 0.7718169 0.7912876 0.8002060
#>  8:     8 0.6884286 8.380394e-04 2.262024e-02 0.6527860 0.6729946 0.6818290
#>  9:     9 0.4937034 5.804593e-04 1.766879e-02 0.4654358 0.4817656 0.4888049
#> 10:    10 0.1937325 2.635360e-04 1.226994e-02 0.1748211 0.1853655 0.1901667
#>        median  upper_20  upper_50  upper_90
#>  1: 1.0000000 1.0000000 1.0000000 1.0000000
#>  2: 0.9898978 0.9904790 0.9915543 0.9937713
#>  3: 0.9756249 0.9769393 0.9792924 0.9842957
#>  4: 0.9550449 0.9572126 0.9610993 0.9695906
#>  5: 0.9246170 0.9278812 0.9334502 0.9461651
#>  6: 0.8781359 0.8829275 0.8900884 0.9075979
#>  7: 0.8051511 0.8111331 0.8199902 0.8419513
#>  8: 0.6873882 0.6932944 0.7027855 0.7268224
#>  9: 0.4929576 0.4975440 0.5048212 0.5239339
#> 10: 0.1932452 0.1959349 0.2011387 0.2145805
# observations linked to truncation adjusted estimates
print(est$obs)
#>           date confirm last_confirm report_date mean se_mean  sd lower_90
#>  1: 2020-03-24    4751         4789  2020-04-01 4800       0  12     4780
#>  2: 2020-03-25    5163         5249  2020-04-01 5293       1  30     5245
#>  3: 2020-03-26    5049         5210  2020-04-01 5288       1  51     5207
#>  4: 2020-03-27    5804         6153  2020-04-01 6279       3  91     6134
#>  5: 2020-03-28    5345         5959  2020-04-01 6087       4 123     5889
#>  6: 2020-03-29    4860         5974  2020-04-01 6034       6 161     5772
#>  7: 2020-03-30    3474         5217  2020-04-01 5051       6 165     4779
#>  8: 2020-03-31    1721         4050  2020-04-01 3490       4 124     3284
#>  9: 2020-04-01     502         4053  2020-04-01 2601       3 163     2339
#> 10: 2020-03-29    5852         5974  2020-04-06 5912       0  15     5888
#> 11: 2020-03-30    5025         5217  2020-04-06 5151       1  29     5105
#> 12: 2020-03-31    3808         4050  2020-04-06 3988       1  38     3927
#> 13: 2020-04-01    3676         4053  2020-04-06 3977       2  57     3885
#> 14: 2020-04-02    4105         4782  2020-04-06 4675       3  94     4522
#> 15: 2020-04-03    3668         4668  2020-04-06 4554       4 121     4356
#> 16: 2020-04-04    3093         4585  2020-04-06 4497       5 147     4255
#> 17: 2020-04-05    2423         4805  2020-04-06 4914       5 175     4624
#> 18: 2020-04-06    1106         4316  2020-04-06 5731       7 359     5154
#> 19: 2020-04-03    4636         4668  2020-04-11 4683       0  12     4665
#> 20: 2020-04-04    4522         4585  2020-04-11 4636       1  26     4594
#> 21: 2020-04-05    4681         4805  2020-04-11 4902       1  47     4827
#> 22: 2020-04-06    4115         4316  2020-04-11 4451       2  64     4349
#> 23: 2020-04-07    3298         3599  2020-04-11 3756       2  76     3633
#> 24: 2020-04-08    2580         3039  2020-04-11 3203       3  85     3064
#> 25: 2020-04-09    2788         3836  2020-04-11 4054       4 133     3835
#> 26: 2020-04-10    2156         4204  2020-04-11 4372       5 156     4115
#> 27: 2020-04-11     789         3951  2020-04-11 4088       5 256     3676
#> 28: 2020-04-13    4050         4050  2020-04-21 4091       0  10     4075
#> 29: 2020-04-14    3089         3089  2020-04-21 3166       0  18     3138
#> 30: 2020-04-15    2861         2861  2020-04-21 2996       1  29     2950
#> 31: 2020-04-16    2491         2491  2020-04-21 2695       1  39     2632
#> 32: 2020-04-17    3350         3350  2020-04-21 3815       2  77     3691
#> 33: 2020-04-18    2793         2793  2020-04-21 3467       3  92     3317
#> 34: 2020-04-19    2284         2284  2020-04-21 3321       4 109     3142
#> 35: 2020-04-20    1291         1291  2020-04-21 2618       3  93     2464
#> 36: 2020-04-21     302          302  2020-04-21 1565       2  98     1407
#>           date confirm last_confirm report_date mean se_mean  sd lower_90
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
#>  1:     4791     4796   4799     4802     4807     4821
#>  2:     5272     5284   5291     5299     5312     5345
#>  3:     5253     5274   5286     5299     5321     5376
#>  4:     6217     6255   6277     6299     6338     6432
#>  5:     6005     6053   6086     6115     6170     6293
#>  6:     5926     5991   6036     6073     6141     6296
#>  7:     4943     5010   5053     5095     5162     5321
#>  8:     3409     3458   3491     3520     3572     3697
#>  9:     2495     2562   2597     2639     2708     2871
#> 10:     5901     5908   5911     5915     5922     5938
#> 11:     5131     5143   5150     5158     5170     5202
#> 12:     3962     3978   3987     3996     4013     4054
#> 13:     3938     3961   3975     3989     4014     4073
#> 14:     4611     4649   4674     4696     4738     4833
#> 15:     4473     4522   4555     4583     4635     4752
#> 16:     4401     4461   4499     4536     4595     4738
#> 17:     4799     4869   4915     4956     5029     5205
#> 18:     5498     5644   5723     5815     5966     6326
#> 19:     4675     4680   4683     4686     4691     4704
#> 20:     4617     4628   4634     4641     4652     4682
#> 21:     4870     4890   4901     4913     4933     4984
#> 22:     4408     4434   4450     4466     4494     4560
#> 23:     3705     3735   3755     3773     3807     3883
#> 24:     3146     3180   3204     3224     3260     3342
#> 25:     3967     4021   4055     4089     4142     4270
#> 26:     4270     4333   4373     4410     4475     4632
#> 27:     3922     4026   4082     4148     4256     4513
#> 28:     4084     4088   4091     4094     4098     4110
#> 29:     3154     3161   3166     3170     3178     3198
#> 30:     2976     2988   2995     3002     3015     3046
#> 31:     2668     2684   2694     2703     2720     2760
#> 32:     3763     3794   3814     3833     3867     3944
#> 33:     3406     3443   3468     3490     3529     3618
#> 34:     3249     3294   3322     3349     3393     3498
#> 35:     2557     2594   2618     2641     2679     2773
#> 36:     1501     1541   1562     1588     1629     1727
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)


options(old_opts)