[Experimental] Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used in regional_epinow, epinow, and estimate_infections to adjust for truncated data. See here for an example of using this approach on Covid-19 data in England.

The model of truncation is as follows:

  1. The truncation distribution is assumed to be log normal with a mean and standard deviation that is informed by the data.

  2. The data set with the latest observations is adjusted for truncation using the truncation distribution.

  3. Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model.

This model is then fit using stan with standard normal, or half normal, prior for the mean, standard deviation and 1 over the square root of the over dispersion.

This approach assumes that:

  • Current truncation is related to past truncation.

  • Truncation is a multiplicative scaling of underlying reported cases.

  • Truncation is log normally distributed.

estimate_truncation(
  obs,
  max_truncation = 10,
  model = NULL,
  CrIs = c(0.2, 0.5, 0.9),
  verbose = TRUE,
  ...
)

Arguments

obs

A list of data frames each containing a date variable and a confirm (integer) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.

max_truncation

Integer, defaults to 10. Maximum number of days to include in the truncation distribution.

model

A compiled stan model to override the default model. May be useful for package developers or those developing extensions.

CrIs

Numeric vector of credible intervals to calculate.

verbose

Logical, should model fitting progress be returned.

...

Additional parameters to pass to rstan::sampling.

Value

A list containing: the summary parameters of the truncation distribution (dist), the estimated CMF of the truncation distribution (cmf, can be used to adjusted new data), a data frame containing the observed truncated data, latest observed data and the adjusted for truncation observations (obs), a data frame containing the last observed data (last_obs, useful for plotting and validation), the data used for fitting (data) and the fit object (fit).

Examples

# set number of cores to use
options(mc.cores = ifelse(interactive(), 4, 1))
# get example case counts
reported_cases <- example_confirmed[1:60]

# define example truncation distribution (note not integer adjusted)
trunc_dist <- list(
  mean = convert_to_logmean(3, 2),
  mean_sd = 0.1,
  sd = convert_to_logsd(3, 2),
  sd_sd = 0.1,
  max = 10
)

# apply truncation to example data
construct_truncation <- function(index, cases, dist) {
  set.seed(index)
  cmf <- cumsum(
    dlnorm(
      1:(dist$max + 1),
      rnorm(1, dist$mean, dist$mean_sd),
      rnorm(1, dist$sd, dist$sd_sd)
    )
  )
  cmf <- cmf / cmf[dist$max + 1]
  cmf <- rev(cmf)[-1]
  trunc_cases <- data.table::copy(cases)[1:(.N - index)]
  trunc_cases[(.N - length(cmf) + 1):.N, confirm := as.integer(confirm * cmf)]
  return(trunc_cases)
}
example_data <- purrr::map(c(20, 15, 10, 0),
  construct_truncation,
  cases = reported_cases,
  dist = trunc_dist
)

# fit model to example data
est <- estimate_truncation(example_data,
  verbose = interactive(),
  chains = 2, iter = 2000
)

# summary of the distribution
est$dist
#> $mean
#> [1] 0.744
#> 
#> $mean_sd
#> [1] 0.048
#> 
#> $sd
#> [1] 0.837
#> 
#> $sd_sd
#> [1] 0.057
#> 
#> $max
#> [1] 10
#> 
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#>     index      mean      se_mean           sd  lower_20  lower_50  lower_90
#>  1:    10 1.0000000 1.229709e-16 1.232606e-16 1.0000000 1.0000000 1.0000000
#>  2:     9 0.9897167 9.540969e-05 2.517766e-03 0.9852316 0.9881352 0.9892874
#>  3:     8 0.9752970 2.088653e-04 5.537674e-03 0.9656375 0.9717680 0.9742890
#>  4:     7 0.9546060 3.413565e-04 9.104161e-03 0.9388958 0.9487342 0.9528299
#>  5:     6 0.9241262 4.900833e-04 1.317354e-02 0.9018118 0.9154829 0.9213714
#>  6:     5 0.8778668 6.439108e-04 1.746685e-02 0.8489062 0.8662818 0.8738324
#>  7:     4 0.8052973 7.718258e-04 2.113911e-02 0.7714286 0.7913532 0.7998544
#>  8:     3 0.6875869 7.929954e-04 2.214976e-02 0.6528999 0.6727746 0.6815341
#>  9:     2 0.4928854 5.604574e-04 1.716848e-02 0.4666520 0.4811258 0.4877839
#> 10:     1 0.1933555 2.842250e-04 1.197321e-02 0.1743768 0.1855526 0.1901809
#>        median  upper_20  upper_50  upper_90
#>  1: 1.0000000 1.0000000 1.0000000 1.0000000
#>  2: 0.9898835 0.9904372 0.9914312 0.9935087
#>  3: 0.9755912 0.9768446 0.9790264 0.9837393
#>  4: 0.9549756 0.9570536 0.9607162 0.9685737
#>  5: 0.9245207 0.9275099 0.9328207 0.9445165
#>  6: 0.8781651 0.8820490 0.8891580 0.9053469
#>  7: 0.8053906 0.8101147 0.8184032 0.8389713
#>  8: 0.6872303 0.6926734 0.7013039 0.7232266
#>  9: 0.4919453 0.4964692 0.5032356 0.5217425
#> 10: 0.1928419 0.1955930 0.2005729 0.2136926
# observations linked to truncation adjusted estimates
print(est$obs)
#>           date confirm last_confirm report_date mean se_mean  sd lower_20
#>  1: 2020-03-24    4751         4789  2020-04-01 4800       0  12     4782
#>  2: 2020-03-25    5163         5249  2020-04-01 5293       1  30     5248
#>  3: 2020-03-26    5049         5210  2020-04-01 5289       1  50     5212
#>  4: 2020-03-27    5804         6153  2020-04-01 6281       3  89     6144
#>  5: 2020-03-28    5345         5959  2020-04-01 6091       4 121     5903
#>  6: 2020-03-29    4860         5974  2020-04-01 6039       5 158     5792
#>  7: 2020-03-30    3474         5217  2020-04-01 5057       5 161     4803
#>  8: 2020-03-31    1721         4050  2020-04-01 3495       3 120     3298
#>  9: 2020-04-01     502         4053  2020-04-01 2606       3 160     2349
#> 10: 2020-03-29    5852         5974  2020-04-06 5912       0  15     5890
#> 11: 2020-03-30    5025         5217  2020-04-06 5152       1  29     5108
#> 12: 2020-03-31    3808         4050  2020-04-06 3989       1  38     3931
#> 13: 2020-04-01    3676         4053  2020-04-06 3978       2  56     3891
#> 14: 2020-04-02    4105         4782  2020-04-06 4677       3  93     4534
#> 15: 2020-04-03    3668         4668  2020-04-06 4557       4 119     4372
#> 16: 2020-04-04    3093         4585  2020-04-06 4502       5 144     4276
#> 17: 2020-04-05    2423         4805  2020-04-06 4921       5 170     4644
#> 18: 2020-04-06    1106         4316  2020-04-06 5741       8 353     5175
#> 19: 2020-04-03    4636         4668  2020-04-11 4684       0  11     4666
#> 20: 2020-04-04    4522         4585  2020-04-11 4636       0  26     4596
#> 21: 2020-04-05    4681         4805  2020-04-11 4904       1  46     4832
#> 22: 2020-04-06    4115         4316  2020-04-11 4453       2  63     4356
#> 23: 2020-04-07    3298         3599  2020-04-11 3758       2  74     3642
#> 24: 2020-04-08    2580         3039  2020-04-11 3205       3  83     3075
#> 25: 2020-04-09    2788         3836  2020-04-11 4058       4 129     3854
#> 26: 2020-04-10    2156         4204  2020-04-11 4379       4 151     4132
#> 27: 2020-04-11     789         3951  2020-04-11 4096       5 252     3692
#> 28: 2020-04-13    4050         4050  2020-04-21 4092       0  10     4076
#> 29: 2020-04-14    3089         3089  2020-04-21 3167       0  18     3140
#> 30: 2020-04-15    2861         2861  2020-04-21 2997       1  28     2953
#> 31: 2020-04-16    2491         2491  2020-04-21 2696       1  38     2637
#> 32: 2020-04-17    3350         3350  2020-04-21 3817       2  75     3700
#> 33: 2020-04-18    2793         2793  2020-04-21 3470       3  90     3329
#> 34: 2020-04-19    2284         2284  2020-04-21 3325       3 106     3158
#> 35: 2020-04-20    1291         1291  2020-04-21 2622       2  90     2474
#> 36: 2020-04-21     302          302  2020-04-21 1567       2  96     1413
#>           date confirm last_confirm report_date mean se_mean  sd lower_20
#>     lower_50 lower_90 median upper_20 upper_50 upper_90
#>  1:     4792     4796   4799     4802     4808     4822
#>  2:     5273     5285   5292     5299     5312     5346
#>  3:     5255     5275   5287     5298     5321     5377
#>  4:     6221     6257   6277     6299     6339     6435
#>  5:     6011     6059   6086     6116     6170     6296
#>  6:     5938     5999   6034     6076     6141     6299
#>  7:     4953     5015   5055     5097     5163     5320
#>  8:     3419     3466   3498     3528     3577     3687
#>  9:     2502     2566   2603     2639     2705     2878
#> 10:     5902     5908   5911     5915     5922     5939
#> 11:     5132     5144   5150     5157     5170     5203
#> 12:     3963     3978   3987     3996     4013     4055
#> 13:     3940     3963   3976     3989     4015     4076
#> 14:     4616     4653   4674     4697     4738     4835
#> 15:     4481     4527   4554     4585     4635     4754
#> 16:     4410     4465   4500     4538     4597     4737
#> 17:     4814     4880   4925     4967     5036     5192
#> 18:     5514     5654   5735     5815     5960     6342
#> 19:     4676     4680   4683     4686     4691     4705
#> 20:     4618     4629   4635     4641     4653     4682
#> 21:     4872     4891   4901     4912     4933     4985
#> 22:     4411     4436   4450     4466     4494     4563
#> 23:     3709     3739   3755     3774     3807     3884
#> 24:     3152     3184   3203     3225     3260     3344
#> 25:     3975     4024   4056     4090     4144     4270
#> 26:     4284     4342   4382     4419     4481     4620
#> 27:     3933     4033   4091     4148     4252     4524
#> 28:     4085     4089   4091     4093     4098     4110
#> 29:     3155     3162   3166     3170     3178     3198
#> 30:     2977     2989   2995     3002     3015     3047
#> 31:     2670     2685   2694     2703     2720     2762
#> 32:     3767     3797   3814     3833     3867     3946
#> 33:     3412     3447   3467     3491     3529     3620
#> 34:     3256     3297   3323     3351     3394     3498
#> 35:     2565     2600   2624     2646     2683     2766
#> 36:     1505     1544   1566     1587     1627     1731
#>     lower_50 lower_90 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)