Estimates a truncation distribution from multiple snapshots of the same
data source over time. This distribution can then be used in
regional_epinow
, epinow
, and estimate_infections
to adjust for
truncated data. See
here
for an example of using this approach on Covid-19 data in England. The
functionality offered by this function is now available in a more principled
manner in the epinowcast
R package.
The model of truncation is as follows:
The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.
The data set with the latest observations is adjusted for truncation using the truncation distribution.
Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.
This model is then fit using stan
with standard normal, or half normal,
prior for the mean, standard deviation, 1 over the square root of the
overdispersion and additive noise term.
This approach assumes that:
Current truncation is related to past truncation.
Truncation is a multiplicative scaling of underlying reported cases.
Truncation is log normally distributed.
Arguments
- obs
A list of data frames each containing a date variable and a confirm (integer) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.
- max_truncation
Deprecated; use
trunc_max
instead.- trunc_max
Integer, defaults to 10. Maximum number of days to include in the truncation distribution.
- trunc_dist
Character, defaults to "lognormal". The parametric distribution to be used for truncation.
- model
A compiled stan model to override the default model. May be useful for package developers or those developing extensions.
- CrIs
Numeric vector of credible intervals to calculate.
- verbose
Logical, should model fitting progress be returned.
- ...
Additional parameters to pass to
rstan::sampling
.
Value
A list containing: the summary parameters of the truncation
distribution (dist
), the estimated CMF of the truncation distribution
(cmf
, can be used to adjusted new data), a data frame containing the
observed truncated data, latest observed data and the adjusted for
truncation observations (obs
), a data frame containing the last
observed data (last_obs
, useful for plotting and validation), the data
used for fitting (data
) and the fit object (fit
).
Examples
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))
# get example case counts
reported_cases <- example_confirmed[1:60]
# define example truncation distribution (note not integer adjusted)
trunc <- list(
mean = convert_to_logmean(3, 2),
mean_sd = 0.1,
sd = convert_to_logsd(3, 2),
sd_sd = 0.1,
max = 10
)
# apply truncation to example data
construct_truncation <- function(index, cases, dist) {
set.seed(index)
cmf <- cumsum(
dlnorm(
1:(dist$max + 1),
rnorm(1, dist$mean, dist$mean_sd),
rnorm(1, dist$sd, dist$sd_sd)
)
)
cmf <- cmf / cmf[dist$max + 1]
cmf <- rev(cmf)[-1]
trunc_cases <- data.table::copy(cases)[1:(.N - index)]
trunc_cases[(.N - length(cmf) + 1):.N, confirm := as.integer(confirm * cmf)]
return(trunc_cases)
}
example_data <- purrr::map(c(20, 15, 10, 0),
construct_truncation,
cases = reported_cases,
dist = trunc
)
# fit model to example data
est <- estimate_truncation(example_data,
verbose = interactive(),
chains = 2, iter = 2000
)
# summary of the distribution
est$dist
#> $mean
#> [1] 0.745
#>
#> $mean_sd
#> [1] 0.049
#>
#> $sd
#> [1] 0.841
#>
#> $sd_sd
#> [1] 0.058
#>
#> $max
#> [1] 10
#>
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#> index mean se_mean sd lower_90 lower_50 lower_20
#> 1: 1 1.0000000 2.323258e-18 1.016578e-16 1.0000000 1.0000000 1.0000000
#> 2: 2 0.9895688 9.306199e-05 2.598503e-03 0.9849659 0.9879570 0.9890751
#> 3: 3 0.9749787 2.037676e-04 5.702845e-03 0.9649637 0.9714167 0.9737968
#> 4: 4 0.9540969 3.332207e-04 9.351138e-03 0.9379148 0.9481680 0.9520518
#> 5: 5 0.9234165 4.790112e-04 1.348741e-02 0.9004103 0.9147455 0.9202467
#> 6: 6 0.8769760 6.296617e-04 1.781110e-02 0.8472604 0.8654910 0.8726362
#> 7: 7 0.8043144 7.520740e-04 2.144412e-02 0.7695263 0.7902834 0.7985765
#> 8: 8 0.6867509 7.661309e-04 2.231039e-02 0.6515176 0.6719414 0.6807497
#> 9: 9 0.4926862 5.222733e-04 1.709335e-02 0.4654699 0.4813277 0.4880190
#> 10: 10 0.1941655 2.602055e-04 1.199601e-02 0.1759491 0.1860716 0.1905802
#> median upper_20 upper_50 upper_90
#> 1: 1.0000000 1.0000000 1.0000000 1.0000000
#> 2: 0.9897168 0.9903550 0.9913599 0.9935947
#> 3: 0.9752392 0.9766642 0.9788681 0.9839588
#> 4: 0.9543179 0.9567695 0.9604400 0.9690790
#> 5: 0.9235168 0.9271353 0.9325527 0.9454211
#> 6: 0.8767923 0.8816532 0.8886413 0.9063534
#> 7: 0.8037239 0.8094892 0.8186491 0.8400298
#> 8: 0.6859477 0.6914729 0.7015577 0.7232533
#> 9: 0.4923436 0.4964657 0.5035816 0.5215085
#> 10: 0.1936145 0.1962295 0.2016757 0.2153953
# observations linked to truncation adjusted estimates
print(est$obs)
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> 1: 2020-03-24 4751 4789 2020-04-01 4801 0 12 4781
#> 2: 2020-03-25 5163 5249 2020-04-01 5295 1 31 5247
#> 3: 2020-03-26 5049 5210 2020-04-01 5292 1 52 5210
#> 4: 2020-03-27 5804 6153 2020-04-01 6286 3 92 6139
#> 5: 2020-03-28 5345 5959 2020-04-01 6097 4 124 5897
#> 6: 2020-03-29 4860 5974 2020-04-01 6046 5 161 5785
#> 7: 2020-03-30 3474 5217 2020-04-01 5063 5 164 4803
#> 8: 2020-03-31 1721 4050 2020-04-01 3497 3 121 3300
#> 9: 2020-04-01 502 4053 2020-04-01 2595 3 158 2330
#> 10: 2020-03-29 5852 5974 2020-04-06 5913 0 15 5889
#> 11: 2020-03-30 5025 5217 2020-04-06 5154 1 30 5106
#> 12: 2020-03-31 3808 4050 2020-04-06 3991 1 39 3929
#> 13: 2020-04-01 3676 4053 2020-04-06 3981 2 58 3888
#> 14: 2020-04-02 4105 4782 2020-04-06 4682 3 95 4529
#> 15: 2020-04-03 3668 4668 2020-04-06 4563 4 121 4366
#> 16: 2020-04-04 3093 4585 2020-04-06 4508 5 146 4276
#> 17: 2020-04-05 2423 4805 2020-04-06 4923 5 170 4646
#> 18: 2020-04-06 1106 4316 2020-04-06 5717 7 349 5134
#> 19: 2020-04-03 4636 4668 2020-04-11 4684 0 12 4665
#> 20: 2020-04-04 4522 4585 2020-04-11 4638 0 27 4595
#> 21: 2020-04-05 4681 4805 2020-04-11 4906 1 48 4830
#> 22: 2020-04-06 4115 4316 2020-04-11 4457 2 65 4352
#> 23: 2020-04-07 3298 3599 2020-04-11 3762 2 76 3638
#> 24: 2020-04-08 2580 3039 2020-04-11 3209 3 85 3071
#> 25: 2020-04-09 2788 3836 2020-04-11 4063 4 131 3854
#> 26: 2020-04-10 2156 4204 2020-04-11 4381 4 152 4134
#> 27: 2020-04-11 789 3951 2020-04-11 4078 5 249 3663
#> 28: 2020-04-13 4050 4050 2020-04-21 4092 0 10 4076
#> 29: 2020-04-14 3089 3089 2020-04-21 3168 0 18 3139
#> 30: 2020-04-15 2861 2861 2020-04-21 2998 1 29 2952
#> 31: 2020-04-16 2491 2491 2020-04-21 2698 1 39 2634
#> 32: 2020-04-17 3350 3350 2020-04-21 3821 2 77 3696
#> 33: 2020-04-18 2793 2793 2020-04-21 3474 3 92 3324
#> 34: 2020-04-19 2284 2284 2020-04-21 3329 3 108 3157
#> 35: 2020-04-20 1291 1291 2020-04-21 2623 2 91 2475
#> 36: 2020-04-21 302 302 2020-04-21 1561 2 95 1402
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> lower_50 lower_20 median upper_20 upper_50 upper_90
#> 1: 4792 4797 4800 4803 4808 4823
#> 2: 5274 5286 5294 5301 5314 5350
#> 3: 5256 5277 5290 5303 5325 5383
#> 4: 6223 6260 6284 6307 6344 6445
#> 5: 6014 6062 6096 6125 6175 6308
#> 6: 5936 6003 6046 6085 6149 6315
#> 7: 4951 5024 5064 5103 5170 5332
#> 8: 3417 3466 3495 3526 3575 3697
#> 9: 2489 2558 2592 2634 2697 2853
#> 10: 5903 5908 5912 5916 5923 5941
#> 11: 5133 5145 5152 5160 5172 5207
#> 12: 3964 3980 3990 3999 4016 4060
#> 13: 3941 3964 3980 3994 4018 4082
#> 14: 4619 4656 4681 4704 4742 4845
#> 15: 4480 4531 4563 4593 4641 4766
#> 16: 4408 4473 4509 4543 4603 4747
#> 17: 4811 4880 4921 4964 5033 5205
#> 18: 5484 5636 5712 5803 5943 6285
#> 19: 4676 4681 4684 4687 4692 4706
#> 20: 4619 4630 4636 4643 4655 4686
#> 21: 4873 4892 4905 4916 4936 4990
#> 22: 4412 4438 4455 4471 4498 4570
#> 23: 3711 3740 3761 3779 3810 3892
#> 24: 3151 3187 3210 3230 3264 3352
#> 25: 3974 4031 4064 4095 4149 4279
#> 26: 4281 4342 4379 4417 4479 4631
#> 27: 3912 4020 4075 4139 4240 4484
#> 28: 4085 4089 4092 4094 4099 4111
#> 29: 3155 3162 3167 3172 3179 3201
#> 30: 2978 2990 2997 3005 3017 3050
#> 31: 2671 2686 2697 2706 2723 2766
#> 32: 3769 3799 3820 3838 3870 3953
#> 33: 3411 3450 3475 3497 3534 3629
#> 34: 3255 3303 3329 3355 3399 3505
#> 35: 2563 2600 2622 2645 2682 2773
#> 36: 1497 1539 1559 1584 1623 1716
#> lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)
options(old_opts)