Estimates a truncation distribution from multiple snapshots of the same
data source over time. This distribution can then be used in
regional_epinow
, epinow
, and estimate_infections
to adjust for
truncated data. See
here
for an example of using this approach on Covid-19 data in England. The
functionality offered by this function is now available in a more principled
manner in the epinowcast
R package.
The model of truncation is as follows:
The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.
The data set with the latest observations is adjusted for truncation using the truncation distribution.
Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.
This model is then fit using stan
with standard normal, or half normal,
prior for the mean, standard deviation, 1 over the square root of the
overdispersion and additive noise term.
This approach assumes that:
Current truncation is related to past truncation.
Truncation is a multiplicative scaling of underlying reported cases.
Truncation is log normally distributed.
Arguments
- obs
A list of data frames each containing a date variable and a confirm (integer) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.
- max_truncation
Deprecated; use
truncation
instead.- trunc_max
Deprecated; use
truncation
instead.- trunc_dist
Deprecated; use
truncation
instead.- truncation
A call to
trunc_opts()
defining the truncation of observed data. Defaults totrunc_opts()
. Seeestimate_truncation()
for an approach to estimating truncation from data.- model
A compiled stan model to override the default model. May be useful for package developers or those developing extensions.
- CrIs
Numeric vector of credible intervals to calculate.
- weigh_delay_priors
Logical. If TRUE, all delay distribution priors will be weighted by the number of observation data points, in doing so approximately placing an independent prior at each time step and usually preventing the posteriors from shifting. If FALSE (default), no weight will be applied, i.e. delay distributions will be treated as a single parameters.
- verbose
Logical, should model fitting progress be returned.
- ...
Additional parameters to pass to
rstan::sampling
.
Value
A list containing: the summary parameters of the truncation
distribution (dist
), the estimated CMF of the truncation distribution
(cmf
, can be used to adjusted new data), a data frame containing the
observed truncated data, latest observed data and the adjusted for
truncation observations (obs
), a data frame containing the last
observed data (last_obs
, useful for plotting and validation), the data
used for fitting (data
) and the fit object (fit
).
Examples
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))
# get example case counts
reported_cases <- example_confirmed[1:60]
# define example truncation distribution (note not integer adjusted)
trunc <- dist_spec(
mean = convert_to_logmean(3, 2),
mean_sd = 0.1,
sd = convert_to_logsd(3, 2),
sd_sd = 0.1,
max = 10
)
# apply truncation to example data
construct_truncation <- function(index, cases, dist) {
set.seed(index)
if (dist$dist == 0) {
dfunc <- dlnorm
} else {
dfunc <- dgamma
}
cmf <- cumsum(
dfunc(
1:(dist$max + 1),
rnorm(1, dist$mean_mean, dist$mean_sd),
rnorm(1, dist$sd_mean, dist$sd_sd)
)
)
cmf <- cmf / cmf[dist$max + 1]
cmf <- rev(cmf)[-1]
trunc_cases <- data.table::copy(cases)[1:(.N - index)]
trunc_cases[
(.N - length(cmf) + 1):.N, confirm := as.integer(confirm * cmf)
]
return(trunc_cases)
}
example_data <- purrr::map(c(20, 15, 10, 0),
construct_truncation,
cases = reported_cases,
dist = trunc
)
# fit model to example data
est <- estimate_truncation(example_data,
verbose = interactive(),
chains = 2, iter = 2000
)
# summary of the distribution
est$dist
#>
#> Uncertain lognormal distribution with (untruncated) logmean 0.74 (SD 0.049) and logSD 0.84 (SD 0.057)
#>
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#> index mean se_mean sd lower_90 lower_50 lower_20
#> 1: 1 1.0000000 2.977565e-18 1.279969e-16 1.0000000 1.0000000 1.0000000
#> 2: 2 0.9897790 9.852952e-05 2.561621e-03 0.9853582 0.9881719 0.9892414
#> 3: 3 0.9754393 2.164045e-04 5.635502e-03 0.9657961 0.9718691 0.9742069
#> 4: 4 0.9548505 3.552062e-04 9.267472e-03 0.9391303 0.9488744 0.9527539
#> 5: 5 0.9244992 5.129389e-04 1.341384e-02 0.9023601 0.9156591 0.9213726
#> 6: 6 0.8783954 6.780371e-04 1.779213e-02 0.8492902 0.8662863 0.8739658
#> 7: 7 0.8059975 8.155125e-04 2.154708e-02 0.7718169 0.7912876 0.8002060
#> 8: 8 0.6884286 8.380394e-04 2.262024e-02 0.6527860 0.6729946 0.6818290
#> 9: 9 0.4937034 5.804593e-04 1.766879e-02 0.4654358 0.4817656 0.4888049
#> 10: 10 0.1937325 2.635360e-04 1.226994e-02 0.1748211 0.1853655 0.1901667
#> median upper_20 upper_50 upper_90
#> 1: 1.0000000 1.0000000 1.0000000 1.0000000
#> 2: 0.9898978 0.9904790 0.9915543 0.9937713
#> 3: 0.9756249 0.9769393 0.9792924 0.9842957
#> 4: 0.9550449 0.9572126 0.9610993 0.9695906
#> 5: 0.9246170 0.9278812 0.9334502 0.9461651
#> 6: 0.8781359 0.8829275 0.8900884 0.9075979
#> 7: 0.8051511 0.8111331 0.8199902 0.8419513
#> 8: 0.6873882 0.6932944 0.7027855 0.7268224
#> 9: 0.4929576 0.4975440 0.5048212 0.5239339
#> 10: 0.1932452 0.1959349 0.2011387 0.2145805
# observations linked to truncation adjusted estimates
print(est$obs)
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> 1: 2020-03-24 4751 4789 2020-04-01 4800 0 12 4780
#> 2: 2020-03-25 5163 5249 2020-04-01 5293 1 30 5245
#> 3: 2020-03-26 5049 5210 2020-04-01 5288 1 51 5207
#> 4: 2020-03-27 5804 6153 2020-04-01 6279 3 91 6134
#> 5: 2020-03-28 5345 5959 2020-04-01 6087 4 123 5889
#> 6: 2020-03-29 4860 5974 2020-04-01 6034 6 161 5772
#> 7: 2020-03-30 3474 5217 2020-04-01 5051 6 165 4779
#> 8: 2020-03-31 1721 4050 2020-04-01 3490 4 124 3284
#> 9: 2020-04-01 502 4053 2020-04-01 2601 3 163 2339
#> 10: 2020-03-29 5852 5974 2020-04-06 5912 0 15 5888
#> 11: 2020-03-30 5025 5217 2020-04-06 5151 1 29 5105
#> 12: 2020-03-31 3808 4050 2020-04-06 3988 1 38 3927
#> 13: 2020-04-01 3676 4053 2020-04-06 3977 2 57 3885
#> 14: 2020-04-02 4105 4782 2020-04-06 4675 3 94 4522
#> 15: 2020-04-03 3668 4668 2020-04-06 4554 4 121 4356
#> 16: 2020-04-04 3093 4585 2020-04-06 4497 5 147 4255
#> 17: 2020-04-05 2423 4805 2020-04-06 4914 5 175 4624
#> 18: 2020-04-06 1106 4316 2020-04-06 5731 7 359 5154
#> 19: 2020-04-03 4636 4668 2020-04-11 4683 0 12 4665
#> 20: 2020-04-04 4522 4585 2020-04-11 4636 1 26 4594
#> 21: 2020-04-05 4681 4805 2020-04-11 4902 1 47 4827
#> 22: 2020-04-06 4115 4316 2020-04-11 4451 2 64 4349
#> 23: 2020-04-07 3298 3599 2020-04-11 3756 2 76 3633
#> 24: 2020-04-08 2580 3039 2020-04-11 3203 3 85 3064
#> 25: 2020-04-09 2788 3836 2020-04-11 4054 4 133 3835
#> 26: 2020-04-10 2156 4204 2020-04-11 4372 5 156 4115
#> 27: 2020-04-11 789 3951 2020-04-11 4088 5 256 3676
#> 28: 2020-04-13 4050 4050 2020-04-21 4091 0 10 4075
#> 29: 2020-04-14 3089 3089 2020-04-21 3166 0 18 3138
#> 30: 2020-04-15 2861 2861 2020-04-21 2996 1 29 2950
#> 31: 2020-04-16 2491 2491 2020-04-21 2695 1 39 2632
#> 32: 2020-04-17 3350 3350 2020-04-21 3815 2 77 3691
#> 33: 2020-04-18 2793 2793 2020-04-21 3467 3 92 3317
#> 34: 2020-04-19 2284 2284 2020-04-21 3321 4 109 3142
#> 35: 2020-04-20 1291 1291 2020-04-21 2618 3 93 2464
#> 36: 2020-04-21 302 302 2020-04-21 1565 2 98 1407
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> lower_50 lower_20 median upper_20 upper_50 upper_90
#> 1: 4791 4796 4799 4802 4807 4821
#> 2: 5272 5284 5291 5299 5312 5345
#> 3: 5253 5274 5286 5299 5321 5376
#> 4: 6217 6255 6277 6299 6338 6432
#> 5: 6005 6053 6086 6115 6170 6293
#> 6: 5926 5991 6036 6073 6141 6296
#> 7: 4943 5010 5053 5095 5162 5321
#> 8: 3409 3458 3491 3520 3572 3697
#> 9: 2495 2562 2597 2639 2708 2871
#> 10: 5901 5908 5911 5915 5922 5938
#> 11: 5131 5143 5150 5158 5170 5202
#> 12: 3962 3978 3987 3996 4013 4054
#> 13: 3938 3961 3975 3989 4014 4073
#> 14: 4611 4649 4674 4696 4738 4833
#> 15: 4473 4522 4555 4583 4635 4752
#> 16: 4401 4461 4499 4536 4595 4738
#> 17: 4799 4869 4915 4956 5029 5205
#> 18: 5498 5644 5723 5815 5966 6326
#> 19: 4675 4680 4683 4686 4691 4704
#> 20: 4617 4628 4634 4641 4652 4682
#> 21: 4870 4890 4901 4913 4933 4984
#> 22: 4408 4434 4450 4466 4494 4560
#> 23: 3705 3735 3755 3773 3807 3883
#> 24: 3146 3180 3204 3224 3260 3342
#> 25: 3967 4021 4055 4089 4142 4270
#> 26: 4270 4333 4373 4410 4475 4632
#> 27: 3922 4026 4082 4148 4256 4513
#> 28: 4084 4088 4091 4094 4098 4110
#> 29: 3154 3161 3166 3170 3178 3198
#> 30: 2976 2988 2995 3002 3015 3046
#> 31: 2668 2684 2694 2703 2720 2760
#> 32: 3763 3794 3814 3833 3867 3944
#> 33: 3406 3443 3468 3490 3529 3618
#> 34: 3249 3294 3322 3349 3393 3498
#> 35: 2557 2594 2618 2641 2679 2773
#> 36: 1501 1541 1562 1588 1629 1727
#> lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)
options(old_opts)