Estimates a truncation distribution from multiple snapshots of the same
data source over time. This distribution can then be used in
regional_epinow()
, epinow()
, and estimate_infections()
to adjust for
truncated data. See
here
for an example of using this approach on Covid-19 data in England. The
functionality offered by this function is now available in a more principled
manner in the epinowcast
R package.
The model of truncation is as follows:
The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.
The data set with the latest observations is adjusted for truncation using the truncation distribution.
Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.
This model is then fit using stan
with standard normal, or half normal,
prior for the mean, standard deviation, 1 over the square root of the
overdispersion and additive noise term.
This approach assumes that:
Current truncation is related to past truncation.
Truncation is a multiplicative scaling of underlying reported cases.
Truncation is log normally distributed.
Arguments
- obs
A list of
<data.frame>
s each containing a date variable and a confirm (integer) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.- max_truncation
Deprecated; use
truncation
instead.- trunc_max
Deprecated; use
truncation
instead.- trunc_dist
Deprecated; use
truncation
instead.- truncation
A call to
trunc_opts()
defining the truncation of observed data. Defaults totrunc_opts()
. Seeestimate_truncation()
for an approach to estimating truncation from data.- model
A compiled stan model to override the default model. May be useful for package developers or those developing extensions.
- CrIs
Numeric vector of credible intervals to calculate.
- weigh_delay_priors
Logical. If TRUE, all delay distribution priors will be weighted by the number of observation data points, in doing so approximately placing an independent prior at each time step and usually preventing the posteriors from shifting. If FALSE (default), no weight will be applied, i.e. delay distributions will be treated as a single parameters.
- verbose
Logical, should model fitting progress be returned.
- ...
Additional parameters to pass to
rstan::sampling()
.
Value
A list containing: the summary parameters of the truncation
distribution (dist
), the estimated CMF of the truncation distribution
(cmf
, can be used to adjusted new data), a <data.frame>
containing the
observed truncated data, latest observed data and the adjusted for
truncation observations (obs
), a <data.frame>
containing the last
observed data (last_obs
, useful for plotting and validation), the data
used for fitting (data
) and the fit object (fit
).
Examples
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))
# get example case counts
reported_cases <- example_confirmed[1:60]
# define example truncation distribution (note not integer adjusted)
trunc <- dist_spec(
mean = convert_to_logmean(3, 2),
mean_sd = 0.1,
sd = convert_to_logsd(3, 2),
sd_sd = 0.1,
max = 10
)
# apply truncation to example data
construct_truncation <- function(index, cases, dist) {
set.seed(index)
if (dist$dist == 0) {
dfunc <- dlnorm
} else {
dfunc <- dgamma
}
cmf <- cumsum(
dfunc(
1:(dist$max + 1),
rnorm(1, dist$mean_mean, dist$mean_sd),
rnorm(1, dist$sd_mean, dist$sd_sd)
)
)
cmf <- cmf / cmf[dist$max + 1]
cmf <- rev(cmf)[-1]
trunc_cases <- data.table::copy(cases)[1:(.N - index)]
trunc_cases[
(.N - length(cmf) + 1):.N, confirm := as.integer(confirm * cmf)
]
return(trunc_cases)
}
example_data <- purrr::map(c(20, 15, 10, 0),
construct_truncation,
cases = reported_cases,
dist = trunc
)
# fit model to example data
est <- estimate_truncation(example_data,
verbose = interactive(),
chains = 2, iter = 2000
)
# summary of the distribution
est$dist
#>
#> Uncertain lognormal distribution with (untruncated) logmean 0.091 (SD 0.14) and logSD 1.5 (SD 0.32)
#>
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#> index mean se_mean sd lower_90 lower_50 lower_20
#> 1: 1 1.0000000 2.775896e-18 1.212176e-16 1.0000000 1.0000000 1.0000000
#> 2: 2 0.9921205 8.568034e-05 2.788720e-03 0.9872529 0.9902813 0.9914826
#> 3: 3 0.9824764 1.780379e-04 5.844760e-03 0.9724098 0.9785708 0.9810700
#> 4: 4 0.9704462 2.768270e-04 9.182233e-03 0.9547612 0.9642469 0.9681107
#> 5: 5 0.9550902 3.808010e-04 1.279491e-02 0.9336809 0.9463992 0.9516969
#> 6: 6 0.9349227 4.866058e-04 1.662972e-02 0.9069603 0.9234404 0.9303761
#> 7: 7 0.9074571 5.864466e-04 2.053122e-02 0.8735577 0.8932890 0.9018734
#> 8: 8 0.8682170 6.632202e-04 2.413223e-02 0.8281059 0.8513502 0.8615467
#> 9: 9 0.8083018 6.966849e-04 2.672062e-02 0.7651308 0.7895987 0.8013815
#> 10: 10 0.7072803 6.909915e-04 2.809480e-02 0.6624083 0.6882540 0.6997003
#> 11: 11 0.5062614 1.052861e-03 4.019634e-02 0.4412611 0.4787906 0.4957663
#> median upper_20 upper_50 upper_90
#> 1: 1.0000000 1.0000000 1.0000000 1.0000000
#> 2: 0.9922574 0.9929528 0.9941407 0.9964270
#> 3: 0.9827461 0.9841967 0.9867302 0.9915651
#> 4: 0.9706790 0.9730265 0.9769897 0.9848748
#> 5: 0.9552511 0.9585042 0.9640421 0.9755044
#> 6: 0.9349803 0.9391924 0.9464087 0.9617624
#> 7: 0.9075028 0.9125452 0.9214741 0.9411957
#> 8: 0.8678338 0.8738377 0.8843173 0.9087683
#> 9: 0.8076227 0.8144067 0.8256789 0.8521582
#> 10: 0.7064820 0.7135470 0.7254033 0.7524153
#> 11: 0.5040530 0.5149058 0.5316911 0.5748976
# observations linked to truncation adjusted estimates
print(est$obs)
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> 1: 2020-03-24 4764 4789 2020-04-01 5285 0 31 5237
#> 2: 2020-03-25 5193 5249 2020-04-01 5259 1 49 5182
#> 3: 2020-03-26 5104 5210 2020-04-01 6206 2 83 6075
#> 4: 2020-03-27 5927 6153 2020-04-01 5960 3 106 5792
#> 5: 2020-03-28 5571 5959 2020-04-01 5837 3 131 5625
#> 6: 2020-03-29 5295 5974 2020-04-01 4826 3 133 4607
#> 7: 2020-03-30 4187 5217 2020-04-01 3306 2 108 3133
#> 8: 2020-03-31 2670 4050 2020-04-01 2377 2 93 2231
#> 9: 2020-04-01 1679 4053 2020-04-01 0 NA 0 0
#> 10: 2020-03-29 5970 5974 2020-04-06 6076 1 36 6020
#> 11: 2020-03-30 5209 5217 2020-04-06 5368 1 50 5288
#> 12: 2020-03-31 4036 4050 2020-04-06 4226 1 56 4137
#> 13: 2020-04-01 4021 4053 2020-04-06 4302 2 76 4180
#> 14: 2020-04-02 4698 4782 2020-04-06 5179 3 117 4991
#> 15: 2020-04-03 4485 4668 2020-04-06 5169 4 143 4935
#> 16: 2020-04-04 4183 4585 2020-04-06 5180 4 170 4908
#> 17: 2020-04-05 3864 4805 2020-04-06 5471 5 216 5135
#> 18: 2020-04-06 2417 4316 2020-04-06 4804 9 383 4204
#> 19: 2020-04-03 4653 4668 2020-04-11 4736 0 28 4692
#> 20: 2020-04-04 4555 4585 2020-04-11 4694 1 44 4624
#> 21: 2020-04-05 4742 4805 2020-04-11 4965 1 66 4861
#> 22: 2020-04-06 4209 4316 2020-04-11 4503 2 80 4376
#> 23: 2020-04-07 3433 3599 2020-04-11 3785 2 85 3647
#> 24: 2020-04-08 2781 3039 2020-04-11 3205 2 88 3060
#> 25: 2020-04-09 3237 3836 2020-04-11 4009 3 131 3798
#> 26: 2020-04-10 2994 4204 2020-04-11 4239 4 167 3979
#> 27: 2020-04-11 1846 3951 2020-04-11 3669 7 292 3211
#> 28: 2020-04-13 4074 4074 2020-04-21 4146 0 24 4108
#> 29: 2020-04-14 3124 3124 2020-04-21 3219 0 30 3171
#> 30: 2020-04-15 2920 2920 2020-04-21 3057 1 41 2993
#> 31: 2020-04-16 2581 2581 2020-04-21 2761 1 49 2683
#> 32: 2020-04-17 3565 3565 2020-04-21 3930 2 88 3787
#> 33: 2020-04-18 3129 3129 2020-04-21 3606 2 99 3443
#> 34: 2020-04-19 2846 2846 2020-04-21 3524 3 115 3339
#> 35: 2020-04-20 2052 2052 2020-04-21 2905 2 114 2727
#> 36: 2020-04-21 962 962 2020-04-21 1912 3 152 1673
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> lower_50 lower_20 median upper_20 upper_50 upper_90
#> 1: 5262 5276 5284 5293 5306 5340
#> 2: 5224 5245 5258 5272 5293 5345
#> 3: 6148 6183 6204 6227 6262 6347
#> 4: 5886 5931 5958 5987 6032 6142
#> 5: 5746 5802 5834 5871 5927 6061
#> 6: 4734 4791 4824 4859 4918 5056
#> 7: 3233 3278 3305 3331 3381 3489
#> 8: 2314 2353 2376 2399 2439 2534
#> 9: 0 0 0 0 0 0
#> 10: 6050 6065 6074 6085 6100 6139
#> 11: 5331 5353 5366 5380 5402 5455
#> 12: 4186 4210 4225 4240 4264 4322
#> 13: 4248 4281 4300 4321 4354 4433
#> 14: 5098 5148 5176 5209 5259 5378
#> 15: 5071 5132 5168 5205 5268 5415
#> 16: 5066 5136 5179 5219 5297 5467
#> 17: 5326 5415 5469 5522 5614 5833
#> 18: 4545 4694 4795 4875 5048 5477
#> 19: 4715 4727 4734 4742 4754 4785
#> 20: 4662 4681 4692 4705 4723 4770
#> 21: 4918 4947 4964 4982 5010 5078
#> 22: 4447 4481 4501 4523 4557 4640
#> 23: 3725 3762 3782 3806 3843 3929
#> 24: 3144 3182 3204 3227 3266 3358
#> 25: 3920 3974 4008 4039 4099 4230
#> 26: 4127 4195 4237 4278 4350 4519
#> 27: 3471 3585 3662 3723 3855 4183
#> 28: 4128 4139 4145 4152 4163 4189
#> 29: 3197 3210 3218 3226 3239 3272
#> 30: 3028 3046 3056 3068 3085 3127
#> 31: 2727 2748 2760 2774 2794 2845
#> 32: 3868 3906 3928 3952 3990 4081
#> 33: 3538 3580 3605 3631 3675 3778
#> 34: 3446 3494 3523 3551 3604 3719
#> 35: 2828 2875 2904 2932 2981 3097
#> 36: 1809 1868 1908 1940 2009 2180
#> lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)
options(old_opts)