Estimates a truncation distribution from multiple snapshots of the same
data source over time. This distribution can then be used in
regional_epinow()
, epinow()
, and estimate_infections()
to adjust for
truncated data. See
here
for an example of using this approach on Covid-19 data in England. The
functionality offered by this function is now available in a more principled
manner in the epinowcast
R package.
The model of truncation is as follows:
The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.
The data set with the latest observations is adjusted for truncation using the truncation distribution.
Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.
This model is then fit using stan
with standard normal, or half normal,
prior for the mean, standard deviation, 1 over the square root of the
overdispersion and additive noise term.
This approach assumes that:
Current truncation is related to past truncation.
Truncation is a multiplicative scaling of underlying reported cases.
Truncation is log normally distributed.
Usage
estimate_truncation(
obs,
max_truncation,
trunc_max = 10,
trunc_dist = "lognormal",
truncation = trunc_opts(LogNormal(meanlog = Normal(0, 1), sdlog = Normal(1, 1), max =
10)),
model = NULL,
stan = stan_opts(),
CrIs = c(0.2, 0.5, 0.9),
filter_leading_zeros = FALSE,
zero_threshold = Inf,
weigh_delay_priors = FALSE,
verbose = TRUE,
...
)
Arguments
- obs
A list of
<data.frame>
s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.- max_truncation
Deprecated; use
truncation
instead.- trunc_max
Deprecated; use
truncation
instead.- trunc_dist
Deprecated; use
truncation
instead.- truncation
A call to
trunc_opts()
defining the truncation of observed data. Defaults totrunc_opts()
. Seeestimate_truncation()
for an approach to estimating truncation from data.- model
A compiled stan model to override the default model. May be useful for package developers or those developing extensions.
- stan
A list of stan options as generated by
stan_opts()
. Defaults tostan_opts()
. Can be used to overridedata
,init
, andverbose
settings if desired.- CrIs
Numeric vector of credible intervals to calculate.
- filter_leading_zeros
Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.
- zero_threshold
Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using
fill
.- weigh_delay_priors
Deprecated; use the
weight_prior
option intrunc_opts()
instead.- verbose
Logical, should model fitting progress be returned.
- ...
Additional parameters to pass to
rstan::sampling()
.
Value
A list containing: the summary parameters of the truncation
distribution (dist
), the estimated CMF of the truncation distribution
(cmf
, can be used to adjusted new data), a <data.frame>
containing the
observed truncated data, latest observed data and the adjusted for
truncation observations (obs
), a <data.frame>
containing the last
observed data (last_obs
, useful for plotting and validation), the data
used for fitting (data
) and the fit object (fit
).
Examples
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))
# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
verbose = interactive(),
chains = 2, iter = 2000
)
#> WARN [2024-04-25 14:35:16] estimate_truncation (chain: 1): There were 25 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. -
#> WARN [2024-04-25 14:35:16] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#> -
# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#> meanlog:
#> - normal distribution:
#> mean:
#> -1.5
#> sd:
#> 0.42
#> sdlog:
#> - normal distribution:
#> mean:
#> 2.3
#> sd:
#> 0.78
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#> index mean se_mean sd lower_90 lower_50 lower_20
#> <int> <num> <num> <num> <num> <num> <num>
#> 1: 1 1.0000000 2.965431e-18 1.266896e-16 1.0000000 1.0000000 1.0000000
#> 2: 2 0.9958943 7.811206e-05 2.425520e-03 0.9916766 0.9941911 0.9953472
#> 3: 3 0.9909983 1.665071e-04 5.164482e-03 0.9820701 0.9873934 0.9897497
#> 4: 4 0.9850382 2.678167e-04 8.292710e-03 0.9708321 0.9792489 0.9830146
#> 5: 5 0.9775902 3.857553e-04 1.191441e-02 0.9575502 0.9693891 0.9745650
#> 6: 6 0.9679567 5.257750e-04 1.617684e-02 0.9414292 0.9567064 0.9637027
#> 7: 7 0.9548908 6.962624e-04 2.129397e-02 0.9208823 0.9397821 0.9488982
#> 8: 8 0.9358834 9.118719e-04 2.758054e-02 0.8930050 0.9161742 0.9276654
#> 9: 9 0.9048426 1.196486e-03 3.545357e-02 0.8529270 0.8796020 0.8938938
#> 10: 10 0.8404026 1.527546e-03 4.452953e-02 0.7789812 0.8095312 0.8263420
#> 11: 11 0.3977947 8.724564e-04 2.540405e-02 0.3631330 0.3808849 0.3895427
#> median upper_20 upper_50 upper_90
#> <num> <num> <num> <num>
#> 1: 1.0000000 1.0000000 1.0000000 1.0000000
#> 2: 0.9960525 0.9966948 0.9977037 0.9996963
#> 3: 0.9912985 0.9926157 0.9948007 0.9992552
#> 4: 0.9854057 0.9875406 0.9910707 0.9985619
#> 5: 0.9778528 0.9809524 0.9861202 0.9973901
#> 6: 0.9679971 0.9721551 0.9792921 0.9953694
#> 7: 0.9543219 0.9598981 0.9694710 0.9921415
#> 8: 0.9345028 0.9418159 0.9535646 0.9849487
#> 9: 0.9020340 0.9113486 0.9262196 0.9684578
#> 10: 0.8366951 0.8467094 0.8644616 0.9176681
#> 11: 0.3952949 0.4010334 0.4109629 0.4415563
# observations linked to truncation adjusted estimates
print(est$obs)
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> <Date> <num> <num> <Date> <int> <int> <int> <int>
#> 1: 2020-03-24 4764 4789 2020-04-01 5238 0 27 5194
#> 2: 2020-03-25 5191 5249 2020-04-01 5179 1 43 5109
#> 3: 2020-03-26 5102 5210 2020-04-01 6060 2 74 5939
#> 4: 2020-03-27 5924 6153 2020-04-01 5752 3 96 5592
#> 5: 2020-03-28 5567 5959 2020-04-01 5541 4 123 5330
#> 6: 2020-03-29 5289 5974 2020-04-01 4473 4 131 4246
#> 7: 2020-03-30 4183 5217 2020-04-01 2953 3 114 2754
#> 8: 2020-03-31 2668 4050 2020-04-01 2005 3 103 1831
#> 9: 2020-04-01 1681 4053 2020-04-01 0 NA 0 0
#> 10: 2020-03-29 5970 5974 2020-04-06 6024 1 31 5974
#> 11: 2020-03-30 5209 5217 2020-04-06 5288 1 44 5216
#> 12: 2020-03-31 4035 4050 2020-04-06 4128 1 50 4045
#> 13: 2020-04-01 4020 4053 2020-04-06 4154 2 69 4038
#> 14: 2020-04-02 4697 4782 2020-04-06 4921 3 109 4734
#> 15: 2020-04-03 4483 4668 2020-04-06 4794 4 140 4551
#> 16: 2020-04-04 4182 4585 2020-04-06 4628 5 179 4318
#> 17: 2020-04-05 3864 4805 2020-04-06 4610 7 237 4210
#> 18: 2020-04-06 2420 4316 2020-04-06 6107 12 373 5480
#> 19: 2020-04-03 4653 4668 2020-04-11 4695 0 24 4656
#> 20: 2020-04-04 4554 4585 2020-04-11 4623 1 39 4560
#> 21: 2020-04-05 4741 4805 2020-04-11 4850 1 59 4753
#> 22: 2020-04-06 4208 4316 2020-04-11 4348 2 72 4227
#> 23: 2020-04-07 3431 3599 2020-04-11 3594 2 80 3458
#> 24: 2020-04-08 2779 3039 2020-04-11 2971 2 87 2821
#> 25: 2020-04-09 3234 3836 2020-04-11 3579 4 138 3339
#> 26: 2020-04-10 2993 4204 2020-04-11 3571 6 183 3261
#> 27: 2020-04-11 1848 3951 2020-04-11 4663 9 285 4185
#> 28: 2020-04-08 3036 3039 2020-04-16 3063 0 15 3038
#> 29: 2020-04-09 3829 3836 2020-04-16 3887 1 32 3834
#> 30: 2020-04-10 4187 4204 2020-04-16 4283 1 52 4197
#> 31: 2020-04-11 3916 3951 2020-04-16 4046 2 67 3934
#> 32: 2020-04-12 4605 4686 2020-04-16 4824 3 107 4641
#> 33: 2020-04-13 3925 4074 2020-04-16 4197 4 123 3984
#> 34: 2020-04-14 2873 3124 2020-04-16 3179 4 123 2966
#> 35: 2020-04-15 2393 2919 2020-04-16 2855 4 146 2607
#> 36: 2020-04-16 1512 2580 2020-04-16 3815 7 233 3424
#> 37: 2020-04-13 4074 4074 2020-04-21 4111 0 21 4077
#> 38: 2020-04-14 3124 3124 2020-04-21 3171 0 26 3128
#> 39: 2020-04-15 2919 2919 2020-04-21 2986 1 36 2926
#> 40: 2020-04-16 2580 2580 2020-04-21 2666 1 44 2592
#> 41: 2020-04-17 3563 3563 2020-04-21 3733 2 83 3591
#> 42: 2020-04-18 3126 3126 2020-04-21 3343 3 98 3173
#> 43: 2020-04-19 2843 2843 2020-04-21 3146 4 121 2935
#> 44: 2020-04-20 2051 2051 2020-04-21 2447 4 125 2235
#> 45: 2020-04-21 962 962 2020-04-21 2427 4 148 2178
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> lower_50 lower_20 median upper_20 upper_50 upper_90
#> <int> <int> <int> <int> <int> <int>
#> 1: 5218 5229 5236 5244 5257 5285
#> 2: 5147 5166 5177 5190 5210 5255
#> 3: 6007 6039 6058 6078 6111 6186
#> 4: 5684 5726 5751 5776 5818 5913
#> 5: 5455 5509 5542 5573 5627 5743
#> 6: 4386 4441 4476 4509 4565 4684
#> 7: 2880 2927 2957 2984 3033 3128
#> 8: 1944 1985 2009 2034 2076 2157
#> 9: 0 0 0 0 0 0
#> 10: 6001 6014 6022 6031 6046 6078
#> 11: 5255 5274 5286 5299 5319 5365
#> 12: 4091 4113 4126 4140 4162 4213
#> 13: 4105 4135 4152 4171 4201 4270
#> 14: 4844 4893 4921 4949 4997 5100
#> 15: 4701 4759 4797 4832 4893 5020
#> 16: 4515 4588 4636 4678 4754 4903
#> 17: 4469 4563 4618 4676 4773 4960
#> 18: 5888 6034 6122 6212 6353 6664
#> 19: 4677 4687 4693 4701 4712 4737
#> 20: 4595 4611 4621 4632 4650 4690
#> 21: 4807 4833 4848 4864 4890 4951
#> 22: 4296 4328 4347 4366 4398 4469
#> 23: 3539 3574 3595 3615 3650 3725
#> 24: 2914 2950 2973 2995 3033 3111
#> 25: 3491 3548 3585 3617 3676 3791
#> 26: 3462 3534 3577 3621 3697 3842
#> 27: 4496 4608 4674 4744 4851 5089
#> 28: 3051 3058 3062 3067 3074 3091
#> 29: 3863 3877 3885 3895 3910 3944
#> 30: 4245 4268 4281 4296 4319 4372
#> 31: 3998 4028 4045 4063 4093 4159
#> 32: 4750 4797 4825 4852 4900 5000
#> 33: 4116 4167 4200 4231 4284 4395
#> 34: 3101 3152 3185 3214 3266 3368
#> 35: 2768 2826 2860 2895 2956 3071
#> 36: 3679 3770 3824 3881 3969 4163
#> 37: 4095 4104 4109 4116 4126 4148
#> 38: 3152 3163 3170 3177 3190 3217
#> 39: 2960 2975 2985 2995 3011 3048
#> 40: 2634 2653 2665 2677 2696 2740
#> 41: 3675 3711 3733 3754 3791 3869
#> 42: 3278 3319 3345 3369 3412 3500
#> 43: 3069 3119 3151 3180 3232 3333
#> 44: 2372 2422 2451 2482 2533 2632
#> 45: 2340 2398 2433 2469 2525 2649
#> lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)
options(old_opts)