Estimates a truncation distribution from multiple snapshots of the same
data source over time. This distribution can then be used passed to the
truncation
argument in regional_epinow()
, epinow()
, and
estimate_infections()
to adjust for truncated data and propagate the
uncertainty associated with data truncation into the estimates.
See here
for an example of using this approach on Covid-19 data in England. The
functionality offered by this function is now available in a more principled
manner in the epinowcast
R package.
The model of truncation is as follows:
The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.
The data set with the latest observations is adjusted for truncation using the truncation distribution.
Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.
This model is then fit using stan
with standard normal, or half normal,
prior for the mean, standard deviation, 1 over the square root of the
overdispersion and additive noise term.
This approach assumes that:
Current truncation is related to past truncation.
Truncation is a multiplicative scaling of underlying reported cases.
Truncation is log normally distributed.
Arguments
- data
A list of
<data.frame>
s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.- truncation
A call to
trunc_opts()
defining the truncation of the observed data. Defaults totrunc_opts()
, i.e. no truncation. See theestimate_truncation()
help file for an approach to estimating this from data where thedist
list element returned byestimate_truncation()
is used as thetruncation
argument here, thereby propagating the uncertainty in the estimate.- model
A compiled stan model to override the default model. May be useful for package developers or those developing extensions.
- stan
A list of stan options as generated by
stan_opts()
. Defaults tostan_opts()
. Can be used to overridedata
,init
, andverbose
settings if desired.- CrIs
Numeric vector of credible intervals to calculate.
- filter_leading_zeros
Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.
- zero_threshold
Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using
fill
.- weigh_delay_priors
Deprecated; use the
weight_prior
option intrunc_opts()
instead.- verbose
Logical, should model fitting progress be returned.
- ...
Additional parameters to pass to
rstan::sampling()
.- obs
Deprecated; use
data
instead.
Value
A list containing: the summary parameters of the truncation
distribution (dist
), which could be passed to the truncation
argument
of epinow()
, regional_epinow()
, and estimate_infections()
, the
estimated CMF of the truncation distribution (cmf
, can be used to
adjusted new data), a <data.frame>
containing the observed truncated
data, latest observed data and the adjusted for
truncation observations (obs
), a <data.frame>
containing the last
observed data (last_obs
, useful for plotting and validation), the data
used for fitting (data
) and the fit object (fit
).
Examples
# \donttest{
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))
# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
verbose = interactive(),
chains = 2, iter = 2000
)
#> WARN [2024-07-25 13:42:58] estimate_truncation (chain: 1): There were 11 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. -
#> WARN [2024-07-25 13:42:58] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#> -
# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#> meanlog:
#> - normal distribution:
#> mean:
#> -1.5
#> sd:
#> 0.42
#> sdlog:
#> - normal distribution:
#> mean:
#> 2.3
#> sd:
#> 0.74
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#> index mean se_mean sd lower_90 lower_50 lower_20
#> <int> <num> <num> <num> <num> <num> <num>
#> 1: 1 1.0000000 2.978589e-18 1.285018e-16 1.0000000 1.0000000 1.0000000
#> 2: 2 0.9959462 6.497312e-05 2.369883e-03 0.9918610 0.9942452 0.9953981
#> 3: 3 0.9911081 1.381692e-04 5.049245e-03 0.9825195 0.9874871 0.9898980
#> 4: 4 0.9852131 2.215392e-04 8.112403e-03 0.9716009 0.9793982 0.9831825
#> 5: 5 0.9778386 3.175203e-04 1.166082e-02 0.9583967 0.9694376 0.9749067
#> 6: 6 0.9682883 4.294179e-04 1.583623e-02 0.9422300 0.9567850 0.9640496
#> 7: 7 0.9553150 5.622166e-04 2.083975e-02 0.9217175 0.9402150 0.9497443
#> 8: 8 0.9364028 7.219155e-04 2.694867e-02 0.8937583 0.9172530 0.9287781
#> 9: 9 0.9054164 9.106295e-04 3.442699e-02 0.8520255 0.8820322 0.8952061
#> 10: 10 0.8407753 1.074547e-03 4.215180e-02 0.7763458 0.8124745 0.8283208
#> 11: 11 0.3978766 5.905733e-04 2.367395e-02 0.3610141 0.3822660 0.3910367
#> median upper_20 upper_50 upper_90
#> <num> <num> <num> <num>
#> 1: 1.0000000 1.0000000 1.0000000 1.0000000
#> 2: 0.9961208 0.9967760 0.9977140 0.9996405
#> 3: 0.9914047 0.9928148 0.9948491 0.9991301
#> 4: 0.9856095 0.9877776 0.9911060 0.9983564
#> 5: 0.9782034 0.9813310 0.9862180 0.9971823
#> 6: 0.9684357 0.9725519 0.9793960 0.9952651
#> 7: 0.9552000 0.9602784 0.9695843 0.9917761
#> 8: 0.9355103 0.9419886 0.9540390 0.9844180
#> 9: 0.9031839 0.9109961 0.9259069 0.9680421
#> 10: 0.8369670 0.8477014 0.8651704 0.9165417
#> 11: 0.3960778 0.4018679 0.4111110 0.4394924
# observations linked to truncation adjusted estimates
print(est$obs)
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> <Date> <num> <num> <Date> <int> <int> <int> <int>
#> 1: 2020-03-24 4764 4789 2020-04-01 5237 0 26 5195
#> 2: 2020-03-25 5191 5249 2020-04-01 5178 1 42 5110
#> 3: 2020-03-26 5102 5210 2020-04-01 6059 1 72 5940
#> 4: 2020-03-27 5924 6153 2020-04-01 5750 2 94 5593
#> 5: 2020-03-28 5567 5959 2020-04-01 5539 3 120 5332
#> 6: 2020-03-29 5289 5974 2020-04-01 4470 3 128 4249
#> 7: 2020-03-30 4183 5217 2020-04-01 2950 2 111 2756
#> 8: 2020-03-31 2668 4050 2020-04-01 2004 2 98 1834
#> 9: 2020-04-01 1681 4053 2020-04-01 0 NA 0 0
#> 10: 2020-03-29 5970 5974 2020-04-06 6023 0 30 5975
#> 11: 2020-03-30 5209 5217 2020-04-06 5287 1 43 5217
#> 12: 2020-03-31 4035 4050 2020-04-06 4127 1 49 4046
#> 13: 2020-04-01 4020 4053 2020-04-06 4152 1 68 4039
#> 14: 2020-04-02 4697 4782 2020-04-06 4919 2 107 4735
#> 15: 2020-04-03 4483 4668 2020-04-06 4791 3 137 4553
#> 16: 2020-04-04 4182 4585 2020-04-06 4625 4 174 4320
#> 17: 2020-04-05 3864 4805 2020-04-06 4607 5 227 4215
#> 18: 2020-04-06 2420 4316 2020-04-06 6103 8 356 5506
#> 19: 2020-04-03 4653 4668 2020-04-11 4694 0 23 4657
#> 20: 2020-04-04 4554 4585 2020-04-11 4622 1 38 4561
#> 21: 2020-04-05 4741 4805 2020-04-11 4849 1 57 4754
#> 22: 2020-04-06 4208 4316 2020-04-11 4346 1 71 4228
#> 23: 2020-04-07 3431 3599 2020-04-11 3593 2 78 3459
#> 24: 2020-04-08 2779 3039 2020-04-11 2970 2 85 2822
#> 25: 2020-04-09 3234 3836 2020-04-11 3576 3 134 3340
#> 26: 2020-04-10 2993 4204 2020-04-11 3568 4 176 3265
#> 27: 2020-04-11 1848 3951 2020-04-11 4660 6 271 4204
#> 28: 2020-04-08 3036 3039 2020-04-16 3063 0 15 3038
#> 29: 2020-04-09 3829 3836 2020-04-16 3886 0 32 3835
#> 30: 2020-04-10 4187 4204 2020-04-16 4282 1 51 4198
#> 31: 2020-04-11 3916 3951 2020-04-16 4045 1 66 3934
#> 32: 2020-04-12 4605 4686 2020-04-16 4822 2 105 4643
#> 33: 2020-04-13 3925 4074 2020-04-16 4195 3 120 3987
#> 34: 2020-04-14 2873 3124 2020-04-16 3177 3 119 2967
#> 35: 2020-04-15 2393 2919 2020-04-16 2853 3 140 2610
#> 36: 2020-04-16 1512 2580 2020-04-16 3813 5 222 3440
#> 37: 2020-04-13 4074 4074 2020-04-21 4110 0 20 4077
#> 38: 2020-04-14 3124 3124 2020-04-21 3171 0 26 3129
#> 39: 2020-04-15 2919 2919 2020-04-21 2985 0 35 2927
#> 40: 2020-04-16 2580 2580 2020-04-21 2665 1 43 2592
#> 41: 2020-04-17 3563 3563 2020-04-21 3731 2 81 3592
#> 42: 2020-04-18 3126 3126 2020-04-21 3341 2 95 3175
#> 43: 2020-04-19 2843 2843 2020-04-21 3144 3 118 2936
#> 44: 2020-04-20 2051 2051 2020-04-21 2445 3 120 2237
#> 45: 2020-04-21 962 962 2020-04-21 2426 3 141 2188
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> lower_50 lower_20 median upper_20 upper_50 upper_90
#> <int> <int> <int> <int> <int> <int>
#> 1: 5217 5228 5236 5243 5256 5283
#> 2: 5147 5165 5176 5189 5209 5251
#> 3: 6006 6036 6056 6076 6110 6181
#> 4: 5684 5724 5748 5774 5818 5908
#> 5: 5454 5507 5537 5568 5625 5738
#> 6: 4384 4440 4471 4503 4560 4680
#> 7: 2881 2928 2953 2980 3024 3131
#> 8: 1942 1983 2008 2029 2068 2165
#> 9: 0 0 0 0 0 0
#> 10: 6000 6013 6021 6030 6045 6076
#> 11: 5255 5273 5285 5298 5318 5361
#> 12: 4091 4111 4124 4138 4162 4210
#> 13: 4104 4133 4151 4169 4201 4266
#> 14: 4844 4891 4917 4945 4995 5095
#> 15: 4698 4759 4792 4826 4887 5015
#> 16: 4516 4590 4630 4671 4741 4908
#> 17: 4466 4558 4616 4664 4755 4977
#> 18: 5886 6021 6109 6188 6330 6703
#> 19: 4677 4686 4693 4700 4711 4735
#> 20: 4594 4610 4620 4631 4649 4687
#> 21: 4807 4831 4846 4863 4890 4946
#> 22: 4296 4326 4345 4364 4398 4466
#> 23: 3538 3572 3591 3612 3649 3722
#> 24: 2912 2950 2970 2992 3029 3109
#> 25: 3492 3549 3580 3612 3666 3795
#> 26: 3459 3530 3576 3613 3683 3855
#> 27: 4495 4598 4665 4725 4834 5118
#> 28: 3051 3057 3062 3066 3074 3090
#> 29: 3863 3876 3884 3894 3909 3940
#> 30: 4245 4266 4280 4294 4318 4368
#> 31: 3998 4026 4043 4062 4092 4156
#> 32: 4749 4795 4820 4848 4897 4996
#> 33: 4114 4166 4195 4225 4279 4391
#> 34: 3102 3153 3180 3209 3257 3371
#> 35: 2765 2822 2859 2888 2945 3082
#> 36: 3677 3762 3817 3866 3955 4188
#> 37: 4095 4103 4109 4115 4125 4146
#> 38: 3152 3162 3169 3177 3189 3215
#> 39: 2959 2974 2984 2994 3011 3045
#> 40: 2634 2652 2664 2676 2696 2738
#> 41: 3674 3710 3730 3751 3789 3865
#> 42: 3276 3318 3341 3365 3408 3497
#> 43: 3070 3120 3147 3175 3223 3336
#> 44: 2370 2419 2450 2476 2524 2641
#> 45: 2340 2393 2428 2460 2516 2664
#> lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)
# Pass the truncation distribution to `epinow()`.
# Note, we're using the last snapshot as the observed data as it contains
# all the previous snapshots. Also, we're using the default options for
# illustrative purposes only.
out <- epinow(
example_truncated[[5]],
truncation = trunc_opts(est$dist)
)
#> Logging threshold set at INFO for the EpiNow2 logger
#> Writing EpiNow2 logs to the console and: /tmp/RtmpndnyWS/regional-epinow/2020-04-21.log
#> Logging threshold set at INFO for the EpiNow2.epinow logger
#> Writing EpiNow2.epinow logs to the console and: /tmp/RtmpndnyWS/epinow/2020-04-21.log
#> WARN [2024-07-25 13:43:00] epinow: No generation time distribution given. Assuming a fixed generation time of 1 day, i.e. the reproduction number is the same as the daily growth rate. If this was intended then this warning can be silenced by setting `dist` explicitly to `Fixed(1)`. - gt_opts
#> WARN [2024-07-25 13:47:31] epinow: There were 45 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. -
#> WARN [2024-07-25 13:47:31] epinow: Examine the pairs() plot to diagnose sampling problems
#> -
plot(out)
options(old_opts)
# }