Estimates a truncation distribution from multiple snapshots of the same
data source over time. This distribution can then be used passed to the
truncation argument in regional_epinow(), epinow(), and
estimate_infections() to adjust for truncated data and propagate the
uncertainty associated with data truncation into the estimates.
See here
for an example of using this approach on Covid-19 data in England. The
functionality offered by this function is now available in a more principled
manner in the epinowcast R package.
The model of truncation is as follows:
The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.
The data set with the latest observations is adjusted for truncation using the truncation distribution.
Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.
This model is then fit using stan with standard normal, or half normal,
prior for the mean, standard deviation, 1 over the square root of the
overdispersion and additive noise term.
This approach assumes that:
Current truncation is related to past truncation.
Truncation is a multiplicative scaling of underlying reported cases.
Truncation is log normally distributed.
Arguments
- data
A list of
<data.frame>s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.- truncation
A call to
trunc_opts()defining the truncation of the observed data. Defaults totrunc_opts(), i.e. no truncation. See theestimate_truncation()help file for an approach to estimating this from data where thedistlist element returned byestimate_truncation()is used as thetruncationargument here, thereby propagating the uncertainty in the estimate.- model
A compiled stan model to override the default model. May be useful for package developers or those developing extensions.
- stan
A list of stan options as generated by
stan_opts(). Defaults tostan_opts(). Can be used to overridedata,init, andverbosesettings if desired.- CrIs
Numeric vector of credible intervals to calculate.
- filter_leading_zeros
Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.
- zero_threshold
Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using
fill.- weigh_delay_priors
Deprecated; use the
weight_prioroption intrunc_opts()instead.- verbose
Logical, should model fitting progress be returned.
- ...
Additional parameters to pass to
rstan::sampling().- obs
Deprecated; use
datainstead.
Value
A list containing: the summary parameters of the truncation
distribution (dist), which could be passed to the truncation argument
of epinow(), regional_epinow(), and estimate_infections(), the
estimated CMF of the truncation distribution (cmf, can be used to
adjusted new data), a <data.frame> containing the observed truncated
data, latest observed data and the adjusted for
truncation observations (obs), a <data.frame> containing the last
observed data (last_obs, useful for plotting and validation), the data
used for fitting (data) and the fit object (fit).
Examples
# \donttest{
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))
# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
verbose = interactive(),
chains = 2, iter = 2000
)
#> WARN [2025-12-19 17:45:34] estimate_truncation (chain: 1): There were 11 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. -
#> WARN [2025-12-19 17:45:34] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#> -
# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#> meanlog:
#> - normal distribution:
#> mean:
#> -1.9
#> sd:
#> 0.46
#> sdlog:
#> - normal distribution:
#> mean:
#> 2.2
#> sd:
#> 0.71
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#> index mean se_mean sd lower_90 lower_50 lower_20
#> <int> <num> <num> <num> <num> <num> <num>
#> 1: 1 1.0000000 2.944730e-18 1.281895e-16 1.0000000 1.0000000 1.0000000
#> 2: 2 0.9972605 4.867348e-05 1.585199e-03 0.9945104 0.9962054 0.9969804
#> 3: 3 0.9939538 1.040728e-04 3.390517e-03 0.9881623 0.9916583 0.9932890
#> 4: 4 0.9898733 1.679187e-04 5.471613e-03 0.9806566 0.9861976 0.9887057
#> 5: 5 0.9846952 2.426087e-04 7.905554e-03 0.9714968 0.9794354 0.9829350
#> 6: 6 0.9778773 3.336925e-04 1.080196e-02 0.9600425 0.9706240 0.9753013
#> 7: 7 0.9684321 4.483195e-04 1.432055e-02 0.9454541 0.9587650 0.9651087
#> 8: 8 0.9543251 5.938731e-04 1.869399e-02 0.9252447 0.9417461 0.9496416
#> 9: 9 0.9304576 7.603693e-04 2.420187e-02 0.8942182 0.9145460 0.9236810
#> 10: 10 0.8779849 8.008940e-04 3.000521e-02 0.8336619 0.8585523 0.8694527
#> 11: 11 0.4204066 4.108028e-04 1.713876e-02 0.3953593 0.4091003 0.4154806
#> median upper_20 upper_50 upper_90
#> <num> <num> <num> <num>
#> 1: 1.0000000 1.0000000 1.0000000 1.0000000
#> 2: 0.9973771 0.9977285 0.9983844 0.9997465
#> 3: 0.9941482 0.9949089 0.9963326 0.9993745
#> 4: 0.9901333 0.9913502 0.9936342 0.9988211
#> 5: 0.9849411 0.9866850 0.9900152 0.9979098
#> 6: 0.9779266 0.9803808 0.9850079 0.9965054
#> 7: 0.9682469 0.9714666 0.9773885 0.9938562
#> 8: 0.9536902 0.9577404 0.9653897 0.9883349
#> 9: 0.9288236 0.9344847 0.9434331 0.9756384
#> 10: 0.8758597 0.8819218 0.8932112 0.9344423
#> 11: 0.4189786 0.4226764 0.4293971 0.4523016
# observations linked to truncation adjusted estimates
print(est$obs)
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> <Date> <num> <num> <Date> <int> <int> <int> <int>
#> 1: 2020-03-24 4764 4789 2020-04-01 5244 0 29 5197
#> 2: 2020-03-25 5191 5249 2020-04-01 5181 1 41 5112
#> 3: 2020-03-26 5102 5210 2020-04-01 6058 2 67 5944
#> 4: 2020-03-27 5924 6153 2020-04-01 5749 2 85 5601
#> 5: 2020-03-28 5567 5959 2020-04-01 5544 3 108 5351
#> 6: 2020-03-29 5289 5974 2020-04-01 4498 3 115 4287
#> 7: 2020-03-30 4183 5217 2020-04-01 3042 2 102 2855
#> 8: 2020-03-31 2668 4050 2020-04-01 4005 3 159 3716
#> 9: 2020-04-01 1681 4053 2020-04-01 0 NA 0 0
#> 10: 2020-03-29 5970 5974 2020-04-06 6006 0 20 5973
#> 11: 2020-03-30 5209 5217 2020-04-06 5262 0 29 5215
#> 12: 2020-03-31 4035 4050 2020-04-06 4097 1 32 4043
#> 13: 2020-04-01 4020 4053 2020-04-06 4111 1 45 4034
#> 14: 2020-04-02 4697 4782 2020-04-06 4851 2 71 4726
#> 15: 2020-04-03 4483 4668 2020-04-06 4699 2 91 4535
#> 16: 2020-04-04 4182 4585 2020-04-06 4497 3 115 4286
#> 17: 2020-04-05 3864 4805 2020-04-06 4406 3 147 4135
#> 18: 2020-04-06 2420 4316 2020-04-06 5765 5 230 5350
#> 19: 2020-04-03 4653 4668 2020-04-11 4681 0 15 4655
#> 20: 2020-04-04 4554 4585 2020-04-11 4600 0 25 4559
#> 21: 2020-04-05 4741 4805 2020-04-11 4814 1 38 4750
#> 22: 2020-04-06 4208 4316 2020-04-11 4303 1 47 4222
#> 23: 2020-04-07 3431 3599 2020-04-11 3543 1 52 3452
#> 24: 2020-04-08 2779 3039 2020-04-11 2913 1 56 2811
#> 25: 2020-04-09 3234 3836 2020-04-11 3478 2 89 3314
#> 26: 2020-04-10 2993 4204 2020-04-11 3412 2 114 3202
#> 27: 2020-04-11 1848 3951 2020-04-11 4402 4 175 4085
#> 28: 2020-04-08 3036 3039 2020-04-16 3054 0 10 3037
#> 29: 2020-04-09 3829 3836 2020-04-16 3868 0 21 3833
#> 30: 2020-04-10 4187 4204 2020-04-16 4252 1 34 4195
#> 31: 2020-04-11 3916 3951 2020-04-16 4005 1 44 3929
#> 32: 2020-04-12 4605 4686 2020-04-16 4756 2 70 4633
#> 33: 2020-04-13 3925 4074 2020-04-16 4114 2 80 3971
#> 34: 2020-04-14 2873 3124 2020-04-16 3089 2 79 2944
#> 35: 2020-04-15 2393 2919 2020-04-16 2728 2 91 2560
#> 36: 2020-04-16 1512 2580 2020-04-16 3602 3 143 3342
#> 37: 2020-04-13 4074 4074 2020-04-21 4098 0 14 4076
#> 38: 2020-04-14 3124 3124 2020-04-21 3156 0 17 3127
#> 39: 2020-04-15 2919 2919 2020-04-21 2964 0 23 2925
#> 40: 2020-04-16 2580 2580 2020-04-21 2638 0 29 2589
#> 41: 2020-04-17 3563 3563 2020-04-21 3679 1 54 3585
#> 42: 2020-04-18 3126 3126 2020-04-21 3276 1 64 3162
#> 43: 2020-04-19 2843 2843 2020-04-21 3057 2 78 2913
#> 44: 2020-04-20 2051 2051 2020-04-21 2338 2 78 2194
#> 45: 2020-04-21 962 962 2020-04-21 2291 2 91 2126
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> lower_50 lower_20 median upper_20 upper_50 upper_90
#> <int> <int> <int> <int> <int> <int>
#> 1: 5224 5236 5242 5250 5263 5293
#> 2: 5153 5170 5180 5190 5209 5251
#> 3: 6014 6042 6057 6074 6103 6170
#> 4: 5695 5730 5749 5768 5806 5888
#> 5: 5478 5522 5545 5569 5616 5716
#> 6: 4433 4476 4503 4528 4573 4677
#> 7: 2986 3025 3046 3068 3107 3200
#> 8: 3914 3977 4012 4045 4109 4251
#> 9: 0 0 0 0 0 0
#> 10: 5991 6000 6005 6010 6020 6041
#> 11: 5242 5254 5260 5268 5281 5311
#> 12: 4075 4089 4096 4105 4119 4153
#> 13: 4081 4100 4110 4121 4141 4187
#> 14: 4805 4834 4851 4866 4899 4967
#> 15: 4643 4680 4700 4720 4760 4845
#> 16: 4432 4475 4502 4527 4572 4676
#> 17: 4325 4381 4411 4444 4500 4634
#> 18: 5635 5725 5775 5824 5915 6121
#> 19: 4670 4676 4680 4684 4692 4708
#> 20: 4583 4593 4599 4606 4617 4643
#> 21: 4788 4804 4813 4823 4840 4880
#> 22: 4272 4292 4302 4314 4335 4383
#> 23: 3510 3531 3543 3555 3578 3628
#> 24: 2878 2901 2913 2926 2950 3003
#> 25: 3427 3460 3481 3501 3536 3616
#> 26: 3350 3393 3417 3442 3486 3590
#> 27: 4303 4372 4410 4447 4517 4674
#> 28: 3047 3051 3053 3056 3061 3072
#> 29: 3853 3862 3867 3872 3882 3904
#> 30: 4229 4243 4251 4259 4274 4309
#> 31: 3975 3994 4004 4015 4034 4078
#> 32: 4711 4740 4756 4771 4803 4870
#> 33: 4065 4098 4115 4133 4167 4242
#> 34: 3045 3074 3093 3110 3141 3212
#> 35: 2679 2713 2732 2752 2787 2870
#> 36: 3521 3577 3608 3639 3695 3824
#> 37: 4088 4094 4097 4101 4108 4122
#> 38: 3144 3151 3155 3159 3167 3185
#> 39: 2948 2958 2963 2969 2980 3004
#> 40: 2619 2631 2638 2645 2658 2687
#> 41: 3645 3667 3679 3691 3716 3768
#> 42: 3238 3263 3277 3291 3319 3378
#> 43: 3013 3042 3060 3077 3108 3179
#> 44: 2296 2325 2341 2358 2388 2460
#> 45: 2240 2275 2296 2315 2351 2433
#> lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)
#> Ignoring unknown labels:
#> • fill : "Type"
# Pass the truncation distribution to `epinow()`.
# Note, we're using the last snapshot as the observed data as it contains
# all the previous snapshots. Also, we're using the default options for
# illustrative purposes only.
out <- epinow(
generation_time = generation_time_opts(example_generation_time),
example_truncated[[5]],
truncation = trunc_opts(est$dist)
)
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2 logs to the console and:
#> /tmp/RtmpPIfNEV/regional-epinow/2020-04-21.log.
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2.epinow logs to the console and:
#> /tmp/RtmpPIfNEV/epinow/2020-04-21.log.
#> WARN [2025-12-19 17:48:18] epinow: There were 2 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. -
#> WARN [2025-12-19 17:48:18] epinow: Examine the pairs() plot to diagnose sampling problems
#> -
plot(out)
options(old_opts)
# }
