Estimates a truncation distribution from multiple snapshots of the same
data source over time. This distribution can then be used passed to the
truncation
argument in regional_epinow()
, epinow()
, and
estimate_infections()
to adjust for truncated data and propagate the
uncertainty associated with data truncation into the estimates.
See here
for an example of using this approach on Covid-19 data in England. The
functionality offered by this function is now available in a more principled
manner in the epinowcast
R package.
The model of truncation is as follows:
The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.
The data set with the latest observations is adjusted for truncation using the truncation distribution.
Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.
This model is then fit using stan
with standard normal, or half normal,
prior for the mean, standard deviation, 1 over the square root of the
overdispersion and additive noise term.
This approach assumes that:
Current truncation is related to past truncation.
Truncation is a multiplicative scaling of underlying reported cases.
Truncation is log normally distributed.
Arguments
- data
A list of
<data.frame>
s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.- truncation
A call to
trunc_opts()
defining the truncation of the observed data. Defaults totrunc_opts()
, i.e. no truncation. See theestimate_truncation()
help file for an approach to estimating this from data where thedist
list element returned byestimate_truncation()
is used as thetruncation
argument here, thereby propagating the uncertainty in the estimate.- model
A compiled stan model to override the default model. May be useful for package developers or those developing extensions.
- stan
A list of stan options as generated by
stan_opts()
. Defaults tostan_opts()
. Can be used to overridedata
,init
, andverbose
settings if desired.- CrIs
Numeric vector of credible intervals to calculate.
- filter_leading_zeros
Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.
- zero_threshold
Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using
fill
.- weigh_delay_priors
Deprecated; use the
weight_prior
option intrunc_opts()
instead.- verbose
Logical, should model fitting progress be returned.
- ...
Additional parameters to pass to
rstan::sampling()
.- obs
Deprecated; use
data
instead.
Value
A list containing: the summary parameters of the truncation
distribution (dist
), which could be passed to the truncation
argument
of epinow()
, regional_epinow()
, and estimate_infections()
, the
estimated CMF of the truncation distribution (cmf
, can be used to
adjusted new data), a <data.frame>
containing the observed truncated
data, latest observed data and the adjusted for
truncation observations (obs
), a <data.frame>
containing the last
observed data (last_obs
, useful for plotting and validation), the data
used for fitting (data
) and the fit object (fit
).
Examples
# \donttest{
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))
# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
verbose = interactive(),
chains = 2, iter = 2000
)
#> WARN [2025-01-13 16:02:24] estimate_truncation (chain: 1): There were 7 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. -
#> WARN [2025-01-13 16:02:24] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#> -
# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#> meanlog:
#> - normal distribution:
#> mean:
#> -1.9
#> sd:
#> 0.45
#> sdlog:
#> - normal distribution:
#> mean:
#> 2.2
#> sd:
#> 0.65
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#> index mean se_mean sd lower_90 lower_50 lower_20
#> <int> <num> <num> <num> <num> <num> <num>
#> 1: 1 1.0000000 2.996384e-18 1.241081e-16 1.0000000 1.0000000 1.0000000
#> 2: 2 0.9972401 4.119192e-05 1.519567e-03 0.9946467 0.9961854 0.9968822
#> 3: 3 0.9939038 8.772757e-05 3.253960e-03 0.9883974 0.9916302 0.9930961
#> 4: 4 0.9897809 1.408050e-04 5.257126e-03 0.9810246 0.9861410 0.9883933
#> 5: 5 0.9845412 2.019796e-04 7.603159e-03 0.9720983 0.9792543 0.9825026
#> 6: 6 0.9776325 2.732350e-04 1.039598e-02 0.9610044 0.9703445 0.9746933
#> 7: 7 0.9680494 3.568831e-04 1.378307e-02 0.9461513 0.9583565 0.9640005
#> 8: 8 0.9537233 4.545126e-04 1.796668e-02 0.9253513 0.9410345 0.9482091
#> 9: 9 0.9294879 5.604604e-04 2.314474e-02 0.8934592 0.9133065 0.9222476
#> 10: 10 0.8765754 6.184461e-04 2.874597e-02 0.8324576 0.8569271 0.8681280
#> 11: 11 0.4195982 3.465569e-04 1.642918e-02 0.3939804 0.4085308 0.4151351
#> median upper_20 upper_50 upper_90
#> <num> <num> <num> <num>
#> 1: 1.0000000 1.0000000 1.0000000 1.0000000
#> 2: 0.9973130 0.9977331 0.9984000 0.9995982
#> 3: 0.9940481 0.9949375 0.9963661 0.9990372
#> 4: 0.9899182 0.9913838 0.9937054 0.9982257
#> 5: 0.9846382 0.9867015 0.9901427 0.9969796
#> 6: 0.9776255 0.9804983 0.9851524 0.9950013
#> 7: 0.9678872 0.9716190 0.9778215 0.9916590
#> 8: 0.9532789 0.9579658 0.9661197 0.9851498
#> 9: 0.9283641 0.9344218 0.9445515 0.9703486
#> 10: 0.8746851 0.8821058 0.8946402 0.9256043
#> 11: 0.4188815 0.4228498 0.4293740 0.4473664
# observations linked to truncation adjusted estimates
print(est$obs)
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> <Date> <num> <num> <Date> <int> <int> <int> <int>
#> 1: 2020-03-24 4764 4789 2020-04-01 5244 0 27 5200
#> 2: 2020-03-25 5191 5249 2020-04-01 5182 1 40 5117
#> 3: 2020-03-26 5102 5210 2020-04-01 6060 1 64 5953
#> 4: 2020-03-27 5924 6153 2020-04-01 5751 2 81 5613
#> 5: 2020-03-28 5567 5959 2020-04-01 5547 2 104 5368
#> 6: 2020-03-29 5289 5974 2020-04-01 4503 2 111 4310
#> 7: 2020-03-30 4183 5217 2020-04-01 3046 2 99 2882
#> 8: 2020-03-31 2668 4050 2020-04-01 4012 3 155 3757
#> 9: 2020-04-01 1681 4053 2020-04-01 0 NA 0 0
#> 10: 2020-03-29 5970 5974 2020-04-06 6006 0 19 5975
#> 11: 2020-03-30 5209 5217 2020-04-06 5262 0 27 5218
#> 12: 2020-03-31 4035 4050 2020-04-06 4098 0 31 4047
#> 13: 2020-04-01 4020 4053 2020-04-06 4112 1 43 4040
#> 14: 2020-04-02 4697 4782 2020-04-06 4853 1 69 4736
#> 15: 2020-04-03 4483 4668 2020-04-06 4702 2 88 4550
#> 16: 2020-04-04 4182 4585 2020-04-06 4502 2 111 4309
#> 17: 2020-04-05 3864 4805 2020-04-06 4412 3 143 4174
#> 18: 2020-04-06 2420 4316 2020-04-06 5776 4 224 5409
#> 19: 2020-04-03 4653 4668 2020-04-11 4681 0 15 4657
#> 20: 2020-04-04 4554 4585 2020-04-11 4601 0 24 4562
#> 21: 2020-04-05 4741 4805 2020-04-11 4815 0 37 4755
#> 22: 2020-04-06 4208 4316 2020-04-11 4304 1 45 4229
#> 23: 2020-04-07 3431 3599 2020-04-11 3544 1 50 3459
#> 24: 2020-04-08 2779 3039 2020-04-11 2914 1 54 2820
#> 25: 2020-04-09 3234 3836 2020-04-11 3481 2 86 3332
#> 26: 2020-04-10 2993 4204 2020-04-11 3418 2 111 3233
#> 27: 2020-04-11 1848 3951 2020-04-11 4410 3 171 4130
#> 28: 2020-04-08 3036 3039 2020-04-16 3054 0 10 3038
#> 29: 2020-04-09 3829 3836 2020-04-16 3868 0 20 3835
#> 30: 2020-04-10 4187 4204 2020-04-16 4252 0 32 4199
#> 31: 2020-04-11 3916 3951 2020-04-16 4006 1 42 3935
#> 32: 2020-04-12 4605 4686 2020-04-16 4757 1 67 4643
#> 33: 2020-04-13 3925 4074 2020-04-16 4116 1 77 3984
#> 34: 2020-04-14 2873 3124 2020-04-16 3092 1 76 2960
#> 35: 2020-04-15 2393 2919 2020-04-16 2732 1 88 2585
#> 36: 2020-04-16 1512 2580 2020-04-16 3608 2 140 3379
#> 37: 2020-04-13 4074 4074 2020-04-21 4099 0 13 4077
#> 38: 2020-04-14 3124 3124 2020-04-21 3156 0 16 3129
#> 39: 2020-04-15 2919 2919 2020-04-21 2965 0 22 2927
#> 40: 2020-04-16 2580 2580 2020-04-21 2639 0 28 2592
#> 41: 2020-04-17 3563 3563 2020-04-21 3681 1 52 3592
#> 42: 2020-04-18 3126 3126 2020-04-21 3278 1 61 3173
#> 43: 2020-04-19 2843 2843 2020-04-21 3060 1 75 2929
#> 44: 2020-04-20 2051 2051 2020-04-21 2342 1 76 2215
#> 45: 2020-04-21 962 962 2020-04-21 2296 1 89 2150
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> lower_50 lower_20 median upper_20 upper_50 upper_90
#> <int> <int> <int> <int> <int> <int>
#> 1: 5223 5236 5243 5251 5263 5291
#> 2: 5152 5170 5181 5192 5210 5248
#> 3: 6013 6041 6059 6077 6105 6164
#> 4: 5693 5729 5751 5774 5808 5883
#> 5: 5474 5521 5548 5577 5620 5715
#> 6: 4428 4476 4505 4535 4580 4681
#> 7: 2982 3024 3050 3073 3113 3204
#> 8: 3915 3975 4013 4049 4114 4266
#> 9: 0 0 0 0 0 0
#> 10: 5991 6000 6005 6011 6020 6040
#> 11: 5241 5254 5262 5270 5282 5309
#> 12: 4075 4089 4097 4106 4120 4150
#> 13: 4080 4099 4112 4124 4142 4183
#> 14: 4803 4834 4852 4872 4901 4964
#> 15: 4640 4679 4702 4727 4763 4844
#> 16: 4427 4475 4504 4534 4578 4680
#> 17: 4319 4380 4417 4450 4509 4641
#> 18: 5636 5723 5777 5829 5923 6142
#> 19: 4669 4676 4680 4685 4692 4707
#> 20: 4582 4593 4600 4607 4618 4642
#> 21: 4788 4804 4814 4825 4841 4877
#> 22: 4271 4291 4304 4317 4336 4378
#> 23: 3508 3531 3544 3559 3580 3626
#> 24: 2876 2900 2915 2930 2953 3003
#> 25: 3423 3460 3483 3506 3540 3619
#> 26: 3345 3393 3421 3447 3492 3595
#> 27: 4303 4370 4411 4451 4523 4690
#> 28: 3047 3051 3054 3057 3061 3071
#> 29: 3853 3862 3867 3873 3882 3903
#> 30: 4228 4243 4252 4261 4275 4307
#> 31: 3975 3993 4005 4017 4035 4074
#> 32: 4709 4739 4757 4776 4805 4867
#> 33: 4062 4097 4117 4139 4170 4241
#> 34: 3041 3074 3094 3115 3145 3215
#> 35: 2674 2712 2735 2756 2792 2874
#> 36: 3521 3575 3609 3642 3701 3837
#> 37: 4088 4094 4098 4102 4108 4121
#> 38: 3143 3151 3155 3160 3167 3184
#> 39: 2948 2958 2964 2970 2980 3002
#> 40: 2618 2631 2639 2646 2658 2684
#> 41: 3643 3667 3681 3696 3717 3765
#> 42: 3235 3263 3279 3296 3321 3378
#> 43: 3009 3042 3062 3082 3112 3182
#> 44: 2292 2325 2344 2362 2393 2463
#> 45: 2240 2275 2296 2317 2354 2441
#> lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)
# Pass the truncation distribution to `epinow()`.
# Note, we're using the last snapshot as the observed data as it contains
# all the previous snapshots. Also, we're using the default options for
# illustrative purposes only.
out <- epinow(
example_truncated[[5]],
truncation = trunc_opts(est$dist)
)
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2 logs to the console and:
#> /tmp/RtmpB3yHY0/regional-epinow/2020-04-21.log.
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2.epinow logs to the console and:
#> /tmp/RtmpB3yHY0/epinow/2020-04-21.log.
#> WARN [2025-01-13 16:02:25] epinow: ! No generation time distribution given.
#> ℹ Now using a fixed generation time of 1 day, i.e. the reproduction number is
#> the same as the daily growth rate.
#> ℹ If this was intended then this warning can be silenced by setting `dist =
#> Fixed(1)`'. -
#> WARN [2025-01-13 16:04:59] epinow: There were 34 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. -
#> WARN [2025-01-13 16:04:59] epinow: Examine the pairs() plot to diagnose sampling problems
#> -
plot(out)
options(old_opts)
# }