Estimates a truncation distribution from multiple snapshots of the same
data source over time. This distribution can then be used passed to the
truncation
argument in regional_epinow()
, epinow()
, and
estimate_infections()
to adjust for truncated data and propagate the
uncertainty associated with data truncation into the estimates.
See here
for an example of using this approach on Covid-19 data in England. The
functionality offered by this function is now available in a more principled
manner in the epinowcast
R package.
The model of truncation is as follows:
The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.
The data set with the latest observations is adjusted for truncation using the truncation distribution.
Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.
This model is then fit using stan
with standard normal, or half normal,
prior for the mean, standard deviation, 1 over the square root of the
overdispersion and additive noise term.
This approach assumes that:
Current truncation is related to past truncation.
Truncation is a multiplicative scaling of underlying reported cases.
Truncation is log normally distributed.
Arguments
- data
A list of
<data.frame>
s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.- truncation
A call to
trunc_opts()
defining the truncation of the observed data. Defaults totrunc_opts()
, i.e. no truncation. See theestimate_truncation()
help file for an approach to estimating this from data where thedist
list element returned byestimate_truncation()
is used as thetruncation
argument here, thereby propagating the uncertainty in the estimate.- model
A compiled stan model to override the default model. May be useful for package developers or those developing extensions.
- stan
A list of stan options as generated by
stan_opts()
. Defaults tostan_opts()
. Can be used to overridedata
,init
, andverbose
settings if desired.- CrIs
Numeric vector of credible intervals to calculate.
- filter_leading_zeros
Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.
- zero_threshold
Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using
fill
.- weigh_delay_priors
Deprecated; use the
weight_prior
option intrunc_opts()
instead.- verbose
Logical, should model fitting progress be returned.
- ...
Additional parameters to pass to
rstan::sampling()
.- obs
Deprecated; use
data
instead.
Value
A list containing: the summary parameters of the truncation
distribution (dist
), which could be passed to the truncation
argument
of epinow()
, regional_epinow()
, and estimate_infections()
, the
estimated CMF of the truncation distribution (cmf
, can be used to
adjusted new data), a <data.frame>
containing the observed truncated
data, latest observed data and the adjusted for
truncation observations (obs
), a <data.frame>
containing the last
observed data (last_obs
, useful for plotting and validation), the data
used for fitting (data
) and the fit object (fit
).
Examples
# \donttest{
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))
# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
verbose = interactive(),
chains = 2, iter = 2000
)
#> WARN [2024-12-20 21:41:04] estimate_truncation (chain: 1): There were 10 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. -
#> WARN [2024-12-20 21:41:04] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#> -
# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#> meanlog:
#> - normal distribution:
#> mean:
#> -1.9
#> sd:
#> 0.46
#> sdlog:
#> - normal distribution:
#> mean:
#> 2.2
#> sd:
#> 0.66
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#> index mean se_mean sd lower_90 lower_50 lower_20
#> <int> <num> <num> <num> <num> <num> <num>
#> 1: 1 1.0000000 2.868849e-18 1.273934e-16 1.0000000 1.0000000 1.0000000
#> 2: 2 0.9972753 4.514903e-05 1.509180e-03 0.9945940 0.9962185 0.9969473
#> 3: 3 0.9939811 9.660325e-05 3.228858e-03 0.9882674 0.9916965 0.9932484
#> 4: 4 0.9899092 1.558923e-04 5.210631e-03 0.9807389 0.9861973 0.9887055
#> 5: 5 0.9847324 2.250755e-04 7.524563e-03 0.9718232 0.9793838 0.9829247
#> 6: 6 0.9779032 3.069497e-04 1.026728e-02 0.9604877 0.9707168 0.9753269
#> 7: 7 0.9684230 4.052489e-04 1.357207e-02 0.9458148 0.9590183 0.9648448
#> 8: 8 0.9542338 5.239514e-04 1.761111e-02 0.9255068 0.9421863 0.9493498
#> 9: 9 0.9301862 6.599340e-04 2.250936e-02 0.8944398 0.9149302 0.9241066
#> 10: 10 0.8775009 7.111290e-04 2.747865e-02 0.8351377 0.8587945 0.8699125
#> 11: 11 0.4201200 3.822738e-04 1.565393e-02 0.3965888 0.4095389 0.4158845
#> median upper_20 upper_50 upper_90
#> <num> <num> <num> <num>
#> 1: 1.0000000 1.0000000 1.0000000 1.0000000
#> 2: 0.9973828 0.9977871 0.9984367 0.9995866
#> 3: 0.9941755 0.9950177 0.9964247 0.9989926
#> 4: 0.9901591 0.9915443 0.9938176 0.9981254
#> 5: 0.9849896 0.9870101 0.9902510 0.9967929
#> 6: 0.9780120 0.9807912 0.9852898 0.9946402
#> 7: 0.9683102 0.9719617 0.9778383 0.9908507
#> 8: 0.9538794 0.9584895 0.9660560 0.9843506
#> 9: 0.9291598 0.9348722 0.9442827 0.9685577
#> 10: 0.8761752 0.8827219 0.8941617 0.9228567
#> 11: 0.4193444 0.4230904 0.4295957 0.4459676
# observations linked to truncation adjusted estimates
print(est$obs)
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> <Date> <num> <num> <Date> <int> <int> <int> <int>
#> 1: 2020-03-24 4764 4789 2020-04-01 5244 0 27 5200
#> 2: 2020-03-25 5191 5249 2020-04-01 5181 1 39 5118
#> 3: 2020-03-26 5102 5210 2020-04-01 6058 1 63 5955
#> 4: 2020-03-27 5924 6153 2020-04-01 5749 2 80 5618
#> 5: 2020-03-28 5567 5959 2020-04-01 5544 3 102 5373
#> 6: 2020-03-29 5289 5974 2020-04-01 4499 3 108 4318
#> 7: 2020-03-30 4183 5217 2020-04-01 3043 2 94 2891
#> 8: 2020-03-31 2668 4050 2020-04-01 4006 3 147 3769
#> 9: 2020-04-01 1681 4053 2020-04-01 0 NA 0 0
#> 10: 2020-03-29 5970 5974 2020-04-06 6006 0 19 5976
#> 11: 2020-03-30 5209 5217 2020-04-06 5262 0 27 5218
#> 12: 2020-03-31 4035 4050 2020-04-06 4097 0 31 4047
#> 13: 2020-04-01 4020 4053 2020-04-06 4111 1 43 4041
#> 14: 2020-04-02 4697 4782 2020-04-06 4851 2 68 4740
#> 15: 2020-04-03 4483 4668 2020-04-06 4699 2 86 4554
#> 16: 2020-04-04 4182 4585 2020-04-06 4498 3 108 4317
#> 17: 2020-04-05 3864 4805 2020-04-06 4407 3 136 4186
#> 18: 2020-04-06 2420 4316 2020-04-06 5768 5 212 5426
#> 19: 2020-04-03 4653 4668 2020-04-11 4681 0 15 4657
#> 20: 2020-04-04 4554 4585 2020-04-11 4600 0 24 4562
#> 21: 2020-04-05 4741 4805 2020-04-11 4814 1 36 4756
#> 22: 2020-04-06 4208 4316 2020-04-11 4303 1 45 4230
#> 23: 2020-04-07 3431 3599 2020-04-11 3543 1 49 3462
#> 24: 2020-04-08 2779 3039 2020-04-11 2913 1 53 2823
#> 25: 2020-04-09 3234 3836 2020-04-11 3478 2 83 3338
#> 26: 2020-04-10 2993 4204 2020-04-11 3414 2 106 3243
#> 27: 2020-04-11 1848 3951 2020-04-11 4404 3 162 4143
#> 28: 2020-04-08 3036 3039 2020-04-16 3054 0 9 3039
#> 29: 2020-04-09 3829 3836 2020-04-16 3868 0 20 3836
#> 30: 2020-04-10 4187 4204 2020-04-16 4252 0 32 4200
#> 31: 2020-04-11 3916 3951 2020-04-16 4004 1 42 3937
#> 32: 2020-04-12 4605 4686 2020-04-16 4756 1 66 4647
#> 33: 2020-04-13 3925 4074 2020-04-16 4114 2 75 3987
#> 34: 2020-04-14 2873 3124 2020-04-16 3090 2 74 2966
#> 35: 2020-04-15 2393 2919 2020-04-16 2729 2 84 2593
#> 36: 2020-04-16 1512 2580 2020-04-16 3603 3 132 3390
#> 37: 2020-04-13 4074 4074 2020-04-21 4098 0 13 4078
#> 38: 2020-04-14 3124 3124 2020-04-21 3155 0 16 3129
#> 39: 2020-04-15 2919 2919 2020-04-21 2964 0 22 2928
#> 40: 2020-04-16 2580 2580 2020-04-21 2638 0 27 2593
#> 41: 2020-04-17 3563 3563 2020-04-21 3679 1 51 3595
#> 42: 2020-04-18 3126 3126 2020-04-21 3277 1 60 3175
#> 43: 2020-04-19 2843 2843 2020-04-21 3058 2 73 2935
#> 44: 2020-04-20 2051 2051 2020-04-21 2339 1 72 2222
#> 45: 2020-04-21 962 962 2020-04-21 2292 1 84 2157
#> date confirm last_confirm report_date mean se_mean sd lower_90
#> lower_50 lower_20 median upper_20 upper_50 upper_90
#> <int> <int> <int> <int> <int> <int>
#> 1: 5223 5235 5242 5250 5263 5292
#> 2: 5152 5169 5179 5190 5209 5249
#> 3: 6012 6040 6057 6073 6102 6167
#> 4: 5693 5727 5749 5769 5804 5885
#> 5: 5474 5518 5544 5571 5613 5714
#> 6: 4429 4474 4501 4526 4571 4676
#> 7: 2983 3022 3045 3066 3106 3194
#> 8: 3912 3973 4008 4041 4104 4238
#> 9: 0 0 0 0 0 0
#> 10: 5991 5999 6004 6010 6019 6040
#> 11: 5241 5253 5260 5268 5281 5311
#> 12: 4074 4088 4096 4105 4119 4151
#> 13: 4080 4098 4110 4121 4141 4185
#> 14: 4803 4832 4850 4868 4897 4966
#> 15: 4640 4677 4699 4722 4758 4843
#> 16: 4428 4473 4500 4525 4570 4675
#> 17: 4321 4377 4410 4441 4499 4626
#> 18: 5633 5719 5770 5818 5909 6102
#> 19: 4669 4676 4680 4684 4691 4708
#> 20: 4582 4592 4599 4606 4617 4643
#> 21: 4787 4803 4813 4823 4840 4878
#> 22: 4270 4290 4302 4314 4334 4381
#> 23: 3508 3529 3543 3556 3577 3627
#> 24: 2876 2899 2913 2927 2949 3002
#> 25: 3424 3459 3480 3499 3534 3615
#> 26: 3347 3390 3415 3440 3485 3583
#> 27: 4301 4367 4406 4443 4512 4659
#> 28: 3046 3051 3053 3056 3061 3072
#> 29: 3852 3861 3867 3872 3882 3904
#> 30: 4228 4242 4250 4259 4275 4308
#> 31: 3974 3992 4004 4015 4034 4077
#> 32: 4709 4737 4755 4772 4801 4868
#> 33: 4062 4094 4114 4134 4165 4240
#> 34: 3042 3073 3092 3108 3140 3212
#> 35: 2676 2710 2731 2750 2786 2865
#> 36: 3519 3573 3605 3635 3691 3812
#> 37: 4088 4094 4097 4101 4108 4122
#> 38: 3143 3150 3155 3159 3167 3185
#> 39: 2947 2957 2963 2969 2980 3003
#> 40: 2618 2630 2638 2645 2657 2686
#> 41: 3643 3665 3679 3692 3715 3767
#> 42: 3235 3261 3277 3292 3317 3377
#> 43: 3010 3041 3059 3076 3107 3178
#> 44: 2293 2323 2340 2357 2388 2455
#> 45: 2239 2273 2294 2313 2348 2425
#> lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)
# Pass the truncation distribution to `epinow()`.
# Note, we're using the last snapshot as the observed data as it contains
# all the previous snapshots. Also, we're using the default options for
# illustrative purposes only.
out <- epinow(
example_truncated[[5]],
truncation = trunc_opts(est$dist)
)
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2 logs to the console and:
#> /tmp/RtmplSYXcX/regional-epinow/2020-04-21.log.
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2.epinow logs to the console and:
#> /tmp/RtmplSYXcX/epinow/2020-04-21.log.
#> WARN [2024-12-20 21:41:05] epinow: ! No generation time distribution given.
#> ℹ Now using a fixed generation time of 1 day, i.e. the reproduction number is
#> the same as the daily growth rate.
#> ℹ If this was intended then this warning can be silenced by setting `dist =
#> Fixed(1)`'. -
#> WARN [2024-12-20 21:43:18] epinow: There were 16 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. -
#> WARN [2024-12-20 21:43:18] epinow: Examine the pairs() plot to diagnose sampling problems
#> -
plot(out)
options(old_opts)
# }