Estimate Truncation of Observed Data — estimate

Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used passed to the truncation argument in regional_epinow(), epinow(), and estimate_infections() to adjust for truncated data and propagate the uncertainty associated with data truncation into the estimates.

See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.
The data set with the latest observations is adjusted for truncation using the truncation distribution.
Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.

This model is then fit using stan with standard normal, or half normal, prior for the mean, standard deviation, 1 over the square root of the overdispersion and additive noise term.

This approach assumes that:

Current truncation is related to past truncation.
Truncation is a multiplicative scaling of underlying reported cases.
Truncation is log normally distributed.

Usage

estimate_truncation(
  data,
  truncation = trunc_opts(LogNormal(meanlog = Normal(0, 1), sdlog = Normal(1, 1), max =
    10)),
  model = NULL,
  stan = stan_opts(),
  CrIs = c(0.2, 0.5, 0.9),
  filter_leading_zeros = FALSE,
  zero_threshold = Inf,
  weigh_delay_priors = FALSE,
  verbose = TRUE,
  ...,
  obs
)

Arguments

data: A list of <data.frame>s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.
truncation: A call to trunc_opts() defining the truncation of the observed data. Defaults to trunc_opts(), i.e. no truncation. See the estimate_truncation() help file for an approach to estimating this from data where the dist list element returned by estimate_truncation() is used as the truncation argument here, thereby propagating the uncertainty in the estimate.
model: A compiled stan model to override the default model. May be useful for package developers or those developing extensions.
stan: A list of stan options as generated by stan_opts(). Defaults to stan_opts(). Can be used to override data, init, and verbose settings if desired.
CrIs: Numeric vector of credible intervals to calculate.
filter_leading_zeros: Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.
zero_threshold: Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using fill.
weigh_delay_priors: Deprecated; use the weight_prior option in trunc_opts() instead.
verbose: Logical, should model fitting progress be returned.
...: Additional parameters to pass to rstan::sampling().
obs: Deprecated; use data instead.

Value

A list containing: the summary parameters of the truncation distribution (dist), which could be passed to the truncation argument of epinow(), regional_epinow(), and estimate_infections(), the estimated CMF of the truncation distribution (cmf, can be used to adjusted new data), a <data.frame> containing the observed truncated data, latest observed data and the adjusted for truncation observations (obs), a <data.frame> containing the last observed data (last_obs, useful for plotting and validation), the data used for fitting (data) and the fit object (fit).

Examples

# \donttest{
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
  verbose = interactive(),
  chains = 2, iter = 2000
)
#> WARN [2025-07-13 16:43:09] estimate_truncation (chain: 1): There were 12 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2025-07-13 16:43:09] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#>  - 

# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#>   meanlog:
#>     - normal distribution:
#>       mean:
#>         -1.9
#>       sd:
#>         0.43
#>   sdlog:
#>     - normal distribution:
#>       mean:
#>         2.2
#>       sd:
#>         0.67
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#>     index      mean      se_mean           sd  lower_90  lower_50  lower_20
#>     <int>     <num>        <num>        <num>     <num>     <num>     <num>
#>  1:     1 1.0000000 2.801529e-18 1.256388e-16 1.0000000 1.0000000 1.0000000
#>  2:     2 0.9971742 4.669561e-05 1.552579e-03 0.9945283 0.9961459 0.9968000
#>  3:     3 0.9937658 1.000945e-04 3.321900e-03 0.9881314 0.9915418 0.9929070
#>  4:     4 0.9895641 1.619024e-04 5.362306e-03 0.9805792 0.9858907 0.9881376
#>  5:     5 0.9842390 2.344679e-04 7.748545e-03 0.9713916 0.9788712 0.9820775
#>  6:     6 0.9772399 3.210931e-04 1.058578e-02 0.9599252 0.9699245 0.9742254
#>  7:     7 0.9675667 4.264622e-04 1.402416e-02 0.9451067 0.9576852 0.9633730
#>  8:     8 0.9531682 5.567981e-04 1.827261e-02 0.9246759 0.9401235 0.9472359
#>  9:     9 0.9289346 7.196446e-04 2.354254e-02 0.8933593 0.9125719 0.9212972
#> 10:    10 0.8762642 7.678436e-04 2.907679e-02 0.8330842 0.8569209 0.8664747
#> 11:    11 0.4195485 3.993806e-04 1.657050e-02 0.3952905 0.4084131 0.4143418
#>        median  upper_20  upper_50  upper_90
#>         <num>     <num>     <num>     <num>
#>  1: 1.0000000 1.0000000 1.0000000 1.0000000
#>  2: 0.9972564 0.9976576 0.9983536 0.9996674
#>  3: 0.9939005 0.9947529 0.9962641 0.9991707
#>  4: 0.9897026 0.9910789 0.9935417 0.9984056
#>  5: 0.9843023 0.9863243 0.9899002 0.9972676
#>  6: 0.9771733 0.9799593 0.9848604 0.9954110
#>  7: 0.9670746 0.9708498 0.9773486 0.9921883
#>  8: 0.9521348 0.9571272 0.9653540 0.9859633
#>  9: 0.9268308 0.9334569 0.9440645 0.9701574
#> 10: 0.8733195 0.8809583 0.8936986 0.9251443
#> 11: 0.4180231 0.4223019 0.4291331 0.4466159
# observations linked to truncation adjusted estimates
print(est$obs)
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>         <Date>   <num>        <num>      <Date> <int>   <int> <int>    <int>
#>  1: 2020-03-24    4764         4789  2020-04-01  5245       0    28     5199
#>  2: 2020-03-25    5191         5249  2020-04-01  5184       1    40     5115
#>  3: 2020-03-26    5102         5210  2020-04-01  6062       1    65     5951
#>  4: 2020-03-27    5924         6153  2020-04-01  5754       2    83     5610
#>  5: 2020-03-28    5567         5959  2020-04-01  5550       3   106     5364
#>  6: 2020-03-29    5289         5974  2020-04-01  4505       3   113     4311
#>  7: 2020-03-30    4183         5217  2020-04-01  3048       2    99     2883
#>  8: 2020-03-31    2668         4050  2020-04-01  4012       3   155     3763
#>  9: 2020-04-01    1681         4053  2020-04-01     0      NA     0        0
#> 10: 2020-03-29    5970         5974  2020-04-06  6007       0    20     5974
#> 11: 2020-03-30    5209         5217  2020-04-06  5264       0    28     5217
#> 12: 2020-03-31    4035         4050  2020-04-06  4099       0    32     4046
#> 13: 2020-04-01    4020         4053  2020-04-06  4114       1    44     4038
#> 14: 2020-04-02    4697         4782  2020-04-06  4855       2    70     4733
#> 15: 2020-04-03    4483         4668  2020-04-06  4704       2    89     4546
#> 16: 2020-04-04    4182         4585  2020-04-06  4504       3   113     4310
#> 17: 2020-04-05    3864         4805  2020-04-06  4414       3   144     4176
#> 18: 2020-04-06    2420         4316  2020-04-06  5776       5   223     5418
#> 19: 2020-04-03    4653         4668  2020-04-11  4682       0    15     4656
#> 20: 2020-04-04    4554         4585  2020-04-11  4602       0    24     4561
#> 21: 2020-04-05    4741         4805  2020-04-11  4817       1    37     4753
#> 22: 2020-04-06    4208         4316  2020-04-11  4306       1    46     4227
#> 23: 2020-04-07    3431         3599  2020-04-11  3546       1    51     3458
#> 24: 2020-04-08    2779         3039  2020-04-11  2916       1    55     2818
#> 25: 2020-04-09    3234         3836  2020-04-11  3483       2    87     3333
#> 26: 2020-04-10    2993         4204  2020-04-11  3419       2   111     3235
#> 27: 2020-04-11    1848         3951  2020-04-11  4411       4   170     4137
#> 28: 2020-04-08    3036         3039  2020-04-16  3055       0    10     3038
#> 29: 2020-04-09    3829         3836  2020-04-16  3869       0    20     3835
#> 30: 2020-04-10    4187         4204  2020-04-16  4254       1    33     4198
#> 31: 2020-04-11    3916         3951  2020-04-16  4007       1    43     3934
#> 32: 2020-04-12    4605         4686  2020-04-16  4760       2    68     4641
#> 33: 2020-04-13    3925         4074  2020-04-16  4119       2    78     3980
#> 34: 2020-04-14    2873         3124  2020-04-16  3094       2    77     2961
#> 35: 2020-04-15    2393         2919  2020-04-16  2733       2    89     2586
#> 36: 2020-04-16    1512         2580  2020-04-16  3609       3   139     3385
#> 37: 2020-04-13    4074         4074  2020-04-21  4099       0    13     4077
#> 38: 2020-04-14    3124         3124  2020-04-21  3157       0    17     3128
#> 39: 2020-04-15    2919         2919  2020-04-21  2965       0    23     2926
#> 40: 2020-04-16    2580         2580  2020-04-21  2640       0    28     2591
#> 41: 2020-04-17    3563         3563  2020-04-21  3683       1    53     3591
#> 42: 2020-04-18    3126         3126  2020-04-21  3280       1    62     3170
#> 43: 2020-04-19    2843         2843  2020-04-21  3062       2    76     2930
#> 44: 2020-04-20    2051         2051  2020-04-21  2343       1    76     2216
#> 45: 2020-04-21     962          962  2020-04-21  2296       2    88     2153
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
#>        <int>    <int>  <int>    <int>    <int>    <int>
#>  1:     5224     5237   5245     5253     5265     5293
#>  2:     5154     5172   5183     5195     5212     5252
#>  3:     6015     6045   6062     6080     6107     6171
#>  4:     5696     5734   5756     5778     5812     5890
#>  5:     5478     5525   5554     5583     5625     5719
#>  6:     4430     4481   4513     4540     4583     4682
#>  7:     2985     3028   3055     3079     3113     3202
#>  8:     3917     3980   4021     4057     4115     4252
#>  9:        0        0      0        0        0        0
#> 10:     5992     6001   6006     6012     6020     6041
#> 11:     5242     5255   5263     5271     5283     5312
#> 12:     4076     4090   4099     4108     4122     4153
#> 13:     4081     4102   4113     4126     4144     4187
#> 14:     4805     4838   4856     4875     4904     4969
#> 15:     4643     4683   4708     4732     4768     4848
#> 16:     4429     4480   4512     4539     4582     4681
#> 17:     4323     4386   4424     4459     4509     4638
#> 18:     5639     5730   5789     5840     5925     6122
#> 19:     4670     4677   4681     4686     4692     4708
#> 20:     4583     4594   4601     4608     4619     4644
#> 21:     4789     4806   4816     4827     4843     4880
#> 22:     4272     4294   4306     4319     4338     4383
#> 23:     3510     3534   3547     3561     3582     3630
#> 24:     2878     2903   2918     2933     2955     3005
#> 25:     3425     3464   3489     3510     3543     3620
#> 26:     3349     3397   3427     3454     3492     3592
#> 27:     4306     4376   4420     4460     4524     4675
#> 28:     3047     3052   3054     3057     3061     3072
#> 29:     3853     3863   3868     3874     3883     3904
#> 30:     4229     4245   4253     4263     4277     4310
#> 31:     3976     3996   4007     4019     4037     4079
#> 32:     4711     4743   4761     4780     4808     4872
#> 33:     4065     4100   4122     4143     4174     4244
#> 34:     3043     3077   3099     3118     3148     3215
#> 35:     2677     2716   2740     2761     2792     2872
#> 36:     3523     3580   3617     3649     3702     3825
#> 37:     4089     4095   4099     4103     4108     4122
#> 38:     3144     3152   3156     3161     3168     3185
#> 39:     2948     2959   2965     2972     2982     3004
#> 40:     2619     2632   2640     2648     2660     2687
#> 41:     3645     3669   3684     3698     3720     3769
#> 42:     3238     3266   3283     3300     3325     3380
#> 43:     3011     3045   3067     3085     3115     3182
#> 44:     2294     2328   2348     2367     2393     2461
#> 45:     2241     2277   2301     2321     2355     2433
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)


# Pass the truncation distribution to `epinow()`.
# Note, we're using the last snapshot as the observed data as it contains
# all the previous snapshots. Also, we're using the default options for
# illustrative purposes only.
out <- epinow(
  generation_time = generation_time_opts(example_generation_time),
  example_truncated[[5]],
  truncation = trunc_opts(est$dist)
)
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2 logs to the console and:
#> /tmp/RtmpzmHRoz/regional-epinow/2020-04-21.log.
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2.epinow logs to the console and:
#> /tmp/RtmpzmHRoz/epinow/2020-04-21.log.
#> WARN [2025-07-13 16:45:59] epinow: There were 12 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2025-07-13 16:45:59] epinow: Examine the pairs() plot to diagnose sampling problems
#>  - 
plot(out)

options(old_opts)
# }