Skip to contents

[Stable] Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used passed to the truncation argument in regional_epinow(), epinow(), and estimate_infections() to adjust for truncated data and propagate the uncertainty associated with data truncation into the estimates.

See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

  1. The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.

  2. The data set with the latest observations is adjusted for truncation using the truncation distribution.

  3. Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.

This model is then fit using stan with standard normal, or half normal, prior for the mean, standard deviation, 1 over the square root of the overdispersion and additive noise term.

This approach assumes that:

  • Current truncation is related to past truncation.

  • Truncation is a multiplicative scaling of underlying reported cases.

  • Truncation is log normally distributed.

Usage

estimate_truncation(
  data,
  truncation = trunc_opts(LogNormal(meanlog = Normal(0, 1), sdlog = Normal(1, 1), max =
    10)),
  model = NULL,
  stan = stan_opts(),
  CrIs = c(0.2, 0.5, 0.9),
  filter_leading_zeros = FALSE,
  zero_threshold = Inf,
  weigh_delay_priors = FALSE,
  verbose = TRUE,
  ...,
  obs
)

Arguments

data

A list of <data.frame>s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.

truncation

A call to trunc_opts() defining the truncation of the observed data. Defaults to trunc_opts(), i.e. no truncation. See the estimate_truncation() help file for an approach to estimating this from data where the dist list element returned by estimate_truncation() is used as the truncation argument here, thereby propagating the uncertainty in the estimate.

model

A compiled stan model to override the default model. May be useful for package developers or those developing extensions.

stan

A list of stan options as generated by stan_opts(). Defaults to stan_opts(). Can be used to override data, init, and verbose settings if desired.

CrIs

Numeric vector of credible intervals to calculate.

filter_leading_zeros

Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.

zero_threshold

[Experimental] Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using fill.

weigh_delay_priors

Deprecated; use the weight_prior option in trunc_opts() instead.

verbose

Logical, should model fitting progress be returned.

...

Additional parameters to pass to rstan::sampling().

obs

Deprecated; use data instead.

Value

A list containing: the summary parameters of the truncation distribution (dist), which could be passed to the truncation argument of epinow(), regional_epinow(), and estimate_infections(), the estimated CMF of the truncation distribution (cmf, can be used to adjusted new data), a <data.frame> containing the observed truncated data, latest observed data and the adjusted for truncation observations (obs), a <data.frame> containing the last observed data (last_obs, useful for plotting and validation), the data used for fitting (data) and the fit object (fit).

Examples

# \donttest{
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
  verbose = interactive(),
  chains = 2, iter = 2000
)
#> WARN [2025-12-19 17:45:34] estimate_truncation (chain: 1): There were 11 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2025-12-19 17:45:34] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#>  - 

# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#>   meanlog:
#>     - normal distribution:
#>       mean:
#>         -1.9
#>       sd:
#>         0.46
#>   sdlog:
#>     - normal distribution:
#>       mean:
#>         2.2
#>       sd:
#>         0.71
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#>     index      mean      se_mean           sd  lower_90  lower_50  lower_20
#>     <int>     <num>        <num>        <num>     <num>     <num>     <num>
#>  1:     1 1.0000000 2.944730e-18 1.281895e-16 1.0000000 1.0000000 1.0000000
#>  2:     2 0.9972605 4.867348e-05 1.585199e-03 0.9945104 0.9962054 0.9969804
#>  3:     3 0.9939538 1.040728e-04 3.390517e-03 0.9881623 0.9916583 0.9932890
#>  4:     4 0.9898733 1.679187e-04 5.471613e-03 0.9806566 0.9861976 0.9887057
#>  5:     5 0.9846952 2.426087e-04 7.905554e-03 0.9714968 0.9794354 0.9829350
#>  6:     6 0.9778773 3.336925e-04 1.080196e-02 0.9600425 0.9706240 0.9753013
#>  7:     7 0.9684321 4.483195e-04 1.432055e-02 0.9454541 0.9587650 0.9651087
#>  8:     8 0.9543251 5.938731e-04 1.869399e-02 0.9252447 0.9417461 0.9496416
#>  9:     9 0.9304576 7.603693e-04 2.420187e-02 0.8942182 0.9145460 0.9236810
#> 10:    10 0.8779849 8.008940e-04 3.000521e-02 0.8336619 0.8585523 0.8694527
#> 11:    11 0.4204066 4.108028e-04 1.713876e-02 0.3953593 0.4091003 0.4154806
#>        median  upper_20  upper_50  upper_90
#>         <num>     <num>     <num>     <num>
#>  1: 1.0000000 1.0000000 1.0000000 1.0000000
#>  2: 0.9973771 0.9977285 0.9983844 0.9997465
#>  3: 0.9941482 0.9949089 0.9963326 0.9993745
#>  4: 0.9901333 0.9913502 0.9936342 0.9988211
#>  5: 0.9849411 0.9866850 0.9900152 0.9979098
#>  6: 0.9779266 0.9803808 0.9850079 0.9965054
#>  7: 0.9682469 0.9714666 0.9773885 0.9938562
#>  8: 0.9536902 0.9577404 0.9653897 0.9883349
#>  9: 0.9288236 0.9344847 0.9434331 0.9756384
#> 10: 0.8758597 0.8819218 0.8932112 0.9344423
#> 11: 0.4189786 0.4226764 0.4293971 0.4523016
# observations linked to truncation adjusted estimates
print(est$obs)
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>         <Date>   <num>        <num>      <Date> <int>   <int> <int>    <int>
#>  1: 2020-03-24    4764         4789  2020-04-01  5244       0    29     5197
#>  2: 2020-03-25    5191         5249  2020-04-01  5181       1    41     5112
#>  3: 2020-03-26    5102         5210  2020-04-01  6058       2    67     5944
#>  4: 2020-03-27    5924         6153  2020-04-01  5749       2    85     5601
#>  5: 2020-03-28    5567         5959  2020-04-01  5544       3   108     5351
#>  6: 2020-03-29    5289         5974  2020-04-01  4498       3   115     4287
#>  7: 2020-03-30    4183         5217  2020-04-01  3042       2   102     2855
#>  8: 2020-03-31    2668         4050  2020-04-01  4005       3   159     3716
#>  9: 2020-04-01    1681         4053  2020-04-01     0      NA     0        0
#> 10: 2020-03-29    5970         5974  2020-04-06  6006       0    20     5973
#> 11: 2020-03-30    5209         5217  2020-04-06  5262       0    29     5215
#> 12: 2020-03-31    4035         4050  2020-04-06  4097       1    32     4043
#> 13: 2020-04-01    4020         4053  2020-04-06  4111       1    45     4034
#> 14: 2020-04-02    4697         4782  2020-04-06  4851       2    71     4726
#> 15: 2020-04-03    4483         4668  2020-04-06  4699       2    91     4535
#> 16: 2020-04-04    4182         4585  2020-04-06  4497       3   115     4286
#> 17: 2020-04-05    3864         4805  2020-04-06  4406       3   147     4135
#> 18: 2020-04-06    2420         4316  2020-04-06  5765       5   230     5350
#> 19: 2020-04-03    4653         4668  2020-04-11  4681       0    15     4655
#> 20: 2020-04-04    4554         4585  2020-04-11  4600       0    25     4559
#> 21: 2020-04-05    4741         4805  2020-04-11  4814       1    38     4750
#> 22: 2020-04-06    4208         4316  2020-04-11  4303       1    47     4222
#> 23: 2020-04-07    3431         3599  2020-04-11  3543       1    52     3452
#> 24: 2020-04-08    2779         3039  2020-04-11  2913       1    56     2811
#> 25: 2020-04-09    3234         3836  2020-04-11  3478       2    89     3314
#> 26: 2020-04-10    2993         4204  2020-04-11  3412       2   114     3202
#> 27: 2020-04-11    1848         3951  2020-04-11  4402       4   175     4085
#> 28: 2020-04-08    3036         3039  2020-04-16  3054       0    10     3037
#> 29: 2020-04-09    3829         3836  2020-04-16  3868       0    21     3833
#> 30: 2020-04-10    4187         4204  2020-04-16  4252       1    34     4195
#> 31: 2020-04-11    3916         3951  2020-04-16  4005       1    44     3929
#> 32: 2020-04-12    4605         4686  2020-04-16  4756       2    70     4633
#> 33: 2020-04-13    3925         4074  2020-04-16  4114       2    80     3971
#> 34: 2020-04-14    2873         3124  2020-04-16  3089       2    79     2944
#> 35: 2020-04-15    2393         2919  2020-04-16  2728       2    91     2560
#> 36: 2020-04-16    1512         2580  2020-04-16  3602       3   143     3342
#> 37: 2020-04-13    4074         4074  2020-04-21  4098       0    14     4076
#> 38: 2020-04-14    3124         3124  2020-04-21  3156       0    17     3127
#> 39: 2020-04-15    2919         2919  2020-04-21  2964       0    23     2925
#> 40: 2020-04-16    2580         2580  2020-04-21  2638       0    29     2589
#> 41: 2020-04-17    3563         3563  2020-04-21  3679       1    54     3585
#> 42: 2020-04-18    3126         3126  2020-04-21  3276       1    64     3162
#> 43: 2020-04-19    2843         2843  2020-04-21  3057       2    78     2913
#> 44: 2020-04-20    2051         2051  2020-04-21  2338       2    78     2194
#> 45: 2020-04-21     962          962  2020-04-21  2291       2    91     2126
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
#>        <int>    <int>  <int>    <int>    <int>    <int>
#>  1:     5224     5236   5242     5250     5263     5293
#>  2:     5153     5170   5180     5190     5209     5251
#>  3:     6014     6042   6057     6074     6103     6170
#>  4:     5695     5730   5749     5768     5806     5888
#>  5:     5478     5522   5545     5569     5616     5716
#>  6:     4433     4476   4503     4528     4573     4677
#>  7:     2986     3025   3046     3068     3107     3200
#>  8:     3914     3977   4012     4045     4109     4251
#>  9:        0        0      0        0        0        0
#> 10:     5991     6000   6005     6010     6020     6041
#> 11:     5242     5254   5260     5268     5281     5311
#> 12:     4075     4089   4096     4105     4119     4153
#> 13:     4081     4100   4110     4121     4141     4187
#> 14:     4805     4834   4851     4866     4899     4967
#> 15:     4643     4680   4700     4720     4760     4845
#> 16:     4432     4475   4502     4527     4572     4676
#> 17:     4325     4381   4411     4444     4500     4634
#> 18:     5635     5725   5775     5824     5915     6121
#> 19:     4670     4676   4680     4684     4692     4708
#> 20:     4583     4593   4599     4606     4617     4643
#> 21:     4788     4804   4813     4823     4840     4880
#> 22:     4272     4292   4302     4314     4335     4383
#> 23:     3510     3531   3543     3555     3578     3628
#> 24:     2878     2901   2913     2926     2950     3003
#> 25:     3427     3460   3481     3501     3536     3616
#> 26:     3350     3393   3417     3442     3486     3590
#> 27:     4303     4372   4410     4447     4517     4674
#> 28:     3047     3051   3053     3056     3061     3072
#> 29:     3853     3862   3867     3872     3882     3904
#> 30:     4229     4243   4251     4259     4274     4309
#> 31:     3975     3994   4004     4015     4034     4078
#> 32:     4711     4740   4756     4771     4803     4870
#> 33:     4065     4098   4115     4133     4167     4242
#> 34:     3045     3074   3093     3110     3141     3212
#> 35:     2679     2713   2732     2752     2787     2870
#> 36:     3521     3577   3608     3639     3695     3824
#> 37:     4088     4094   4097     4101     4108     4122
#> 38:     3144     3151   3155     3159     3167     3185
#> 39:     2948     2958   2963     2969     2980     3004
#> 40:     2619     2631   2638     2645     2658     2687
#> 41:     3645     3667   3679     3691     3716     3768
#> 42:     3238     3263   3277     3291     3319     3378
#> 43:     3013     3042   3060     3077     3108     3179
#> 44:     2296     2325   2341     2358     2388     2460
#> 45:     2240     2275   2296     2315     2351     2433
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)
#> Ignoring unknown labels:
#>  fill : "Type"


# Pass the truncation distribution to `epinow()`.
# Note, we're using the last snapshot as the observed data as it contains
# all the previous snapshots. Also, we're using the default options for
# illustrative purposes only.
out <- epinow(
  generation_time = generation_time_opts(example_generation_time),
  example_truncated[[5]],
  truncation = trunc_opts(est$dist)
)
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2 logs to the console and:
#> /tmp/RtmpPIfNEV/regional-epinow/2020-04-21.log.
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2.epinow logs to the console and:
#> /tmp/RtmpPIfNEV/epinow/2020-04-21.log.
#> WARN [2025-12-19 17:48:18] epinow: There were 2 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2025-12-19 17:48:18] epinow: Examine the pairs() plot to diagnose sampling problems
#>  - 
plot(out)

options(old_opts)
# }