Estimate Truncation of Observed Data — estimate

Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used passed to the truncation argument in regional_epinow(), epinow(), and estimate_infections() to adjust for truncated data and propagate the uncertainty associated with data truncation into the estimates.

See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.
The data set with the latest observations is adjusted for truncation using the truncation distribution.
Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.

This model is then fit using stan with standard normal, or half normal, prior for the mean, standard deviation, 1 over the square root of the overdispersion and additive noise term.

This approach assumes that:

Current truncation is related to past truncation.
Truncation is a multiplicative scaling of underlying reported cases.
Truncation is log normally distributed.

Usage

estimate_truncation(
  data,
  truncation = trunc_opts(LogNormal(meanlog = Normal(0, 1), sdlog = Normal(1, 1), max =
    10)),
  model = NULL,
  stan = stan_opts(),
  CrIs = c(0.2, 0.5, 0.9),
  filter_leading_zeros = FALSE,
  zero_threshold = Inf,
  weigh_delay_priors = FALSE,
  verbose = TRUE,
  ...,
  obs
)

Arguments

data: A list of <data.frame>s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.
truncation: A call to trunc_opts() defining the truncation of the observed data. Defaults to trunc_opts(), i.e. no truncation. See the estimate_truncation() help file for an approach to estimating this from data where the dist list element returned by estimate_truncation() is used as the truncation argument here, thereby propagating the uncertainty in the estimate.
model: A compiled stan model to override the default model. May be useful for package developers or those developing extensions.
stan: A list of stan options as generated by stan_opts(). Defaults to stan_opts(). Can be used to override data, init, and verbose settings if desired.
CrIs: Numeric vector of credible intervals to calculate.
filter_leading_zeros: Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.
zero_threshold: Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using fill.
weigh_delay_priors: Deprecated; use the weight_prior option in trunc_opts() instead.
verbose: Logical, should model fitting progress be returned.
...: Additional parameters to pass to rstan::sampling().
obs: Deprecated; use data instead.

Value

A list containing: the summary parameters of the truncation distribution (dist), which could be passed to the truncation argument of epinow(), regional_epinow(), and estimate_infections(), the estimated CMF of the truncation distribution (cmf, can be used to adjusted new data), a <data.frame> containing the observed truncated data, latest observed data and the adjusted for truncation observations (obs), a <data.frame> containing the last observed data (last_obs, useful for plotting and validation), the data used for fitting (data) and the fit object (fit).

Examples

# \donttest{
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
  verbose = interactive(),
  chains = 2, iter = 2000
)
#> WARN [2025-02-20 09:31:50] estimate_truncation (chain: 1): There were 7 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2025-02-20 09:31:50] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#>  - 

# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#>   meanlog:
#>     - normal distribution:
#>       mean:
#>         -1.9
#>       sd:
#>         0.45
#>   sdlog:
#>     - normal distribution:
#>       mean:
#>         2.2
#>       sd:
#>         0.65
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#>     index      mean      se_mean           sd  lower_90  lower_50  lower_20
#>     <int>     <num>        <num>        <num>     <num>     <num>     <num>
#>  1:     1 1.0000000 2.996384e-18 1.241081e-16 1.0000000 1.0000000 1.0000000
#>  2:     2 0.9972401 4.119192e-05 1.519567e-03 0.9946467 0.9961854 0.9968822
#>  3:     3 0.9939038 8.772757e-05 3.253960e-03 0.9883974 0.9916302 0.9930961
#>  4:     4 0.9897809 1.408050e-04 5.257126e-03 0.9810246 0.9861410 0.9883933
#>  5:     5 0.9845412 2.019796e-04 7.603159e-03 0.9720983 0.9792543 0.9825026
#>  6:     6 0.9776325 2.732350e-04 1.039598e-02 0.9610044 0.9703445 0.9746933
#>  7:     7 0.9680494 3.568831e-04 1.378307e-02 0.9461513 0.9583565 0.9640005
#>  8:     8 0.9537233 4.545126e-04 1.796668e-02 0.9253513 0.9410345 0.9482091
#>  9:     9 0.9294879 5.604604e-04 2.314474e-02 0.8934592 0.9133065 0.9222476
#> 10:    10 0.8765754 6.184461e-04 2.874597e-02 0.8324576 0.8569271 0.8681280
#> 11:    11 0.4195982 3.465569e-04 1.642918e-02 0.3939804 0.4085308 0.4151351
#>        median  upper_20  upper_50  upper_90
#>         <num>     <num>     <num>     <num>
#>  1: 1.0000000 1.0000000 1.0000000 1.0000000
#>  2: 0.9973130 0.9977331 0.9984000 0.9995982
#>  3: 0.9940481 0.9949375 0.9963661 0.9990372
#>  4: 0.9899182 0.9913838 0.9937054 0.9982257
#>  5: 0.9846382 0.9867015 0.9901427 0.9969796
#>  6: 0.9776255 0.9804983 0.9851524 0.9950013
#>  7: 0.9678872 0.9716190 0.9778215 0.9916590
#>  8: 0.9532789 0.9579658 0.9661197 0.9851498
#>  9: 0.9283641 0.9344218 0.9445515 0.9703486
#> 10: 0.8746851 0.8821058 0.8946402 0.9256043
#> 11: 0.4188815 0.4228498 0.4293740 0.4473664
# observations linked to truncation adjusted estimates
print(est$obs)
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>         <Date>   <num>        <num>      <Date> <int>   <int> <int>    <int>
#>  1: 2020-03-24    4764         4789  2020-04-01  5244       0    27     5200
#>  2: 2020-03-25    5191         5249  2020-04-01  5182       1    40     5117
#>  3: 2020-03-26    5102         5210  2020-04-01  6060       1    64     5953
#>  4: 2020-03-27    5924         6153  2020-04-01  5751       2    81     5613
#>  5: 2020-03-28    5567         5959  2020-04-01  5547       2   104     5368
#>  6: 2020-03-29    5289         5974  2020-04-01  4503       2   111     4310
#>  7: 2020-03-30    4183         5217  2020-04-01  3046       2    99     2882
#>  8: 2020-03-31    2668         4050  2020-04-01  4012       3   155     3757
#>  9: 2020-04-01    1681         4053  2020-04-01     0      NA     0        0
#> 10: 2020-03-29    5970         5974  2020-04-06  6006       0    19     5975
#> 11: 2020-03-30    5209         5217  2020-04-06  5262       0    27     5218
#> 12: 2020-03-31    4035         4050  2020-04-06  4098       0    31     4047
#> 13: 2020-04-01    4020         4053  2020-04-06  4112       1    43     4040
#> 14: 2020-04-02    4697         4782  2020-04-06  4853       1    69     4736
#> 15: 2020-04-03    4483         4668  2020-04-06  4702       2    88     4550
#> 16: 2020-04-04    4182         4585  2020-04-06  4502       2   111     4309
#> 17: 2020-04-05    3864         4805  2020-04-06  4412       3   143     4174
#> 18: 2020-04-06    2420         4316  2020-04-06  5776       4   224     5409
#> 19: 2020-04-03    4653         4668  2020-04-11  4681       0    15     4657
#> 20: 2020-04-04    4554         4585  2020-04-11  4601       0    24     4562
#> 21: 2020-04-05    4741         4805  2020-04-11  4815       0    37     4755
#> 22: 2020-04-06    4208         4316  2020-04-11  4304       1    45     4229
#> 23: 2020-04-07    3431         3599  2020-04-11  3544       1    50     3459
#> 24: 2020-04-08    2779         3039  2020-04-11  2914       1    54     2820
#> 25: 2020-04-09    3234         3836  2020-04-11  3481       2    86     3332
#> 26: 2020-04-10    2993         4204  2020-04-11  3418       2   111     3233
#> 27: 2020-04-11    1848         3951  2020-04-11  4410       3   171     4130
#> 28: 2020-04-08    3036         3039  2020-04-16  3054       0    10     3038
#> 29: 2020-04-09    3829         3836  2020-04-16  3868       0    20     3835
#> 30: 2020-04-10    4187         4204  2020-04-16  4252       0    32     4199
#> 31: 2020-04-11    3916         3951  2020-04-16  4006       1    42     3935
#> 32: 2020-04-12    4605         4686  2020-04-16  4757       1    67     4643
#> 33: 2020-04-13    3925         4074  2020-04-16  4116       1    77     3984
#> 34: 2020-04-14    2873         3124  2020-04-16  3092       1    76     2960
#> 35: 2020-04-15    2393         2919  2020-04-16  2732       1    88     2585
#> 36: 2020-04-16    1512         2580  2020-04-16  3608       2   140     3379
#> 37: 2020-04-13    4074         4074  2020-04-21  4099       0    13     4077
#> 38: 2020-04-14    3124         3124  2020-04-21  3156       0    16     3129
#> 39: 2020-04-15    2919         2919  2020-04-21  2965       0    22     2927
#> 40: 2020-04-16    2580         2580  2020-04-21  2639       0    28     2592
#> 41: 2020-04-17    3563         3563  2020-04-21  3681       1    52     3592
#> 42: 2020-04-18    3126         3126  2020-04-21  3278       1    61     3173
#> 43: 2020-04-19    2843         2843  2020-04-21  3060       1    75     2929
#> 44: 2020-04-20    2051         2051  2020-04-21  2342       1    76     2215
#> 45: 2020-04-21     962          962  2020-04-21  2296       1    89     2150
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
#>        <int>    <int>  <int>    <int>    <int>    <int>
#>  1:     5223     5236   5243     5251     5263     5291
#>  2:     5152     5170   5181     5192     5210     5248
#>  3:     6013     6041   6059     6077     6105     6164
#>  4:     5693     5729   5751     5774     5808     5883
#>  5:     5474     5521   5548     5577     5620     5715
#>  6:     4428     4476   4505     4535     4580     4681
#>  7:     2982     3024   3050     3073     3113     3204
#>  8:     3915     3975   4013     4049     4114     4266
#>  9:        0        0      0        0        0        0
#> 10:     5991     6000   6005     6011     6020     6040
#> 11:     5241     5254   5262     5270     5282     5309
#> 12:     4075     4089   4097     4106     4120     4150
#> 13:     4080     4099   4112     4124     4142     4183
#> 14:     4803     4834   4852     4872     4901     4964
#> 15:     4640     4679   4702     4727     4763     4844
#> 16:     4427     4475   4504     4534     4578     4680
#> 17:     4319     4380   4417     4450     4509     4641
#> 18:     5636     5723   5777     5829     5923     6142
#> 19:     4669     4676   4680     4685     4692     4707
#> 20:     4582     4593   4600     4607     4618     4642
#> 21:     4788     4804   4814     4825     4841     4877
#> 22:     4271     4291   4304     4317     4336     4378
#> 23:     3508     3531   3544     3559     3580     3626
#> 24:     2876     2900   2915     2930     2953     3003
#> 25:     3423     3460   3483     3506     3540     3619
#> 26:     3345     3393   3421     3447     3492     3595
#> 27:     4303     4370   4411     4451     4523     4690
#> 28:     3047     3051   3054     3057     3061     3071
#> 29:     3853     3862   3867     3873     3882     3903
#> 30:     4228     4243   4252     4261     4275     4307
#> 31:     3975     3993   4005     4017     4035     4074
#> 32:     4709     4739   4757     4776     4805     4867
#> 33:     4062     4097   4117     4139     4170     4241
#> 34:     3041     3074   3094     3115     3145     3215
#> 35:     2674     2712   2735     2756     2792     2874
#> 36:     3521     3575   3609     3642     3701     3837
#> 37:     4088     4094   4098     4102     4108     4121
#> 38:     3143     3151   3155     3160     3167     3184
#> 39:     2948     2958   2964     2970     2980     3002
#> 40:     2618     2631   2639     2646     2658     2684
#> 41:     3643     3667   3681     3696     3717     3765
#> 42:     3235     3263   3279     3296     3321     3378
#> 43:     3009     3042   3062     3082     3112     3182
#> 44:     2292     2325   2344     2362     2393     2463
#> 45:     2240     2275   2296     2317     2354     2441
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)


# Pass the truncation distribution to `epinow()`.
# Note, we're using the last snapshot as the observed data as it contains
# all the previous snapshots. Also, we're using the default options for
# illustrative purposes only.
out <- epinow(
  generation_time = generation_time_opts(example_generation_time),
  example_truncated[[5]],
  truncation = trunc_opts(est$dist)
)
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2 logs to the console and:
#> /tmp/RtmpXOk7Bx/regional-epinow/2020-04-21.log.
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2.epinow logs to the console and:
#> /tmp/RtmpXOk7Bx/epinow/2020-04-21.log.
#> WARN [2025-02-20 09:34:42] epinow: There were 4 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2025-02-20 09:34:42] epinow: Examine the pairs() plot to diagnose sampling problems
#>  - 
plot(out)

options(old_opts)
# }