Skip to contents

[Stable] Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used passed to the truncation argument in regional_epinow(), epinow(), and estimate_infections() to adjust for truncated data and propagate the uncertainty associated with data truncation into the estimates.

See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

  1. The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.

  2. The data set with the latest observations is adjusted for truncation using the truncation distribution.

  3. Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.

This model is then fit using stan with standard normal, or half normal, prior for the mean, standard deviation, 1 over the square root of the overdispersion and additive noise term.

This approach assumes that:

  • Current truncation is related to past truncation.

  • Truncation is a multiplicative scaling of underlying reported cases.

  • Truncation is log normally distributed.

Usage

estimate_truncation(
  data,
  truncation = trunc_opts(LogNormal(meanlog = Normal(0, 1), sdlog = Normal(1, 1), max =
    10)),
  model = NULL,
  stan = stan_opts(),
  CrIs = c(0.2, 0.5, 0.9),
  filter_leading_zeros = FALSE,
  zero_threshold = Inf,
  weigh_delay_priors = FALSE,
  verbose = TRUE,
  ...,
  obs
)

Arguments

data

A list of <data.frame>s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.

truncation

A call to trunc_opts() defining the truncation of the observed data. Defaults to trunc_opts(), i.e. no truncation. See the estimate_truncation() help file for an approach to estimating this from data where the dist list element returned by estimate_truncation() is used as the truncation argument here, thereby propagating the uncertainty in the estimate.

model

A compiled stan model to override the default model. May be useful for package developers or those developing extensions.

stan

A list of stan options as generated by stan_opts(). Defaults to stan_opts(). Can be used to override data, init, and verbose settings if desired.

CrIs

Numeric vector of credible intervals to calculate.

filter_leading_zeros

Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.

zero_threshold

[Experimental] Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using fill.

weigh_delay_priors

Deprecated; use the weight_prior option in trunc_opts() instead.

verbose

Logical, should model fitting progress be returned.

...

Additional parameters to pass to rstan::sampling().

obs

Deprecated; use data instead.

Value

A list containing: the summary parameters of the truncation distribution (dist), which could be passed to the truncation argument of epinow(), regional_epinow(), and estimate_infections(), the estimated CMF of the truncation distribution (cmf, can be used to adjusted new data), a <data.frame> containing the observed truncated data, latest observed data and the adjusted for truncation observations (obs), a <data.frame> containing the last observed data (last_obs, useful for plotting and validation), the data used for fitting (data) and the fit object (fit).

Examples

# \donttest{
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
  verbose = interactive(),
  chains = 2, iter = 2000
)
#> WARN [2024-07-25 13:42:58] estimate_truncation (chain: 1): There were 11 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2024-07-25 13:42:58] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#>  - 

# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#>   meanlog:
#>     - normal distribution:
#>       mean:
#>         -1.5
#>       sd:
#>         0.42
#>   sdlog:
#>     - normal distribution:
#>       mean:
#>         2.3
#>       sd:
#>         0.74
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#>     index      mean      se_mean           sd  lower_90  lower_50  lower_20
#>     <int>     <num>        <num>        <num>     <num>     <num>     <num>
#>  1:     1 1.0000000 2.978589e-18 1.285018e-16 1.0000000 1.0000000 1.0000000
#>  2:     2 0.9959462 6.497312e-05 2.369883e-03 0.9918610 0.9942452 0.9953981
#>  3:     3 0.9911081 1.381692e-04 5.049245e-03 0.9825195 0.9874871 0.9898980
#>  4:     4 0.9852131 2.215392e-04 8.112403e-03 0.9716009 0.9793982 0.9831825
#>  5:     5 0.9778386 3.175203e-04 1.166082e-02 0.9583967 0.9694376 0.9749067
#>  6:     6 0.9682883 4.294179e-04 1.583623e-02 0.9422300 0.9567850 0.9640496
#>  7:     7 0.9553150 5.622166e-04 2.083975e-02 0.9217175 0.9402150 0.9497443
#>  8:     8 0.9364028 7.219155e-04 2.694867e-02 0.8937583 0.9172530 0.9287781
#>  9:     9 0.9054164 9.106295e-04 3.442699e-02 0.8520255 0.8820322 0.8952061
#> 10:    10 0.8407753 1.074547e-03 4.215180e-02 0.7763458 0.8124745 0.8283208
#> 11:    11 0.3978766 5.905733e-04 2.367395e-02 0.3610141 0.3822660 0.3910367
#>        median  upper_20  upper_50  upper_90
#>         <num>     <num>     <num>     <num>
#>  1: 1.0000000 1.0000000 1.0000000 1.0000000
#>  2: 0.9961208 0.9967760 0.9977140 0.9996405
#>  3: 0.9914047 0.9928148 0.9948491 0.9991301
#>  4: 0.9856095 0.9877776 0.9911060 0.9983564
#>  5: 0.9782034 0.9813310 0.9862180 0.9971823
#>  6: 0.9684357 0.9725519 0.9793960 0.9952651
#>  7: 0.9552000 0.9602784 0.9695843 0.9917761
#>  8: 0.9355103 0.9419886 0.9540390 0.9844180
#>  9: 0.9031839 0.9109961 0.9259069 0.9680421
#> 10: 0.8369670 0.8477014 0.8651704 0.9165417
#> 11: 0.3960778 0.4018679 0.4111110 0.4394924
# observations linked to truncation adjusted estimates
print(est$obs)
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>         <Date>   <num>        <num>      <Date> <int>   <int> <int>    <int>
#>  1: 2020-03-24    4764         4789  2020-04-01  5237       0    26     5195
#>  2: 2020-03-25    5191         5249  2020-04-01  5178       1    42     5110
#>  3: 2020-03-26    5102         5210  2020-04-01  6059       1    72     5940
#>  4: 2020-03-27    5924         6153  2020-04-01  5750       2    94     5593
#>  5: 2020-03-28    5567         5959  2020-04-01  5539       3   120     5332
#>  6: 2020-03-29    5289         5974  2020-04-01  4470       3   128     4249
#>  7: 2020-03-30    4183         5217  2020-04-01  2950       2   111     2756
#>  8: 2020-03-31    2668         4050  2020-04-01  2004       2    98     1834
#>  9: 2020-04-01    1681         4053  2020-04-01     0      NA     0        0
#> 10: 2020-03-29    5970         5974  2020-04-06  6023       0    30     5975
#> 11: 2020-03-30    5209         5217  2020-04-06  5287       1    43     5217
#> 12: 2020-03-31    4035         4050  2020-04-06  4127       1    49     4046
#> 13: 2020-04-01    4020         4053  2020-04-06  4152       1    68     4039
#> 14: 2020-04-02    4697         4782  2020-04-06  4919       2   107     4735
#> 15: 2020-04-03    4483         4668  2020-04-06  4791       3   137     4553
#> 16: 2020-04-04    4182         4585  2020-04-06  4625       4   174     4320
#> 17: 2020-04-05    3864         4805  2020-04-06  4607       5   227     4215
#> 18: 2020-04-06    2420         4316  2020-04-06  6103       8   356     5506
#> 19: 2020-04-03    4653         4668  2020-04-11  4694       0    23     4657
#> 20: 2020-04-04    4554         4585  2020-04-11  4622       1    38     4561
#> 21: 2020-04-05    4741         4805  2020-04-11  4849       1    57     4754
#> 22: 2020-04-06    4208         4316  2020-04-11  4346       1    71     4228
#> 23: 2020-04-07    3431         3599  2020-04-11  3593       2    78     3459
#> 24: 2020-04-08    2779         3039  2020-04-11  2970       2    85     2822
#> 25: 2020-04-09    3234         3836  2020-04-11  3576       3   134     3340
#> 26: 2020-04-10    2993         4204  2020-04-11  3568       4   176     3265
#> 27: 2020-04-11    1848         3951  2020-04-11  4660       6   271     4204
#> 28: 2020-04-08    3036         3039  2020-04-16  3063       0    15     3038
#> 29: 2020-04-09    3829         3836  2020-04-16  3886       0    32     3835
#> 30: 2020-04-10    4187         4204  2020-04-16  4282       1    51     4198
#> 31: 2020-04-11    3916         3951  2020-04-16  4045       1    66     3934
#> 32: 2020-04-12    4605         4686  2020-04-16  4822       2   105     4643
#> 33: 2020-04-13    3925         4074  2020-04-16  4195       3   120     3987
#> 34: 2020-04-14    2873         3124  2020-04-16  3177       3   119     2967
#> 35: 2020-04-15    2393         2919  2020-04-16  2853       3   140     2610
#> 36: 2020-04-16    1512         2580  2020-04-16  3813       5   222     3440
#> 37: 2020-04-13    4074         4074  2020-04-21  4110       0    20     4077
#> 38: 2020-04-14    3124         3124  2020-04-21  3171       0    26     3129
#> 39: 2020-04-15    2919         2919  2020-04-21  2985       0    35     2927
#> 40: 2020-04-16    2580         2580  2020-04-21  2665       1    43     2592
#> 41: 2020-04-17    3563         3563  2020-04-21  3731       2    81     3592
#> 42: 2020-04-18    3126         3126  2020-04-21  3341       2    95     3175
#> 43: 2020-04-19    2843         2843  2020-04-21  3144       3   118     2936
#> 44: 2020-04-20    2051         2051  2020-04-21  2445       3   120     2237
#> 45: 2020-04-21     962          962  2020-04-21  2426       3   141     2188
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
#>        <int>    <int>  <int>    <int>    <int>    <int>
#>  1:     5217     5228   5236     5243     5256     5283
#>  2:     5147     5165   5176     5189     5209     5251
#>  3:     6006     6036   6056     6076     6110     6181
#>  4:     5684     5724   5748     5774     5818     5908
#>  5:     5454     5507   5537     5568     5625     5738
#>  6:     4384     4440   4471     4503     4560     4680
#>  7:     2881     2928   2953     2980     3024     3131
#>  8:     1942     1983   2008     2029     2068     2165
#>  9:        0        0      0        0        0        0
#> 10:     6000     6013   6021     6030     6045     6076
#> 11:     5255     5273   5285     5298     5318     5361
#> 12:     4091     4111   4124     4138     4162     4210
#> 13:     4104     4133   4151     4169     4201     4266
#> 14:     4844     4891   4917     4945     4995     5095
#> 15:     4698     4759   4792     4826     4887     5015
#> 16:     4516     4590   4630     4671     4741     4908
#> 17:     4466     4558   4616     4664     4755     4977
#> 18:     5886     6021   6109     6188     6330     6703
#> 19:     4677     4686   4693     4700     4711     4735
#> 20:     4594     4610   4620     4631     4649     4687
#> 21:     4807     4831   4846     4863     4890     4946
#> 22:     4296     4326   4345     4364     4398     4466
#> 23:     3538     3572   3591     3612     3649     3722
#> 24:     2912     2950   2970     2992     3029     3109
#> 25:     3492     3549   3580     3612     3666     3795
#> 26:     3459     3530   3576     3613     3683     3855
#> 27:     4495     4598   4665     4725     4834     5118
#> 28:     3051     3057   3062     3066     3074     3090
#> 29:     3863     3876   3884     3894     3909     3940
#> 30:     4245     4266   4280     4294     4318     4368
#> 31:     3998     4026   4043     4062     4092     4156
#> 32:     4749     4795   4820     4848     4897     4996
#> 33:     4114     4166   4195     4225     4279     4391
#> 34:     3102     3153   3180     3209     3257     3371
#> 35:     2765     2822   2859     2888     2945     3082
#> 36:     3677     3762   3817     3866     3955     4188
#> 37:     4095     4103   4109     4115     4125     4146
#> 38:     3152     3162   3169     3177     3189     3215
#> 39:     2959     2974   2984     2994     3011     3045
#> 40:     2634     2652   2664     2676     2696     2738
#> 41:     3674     3710   3730     3751     3789     3865
#> 42:     3276     3318   3341     3365     3408     3497
#> 43:     3070     3120   3147     3175     3223     3336
#> 44:     2370     2419   2450     2476     2524     2641
#> 45:     2340     2393   2428     2460     2516     2664
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)


# Pass the truncation distribution to `epinow()`.
# Note, we're using the last snapshot as the observed data as it contains
# all the previous snapshots. Also, we're using the default options for
# illustrative purposes only.
out <- epinow(
  example_truncated[[5]],
  truncation = trunc_opts(est$dist)
)
#> Logging threshold set at INFO for the EpiNow2 logger
#> Writing EpiNow2 logs to the console and: /tmp/RtmpndnyWS/regional-epinow/2020-04-21.log
#> Logging threshold set at INFO for the EpiNow2.epinow logger
#> Writing EpiNow2.epinow logs to the console and: /tmp/RtmpndnyWS/epinow/2020-04-21.log
#> WARN [2024-07-25 13:43:00] epinow: No generation time distribution given. Assuming a fixed generation time of 1 day, i.e. the reproduction number is the same as the daily growth rate. If this was intended then this warning can be silenced by setting `dist` explicitly to `Fixed(1)`. - gt_opts
#> WARN [2024-07-25 13:47:31] epinow: There were 45 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2024-07-25 13:47:31] epinow: Examine the pairs() plot to diagnose sampling problems
#>  - 
plot(out)

options(old_opts)
# }