Skip to contents

[Stable] Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used passed to the truncation argument in regional_epinow(), epinow(), and estimate_infections() to adjust for truncated data and propagate the uncertainty associated with data truncation into the estimates.

See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

  1. The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.

  2. The data set with the latest observations is adjusted for truncation using the truncation distribution.

  3. Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.

This model is then fit using stan with standard normal, or half normal, prior for the mean, standard deviation, 1 over the square root of the overdispersion and additive noise term.

This approach assumes that:

  • Current truncation is related to past truncation.

  • Truncation is a multiplicative scaling of underlying reported cases.

  • Truncation is log normally distributed.

Usage

estimate_truncation(
  data,
  truncation = trunc_opts(LogNormal(meanlog = Normal(0, 1), sdlog = Normal(1, 1), max =
    10)),
  model = NULL,
  stan = stan_opts(),
  CrIs = c(0.2, 0.5, 0.9),
  filter_leading_zeros = FALSE,
  zero_threshold = Inf,
  weigh_delay_priors = FALSE,
  verbose = TRUE,
  ...,
  obs
)

Arguments

data

A list of <data.frame>s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.

truncation

A call to trunc_opts() defining the truncation of the observed data. Defaults to trunc_opts(), i.e. no truncation. See the estimate_truncation() help file for an approach to estimating this from data where the dist list element returned by estimate_truncation() is used as the truncation argument here, thereby propagating the uncertainty in the estimate.

model

A compiled stan model to override the default model. May be useful for package developers or those developing extensions.

stan

A list of stan options as generated by stan_opts(). Defaults to stan_opts(). Can be used to override data, init, and verbose settings if desired.

CrIs

Numeric vector of credible intervals to calculate.

filter_leading_zeros

Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.

zero_threshold

[Experimental] Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using fill.

weigh_delay_priors

Deprecated; use the weight_prior option in trunc_opts() instead.

verbose

Logical, should model fitting progress be returned.

...

Additional parameters to pass to rstan::sampling().

obs

Deprecated; use data instead.

Value

A list containing: the summary parameters of the truncation distribution (dist), which could be passed to the truncation argument of epinow(), regional_epinow(), and estimate_infections(), the estimated CMF of the truncation distribution (cmf, can be used to adjusted new data), a <data.frame> containing the observed truncated data, latest observed data and the adjusted for truncation observations (obs), a <data.frame> containing the last observed data (last_obs, useful for plotting and validation), the data used for fitting (data) and the fit object (fit).

Examples

# \donttest{
# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
  verbose = interactive(),
  chains = 2, iter = 2000
)
#> WARN [2024-12-20 21:41:04] estimate_truncation (chain: 1): There were 10 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2024-12-20 21:41:04] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#>  - 

# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#>   meanlog:
#>     - normal distribution:
#>       mean:
#>         -1.9
#>       sd:
#>         0.46
#>   sdlog:
#>     - normal distribution:
#>       mean:
#>         2.2
#>       sd:
#>         0.66
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#>     index      mean      se_mean           sd  lower_90  lower_50  lower_20
#>     <int>     <num>        <num>        <num>     <num>     <num>     <num>
#>  1:     1 1.0000000 2.868849e-18 1.273934e-16 1.0000000 1.0000000 1.0000000
#>  2:     2 0.9972753 4.514903e-05 1.509180e-03 0.9945940 0.9962185 0.9969473
#>  3:     3 0.9939811 9.660325e-05 3.228858e-03 0.9882674 0.9916965 0.9932484
#>  4:     4 0.9899092 1.558923e-04 5.210631e-03 0.9807389 0.9861973 0.9887055
#>  5:     5 0.9847324 2.250755e-04 7.524563e-03 0.9718232 0.9793838 0.9829247
#>  6:     6 0.9779032 3.069497e-04 1.026728e-02 0.9604877 0.9707168 0.9753269
#>  7:     7 0.9684230 4.052489e-04 1.357207e-02 0.9458148 0.9590183 0.9648448
#>  8:     8 0.9542338 5.239514e-04 1.761111e-02 0.9255068 0.9421863 0.9493498
#>  9:     9 0.9301862 6.599340e-04 2.250936e-02 0.8944398 0.9149302 0.9241066
#> 10:    10 0.8775009 7.111290e-04 2.747865e-02 0.8351377 0.8587945 0.8699125
#> 11:    11 0.4201200 3.822738e-04 1.565393e-02 0.3965888 0.4095389 0.4158845
#>        median  upper_20  upper_50  upper_90
#>         <num>     <num>     <num>     <num>
#>  1: 1.0000000 1.0000000 1.0000000 1.0000000
#>  2: 0.9973828 0.9977871 0.9984367 0.9995866
#>  3: 0.9941755 0.9950177 0.9964247 0.9989926
#>  4: 0.9901591 0.9915443 0.9938176 0.9981254
#>  5: 0.9849896 0.9870101 0.9902510 0.9967929
#>  6: 0.9780120 0.9807912 0.9852898 0.9946402
#>  7: 0.9683102 0.9719617 0.9778383 0.9908507
#>  8: 0.9538794 0.9584895 0.9660560 0.9843506
#>  9: 0.9291598 0.9348722 0.9442827 0.9685577
#> 10: 0.8761752 0.8827219 0.8941617 0.9228567
#> 11: 0.4193444 0.4230904 0.4295957 0.4459676
# observations linked to truncation adjusted estimates
print(est$obs)
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>         <Date>   <num>        <num>      <Date> <int>   <int> <int>    <int>
#>  1: 2020-03-24    4764         4789  2020-04-01  5244       0    27     5200
#>  2: 2020-03-25    5191         5249  2020-04-01  5181       1    39     5118
#>  3: 2020-03-26    5102         5210  2020-04-01  6058       1    63     5955
#>  4: 2020-03-27    5924         6153  2020-04-01  5749       2    80     5618
#>  5: 2020-03-28    5567         5959  2020-04-01  5544       3   102     5373
#>  6: 2020-03-29    5289         5974  2020-04-01  4499       3   108     4318
#>  7: 2020-03-30    4183         5217  2020-04-01  3043       2    94     2891
#>  8: 2020-03-31    2668         4050  2020-04-01  4006       3   147     3769
#>  9: 2020-04-01    1681         4053  2020-04-01     0      NA     0        0
#> 10: 2020-03-29    5970         5974  2020-04-06  6006       0    19     5976
#> 11: 2020-03-30    5209         5217  2020-04-06  5262       0    27     5218
#> 12: 2020-03-31    4035         4050  2020-04-06  4097       0    31     4047
#> 13: 2020-04-01    4020         4053  2020-04-06  4111       1    43     4041
#> 14: 2020-04-02    4697         4782  2020-04-06  4851       2    68     4740
#> 15: 2020-04-03    4483         4668  2020-04-06  4699       2    86     4554
#> 16: 2020-04-04    4182         4585  2020-04-06  4498       3   108     4317
#> 17: 2020-04-05    3864         4805  2020-04-06  4407       3   136     4186
#> 18: 2020-04-06    2420         4316  2020-04-06  5768       5   212     5426
#> 19: 2020-04-03    4653         4668  2020-04-11  4681       0    15     4657
#> 20: 2020-04-04    4554         4585  2020-04-11  4600       0    24     4562
#> 21: 2020-04-05    4741         4805  2020-04-11  4814       1    36     4756
#> 22: 2020-04-06    4208         4316  2020-04-11  4303       1    45     4230
#> 23: 2020-04-07    3431         3599  2020-04-11  3543       1    49     3462
#> 24: 2020-04-08    2779         3039  2020-04-11  2913       1    53     2823
#> 25: 2020-04-09    3234         3836  2020-04-11  3478       2    83     3338
#> 26: 2020-04-10    2993         4204  2020-04-11  3414       2   106     3243
#> 27: 2020-04-11    1848         3951  2020-04-11  4404       3   162     4143
#> 28: 2020-04-08    3036         3039  2020-04-16  3054       0     9     3039
#> 29: 2020-04-09    3829         3836  2020-04-16  3868       0    20     3836
#> 30: 2020-04-10    4187         4204  2020-04-16  4252       0    32     4200
#> 31: 2020-04-11    3916         3951  2020-04-16  4004       1    42     3937
#> 32: 2020-04-12    4605         4686  2020-04-16  4756       1    66     4647
#> 33: 2020-04-13    3925         4074  2020-04-16  4114       2    75     3987
#> 34: 2020-04-14    2873         3124  2020-04-16  3090       2    74     2966
#> 35: 2020-04-15    2393         2919  2020-04-16  2729       2    84     2593
#> 36: 2020-04-16    1512         2580  2020-04-16  3603       3   132     3390
#> 37: 2020-04-13    4074         4074  2020-04-21  4098       0    13     4078
#> 38: 2020-04-14    3124         3124  2020-04-21  3155       0    16     3129
#> 39: 2020-04-15    2919         2919  2020-04-21  2964       0    22     2928
#> 40: 2020-04-16    2580         2580  2020-04-21  2638       0    27     2593
#> 41: 2020-04-17    3563         3563  2020-04-21  3679       1    51     3595
#> 42: 2020-04-18    3126         3126  2020-04-21  3277       1    60     3175
#> 43: 2020-04-19    2843         2843  2020-04-21  3058       2    73     2935
#> 44: 2020-04-20    2051         2051  2020-04-21  2339       1    72     2222
#> 45: 2020-04-21     962          962  2020-04-21  2292       1    84     2157
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
#>        <int>    <int>  <int>    <int>    <int>    <int>
#>  1:     5223     5235   5242     5250     5263     5292
#>  2:     5152     5169   5179     5190     5209     5249
#>  3:     6012     6040   6057     6073     6102     6167
#>  4:     5693     5727   5749     5769     5804     5885
#>  5:     5474     5518   5544     5571     5613     5714
#>  6:     4429     4474   4501     4526     4571     4676
#>  7:     2983     3022   3045     3066     3106     3194
#>  8:     3912     3973   4008     4041     4104     4238
#>  9:        0        0      0        0        0        0
#> 10:     5991     5999   6004     6010     6019     6040
#> 11:     5241     5253   5260     5268     5281     5311
#> 12:     4074     4088   4096     4105     4119     4151
#> 13:     4080     4098   4110     4121     4141     4185
#> 14:     4803     4832   4850     4868     4897     4966
#> 15:     4640     4677   4699     4722     4758     4843
#> 16:     4428     4473   4500     4525     4570     4675
#> 17:     4321     4377   4410     4441     4499     4626
#> 18:     5633     5719   5770     5818     5909     6102
#> 19:     4669     4676   4680     4684     4691     4708
#> 20:     4582     4592   4599     4606     4617     4643
#> 21:     4787     4803   4813     4823     4840     4878
#> 22:     4270     4290   4302     4314     4334     4381
#> 23:     3508     3529   3543     3556     3577     3627
#> 24:     2876     2899   2913     2927     2949     3002
#> 25:     3424     3459   3480     3499     3534     3615
#> 26:     3347     3390   3415     3440     3485     3583
#> 27:     4301     4367   4406     4443     4512     4659
#> 28:     3046     3051   3053     3056     3061     3072
#> 29:     3852     3861   3867     3872     3882     3904
#> 30:     4228     4242   4250     4259     4275     4308
#> 31:     3974     3992   4004     4015     4034     4077
#> 32:     4709     4737   4755     4772     4801     4868
#> 33:     4062     4094   4114     4134     4165     4240
#> 34:     3042     3073   3092     3108     3140     3212
#> 35:     2676     2710   2731     2750     2786     2865
#> 36:     3519     3573   3605     3635     3691     3812
#> 37:     4088     4094   4097     4101     4108     4122
#> 38:     3143     3150   3155     3159     3167     3185
#> 39:     2947     2957   2963     2969     2980     3003
#> 40:     2618     2630   2638     2645     2657     2686
#> 41:     3643     3665   3679     3692     3715     3767
#> 42:     3235     3261   3277     3292     3317     3377
#> 43:     3010     3041   3059     3076     3107     3178
#> 44:     2293     2323   2340     2357     2388     2455
#> 45:     2239     2273   2294     2313     2348     2425
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)


# Pass the truncation distribution to `epinow()`.
# Note, we're using the last snapshot as the observed data as it contains
# all the previous snapshots. Also, we're using the default options for
# illustrative purposes only.
out <- epinow(
  example_truncated[[5]],
  truncation = trunc_opts(est$dist)
)
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2 logs to the console and:
#> /tmp/RtmplSYXcX/regional-epinow/2020-04-21.log.
#> Logging threshold set at INFO for the name logger
#> Writing EpiNow2.epinow logs to the console and:
#> /tmp/RtmplSYXcX/epinow/2020-04-21.log.
#> WARN [2024-12-20 21:41:05] epinow: ! No generation time distribution given.
#>  Now using a fixed generation time of 1 day, i.e. the reproduction number is
#>   the same as the daily growth rate.
#>  If this was intended then this warning can be silenced by setting `dist =
#>   Fixed(1)`'. - 
#> WARN [2024-12-20 21:43:18] epinow: There were 16 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2024-12-20 21:43:18] epinow: Examine the pairs() plot to diagnose sampling problems
#>  - 
plot(out)

options(old_opts)
# }