Skip to contents

[Stable] Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used in regional_epinow(), epinow(), and estimate_infections() to adjust for truncated data. See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

  1. The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.

  2. The data set with the latest observations is adjusted for truncation using the truncation distribution.

  3. Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.

This model is then fit using stan with standard normal, or half normal, prior for the mean, standard deviation, 1 over the square root of the overdispersion and additive noise term.

This approach assumes that:

  • Current truncation is related to past truncation.

  • Truncation is a multiplicative scaling of underlying reported cases.

  • Truncation is log normally distributed.

Usage

estimate_truncation(
  obs,
  max_truncation,
  trunc_max = 10,
  trunc_dist = "lognormal",
  truncation = trunc_opts(LogNormal(meanlog = Normal(0, 1), sdlog = Normal(1, 1), max =
    10)),
  model = NULL,
  stan = stan_opts(),
  CrIs = c(0.2, 0.5, 0.9),
  filter_leading_zeros = FALSE,
  zero_threshold = Inf,
  weigh_delay_priors = FALSE,
  verbose = TRUE,
  ...
)

Arguments

obs

A list of <data.frame>s each containing a date variable and a confirm (numeric) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.

max_truncation

Deprecated; use truncation instead.

trunc_max

Deprecated; use truncation instead.

trunc_dist

Deprecated; use truncation instead.

truncation

A call to trunc_opts() defining the truncation of observed data. Defaults to trunc_opts(). See estimate_truncation() for an approach to estimating truncation from data.

model

A compiled stan model to override the default model. May be useful for package developers or those developing extensions.

stan

A list of stan options as generated by stan_opts(). Defaults to stan_opts(). Can be used to override data, init, and verbose settings if desired.

CrIs

Numeric vector of credible intervals to calculate.

filter_leading_zeros

Logical, defaults to TRUE. Should zeros at the start of the time series be filtered out.

zero_threshold

[Experimental] Numeric defaults to Inf. Indicates if detected zero cases are meaningful by using a threshold number of cases based on the 7-day average. If the average is above this threshold then the zero is replaced using fill.

weigh_delay_priors

Deprecated; use the weight_prior option in trunc_opts() instead.

verbose

Logical, should model fitting progress be returned.

...

Additional parameters to pass to rstan::sampling().

Value

A list containing: the summary parameters of the truncation distribution (dist), the estimated CMF of the truncation distribution (cmf, can be used to adjusted new data), a <data.frame> containing the observed truncated data, latest observed data and the adjusted for truncation observations (obs), a <data.frame> containing the last observed data (last_obs, useful for plotting and validation), the data used for fitting (data) and the fit object (fit).

Examples

# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# fit model to example data
# See [example_truncated] for more details
est <- estimate_truncation(example_truncated,
  verbose = interactive(),
  chains = 2, iter = 2000
)
#> WARN [2024-04-25 14:35:16] estimate_truncation (chain: 1): There were 25 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them. - 
#> WARN [2024-04-25 14:35:16] estimate_truncation (chain: 1): Examine the pairs() plot to diagnose sampling problems
#>  - 

# summary of the distribution
est$dist
#> - lognormal distribution (max: 10):
#>   meanlog:
#>     - normal distribution:
#>       mean:
#>         -1.5
#>       sd:
#>         0.42
#>   sdlog:
#>     - normal distribution:
#>       mean:
#>         2.3
#>       sd:
#>         0.78
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#>     index      mean      se_mean           sd  lower_90  lower_50  lower_20
#>     <int>     <num>        <num>        <num>     <num>     <num>     <num>
#>  1:     1 1.0000000 2.965431e-18 1.266896e-16 1.0000000 1.0000000 1.0000000
#>  2:     2 0.9958943 7.811206e-05 2.425520e-03 0.9916766 0.9941911 0.9953472
#>  3:     3 0.9909983 1.665071e-04 5.164482e-03 0.9820701 0.9873934 0.9897497
#>  4:     4 0.9850382 2.678167e-04 8.292710e-03 0.9708321 0.9792489 0.9830146
#>  5:     5 0.9775902 3.857553e-04 1.191441e-02 0.9575502 0.9693891 0.9745650
#>  6:     6 0.9679567 5.257750e-04 1.617684e-02 0.9414292 0.9567064 0.9637027
#>  7:     7 0.9548908 6.962624e-04 2.129397e-02 0.9208823 0.9397821 0.9488982
#>  8:     8 0.9358834 9.118719e-04 2.758054e-02 0.8930050 0.9161742 0.9276654
#>  9:     9 0.9048426 1.196486e-03 3.545357e-02 0.8529270 0.8796020 0.8938938
#> 10:    10 0.8404026 1.527546e-03 4.452953e-02 0.7789812 0.8095312 0.8263420
#> 11:    11 0.3977947 8.724564e-04 2.540405e-02 0.3631330 0.3808849 0.3895427
#>        median  upper_20  upper_50  upper_90
#>         <num>     <num>     <num>     <num>
#>  1: 1.0000000 1.0000000 1.0000000 1.0000000
#>  2: 0.9960525 0.9966948 0.9977037 0.9996963
#>  3: 0.9912985 0.9926157 0.9948007 0.9992552
#>  4: 0.9854057 0.9875406 0.9910707 0.9985619
#>  5: 0.9778528 0.9809524 0.9861202 0.9973901
#>  6: 0.9679971 0.9721551 0.9792921 0.9953694
#>  7: 0.9543219 0.9598981 0.9694710 0.9921415
#>  8: 0.9345028 0.9418159 0.9535646 0.9849487
#>  9: 0.9020340 0.9113486 0.9262196 0.9684578
#> 10: 0.8366951 0.8467094 0.8644616 0.9176681
#> 11: 0.3952949 0.4010334 0.4109629 0.4415563
# observations linked to truncation adjusted estimates
print(est$obs)
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>         <Date>   <num>        <num>      <Date> <int>   <int> <int>    <int>
#>  1: 2020-03-24    4764         4789  2020-04-01  5238       0    27     5194
#>  2: 2020-03-25    5191         5249  2020-04-01  5179       1    43     5109
#>  3: 2020-03-26    5102         5210  2020-04-01  6060       2    74     5939
#>  4: 2020-03-27    5924         6153  2020-04-01  5752       3    96     5592
#>  5: 2020-03-28    5567         5959  2020-04-01  5541       4   123     5330
#>  6: 2020-03-29    5289         5974  2020-04-01  4473       4   131     4246
#>  7: 2020-03-30    4183         5217  2020-04-01  2953       3   114     2754
#>  8: 2020-03-31    2668         4050  2020-04-01  2005       3   103     1831
#>  9: 2020-04-01    1681         4053  2020-04-01     0      NA     0        0
#> 10: 2020-03-29    5970         5974  2020-04-06  6024       1    31     5974
#> 11: 2020-03-30    5209         5217  2020-04-06  5288       1    44     5216
#> 12: 2020-03-31    4035         4050  2020-04-06  4128       1    50     4045
#> 13: 2020-04-01    4020         4053  2020-04-06  4154       2    69     4038
#> 14: 2020-04-02    4697         4782  2020-04-06  4921       3   109     4734
#> 15: 2020-04-03    4483         4668  2020-04-06  4794       4   140     4551
#> 16: 2020-04-04    4182         4585  2020-04-06  4628       5   179     4318
#> 17: 2020-04-05    3864         4805  2020-04-06  4610       7   237     4210
#> 18: 2020-04-06    2420         4316  2020-04-06  6107      12   373     5480
#> 19: 2020-04-03    4653         4668  2020-04-11  4695       0    24     4656
#> 20: 2020-04-04    4554         4585  2020-04-11  4623       1    39     4560
#> 21: 2020-04-05    4741         4805  2020-04-11  4850       1    59     4753
#> 22: 2020-04-06    4208         4316  2020-04-11  4348       2    72     4227
#> 23: 2020-04-07    3431         3599  2020-04-11  3594       2    80     3458
#> 24: 2020-04-08    2779         3039  2020-04-11  2971       2    87     2821
#> 25: 2020-04-09    3234         3836  2020-04-11  3579       4   138     3339
#> 26: 2020-04-10    2993         4204  2020-04-11  3571       6   183     3261
#> 27: 2020-04-11    1848         3951  2020-04-11  4663       9   285     4185
#> 28: 2020-04-08    3036         3039  2020-04-16  3063       0    15     3038
#> 29: 2020-04-09    3829         3836  2020-04-16  3887       1    32     3834
#> 30: 2020-04-10    4187         4204  2020-04-16  4283       1    52     4197
#> 31: 2020-04-11    3916         3951  2020-04-16  4046       2    67     3934
#> 32: 2020-04-12    4605         4686  2020-04-16  4824       3   107     4641
#> 33: 2020-04-13    3925         4074  2020-04-16  4197       4   123     3984
#> 34: 2020-04-14    2873         3124  2020-04-16  3179       4   123     2966
#> 35: 2020-04-15    2393         2919  2020-04-16  2855       4   146     2607
#> 36: 2020-04-16    1512         2580  2020-04-16  3815       7   233     3424
#> 37: 2020-04-13    4074         4074  2020-04-21  4111       0    21     4077
#> 38: 2020-04-14    3124         3124  2020-04-21  3171       0    26     3128
#> 39: 2020-04-15    2919         2919  2020-04-21  2986       1    36     2926
#> 40: 2020-04-16    2580         2580  2020-04-21  2666       1    44     2592
#> 41: 2020-04-17    3563         3563  2020-04-21  3733       2    83     3591
#> 42: 2020-04-18    3126         3126  2020-04-21  3343       3    98     3173
#> 43: 2020-04-19    2843         2843  2020-04-21  3146       4   121     2935
#> 44: 2020-04-20    2051         2051  2020-04-21  2447       4   125     2235
#> 45: 2020-04-21     962          962  2020-04-21  2427       4   148     2178
#>           date confirm last_confirm report_date  mean se_mean    sd lower_90
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
#>        <int>    <int>  <int>    <int>    <int>    <int>
#>  1:     5218     5229   5236     5244     5257     5285
#>  2:     5147     5166   5177     5190     5210     5255
#>  3:     6007     6039   6058     6078     6111     6186
#>  4:     5684     5726   5751     5776     5818     5913
#>  5:     5455     5509   5542     5573     5627     5743
#>  6:     4386     4441   4476     4509     4565     4684
#>  7:     2880     2927   2957     2984     3033     3128
#>  8:     1944     1985   2009     2034     2076     2157
#>  9:        0        0      0        0        0        0
#> 10:     6001     6014   6022     6031     6046     6078
#> 11:     5255     5274   5286     5299     5319     5365
#> 12:     4091     4113   4126     4140     4162     4213
#> 13:     4105     4135   4152     4171     4201     4270
#> 14:     4844     4893   4921     4949     4997     5100
#> 15:     4701     4759   4797     4832     4893     5020
#> 16:     4515     4588   4636     4678     4754     4903
#> 17:     4469     4563   4618     4676     4773     4960
#> 18:     5888     6034   6122     6212     6353     6664
#> 19:     4677     4687   4693     4701     4712     4737
#> 20:     4595     4611   4621     4632     4650     4690
#> 21:     4807     4833   4848     4864     4890     4951
#> 22:     4296     4328   4347     4366     4398     4469
#> 23:     3539     3574   3595     3615     3650     3725
#> 24:     2914     2950   2973     2995     3033     3111
#> 25:     3491     3548   3585     3617     3676     3791
#> 26:     3462     3534   3577     3621     3697     3842
#> 27:     4496     4608   4674     4744     4851     5089
#> 28:     3051     3058   3062     3067     3074     3091
#> 29:     3863     3877   3885     3895     3910     3944
#> 30:     4245     4268   4281     4296     4319     4372
#> 31:     3998     4028   4045     4063     4093     4159
#> 32:     4750     4797   4825     4852     4900     5000
#> 33:     4116     4167   4200     4231     4284     4395
#> 34:     3101     3152   3185     3214     3266     3368
#> 35:     2768     2826   2860     2895     2956     3071
#> 36:     3679     3770   3824     3881     3969     4163
#> 37:     4095     4104   4109     4116     4126     4148
#> 38:     3152     3163   3170     3177     3190     3217
#> 39:     2960     2975   2985     2995     3011     3048
#> 40:     2634     2653   2665     2677     2696     2740
#> 41:     3675     3711   3733     3754     3791     3869
#> 42:     3278     3319   3345     3369     3412     3500
#> 43:     3069     3119   3151     3180     3232     3333
#> 44:     2372     2422   2451     2482     2533     2632
#> 45:     2340     2398   2433     2469     2525     2649
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)


options(old_opts)