Skip to contents

[Stable] Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used in regional_epinow, epinow, and estimate_infections to adjust for truncated data. See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

  1. The truncation distribution is assumed to be discretised log normal wit a mean and standard deviation that is informed by the data.

  2. The data set with the latest observations is adjusted for truncation using the truncation distribution.

  3. Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.

This model is then fit using stan with standard normal, or half normal, prior for the mean, standard deviation, 1 over the square root of the overdispersion and additive noise term.

This approach assumes that:

  • Current truncation is related to past truncation.

  • Truncation is a multiplicative scaling of underlying reported cases.

  • Truncation is log normally distributed.

Usage

estimate_truncation(
  obs,
  max_truncation,
  trunc_max = 10,
  trunc_dist = c("lognormal"),
  model = NULL,
  CrIs = c(0.2, 0.5, 0.9),
  verbose = TRUE,
  ...
)

Arguments

obs

A list of data frames each containing a date variable and a confirm (integer) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.

max_truncation

Deprecated; use trunc_max instead.

trunc_max

Integer, defaults to 10. Maximum number of days to include in the truncation distribution.

trunc_dist

Character, defaults to "lognormal". The parametric distribution to be used for truncation.

model

A compiled stan model to override the default model. May be useful for package developers or those developing extensions.

CrIs

Numeric vector of credible intervals to calculate.

verbose

Logical, should model fitting progress be returned.

...

Additional parameters to pass to rstan::sampling.

Value

A list containing: the summary parameters of the truncation distribution (dist), the estimated CMF of the truncation distribution (cmf, can be used to adjusted new data), a data frame containing the observed truncated data, latest observed data and the adjusted for truncation observations (obs), a data frame containing the last observed data (last_obs, useful for plotting and validation), the data used for fitting (data) and the fit object (fit).

Author

Sam Abbott

Examples

# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# get example case counts
reported_cases <- example_confirmed[1:60]

# define example truncation distribution (note not integer adjusted)
trunc <- list(
  mean = convert_to_logmean(3, 2),
  mean_sd = 0.1,
  sd = convert_to_logsd(3, 2),
  sd_sd = 0.1,
  max = 10
)

# apply truncation to example data
construct_truncation <- function(index, cases, dist) {
  set.seed(index)
  cmf <- cumsum(
    dlnorm(
      1:(dist$max + 1),
      rnorm(1, dist$mean, dist$mean_sd),
      rnorm(1, dist$sd, dist$sd_sd)
    )
  )
  cmf <- cmf / cmf[dist$max + 1]
  cmf <- rev(cmf)[-1]
  trunc_cases <- data.table::copy(cases)[1:(.N - index)]
  trunc_cases[(.N - length(cmf) + 1):.N, confirm := as.integer(confirm * cmf)]
  return(trunc_cases)
}
example_data <- purrr::map(c(20, 15, 10, 0),
  construct_truncation,
  cases = reported_cases,
  dist = trunc
)

# fit model to example data
est <- estimate_truncation(example_data,
  verbose = interactive(),
  chains = 2, iter = 2000
)

# summary of the distribution
est$dist
#> $mean
#> [1] 0.745
#> 
#> $mean_sd
#> [1] 0.049
#> 
#> $sd
#> [1] 0.841
#> 
#> $sd_sd
#> [1] 0.058
#> 
#> $max
#> [1] 10
#> 
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
#>     index      mean      se_mean           sd  lower_90  lower_50  lower_20
#>  1:     1 1.0000000 2.323258e-18 1.016578e-16 1.0000000 1.0000000 1.0000000
#>  2:     2 0.9895688 9.306199e-05 2.598503e-03 0.9849659 0.9879570 0.9890751
#>  3:     3 0.9749787 2.037676e-04 5.702845e-03 0.9649637 0.9714167 0.9737968
#>  4:     4 0.9540969 3.332207e-04 9.351138e-03 0.9379148 0.9481680 0.9520518
#>  5:     5 0.9234165 4.790112e-04 1.348741e-02 0.9004103 0.9147455 0.9202467
#>  6:     6 0.8769760 6.296617e-04 1.781110e-02 0.8472604 0.8654910 0.8726362
#>  7:     7 0.8043144 7.520740e-04 2.144412e-02 0.7695263 0.7902834 0.7985765
#>  8:     8 0.6867509 7.661309e-04 2.231039e-02 0.6515176 0.6719414 0.6807497
#>  9:     9 0.4926862 5.222733e-04 1.709335e-02 0.4654699 0.4813277 0.4880190
#> 10:    10 0.1941655 2.602055e-04 1.199601e-02 0.1759491 0.1860716 0.1905802
#>        median  upper_20  upper_50  upper_90
#>  1: 1.0000000 1.0000000 1.0000000 1.0000000
#>  2: 0.9897168 0.9903550 0.9913599 0.9935947
#>  3: 0.9752392 0.9766642 0.9788681 0.9839588
#>  4: 0.9543179 0.9567695 0.9604400 0.9690790
#>  5: 0.9235168 0.9271353 0.9325527 0.9454211
#>  6: 0.8767923 0.8816532 0.8886413 0.9063534
#>  7: 0.8037239 0.8094892 0.8186491 0.8400298
#>  8: 0.6859477 0.6914729 0.7015577 0.7232533
#>  9: 0.4923436 0.4964657 0.5035816 0.5215085
#> 10: 0.1936145 0.1962295 0.2016757 0.2153953
# observations linked to truncation adjusted estimates
print(est$obs)
#>           date confirm last_confirm report_date mean se_mean  sd lower_90
#>  1: 2020-03-24    4751         4789  2020-04-01 4801       0  12     4781
#>  2: 2020-03-25    5163         5249  2020-04-01 5295       1  31     5247
#>  3: 2020-03-26    5049         5210  2020-04-01 5292       1  52     5210
#>  4: 2020-03-27    5804         6153  2020-04-01 6286       3  92     6139
#>  5: 2020-03-28    5345         5959  2020-04-01 6097       4 124     5897
#>  6: 2020-03-29    4860         5974  2020-04-01 6046       5 161     5785
#>  7: 2020-03-30    3474         5217  2020-04-01 5063       5 164     4803
#>  8: 2020-03-31    1721         4050  2020-04-01 3497       3 121     3300
#>  9: 2020-04-01     502         4053  2020-04-01 2595       3 158     2330
#> 10: 2020-03-29    5852         5974  2020-04-06 5913       0  15     5889
#> 11: 2020-03-30    5025         5217  2020-04-06 5154       1  30     5106
#> 12: 2020-03-31    3808         4050  2020-04-06 3991       1  39     3929
#> 13: 2020-04-01    3676         4053  2020-04-06 3981       2  58     3888
#> 14: 2020-04-02    4105         4782  2020-04-06 4682       3  95     4529
#> 15: 2020-04-03    3668         4668  2020-04-06 4563       4 121     4366
#> 16: 2020-04-04    3093         4585  2020-04-06 4508       5 146     4276
#> 17: 2020-04-05    2423         4805  2020-04-06 4923       5 170     4646
#> 18: 2020-04-06    1106         4316  2020-04-06 5717       7 349     5134
#> 19: 2020-04-03    4636         4668  2020-04-11 4684       0  12     4665
#> 20: 2020-04-04    4522         4585  2020-04-11 4638       0  27     4595
#> 21: 2020-04-05    4681         4805  2020-04-11 4906       1  48     4830
#> 22: 2020-04-06    4115         4316  2020-04-11 4457       2  65     4352
#> 23: 2020-04-07    3298         3599  2020-04-11 3762       2  76     3638
#> 24: 2020-04-08    2580         3039  2020-04-11 3209       3  85     3071
#> 25: 2020-04-09    2788         3836  2020-04-11 4063       4 131     3854
#> 26: 2020-04-10    2156         4204  2020-04-11 4381       4 152     4134
#> 27: 2020-04-11     789         3951  2020-04-11 4078       5 249     3663
#> 28: 2020-04-13    4050         4050  2020-04-21 4092       0  10     4076
#> 29: 2020-04-14    3089         3089  2020-04-21 3168       0  18     3139
#> 30: 2020-04-15    2861         2861  2020-04-21 2998       1  29     2952
#> 31: 2020-04-16    2491         2491  2020-04-21 2698       1  39     2634
#> 32: 2020-04-17    3350         3350  2020-04-21 3821       2  77     3696
#> 33: 2020-04-18    2793         2793  2020-04-21 3474       3  92     3324
#> 34: 2020-04-19    2284         2284  2020-04-21 3329       3 108     3157
#> 35: 2020-04-20    1291         1291  2020-04-21 2623       2  91     2475
#> 36: 2020-04-21     302          302  2020-04-21 1561       2  95     1402
#>           date confirm last_confirm report_date mean se_mean  sd lower_90
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
#>  1:     4792     4797   4800     4803     4808     4823
#>  2:     5274     5286   5294     5301     5314     5350
#>  3:     5256     5277   5290     5303     5325     5383
#>  4:     6223     6260   6284     6307     6344     6445
#>  5:     6014     6062   6096     6125     6175     6308
#>  6:     5936     6003   6046     6085     6149     6315
#>  7:     4951     5024   5064     5103     5170     5332
#>  8:     3417     3466   3495     3526     3575     3697
#>  9:     2489     2558   2592     2634     2697     2853
#> 10:     5903     5908   5912     5916     5923     5941
#> 11:     5133     5145   5152     5160     5172     5207
#> 12:     3964     3980   3990     3999     4016     4060
#> 13:     3941     3964   3980     3994     4018     4082
#> 14:     4619     4656   4681     4704     4742     4845
#> 15:     4480     4531   4563     4593     4641     4766
#> 16:     4408     4473   4509     4543     4603     4747
#> 17:     4811     4880   4921     4964     5033     5205
#> 18:     5484     5636   5712     5803     5943     6285
#> 19:     4676     4681   4684     4687     4692     4706
#> 20:     4619     4630   4636     4643     4655     4686
#> 21:     4873     4892   4905     4916     4936     4990
#> 22:     4412     4438   4455     4471     4498     4570
#> 23:     3711     3740   3761     3779     3810     3892
#> 24:     3151     3187   3210     3230     3264     3352
#> 25:     3974     4031   4064     4095     4149     4279
#> 26:     4281     4342   4379     4417     4479     4631
#> 27:     3912     4020   4075     4139     4240     4484
#> 28:     4085     4089   4092     4094     4099     4111
#> 29:     3155     3162   3167     3172     3179     3201
#> 30:     2978     2990   2997     3005     3017     3050
#> 31:     2671     2686   2697     2706     2723     2766
#> 32:     3769     3799   3820     3838     3870     3953
#> 33:     3411     3450   3475     3497     3534     3629
#> 34:     3255     3303   3329     3355     3399     3505
#> 35:     2563     2600   2622     2645     2682     2773
#> 36:     1497     1539   1559     1584     1623     1716
#>     lower_50 lower_20 median upper_20 upper_50 upper_90
# validation plot of observations vs estimates
plot(est)


options(old_opts)