Approximate Sampling of a Distribution Using Counts

sample_approx_dist(
  cases = NULL,
  dist_fn = NULL,
  max_value = 120,
  earliest_allowed_mapped = NULL,
  direction = "backwards",
  type = "sample",
  truncate_future = TRUE
)

Arguments

cases

A data frame of case counts, in date order, with the variables date and cases.

dist_fn

Function that takes two arguments: the first numeric and the second a logical named dist. When dist = TRUE it should return the (discretised) probability of each value in the first argument; when dist = FALSE it should return a random sample of that size from the distribution. See the examples for more.
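A minimal sketch of another function with the same interface, assuming a log-normal delay instead of the gamma delay used in the examples. The name lognormal_delay_fn and the parameters meanlog = 1 and sdlog = 0.5 are illustrative assumptions, not part of the package. The optional cum argument only mirrors the example's delay_fn signature and is not used.

```r
# Hypothetical dist_fn for a log-normal delay (meanlog = 1 and sdlog = 0.5
# are illustrative values, not taken from the package)
lognormal_delay_fn <- function(n, dist, cum = FALSE) {
  meanlog <- 1
  sdlog <- 0.5
  if (dist) {
    # Probability that the continuous delay falls in [n, n + 1),
    # i.e. that the delay floored to whole days equals n
    plnorm(n + 1, meanlog, sdlog) - plnorm(n, meanlog, sdlog)
  } else {
    # n random delays, floored to whole days
    as.integer(rlnorm(n, meanlog, sdlog))
  }
}

probs <- lognormal_delay_fn(0:120, dist = TRUE)
sum(probs)
lognormal_delay_fn(5, dist = FALSE)
```

Binning on [n, n + 1) matches what as.integer() does to positive samples, so the dist = TRUE and dist = FALSE branches describe the same discrete distribution.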

max_value

Numeric, the maximum value (in days) of the distribution to allow. Defaults to 120.
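If the distribution is only considered up to max_value, any probability it places beyond that is dropped. A quick check of that tail mass, assuming delays are floored to whole days and using the gamma(shape = 2, rate = 1) delay from the examples alongside a hypothetical, much slower gamma(shape = 2, rate = 0.05) delay. The helper tail_beyond is illustrative only.

```r
# Hypothetical helper: probability mass above max_value for a gamma delay,
# assuming delays are floored, so values 0..max_value cover delays < max_value + 1
tail_beyond <- function(max_value, shape, rate) {
  pgamma(max_value + 1, shape = shape, rate = rate, lower.tail = FALSE)
}

# Gamma(shape = 2, rate = 1), as in the examples: mean delay of 2 days
tail_beyond(120, shape = 2, rate = 1)
# Hypothetical slow delay: gamma(shape = 2, rate = 0.05), mean delay of 40 days
tail_beyond(120, shape = 2, rate = 0.05)
```

For the example delay the tail is negligible, while for the slow delay roughly 1.7% of the probability lies beyond 120 days, which would argue for a larger max_value.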

earliest_allowed_mapped

A character string representing a date (e.g. "2020-01-01"). Indicates the earliest allowed mapped date.
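One plausible reading of this argument, sketched below as an assumption rather than the package's own code: the string is parsed with as.Date() and mapped rows dated before it are discarded. The helper drop_before_earliest and the toy data are hypothetical.

```r
# Hypothetical helper (not part of the package): keep only mapped rows dated
# on or after earliest_allowed_mapped, given as a "YYYY-MM-DD" string
drop_before_earliest <- function(mapped, earliest_allowed_mapped) {
  cutoff <- as.Date(earliest_allowed_mapped)
  mapped[mapped$date >= cutoff, , drop = FALSE]
}

mapped <- data.frame(date = as.Date("2020-01-30") + 0:3, cases = c(2, 1, 4, 3))
drop_before_earliest(mapped, "2020-02-01")
```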

direction

Character string, defaults to "backwards". Direction in which to map cases: either "backwards" (to earlier dates, e.g. from report to onset) or "forwards" (to later dates, e.g. from onset to report).
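The effect of direction is easiest to see with a single fixed shift. The sketch below is a hypothetical simplification in the spirit of a median shift: every count moves by the median of a sample of delays, earlier for "backwards" and later for "forwards". The helper median_shift and the toy data are assumptions, not the package's implementation.

```r
# Hypothetical helper (not the package's code): shift every count by one
# integer delay, taken as the median of a sample of delays
median_shift <- function(cases, delay_sample,
                         direction = c("backwards", "forwards")) {
  direction <- match.arg(direction)
  shift <- as.integer(stats::median(delay_sample))
  # Backwards moves counts to earlier dates, forwards to later dates
  step <- if (direction == "backwards") -1L else 1L
  data.frame(date = cases$date + step * shift, cases = cases$cases)
}

reported <- data.frame(date = as.Date("2020-03-01") + 0:2, cases = c(1, 2, 3))
median_shift(reported, delay_sample = c(2, 3, 4), direction = "backwards")
median_shift(reported, delay_sample = c(2, 3, 4), direction = "forwards")
```

With a median delay of 3 days, the backwards call moves the counts to 2020-02-27 through 2020-02-29 and the forwards call to 2020-03-04 through 2020-03-06.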

type

Character string indicating the method used to transform counts: either "sample", which approximates sampling from the distribution, or "median", which shifts counts by the median of the distribution. Defaults to "sample".
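A sketch of one way the "sample" approximation could work, assuming binomial thinning: each day's count is spread over delays 0..max_delay with one independent binomial draw per delay, using the probabilities a dist_fn returns when dist = TRUE. The function approx_sample_map is hypothetical and may differ from the package's implementation.

```r
# Hypothetical sketch (not the package's implementation) of approximate
# sampling by binomial thinning over delays 0..max_delay
approx_sample_map <- function(counts, probs,
                              direction = c("backwards", "forwards")) {
  direction <- match.arg(direction)
  n_days <- length(counts)
  delays <- seq_along(probs) - 1L   # 0, 1, ..., max_delay
  max_delay <- length(probs) - 1L
  # Output covers the observed days plus max_delay extra days on the side
  # counts are mapped towards. Backwards: element k is day k - max_delay.
  # Forwards: element k is day k.
  out <- numeric(n_days + max_delay)
  for (i in seq_len(n_days)) {
    # One independent binomial draw per possible delay
    draws <- stats::rbinom(length(probs), size = counts[i], prob = probs)
    pos <- if (direction == "backwards") {
      i - delays + max_delay   # earlier days, offset so indices stay >= 1
    } else {
      i + delays               # later days
    }
    out[pos] <- out[pos] + draws
  }
  out
}

# Every case has a delay of exactly one day
approx_sample_map(c(5, 7), probs = c(0, 1), direction = "backwards")
approx_sample_map(c(5, 7), probs = c(0, 1), direction = "forwards")
```

Because the binomial draws are independent, this sketch only conserves the total number of cases on average; drawing with stats::rmultinom() instead would conserve each day's count exactly.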

truncate_future

Logical, should mapped cases be truncated if they occur after the last date reported in the data? Defaults to TRUE.

Value

A data.table of mapped cases by date: dates of onset when direction = "backwards", dates of report when direction = "forwards".

Examples

cases <- data.table::as.data.table(EpiSoon::example_obs_cases)
cases <- cases[, cases := as.integer(cases)]
## Reported case distribution
print(cases)
#>     cases       date
#>  1:     1 2020-01-20
#>  2:     0 2020-01-21
#>  3:     1 2020-01-22
#>  4:     0 2020-01-23
#>  5:     0 2020-01-24
#>  6:     0 2020-01-25
#>  7:     1 2020-01-26
#>  8:     0 2020-01-27
#>  9:     0 2020-01-28
#> 10:     0 2020-01-29
#> 11:     0 2020-01-30
#> 12:     1 2020-01-31
#> 13:     1 2020-02-01
#> 14:     1 2020-02-02
#> 15:     1 2020-02-03
#> 16:     1 2020-02-04
#> 17:     1 2020-02-05
#> 18:     1 2020-02-06
#> 19:     1 2020-02-07
#> 20:     1 2020-02-08
#> 21:     1 2020-02-09
#> 22:     1 2020-02-10
#> 23:     1 2020-02-11
#> 24:     1 2020-02-12
#> 25:     1 2020-02-13
#> 26:     1 2020-02-14
#> 27:     1 2020-02-15
#> 28:     2 2020-02-16
#> 29:     2 2020-02-17
#> 30:     2 2020-02-18
#> 31:     3 2020-02-19
#> 32:     3 2020-02-20
#> 33:     4 2020-02-21
#> 34:     6 2020-02-22
#> 35:     7 2020-02-23
#> 36:     9 2020-02-24
#> 37:    11 2020-02-25
#> 38:    14 2020-02-26
#> 39:    18 2020-02-27
#> 40:    21 2020-02-28
#> 41:    26 2020-02-29
#> 42:    31 2020-03-01
#> 43:    37 2020-03-02
#> 44:    45 2020-03-03
#> 45:    54 2020-03-04
#> 46:    63 2020-03-05
#> 47:    73 2020-03-06
#> 48:    88 2020-03-07
#> 49:   102 2020-03-08
#> 50:   116 2020-03-09
#> 51:   141 2020-03-10
#> 52:   167 2020-03-11
#> 53:   194 2020-03-12
#> 54:   208 2020-03-13
#> 55:   251 2020-03-14
#> 56:   273 2020-03-15
#> 57:   266 2020-03-16
#> 58:   296 2020-03-17
#> 59:   343 2020-03-18
#> 60:   399 2020-03-19
#> 61:   454 2020-03-20
#> 62:   605 2020-03-21
#> 63:   367 2020-03-22
#>     cases       date
## Total cases
sum(cases$cases)
#> [1] 4720
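delay_fn <- function(n, dist, cum) {
  # cum is accepted but not used by this function
  if (dist) {
    # Discretised gamma(shape = 2, rate = 1) probability for each delay in n
    pgamma(n + 0.9999, 2, 1) - pgamma(n - 1e-5, 2, 1)
  } else {
    # n random gamma(shape = 2, rate = 1) delays, truncated to whole days
    as.integer(rgamma(n, 2, 1))
  }
}
onsets <- sample_approx_dist(cases = cases, dist_fn = delay_fn)
## Estimated onset distribution
print(onsets)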
#>           date cases
#>  1: 2020-01-19     1
#>  2: 2020-01-20     1
#>  3: 2020-01-21     0
#>  4: 2020-01-22     0
#>  5: 2020-01-23     0
#>  6: 2020-01-24     0
#>  7: 2020-01-25     0
#>  8: 2020-01-26     0
#>  9: 2020-01-27     1
#> 10: 2020-01-28     0
#> 11: 2020-01-29     1
#> 12: 2020-01-30     1
#> 13: 2020-01-31     2
#> 14: 2020-02-01     1
#> 15: 2020-02-02     1
#> 16: 2020-02-03     0
#> 17: 2020-02-04     1
#> 18: 2020-02-05     0
#> 19: 2020-02-06     1
#> 20: 2020-02-07     0
#> 21: 2020-02-08     2
#> 22: 2020-02-09     1
#> 23: 2020-02-10     0
#> 24: 2020-02-11     1
#> 25: 2020-02-12     1
#> 26: 2020-02-13     1
#> 27: 2020-02-14     1
#> 28: 2020-02-15     2
#> 29: 2020-02-16     1
#> 30: 2020-02-17     4
#> 31: 2020-02-18     4
#> 32: 2020-02-19     3
#> 33: 2020-02-20     3
#> 34: 2020-02-21     6
#> 35: 2020-02-22     7
#> 36: 2020-02-23     8
#> 37: 2020-02-24    10
#> 38: 2020-02-25    18
#> 39: 2020-02-26    23
#> 40: 2020-02-27    28
#> 41: 2020-02-28    28
#> 42: 2020-02-29    38
#> 43: 2020-03-01    41
#> 44: 2020-03-02    50
#> 45: 2020-03-03    55
#> 46: 2020-03-04    70
#> 47: 2020-03-05    80
#> 48: 2020-03-06   108
#> 49: 2020-03-07   101
#> 50: 2020-03-08   138
#> 51: 2020-03-09   132
#> 52: 2020-03-10   162
#> 53: 2020-03-11   210
#> 54: 2020-03-12   213
#> 55: 2020-03-13   249
#> 56: 2020-03-14   288
#> 57: 2020-03-15   277
#> 58: 2020-03-16   330
#> 59: 2020-03-17   352
#> 60: 2020-03-18   401
#> 61: 2020-03-19   402
#> 62: 2020-03-20   380
#> 63: 2020-03-21   286
#> 64: 2020-03-22   104
#>           date cases
## Check that the total is close to the number of reported cases
total_onsets <- median(
  purrr::map_dbl(1:1000, ~ sum(sample_approx_dist(cases = cases,
                                                  dist_fn = delay_fn)$cases))
)
total_onsets
#> [1] 4716
## Map from onset cases to reported cases
reports <- sample_approx_dist(cases = cases, dist_fn = delay_fn,
                              direction = "forwards")
## Map from onset cases to reported cases using a median shift
reports <- sample_approx_dist(cases = cases, dist_fn = delay_fn,
                              direction = "forwards", type = "median")
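The example delay_fn can also be checked on its own, without the package. This sketch copies the example's definition and inspects both branches; the checks themselves are an illustrative addition, not part of the original examples.

```r
# Standalone copy of the example delay_fn (cum is accepted but unused)
delay_fn <- function(n, dist, cum) {
  if (dist) {
    pgamma(n + 0.9999, 2, 1) - pgamma(n - 1e-5, 2, 1)
  } else {
    as.integer(rgamma(n, 2, 1))
  }
}

probs <- delay_fn(0:120, dist = TRUE)
sum(probs)            # slightly below 1: the bins leave small gaps between them
which.max(probs) - 1  # most likely delay, in days

set.seed(42)
median(delay_fn(1e5, dist = FALSE))  # median of the truncated gamma sample
```

The bins [n - 1e-5, n + 0.9999] do not quite touch, so a small amount of probability (on the order of 1e-4) is left out. The most likely delay and the median of the truncated samples are both 1 day.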