Analysing infectious disease transmission chains with bpmodels

Note

bpmodels is now retired and will no longer be maintained. We recommend using {epichains} instead. If you need help converting your code to use epichains, please open a discussion on epichains.

bpmodels provides methods to analyse and simulate the size and length of branching processes with an arbitrary offspring distribution. These can be used, for example, to analyse the distribution of chain sizes or length of infectious disease outbreaks, as discussed in Farrington, Kanaan, and Gay (2003) and Blumberg and Lloyd-Smith (2013).

Quick start

library("bpmodels")

`chain_ll()`: calculate log-likelihoods

The chain_ll() function calculates the log-likelihood of a distribution of chain sizes or lengths given an offspring distribution and its associated parameters.

For example, if we have observed a distribution of chains of sizes 1, 1, 4, 7, we can calculate the log-likelihood of this observed chain by assuming the offspring per generation is Poisson distributed with a mean number (which can be interpreted as the reproduction number, R_0) of 0.5.

To do this, we run

chain_sizes <- c(1, 1, 4, 7) # example of observed chain sizes
chain_ll(x = chain_sizes, offspring = "pois", stat = "size", lambda = 0.5)
#> [1] -8.607196

The first argument of chain_ll() is the chain size (or length, in number of generations that a chain lasted) distribution to analyse.

The second argument, offspring, specifies the offspring distribution. This is specified as a string of the name of the function used to generate random offspring. It can be any probability distribution implemented in R, that is, one that has a corresponding function for generating random numbers beginning with the letter r. In the case of the example above, since random Poisson numbers are generated in R using a function called rpois(), the string to pass to the offspring argument is "pois".

The third argument, stat, determines whether to analyse chain sizes ("size", the default if this argument is not specified) or lengths ("length"). Lastly, any named arguments not recognised by chain_ll() are interpreted as parameters of the corresponding probability distribution, here lambda = 0.5 as the mean of the Poisson distribution (see the R help page for the Poisson distribution for more information).

Imperfect observations

By default, chain_ll() assumes perfect observation, where obs_prob = 1 (See ?chain_ll), meaning that all transmission events are observed and recorded in the data. If observations are imperfect, chain_ll() provides the argument, obs_prob, for specifying the probability of observation. This probability is used to determine the likelihood of observing the specified chain sizes or lengths. In the case of imperfect observation, true chain sizes or lengths are simulated repeatedly (the number of times given by the nsim_obs argument), and the likelihood calculated for each of these simulations.

For example, if the probability of observing each case is obs_prob = 0.30, we use

chain_sizes <- c(1, 1, 4, 7) # example of observed chain sizes
ll <- chain_ll(chain_sizes, "pois", "size", obs_prob = 0.3, lambda = 0.5,
               nsim_obs = 10)
ll
#>  [1] -20.47284 -22.52364 -26.77325 -24.33340 -21.86479 -22.11361 -32.01951
#>  [8] -19.77477 -22.90776 -27.59180

This returns 10 likelihood values (because nsim_obs = 10), which can be averaged to come up with an overall likelihood estimate.

To find out about the usage of the chain_ll() function, you can run ?chain_ll to access its R help file.

How `chain_ll()` works

If the probability distribution of chain sizes or lengths has an analytical solution, this will be used. chain_ll() currently supports the Poisson and negative binomial size distribution and the Poisson and geometric length distribution.

If an analytical solution does not exist, simulations are used to approximate this probability distributions (using a linear approximation to the cumulative distribution for unobserved sizes/lengths). In that case, an extra argument nsim_offspring must be passed to chain_ll() to specify the number of simulations to be used for this approximation.

For example, to get offspring drawn from a binomial distribution with probability prob = 0.5, we run

set.seed(12)
chain_sizes <- c(1, 1, 4, 7) # example of observed chain sizes
chain_ll(chain_sizes, "binom", "size", size = 1, prob = 0.5,
         nsim_offspring = 100)
#> [1] -8.496803

`chain_sim()`: simulate transmission chains with branching processes

To simulate a branching process, we use the chain_sim() function. This function follows the same syntax as chain_ll(). Note that chain_sim() is stochastic, so a seed needs to be set to ensure that the results can be reproduced.

Below, we are simulating 5 chains, assuming the offspring are generated using a Poisson distribution with mean, lambda = 0.5. By default, chain_sim() returns a vector of chain sizes/lengths. If we instead want to return a tree of infectees and infectors, we need to specify a function for the serial interval and set tree = TRUE (see next section).

set.seed(12)
chain_sim(n = 5, offspring = "pois", stat = "size", lambda = 0.5)
#> [1] 1 2 4 1 1

Simulating trees

To simulate a tree of transmission chains, we specify the serial interval generation function (serial_interval()) and set tree = TRUE as follows:

set.seed(12)
serial_interval <- function(n) {
  rlnorm(n, meanlog = 0.58, sdlog = 1.58)
}
chains_df <- chain_sim(
  n = 5, offspring = "pois", lambda = 0.5, stat = "length",
  infinite = 100, serial = serial_interval, tree = TRUE
)
head(chains_df)
#>   n id ancestor generation       time
#> 1 1  1       NA          1 0.00000000
#> 2 2  1       NA          1 0.00000000
#> 3 3  1       NA          1 0.00000000
#> 4 4  1       NA          1 0.00000000
#> 5 5  1       NA          1 0.00000000
#> 6 2  2        1          2 0.09968906

References

Blumberg, S., and J. O. Lloyd-Smith. 2013. “Comparing Methods for Estimating R0 from the Size Distribution of Subcritical Transmission Chains.” Epidemics 5 (3): 131–45. https://doi.org/10.1016/j.epidem.2013.05.002.

Farrington, C. P., M. N. Kanaan, and N. J. Gay. 2003. “Branching Process Models for Surveillance of Infectious Diseases Controlled by Mass Vaccination.” Biostatistics (Oxford, England) 4 (2): 279–95. https://doi.org/10.1093/biostatistics/4.2.279.

Sebastian Funk, James Azam