Analysing infectious disease transmission chains with bpmodels
Sebastian Funk, James Azam
Source:vignettes/bpmodels.Rmd
bpmodels.Rmd
Note
bpmodels is now retired and will no longer be maintained. We recommend using
{epichains}
instead. If you need help converting your code to use{epichains}
, please open a discussion on epichains.
bpmodels provides methods to analyse and simulate the size and length of branching processes with an arbitrary offspring distribution. These can be used, for example, to analyse the distribution of chain sizes or length of infectious disease outbreaks, as discussed in Farrington, Kanaan, and Gay (2003) and Blumberg and Lloyd-Smith (2013).
Quick start
chain_ll()
: calculate log-likelihoods
The chain_ll()
function calculates the log-likelihood of
a distribution of chain sizes or lengths given an offspring distribution
and its associated parameters.
For example, if we have observed a distribution of chains of sizes 1, 1, 4, 7, we can calculate the log-likelihood of this observed chain by assuming the offspring per generation is Poisson distributed with a mean number (which can be interpreted as the reproduction number, R_0) of 0.5.
To do this, we run
chain_sizes <- c(1, 1, 4, 7) # example of observed chain sizes
chain_ll(x = chain_sizes, offspring = "pois", stat = "size", lambda = 0.5)
#> [1] -8.607196
The first argument of chain_ll()
is the chain size (or
length, in number of generations that a chain lasted) distribution to
analyse.
The second argument, offspring
, specifies the offspring
distribution. This is specified as a string of the name of the function
used to generate random offspring. It can be any probability
distribution implemented in R
, that is, one that has a
corresponding function for generating random numbers beginning with the
letter r
. In the case of the example above, since random
Poisson numbers are generated in R
using a function called
rpois()
, the string to pass to the offspring
argument is "pois"
.
The third argument, stat
, determines whether to analyse
chain sizes ("size"
, the default if this argument is not
specified) or lengths ("length"
). Lastly, any named
arguments not recognised by chain_ll()
are interpreted as
parameters of the corresponding probability distribution, here
lambda = 0.5
as the mean of the Poisson distribution (see
the R
help page for the Poisson
distribution for more information).
Imperfect observations
By default, chain_ll()
assumes perfect observation,
where obs_prob = 1
(See ?chain_ll
), meaning
that all transmission events are observed and recorded in the data. If
observations are imperfect, chain_ll()
provides the
argument, obs_prob
, for specifying the probability of
observation. This probability is used to determine the likelihood of
observing the specified chain sizes or lengths. In the case of imperfect
observation, true chain sizes or lengths are simulated repeatedly (the
number of times given by the nsim_obs
argument), and the
likelihood calculated for each of these simulations.
For example, if the probability of observing each case is
obs_prob = 0.30
, we use
chain_sizes <- c(1, 1, 4, 7) # example of observed chain sizes
ll <- chain_ll(chain_sizes, "pois", "size", obs_prob = 0.3, lambda = 0.5,
nsim_obs = 10)
ll
#> [1] -20.47284 -22.52364 -26.77325 -24.33340 -21.86479 -22.11361 -32.01951
#> [8] -19.77477 -22.90776 -27.59180
This returns 10
likelihood values (because
nsim_obs = 10
), which can be averaged to come up with an
overall likelihood estimate.
To find out about the usage of the chain_ll()
function,
you can run ?chain_ll
to access its R
help
file.
How chain_ll()
works
If the probability distribution of chain sizes or lengths has an
analytical solution, this will be used. chain_ll()
currently supports the Poisson and negative binomial size distribution
and the Poisson and geometric length distribution.
If an analytical solution does not exist, simulations are used to
approximate this probability distributions (using
a linear approximation to the cumulative distribution for unobserved
sizes/lengths). In that case, an extra argument
nsim_offspring
must be passed to chain_ll()
to
specify the number of simulations to be used for this approximation.
For example, to get offspring drawn from a binomial distribution with
probability prob = 0.5
, we run
chain_sim()
: simulate transmission chains with
branching processes
To simulate a branching process, we use the chain_sim()
function. This function follows the same syntax as
chain_ll()
. Note that chain_sim()
is
stochastic, so a seed needs to be set to ensure that the results can be
reproduced.
Below, we are simulating 5 chains,
assuming the offspring are generated using a Poisson distribution with
mean, lambda = 0.5
. By default, chain_sim()
returns a vector of chain sizes/lengths. If we instead want to return a
tree of infectees and infectors, we need to specify a function for the
serial interval and set tree = TRUE
(see next section).
Simulating trees
To simulate a tree of transmission chains, we specify the serial
interval generation function (serial_interval()
) and set
tree = TRUE
as follows:
set.seed(12)
serial_interval <- function(n) {
rlnorm(n, meanlog = 0.58, sdlog = 1.58)
}
chains_df <- chain_sim(
n = 5, offspring = "pois", lambda = 0.5, stat = "length",
infinite = 100, serial = serial_interval, tree = TRUE
)
head(chains_df)
#> n id ancestor generation time
#> 1 1 1 NA 1 0.00000000
#> 2 2 1 NA 1 0.00000000
#> 3 3 1 NA 1 0.00000000
#> 4 4 1 NA 1 0.00000000
#> 5 5 1 NA 1 0.00000000
#> 6 2 2 1 2 0.09968906