Overview
estimate_dist() fits a parametric delay distribution
from interval-censored linelist data via MCMC. The model accounts for
double interval censoring of the primary and secondary events, right
truncation due to a finite observation period, and optional left
truncation. The likelihood is provided by the primarycensored
package[1]; see its
documentation for derivations and a fuller account of why these
adjustments are needed[2,3].
The conceptual motivation in EpiNow2 is covered in
vignette("delays"), and a worked example in
vignette("estimate_dist_workflow"). For analyses needing
time-varying delays, partial pooling across strata, or regression on
covariates, epidist builds
on the same likelihood and is the recommended next step.
Data and notation
Each observation is a record of a primary event (e.g. infection) and a secondary event (e.g. symptom onset). Both events are observed only as intervals. Let $[P_L, P_R]$ be the primary event window and $[S_L, S_R]$ be the secondary event window. We refer to $w_P = P_R - P_L$ as the primary window width and to $w_S = S_R - S_L$ as the secondary window width. At daily resolution both are typically equal to one.
The observed delay interval is $[d_L, d_R] = [S_L - P_L,\, S_R - P_L]$, the difference between the secondary window and the start of the primary window. Each observation $i$ also has a relative observation time $D_i$ and an optional left truncation point $L_i$, both measured from $P_L$. $D_i$ is the time at which the data were extracted relative to the primary window start, and it acts as the right truncation point. Per-observation weights $n_i$ allow identical combinations of delay interval, window widths, and truncation points to be aggregated; this affects only computational efficiency, not the likelihood.
The continuous, unobserved delay $T = S - P$ between the true primary event time $P$ and the true secondary event time $S$ is assumed to follow a parametric distribution with cumulative distribution function $F(t; \theta)$, where $\theta$ is the parameter vector. We denote by $f_P$ the density of $P - P_L$, the primary event time within its window. The model treats $\theta$ as unknown and $f_P$ as known.
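To make the notation concrete, here is a minimal Python sketch (illustrative only; EpiNow2 itself works in R) that turns daily linelist rows into delay records measured from $P_L$ and aggregates identical records into counts $n_i$. The helper name `to_delay_records` and its row layout `(primary_day, secondary_day, obs_day)` are hypothetical, and integer day indices are assumed.

```python
from collections import Counter


def to_delay_records(rows, pwindow=1, swindow=1):
    """Aggregate (primary_day, secondary_day, obs_day) rows into delay records.

    Hypothetical helper for illustration. Each returned tuple is
    (d_lower, d_upper, pwindow, obs_time, n), with all times measured from
    the primary window start P_L and n counting identical records.
    """
    counts = Counter()
    for primary_day, secondary_day, obs_day in rows:
        d_lower = secondary_day - primary_day            # S_L - P_L
        d_upper = secondary_day + swindow - primary_day  # S_R - P_L
        obs_time = obs_day - primary_day                 # D, relative to P_L
        counts[(d_lower, d_upper, pwindow, obs_time)] += 1
    # Identical combinations collapse into one row whose count is the weight n_i.
    return sorted(key + (n,) for key, n in counts.items())


rows = [(0, 3, 10), (0, 3, 10), (2, 4, 10)]
print(to_delay_records(rows))  # [(2, 3, 1, 8, 1), (3, 4, 1, 10, 2)]
```

The two identical rows collapse into a single record with count 2, which is why aggregation changes only the computational cost, not the likelihood.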
Likelihood
Continuous formulation
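If $P - P_L$ has density $f_P$ on $[0, w_P]$, the distribution of the time from a fixed reference point (the primary window start $P_L$) to the secondary event is the convolution of $f_P$ with the underlying delay distribution. The corresponding cumulative distribution function, adjusted for primary event censoring, is
$$F_{\text{cens}}(q; \theta) = \int_0^{w_P} F(q - p; \theta)\, f_P(p)\, dp.$$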
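The convolution integral can always be approximated numerically. The sketch below assumes a lognormal delay and uses a composite midpoint rule with a fixed number of sub-intervals; it is a generic numerical stand-in, not the quadrature scheme used in primarycensored.stan, and the function names are hypothetical.

```python
import math


def lognormal_cdf(t, meanlog, sdlog):
    """Lognormal CDF F(t); zero for t <= 0."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - meanlog) / sdlog
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def primary_censored_cdf(q, delay_cdf, primary_pdf, pwindow, n=2000):
    """Approximate F_cens(q) = integral over [0, pwindow] of F(q - p) f_P(p) dp
    with the composite midpoint rule on n equal sub-intervals."""
    h = pwindow / n
    total = 0.0
    for k in range(n):
        p = (k + 0.5) * h  # midpoint of the k-th sub-interval
        total += delay_cdf(q - p) * primary_pdf(p)
    return total * h


# Uniform primary on a one-day window: f_P(p) = 1 / pwindow.
F = lambda t: lognormal_cdf(t, 1.0, 0.5)
print(primary_censored_cdf(3.0, F, lambda p: 1.0, 1.0))
```

Because $F(q - p)$ lies between $F(q - w_P)$ and $F(q)$ for every $p$ in the window, the result always sits between those two values.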
Without primary censoring, the distribution of the discrete observed delay would simply be obtained by differencing $F$ at the secondary window boundaries. With primary censoring, $F$ is replaced by $F_{\text{cens}}$ throughout. For uniform $f_P$ this integral can be evaluated analytically for the lognormal, gamma, and Weibull delay distributions; for other combinations it is evaluated numerically. The vendored Stan code dispatches to either branch through check_for_analytical() in primarycensored.stan.
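To illustrate why a uniform primary admits a closed form, substitute $t = q - p$: with $f_P = 1/w_P$ this gives $F_{\text{cens}}(q) = \frac{1}{w_P}\int_{q - w_P}^{q} F(t)\,dt$. For the lognormal, integration by parts gives $\int_0^x F(t)\,dt = x F(x) - e^{\mu + \sigma^2/2}\,\Phi\!\left(\frac{\ln x - \mu - \sigma^2}{\sigma}\right)$. The Python sketch below implements this one derivation; primarycensored.stan may use a different but equivalent expression, and the function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cdf(t, meanlog, sdlog):
    """Lognormal CDF F(t); zero for t <= 0."""
    if t <= 0:
        return 0.0
    return norm_cdf((math.log(t) - meanlog) / sdlog)


def lognormal_integrated_cdf(x, meanlog, sdlog):
    """G(x) = integral of F(t) over [0, x] = x F(x) - E[T; T <= x]."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - meanlog) / sdlog
    # Partial expectation of a lognormal: exp(mu + sigma^2 / 2) * Phi(z - sigma).
    partial_mean = math.exp(meanlog + 0.5 * sdlog**2) * norm_cdf(z - sdlog)
    return x * norm_cdf(z) - partial_mean


def pcens_cdf_lognormal_uniform(q, meanlog, sdlog, pwindow):
    """Closed-form F_cens(q) for a uniform primary on [0, pwindow]."""
    return (
        lognormal_integrated_cdf(q, meanlog, sdlog)
        - lognormal_integrated_cdf(q - pwindow, meanlog, sdlog)
    ) / pwindow


print(pcens_cdf_lognormal_uniform(3.0, 1.0, 0.5, 1.0))
```

A closed form avoids quadrature error and is cheaper to evaluate inside MCMC, which is the motivation for an analytical branch.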
Truncation
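The model handles right truncation at $D_i$ and optional left truncation at $L_i$ by renormalising $F_{\text{cens}}$:
$$\tilde F_{\text{cens}}(q; \theta \mid L_i, D_i) = \frac{F_{\text{cens}}(q; \theta) - F_{\text{cens}}(L_i; \theta)}{F_{\text{cens}}(D_i; \theta) - F_{\text{cens}}(L_i; \theta)}$$
for $L_i \le q \le D_i$, with $\tilde F_{\text{cens}} = 0$ for $q < L_i$ and $\tilde F_{\text{cens}} = 1$ for $q > D_i$, so no probability mass falls outside $[L_i, D_i]$. $D_i$ may be set to $\infty$ to indicate no right truncation; in that case $F_{\text{cens}}(D_i; \theta) = 1$ and the denominator reduces to $1 - F_{\text{cens}}(L_i; \theta)$. $L_i$ has a fixed default and is not currently exposed at the R interface.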
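The renormalisation can be written as a small wrapper around any CDF. In this Python sketch the hypothetical helper `truncate_cdf` takes a plain exponential CDF as a stand-in for $F_{\text{cens}}$, purely to keep the example short; `math.inf` plays the role of $D_i = \infty$.

```python
import math


def truncate_cdf(cdf, q, L=0.0, D=math.inf):
    """Renormalise a CDF to [L, D].

    Returns 0 below L, 1 above D, and (F(q) - F(L)) / (F(D) - F(L)) in between.
    F(inf) is taken as 1, so D = inf means no right truncation.
    """
    if q <= L:
        return 0.0
    if q >= D:
        return 1.0
    upper = 1.0 if math.isinf(D) else cdf(D)
    lower = cdf(L)
    return (cdf(q) - lower) / (upper - lower)


# Stand-in CDF: exponential with rate 1 (zero for t <= 0).
F = lambda t: 1.0 - math.exp(-t) if t > 0 else 0.0
print(truncate_cdf(F, 2.0, D=4.0))  # (1 - e^-2) / (1 - e^-4)
```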
Discrete observation likelihood
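The probability mass for observation $i$ is the difference of the truncated, primary-censored CDF at the secondary window boundaries $d_{L,i}$ and $d_{R,i}$. For $L_i \le d_{L,i} \le d_{R,i} \le D_i$ this is
$$p_i(\theta) = \frac{F_{\text{cens}}(d_{R,i}; \theta) - F_{\text{cens}}(d_{L,i}; \theta)}{F_{\text{cens}}(D_i; \theta) - F_{\text{cens}}(L_i; \theta)}.$$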
The aggregated log-likelihood used in estimate_dist.stan is
$$\ell(\theta) = \sum_i n_i \log p_i(\theta),$$
implemented as the call to primarycensored_lpmf() inside the model block.
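A minimal Python sketch of the weighted log-likelihood, with hypothetical helper names. It assumes `cdf` is already the primary-censored CDF for a fixed primary window (so records here omit the window width), that $L_i$ stays at 0, and that $L_i \le d_{L,i} \le d_{R,i} \le D_i$ holds for every record; it is not the primarycensored_lpmf() implementation.

```python
import math


def interval_prob(cdf, d_lower, d_upper, L=0.0, D=math.inf):
    """p_i = (F(d_upper) - F(d_lower)) / (F(D) - F(L)).

    Assumes L <= d_lower <= d_upper <= D; F(inf) is taken as 1.
    """
    upper = 1.0 if math.isinf(D) else cdf(D)
    return (cdf(d_upper) - cdf(d_lower)) / (upper - cdf(L))


def weighted_log_lik(records, cdf):
    """Sum of n_i * log p_i over records (d_lower, d_upper, obs_time, n)."""
    return sum(
        n * math.log(interval_prob(cdf, d_lower, d_upper, D=obs_time))
        for d_lower, d_upper, obs_time, n in records
    )


# Stand-in CDF: exponential with rate 1 (zero for t <= 0).
F = lambda t: 1.0 - math.exp(-t) if t > 0 else 0.0
print(weighted_log_lik([(1, 2, 5, 2), (0, 1, math.inf, 3)], F))
```

A record with count $n_i$ contributes exactly $n_i$ times the log-probability of a single copy, which is why aggregation leaves the likelihood unchanged.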
Untruncated approximation
When $D_i$ is much larger than the largest observed delay, the right truncation factor $F_{\text{cens}}(D_i; \theta)$ is numerically indistinguishable from one, and the renormalisation described above can be skipped. estimate_dist() applies this approximation by setting $D_i$ to $\infty$ whenever $D_i$ exceeds $k$ times the largest observed delay, where $k$ is the obs_time_threshold argument; its default follows the same heuristic used by epidist. Setting obs_time_threshold = Inf disables the approximation.
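The thresholding rule can be sketched as a filter over delay records. This Python sketch assumes the rule compares each $D_i$ with `threshold` times the largest $d_{R,i}$; the helper name and record layout `(d_lower, d_upper, obs_time, n)` are hypothetical, and `threshold` only stands in for obs_time_threshold.

```python
import math


def apply_untruncated_approx(records, threshold):
    """Set obs_time to inf where it exceeds threshold * (largest d_upper).

    Hypothetical sketch of the rule described above. threshold = math.inf
    keeps every record truncated, mirroring obs_time_threshold = Inf.
    """
    max_delay = max(d_upper for _, d_upper, _, _ in records)
    cutoff = threshold * max_delay
    return [
        (d_lower, d_upper, math.inf if obs_time > cutoff else obs_time, n)
        for d_lower, d_upper, obs_time, n in records
    ]


recs = [(2, 3, 40, 1), (3, 4, 6, 1)]
print(apply_untruncated_approx(recs, 5))  # first obs_time becomes inf, second stays 6
```

Records whose observation time is replaced by infinity no longer need $F_{\text{cens}}(D_i; \theta)$ evaluated, which saves work when most observations are far from the truncation boundary.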
Primary event distribution
Two primary event distributions are supported:
- Uniform (default): $f_P$ is uniform on $[0, w_P]$. This is appropriate for daily reporting where the within-day timing of the primary event is unknown and assumed equally likely at any point in the window.
- Exponential growth: $f_P(p)$ is proportional to $\exp(r p)$ on $[0, w_P]$, with growth rate $r$ supplied by the user via primary_params. This adjusts for the bias introduced when the primary event rate is changing rapidly within the window, for example in the early phase of an outbreak[2]. The growth rate is treated as fixed data and is not estimated.
The primary event parameters are passed to Stan in the primary_id and primary_params data fields and are not part of $\theta$.
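Normalising $\exp(r p)$ over $[0, w_P]$ gives $f_P(p) = r e^{r p} / (e^{r w_P} - 1)$, which reduces to the uniform density as $r \to 0$. A short Python sketch with a hypothetical function name; `math.expm1` keeps the denominator accurate for small $|r|$.

```python
import math


def expgrowth_pdf(p, r, pwindow):
    """Primary event density on [0, pwindow] proportional to exp(r * p)."""
    if not 0.0 <= p <= pwindow:
        return 0.0
    if r == 0.0:
        return 1.0 / pwindow  # limit r -> 0 is the uniform density
    # expm1(x) = exp(x) - 1; the ratio stays positive for negative r as well.
    return r * math.exp(r * p) / math.expm1(r * pwindow)


print(expgrowth_pdf(0.9, 0.5, 1.0) > expgrowth_pdf(0.1, 0.5, 1.0))  # True: growth favours late events
```

With positive growth, primary events are more likely late in the window, so the mean primary time exceeds $w_P / 2$; a uniform assumption would then overstate the delay slightly.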
Delay families and parameterisations
The model supports five delay families, selected by the
dist argument. Parameter names match the natural
parameterisation used elsewhere in EpiNow2.
| dist | Parameters | Density support |
|---|---|---|
| "lognormal" | meanlog, sdlog | $(0, \infty)$ |
| "gamma" | shape, rate | $(0, \infty)$ |
| "normal" | mean, sd | $(-\infty, \infty)$ |
| "exp" | rate | $[0, \infty)$ |
| "weibull" | shape, scale | $[0, \infty)$ |
For "lognormal", "gamma", and
"weibull" with a uniform primary, the primary-censored CDF
has a closed form and is computed analytically. The other combinations
use numerical integration over the primary window.
Priors
Priors are passed via the priors argument as a named
list of dist_spec objects, with names matching the
parameter names of the chosen family. Each prior is converted into a
Normal() density on the parameter and applied
independently. Lower bounds are imposed for parameters that must be
positive (e.g. sdlog, rate,
shape, scale); the truncation is handled by
params_lp() in
inst/stan/functions/params.stan.
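A normal prior with a lower bound is a truncated normal. The Python sketch below shows its log density, assuming a bound of 0 for positive parameters; the function names are hypothetical, and whether params_lp() includes the normalising term is not stated here. Because the prior mean and standard deviation are fixed data, that term is a constant and does not change the posterior either way.

```python
import math


def normal_lpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def prior_lp(x, mu, sigma, lower=None):
    """Normal(mu, sigma) log prior, optionally truncated below at `lower`."""
    if lower is None:
        return normal_lpdf(x, mu, sigma)
    if x < lower:
        return -math.inf
    # Normalise by the mass above the bound: 1 - Phi(z) = erfc(z / sqrt(2)) / 2.
    z = (lower - mu) / sigma
    log_mass = math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
    return normal_lpdf(x, mu, sigma) - log_mass


print(prior_lp(0.5, 0.0, 1.0, lower=0.0))  # half-normal log density at 0.5
```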
The default priors are family-specific:
These defaults are deliberately wide and are intended as starting
points rather than fits to any particular delay. Users are encouraged to
set priors that reflect domain knowledge about the delay being
estimated; see vignette("estimate_dist_workflow") for
guidance on translating prior beliefs into the lognormal
parameterisation.
