Analysis walkthrough
The Epuyén 2018–19 outbreak in north-west Patagonia was the first cluster where person-to-person Andes hantavirus transmission was documented at scale. The line list bundled with this package is hand-encoded from Table S2 of Martínez et al. 2020 — 34 cases with exposure windows, symptom-onset dates, and attributed source cases.
This page fits the joint model in TransmissionLinelist.jl to that line list and renders the headline outputs. Four quantities are estimated together: the incubation period, the transmission timing of each secondary relative to its source's symptom onset (δ), a weekly time-varying reproduction number R(t), and the offspring dispersion k of a Negative-Binomial. Exposure and onset dates are interval-censored. The model handles that by giving each case a continuous latent infection time and a continuous latent onset time, each sampled within its recorded window. Generation interval and serial interval are derived in post-processing as the transmission timing plus an incubation period (the source's for GI, the secondary's for SI). Fitting all four jointly propagates uncertainty between them that a delay-then-R(t) pipeline would lose.
Priors, the data-augmentation construction, and per-pair GI > 0 constraint are detailed on the Model page. Caveats around exposure encoding, late R(t) bins, and right-truncation are on the Limitations page.
using TransmissionLinelist
using Chain
using DataFrames
using DataFramesMeta
using FlexiChains
using Printf
using Random
using CairoMakie
using AlgebraOfGraphics
using PairPlots
Random.seed!(20260508)Random.TaskLocalRNG()Load the line list
load_linelist parses the bundled CSV and drops the _alt sensitivity rows. build_data re-encodes exposure / onset windows as day offsets from t0 (60 days before the first onset); bin_edges_day returns the weekly R(t) knot dates as day offsets.
ll = load_linelist()
d = build_data(ll)
edges = bin_edges_day(d.t0)
model = joint_model(d, edges)
@chain ll begin
@select(:patient_id, :exposure_lower, :exposure_upper,
:onset_date, :source_case, :Z)
first(8)
end| Row | patient_id | exposure_lower | exposure_upper | onset_date | source_case | Z |
|---|---|---|---|---|---|---|
| String | Date? | Date? | Date | String15? | Int64 | |
| 1 | 1 | missing | missing | 2018-11-03 | missing | 5 |
| 2 | 2 | 2018-11-03 | 2018-11-03 | 2018-11-23 | 1 | 6 |
| 3 | 3 | 2018-11-03 | 2018-11-03 | 2018-11-20 | 1 | 0 |
| 4 | 4 | 2018-11-03 | 2018-11-03 | 2018-11-27 | 1 | 0 |
| 5 | 5 | 2018-11-03 | 2018-11-03 | 2018-11-26 | 1 | 1 |
| 6 | 6 | 2018-11-03 | 2018-11-03 | 2018-11-25 | 1 | 0 |
| 7 | 7 | 2018-11-23 | 2018-11-23 | 2018-12-07 | 2 | 0 |
| 8 | 8 | 2018-11-23 | 2018-11-23 | 2018-12-13 | 2 | 1 |
What the data looks like
plot_data(ll)
Model
@model function joint_model(d, edges)
# Population-level parameters
μ_inc ~ Normal(3.0, 0.5) # log-mean Inc (≈ log 20 d)
σ_inc ~ truncated(Normal(0.0, 0.5); lower = 0) # log-SD Inc
μ_δ ~ Normal(0.0, 5.0) # population mean transmission timing (d from source onset)
σ_δ ~ truncated(Normal(0.0, 1.0); lower = 0) # population SD of transmission timing (d)
# NB offspring dispersion via Stan's reciprocal-sqrt reparameterisation:
# 1/√k is the SD multiplier in Var = μ + μ²·(1/√k)². Half-Normal(0, 1)
# spans Poisson (1/√k → 0) to heavy super-spreader (1/√k ≈ 2)
# symmetrically on the overdispersion scale.
phi_inv_sqrt ~ truncated(Normal(0.0, 1.0); lower = 0)
k := 1.0 / phi_inv_sqrt^2
σ_rw ~ truncated(Normal(0.0, 0.5); lower = 0) # log-R RW innovation SD allows sharp R(t) swings under interventions
# Concrete element type derived from a sampled scalar — avoids the
# dynamic-dispatch tax that `Vector{Real}` imposes on AD backends.
T = typeof(μ_inc)
# Non-centred random walk on log R(t) at the weekly knots. log_R[b] is
# the value at knot b; R(t) is linearly interpolated between knots.
n_knots = length(edges)
log_R_init ~ Normal(log(1.5), 1.0)
ε ~ Turing.filldist(Normal(zero(T), one(T)), n_knots - 1)
log_R := vcat(log_R_init, log_R_init .+ accumulate(+, σ_rw .* ε))
inc_dist = LogNormal(μ_inc, σ_inc)
# T_onset is a latent over the recorded onset window (defaults to a
# one-day window when only a single onset date was recorded).
T_onset = Vector{T}(undef, d.N)
for i in 1:d.N
T_onset[i] ~ Uniform(d.onset_lo_day[i], d.onset_hi_day[i])
end
T_inf = Vector{T}(undef, d.N)
for i in 1:d.N
if d.source_idx[i] == 0
# Zoonotic index: free latent T_inf pre-onset.
T_inf[i] ~ Uniform(d.onset_lo_day[i] - 80.0, T_onset[i] - 1e-6)
inc_i = T_onset[i] - T_inf[i]
Turing.@addlogprob! logpdf(inc_dist, inc_i)
else
# Sourced case: T_inf anchored to listed exposure window.
# GI > 0 enforced by rejecting trajectories where the secondary
# was infected before its source.
src = d.source_idx[i]
T_inf[i] ~ Uniform(d.exp_lo_day[i], d.exp_hi_day[i])
if T_inf[i] <= T_inf[src]
Turing.@addlogprob! -Inf
else
inc_i = T_onset[i] - T_inf[i]
δ_pair = T_inf[i] - T_onset[src]
Turing.@addlogprob! logpdf(inc_dist, inc_i)
Turing.@addlogprob! logpdf(Normal(μ_δ, σ_δ), δ_pair)
end
end
# Case reproduction number indexed by source onset: μ_δ ≈ 0,
# σ_δ ≈ 0.6 d, so a case's offspring are infected within ~1 d of
# the case becoming symptomatic.
R_i = exp(log_R_at(T_onset[i], edges, log_R))
d.Zobs[i] ~ NegativeBinomial(k, k / (k + R_i))
end
endPrior predictives
Implied prior distributions for the incubation period, transmission timing δ, and the derived generation / serial interval before any data are seen.
plot_prior_predictives()
Fitting
sample_fit wraps the package's default NUTS configuration: Enzyme reverse-mode AD, chains initialised from the prior, 1000 post-warmup draws across 4 chains, target_accept = 0.95.
chn = sample_fit(model)╭─FlexiChain (1000 iterations, 4 chains) ──────────────────────────────────────╮
│ ↓ iter = 501:1500 │
│ → chain = 1:4 │
│ │
│ Parameters (12) ── AbstractPPL.VarName │
│ Float64 μ_inc, σ_inc, μ_δ, σ_δ, phi_inv_sqrt, k, σ_rw, log_R_init │
│ Vector{Float64} ε, log_R, T_onset, T_inf │
│ │
│ Extras (14) │
│ Int64 n_steps, tree_depth │
│ Bool is_accept, numerical_error │
│ Float64 acceptance_rate, log_density, hamiltonian_energy, │
│ hamiltonian_energy_error, max_hamiltonian_energy_error, step_size, │
│ nom_step_size, logprior, loglikelihood, logjoint │
╰──────────────────────────────────────────────────────────────────────────────╯Diagnostics
Maximum R̂, minimum bulk ESS, divergence count, and wall-clock sampling time (seconds, approximated by the slowest chain under MCMCThreads).
diagnostics_table(chn)| Row | rhat_max | ess_min | divergences | runtime_seconds |
|---|---|---|---|---|
| Float64 | Float64 | Int64 | Float64 | |
| 1 | 1.00434 | 1717.78 | 0 | 58.4237 |
Key outputs
summary_table(chn)| Row | quantity | median | lower_95 | upper_95 |
|---|---|---|---|---|
| String | Float64 | Float64 | Float64 | |
| 1 | Incubation mean (d) | 22.4907 | 20.2603 | 25.4829 |
| 2 | Incubation 95th pct (d) | 36.1218 | 31.3083 | 44.0981 |
| 3 | Incubation 99th pct (d) | 44.9487 | 37.5034 | 58.1167 |
| 4 | μ_δ (d from source onset) | 0.172271 | -0.161352 | 0.489418 |
| 5 | σ_δ (d) | 0.607586 | 0.451213 | 0.837362 |
| 6 | GI / SI mean (d) | 22.6759 | 20.4225 | 25.6669 |
| 7 | GI / SI SD (d) | 7.38071 | 5.61271 | 10.3836 |
| 8 | NB dispersion k | 0.382977 | 0.157457 | 1.07489 |
R(t) over weekly knots
Spaghetti of thinned posterior draws through the weekly knots (linearly interpolated); reverts to the prior in late-January knots where cases are thin (see Limitations).
plot_rt(chn)
Pair plot of population parameters
Corner plot for μ_inc, σ_inc, μ_δ, σ_δ, k.
plot_pair(chn)
Predictive distributions for the delays
Implied population distributions for the incubation period, transmission timing δ, and the derived generation and serial intervals under the fitted posterior — i.e. what a new case or transmission pair would look like. This is not a check against the observed data; for that see the sense-check and PPC panels below. Inc and δ panels show the posterior over the parametric density (median PDF with a 95% pointwise ribbon across draws) overlaid with one predictive realisation per draw; GI and SI show the predictive-sample histogram only.
plot_predictive_distributions(chn)
δ sense check
Compare the per-pair posterior medians of δ to the fitted population Normal(μ_δ, σ_δ).
plot_delta_sense_check(chn, d)
Incubation-period sense check
Compare the per-case posterior medians of T_onset[i] − T_inf[i] to the fitted population LogNormal(μ_inc, σ_inc).
plot_inc_sense_check(chn, d)
Offspring posterior-predictive check
Joint-draw posterior-predictive check. For each posterior draw, replicate Z_rep[i] ~ NegativeBinomial(k, k/(k+R_i)) per case, with R_i = exp(log_R_at(T_inf[i], edges, log_R)) evaluated at the same draw's T_inf[i], log_R, and k (matching the model's likelihood, clamp and all). The left panel compares frequencies of each Z value against the observed line list. The right column has three stacked subpanels — one per discrete test statistic (sum(Z), max(Z), count(Z = 0)) — each showing the histogram of the replicated statistic with the observed value as a dashed vertical rule.
plot_z_ppc(chn, d)
Numeric values for each test statistic — observed, replicated median + 95% CrI, and the two-sided Bayesian posterior-predictive p-value 2 · min(P(T_rep ≥ T_obs), P(T_rep ≤ T_obs)).
z_ppc_summary(chn, d)| Row | statistic | observed | rep_median | rep_lower_95 | rep_upper_95 | p_ppp |
|---|---|---|---|---|---|---|
| String | Float64 | Float64 | Float64 | Float64 | Float64 | |
| 1 | sum(Z) | 32.0 | 29.0 | 8.975 | 82.0 | 0.8655 |
| 2 | max(Z) | 10.0 | 8.0 | 2.0 | 33.0 | 0.748 |
| 3 | count(Z = 0) | 24.0 | 23.0 | 16.0 | 30.0 | 0.9305 |