Skip to content

Analysis walkthrough

The Epuyén 2018–19 outbreak in north-west Patagonia was the first cluster where person-to-person Andes hantavirus transmission was documented at scale. The line list bundled with this package is hand-encoded from Table S2 of Martínez et al. 2020 — 34 cases with exposure windows, symptom-onset dates, and attributed source cases.

This page fits the joint model in TransmissionLinelist.jl to that line list and renders the headline outputs. Four quantities are estimated together: the incubation period, the transmission timing of each secondary relative to its source's symptom onset (δ), a weekly time-varying reproduction number R(t), and the offspring dispersion k of a Negative-Binomial. Exposure and onset dates are interval-censored. The model handles that by giving each case a continuous latent infection time and a continuous latent onset time, each sampled within its recorded window. Generation interval and serial interval are derived in post-processing as the transmission timing plus an incubation period (the source's for GI, the secondary's for SI). Fitting all four jointly propagates uncertainty between them that a delay-then-R(t) pipeline would lose.

Priors, the data-augmentation construction, and per-pair GI > 0 constraint are detailed on the Model page. Caveats around exposure encoding, late R(t) bins, and right-truncation are on the Limitations page.

julia
using TransmissionLinelist
using Chain
using DataFrames
using DataFramesMeta
using FlexiChains
using Printf
using Random
using CairoMakie
using AlgebraOfGraphics
using PairPlots

Random.seed!(20260508)
Random.TaskLocalRNG()

Load the line list

load_linelist parses the bundled CSV and drops the _alt sensitivity rows. build_data re-encodes exposure / onset windows as day offsets from t0 (60 days before the first onset); bin_edges_day returns the weekly R(t) knot dates as day offsets.

julia
ll = load_linelist()
d = build_data(ll)
edges = bin_edges_day(d.t0)
model = joint_model(d, edges)

@chain ll begin
    @select(:patient_id, :exposure_lower, :exposure_upper,
        :onset_date, :source_case, :Z)
    first(8)
end
8×6 DataFrame
Rowpatient_idexposure_lowerexposure_upperonset_datesource_caseZ
StringDate?Date?DateString15?Int64
11missingmissing2018-11-03missing5
222018-11-032018-11-032018-11-2316
332018-11-032018-11-032018-11-2010
442018-11-032018-11-032018-11-2710
552018-11-032018-11-032018-11-2611
662018-11-032018-11-032018-11-2510
772018-11-232018-11-232018-12-0720
882018-11-232018-11-232018-12-1321

What the data looks like

julia
plot_data(ll)

Model

julia
@model function joint_model(d, edges)
    # Population-level parameters
    μ_inc ~ Normal(3.0, 0.5)                       # log-mean Inc (≈ log 20 d)
    σ_inc ~ truncated(Normal(0.0, 0.5); lower = 0) # log-SD Inc
    μ_δ ~ Normal(0.0, 5.0)                       # population mean transmission timing (d from source onset)
    σ_δ ~ truncated(Normal(0.0, 1.0); lower = 0) # population SD of transmission timing (d)
    # NB offspring dispersion via Stan's reciprocal-sqrt reparameterisation:
    # 1/√k is the SD multiplier in Var = μ + μ²·(1/√k)². Half-Normal(0, 1)
    # spans Poisson (1/√k → 0) to heavy super-spreader (1/√k ≈ 2)
    # symmetrically on the overdispersion scale.
    phi_inv_sqrt ~ truncated(Normal(0.0, 1.0); lower = 0)
    k := 1.0 / phi_inv_sqrt^2
    σ_rw ~ truncated(Normal(0.0, 0.5); lower = 0) # log-R RW innovation SD allows sharp R(t) swings under interventions

    # Concrete element type derived from a sampled scalar — avoids the
    # dynamic-dispatch tax that `Vector{Real}` imposes on AD backends.
    T = typeof(μ_inc)

    # Non-centred random walk on log R(t) at the weekly knots. log_R[b] is
    # the value at knot b; R(t) is linearly interpolated between knots.
    n_knots = length(edges)
    log_R_init ~ Normal(log(1.5), 1.0)
    ε ~ Turing.filldist(Normal(zero(T), one(T)), n_knots - 1)
    log_R := vcat(log_R_init, log_R_init .+ accumulate(+, σ_rw .* ε))

    inc_dist = LogNormal(μ_inc, σ_inc)

    # T_onset is a latent over the recorded onset window (defaults to a
    # one-day window when only a single onset date was recorded).
    T_onset = Vector{T}(undef, d.N)
    for i in 1:d.N
        T_onset[i] ~ Uniform(d.onset_lo_day[i], d.onset_hi_day[i])
    end

    T_inf = Vector{T}(undef, d.N)
    for i in 1:d.N
        if d.source_idx[i] == 0
            # Zoonotic index: free latent T_inf pre-onset.
            T_inf[i] ~ Uniform(d.onset_lo_day[i] - 80.0, T_onset[i] - 1e-6)
            inc_i = T_onset[i] - T_inf[i]
            Turing.@addlogprob! logpdf(inc_dist, inc_i)
        else
            # Sourced case: T_inf anchored to listed exposure window.
            # GI > 0 enforced by rejecting trajectories where the secondary
            # was infected before its source.
            src = d.source_idx[i]
            T_inf[i] ~ Uniform(d.exp_lo_day[i], d.exp_hi_day[i])
            if T_inf[i] <= T_inf[src]
                Turing.@addlogprob! -Inf
            else
                inc_i = T_onset[i] - T_inf[i]
                δ_pair = T_inf[i] - T_onset[src]
                Turing.@addlogprob! logpdf(inc_dist, inc_i)
                Turing.@addlogprob! logpdf(Normal(μ_δ, σ_δ), δ_pair)
            end
        end
        # Case reproduction number indexed by source onset: μ_δ ≈ 0,
        # σ_δ ≈ 0.6 d, so a case's offspring are infected within ~1 d of
        # the case becoming symptomatic.
        R_i = exp(log_R_at(T_onset[i], edges, log_R))
        d.Zobs[i] ~ NegativeBinomial(k, k / (k + R_i))
    end
end

Prior predictives

Implied prior distributions for the incubation period, transmission timing δ, and the derived generation / serial interval before any data are seen.

julia
plot_prior_predictives()

Fitting

sample_fit wraps the package's default NUTS configuration: Enzyme reverse-mode AD, chains initialised from the prior, 1000 post-warmup draws across 4 chains, target_accept = 0.95.

julia
chn = sample_fit(model)
╭─FlexiChain (1000 iterations, 4 chains) ──────────────────────────────────────
 ↓ iter  = 501:1500
 → chain = 1:4

 Parameters (12) ── AbstractPPL.VarName
  Float64          μ_inc, σ_inc, μ_δ, σ_δ, phi_inv_sqrt, k, σ_rw, log_R_init  
  Vector{Float64}  ε, log_R, T_onset, T_inf                                   

 Extras (14)
  Int64    n_steps, tree_depth                                                
  Bool     is_accept, numerical_error                                         
  Float64  acceptance_rate, log_density, hamiltonian_energy,                  
           hamiltonian_energy_error, max_hamiltonian_energy_error, step_size, 
           nom_step_size, logprior, loglikelihood, logjoint                   
╰──────────────────────────────────────────────────────────────────────────────╯

Diagnostics

Maximum R̂, minimum bulk ESS, divergence count, and wall-clock sampling time (seconds, approximated by the slowest chain under MCMCThreads).

julia
diagnostics_table(chn)
1×4 DataFrame
Rowrhat_maxess_mindivergencesruntime_seconds
Float64Float64Int64Float64
11.004341717.78058.4237

Key outputs

julia
summary_table(chn)
8×4 DataFrame
Rowquantitymedianlower_95upper_95
StringFloat64Float64Float64
1Incubation mean (d)22.490720.260325.4829
2Incubation 95th pct (d)36.121831.308344.0981
3Incubation 99th pct (d)44.948737.503458.1167
4μ_δ (d from source onset)0.172271-0.1613520.489418
5σ_δ (d)0.6075860.4512130.837362
6GI / SI mean (d)22.675920.422525.6669
7GI / SI SD (d)7.380715.6127110.3836
8NB dispersion k0.3829770.1574571.07489

R(t) over weekly knots

Spaghetti of thinned posterior draws through the weekly knots (linearly interpolated); reverts to the prior in late-January knots where cases are thin (see Limitations).

julia
plot_rt(chn)

Pair plot of population parameters

Corner plot for μ_inc, σ_inc, μ_δ, σ_δ, k.

julia
plot_pair(chn)

Predictive distributions for the delays

Implied population distributions for the incubation period, transmission timing δ, and the derived generation and serial intervals under the fitted posterior — i.e. what a new case or transmission pair would look like. This is not a check against the observed data; for that see the sense-check and PPC panels below. Inc and δ panels show the posterior over the parametric density (median PDF with a 95% pointwise ribbon across draws) overlaid with one predictive realisation per draw; GI and SI show the predictive-sample histogram only.

julia
plot_predictive_distributions(chn)

δ sense check

Compare the per-pair posterior medians of δ to the fitted population Normal(μ_δ, σ_δ).

julia
plot_delta_sense_check(chn, d)

Incubation-period sense check

Compare the per-case posterior medians of T_onset[i] − T_inf[i] to the fitted population LogNormal(μ_inc, σ_inc).

julia
plot_inc_sense_check(chn, d)

Offspring posterior-predictive check

Joint-draw posterior-predictive check. For each posterior draw, replicate Z_rep[i] ~ NegativeBinomial(k, k/(k+R_i)) per case, with R_i = exp(log_R_at(T_inf[i], edges, log_R)) evaluated at the same draw's T_inf[i], log_R, and k (matching the model's likelihood, clamp and all). The left panel compares frequencies of each Z value against the observed line list. The right column has three stacked subpanels — one per discrete test statistic (sum(Z), max(Z), count(Z = 0)) — each showing the histogram of the replicated statistic with the observed value as a dashed vertical rule.

julia
plot_z_ppc(chn, d)

Numeric values for each test statistic — observed, replicated median + 95% CrI, and the two-sided Bayesian posterior-predictive p-value 2 · min(P(T_rep ≥ T_obs), P(T_rep ≤ T_obs)).

julia
z_ppc_summary(chn, d)
3×6 DataFrame
Rowstatisticobservedrep_medianrep_lower_95rep_upper_95p_ppp
StringFloat64Float64Float64Float64Float64
1sum(Z)32.029.08.97582.00.8655
2max(Z)10.08.02.033.00.748
3count(Z = 0)24.023.016.030.00.9305