Skip to content

Analysis walkthrough

The Epuyén 2018–19 outbreak in north-west Patagonia was the first cluster where person-to-person Andes hantavirus transmission was documented at scale. The line list bundled with this package is hand-encoded from Table S2 of Martínez et al. 2020 — 34 cases with exposure windows, symptom-onset dates, and attributed source cases.

This page fits the joint model in TransmissionLinelist.jl to that line list and renders the headline outputs. Four quantities are estimated together: the incubation period, the transmission timing of each secondary relative to its source's symptom onset (δ), a weekly time-varying reproduction number R(t), and the offspring dispersion k of a Negative-Binomial. Exposure and onset dates are interval-censored. The model handles that by giving each case a continuous latent infection time and a continuous latent onset time, each sampled within its recorded window. Generation interval and serial interval are derived in post-processing as the transmission timing plus an incubation period (the source's for GI, the secondary's for SI). Fitting all four jointly propagates uncertainty between them that a delay-then-R(t) pipeline would lose.

Priors, the data-augmentation construction, and per-pair GI > 0 constraint are detailed on the Model page. Caveats around exposure encoding, late R(t) bins, and right-truncation are on the Limitations page.

julia
using TransmissionLinelist
using Chain
using DataFrames
using DataFramesMeta
using Distributions: LogNormal
using FlexiChains
using Printf
using Random
using Statistics: quantile
using CairoMakie
using AlgebraOfGraphics
using PairPlots

Random.seed!(20260508)
Random.TaskLocalRNG()

Load the line list

load_linelist parses the bundled CSV and drops the _alt sensitivity rows. build_data re-encodes exposure / onset windows as day offsets from t0 (60 days before the first onset), and prepare_rt_edges builds the weekly R(t) knot dates from the same origin. Passing those into joint_model instantiates the Turing model directly.

julia
ll = load_linelist()
d = build_data(ll)
edges = prepare_rt_edges(d.t0)
model = joint_model(d, edges)

@chain ll begin
    @select(:patient_id, :exposure_lower, :exposure_upper,
        :onset_date, :source_case, :Z)
    first(8)
end
8×6 DataFrame
Rowpatient_idexposure_lowerexposure_upperonset_datesource_caseZ
StringDate?Date?DateString15?Int64
11missingmissing2018-11-03missing5
222018-11-032018-11-032018-11-2316
332018-11-032018-11-032018-11-2010
442018-11-032018-11-032018-11-2710
552018-11-032018-11-032018-11-2611
662018-11-032018-11-032018-11-2510
772018-11-232018-11-232018-12-0720
882018-11-232018-11-232018-12-1321

What the data looks like

julia
plot_data(ll)

Model

julia
@model function joint_model(d, edges;
        incubation = incubation_model(),
        transmission = transmission_delta_model(),
        rt = random_walk_rt_model(length(edges)),
        dispersion = nb_dispersion_model(),
        cases = case_model)
    delays ~ to_submodel(
        delays_only_model(d; incubation, transmission), false)
    case_obs ~ to_submodel(
        cases(d.Zobs, edges, delays.T_onset, delays.p;
            rt, dispersion), false)
end

Prior predictives

Implied prior distributions for the incubation period, transmission timing δ, and the derived generation / serial interval before any data are seen.

julia
plot_prior_predictives()

Fitting

sample_fit wraps the package's default NUTS configuration: Mooncake reverse-mode AD, chains initialised from the prior, 1000 post-warmup draws across 4 chains, target_accept = 0.95.

julia
chn = sample_fit(model)
╭─FlexiChain (1000 iterations, 4 chains) ──────────────────────────────────────
 ↓ iter  = 501:1500
 → chain = 1:4

 Parameters (12) ── AbstractPPL.VarName
  Float64          μ_inc, σ_inc, μ_δ, σ_δ, σ_rw, log_R_init, phi_inv_sqrt, k  
  Vector{Float64}  T_onset, T_inf, ε, log_R                                   

 Extras (14)
  Int64    n_steps, tree_depth                                                
  Bool     is_accept, numerical_error                                         
  Float64  acceptance_rate, log_density, hamiltonian_energy,                  
           hamiltonian_energy_error, max_hamiltonian_energy_error, step_size, 
           nom_step_size, logprior, loglikelihood, logjoint                   
╰──────────────────────────────────────────────────────────────────────────────╯

Drop a compact regression snapshot to output/regression/analysis.csv for the docs CI to diff against the checked-in baseline. Flags any future PR whose fit drifts outside MCMC noise.

julia
save_regression_summary(
    joinpath(@__DIR__, "..", "..", "output",
        "regression", "analysis.csv"), chn)
"/home/runner/work/andv-linelist-analysis/andv-linelist-analysis/docs/build/../../output/regression/analysis.csv"

Diagnostics

Maximum R̂, minimum bulk ESS, divergence count, and wall-clock sampling time (seconds, approximated by the slowest chain under MCMCThreads).

julia
diagnostics_table(chn)
1×4 DataFrame
Rowrhat_maxess_mindivergencesruntime_seconds
Float64Float64Int64Float64
11.006972328.280137.194

Key outputs

julia
summary_table(chn)
8×4 DataFrame
Rowquantitymedianlower_95upper_95
StringFloat64Float64Float64
1Incubation mean (d)22.552720.235125.5767
2Incubation 95th pct (d)36.208731.167744.4239
3Incubation 99th pct (d)45.026937.346958.3775
4μ_δ (d from source onset)0.172518-0.1705690.495142
5σ_δ (d)0.6090410.4561880.835941
6GI / SI mean (d)22.71420.386125.6692
7GI / SI SD (d)7.389465.5728510.4537
8NB dispersion k0.3033290.1332120.876006

Observed offspring counts

Each case is plotted at its observed offspring count Z (with small vertical jitter to break ties) against time. Thin segments are posterior draws of the latent infection time T_inf joined to the latent onset time T_onset; filled dots are the posterior medians of T_inf and hollow dots the medians of T_onset. Index and sourced cases are coloured separately. This is a direct view of what the model has to explain before R(t) enters the picture; the posterior-predictive comparison against modelled offspring counts comes later under plot_z_ppc.

julia
plot_z_dumbbell(chn, d)

R(t) over weekly knots

Spaghetti of thinned posterior draws through the weekly knots (linearly interpolated); reverts to the prior in late-January knots where cases are thin (see Limitations).

julia
plot_rt(chn)

Pair plot of population parameters

Corner plot for μ_inc, σ_inc, μ_δ, σ_δ, k.

julia
plot_pair(chn)

Predictive distributions for the delays

Implied population distributions for the incubation period, transmission timing δ, and the derived generation and serial intervals under the fitted posterior — i.e. what a new case or transmission pair would look like. This is not a check against the observed data; for that see the sense-check and PPC panels below. Inc and δ panels show the posterior over the parametric density (median PDF with a 95% pointwise ribbon across draws) overlaid with one predictive realisation per draw; GI and SI show the predictive-sample histogram only.

julia
plot_predictive_distributions(chn)

δ sense check

Compare the per-pair posterior medians of δ to the fitted population Normal(μ_δ, σ_δ).

julia
plot_delta_sense_check(chn, d)

Incubation-period sense check

Compare the per-case posterior medians of T_onset[i] − T_inf[i] to the fitted population LogNormal(μ_inc, σ_inc).

julia
plot_inc_sense_check(chn, d)

Offspring posterior-predictive check

Joint-draw posterior-predictive check. For each posterior draw, replicate Z_rep[i] ~ NegativeBinomial(k, k/(k+R_i)) per case, with R_i = exp(log_R_at(T_inf[i], edges, log_R)) evaluated at the same draw's T_inf[i], log_R, and k (matching the model's likelihood, clamp and all). The left panel compares frequencies of each Z value against the observed line list. The right column has three stacked subpanels — one per discrete test statistic (sum(Z), max(Z), count(Z = 0)) — each showing the histogram of the replicated statistic with the observed value as a dashed vertical rule.

julia
plot_z_ppc(model, chn, d; edges = edges)

Numeric values for each test statistic — observed, replicated median + 95% CrI, and the two-sided Bayesian posterior-predictive p-value 2 · min(P(T_rep ≥ T_obs), P(T_rep ≤ T_obs)).

julia
z_ppc_summary(model, chn, d; edges = edges)
3×6 DataFrame
Rowstatisticobservedrep_medianrep_lower_95rep_upper_95p_ppp
StringFloat64Float64Float64Float64Float64
1sum(Z)32.030.09.085.00.92
2max(Z)10.08.02.029.00.708
3count(Z = 0)24.023.015.029.00.873

Comparison with Martínez et al. 2020

The line list used here is hand-encoded from Table S2 of Martínez et al. 2020, so the joint model and the source paper are fitted to the same outbreak. The table below places our posterior estimates next to the values that the NEJM paper reports for the same outbreak, with our column built directly from the fitted chain in scope (chn). Martínez values are pasted from the abstract and main text of the paper; where the paper does not report a quantity we estimate, the study columns are left as missing rather than fabricated. Our our_80_ci column is the central 80% credible interval (10th–90th percentile across posterior draws), chosen as a band that is comparable in tail mass to the empirical 9–40 day incubation range reported by the paper.

julia
post = summarise(chn)

inc_median = exp.(post.μ_inc)
inc_q95 = [quantile(LogNormal(post.μ_inc[i], post.σ_inc[i]), 0.95)
           for i in eachindex(post.μ_inc)]
rt_peak = [maximum(exp(post.log_R_chain[b][i])
           for b in eachindex(post.log_R_chain))
           for i in eachindex(post.μ_inc)]

q10(x) = quantile(x, 0.1)
q50(x) = quantile(x, 0.5)
q90(x) = quantile(x, 0.9)
fmt2(x) = @sprintf("%.2f", x)
ci80(x) = string(fmt2(q10(x)), "–", fmt2(q90(x)))

martinez_rows = [
    (parameter = "Incubation period — median (d)",
        study_value = "21",
        study_ci = "range 9–40",
        our_median = fmt2(q50(inc_median)),
        our_80_ci = ci80(inc_median),
        notes = "Martínez reports the empirical median and range of " *
                "observed incubation periods; our value is the median " *
                "of the fitted LogNormal."),
    (parameter = "Incubation period — μ_inc (log-scale mean)",
        study_value = "missing",
        study_ci = "missing",
        our_median = fmt2(q50(post.μ_inc)),
        our_80_ci = ci80(post.μ_inc),
        notes = "Martínez does not fit a parametric incubation " *
                "distribution, so no log-scale mean is reported."),
    (parameter = "Incubation period — σ_inc (log-scale SD)",
        study_value = "missing",
        study_ci = "missing",
        our_median = fmt2(q50(post.σ_inc)),
        our_80_ci = ci80(post.σ_inc),
        notes = "Not reported by Martínez (no parametric " *
                "distribution fitted)."),
    (parameter = "Incubation period — 95th percentile (d)",
        study_value = "missing",
        study_ci = "missing",
        our_median = fmt2(q50(inc_q95)),
        our_80_ci = ci80(inc_q95),
        notes = "Martínez reports the maximum observed value (40 d) " *
                "rather than a fitted 95th percentile."),
    (parameter = "Serial interval — mean (d)",
        study_value = "missing",
        study_ci = "missing",
        our_median = fmt2(q50(post.mean_gi_si)),
        our_80_ci = ci80(post.mean_gi_si),
        notes = "Martínez does not report a serial-interval " *
                "distribution; our SI mean and SD are derived from " *
                "the joint posterior over δ and the incubation period."),
    (parameter = "Serial interval — SD (d)",
        study_value = "missing",
        study_ci = "missing",
        our_median = fmt2(q50(post.sd_gi_si)),
        our_80_ci = ci80(post.sd_gi_si),
        notes = "As above; GI and SI share the same marginal in this " *
                "model so the same row covers the generation interval."),
    (parameter = "Generation interval — mean (d)",
        study_value = "missing",
        study_ci = "missing",
        our_median = fmt2(q50(post.mean_gi_si)),
        our_80_ci = ci80(post.mean_gi_si),
        notes = "Martínez does not report a generation-interval " *
                "distribution."),
    (parameter = "Offspring dispersion k",
        study_value = "missing",
        study_ci = "missing",
        our_median = fmt2(q50(post.k)),
        our_80_ci = ci80(post.k),
        notes = "Martínez identifies three super-spreaders " *
                "qualitatively but does not fit a Negative-Binomial " *
                "dispersion."),
    (parameter = "Maximum R(t) over weekly knots",
        study_value = "2.12",
        study_ci = "missing",
        our_median = fmt2(q50(rt_peak)),
        our_80_ci = ci80(rt_peak),
        notes = "Martínez reports a pre-intervention R of 2.12 " *
                "(falling to 0.96 after isolation); no CI given. " *
                "Our value is the per-draw maximum of exp(log_R) " *
                "across weekly knots, so it sits above any single " *
                "weekly posterior median."),
    (parameter = "Secondary attack rate",
        study_value = "missing",
        study_ci = "missing",
        our_median = "missing",
        our_80_ci = "missing",
        notes = "Martínez describes super-spreading qualitatively " *
                "without a numeric SAR; our model targets R(t) and " *
                "k rather than a per-contact attack rate, so neither " *
                "side reports a comparable value.")
]

DataFrame(martinez_rows)
10×6 DataFrame
Rowparameterstudy_valuestudy_ciour_medianour_80_cinotes
StringStringStringStringStringString
1Incubation period — median (d)21range 9–4021.4219.93–23.02Martínez reports the empirical median and range of observed incubation periods; our value is the median of the fitted LogNormal.
2Incubation period — μ_inc (log-scale mean)missingmissing3.062.99–3.14Martínez does not fit a parametric incubation distribution, so no log-scale mean is reported.
3Incubation period — σ_inc (log-scale SD)missingmissing0.320.27–0.38Not reported by Martínez (no parametric distribution fitted).
4Incubation period — 95th percentile (d)missingmissing36.2132.75–40.93Martínez reports the maximum observed value (40 d) rather than a fitted 95th percentile.
5Serial interval — mean (d)missingmissing22.7121.16–24.47Martínez does not report a serial-interval distribution; our SI mean and SD are derived from the joint posterior over δ and the incubation period.
6Serial interval — SD (d)missingmissing7.396.16–9.18As above; GI and SI share the same marginal in this model so the same row covers the generation interval.
7Generation interval — mean (d)missingmissing22.7121.16–24.47Martínez does not report a generation-interval distribution.
8Offspring dispersion kmissingmissing0.300.18–0.59Martínez identifies three super-spreaders qualitatively but does not fit a Negative-Binomial dispersion.
9Maximum R(t) over weekly knots2.12missing1.620.91–3.24Martínez reports a pre-intervention R of 2.12 (falling to 0.96 after isolation); no CI given. Our value is the per-draw maximum of exp(log_R) across weekly knots, so it sits above any single weekly posterior median.
10Secondary attack ratemissingmissingmissingmissingMartínez describes super-spreading qualitatively without a numeric SAR; our model targets R(t) and k rather than a per-contact attack rate, so neither side reports a comparable value.