Contributing

Issues and pull requests are welcome at epiforecasts/BVDOutbreakSize. This page covers how the project is laid out, how to run it, and the conventions to follow when changing it.

Repository layout

src/BVDOutbreakSize.jl — the package: data loading (load_observations), NUTS sampling (nuts_sample), the shared Gauss-Legendre integrators (integrate, delay_convolution, integrate_cumulative, integrate_exports_deaths), summary and comparison tables, plotting, the no-onward-deaths projection (predict_no_onward_deaths) and forecast helpers (forecast_reported). The published Imperial point estimates live here as REPORT_SCENARIOS.
docs/examples/analysis.jl — the Literate walkthrough that is the analysis. It defines the Turing submodels and composers, runs the fits, and writes every output. This is the main artifact.
docs/make.jl — DocumenterVitepress build. Copies README.md to index.md, executes the literate to analysis.md, and builds the bibliography.
data/observations.toml — single source of truth for observation data (case and death counts, traveller volumes, sources). Loaded via load_observations() and never hardcoded. Update this one file for a new situation report and the analysis picks it up. The literate re-binds its observation consts from the loaded TOML, so the package constants are defaults only.
scripts/run.jl — regenerates published results by including the literate and writes CSVs to output/.
test/ — one file per feature, driven by test/runtests.jl.
external/bdbv-linelist-analysis — git submodule, source of the onset-to-death delay priors.
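
To illustrate the single-source-of-truth pattern behind data/observations.toml, the sketch below parses a small TOML document with Julia's standard-library parser and binds values from it instead of hardcoding them. The table name, keys and values are hypothetical placeholders and are not the real file's schema; load_observations() may be implemented differently.

```julia
using TOML  # Julia standard library

# Hypothetical content; the real data/observations.toml schema may differ.
toml_text = """
[drc]
as_of = "2025-01-01"   # placeholder report date
deaths = 10            # placeholder count
source = "placeholder situation report"
"""

obs = TOML.parse(toml_text)

# Bind analysis inputs from the parsed data rather than hardcoding them.
drc_deaths = obs["drc"]["deaths"]
drc_as_of = obs["drc"]["as_of"]
```

With this pattern a new situation report only changes the TOML text; the code that reads it stays the same.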

Running and testing

There is no Taskfile. Use the julia --project commands:

```bash
# Instantiate the package environment
julia --project=. -e 'using Pkg; Pkg.instantiate()'

# Run the analysis
julia --project=. docs/examples/analysis.jl

# Regenerate the published output CSVs into output/
julia --project=. scripts/run.jl

# Run the full test suite
julia --project=. -e 'using Pkg; Pkg.test()'

# Build the docs (executes the literate, HTML in docs/build/)
julia --project=docs -e 'using Pkg; \
  Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
julia --project=docs docs/make.jl
```

A build streams per-fit progress by default. Every NUTS fit writes logs/<fit>.log (iteration, log-density, divergences) and a TensorBoard run under logs/tensorboard/<fit>/. The BVD_FIT_LOG environment variable controls this: all when unset, or progress, tensorboard, or none. CI release builds set BVD_FIT_LOG=none. Tail a log for a quick liveness check, or run task tensorboard to view all fits in the worktree. The logs live under the git-ignored logs/, so each worktree keeps its own.
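
One way a BVD_FIT_LOG switch of this kind could be read is sketched below. The helper names are hypothetical and the package may parse the variable differently. The sketch assumes that all enables both the per-fit log file and the TensorBoard run.

```julia
# Hypothetical helpers; the package may read BVD_FIT_LOG differently.
const FIT_LOG_MODES = ("all", "progress", "tensorboard", "none")

# Read the mode from an environment-like mapping, defaulting to "all".
function fit_log_mode(env = ENV)
    mode = lowercase(strip(get(env, "BVD_FIT_LOG", "all")))
    allowed = join(FIT_LOG_MODES, ", ")
    mode in FIT_LOG_MODES ||
        error("BVD_FIT_LOG must be one of: " * allowed)
    return mode
end

# Assumption: "all" switches on both outputs.
write_progress(mode) = mode in ("all", "progress")
write_tensorboard(mode) = mode in ("all", "tensorboard")
```

Passing a Dict in place of ENV, for example fit_log_mode(Dict("BVD_FIT_LOG" => "none")), makes the helper easy to check without touching the real environment.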

test/runtests.jl includes each test/test_*.jl. To iterate on one file, run it inside a REPL after using BVDOutbreakSize, or temporarily comment out the others in runtests.jl.

CI runs the test suite (.github/workflows/test.yml) and builds the docs, publishing output/ as a GitHub Release on each push to main (.github/workflows/docs.yml).

Model architecture

The model is assembled from small, swappable Turing submodels rather than one monolithic block (the build-up is drawn as a flowchart on the Analysis page). There are three layers.

Building-block submodels, one per parameter family, each owning its own priors:

exponential_growth_model samples the doubling time τ and the doubling-time multiplier m = T/τ, not τ and T directly, to break the C(T) = exp(rT) ridge.
delay_model is the gamma onset-to-death delay.
cfr_model is the case-fatality ratio.
detection_window_model is the McCabe rectangular export detection window w; the default export mechanism instead reuses the DRC onset-to-report delay f_rep (report_delay_model) as the onset-to-detection delay.
surveillance_dispersion_model samples on the 1/√k scale.
pooled_ascertainment_model partially pools the DRC and Uganda reporting fractions p_drc and p_uganda on the logit scale.
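
The two reparametrisations above can be written out in plain Julia. This is a minimal sketch and not the package code: the function names are hypothetical, and it assumes the usual relation between growth rate and doubling time, r = log(2)/τ. Under that assumption rT = m log(2), so C(T) = exp(rT) = 2^m depends on m alone, which is why sampling (τ, m) separates outbreak size from timescale.

```julia
# Growth block: take τ (doubling time) and m = T/τ, then derive r and T.
function growth_state(τ, m)
    r = log(2) / τ      # growth rate implied by the doubling time
    T = m * τ           # elapsed time, in the same units as τ
    return (; r, T, C = exp(r * T))
end

# Dispersion block: a value on the 1/√k scale, mapped back to k.
dispersion_k(inv_sqrt_k) = 1 / inv_sqrt_k^2

# Same m with different τ gives the same C, because r * T = m * log(2).
s1 = growth_state(5.0, 4.0)
s2 = growth_state(10.0, 4.0)
```

On the 1/√k scale a value near zero corresponds to large k, so a prior that concentrates near zero favours low overdispersion.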

Observation submodels, one per data stream, each taking the growth state, adding its forward integral and likelihood: exports_model (Poisson), deaths_model (NegBinomial), reported_cases_model (NegBinomial), confirmed_cases_model (NegBinomial), and exports_deaths_model (Poisson). The Uganda export streams have delay-convolution variants (exports_delay_model, exports_deaths_delay_model, exports_detection_timing_delay_model) that replace the rectangular detection window with an onset-to-detection delay reusing the DRC f_rep; these are the bvd_joint defaults.

Composers stitch the blocks into full generative models: exports_only_model, deaths_only_model, cases_only_model, exports_deaths_only_model, imperial_only_model (exports and deaths, the Imperial joint configuration), and bvd_joint (all four streams). Each composer conditionally includes only the likelihoods for the streams it carries. A single-stream composer never instantiates the other observation submodels, so a discrete stream is never left sampled, which would trip Turing's model check. Pass a stream as missing to drop its likelihood; bvd_joint with all streams missing is the generator used for the prior and posterior predictive checks.
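
The idea of including a likelihood term only for the streams that are supplied can be sketched without Turing. The code below is plain Julia with hypothetical names and placeholder means. It uses a Poisson term for both streams for brevity, and it does not reproduce how the Turing composers generate a missing stream for predictive checks.

```julia
# Poisson log-probability of count y with mean λ (no packages needed).
poisson_logpmf(y, λ) = y * log(λ) - λ - sum(log, 1:y; init = 0.0)

# Add a stream's term only when data for it is supplied.
stream_term(y, λ) = ismissing(y) ? 0.0 : poisson_logpmf(y, λ)

# Hypothetical composer: both streams default to missing.
function joint_loglik(; exports = missing, exports_deaths = missing,
        λ_exports = 2.0, λ_exports_deaths = 0.5)
    return stream_term(exports, λ_exports) +
           stream_term(exports_deaths, λ_exports_deaths)
end

exports_only = joint_loglik(; exports = 2)
both = joint_loglik(; exports = 2, exports_deaths = 1)
```

With every stream missing the sketch contributes no likelihood at all, which mirrors the all-missing case described above.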

Conventions

Maximum 80 characters per line of code.
One sentence per line in prose and markdown; do not wrap prose at 80 characters.
The abstract is single-sourced in README.md, wrapped between a pair of start and end markers. Edit the abstract in README.md only. docs/examples/analysis.jl loads it at build time via a Documenter @eval block that reads README.md and regex-extracts the text between those markers, so do not duplicate it into the analysis page.
Table-construction and other setup code in analysis.jl is hidden inside <details> dropdowns via #md # @raw html blocks; the bare result object follows (with #hide) so only the output renders.
The surveillance dispersion prior is a half-normal truncated(Normal(0, 1); lower = 0) on inv_sqrt_k.
Docstrings use DocStringExtensions ($(TYPEDSIGNATURES)).
The AD backend is Mooncake reverse-mode; integrals use Gauss-Legendre quadrature (DEATH_INTEGRAL_ALG with n = 64, CUMULATIVE_INTEGRAL_ALG with n = 32); models compose via ~ to_submodel(...). The deaths-among-exports CDF is written as an inner integral of the density because the reverse-mode AD backend does not support the gamma CDF shape-parameter derivative.
NaN and Inf safe clamps (safe_nbinomial, eps-flooring of expected counts) guard against extreme NUTS warmup proposals; keep them when editing the likelihoods.
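
As a rough illustration of the quadrature convention, the sketch below builds a Gauss-Legendre rule with the Golub-Welsch eigenvalue method and uses it to write a CDF as an integral of the density. It is not the package's integrator (DEATH_INTEGRAL_ALG and CUMULATIVE_INTEGRAL_ALG are defined in the package), all names are hypothetical, and the density is restricted to integer shape so that only the standard library is needed.

```julia
using LinearAlgebra  # Julia standard library

# Gauss-Legendre nodes and weights on [-1, 1] via Golub-Welsch.
function gauss_legendre(n)
    β = [k / sqrt(4k^2 - 1) for k in 1:(n - 1)]
    E = eigen(SymTridiagonal(zeros(n), β))
    return E.values, 2 .* E.vectors[1, :] .^ 2
end

# Integrate f over [a, b] with an n-point rule.
function quad(f, a, b; n = 32)
    x, w = gauss_legendre(n)
    half = (b - a) / 2
    mid = (a + b) / 2
    return half * sum(w .* f.(half .* x .+ mid))
end

# Gamma density with integer shape (Erlang), to avoid extra packages.
erlang_pdf(t, shape, scale) =
    t^(shape - 1) * exp(-t / scale) /
    (scale^shape * factorial(shape - 1))

# CDF as an inner integral of the density, not a closed-form CDF call.
erlang_cdf(t, shape, scale; n = 32) =
    quad(s -> erlang_pdf(s, shape, scale), 0.0, t; n = n)
```

Because the integral only evaluates the density, differentiating it with respect to the shape parameter never needs the derivative of the gamma CDF itself.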

Analysis report prose

These apply to the narrative prose in docs/examples/analysis.jl. Use the existing report text as the template for tone.

No code references in the narrative. Do not name functions, parameters, files, or :symbols in the prose. Describe each quantity in words, and define a derived quantity in words the first time it appears, near its figure or table.
Concise and direct. Cut filler and adjectives. Avoid the LLM-indicator words: comprehensive, leverage, robust, framework (when vague), utilise, facilitate, novel, landscape, foster, harness, streamline, pivotal, nuanced, multifaceted, cornerstone, synergy, overarching.
Report intervals as sentences, without a leading median. Write the credible interval as a phrase, not a "median (lower, upper)" construction.
Minimise colons and dashes in prose; use them only when needed.
UK English throughout.
Section and subsection titles are just the title. No descriptive suffix after a title (not "Reproduction number — weekly random walk with intervention ramp", just "Reproduction number"), and no detail-dump in the first sentence after a heading.
Order the methods generatively, infections through to observation endpoints: the infection process first, then the epidemiological processes (delays, case-fatality ratio), then the observation models (surveillance streams before exports), then the joint model.
Define every quantity before it is used. Define the reproduction number before the seeding that relies on it; introduce the initial infection count before describing how it arises; define every symbol and operator (including convolution) the first time it appears. Never use a symbol the reader has not met.
Do not repeat. State a convention once (the credible-interval levels, the delay discretisation) and do not restate it per bullet or subsection. Cut sentences that duplicate earlier content.
Cite the source of each prior and carry the uncertainty the source reports. When a source gives a distribution with uncertainty (a shape and scale with intervals), propagate that, not a self-assigned weakly-informative spread. Do not write "with an assumed weakly-informative spread" repeatedly. If a prior is our own choice, say so plainly ("we use a prior of ...").
State assumptions as assumptions ("we assume a single seed case", "we assume the response scale-up takes about three weeks"). Do not assert a false rationale for a modelling choice (not "a Poisson because the count is small").
Do not editorialise or justify priors in the narrative (not "a diffuse prior would let the background absorb the whole stream"). State what the model does.
Methods belong in the methods. Do not leave model description (the intervention model, the counterfactual, the forecast, the evaluation) in the results; move it to the methods and keep the results to findings.
Label quantities accurately. Do not call suspected cases onsets; prefer "current cumulative" over "final cumulative".
For a latent quantity (infections, onsets, deaths) report the modelled estimate without overlaying observed data that sits downstream of unmodelled processes.
Plots use the same credible-interval ribbons as the tables, not a bare median, and show only the period being estimated rather than greying out the rest.
Model code shown in the report is clean; strip working comments before it is displayed.
Flag a future improvement as a GitHub issue, not a buried caveat in the prose.

Pull requests

main is branch-protected; changes go through pull requests.
Run the test suite before opening a pull request.
Add a bullet to the News page under Unreleased for any user-visible change.
