Chain statistics, likelihood, and fitting
Compute chain size and length statistics, evaluate likelihoods, and fit offspring distributions — analytically where possible, simulation-based otherwise.
Chain statistics
using EpiBranch
using Distributions
using DataFrames
using StableRNGs
model = BranchingProcess(Poisson(0.9), Exponential(5.0))
rng = StableRNG(42)
state = simulate(model; sim_opts = SimOpts(n_initial = 20, max_cases = 10_000), rng = rng)
cs = chain_statistics(state)
first(cs, 10)| Row | chain_id | size | length |
|---|---|---|---|
| Int64 | Int64 | Int64 | |
| 1 | 1 | 1 | 0 |
| 2 | 2 | 6 | 4 |
| 3 | 3 | 2 | 1 |
| 4 | 4 | 1 | 0 |
| 5 | 5 | 1 | 0 |
| 6 | 6 | 47 | 19 |
| 7 | 7 | 1 | 0 |
| 8 | 8 | 14 | 6 |
| 9 | 9 | 2 | 1 |
| 10 | 10 | 1 | 0 |
println("Mean size: $(round(mean(cs.size), digits=2)), Max: $(maximum(cs.size))")
println("Mean length: $(round(mean(cs.length), digits=2))")Mean size: 6.75, Max: 47
Mean length: 2.6Analytical chain size distributions
For certain offspring distributions, the chain size distribution has a closed form:
# Poisson → Borel
d_borel = chain_size_distribution(Poisson(0.8))
println("Borel(0.8) mean: $(round(mean(d_borel), digits=2))")
# NegBin → Lagrange inversion formula
d_nb = chain_size_distribution(NegBin(0.8, 0.5))
println("NegBin chain size P(1): $(round(pdf(d_nb, 1), digits=4))")Borel(0.8) mean: 5.0
NegBin chain size P(1): 0.6202Likelihood
Given observed chain sizes, evaluate the log-likelihood using loglikelihood with data wrapped in ChainSizes:
data = ChainSizes([1, 1, 2, 1, 3, 1, 1, 5, 1, 2])
# Compare offspring parameters
for λ in [0.3, 0.5, 0.7, 0.9]
ll = loglikelihood(data, Poisson(λ))
println("Poisson($λ): LL = $(round(ll, digits=2))")
endPoisson(0.3): LL = -12.98
Poisson(0.5): LL = -12.49
Poisson(0.7): LL = -13.4
Poisson(0.9): LL = -14.99With Negative Binomial offspring:
ll = loglikelihood(data, NegBin(0.9, 0.5))
println("NegBin(R=0.9, k=0.5): LL = $(round(ll, digits=2))")NegBin(R=0.9, k=0.5): LL = -14.07Imperfect observation
Account for incomplete case ascertainment by wrapping a model in PartiallyObserved. Each case in a chain is detected independently with the given probability:
data = ChainSizes([1, 1, 2, 1, 3, 1, 1, 5, 1, 2])
base = BranchingProcess(Poisson(0.9))
full = PartiallyObserved(base, 1.0)
partial = PartiallyObserved(base, 0.7)
println("Full observation: $(round(loglikelihood(data, full), digits=2))")
println("70% observation: $(round(loglikelihood(data, partial), digits=2))")Full observation: -14.99
70% observation: -16.16If you call PartiallyObserved with only the detection probability, you get a function that wraps whatever model you apply it to. Julia's pipe then composes:
ll_pipe = loglikelihood(data, base |> PartiallyObserved(0.7))
println("Via pipe: $(round(ll_pipe, digits=2))")Via pipe: -16.16Stacking the same wrapper multiplies the detection probabilities. Two stages at 0.5 give the same observed distribution as one stage at 0.25:
ll_stacked = loglikelihood(data,
base |> PartiallyObserved(0.5) |> PartiallyObserved(0.5))
ll_single = loglikelihood(data, base |> PartiallyObserved(0.25))
println("Stacked: $(round(ll_stacked, digits=2))")
println("Single: $(round(ll_single, digits=2))")Stacked: -20.02
Single: -20.02Offspring counts
If you have direct observations of secondary case counts (who infected whom), wrap them in OffspringCounts:
offspring_data = OffspringCounts([0, 1, 2, 0, 3, 1, 0, 2, 5, 0])
ll = loglikelihood(offspring_data, Poisson(1.4))
println("Offspring LL: $(round(ll, digits=2))")Offspring LL: -17.25Chain lengths
length_data = ChainLengths([0, 1, 0, 2, 1, 0, 0, 3, 0, 1])
ll = loglikelihood(length_data, Poisson(0.5))
println("Chain length LL: $(round(ll, digits=2))")Chain length LL: -12.48Simulation-based likelihood
For models with interventions, use the simulation-based likelihood by passing a BranchingProcess model instead of a distribution:
model = BranchingProcess(Poisson(2.0), Exponential(5.0))
iso = Isolation(delay = Exponential(2.0))
ll = loglikelihood(ChainSizes([1, 1, 2, 1, 3, 1, 1, 5, 1, 2]), model;
interventions = [iso],
attributes = clinical_presentation(incubation_period = LogNormal(1.5, 0.5)),
n_sim = 1000,
rng = StableRNG(42),
)
println("LL with interventions: $(round(ll, digits=2))")LL with interventions: -48.22Forward simulation and likelihood evaluation share the same code path. Likelihoods can therefore be evaluated under interventions — something not possible when simulation and inference are in separate packages.
Fitting
Use fit to find the maximum likelihood offspring distribution. The same interface works for all data types:
# From offspring counts
offspring_data = OffspringCounts([0, 1, 2, 0, 3, 1, 0, 2, 5, 0])
d = fit(Poisson, offspring_data)
println("Poisson MLE from offspring: R = $(round(mean(d), digits=2))")Poisson MLE from offspring: R = 1.4d = fit(NegativeBinomial, offspring_data)
println("NegBin MLE from offspring: R = $(round(mean(d), digits=2)), k = $(round(d.r, digits=2))")NegBin MLE from offspring: R = 1.4, k = 1.45# From chain sizes (subcritical only)
rng = StableRNG(42)
model = BranchingProcess(Poisson(0.5), Exponential(5.0))
states = simulate_batch(model, 500; rng = rng)
sizes = Int[]
for s in states
cs = chain_statistics(s)
append!(sizes, cs.size)
end
d = fit(Poisson, ChainSizes(sizes))
println("Poisson MLE from chain sizes: R = $(round(mean(d), digits=2))")┌ Warning: Assignment to `cs` in soft scope is ambiguous because a global variable by the same name exists: `cs` will be treated as a new local. Disambiguate by using `local cs` to suppress this warning or `global cs` to assign to the existing global variable.
└ @ chains.md:168
Poisson MLE from chain sizes: R = 0.48Bayesian inference with Turing.jl
The loglikelihood functions work directly with Turing.jl via @addlogprob!:
using Turing
@model function chain_model(data)
R ~ LogNormal(-0.5, 1.0)
Turing.@addlogprob! loglikelihood(ChainSizes(data), Poisson(R))
end