Chain statistics, likelihood, and fitting

Compute chain size and length statistics, evaluate likelihoods, and fit offspring distributions — analytically where possible, simulation-based otherwise.

Chain statistics

julia

using EpiBranch
using Distributions
using DataFrames
using StableRNGs

model = BranchingProcess(Poisson(0.9), Exponential(5.0))

rng = StableRNG(42)
state = simulate(model; sim_opts = SimOpts(n_initial = 20, max_cases = 10_000), rng = rng)

cs = chain_statistics(state)
first(cs, 10)

10×3 DataFrame

Row	chain_id	size	length
	Int64	Int64	Int64
1	1	1	0
2	2	6	4
3	3	2	1
4	4	1	0
5	5	1	0
6	6	47	19
7	7	1	0
8	8	14	6
9	9	2	1
10	10	1	0

julia

println("Mean size: $(round(mean(cs.size), digits=2)), Max: $(maximum(cs.size))")
println("Mean length: $(round(mean(cs.length), digits=2))")

Mean size: 6.75, Max: 47
Mean length: 2.6

Analytical chain size distributions

For certain offspring distributions, the chain size distribution has a closed form:

julia

# Poisson → Borel
d_borel = chain_size_distribution(Poisson(0.8))
println("Borel(0.8) mean: $(round(mean(d_borel), digits=2))")

# NegBin → Lagrange inversion formula
d_nb = chain_size_distribution(NegBin(0.8, 0.5))
println("NegBin chain size P(1): $(round(pdf(d_nb, 1), digits=4))")

Borel(0.8) mean: 5.0
NegBin chain size P(1): 0.6202

Likelihood

Given observed chain sizes, evaluate the log-likelihood using loglikelihood with data wrapped in ChainSizes:

julia

data = ChainSizes([1, 1, 2, 1, 3, 1, 1, 5, 1, 2])

# Compare offspring parameters
for λ in [0.3, 0.5, 0.7, 0.9]
    ll = loglikelihood(data, Poisson(λ))
    println("Poisson($λ): LL = $(round(ll, digits=2))")
end

Poisson(0.3): LL = -12.98
Poisson(0.5): LL = -12.49
Poisson(0.7): LL = -13.4
Poisson(0.9): LL = -14.99

With Negative Binomial offspring:

julia

ll = loglikelihood(data, NegBin(0.9, 0.5))
println("NegBin(R=0.9, k=0.5): LL = $(round(ll, digits=2))")

NegBin(R=0.9, k=0.5): LL = -14.07

Imperfect observation

Account for incomplete case ascertainment by wrapping a model in PartiallyObserved. Each case in a chain is detected independently with the given probability:

julia

data = ChainSizes([1, 1, 2, 1, 3, 1, 1, 5, 1, 2])
base = BranchingProcess(Poisson(0.9))
full = PartiallyObserved(base, 1.0)
partial = PartiallyObserved(base, 0.7)
println("Full observation:  $(round(loglikelihood(data, full), digits=2))")
println("70% observation:   $(round(loglikelihood(data, partial), digits=2))")

Full observation:  -14.99
70% observation:   -16.16

If you call PartiallyObserved with only the detection probability, you get a function that wraps whatever model you apply it to. Julia's pipe then composes:

julia

ll_pipe = loglikelihood(data, base |> PartiallyObserved(0.7))
println("Via pipe: $(round(ll_pipe, digits=2))")

Via pipe: -16.16

Stacking the same wrapper multiplies the detection probabilities. Two stages at 0.5 give the same observed distribution as one stage at 0.25:

julia

ll_stacked = loglikelihood(data,
    base |> PartiallyObserved(0.5) |> PartiallyObserved(0.5))
ll_single = loglikelihood(data, base |> PartiallyObserved(0.25))
println("Stacked: $(round(ll_stacked, digits=2))")
println("Single:  $(round(ll_single, digits=2))")

Stacked: -20.02
Single:  -20.02

Offspring counts

If you have direct observations of secondary case counts (who infected whom), wrap them in OffspringCounts:

julia

offspring_data = OffspringCounts([0, 1, 2, 0, 3, 1, 0, 2, 5, 0])
ll = loglikelihood(offspring_data, Poisson(1.4))
println("Offspring LL: $(round(ll, digits=2))")

Offspring LL: -17.25

Chain lengths

julia

length_data = ChainLengths([0, 1, 0, 2, 1, 0, 0, 3, 0, 1])
ll = loglikelihood(length_data, Poisson(0.5))
println("Chain length LL: $(round(ll, digits=2))")

Chain length LL: -12.48

Simulation-based likelihood

For models with interventions, use the simulation-based likelihood by passing a BranchingProcess model instead of a distribution:

julia

model = BranchingProcess(Poisson(2.0), Exponential(5.0))
iso = Isolation(delay = Exponential(2.0))

ll = loglikelihood(ChainSizes([1, 1, 2, 1, 3, 1, 1, 5, 1, 2]), model;
    interventions = [iso],
    attributes = clinical_presentation(incubation_period = LogNormal(1.5, 0.5)),
    n_sim = 1000,
    rng = StableRNG(42),
)
println("LL with interventions: $(round(ll, digits=2))")

LL with interventions: -48.22

Forward simulation and likelihood evaluation share the same code path. Likelihoods can therefore be evaluated under interventions — something not possible when simulation and inference are in separate packages.

Fitting

Use fit to find the maximum likelihood offspring distribution. The same interface works for all data types:

julia

# From offspring counts
offspring_data = OffspringCounts([0, 1, 2, 0, 3, 1, 0, 2, 5, 0])
d = fit(Poisson, offspring_data)
println("Poisson MLE from offspring: R = $(round(mean(d), digits=2))")

Poisson MLE from offspring: R = 1.4

julia

d = fit(NegativeBinomial, offspring_data)
println("NegBin MLE from offspring: R = $(round(mean(d), digits=2)), k = $(round(d.r, digits=2))")

NegBin MLE from offspring: R = 1.4, k = 1.45

julia

# From chain sizes (subcritical only)
rng = StableRNG(42)
model = BranchingProcess(Poisson(0.5), Exponential(5.0))
states = simulate_batch(model, 500; rng = rng)
sizes = Int[]
for s in states
    cs = chain_statistics(s)
    append!(sizes, cs.size)
end

d = fit(Poisson, ChainSizes(sizes))
println("Poisson MLE from chain sizes: R = $(round(mean(d), digits=2))")

┌ Warning: Assignment to `cs` in soft scope is ambiguous because a global variable by the same name exists: `cs` will be treated as a new local. Disambiguate by using `local cs` to suppress this warning or `global cs` to assign to the existing global variable.
└ @ chains.md:168
Poisson MLE from chain sizes: R = 0.48

Bayesian inference with Turing.jl

The loglikelihood functions work directly with Turing.jl via @addlogprob!:

julia

using Turing

@model function chain_model(data)
    R ~ LogNormal(-0.5, 1.0)
    Turing.@addlogprob! loglikelihood(ChainSizes(data), Poisson(R))
end

Chain statistics, likelihood, and fitting ​