This model deals with the problem of nowcasting, or adjusting for right-truncation in reported count data. This occurs when the quantity being observed, for example cases, hospitalisations or deaths, is reported with a delay, resulting in an underestimation of recent counts. The estimate_truncation() model infers parameters of the underlying delay distribution from multiple snapshots of past data. This can be thought of as a Bayesian form of the chain-ladder nowcasting approach implemented in the baselinenowcast package, with the added benefit of joint uncertainty quantification and delay estimation. For settings requiring time-varying delays or more detailed reporting structure, see the epinowcast package.

Both estimate_truncation() and estimate_dist() return a delay distribution that downstream functions such as estimate_infections(), estimate_secondary(), or a further call to estimate_truncation() can consume. The main difference is the data they expect: estimate_truncation() takes successive snapshots of the same aggregate counts (a reporting triangle), while estimate_dist() takes individual-level (linelist) data with primary and secondary event dates. Because it works from aggregate counts rather than individual records, estimate_truncation() also fits an observation model for the counts on top of the delay, whereas estimate_dist() estimates the delay distribution alone. As a rough decision rule, use estimate_dist() when you have a linelist and estimate_truncation() when you have repeated snapshots of aggregate counts.

Model

Given snapshots $C^{i}_{t}$ reflecting reported counts for time $t$, where $i = 1, \ldots, S$ indexes snapshots in order of recency (earliest snapshots first) and $S$ is the number of past snapshots used for estimation, we infer the parameters $\boldsymbol{\theta}$ of a discrete truncation distribution with cumulative mass function $Z(\tau | \boldsymbol{\theta})$. The truncation distribution can be any family supported by dist_spec (e.g. log-normal, gamma).
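To make the cumulative mass function concrete, here is a minimal Python sketch (not package code) of one way to discretise a log-normal delay into $Z(\tau)$. The discretisation rule, the function names and the parameter values are assumptions chosen for illustration; the package's own discretisation may differ.

```python
import math


def lognormal_cdf(x, meanlog, sdlog):
    """Continuous log-normal CDF; defined as zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - meanlog) / (sdlog * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def truncation_cmf(meanlog, sdlog, max_delay):
    """Return [Z(0), ..., Z(max_delay)] for a discretised log-normal.

    Assumed rule (illustrative only): Z(tau) = F(tau + 1) / F(max_delay + 1),
    which renormalises the distribution so that Z(max_delay) = 1.
    """
    norm = lognormal_cdf(max_delay + 1, meanlog, sdlog)
    return [
        lognormal_cdf(tau + 1, meanlog, sdlog) / norm
        for tau in range(max_delay + 1)
    ]


# Hypothetical parameter values, used purely for illustration.
Z = truncation_cmf(meanlog=0.5, sdlog=0.5, max_delay=10)
```

Each entry `Z[tau]` is then the assumed fraction of final counts that has been reported `tau` days after the event date.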

The model assumes that final counts $D_{t}$ are related to observed snapshots via the truncation distribution such that

\begin{equation}
C^{i < S}_{t} \sim F\left(Z(T_i - t | \boldsymbol{\theta}) \cdot D_{t} + \sigma\right)
\end{equation}

where $T_i$ is the date of the final observation in snapshot $i$, $Z(\tau)$ is defined to be zero for negative values of $\tau$, $\sigma$ is an additive noise term (controlled via the noise argument), and $F$ is the observation model (Poisson or negative binomial, controlled via obs_opts()).
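As an illustration of the relationship above, the following Python sketch computes the quantity passed to the observation model for one earlier snapshot, reading it as the expected count. The cumulative mass function values, the final counts and the noise value are all hypothetical, and integer day indices stand in for dates; this is a sketch of the formula, not the package's implementation.

```python
def z(tau, cmf):
    """Truncation CMF: zero for negative delays, one beyond the maximum delay."""
    if tau < 0:
        return 0.0
    if tau >= len(cmf):
        return 1.0
    return cmf[tau]


def expected_snapshot_mean(final_counts, T_i, cmf, sigma):
    """Z(T_i - t) * D_t + sigma for every day t up to the snapshot's last day T_i."""
    return [
        z(T_i - t, cmf) * d + sigma
        for t, d in enumerate(final_counts)
        if t <= T_i
    ]


cmf = [0.25, 0.5, 0.75, 1.0]                  # hypothetical Z(0), ..., Z(3)
final_counts = [100, 120, 150, 130, 110, 90]  # hypothetical D_t for t = 0, ..., 5
expected = expected_snapshot_mean(final_counts, T_i=4, cmf=cmf, sigma=0.5)
```

Days close to the snapshot date `T_i` are scaled down the most, which is the under-reporting of recent counts that the model corrects for.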

The final counts $D_{t}$ are estimated from the most recent snapshot as

\begin{equation}
D_t = \frac{C^{S}_{t}}{Z(T_S - t | \boldsymbol{\theta})}
\end{equation}
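A small worked example may help here. The Python sketch below applies this division to a hypothetical latest snapshot using a hypothetical cumulative mass function. It treats the truncation distribution as known, whereas the model estimates $\boldsymbol{\theta}$ jointly with its uncertainty.

```python
def adjust_latest_snapshot(latest_counts, cmf):
    """Divide each count in the latest snapshot by Z(T_S - t).

    Days are indexed 0, ..., T_S, where T_S is the last day in the snapshot.
    Delays at or beyond len(cmf) are treated as fully reported (Z = 1).
    """
    T_S = len(latest_counts) - 1
    adjusted = []
    for t, count in enumerate(latest_counts):
        tau = T_S - t
        reported_fraction = cmf[tau] if tau < len(cmf) else 1.0
        adjusted.append(count / reported_fraction)
    return adjusted


cmf = [0.25, 0.5, 0.75, 1.0]          # hypothetical Z(0), ..., Z(3)
latest_counts = [100, 120, 90, 60, 30]  # hypothetical C^S_t for t = 0, ..., 4
final = adjust_latest_snapshot(latest_counts, cmf)
```

In this example the apparent decline in the last three days comes entirely from truncation: after adjustment, each of those days has an estimated final count of 120.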

Priors

\begin{align}
\boldsymbol{\theta} &\sim \text{as specified by } \texttt{trunc\_opts()} \\
\varphi &\sim \text{as specified by } \texttt{obs\_opts()} \quad \text{(negative binomial only)} \\
\sigma &\sim \text{as specified by } \texttt{noise}
\end{align}