For correspondence: sebastian.funk@lshtm.ac.uk

Repeated measurements of cross-sectional prevalence of Polymerase Chain Reaction (PCR) positivity or seropositivity provide rich insight into the dynamics of an infection. The UK Office for National Statistics (ONS) Community Infection Survey publishes such measurements for SARS-CoV-2 on a weekly basis based on testing enrolled households, contributing to situational awareness in the country. Here we present estimates of time-varying and static epidemiological quantities that were derived from the estimates published by ONS.

We used a Gaussian process to model incidence of infections and then estimated observed PCR prevalence by convolving our modelled incidence estimates with a previously published PCR detection curve describing the probability of a positive test as a function of the time since infection. We refined our incidence estimates using time-varying estimates of antibody prevalence combined with a model of antibody positivity and waning that moved individuals between compartments with or without antibodies based on estimates of new infections, vaccination, probability of seroconversion and waning.

We produced incidence curves of infection describing the UK epidemic from late April 2020 until early 2022. We used these estimates of incidence to estimate the time-varying growth rate of infections, and combined them with estimates of the generation interval to estimate time-varying reproduction numbers. Biological parameters describing seroconversion and waning, while based on a simple model, were broadly in line with plausible ranges from individual-level studies.

Beyond informing situational awareness and allowing for estimates using individual-level data, repeated cross-sectional studies make it possible to estimate epidemiological parameters from population-level models. Studies or public health surveillance methods based on similar designs offer opportunities for further improving our understanding of the dynamics of SARS-CoV-2 or other pathogens and their interaction with population-level immunity.

Infectious disease surveillance serves to monitor the health of populations and identify new threats as quickly as possible after they arise (Murray & Cohen, 2017). It is often based on healthcare-based reporting systems whereby primary care providers or hospitals report numbers of individuals identified as likely cases of a disease to central authorities where these numbers are collated and reported as aggregates. During the Covid-19 pandemic in the United Kingdom, reporting of cases has mostly involved collating numbers of laboratory-identified infections with SARS-CoV-2 via self-reporting, community testing sites or hospitals.

A separate and independent system of collating information on the state of the pandemic has been run by the Office for National Statistics (ONS) via its Community Infection Survey, which conducts repeated cross-sectional surveys of Polymerase Chain Reaction (PCR) positivity indicating infection with SARS-CoV-2, as well as antibody seroprevalence, via household visits (Pouwels et al., 2020). By adjusting for biases in the sampled population, the study has been used to produce daily population-wide estimates of infection prevalence, unaffected by testing capacity or reporting behaviour that often varies by age as well as sociodemographic or other factors.

While repeated randomised cross-sectional measurements of PCR positivity and antibodies are useful in themselves for tracking an epidemic in real time, they can also be used for estimating epidemiological quantities by combining them with information on infection kinetics and immunological responses. Here we present a semi-mechanistic model that combines PCR positivity curves, generation interval estimates and vaccination data with ONS PCR positivity and antibody data to estimate infection incidence and its growth rates, reproduction numbers and rates of antibody waning.

We obtained the published estimates of daily prevalence of Polymerase Chain Reaction (PCR) positivity beginning on 26 April 2020 from the ONS Community Infection Survey, separately by nation, region, age group and variant, alongside their 95% credible intervals, from the published spreadsheets on the ONS website. ONS estimates of prevalence on a given day vary between publication dates because the internal model used to calculate prevalence involves smoothing, such that new data points affect the estimates for earlier times. We aggregated estimates of PCR positivity for a single day produced for different publication dates by taking the median of the central estimates and the medians of the respective credible interval bounds.
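The aggregation across publication dates can be illustrated with a minimal sketch. The input layout (a list of central, lower and upper values per publication) and the numbers are hypothetical, chosen only to show the median step; they are not ONS data.

```python
from statistics import median


def aggregate_publications(estimates):
    """Combine estimates for one day taken from several publication dates.

    `estimates` is a list of (central, lower, upper) tuples, one per
    publication date. This layout is an assumption made for illustration.
    """
    centrals, lowers, uppers = zip(*estimates)
    # Each aggregated quantity is the median of the corresponding values.
    return median(centrals), median(lowers), median(uppers)


# Made-up values: three publications reporting prevalence for the same day.
same_day = [
    (0.010, 0.008, 0.012),
    (0.011, 0.009, 0.014),
    (0.012, 0.009, 0.013),
]
aggregated = aggregate_publications(same_day)
```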

We developed a Bayesian model to estimate epidemiological quantities from ONS PCR positivity estimates and, optionally, population-level antibody prevalence estimates and vaccination coverage.

We estimated the proportion of the population newly infected, \(I(t)\), as a latent variable. This was convolved with a PCR positivity curve \(p(s)\), the probability that someone infected at time \(s=0\) tests PCR positive at time \(s\), to yield the prevalence of PCR positivity \(P(t)\), \[ P(t) = \sum_{s= 0}^{t_\text{p,max}} p(s) I(t - s) \] where \(t_\text{p,max}=60\) days is the maximum time modelled for which a person can stay PCR positive. We assumed each \(p(s)\) to have an independent normal prior distribution at each time \(s\) after infection, with mean and standard deviation taken from the posterior estimates of another study (Hellewell et al., 2021). Infection incidence \(I(t)\) is distinct from the estimates of PCR positivity incidence provided by ONS alongside the prevalence estimates, as it allows for the probability of infections yielding negative PCR results as a function of the time since infection and is indexed by date of infection rather than the date of first testing positive.
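As a minimal sketch of the convolution, the following computes \(P(t)\) from a daily incidence series and a detection curve. The toy detection curve (with a maximum of 3 days rather than 60), the incidence values and the truncation of the sum at the start of the series are assumptions made for illustration only.

```python
def pcr_prevalence(incidence, detection, t):
    """P(t) = sum over s of p(s) * I(t - s).

    The sum is truncated where t - s would fall before the first modelled
    day, which is a simplifying assumption of this sketch.
    """
    total = 0.0
    for s, p_s in enumerate(detection):
        if t - s < 0:
            break  # no modelled infections before day 0
        total += p_s * incidence[t - s]
    return total


# Made-up values: detection[s] is the probability of testing positive
# s days after infection; incidence[t] is the proportion newly infected.
detection = [0.0, 0.5, 0.8, 0.4]
incidence = [0.001, 0.002, 0.004, 0.004, 0.002]
prevalence = [pcr_prevalence(incidence, detection, t) for t in range(len(incidence))]
```

In this toy series prevalence keeps rising on the last day even though incidence has fallen, because people infected on earlier days are still detectable.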

We used Gaussian Process (GP) priors to ensure smoothness of the estimates and to deal with data gaps. In one formulation, \(I(t)\) has a GP prior with exponentiated quadratic kernel. To reduce the computational requirements of our approach we used an approximate rather than exact GP (Riutort-Mayol et al., 2020). \[ \begin{aligned} \text{logit} \left( I(t) \right) &\sim i_0 + i(t)\\ i(t) &\sim \text{GP}(t) \end{aligned} \] where \(i_0\) is the estimated mean of the GP. Alternatively, the GP prior is applied to higher-order differences when infections are non-stationary, for example to the growth rate, \[ i(t) - i(t - 1) \sim \text{GP}(t) \] which implies that growth tends to zero outside the range of the data, usually leading to better real-time performance (Abbott et al., 2020). The results shown here were obtained using the latter formulation, with a GP prior on the growth rate.
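To make the growth-rate formulation concrete, the sketch below rebuilds incidence from a series of increments on the logit scale. It does not draw from a GP: the increments are made-up numbers standing in for a single draw, and treating the starting value as the logit incidence on the first day is an assumption of this illustration rather than a detail taken from the model code.

```python
import math


def inv_logit(x):
    """Map a value on the logit scale back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def incidence_from_increments(logit_start, increments):
    """Rebuild I(t) when the prior acts on first differences.

    increments[t - 1] plays the role of i(t) - i(t - 1); the series is
    accumulated on the logit scale and then transformed to a proportion.
    """
    level = logit_start
    incidence = [inv_logit(level)]
    for step in increments:
        level += step
        incidence.append(inv_logit(level))
    return incidence


# Made-up increments; the trailing zeros mimic growth tending to zero
# outside the range of the data, which leaves incidence flat.
increments = [0.10, 0.05, 0.0, 0.0]
incidence = incidence_from_increments(math.log(0.001 / 0.999), increments)
```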

We assumed that the probability of observing prevalence \(Y_{\text{P}, t}\) at time \(t\) was given by independent normal distributions with mean \(P(t)\) and standard deviation \[\sigma_{\text{P}, t} = \sqrt{\sigma_\text{P}^2 + \left(Y^\sigma_{\text{P}, t}\right)^2}\] where \(\sigma_\text{P}\) was estimated as part of the inference procedure and \(Y^\sigma_{\text{P}, t}\) was calculated from the reported credible intervals in the ONS data, assuming independent normal errors. For data sets where only weekly estimates were reported by ONS, for example at the sub-regional level, we calculated average prevalence across the time period reported from our daily prevalence estimates.
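The observation model can be sketched as follows. Converting a 95% credible interval to a standard deviation by dividing its width by twice the normal quantile is an assumption of this sketch (the text only states that normal errors were assumed), and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a central 95% interval


def sd_from_interval(lower, upper):
    """Standard deviation implied by a symmetric normal 95% interval (assumed)."""
    return (upper - lower) / (2.0 * Z_95)


def observation_sd(sigma_p, lower, upper):
    """Combine the estimated sigma_P with the reported uncertainty in quadrature."""
    return math.sqrt(sigma_p ** 2 + sd_from_interval(lower, upper) ** 2)


def normal_log_density(y, mean, sd):
    """Log density of a normal observation y with the given mean and sd."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - (y - mean) ** 2 / (2.0 * sd ** 2)


# Made-up example: reported prevalence 1% with interval 0.8% to 1.2%,
# modelled prevalence 1.1%, and an additional sigma_P of 0.1%.
sd = observation_sd(0.001, 0.008, 0.012)
log_lik = normal_log_density(0.010, 0.011, sd)
```

Adding the two variances means that days with wide reported intervals carry less weight in the fit, while \(\sigma_\text{P}\) absorbs discrepancies the reported intervals do not account for.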

Using the estimated infection incidence \(I(t)\) we estimated growth rates \(r(t)\) as \[ r(t) = \log I(t) - \log I(t - 1) \] and reproduction numbers \(R(t)\) using the renewal equation as \[ R(t) = \frac{I(t)}{\sum_{s=0}^{t_\text{g,max}} g(s) I(t - s)} \] where \(g(s)\) is the distribution of the generation interval since the time of infection (Fraser, 2007). We assumed a maximum generation interval of \(t_\text{g,max}=14\) days. We used generation intervals re-estimated from data collected early in the pandemic in Singapore, as reported previously (Abbott et al., 2020).
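A minimal sketch of both calculations is given below. The three-day generation interval with \(g(0)=0\) and the incidence values are made up for illustration, and skipping terms that fall before the first modelled day is an assumption of the sketch.

```python
import math


def growth_rate(incidence, t):
    """r(t) = log I(t) - log I(t - 1)."""
    return math.log(incidence[t]) - math.log(incidence[t - 1])


def reproduction_number(incidence, generation_interval, t):
    """R(t) = I(t) / sum over s of g(s) * I(t - s)."""
    denominator = sum(
        g_s * incidence[t - s]
        for s, g_s in enumerate(generation_interval)
        if t - s >= 0  # skip terms before the first modelled day
    )
    return incidence[t] / denominator


# Made-up values: incidence doubling each day and a toy generation
# interval distribution g(0), g(1), g(2) that sums to one.
generation_interval = [0.0, 0.5, 0.5]
incidence = [0.001, 0.002, 0.004, 0.008]
r_last = growth_rate(incidence, 3)
R_last = reproduction_number(incidence, generation_interval, 3)
```

With constant incidence the same functions return a growth rate of zero and a reproduction number of one, which is a quick sanity check on any implementation.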

When additionally using antibody data, we convolve the modelled infections \(I(t)\) as well as input data on vaccinations \(Y_{\text{V}, t}\) with distributions quantifying the delay to generating detectable antibodies (by default set to 4 weeks for both infection and vaccination), yielding potentially antibody-generating time series from infection, \(I_{\text{A}}\), and from vaccination, \(V_{\text{A}}\). We then calculate antibodies from infection as \[ A_{\text{I}}(t) = A_{\text{I}}(t - 1) + \beta I_{\text{A}}(t) (1 - A(t - 1))^k - \gamma_\text{I} A_{\text{I}}(t - 1) \] and antibodies from vaccination as \[ A_{\text{V}}(t) = A_{\text{V}}(t - 1) + \delta V_{\text{A}}(t) (1 - A(t - 1))^l - \gamma_\text{V} A_{\text{V}}(t - 1) \] with the total population proportion with antibodies given as the sum of the two, \[ A(t) = A_{\text{I}}(t) + A_{\text{V}}(t) \]

Here, the additional parameter \(\beta\) can be interpreted as the proportion of new infections that increases the population proportion with antibodies; the remainder \(1 - \beta\) does not do so, either due to lack of seroconversion or because they are breakthrough infections in those with existing antibodies. Parameters \(k\) and \(l\) govern the degree to which new seropositives preferentially arise in those not seropositive so far. Additional parameters \(\gamma_\text{I}\) and \(\gamma_\text{V}\) can be interpreted as rates of waning from natural infection and vaccination, respectively. This formulation implies simplifying assumptions that the rate of waning of detectable antibodies is exponential, that vaccine doses are allocated randomly amongst those with or without existing antibodies, and that the proportion of new vaccinations that lead to seroconversion \(\delta\) is constant and independent of age, vaccine use, and dose number.
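The antibody recursion can be written out as a short sketch. The input series are assumed to be already delayed (that is, they stand for \(I_{\text{A}}\) and \(V_{\text{A}}\)), and the initial values, default exponents and parameter values are hypothetical choices for illustration, not estimates from this study.

```python
def antibody_prevalence(inf_ab, vac_ab, beta, delta, gamma_i, gamma_v,
                        k=1.0, l=1.0, a_i0=0.0, a_v0=0.0):
    """Iterate the recursions for A_I(t) and A_V(t) and return them with A(t).

    inf_ab and vac_ab are the delayed infection and vaccination series;
    a_i0 and a_v0 are assumed starting proportions with antibodies.
    """
    a_i, a_v = [a_i0], [a_v0]
    for t in range(1, len(inf_ab)):
        without_antibodies = 1.0 - (a_i[-1] + a_v[-1])  # 1 - A(t - 1)
        next_a_i = (a_i[-1] + beta * inf_ab[t] * without_antibodies ** k
                    - gamma_i * a_i[-1])
        next_a_v = (a_v[-1] + delta * vac_ab[t] * without_antibodies ** l
                    - gamma_v * a_v[-1])
        a_i.append(next_a_i)
        a_v.append(next_a_v)
    total = [x + y for x, y in zip(a_i, a_v)]
    return a_i, a_v, total


# Made-up inputs: one day of infections, then one day of vaccinations.
inf_ab = [0.0, 0.01, 0.0]
vac_ab = [0.0, 0.0, 0.02]
a_i, a_v, total = antibody_prevalence(
    inf_ab, vac_ab, beta=0.9, delta=0.5, gamma_i=0.1, gamma_v=0.0
)
```

In this toy run, the infection-derived proportion rises to 0.009 and then wanes by 10% per step, while the vaccination term on the last day is scaled by the share of the population without antibodies on the previous day.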

The model was implemented in *Stan* using the *cmdstanr* R package (Gabry & Češnovar, 2021; Stan Development Team, 2022).
All code needed to reproduce the results shown here is available at https://github.com/epiforecasts/inc2prev.