Overview
We make use of Gaussian Processes in several places in
EpiNow2. For example, the default model for
estimate_infections() uses a Gaussian Process to model the
1st order difference on the log scale of the reproduction number. This
vignette describes the implementation details of the approximate
Gaussian Process used in EpiNow2.
Definition
The one-dimensional Gaussian Processes
()
we use can be written as
where
and
are the mean and covariance functions, respectively. In our case as set
out above, we have
with the following choices available for the kernel
Matérn 3/2 covariance kernel (the default)
with
and
the length scale and magnitude, respectively, of the kernel. Note that
here and later we use a slightly different definition of
compared to Riutort-Mayol et al.[1], where this is defined as our
.
Squared exponential kernel
Ornstein-Uhlenbeck (Matérn 1/2) kernel
Matérn 5/2 covariance kernel
Hilbert space approximation
In order to make our models computationally tractable, we approximate
the Gaussian Process using a Hilbert space approximation to the Gaussian
Process[1], centred around mean
zero.
with
the number of basis functions to use in the approximation, which we
calculate from the number of time points
to which the Gaussian Process is being applied (rounded up to give an
integer value), as is recommended[1].
and values of
given by
where
is a positive number termed boundary condition, and
are regression weights with standard normal prior
The function
is the spectral density relating to a particular covariance function
.
In the case of the Matérn kernel of order
this is given by
For
(the default in EpiNow2) this simplifies to
For
it is
and for
it is
In the case of a squared exponential the spectral kernel density is
given by
The functions
are the eigenfunctions of the Laplace operator,
with time rescaled linearly to be between -1 and 1,
Relevant default priors are
with
additionally constrained with an upper bound of
and
and
calculated using a mean of 21 and standard deviation of 7.
Furthermore, by default we set.
These values as well as the prior distributions of relevant
parameters can all be changed by the user.
Modelling the reproduction number
The default estimate_infections() model uses the GP
to model the first-order difference of
,
with the trajectory itself built by cumulative summation of GP values
.
Writing
for the length of the observation window and
for the initial reproduction number, this construction is
Without further constraint this parameterisation makes inference
difficult: a constant added to
and subtracted from the cumulative sum of GP values leaves
unchanged for all
,
so
and the long-run drift of the GP are not jointly identified by the data.
The likelihood is flat along this direction (a ridge in the joint
posterior), giving the sampler poor geometry and, on some random seeds,
stuck chains and large
values.
This parallels a common issue in standard regression problems, where
the intercept and the slope can trade off against each other unless the
predictors are centred. There, a common fix is to centre
the predictors so the intercept becomes the response value at the
mean of the predictors rather than at zero, which decorrelates the
intercept from the slope and gives the sampler a cleaner geometry; this
is how, for example, brms
parameterises intercepts internally. Here the cumulative GP increments
play the role of the slope (they determine the drift of
)
and
plays the role of the intercept.
To apply the same fix we look for an alternative parameter that
is well-identified by the data. Averaging the construction
above over the observation window gives
where
is the empirical mean over the observation window of the cumulative
sum of GP values. Unlike
,
the mean of
over the observation window is informed by the entire data series and so
does not suffer from the ridge. We therefore sample
as the internal parameter, recover
,
and rewrite the construction as
Centring only over the observation window (not the forecast horizon)
ensures that the fitted historical
does not shift when the forecast horizon is changed.
The user’s prior on the initial reproduction number is then applied
to the derived quantity
via a change
of variables. The Jacobian of this transform is
so we need to add a Jacobian correction of
to the target log density.
References
1.
Riutort-Mayol, G., Bürkner, P.-C., Andersen, M.
R., Solin, A., & Vehtari, A. (2020).
Practical hilbert space
approximate bayesian gaussian processes for probabilistic
programming.
https://arxiv.org/abs/2004.11408