Forecast using branching processes at a target date
Usage
forecast(
  obs,
  forecast_date = max(obs$date),
  seq_date = forecast_date,
  case_date = forecast_date,
  data_list = forecast.vocs::fv_as_data_list,
  inits = forecast.vocs::fv_inits,
  fit = forecast.vocs::fv_sample,
  posterior = forecast.vocs::fv_tidy_posterior,
  extract_forecast = forecast.vocs::fv_extract_forecast,
  horizon = 4,
  r_init = c(0, 0.25),
  r_step = 1,
  r_forecast = TRUE,
  beta = c(0, 0.1),
  lkj = 0.5,
  period = NULL,
  special_periods = c(),
  voc_scale = c(0, 0.2),
  voc_label = "VOC",
  strains = 2,
  variant_relationship = "correlated",
  overdispersion = TRUE,
  models = NULL,
  likelihood = TRUE,
  output_loglik = FALSE,
  debug = FALSE,
  keep_fit = TRUE,
  scale_r = 1,
  digits = 3,
  timespan = 7,
  probs = c(0.05, 0.2, 0.8, 0.95),
  id = 0,
  ...
)
Arguments
- obs
A data.frame with the following variables: date, cases, seq_voc, seq_total, cases_available, and seq_available. seq_available and cases_available must uniquely define data rows, but other rows can be duplicated based on data availability. This data format allows for multiple versions of case and sequence data for a given date, each with a different reporting date. This is important when using the package in evaluation settings, or in real time where data sources are liable to be updated as new data become available. See germany_covid19_delta_obs for an example of a supported data set.
- forecast_date
Date at which to forecast. Defaults to the maximum date in obs.
- seq_date
Date from which to use available sequence data. Defaults to forecast_date.
- case_date
Date from which to use available case data. Defaults to forecast_date.
- data_list
A function that returns a list of data as ingested by the inits and fit functions. Must use arguments as defined in fv_as_data_list(). If not supplied, the package default fv_as_data_list() is used.
- inits
A function that returns a function to sample initial conditions, with the same arguments as fv_inits(). If not supplied, the package default fv_inits() is used.
- fit
A function that fits the supplied model, with the same arguments and return values as fv_sample(). If not supplied, the package default fv_sample() is used, which performs MCMC sampling using cmdstanr.
- posterior
A function that summarises the output from the supplied fitting function, with the same arguments and return values as fv_tidy_posterior() (to the extent required by downstream package functionality). If not supplied, the package default fv_tidy_posterior() is used.
- extract_forecast
A function that extracts the forecast from the summarised posterior. If not supplied, the package default fv_extract_forecast() is used.
- horizon
Integer forecast horizon. Defaults to 4.
- r_init
Numeric vector of length 2. Mean and standard deviation for the normal prior on the initial log growth rate.
- r_step
Integer, defaults to 1. The number of observations between each change in the growth rate.
- r_forecast
Logical, defaults to TRUE. Should the growth rate be forecast beyond the data horizon?
- beta
Numeric vector, defaults to c(0, 0.1). Represents the mean and standard deviation of the normal prior (truncated at 1 and -1) on the weighting of the previous difference in the differenced AR process. Placing a tight prior around zero effectively reduces the AR process to a random walk on the growth rate.
- lkj
Numeric, defaults to 0.5. The assumed prior covariance between variant growth rates when using the "correlated" model. This sets the shape parameter for the Lewandowski-Kurowicka-Joe (LKJ) prior distribution. A value of 1 assigns a uniform prior over all correlations, values less than 1 indicate increased belief in strong correlations, and values greater than 1 indicate increased belief in weaker correlations. The default setting places increased weight on some correlation between strains.
- period
A function, defaults to NULL. If specified, it should be a function that accepts a vector of dates. This can be used to assign periodic effects to dates, which will then be adjusted for in the case model. An example is adjusting for day-of-the-week effects, for which fv_dow_period() can be used.
- special_periods
A vector of dates to pass to the argument of the same name in the period function, to be treated as "special"; for example, holidays being treated as Sundays in fv_dow_period().
- voc_scale
Numeric vector of length 2. Prior mean and standard deviation for the initial growth rate modifier due to the variant of concern.
- voc_label
A character string, defaults to "VOC". Defines the label to assign to variant of concern specific parameters. Example usage is to rename parameters to use variant-specific terminology.
- strains
Integer number of strains to use. Defaults to 2. Current maximum is 2. A numeric vector can be passed if forecasts from multiple strain models are desired.
- variant_relationship
Character string, defaults to "correlated". Controls the relationship between strains, with options being "correlated" (strain growth rates are correlated over time), "scaled" (a fixed scaling between strains), and "independent" (fully independent strains after initial scaling).
- overdispersion
Logical, defaults to TRUE. Should the observations used include overdispersion?
- models
A model as supplied by fv_model(). If not supplied, the default for that strain is used. If multiple strain models are being forecast, then models should be a list of models.
- likelihood
Logical, defaults to TRUE. Should the likelihood be included in the model?
- output_loglik
Logical, defaults to FALSE. Should the log-likelihood be output? Disabling this will speed up fitting if evaluating the model fit is not required.
- debug
Logical, defaults to FALSE. Should within-model debug information be returned?
- keep_fit
Logical, defaults to TRUE. Should the Stan model fit be kept and returned? Dropping this can substantially reduce memory usage in situations where multiple models are being fit.
- scale_r
Numeric, defaults to 1. Rescale the timespan over which the growth rate and reproduction number are calculated. An example use case is rescaling the growth rate from weekly to be scaled by the mean of the generation time (for COVID-19, for example, this would be 5.5 / 7).
- digits
Numeric, defaults to 3. Number of digits to round summary statistics to.
- timespan
Integer, defaults to 7. Indicates the number of days between each observation. Defaults to a week.
- probs
A vector of numeric probabilities to produce quantile summaries for. By default these are the 5%, 20%, 80%, and 95% quantiles, which are also the minimum set required for plotting functions to work (such as plot_cases(), plot_rt(), and plot_voc_frac()).
- id
ID to assign to this forecast. Defaults to 0.
- ...
Additional parameters passed to fv_sample().
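The effect of the lkj shape parameter described above can be checked numerically. The sketch below is an illustration in base R, not package code. It assumes two strains, so the correlation matrix is 2 x 2 with a single off-diagonal correlation rho, and it uses the standard form of the LKJ density, which is proportional to det(R)^(eta - 1). lkj_kernel() is a hypothetical helper defined only for this example.

```r
# Unnormalised LKJ density for a 2 x 2 correlation matrix (assumption: two
# strains). With off-diagonal correlation rho, det(R) = 1 - rho^2, so the
# kernel is (1 - rho^2)^(eta - 1). lkj_kernel() is a hypothetical helper.
lkj_kernel <- function(rho, eta) (1 - rho^2)^(eta - 1)

rho <- c(0, 0.5, 0.9)
etas <- c(0.5, 1, 2)

# Rows index the correlation, columns index the shape parameter
kernels <- sapply(etas, function(eta) lkj_kernel(rho, eta))
dimnames(kernels) <- list(paste0("rho=", rho), paste0("eta=", etas))
round(kernels, 2)
```

With eta = 0.5 (the default for lkj) the kernel grows as the absolute correlation approaches 1, with eta = 1 it is flat, and with eta = 2 it is largest at zero correlation.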
Value
A data.frame containing the output of fv_sample() in each row as well as the summarised posterior, forecast and information about the parameters specified.
See also
Functions used for forecasting across models, dates, and scenarios: forecast_across_dates(), forecast_across_scenarios(), forecast_n_strain(), plot.fv_forecast(), summary.fv_forecast(), unnest_posterior()
Examples
if (FALSE) { # interactive()
  options(mc.cores = 4)

  forecasts <- forecast(
    germany_covid19_delta_obs,
    forecast_date = as.Date("2021-06-12"),
    horizon = 4,
    strains = c(1, 2),
    adapt_delta = 0.99,
    max_treedepth = 15,
    variant_relationship = "scaled"
  )

  # inspect forecasts
  forecasts

  # extract the model summary
  summary(forecasts, type = "model")

  # plot case posterior predictions
  plot(forecasts, log = TRUE)

  # plot voc posterior predictions
  plot(forecasts, type = "voc_frac")

  # extract the case forecast
  summary(forecasts, type = "cases", forecast = TRUE)
}
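The obs format described under Arguments, with several versions of the data for one date, can be illustrated with a toy data set. The sketch below uses base R only. All values are made up, and latest_cases_as_of() is a hypothetical helper written for this illustration. It is not a forecast.vocs function and may not match how the package selects data internally.

```r
# Toy data (made-up values): the 2021-06-05 case count was first reported as
# 100 and later revised to 104, so that date appears in two rows.
obs <- data.frame(
  date = as.Date(c("2021-06-05", "2021-06-05", "2021-06-12")),
  cases = c(100, 104, 120),
  seq_voc = c(10, 10, 25),
  seq_total = c(50, 50, 60),
  cases_available = as.Date(c("2021-06-05", "2021-06-12", "2021-06-12")),
  seq_available = as.Date(c("2021-06-12", "2021-06-12", "2021-06-19"))
)

# Each row is uniquely identified by its date and availability dates
stopifnot(!anyDuplicated(obs[, c("date", "cases_available", "seq_available")]))

# Hypothetical helper: the most recently reported case count for each date,
# using only versions available on or before `as_of`.
latest_cases_as_of <- function(obs, as_of) {
  avail <- obs[obs$cases_available <= as_of,
               c("date", "cases", "cases_available")]
  avail <- avail[order(avail$date, avail$cases_available), ]
  # keep the last (latest reported) row within each date
  avail[!duplicated(avail$date, fromLast = TRUE), ]
}

latest_cases_as_of(obs, as.Date("2021-06-05"))
latest_cases_as_of(obs, as.Date("2021-06-12"))
```

Moving as_of back in time reproduces the data as it would have looked on that day, which is the situation the case_date and seq_date arguments are described as handling.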