Forecast using branching processes at a target date

Usage

forecast(
  obs,
  forecast_date = max(obs$date),
  seq_date = forecast_date,
  case_date = forecast_date,
  data_list = forecast.vocs::fv_as_data_list,
  inits = forecast.vocs::fv_inits,
  fit = forecast.vocs::fv_sample,
  posterior = forecast.vocs::fv_tidy_posterior,
  extract_forecast = forecast.vocs::fv_extract_forecast,
  horizon = 4,
  r_init = c(0, 0.25),
  r_step = 1,
  r_forecast = TRUE,
  beta = c(0, 0.1),
  lkj = 0.5,
  period = NULL,
  special_periods = c(),
  voc_scale = c(0, 0.2),
  voc_label = "VOC",
  strains = 2,
  variant_relationship = "correlated",
  overdispersion = TRUE,
  models = NULL,
  likelihood = TRUE,
  output_loglik = FALSE,
  debug = FALSE,
  keep_fit = TRUE,
  scale_r = 1,
  digits = 3,
  timespan = 7,
  probs = c(0.05, 0.2, 0.8, 0.95),
  id = 0,
  ...
)

Arguments

obs

A data.frame with the following variables: date, cases, seq_voc, seq_total, cases_available, and seq_available. seq_available and cases_available must uniquely define data rows, but values in the other variables can be duplicated across rows based on data availability. This data format allows for multiple versions of case and sequence data for a given date, each with a different reporting date. This is important when using the package in evaluation settings or in real time, where data sources are liable to be updated as new data become available. See germany_covid19_delta_obs for an example of a supported data set.
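As a rough illustration of this format, the sketch below builds a small obs-style data.frame by hand and filters it to what was available on a given date. The counts are made up, and the filtering step is plain base R written for this example rather than the package's internal logic; the real germany_covid19_delta_obs data may also carry additional columns.

```r
# Illustrative only: fabricated counts. The first date appears twice because
# its case count was revised, giving it two availability dates.
obs <- data.frame(
  date = as.Date(c("2021-06-05", "2021-06-05", "2021-06-12")),
  cases = c(1000, 1050, 900),
  cases_available = as.Date(c("2021-06-05", "2021-06-12", "2021-06-12")),
  seq_total = c(200, 200, 180),
  seq_voc = c(40, 40, 60),
  seq_available = as.Date(c("2021-06-19", "2021-06-19", "2021-06-26"))
)

# Case data as it would have looked on 2021-06-05 (before the revision).
as_of <- subset(obs, cases_available <= as.Date("2021-06-05"))
```

Here the availability columns, together with the date, identify each row, while the sequence counts for the first date are simply repeated.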

forecast_date

Date at which to forecast. Defaults to the maximum date in obs.

seq_date

Date from which to use available sequence data. Defaults to forecast_date.

case_date

Date from which to use available case data. Defaults to forecast_date.

data_list

A function that returns a list of data as ingested by the inits and fit functions. Must use arguments as defined in fv_as_data_list(). If not supplied, the package default fv_as_data_list() is used.

inits

A function that returns a function to sample initial conditions, with the same arguments as fv_inits(). If not supplied, the package default fv_inits() is used.

fit

A function that fits the supplied model with the same arguments and return values as fv_sample(). If not supplied the package default fv_sample() is used which performs MCMC sampling using cmdstanr.

posterior

A function that summarises the output from the supplied fitting function, with the same arguments as fv_tidy_posterior(). Its return values need to match those of fv_tidy_posterior() only to the extent that downstream package functionality depends on them. If not supplied, the package default fv_tidy_posterior() is used.

extract_forecast

A function that extracts the forecast from the summarised posterior. If not supplied the package default fv_extract_forecast() is used.

horizon

Integer forecast horizon. Defaults to 4.

r_init

Numeric vector of length 2. Mean and standard deviation for the normal prior on the initial log growth rate.

r_step

Integer, defaults to 1. The number of observations between each change in the growth rate.

r_forecast

Logical, defaults to TRUE. Should the growth rate be forecast beyond the data horizon.

beta

Numeric vector, defaults to c(0, 0.1). Represents the mean and standard deviation of the normal prior (truncated at -1 and 1) on the weight given to the previous difference in the differenced AR process. Placing a tight prior around zero effectively reduces the AR process to a random walk on the growth rate.
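To make the role of this weighting concrete, here is a minimal sketch of a differenced AR(1) process on the growth rate. The helper simulate_growth() and its arguments are hypothetical names introduced for this illustration, and the recursion is an assumed reading of the description above, not the package's Stan implementation.

```r
# Hypothetical helper, not part of forecast.vocs: builds a growth-rate path
# from a vector of innovations using a differenced AR(1) process.
simulate_growth <- function(eta, beta, r0 = 0) {
  stopifnot(abs(beta) <= 1)
  r <- numeric(length(eta) + 1)
  r[1] <- r0
  prev_diff <- 0
  for (t in seq_along(eta)) {
    # The current difference is the weighted previous difference plus noise.
    diff_t <- beta * prev_diff + eta[t]
    r[t + 1] <- r[t] + diff_t
    prev_diff <- diff_t
  }
  r
}

eta <- c(0.1, -0.05, 0.02)
random_walk <- simulate_growth(eta, beta = 0)    # cumulative sum of eta
persistent  <- simulate_growth(eta, beta = 0.5)  # differences carry over
```

With a weight of zero each step depends only on the new innovation, which is the random-walk behaviour that a tight prior around zero approximates.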

lkj

Numeric, defaults to 0.5. The assumed prior covariance between variant growth rates when using the "correlated" model. This sets the shape parameter for the Lewandowski-Kurowicka-Joe (LKJ) prior distribution. A value of 1 assigns a uniform prior to all correlations, values less than 1 indicate increased belief in strong correlations, and values greater than 1 indicate increased belief in weaker correlations. Our default setting places increased weight on some correlation between strains.

period

Defaults to NULL. If specified, should be a function that accepts a vector of dates. This can be used to assign periodic effects to dates, which will then be adjusted for in the case model. An example is adjusting for day-of-the-week effects, for which fv_dow_period() can be used.

special_periods

A vector of dates to pass to the argument of the same name in the period function. These dates are treated as "special", for example holidays being treated as Sundays in fv_dow_period().

voc_scale

Numeric vector of length 2. Prior mean and standard deviation for the initial growth rate modifier due to the variant of concern.

voc_label

A character string, defaults to "VOC". Defines the label to assign to variant of concern specific parameters. Example usage is to rename parameters to use variant-specific terminology.

strains

Integer number of strains to use. Defaults to 2. Current maximum is 2. A numeric vector can be passed if forecasts from multiple strain models are desired.

variant_relationship

Character string, defaulting to "correlated". Controls the relationship between strains, with options being "correlated" (strain growth rates are correlated over time), "scaled" (a fixed scaling between strains), and "independent" (fully independent strains after initial scaling).

overdispersion

Logical, defaults to TRUE. Should the observations used include overdispersion.

models

A model as supplied by fv_model(). If not supplied, the default for that strain is used. If multiple strain models are being forecast, then models should be a list of models.

likelihood

Logical, defaults to TRUE. Should the likelihood be included in the model.

output_loglik

Logical, defaults to FALSE. Should the log-likelihood be output. Disabling this will speed up fitting if evaluating the model fit is not required.

debug

Logical, defaults to FALSE. Should within model debug information be returned.

keep_fit

Logical, defaults to TRUE. Should the stan model fit be kept and returned. Dropping this can substantially reduce memory usage in situations where multiple models are being fit.

scale_r

Numeric, defaults to 1. Rescale the timespan over which the growth rate and reproduction number are calculated. An example use case is rescaling the growth rate from weekly to be scaled by the mean of the generation time (for COVID-19, for example, this would be 5.5 / 7).
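The arithmetic behind this example can be sketched as follows. The weekly growth rate is a made-up value, and multiplying the growth rate by the scaling factor is an assumption about how the rescaling is applied, shown here only to illustrate the 5.5 / 7 ratio.

```r
generation_time <- 5.5  # mean generation time in days (example from the text)
timespan <- 7           # days between observations

# Scaling factor that could be passed as scale_r.
scale_r <- generation_time / timespan

# Hypothetical weekly growth rate, rescaled to the generation-time scale.
# Assumption: rescaling is a simple multiplication of the growth rate.
weekly_r <- 0.2
scaled_r <- weekly_r * scale_r
```

With these numbers the scaling factor is roughly 0.79, so a weekly growth rate of 0.2 corresponds to roughly 0.16 per generation time.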

digits

Numeric, defaults to 3. Number of digits to round summary statistics to.

timespan

Integer, defaults to 7 (one week). The number of days between each observation.

probs

A vector of numeric probabilities to produce quantile summaries for. By default these are the 5%, 20%, 80%, and 95% quantiles which are also the minimum set required for plotting functions to work (such as plot_cases(), plot_rt(), and plot_voc_frac()).
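For intuition about what a quantile summary at these probabilities looks like, the sketch below applies base R's quantile() to a stand-in vector of draws. The draws are an evenly spaced placeholder rather than real posterior samples, and this does not reproduce the column naming or layout of the package's own summaries.

```r
probs <- c(0.05, 0.2, 0.8, 0.95)

# Placeholder "posterior draws": the integers 0 to 100.
draws <- seq(0, 100, by = 1)

# One summary value per requested probability.
q <- quantile(draws, probs = probs)
```

The 5% and 95% values bound a 90% interval and the 20% and 80% values bound a 60% interval.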

id

ID to assign to this forecast. Defaults to 0.

...

Additional parameters passed to fv_sample().

Value

A data.frame containing the output of fv_sample() in each row as well as the summarised posterior, forecast and information about the parameters specified.

See also

Functions used for forecasting across models, dates, and scenarios: forecast_across_dates(), forecast_across_scenarios(), forecast_n_strain(), plot.fv_forecast(), summary.fv_forecast(), unnest_posterior()

Examples

if (FALSE) { # interactive()
options(mc.cores = 4)

forecasts <- forecast(
  germany_covid19_delta_obs,
  forecast_date = as.Date("2021-06-12"),
  horizon = 4,
  strains = c(1, 2),
  adapt_delta = 0.99,
  max_treedepth = 15,
  variant_relationship = "scaled"
)

# inspect forecasts
forecasts

# extract the model summary
summary(forecasts, type = "model")

# plot case posterior predictions
plot(forecasts, log = TRUE)

# plot voc posterior predictions
plot(forecasts, type = "voc_frac")

# extract the case forecast
summary(forecasts, type = "cases", forecast = TRUE)
}