Introduction

The stackr package provides an easy way to combine predictions from individual time series or panel data models into an ensemble. stackr stacks models according to the Continuous Ranked Probability Score (CRPS) over k-step-ahead predictions and is therefore especially suited for time series and panel data. A function for leave-one-out CRPS may be added in the future. Predictions need to be predictive distributions represented by predictive samples. Usually, these will be sets of posterior predictive simulation draws generated by an MCMC algorithm.

Given some training data with true observed values as well as predictive samples generated from different models, crps_weights finds the optimal weights (in the sense of minimizing expected cross-validation predictive error) with which to form an ensemble from these models. Using these weights, mixture_from_samples can then provide samples from the optimal model mixture by drawing from the predictive samples of the individual models in the correct proportions. This yields a mixture model based solely on predictive samples, which in this regard is an advantage over other ensembling techniques such as Bayesian Model Averaging.

Usage

Here is an example of how to use stackr with a toy data set. The data contains true observed values as well as predictive samples generated by different models.

Load example data and split into train and test data

library("stackr")
library("data.table")
splitdate <- as.Date("2020-03-28")
print(example_data)
#>         geography  model sample_id       date predicted observed
#>            <char> <char>     <num>     <Date>     <num>    <num>
#>      1:  Tatooine  ARIMA         1 2020-03-14  1.719445 1.655068
#>      2:  Tatooine  ARIMA         2 2020-03-14  1.896555 1.655068
#>      3:  Tatooine  ARIMA         3 2020-03-14  1.766821 1.655068
#>      4:  Tatooine  ARIMA         4 2020-03-14  1.714007 1.655068
#>      5:  Tatooine  ARIMA         5 2020-03-14  1.726421 1.655068
#>     ---                                                         
#> 103996: Coruscant  Naive       496 2020-04-08  1.460421 1.543976
#> 103997: Coruscant  Naive       497 2020-04-08  1.057846 1.543976
#> 103998: Coruscant  Naive       498 2020-04-08  1.433936 1.543976
#> 103999: Coruscant  Naive       499 2020-04-08  1.719357 1.543976
#> 104000: Coruscant  Naive       500 2020-04-08  0.781818 1.543976
traindata <- example_data[date <= splitdate]
testdata <- example_data[date > splitdate]

Get weights and create mixture

Obtain weights based on the training data, then create the mixture based on the predictive samples in the testing data.

weights <- stackr::crps_weights(traindata)
test_mixture <- stackr::mixture_from_samples(testdata, weights = weights)
print(test_mixture)
#> Key: <geography, sample_id, date, observed>
#>        geography sample_id       date  observed predicted        model
#>           <char>     <num>     <Date>     <num>     <num>       <char>
#>     1: Coruscant         1 2020-03-29 1.5002089 1.7634959 CRPS_Mixture
#>     2: Coruscant         1 2020-03-30 1.4713551 2.6655171 CRPS_Mixture
#>     3: Coruscant         1 2020-03-31 1.4154616 0.9215388 CRPS_Mixture
#>     4: Coruscant         1 2020-04-01 1.3426560 1.4798433 CRPS_Mixture
#>     5: Coruscant         1 2020-04-02 1.3298251 1.3518078 CRPS_Mixture
#>    ---                                                                
#> 10996:  Tatooine       500 2020-04-04 0.6750499 0.6838004 CRPS_Mixture
#> 10997:  Tatooine       500 2020-04-05 0.7225369 0.4405134 CRPS_Mixture
#> 10998:  Tatooine       500 2020-04-06 0.7532836 0.4858075 CRPS_Mixture
#> 10999:  Tatooine       500 2020-04-07 0.7598158 0.9897033 CRPS_Mixture
#> 11000:  Tatooine       500 2020-04-08 0.7719879 0.4106171 CRPS_Mixture

Score predictions

Score predictions using the scoringutils package (install from CRAN using install.packages("scoringutils")).

# combine data.frame with mixture with predictions from other models
score_df <- data.table::rbindlist(list(testdata, test_mixture), fill = TRUE)


# CRPS score for all predictions using scoringutils
score_df[, crps := scoringutils::crps(unique(observed), t(predicted)),
  by = .(geography, model, date)
]

# summarise scores
score_df[, mean(crps), by = model][, data.table::setnames(.SD, "V1", "CRPS")]

Display other metrics

data.table::setnames(
  score_df, c("date", "observed", "sample_id", "predicted"),
  c("id", "true_values", "sample", "predictions")
)
scoringutils::eval_forecasts(score_df[geography == "Tatooine"])
scoringutils::eval_forecasts(score_df[geography == "Coruscant"])

Methods

Given a probabilistic forecast with cumulative distribution function (CDF) $F$ that has a finite first moment, the corresponding Continuous Ranked Probability Score can be written as

$$\mathrm{crps}(F, y) = \mathbb{E}_X|X - y| - \frac{1}{2}\mathbb{E}_{X, X'}|X - X'|, \tag{1}$$

where $X$ and $X'$ are independent random variables with distribution $F$ and $y$ is the observed value.
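To make equation (1) concrete, here is a small numerical check, written in Python purely as an illustration (the package itself is R, and all names below are illustrative): the expectation form of the CRPS, estimated by Monte Carlo, should agree with the known closed-form CRPS of a normal forecast (Gneiting & Raftery, 2007).

```python
import numpy as np
from scipy.stats import norm

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast (Gneiting & Raftery, 2007)."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

# Monte Carlo estimate of equation (1): E|X - y| - (1/2) E|X - X'|,
# using independent draws X and X' from the forecast distribution.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=20_000)
Xp = rng.normal(0.0, 1.0, size=20_000)
y = 0.3
mc = np.abs(X - y).mean() - 0.5 * np.abs(X - Xp).mean()
```

With this many draws, `mc` agrees with `crps_normal(0.0, 1.0, 0.3)` up to Monte Carlo error.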

CRPS for one model and one single observation

The CRPS can be used to stack different models into one ensemble mixture model. Assume we have data from $T$ time points $t = 1, \dots, T$ in $R$ regions $r = 1, \dots, R$. Observations are denoted $y_{tr}$. Predictive samples are generated by $K$ different models $k = 1, \dots, K$. For every observation $y_{tr}$, the $S$ predictive samples of model $k$ are denoted $x_{1ktr}, \dots, x_{Sktr}$.

Using these predictive draws, we can compute the CRPS of the $k$-th model for the observation $y_{tr}$ at time $t$ in region $r$ as

$$\widehat{\mathrm{crps}}_{ktr} = \widehat{\mathrm{crps}}(x_{1ktr}, \dots, x_{Sktr}, y_{tr}) = \frac{1}{S} \sum_{s=1}^S |x_{sktr} - y_{tr}| - \frac{1}{2S^2} \sum_{s,j=1}^S |x_{sktr} - x_{jktr}|. \tag{2}$$
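The estimator in equation (2) is straightforward to compute from predictive samples alone. A minimal sketch in Python (the package itself is R; the function name is illustrative):

```python
import numpy as np

def crps_sample(x, y):
    """Sample-based CRPS estimate (equation 2) for one observation y
    and a vector of S predictive samples x."""
    x = np.asarray(x, dtype=float)
    S = x.size
    # First term: mean absolute error of the samples against the observation
    term1 = np.abs(x - y).mean()
    # Second term: mean absolute difference between all sample pairs, halved
    term2 = np.abs(x[:, None] - x[None, :]).sum() / (2 * S**2)
    return term1 - term2
```

For example, `crps_sample([0.0, 1.0], 0.0)` evaluates to 0.25: the first term is 0.5 and the pairwise term is 0.25.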

CRPS for a mixture of all models and one single observation

Now we want to aggregate the predictions from these $K$ models. When the prediction is a mixture of the $K$ models with weights $w_1, \dots, w_K$, the CRPS can be expressed as

$$\widehat{\mathrm{crps}}_{\mathrm{agg},tr}(w_1, \dots, w_K) = \frac{1}{S} \sum_{k=1}^K w_k \sum_{s=1}^S |x_{sktr} - y_{tr}| - \frac{1}{2S^2} \sum_{k,k'=1}^K w_k w_{k'} \sum_{s,j=1}^S |x_{sktr} - x_{jk'tr}|. \tag{3}$$

This expression is quadratic in $w$. We only need to compute $\sum_{s=1}^S |x_{sktr} - y_{tr}|$ for each model $k$ and $\sum_{s,j=1}^S |x_{sktr} - x_{jk'tr}|$ for each pair $k, k'$ (including $k = k'$) once, and can then reuse these sums for every weight vector evaluated during the optimization.
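This precomputation idea can be sketched as follows, in Python with made-up sample data (the package itself is R; all names are illustrative): the vector $a_k$ stores the first kind of sum, the matrix $B_{kk'}$ the second, and every evaluation of the mixture CRPS then reduces to a cheap quadratic form in $w$.

```python
import numpy as np

rng = np.random.default_rng(1)
S, K = 400, 3
y = 1.0
# Predictive samples from K hypothetical models for a single observation y
x = rng.normal(loc=[0.8, 1.0, 1.3], scale=[0.4, 0.6, 0.5], size=(S, K))

# Precompute both kinds of sums once; they do not depend on w.
a = np.abs(x - y).mean(axis=0)                         # a_k = (1/S) sum_s |x_sk - y|
d = np.abs(x[:, None, :, None] - x[None, :, None, :])  # shape (S, S, K, K)
B = d.sum(axis=(0, 1)) / (2 * S**2)                    # B_kk' = (1/(2S^2)) sum_{s,j} |x_sk - x_jk'|

def crps_agg(w):
    """Mixture CRPS (equation 3) as a quadratic form in the weights."""
    w = np.asarray(w, dtype=float)
    return a @ w - w @ B @ w
```

After the one-off O(S²K²) precomputation, each evaluation of `crps_agg` costs only O(K²), which is what makes repeated evaluations inside an optimizer cheap.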

Obtaining the weights that minimize CRPS

The CRPS of the mixture over all observations can simply be obtained by summing the individual terms from equation (3) over all regions and time points. However, we can also weight predictions from different time points and regions differently. This makes sense, for example, if we have very little data in the beginning, or if older observations are less characteristic of the current and future dynamics. In that case we might want to downweight the early phase in the final model evaluation. Similarly, we might want to give more or less weight to certain regions. Mathematically, we can introduce time-varying weights $\lambda_1, \dots, \lambda_T$, e.g. $\lambda_t = 2 - (1 - t/T)^2$ to penalize earlier estimates. Likewise, we can introduce region-specific weights $\tau_r$.

To obtain the CRPS weights, we finally solve the quadratic optimization problem

$$\begin{aligned} &\min_{w_1, \dots, w_K} \; \sum_{t=1}^T \sum_{r=1}^R \lambda_t \tau_r \, \widehat{\mathrm{crps}}_{\mathrm{agg},tr}(w), \\ &\text{s.t. } \; 0 \leq w_1, \dots, w_K \leq 1, \quad \sum_{k=1}^K w_k = 1. \end{aligned}$$
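A sketch of this optimization in Python on synthetic data with one calibrated and one biased model, using scipy's SLSQP solver and the time weights $\lambda_t$ from above (stackr itself solves the problem in R; all names here are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
T, S, K = 20, 200, 2
y = rng.normal(size=T)                                       # observations over time (one region)
x = np.empty((T, S, K))
x[:, :, 0] = y[:, None] + rng.normal(scale=1.0, size=(T, S))        # calibrated model
x[:, :, 1] = y[:, None] + 2.0 + rng.normal(scale=1.0, size=(T, S))  # biased model

t = np.arange(1, T + 1)
lam = 2 - (1 - t / T) ** 2                                   # lambda_t from the text

# Per-time-point components of the quadratic form (cf. equation 3)
a = np.abs(x - y[:, None, None]).mean(axis=1)                # shape (T, K)
B = np.empty((T, K, K))
for ti in range(T):
    d = np.abs(x[ti][:, None, :, None] - x[ti][None, :, None, :])  # (S, S, K, K)
    B[ti] = d.sum(axis=(0, 1)) / (2 * S**2)

def objective(w):
    """Time-weighted sum of per-observation mixture CRPS values."""
    return np.sum(lam * (a @ w - np.einsum("k,tkl,l->t", w, B, w)))

# Quadratic program over the simplex: 0 <= w_k <= 1, sum_k w_k = 1
res = minimize(
    objective,
    x0=np.full(K, 1.0 / K),
    method="SLSQP",
    bounds=[(0.0, 1.0)] * K,
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
)
w = res.x
```

With this setup, the calibrated model receives (nearly) all of the weight, and the weighted objective at the optimum is no worse than at the uniform starting point.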

References

  • Using Stacking to Average Bayesian Predictive Distributions, Yuling Yao, Aki Vehtari, Daniel Simpson, and Andrew Gelman, 2018, Bayesian Analysis 13(3), pp. 917–1003, doi:10.1214/17-BA1091
  • Strictly Proper Scoring Rules, Prediction, and Estimation, Tilmann Gneiting and Adrian E. Raftery, 2007, Journal of the American Statistical Association, Volume 102, Issue 477, doi:10.1198/016214506000001437
  • Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot be Ignored, Bertrand Clarke, 2003, Journal of Machine Learning Research 4
  • Bayesian Model Weighting: The Many Faces of Model Averaging, Marvin HΓΆge, Anneli Guthke and Wolfgang Nowak, 2020, Water, doi:10.3390/w12020309
  • Bayesian Stacking and Pseudo-BMA weights using the loo package, Aki Vehtari and Jonah Gabry, 2019, https://mc-stan.org/loo/articles/loo2-weights.html