
Function to transform forecasts and true values before scoring.

Usage

transform_forecasts(data, fun = log_shift, append = TRUE, label = "log", ...)

Arguments

data

A data.frame or data.table with the predictions and observations. For scoring using score(), the following columns need to be present:

  • true_value - the true observed values

  • prediction - predictions or predictive samples for one true value. (The prediction column can only be omitted if you want to score quantile forecasts in a wide range format.)

For scoring integer and continuous forecasts, a sample column is needed:

  • sample - an index to identify the predictive samples in the prediction column generated by one model for one true value. Only necessary for continuous and integer forecasts, not for binary predictions.

For scoring forecasts in a quantile format, you should provide a column called quantile:

  • quantile - the quantile level to which the prediction corresponds

In addition, a model column is suggested. If it is not present, this will be flagged and the column will be added to the input data, with all forecasts assigned to an "unspecified model".

You can check the format of your data using check_forecasts(), and there are example data sets for each format (example_quantile, example_continuous, example_integer, and example_binary).
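As a rough illustration, a minimal quantile-format input with the columns listed above might look like the base R data frame below. The values, the model name "model_a", and the quantile levels are made up for this sketch; for real data, check the format with check_forecasts().

```r
# Hypothetical quantile-format forecasts for a single observation.
# Each row is one predictive quantile of one model for one true value.
forecasts <- data.frame(
  model      = "model_a",                      # hypothetical model name
  true_value = 120,                            # observed value, repeated per row
  quantile   = c(0.05, 0.25, 0.5, 0.75, 0.95), # quantile levels
  prediction = c(80, 100, 115, 130, 160)       # predicted quantiles
)

# Columns expected for quantile-format scoring, per the description above
required <- c("true_value", "prediction", "quantile")
stopifnot(all(required %in% names(forecasts)))
```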

fun

A function used to transform both true values and predictions. The default function is log_shift(), a custom function that is essentially the same as log() but has an additional argument (offset) that allows you to add an offset before applying the logarithm. This is often helpful because the natural log is not defined at zero. A common and pragmatic solution is to add a small offset to the data before applying the log transformation. In our work we have often used an offset of 1, but the precise value will depend on your application.
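To see why an offset helps, here is a minimal base R sketch of a shifted logarithm. shifted_log() is a hypothetical stand-in written for this illustration, not the package's log_shift(); it only assumes that the transformation is log(x + offset).

```r
# Hypothetical stand-in for a shifted log: log(x + offset)
shifted_log <- function(x, offset = 0) {
  log(x + offset)
}

counts <- c(0, 9, 99)

shifted_log(counts)              # log(0) is -Inf, so zero counts break scoring
shifted_log(counts, offset = 1)  # log(1), log(10), log(100): all finite
```

With offset = 1, a zero count maps to 0 on the log scale instead of -Inf.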

append

Logical, defaults to TRUE. Whether or not to append a transformed version of the data to the currently existing data. If TRUE, the data gets transformed and appended to the existing data frame, making it possible to use the outcome directly in score(). An additional column, 'scale', gets created that denotes which rows are untransformed ('scale' has the value "natural") and which have been transformed ('scale' has the value passed to the argument label). If FALSE, only the transformed data are returned.
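The append behaviour can be sketched in base R as stacking the untransformed rows (scale "natural") on top of transformed copies (scale set to the label). append_transformed() is a hypothetical helper written for this illustration; it makes no claim about how transform_forecasts() is implemented internally.

```r
# Hypothetical helper mimicking append = TRUE: keep the original rows,
# add transformed copies, and record the scale in a new column.
append_transformed <- function(data, fun, label, ...) {
  transformed <- data
  # transform observations and predictions with the same function;
  # extra arguments in ... are forwarded to fun
  transformed$true_value <- fun(data$true_value, ...)
  transformed$prediction <- fun(data$prediction, ...)
  data$scale <- "natural"
  transformed$scale <- label
  rbind(data, transformed)
}

forecasts <- data.frame(
  true_value = c(4, 9),
  prediction = c(1, 16)
)

combined <- append_transformed(forecasts, fun = sqrt, label = "sqrt")
combined
```

Keeping both scales in one table with a 'scale' column is what allows scores to be summarised by scale afterwards.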

label

A string for the newly created 'scale' column to denote the newly transformed values. Only relevant if append = TRUE.

...

Additional parameters to pass to the function you supplied. For the default option of log_shift() this could be the offset argument.

Value

A data.table with either only a transformed version of the data (append = FALSE) or both the untransformed and the transformed data (append = TRUE). In the latter case there will be one additional column, 'scale', which is set to "natural" for the untransformed forecasts and to the value of label for the transformed ones.

Details

There are a few reasons, depending on the circumstances, why transforming forecasts before scoring might be desirable (see the linked reference for more information). In epidemiology, for example, it may be useful to log-transform incidence counts before evaluating forecasts using scores such as the weighted interval score (WIS) or the continuous ranked probability score (CRPS). Log-transforming forecasts and observations changes the interpretation of the score from a measure of absolute distance between forecast and observation to a score that evaluates a forecast of the exponential growth rate. Another motivation can be to apply a variance-stabilising transformation or to standardise incidence counts by population.
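A small base R check illustrates the change in interpretation: on the log scale, the absolute error depends only on the ratio of forecast to observation, so a 10% overprediction costs the same whether incidence is 100 or 1000. The numbers are made up for illustration.

```r
# Two forecasts that both overpredict by 10%, at very different magnitudes
obs  <- c(100, 1000)
pred <- c(110, 1100)

abs_error_natural <- abs(pred - obs)            # 10 and 100
abs_error_log     <- abs(log(pred) - log(obs))  # log(1.1) in both cases

abs_error_natural
abs_error_log
```

Because log(pred) - log(obs) = log(pred / obs), an error on the log scale is a relative error, which is what links it to forecasting growth rates.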

Note that if you want to apply a transformation, it is important to transform the forecasts and observations first and then apply the score. Applying a transformation after the score risks losing the propriety of the proper scoring rule.
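The order of operations can be illustrated with a simple sample-based CRPS estimator. crps_sample_sketch() is a hypothetical helper using the standard form CRPS = E|X - y| - 0.5 E|X - X'| over predictive samples X; it is not the package's implementation. Transforming first means scoring log(samples) against log(y); taking the log of the natural-scale score instead gives a different quantity which, as noted above, risks losing propriety.

```r
# Hypothetical sample-based CRPS estimator:
# mean |X - y| - 0.5 * mean |X - X'| over all pairs of samples
crps_sample_sketch <- function(y, samples) {
  mean(abs(samples - y)) - 0.5 * mean(abs(outer(samples, samples, "-")))
}

set.seed(1)
samples <- rpois(500, lambda = 200)  # made-up predictive samples
y <- 260                             # made-up observation

# Recommended: transform forecasts and observation, then score
score_on_log_scale <- crps_sample_sketch(log(y), log(samples))

# Not recommended: score on the natural scale, then transform the score
log_of_natural_score <- log(crps_sample_sketch(y, samples))

c(score_on_log_scale, log_of_natural_score)
```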

References

Bosse NI, Abbott S, Cori A, van Leeuwen E, Bracher J, Funk S (2023). Transformation of forecasts for evaluating predictive performance in an epidemiological context. medRxiv 2023.01.23.23284722. doi:10.1101/2023.01.23.23284722. https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1

Author

Nikos Bosse nikosbosse@gmail.com

Examples


library(scoringutils) # transform_forecasts(), example data
library(magrittr)     # pipe operator

# transform forecasts using the natural logarithm
# negative values need to be handled (here by replacing them with 0)
example_quantile %>%
  .[, true_value := ifelse(true_value < 0, 0, true_value)] %>%
  # Here we use the default function log_shift(), which is essentially the same
  # as log() but has an additional argument (offset) that allows you to add an
  # offset before applying the logarithm.
  transform_forecasts(append = FALSE) %>%
  head()
#> Warning: Detected zeros in input values.Try specifying offset = 1 (or any other offset).
#> Warning: Detected zeros in input values.Try specifying offset = 1 (or any other offset).
#>    location target_end_date target_type true_value location_name forecast_date
#> 1:       DE      2021-01-02       Cases  11.754302       Germany          <NA>
#> 2:       DE      2021-01-02      Deaths   8.419360       Germany          <NA>
#> 3:       DE      2021-01-09       Cases  11.950677       Germany          <NA>
#> 4:       DE      2021-01-09      Deaths   8.718827       Germany          <NA>
#> 5:       DE      2021-01-16       Cases  11.609898       Germany          <NA>
#> 6:       DE      2021-01-16      Deaths   8.677099       Germany          <NA>
#>    quantile prediction model horizon
#> 1:       NA         NA  <NA>      NA
#> 2:       NA         NA  <NA>      NA
#> 3:       NA         NA  <NA>      NA
#> 4:       NA         NA  <NA>      NA
#> 5:       NA         NA  <NA>      NA
#> 6:       NA         NA  <NA>      NA

# alternatively, integrating the truncation in the transformation function:
example_quantile %>%
  transform_forecasts(
    fun = function(x) {log_shift(pmax(0, x))}, append = FALSE
  ) %>%
  head()
#> Warning: Detected zeros in input values.Try specifying offset = 1 (or any other offset).
#> Warning: Detected zeros in input values.Try specifying offset = 1 (or any other offset).
#>    location target_end_date target_type true_value location_name forecast_date
#> 1:       DE      2021-01-02       Cases  11.754302       Germany          <NA>
#> 2:       DE      2021-01-02      Deaths   8.419360       Germany          <NA>
#> 3:       DE      2021-01-09       Cases  11.950677       Germany          <NA>
#> 4:       DE      2021-01-09      Deaths   8.718827       Germany          <NA>
#> 5:       DE      2021-01-16       Cases  11.609898       Germany          <NA>
#> 6:       DE      2021-01-16      Deaths   8.677099       Germany          <NA>
#>    quantile prediction model horizon
#> 1:       NA         NA  <NA>      NA
#> 2:       NA         NA  <NA>      NA
#> 3:       NA         NA  <NA>      NA
#> 4:       NA         NA  <NA>      NA
#> 5:       NA         NA  <NA>      NA
#> 6:       NA         NA  <NA>      NA

# specifying an offset for the log transformation removes the
# warning caused by zeros in the data
example_quantile %>%
  .[, true_value := ifelse(true_value < 0, 0, true_value)] %>%
  transform_forecasts(offset = 1, append = FALSE) %>%
  head()
#>    location target_end_date target_type true_value location_name forecast_date
#> 1:       DE      2021-01-02       Cases  11.754310       Germany          <NA>
#> 2:       DE      2021-01-02      Deaths   8.419580       Germany          <NA>
#> 3:       DE      2021-01-09       Cases  11.950683       Germany          <NA>
#> 4:       DE      2021-01-09      Deaths   8.718991       Germany          <NA>
#> 5:       DE      2021-01-16       Cases  11.609907       Germany          <NA>
#> 6:       DE      2021-01-16      Deaths   8.677269       Germany          <NA>
#>    quantile prediction model horizon
#> 1:       NA         NA  <NA>      NA
#> 2:       NA         NA  <NA>      NA
#> 3:       NA         NA  <NA>      NA
#> 4:       NA         NA  <NA>      NA
#> 5:       NA         NA  <NA>      NA
#> 6:       NA         NA  <NA>      NA

# adding square root transformed forecasts to the original ones
example_quantile %>%
  .[, true_value := ifelse(true_value < 0, 0, true_value)] %>%
  transform_forecasts(fun = sqrt, label = "sqrt") %>%
  score() %>%
  summarise_scores(by = c("model", "scale"))
#> The following messages were produced when checking inputs:
#> 1.  288 values for `prediction` are NA in the data provided and the corresponding rows were removed. This may indicate a problem if unexpected.
#>                    model   scale interval_score   dispersion underprediction
#> 1: EuroCOVIDhub-baseline natural   11124.930667 2096.9535954    5143.5356658
#> 2: EuroCOVIDhub-baseline    sqrt      27.742316    7.7296761       9.5936380
#> 3: EuroCOVIDhub-ensemble natural    5796.064569 1846.8527819    2120.6402853
#> 4: EuroCOVIDhub-ensemble    sqrt      14.974344    4.2878323       5.1827454
#> 5:  epiforecasts-EpiNow2 natural    7514.375476 2950.7342158    1697.2341137
#> 6:  epiforecasts-EpiNow2    sqrt      17.704899    5.4112770       5.7235785
#> 7:       UMass-MechBayes natural      52.651946   26.8723947      16.8009511
#> 8:       UMass-MechBayes    sqrt       1.328653    0.5993586       0.4019195
#>    overprediction coverage_deviation        bias    ae_median
#> 1:   3884.4414062        0.003369565  0.21816406 16156.871094
#> 2:     10.4190016        0.003369565  0.21816406    39.185406
#> 3:   1828.5715014        0.048716033  0.00812500  8880.542969
#> 4:      5.5037665        0.048716033  0.00812500    22.458900
#> 5:   2866.4071466       -0.055169864 -0.04336032 11208.072874
#> 6:      6.5700431       -0.055169864 -0.04336032    25.585018
#> 7:      8.9786005       -0.023125000 -0.02234375    78.476562
#> 8:      0.3273746       -0.023125000 -0.02234375     2.069103

# adding multiple transformations
example_quantile %>%
  .[, true_value := ifelse(true_value < 0, 0, true_value)] %>%
  transform_forecasts(fun = log_shift, offset = 1) %>%
  transform_forecasts(fun = sqrt, label = "sqrt") %>%
  head()
#>    location target_end_date target_type true_value location_name forecast_date
#> 1:       DE      2021-01-02       Cases     127300       Germany          <NA>
#> 2:       DE      2021-01-02      Deaths       4534       Germany          <NA>
#> 3:       DE      2021-01-09       Cases     154922       Germany          <NA>
#> 4:       DE      2021-01-09      Deaths       6117       Germany          <NA>
#> 5:       DE      2021-01-16       Cases     110183       Germany          <NA>
#> 6:       DE      2021-01-16      Deaths       5867       Germany          <NA>
#>    quantile prediction model horizon   scale
#> 1:       NA         NA  <NA>      NA natural
#> 2:       NA         NA  <NA>      NA natural
#> 3:       NA         NA  <NA>      NA natural
#> 4:       NA         NA  <NA>      NA natural
#> 5:       NA         NA  <NA>      NA natural
#> 6:       NA         NA  <NA>      NA natural