
Helper function to manually set the unit of a single forecast (i.e. the combination of columns that uniquely defines a single forecast). This simple function keeps the columns specified in forecast_unit (plus additional protected columns, e.g. for true values, predictions or quantile levels) and removes duplicate rows. If the forecast unit is not set manually, scoringutils attempts to determine it automatically by assuming that all column names are relevant to the forecast unit. This can lead to unexpected behaviour, for instance when the data contain additional columns that do not describe the forecast. Setting the forecast unit explicitly therefore makes code easier to debug and easier to read. When used as part of a workflow, set_forecast_unit() can be piped directly into check_forecasts() to check that everything is in order.
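The behaviour described above (keep the forecast-unit columns plus the protected columns, then drop duplicate rows) can be illustrated with a minimal base R sketch. This is not the scoringutils implementation: the function name `set_forecast_unit_sketch()` is hypothetical, and the set of protected columns used here is an assumption chosen for illustration.

```r
# Hypothetical sketch of what set_forecast_unit() does; NOT the scoringutils
# source code.
# Assumption: these columns are treated as "protected" and always kept.
protected_cols <- c("true_value", "prediction", "quantile", "sample")

set_forecast_unit_sketch <- function(data, forecast_unit) {
  data <- as.data.frame(data)
  # Keep the forecast-unit and protected columns, in their original order
  keep <- intersect(names(data), union(forecast_unit, protected_cols))
  out <- data[, keep, drop = FALSE]
  # Rows that differed only in the dropped columns are now duplicates
  out <- unique(out)
  rownames(out) <- NULL
  out
}

# Toy data: the median forecast was recorded twice, once per "source"
df <- data.frame(
  location   = c("DE", "DE", "DE"),
  model      = c("m1", "m1", "m1"),
  quantile   = c(0.5, 0.5, 0.9),
  prediction = c(10, 10, 15),
  true_value = c(12, 12, 12),
  source     = c("a", "b", "a")
)
res <- set_forecast_unit_sketch(df, c("location", "model"))
res  # 2 rows; the "source" column is gone
```

Unlike a more defensive implementation, this sketch silently ignores names in `forecast_unit` that are not columns of `data`.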

Usage

set_forecast_unit(data, forecast_unit)

Arguments

data

A data.frame or data.table with the predictions and observations. For scoring using score(), the following columns need to be present:

  • true_value - the true observed values

  • prediction - predictions or predictive samples for one true value. (The prediction column can only be omitted when scoring quantile forecasts in a wide range format.)
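A quick way to see which of the two columns listed above are absent is a set difference on the column names. The helper `missing_score_cols()` below is hypothetical and not part of scoringutils; check_forecasts() is the package's own tool for checking the input format.

```r
# Hypothetical helper (not part of scoringutils): report which columns that
# score() needs are missing from a data set.
# Assumption: only true_value and prediction are checked here.
missing_score_cols <- function(data, required = c("true_value", "prediction")) {
  setdiff(required, names(data))
}

obs_only <- data.frame(location = "DE", true_value = 127300)
missing_score_cols(obs_only)  # returns "prediction"

full <- data.frame(true_value = 1, prediction = 2)
missing_score_cols(full)      # returns character(0)
```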

For scoring integer and continuous forecasts, a sample column is needed:

  • sample - an index to identify the predictive samples in the prediction column generated by one model for one true value. Only necessary for continuous and integer forecasts, not for binary predictions.
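Sample-based forecasts often start out as a matrix with one row per forecast and one column per predictive sample. The sketch below, a hypothetical helper `samples_to_long()` that is not part of scoringutils, reshapes such a matrix into the long format described above, with one row per sample and a sample index column. The `forecast` id column stands in for whatever columns (location, date, ...) identify a forecast in real data.

```r
# Hypothetical helper (not part of scoringutils): reshape a matrix of
# predictive samples (rows = forecasts, columns = samples) into long format.
samples_to_long <- function(samples, true_value, model) {
  n_fc <- nrow(samples)
  n_samples <- ncol(samples)
  data.frame(
    model      = model,
    forecast   = rep(seq_len(n_fc), times = n_samples),  # illustrative id
    true_value = rep(true_value, times = n_samples),
    sample     = rep(seq_len(n_samples), each = n_fc),
    prediction = as.vector(samples)  # column-major: all of sample 1 first
  )
}

# 2 forecasts x 3 samples (matrix() fills column by column)
samples <- matrix(c(9, 11, 10, 14, 12, 13), nrow = 2)
long <- samples_to_long(samples, true_value = c(10, 13), model = "m1")
long
```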

For scoring forecasts in a quantile format, you should provide a column called quantile:

  • quantile - the quantile level to which the prediction corresponds
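Because each row carries its quantile level, the prediction at a given level can be picked out by filtering on the quantile column. The toy data below is invented for illustration and only mirrors the column names described above.

```r
# Toy quantile-format data: one row per forecast and quantile level
qdf <- data.frame(
  location   = rep(c("DE", "IT"), each = 3),
  true_value = rep(c(120, 80), each = 3),
  quantile   = rep(c(0.25, 0.5, 0.75), times = 2),
  prediction = c(100, 115, 130, 70, 78, 90)
)

# The median forecast for each location is the row with quantile == 0.5
medians <- qdf[qdf$quantile == 0.5, c("location", "prediction")]
medians
```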

In addition, a model column is suggested. If it is not present, this will be flagged and a model column will be added to the input data, with all forecasts assigned to an "unspecified model".
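The fallback described above can be sketched in a few lines of base R. The function `add_default_model()` is hypothetical and not scoringutils code; the placeholder label follows the wording above, and the package's exact label and warning mechanism may differ.

```r
# Hypothetical sketch (not scoringutils code): flag a missing model column
# and fill it with a placeholder label.
add_default_model <- function(data) {
  if (!"model" %in% names(data)) {
    message("No model column found; assigning all forecasts to 'unspecified model'.")
    data$model <- "unspecified model"
  }
  data
}

d <- data.frame(true_value = c(1, 2), prediction = c(1.5, 2.5))
d <- add_default_model(d)
d
```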

You can check the format of your data using check_forecasts(). Example data sets are available for each format: example_quantile, example_continuous, example_integer, and example_binary.

forecast_unit

Character vector with the names of the columns that uniquely identify a single forecast.
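To see whether a candidate forecast_unit is fine-grained enough, one can check that it, together with the quantile or sample index, identifies every row uniquely. The helper `forecast_unit_is_unique()` below is hypothetical and not part of scoringutils; it uses base R's `anyDuplicated()`, which returns 0 when no row is repeated.

```r
# Hypothetical check (not part of scoringutils): do the forecast-unit columns,
# together with the quantile or sample index, identify every row uniquely?
forecast_unit_is_unique <- function(data, forecast_unit) {
  index_cols <- intersect(c("quantile", "sample"), names(data))
  anyDuplicated(data[, c(forecast_unit, index_cols), drop = FALSE]) == 0
}

# Two models forecasting the same location at two quantile levels
fc <- data.frame(
  location   = "DE",
  model      = rep(c("m1", "m2"), each = 2),
  quantile   = rep(c(0.5, 0.9), times = 2),
  prediction = c(10, 14, 11, 16)
)
forecast_unit_is_unique(fc, c("location", "model"))  # TRUE
forecast_unit_is_unique(fc, "location")              # FALSE: "model" is needed
```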

Value

A data.table that keeps only the columns relevant to scoring and the columns that denote the unit of a single forecast, as specified by the user.

Examples

set_forecast_unit(
  example_quantile,
  c("location", "target_end_date", "target_type", "horizon", "model")
)
#>        true_value quantile prediction location target_end_date target_type
#>     1:     127300       NA         NA       DE      2021-01-02       Cases
#>     2:       4534       NA         NA       DE      2021-01-02      Deaths
#>     3:     154922       NA         NA       DE      2021-01-09       Cases
#>     4:       6117       NA         NA       DE      2021-01-09      Deaths
#>     5:     110183       NA         NA       DE      2021-01-16       Cases
#>    ---                                                                    
#> 20541:         78    0.850        352       IT      2021-07-24      Deaths
#> 20542:         78    0.900        397       IT      2021-07-24      Deaths
#> 20543:         78    0.950        499       IT      2021-07-24      Deaths
#> 20544:         78    0.975        611       IT      2021-07-24      Deaths
#> 20545:         78    0.990        719       IT      2021-07-24      Deaths
#>        horizon                model
#>     1:      NA                 <NA>
#>     2:      NA                 <NA>
#>     3:      NA                 <NA>
#>     4:      NA                 <NA>
#>     5:      NA                 <NA>
#>    ---                             
#> 20541:       2 epiforecasts-EpiNow2
#> 20542:       2 epiforecasts-EpiNow2
#> 20543:       2 epiforecasts-EpiNow2
#> 20544:       2 epiforecasts-EpiNow2
#> 20545:       2 epiforecasts-EpiNow2