Testing • covidregionaldata

Summary

This PR moves testing into the class methods, allowing tests to be ran interactively from within the country class object, and from within the existing testing infrastructure.

DataClass is the parent class of all country classes, such as Italy, UK, USA, etc. The generic testing function is the method test within DataClass.

Interactively, tests can be ran by doing the following:

library(covidregionaldata)
ukdata <- UK$new(level = "1", verbose = FALSE)
# could use anything but I used the acutal test one here for simplicity
ukdata$test(snapshot_dir = "../tests/testthat/custom_data/")
#> Test passed 😸
#> Test passed 🎊
#> Test passed 🎉
#> Test passed 🎊
#> Test passed 😸
#> Test passed 🎉
#> Test passed 🎊
#> Test passed 🎉
#> Test passed 🎉
#> Test passed 🌈

Here, I break down the different components of this function to walk through how the tests are run.

ukdata$test
#> function (download = FALSE, snapshot_dir = paste0(tempdir(), 
#>     "/snapshots"), all = FALSE, ...) 
#> {
#>     snapshot_file_name <- paste0(class(self)[1], "_level_", self$level, 
#>         ".rds")
#>     dir.create(snapshot_dir, showWarnings = FALSE)
#>     message_verbose(verbose = self$verbose, paste("snapshot to be saved at", 
#>         snapshot_dir))
#>     snapshot_path <- file.path(snapshot_dir, snapshot_file_name)
#>     self_copy <- self$clone()
#>     test_download(DataClass_obj = self_copy, download = download, 
#>         snapshot_path = snapshot_path)
#>     test_cleaning(DataClass_obj = self_copy)
#>     test_processing(DataClass_obj = self_copy, all = all)
#>     test_return(DataClass_obj = self_copy)
#>     if ("specific_tests" %in% names(self_copy)) {
#>         specific <- paste0("self$specific_tests(\n            self_copy = self_copy,\n            download = download,\n            all = all,\n            snapshot_path = snapshot_path,\n            ...\n          )")
#>         eval(parse(text = specific))
#>     }
#> }
#> <environment: 0x560367084938>

For a given country, such as the UK, you would make this by calling something like ukdata <- UK$new(level = "1") and can run the steps one by one by calling the respective methods: ukdata$download(); ukdata$clean(); ukdata$process(). To run the tests you would call ukdata$test(shapshot_dir = "place/2/save/snapshots") By default, if snapshot_dir is not given it will use a generic snapsots directory. The snapshot_dir argument specifies a directory to save data snapshots to and if it doesn’t exist it will make it for you. The file is constructed internally from the object you are testing and the data level, e.g. Uk_level_1.rds. Rather than running tests on your active class, the code first makes a clone of your class which is then used for tests: self_copy <- self$clone() The snapshot path is the path to a rds file where you either have stored some raw downloaded data to test, or where you want a downloaded snapshot of the data to end up. This is handled by the function test_download(). The cloned class is passed to this function, along with the download parameter, which dictates whether to overwrite the snapshot file provided.

test_download
#> function (DataClass_obj, download, snapshot_path) 
#> {
#>     if (!file.exists(snapshot_path)) {
#>         download <- TRUE
#>     }
#>     if (download) {
#>         testthat::test_that(paste0(DataClass_obj$data_name, " downloads sucessfully"), 
#>             {
#>                 DataClass_obj$download()
#>                 walk(DataClass_obj$data$raw, function(data) {
#>                   testthat::expect_s3_class(data, "data.frame")
#>                   testthat::expect_true(nrow(data) > 0)
#>                   testthat::expect_true(ncol(data) >= 2 || typeof(data[[1]]) == 
#>                     "list")
#>                 })
#>             })
#>         DataClass_obj$data$raw <- map(DataClass_obj$data$raw, 
#>             slice_tail, n = 250)
#>         DataClass_obj$data$raw <- map(DataClass_obj$data$raw, 
#>             ~.[, seq_len(min(100, ncol(.)))])
#>         saveRDS(DataClass_obj$data$raw, snapshot_path)
#>     }
#>     else {
#>         DataClass_obj$data$raw <- readRDS(snapshot_path)
#>     }
#> }
#> <bytecode: 0x56036747ce28>
#> <environment: namespace:covidregionaldata>

As shown by the code, if the data is to be downloaded (either through requesting this with the download parameter or by providing a path to a none existent file) then the download method is called on the class copy (DataClass_obj): DataClass_obj$download(). After this has downloaded the code tests the data is a data.frame, is not empty and has at least 2 columns. The code then takes the first 250 rows to work on, as using the whole data set could be very slow and for the purpose of testing is not needed. This sliced data is then saved to the snapshot file provided for use later on. If the data is not to be downloaded, the snapshot of data saved in the snapshot path is loaded as the raw data DataClass_obj$data$raw.

Once the data is downloaded, the cleaning methods are then tested test_cleaning(). Again this function takes the copied class as the argument and runs through tests on the clean data. These tests check the data is a data.frame, is not empty, has more than 2 columns and that the method avaliable_regions returns a list of characters. In addition, expect_clean_cols checks the date column is an s3 class and that region level column is a character in the cleaned data (data$clean).

test_cleaning
#> function (DataClass_obj) 
#> {
#>     testthat::test_that(paste0(DataClass_obj$data_name, " can be cleaned as expected"), 
#>         {
#>             DataClass_obj$clean()
#>             testthat::expect_s3_class(DataClass_obj$data$clean, 
#>                 "data.frame")
#>             testthat::expect_true(nrow(DataClass_obj$data$clean) > 
#>                 0)
#>             testthat::expect_true(ncol(DataClass_obj$data$clean) >= 
#>                 2)
#>             expect_clean_cols(DataClass_obj$data$clean, DataClass_obj$level)
#>         })
#>     testthat::test_that(paste(DataClass_obj$data_name, "highlights available regions as expected"), 
#>         {
#>             testthat::expect_error(DataClass_obj$available_regions(), 
#>                 NA)
#>             testthat::expect_true(class(DataClass_obj$available_regions()) %in% 
#>                 "character")
#>         })
#> }
#> <bytecode: 0x5603674765c8>
#> <environment: namespace:covidregionaldata>

Once cleaning has been tested test_processing() is called to process the data and run tests to check it all works. Again this function takes the clone of the class to work on, the same clone which has been called with the preceding functions. These tests check the data is a data.frame, is not empty, has more than 2 columns. In addition expect_processed_cols checks that processed data columns date, cases_new, cases_total, deaths_new, deaths_total and that region level have the correct types.

In processing there is an extra parameter called all. This parameter, if TRUE runs the processing step with both localised as TRUE and FALSE, making another copy of the object to check the localised data on so not to influence further tests. If FALSE processing is ran with localised set to whatever it is set at prior to test() being invoked.

test_processing
#> function (DataClass_obj, all = FALSE) 
#> {
#>     testthat::test_that(paste0(DataClass_obj$data_name, " can be processed as expected"), 
#>         {
#>             DataClass_obj$process()
#>             testthat::expect_s3_class(DataClass_obj$data$processed, 
#>                 "data.frame")
#>             testthat::expect_true(nrow(DataClass_obj$data$processed) > 
#>                 0)
#>             testthat::expect_true(ncol(DataClass_obj$data$processed) >= 
#>                 2)
#>             expect_processed_cols(DataClass_obj$data$processed, 
#>                 level = DataClass_obj$level, localised = DataClass_obj$localise)
#>             if (all) {
#>                 local_region <- DataClass_obj$clone()
#>                 local_region$localise <- FALSE
#>                 local_region$process()
#>                 expect_processed_cols(local_region$data$processed, 
#>                   level = local_region$level, localised = local_region$localise)
#>             }
#>         })
#> }
#> <bytecode: 0x560369aae350>
#> <environment: namespace:covidregionaldata>

After processing the return method is tested with test_return which check the data is a data.frame, is not empty, has more than 2 columns and that all columns contain data and are not just composed of NAs.

test_return
#> function (DataClass_obj) 
#> {
#>     testthat::test_that(paste0(DataClass_obj$data_name, " can be returned as expected"), 
#>         {
#>             returned <- DataClass_obj$return()
#>             if (any(class(returned) %in% "data.frame")) {
#>                 testthat::expect_s3_class(returned, "data.frame")
#>                 testthat::expect_true(nrow(returned) > 0)
#>                 testthat::expect_true(ncol(returned) >= 2)
#>             }
#>         })
#>     expect_columns_contain_data(DataClass_obj)
#> }
#> <bytecode: 0x56036e790a50>
#> <environment: namespace:covidregionaldata>

These tests form the generic tests applied to all classes. However, country specific tests are then called by calling the method specific_tests if that country has specific tests defined. So for Italy, where there are no specific tests, no specific tests are called, but for UK, WHO and ECDC specific tests are ran though, which are defined in their own country class. These functions should take a clone of the class as an argument (self_copy ) and any additional arguments they may need, such as a path to NHS included data for UK.

Integration with testthat

As well as interactive tests the test() method is also used by testthat when conducting package level tests but with the argument all = TRUE. This is done in the file tests/testthat/custom_tests/test_regional_dataset.R