A parent class containing non-dataset specific methods.
Details
All data sets have shared methods for extracting geographic codes, downloading, processing, and returning data. These functions are contained within this parent class and so are accessible by all data sets which inherit from it. Individual data sets can override any functions or fields, provided they define a method with the same name, and can be extended with additional functionality. See the individual method documentation for further details.
See also
Data interface functions: CountryDataClass, get_available_datasets(), get_national_data(), get_regional_data(), initialise_dataclass()
Public fields
origin
the origin of the data source. For regional data sources this will usually be the name of the country.
data
Once initialised, a list of named data frames: raw (list of named raw data frames), clean (cleaned data) and processed (processed data). Data is accessed using $data.
supported_levels
A list of supported levels.
supported_region_names
A list of region names in order of level.
supported_region_codes
A list of region codes in order of level.
region_name
string. Name for the region column, e.g. 'region'. This field is filled at initialisation with the region name for the specified level (supported_region_names$level).
code_name
string. Name for the codes column, e.g. 'iso_3166_2'. Filled at initialisation with the code name associated with the requested level (supported_region_codes$level).
codes_lookup
string or tibble. Region codes for the target origin, filled by origin specific codes in set_region_codes().
data_urls
List of named common and shared url links to raw data. Prefers shared if there is a name conflict.
common_data_urls
List of named links to raw data that are common across levels. The first entry should be named main.
level_data_urls
List of named links to raw data that are level specific. Any URLs that share a name with a URL from common_data_urls will be selected preferentially. Each top level list should be named after a supported level.
source_data_cols
Existing columns within the raw data.
level
target region level. This field is filled at initialisation using user inputs or defaults in
$new()
data_name
string. The country name followed by the level. E.g. "Italy at level 1"
totals
Boolean. If TRUE, returns totalled data per region up to today's date. This field is filled at initialisation using user inputs or defaults in
$new()
localise
Boolean. Should region names be localised. This field is filled at initialisation using user inputs or defaults in
$new()
verbose
Boolean. Display information at various stages. This field is filled at initialisation using user inputs or defaults in
$new()
steps
Boolean. Keep data from each processing step. This field is filled at initialisation using user inputs or defaults in
$new()
target_regions
A character vector of regions to filter for. Used by the filter method.
process_fns
Array. Additional user-supplied functions to process the data.
filter_level
Character. The level of the data to filter at. Defaults to the target level.
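The relationship between common_data_urls, level_data_urls and data_urls described above can be pictured with a minimal base R sketch. It assumes that level specific entries simply replace common entries of the same name. The helper merge_data_urls() and the example.com URLs are hypothetical and are not part of the package.

```r
# Hypothetical helper (not a package function): combine the common URL list
# with the URL list for one level, letting level specific names win.
merge_data_urls <- function(common_data_urls, level_data_urls, level) {
  level_urls <- level_data_urls[[level]]
  if (is.null(level_urls)) {
    # No level specific URLs for this level: use the common ones unchanged.
    return(common_data_urls)
  }
  utils::modifyList(common_data_urls, level_urls)
}

# Placeholder URLs for illustration only.
common_data_urls <- list(
  main = "https://example.com/main.csv",
  lookup = "https://example.com/lookup.csv"
)
level_data_urls <- list(
  "1" = list(main = "https://example.com/level_1.csv"),
  "2" = list(extra = "https://example.com/level_2_extra.csv")
)

data_urls_level_1 <- merge_data_urls(common_data_urls, level_data_urls, "1")
data_urls_level_2 <- merge_data_urls(common_data_urls, level_data_urls, "2")
```

In this sketch the level 1 list keeps the common lookup entry but takes main from the level specific list, while the level 2 list keeps both common entries and gains extra.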
Methods
Method new()
Initialize function used by all DataClass objects. Sets up the DataClass class with attributes set to input parameters. Should only be called by a DataClass class object.
Usage
DataClass$new(
level = "1",
filter_level,
regions,
totals = FALSE,
localise = TRUE,
verbose = TRUE,
steps = FALSE,
get = FALSE,
process_fns
)
Arguments
level
A character string indicating the target administrative level of the data, with the default being "1". Currently supported options are level 1 ("1") and level 2 ("2").
filter_level
A character string indicating the level to filter at. Defaults to the level of the data if not specified and if not otherwise defined in the class. Use get_available_datasets() for supported options by dataset.
regions
A character vector of target regions to be assigned to the target_regions field if present.
totals
Logical, defaults to FALSE. If TRUE, returns totalled data per region up to today's date. If FALSE, returns the full dataset stratified by date and region.
localise
Logical, defaults to TRUE. Should region names be localised.
verbose
Logical, defaults to TRUE. Should verbose processing messages be displayed.
steps
Logical, defaults to FALSE. Should all processing and cleaning steps be kept and output in a list.
get
Logical, defaults to FALSE. Should the class get method be called (this will download, clean, and process data at initialisation).
process_fns
Array, additional functions to process the data. Users can supply their own functions here; these act on clean data and are called alongside the default processing functions. If process_fns is not set, the default optional function added is set_negative_values_to_zero (see the process_fns field for all defaults). To keep this default when supplying your own processing functions, remember to add it to your list as well. If you have created a processing function that others could benefit from, please submit a Pull Request to our GitHub repository and we will consider adding it to the package.
Method download()
Download raw data from data_urls and store a named list of the data_url name and the corresponding raw data table in data$raw.
Method download_JSON()
Download raw data from data_urls and store a named list of the data_url name and the corresponding raw data table in data$raw. Designed as a drop-in replacement for download so it can be used in sub-classes.
Method clean()
Cleans raw data (corrects format, converts column types, etc). Works on raw data and so should be called after download(). Calls the class specific cleaning method (clean_common) followed by the level specific cleaning methods, clean_level_[1/2]. Cleaned data is stored in data$clean.
Method clean_common()
Cleaning methods that are common across a class. By default this method is empty: any code required should be defined in a child class specific clean_common method.
Method available_regions()
Show regions that are available to be used for filtering operations. Can only be called once clean() has been called. Filtering level is determined by checking the filter_level field.
Method process()
Processes data by adding and calculating absent columns. Called on clean data (after clean()). Some countries may have data as new events (e.g. number of new cases for that day) whilst others have a running total up to that date. Processing calculates these based on what the data comes with via the functions region_dispatch() and process_internal(), which do the following:
Adds columns not present in the data: add_extra_na_cols()
Ensures there are no negative values: set_negative_values_to_zero()
Removes NA dates: fill_empty_dates_with_na()
Calculates cumulative data: complete_cumulative_columns()
Calculates missing columns from existing ones: calculate_columns_from_existing_data()
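The conversion between new events and running totals mentioned above can be sketched in base R as follows. This is an illustration of the idea only, for a single region with at least one row and no missing values. The helper complete_case_counts() and the column names cases_new and cases_total are assumptions for the example and do not reproduce the package's own functions.

```r
# Hypothetical helper: fill in whichever of cases_new / cases_total is absent.
complete_case_counts <- function(data) {
  # Running totals only make sense in date order.
  data <- data[order(data$date), , drop = FALSE]
  has_new <- "cases_new" %in% names(data)
  has_total <- "cases_total" %in% names(data)
  if (has_new && !has_total) {
    # Running total is the cumulative sum of the daily counts.
    data$cases_total <- cumsum(data$cases_new)
  } else if (has_total && !has_new) {
    # Daily count is the day-to-day difference; the first day keeps its total.
    data$cases_new <- c(data$cases_total[1], diff(data$cases_total))
  }
  data
}

dates <- as.Date("2021-01-01") + 0:2
from_new <- complete_case_counts(
  data.frame(date = dates, cases_new = c(2, 3, 4))
)
from_total <- complete_case_counts(
  data.frame(date = dates, cases_total = c(2, 5, 9))
)
```

Either input ends up with both columns, so downstream code can rely on a consistent set of columns regardless of how the source reports its data.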
Arguments
process_fns
Array, additional functions to process the data. Users can supply their own functions here; these act on clean data and are called alongside the default processing functions. If process_fns is not set, the default optional function added is set_negative_values_to_zero (see the process_fns field for all defaults).
Method get()
Get data related to the data class. This runs each distinct step in the workflow in order. Internally calls the download(), clean(), filter() and process() methods.
Method return()
Return data. Designed to be called after process(), this uses the steps argument to return either a list of all the data preserved at each step or just the processed data. For most datasets a custom method should not be needed.
Method summary()
Create a table of summary information for the data set being processed.
Method test()
Run tests on a country class instance. Calling test() on a class instance runs tests with the settings in use. For example, if you set level = "1" and localise = FALSE, the tests will be run on level 1 data which is not localised. Rather than downloading data for a test, users can provide a path to a snapshot file of data to test instead. Tests are run on a clone of the class. This method calls generic tests for all country class objects. It also calls country specific tests, which can be defined in an individual country class method called specific_tests(). The snapshots contain the first 1000 rows of data. For more details see the 'testing' vignette: vignette("testing").
Arguments
download
logical. To download the data (TRUE) or use a snapshot (FALSE). Defaults to FALSE.
snapshot_dir
character_array. The name of a directory to save the downloaded data to or read from. If not defined, a directory called 'snapshots' will be created in the temp directory. Snapshots are saved as rds files with the class name and level, e.g. Italy_level_1.rds.
all
logical. Run tests with all settings (TRUE) or with those defined in the current class instance (FALSE). Defaults to FALSE.
...
Additional parameters to pass to specific_tests.
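The snapshot naming and size described for test() can be sketched in base R as below. The helper snapshot_path() is hypothetical and the toy data frame stands in for downloaded data. The sketch only assumes what the text states: a 'snapshots' directory under the temp directory by default, file names built from the class name and level, and the first 1000 rows being kept.

```r
# Hypothetical helper: build a snapshot file path from class name and level.
snapshot_path <- function(class_name, level,
                          snapshot_dir = file.path(tempdir(), "snapshots")) {
  # Create the directory if it does not exist yet.
  dir.create(snapshot_dir, recursive = TRUE, showWarnings = FALSE)
  file.path(snapshot_dir, paste0(class_name, "_level_", level, ".rds"))
}

# Toy stand-in for downloaded raw data.
raw <- data.frame(id = seq_len(1500), value = seq_len(1500) * 2)

path <- snapshot_path("Italy", "1")
saveRDS(head(raw, 1000), path) # keep only the first 1000 rows
snapshot <- readRDS(path)
```

Reading the snapshot back with readRDS() gives a data frame that tests can run against without downloading anything.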