Assorted updates
Robinlovelace committed Jul 31, 2024
1 parent 62f9067 commit 3ef4ac4
Showing 6 changed files with 7 additions and 17 deletions.
2 changes: 1 addition & 1 deletion R/dl.R
@@ -89,7 +89,7 @@ dl_stats19 = function(year = NULL,
message("Data already exists in data_dir, not downloading")
}
} else {
-if (interactive() & !many_found) {
+if (interactive() && !many_found) {
if (ask) {
resp = readline(phrase())
} else {
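
Reviewer note: switching from `&` to `&&` matters in an `if()` condition because the two operators behave differently. The sketch below only illustrates base R semantics; `is_interactive` and `many_found` are made-up stand-in values, not the package's own state.

```r
# Stand-in values (hypothetical, not taken from dl_stats19())
is_interactive = FALSE
many_found = TRUE

# `&&` works on single values and short-circuits: the right-hand side
# is never evaluated when the left-hand side is already FALSE
short_circuit = is_interactive && stop("never evaluated")

# `&` is vectorised and evaluates both sides, so the same expression errors
vectorised = tryCatch(
  is_interactive & stop("evaluated"),
  error = function(e) "error"
)

# `&` returns one value per element, which is not what if() expects
elementwise = c(TRUE, FALSE) & c(TRUE, TRUE)
```

Since `interactive()` and `!many_found` are both single logical values, `&&` is the conventional choice for the condition.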
2 changes: 1 addition & 1 deletion R/format.R
@@ -86,7 +86,7 @@ format_stats19 = function(x, type) {
# See https://github.com/ropensci/stats19/issues/235#issuecomment-2254257770
matched_labels = lookup$label[match(x[[i]], lookup$code)]
x[[i]] = ifelse(is.na(matched_labels), x[[i]], matched_labels)
-x[[i]] = as(x[[i]], original_class)
+x[[i]] = methods::as(x[[i]], original_class)
}

date_in_names = "date" %in% names(x)
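
Reviewer note: `as()` lives in the **methods** package, so qualifying it as `methods::as()` means the call does not rely on **methods** being attached or imported. That is presumably the motivation here; the commit itself does not say so. The following is a minimal sketch of the relabel-then-restore pattern from this hunk. The `codes` vector and `lookup` table are invented for illustration and are not the package's data.

```r
# Invented example data (not from stats19): integer codes and a partial lookup
codes = c(1L, 2L, 99L)
original_class = class(codes)[1]
lookup = data.frame(
  code = c(1L, 2L),
  label = c("10", "20"),
  stringsAsFactors = FALSE
)

# Replace codes that have a label, keep the original value otherwise;
# mixing integers and labels makes the result a character vector
matched_labels = lookup$label[match(codes, lookup$code)]
relabelled = ifelse(is.na(matched_labels), codes, matched_labels)

# Coerce back to the class the column had before relabelling
restored = methods::as(relabelled, original_class)
```

Passing the class name as a string is what makes `as()` convenient here: the same line works whatever class the column started with.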
2 changes: 0 additions & 2 deletions R/read.R
@@ -185,8 +185,6 @@ convert_to_col_type = function(type) {
readr::col_guess())
}

-data("stats19_variables", package = "stats19")
-
# Create a named list of column types
unique_vars = unique(stats19_variables$variable)
unique_types = sapply(unique_vars, function(v) {
3 changes: 2 additions & 1 deletion data-raw/tests/test-format_vehicle_issue_235.qmd
@@ -43,4 +43,5 @@ table(cas$age_of_casualty)
# 8822 7197 5769 3722 2695 1908 1342 885 546 351 333
# 99 100 101 102 103 104
# 3356 13 7 4 1 1
-```
+```

10 changes: 1 addition & 9 deletions vignettes/blog.Rmd
@@ -109,15 +109,7 @@ There is a schema covering these tables but a good amount of work is needed to u

The annual statistics have not been released in a consistent way either, making it hard for people to download, or even find, the relevant files.
For example, there are separate files for each of the above tables for certain years (e.g. 2016, 2022) but not for all of 1979 - 2022 or 2018 now.
-The largest chunk is the 1979 - 2004 data, which is made available in a huge ZIP file ([link](https://data.dft.gov.uk/road-accidents-safety-data/Stats19-Data1979-2004.zip)).
-Unzipped this contains the following 3 files, which occupy almost 2 GB on your hard drive:
-
-```sh
-721M Apr 3 2013 Accidents7904.csv
-344M Apr 3 2013 Casualty7904.csv
-688M Apr 3 2013 Vehicles7904.csv
-# total 1.753 GB data
-```
+The largest chunk contains data from 1979.

#### Note

5 changes: 2 additions & 3 deletions vignettes/stats19.Rmd
@@ -23,7 +23,7 @@ knitr::opts_chunk$set(
## Introduction

**stats19** enables access to and processing of Great Britain's official road traffic casualty database, [STATS19](https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data).
-A description of variables in the database can be found in a [document](https://data.dft.gov.uk/road-accidents-safety-data/Brief-guide-to%20road-accidents-and-safety-data.doc) provided by the UK's Department for Transport (DfT).
+A description of variables in the database can be found in a [guidance](https://www.gov.uk/guidance/road-accident-and-safety-statistics-guidance) provided by the UK's Department for Transport (DfT).
The datasets are collectively called STATS19 after the form used to report them, which can be found [here](https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/995422/stats19.pdf).
This vignette focuses on how to use the **stats19** package to work with STATS19 data.

@@ -177,7 +177,7 @@ columns (variables) for
129,982 crashes.

This work was done by `read_collisions(format = FALSE)`, which imported the "raw" STATS19 data without cleaning messy column names or re-categorising the outputs.
-`format_collisions()` function automates the process of matching column names with variable names and labels in a [`.xls` file](https://data.dft.gov.uk/road-accidents-safety-data/variable%20lookup.xls) provided by the DfT.
+`format_collisions()` function automates the process of matching column names with variable names provided by the DfT.
This means `crashes_2022` is much more usable than `crashes_2022_raw`, as shown below, which shows some key variables in the messy and clean datasets:

```{r crashes2022-columns}
@@ -204,7 +204,6 @@ crashes_2022[random_n, key_vars]
## Format STATS19 data

It is also possible to import the "raw" data as provided by the DfT.
-A [`.xls` file](https://data.dft.gov.uk/road-accidents-safety-data/variable%20lookup.xls) provided by the DfT defines the column names for the datasets provided.
The packaged datasets `stats19_variables` and `stats19_schema` provide summary information about the contents of this data guide.
These contain the full variable names in the guide (`stats19_variables`) and a complete look up table relating integer values to the `.csv` files provided by the DfT and their labels (`stats19_schema`).
The first rows of each dataset are shown below: