-
Notifications
You must be signed in to change notification settings - Fork 18
/
Copy pathresponses1.Rmd
127 lines (85 loc) · 6.31 KB
/
responses1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
---
output: github_document
---
## Responses to review 1 of stats19
Thanks for the review.
We've had a chance, after making some changes and fixes to the package, to take-in and act on each of the comments.
The code-base has evolved substantially since the review, but the fundamental design of the package, with its 3 stage API mirroring workflows that happened before the package was developed, remains unchanged.
That is:
- `dl_stats19()` downloads files from the DfT. Good news: we have spoken to the relevant people at the Department for Transport and they assured us that the endpoints are stable. The function now uses `menu()` to provide a menu of download options for any year/type combinations and now finds files outside those explicitly mentioned in the file names.
E.g.:
```{r, eval=FALSE}
dl_stats19(year = 2022)
# Multiple matches. Which do you want to download?
#
# 1: dftRoadSafetyData_Vehicles_2022.zip
# 2: dftRoadSafetyData_Casualties_2022.zip
# 3: dftRoadSafetyData_Accidents_2022.zip
dl_stats19(year = 2022, type = "ac")
# Files identified: dftRoadSafetyData_Accidents_2022.zip
#
# Wanna do it (y = enter, n = esc)?
dl_stats19(year = 1985)
# Year not in range, changing to match 1979:2004 data
# This file is over 240 MB in size.
# Once unzipped it is over 1.8 GB.
# Files identified: Stats19-Data1979-2004.zip
#
# Wanna do it (y = enter, n = esc)?
```
- `read_*()` these functions remain unchanged, except the order of arguments has changed.
Like `dl_stats19()`, `year` is now the first argument, which is more intuitive.
- `format_*()` functions have been refactored. Each now uses `format_stats19()` behind the scenes reducing duplication.
The results are now better: more variables are now labelled.
We'll focus on areas flagged in the review for the rest of this response:
> I would tease a bit more of what's in these data sets. I wasn't entirely sure until I downloaded and opened the supporting documentation. If I were searching for this kind of data, and I didn't know what STATS19 was, I'd like to know I'm in the right place after scanning the README. Maybe a map?
We have added a map (well technically 9 maps!) and a couple of time series plots showing the scale of the data.
Also show a sample of the additional casualty and vehicle tables has been added to show more clearly the richness of data provided.
> I couldn't load the vignette from the console:
We also could not see the vignette when installing using `devtools::install_github(build_vignettes = TRUE`. But we can see the vignette if we install locally.
This was the code we ran:
```{r}
devtools::install(build_vignettes = TRUE)
vignette(package = "stats19")
```
> Several of the examples failed:
These have now been fixed - thanks for testing and reporting.
> I couldn't find any explicit contributing guidelines in the README, and there is no CONTRIBUTING document.
A CONTRIBUTING is added now. Thank you.
> The package has an obvious research application according to JOSS's definition
> There is no paper.md.
One is added with:
- A short summary describing the high-level functionality of the software
- Authors: A list of authors with their affiliations
- A statement of need clearly stating problems the software is designed to solve and its target audience.
- References: with DOIs for all those that have one (e.g. papers, datasets, software).
Review Comments
> A superb and essential package--we need this data and we need it in these formats. The download-format-read-explore workflow is intuitive and relatively frictionless. I have only some brief comments:
Thank you.
> I wonder you could possibly merge the formatting and reading step with a raw = TRUE or format = TRUE argument in the read_* functions. But perhaps that's my tendency towards abstraction. Something like ac = read_collisions(year = 2022, format = TRUE)
Done, appreciate your input.
> My personal preference would be to have the schema from dl_schema lazily loaded with the package.
DESCRIPTION: has the line LazyData which means stats19_schema is lazy loaded.
> According to the vignette, the dl_* functions are interactive, although the interactivity is commented out in the code. Will the interactivity be returning? Or does the vignette need to be updated?
Back in, as stated above.
> Out of curiosity, what's happening with https://github.com/cyipt/stats19? It was updated recently.
@mem48 answered this: cyipt/stats19 is not actually a proper R package. It is a repo containing scripts for CyIPT project, it has different sources (UK DS), and usage so there is no current need to adapt the use to this package. Malcolm is one of the contributors to this package.
> I confess I wish the package name was more expressive--stats19 sounds like an introductory statistics class.
This a reasonable point that we have thought of and discussed.
We are open minded about changing the name but, as with so many things, there are +s and -s (outlined for some options below):
- **stats19data**
- + clarifies that it's about data
- - longer, suffers from some of the same issues that **stats19** suffers from, the package is more about data formatting than data provision
- **roadcrashesUK**
- + explicit, makes region of data access transparent
- - there are other types of road crash data, also the data currently provided is technically for Great Britain, but **roadcrashesGB** doesn't work so well, and we may want to add data access options for Northern Ireland at some point also
- **roadSafetyData**
- + Matches DfT's [webpage](https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data) title on the topic
- - longer and, again, is less specific.
The main benefit we can see of changing the name would be making the package easier to find.
We think good documentation and clear description and some write-ups of the package and what it does could address these issues.
We've explored **stat19** name and it links directly to (and is almost synonymous with) road crash data.
See https://en.wikipedia.org/wiki/STATS19 for an excellent example (we plan to add this link to the README)
so the name is OK for we think, but we're open minded to alternative names mentioned above and perhaps names we've not thought of.
> This data will be used to make many maps. I personally would love a nudge in that direction in either the README or the vignette.
Definitely. Thank you very much for your input.