CRASH/COPA caveat #91

Open
wengraf opened this issue Feb 26, 2019 · 9 comments
Labels: help wanted (Extra attention is needed)


@wengraf
Contributor

wengraf commented Feb 26, 2019

Hi,

This is great stuff, but I'm concerned that someone unfamiliar with how STATS19 data are collected might draw incorrect conclusions, specifically about changes in Serious casualties over time and about spatial differences in Serious casualties in recent years.

The data now include records collected by methods other than the traditional paper forms:

  1. CRASH (a DfT promoted mobile app for police)
  2. COPA (a Met Police mobile system)
  3. Online public submissions

(http://roadsafetyanalysis.org/2017/09/2016-gb-casualty-data-released/)

Not all constabularies are on CRASH/COPA, but those that are will show rises in Serious casualties relative to previous years, partly because these apps force those entering data to record injuries more precisely. (Many people who ought to know what constitutes a "serious" injury don't.) This can easily be misread and lead to false conclusions.

As a first step, I'd suggest some sort of warning, either when the package loads or for results that include 2016+ data.
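For illustration, the second option (warning only when results include 2016+ data) could be sketched in R as below. This is a minimal sketch assuming a vector of crash dates is at hand; `warn_reporting_change()` and its `from_year` argument are hypothetical names, not part of stats19, and the 2016 cut-off simply follows the suggestion above rather than any force-specific switch date.

```r
# Hypothetical helper, NOT part of stats19: warn when crash dates include
# records from `from_year` onwards, when CRASH/COPA data start to appear.
warn_reporting_change <- function(dates, from_year = 2016) {
  years <- as.integer(format(as.Date(dates), "%Y"))
  if (any(years >= from_year, na.rm = TRUE)) {
    warning(
      "Data include ", from_year, "+ records. Some police forces now record ",
      "collisions with CRASH or COPA, which changed how 'Serious' injuries ",
      "are classified; compare Serious casualty counts across years or ",
      "forces with caution.",
      call. = FALSE
    )
  }
  invisible(dates)  # return input unchanged so the check can sit in a pipeline
}

# Usage: dates spanning 2016 trigger the warning; earlier dates do not
warn_reporting_change(c("2015-06-01", "2017-03-15"))
```

Returning the input invisibly means the check adds a caveat without altering the data, leaving any correction to the analyst.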

I'd also be happy to hunt down a list of police forces and when (or whether) they switched, so that you could add another field ("data_entry_type" or similar). One could then adjust Serious totals as appropriate to make analysis across space or time more robust.

Ivo

@layik
Member

layik commented Feb 26, 2019

Hello Ivo,

Thank you for opening the ticket. This is important and worth following up. I will break your post down into a few points as I understand it:

  1. A potential warning message, alongside the disclaimer already in place, relating to 2016+ data.
  2. Your kind offer to hunt down which forces are already on CRASH/COPA.
  3. An extra data_entry_type field in stats19::format.

I just want to say: I am not 100% sure whether the DfT releases different datasets according to collection method. This is something we need to clarify with the DfT, right at the source; otherwise (3) could turn out to be redundant.

The article linked in Ivo's post in turn links to this report: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/744077/reported-road-casualties-annual-report-2017.pdf

@layik
Member

layik commented Feb 26, 2019

Re (2): the full report does contain this:
[image from the report]

@Robinlovelace
Member

Robinlovelace commented Feb 26, 2019

I can think of the following ways to address this in the near term:

  • mentioning it in the docs
  • adding a new variable to police_boundaries with time of switch
  • potentially, adding something in the package load message

Sound like a plan? This should help reduce the chances of people arriving at false conclusions due to the different uptake times of CRASH. A PR adding an additional column to the existing police_boundary data, building on the var names shown below, would be greatly appreciated.

```r
names(stats19::police_boundaries)
#> [1] "pfa16cd"  "pfa16nm"  "geometry"
```

Created on 2019-02-26 by the reprex package (v0.2.1)
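A minimal sketch of what such a column might look like, assuming a per-force lookup of switch years becomes available. The force codes, names, and years below are placeholders, not real data; `reporting_system` and `switch_year` are hypothetical column names; and the geometry column is left out so the example runs without sf.

```r
# Stand-in for stats19::police_boundaries without the sf geometry column.
# Codes and names are PLACEHOLDERS, not real police force area codes.
boundaries <- data.frame(
  pfa16cd = c("PFA_A", "PFA_B", "PFA_C"),
  pfa16nm = c("Force A", "Force B", "Force C"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup of when each force switched from paper forms;
# forces missing from it (here PFA_C) are treated as not yet switched.
switch_lookup <- data.frame(
  pfa16cd = c("PFA_A", "PFA_B"),
  reporting_system = c("COPA", "CRASH"),
  switch_year = c(2016L, 2017L),  # placeholder years
  stringsAsFactors = FALSE
)

# Left join: keep every force, NA where no switch is recorded
boundaries <- merge(boundaries, switch_lookup, by = "pfa16cd", all.x = TRUE)

# Hypothetical crash records tagged with the force that reported them
crashes <- data.frame(
  pfa16cd = c("PFA_A", "PFA_B", "PFA_C"),
  year = c(2015L, 2018L, 2018L),
  stringsAsFactors = FALSE
)
crashes <- merge(crashes, boundaries[, c("pfa16cd", "switch_year")],
                 by = "pfa16cd", all.x = TRUE)

# TRUE when the record comes from a force that had already switched
crashes$post_switch <- !is.na(crashes$switch_year) &
  crashes$year >= crashes$switch_year
```

With the real sf object, the same left join should keep the geometry column (sf provides a `merge()` method). Flagging records as pre- or post-switch, rather than adjusting Serious counts directly, leaves the choice of correction to the analyst.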

My understanding of the switch is that it affects the serious/slight proportion but not the fatalities data. Is that correct? Any ideas how others are dealing with this?

In summary: definitely in favour of adding something on this. I had heard about it but knew little about it. Thanks for raising the issue.

@wengraf
Contributor Author

wengraf commented Feb 26, 2019

While the app-based systems are markedly superior in principle, there are transition issues: not every force has taken them up, and those that have may not use the same system, or even the same version of the same system. Serious injuries are now counted much more precisely, because the app asks about injury type, whereas the paper form required officers to remember the definition. The app-based methods should also give much more accurate crash locations, but processing so far has not handled the casualty home location and driver home location fields well. This should improve and the fixes should be backdated (the correct data are in the system; they just aren't being output at the moment).

Your plan sounds excellent, @Robinlovelace, and I can raise it informally at the next STATS19 review meeting at the DfT if that would help (i) clarify any issues and/or (ii) drum up further interest.

@wengraf
Contributor Author

wengraf commented Feb 26, 2019

My understanding is that data entry method is planned to appear as an additional field, especially since public-submitted data are likely to make this much more confusing soon.

@Robinlovelace
Member

Great to hear, Ivo. Note: we have talked to the DfT about this package and it has been informally tested by them (see #5). I believe anything documenting these issues, especially drawing on the expertise of the likes of Craig (do you know his GH handle? ;) ) and others at Agilysis, will go well beyond what is mentioned in the current default open access system! I look forward to seeing your input; if we can help in any way (e.g. extracting data from an impenetrable PDF), just ping me here.

@layik
Member

layik commented Dec 3, 2020

Are we closing this?

@Robinlovelace
Member

No, I think we need #176 and #178 resolved before closing this.

@Robinlovelace
Member

Cc @stholder3 FYI
