Spatial resolution of STATS19 geographic data #203
Hi @agila5, good question, and sorry for the slow response. Yes, this information was provided by @WillSERP. I've never seen a quantitative assessment of the accuracy of STATS19 data, but I know that it has improved over time. That could be a worthwhile thing to do at some point. In the meantime, @wengraf, do you have any comments on this? Cheers!
I inserted links but they seem to have disappeared! Here is the URL text:
Hi @WillSERP, thanks for this, but I meant the accuracy of the data, not its precision: the coordinates could be reported to the nearest cm, but that doesn't mean the crash actually happened there. Looking back at the comments from Andrea, I'm not 100% sure he means measurement accuracy, which can be defined as the distance from the recorded location within which 95% of actual crashes happened (that's what I meant by the 10 m value). A statistician may have a better definition of accuracy, cc @agila5. I think you're talking about measurement precision, @WillSERP. I'm not actually sure how the latest location info is added: is it with a GPS? If so, I think 10 m is still about right, accounting for the inherent ~5 m accuracy of the device plus another ~5 m of uncertainty (was the person standing exactly where the crash happened?). According to this article, they seem to define accuracy as:
In any case, I'm sure there is room for improvement in how we talk about these things in the package documentation, so this is a useful issue. Many thanks!
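To make the accuracy definition above concrete (the distance from the recorded location within which 95% of actual crashes happened), here is a minimal Python sketch. It assumes, hypothetically, that we know the true location of each crash, which in practice we usually don't. It also assumes both recorded and true points are British National Grid eastings/northings in metres, so plain Euclidean distance is appropriate. The function names and the sample points are made up for illustration. The nearest-rank percentile is only one of several reasonable percentile definitions.

```python
import math

def positional_errors(recorded, true):
    """Distances in metres between recorded and true crash locations.

    Assumes both inputs are sequences of (easting, northing) pairs in
    British National Grid metres, so Euclidean distance is appropriate.
    """
    return [math.dist(r, t) for r, t in zip(recorded, true)]

def accuracy_95(errors):
    """Empirical 95th percentile (nearest-rank method) of positional errors:
    the distance within which at least 95% of recorded points lie."""
    if not errors:
        raise ValueError("need at least one error value")
    ranked = sorted(errors)
    k = math.ceil(95 * len(ranked) / 100)  # 1-based nearest rank
    return ranked[k - 1]

# Hypothetical example: 20 crashes whose true locations are known.
true_pts = [(530000.0 + 100 * i, 180000.0) for i in range(20)]
# Recorded points offset northwards by 1 m, 2 m, ..., 20 m (made-up numbers).
recorded_pts = [(e, n + (i + 1)) for i, (e, n) in enumerate(true_pts)]

errs = positional_errors(recorded_pts, true_pts)
print(accuracy_95(errs))  # 19.0
```

Note that nothing in this calculation depends on how many decimal places the coordinates are stored with: that is precision, which is a separate question.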
In that case the ~10 m accuracy is about right, but only for police force areas where secondary data quality checks are carried out. This involves checking the grid refs on a road map to see if the plotted location matches the descriptive text and the fields denoting junction and road types. When these checks are performed, we can usually be confident that the plotted location is within 10 m of the true location. I can't speak for areas where these checks are not performed, except that I would expect more outliers with low accuracy. The use of handheld devices to input data is improving accuracy, but it requires police forces to adopt the technology (not all have done so) and police officers to be trained to use it properly (i.e. to plot the location of the collision, not the McDonald's drive-thru where they do the paperwork).
Hi @Robinlovelace, @agila5 and @WillSERP: this came up a bit in the STATS19 review (out now! https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1001195/stats-19-review-final-report.pdf) and in some of the discussions I've had about all lane running. I take both points about accuracy and precision, but there is another question too, namely what that point (however precise or accurate it might be) really represents. The point of initial impact? Where the vehicle comes to rest? Some approximate midpoint? Especially for high-speed collisions, I don't think crashes happen at single points in space; motorway ones are surely more accurately represented as polygons. So, at best, the location is written out to 1 m (six-figure grid ref), taken from either a consumer-grade GPS or some reckoning from a map by hand. That can only be enough to attempt to attribute administrative responsibility (which LA, which road authority, which police force, etc.) and basic circumstances (T-junction, rural/urban, etc.). So, basically, whatever the accuracy or precision, it is all a bit of false precision in a sense.
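The "false precision" point can be quantified with a short sketch. Here is a minimal Python example, assuming coordinates are stored as whole-metre British National Grid eastings/northings (the 1 m resolution mentioned above). The worst-case rounding error is then half the diagonal of a 1 m grid cell, about 0.71 m. The helper names and the sample coordinates are hypothetical.

```python
import math

def snap_to_grid(easting, northing, cell_m=1.0):
    """Round a British National Grid coordinate (metres) to a grid of cell_m metres.

    With cell_m=1.0 this mimics storing eastings/northings as whole metres.
    Python's round() sends exact .5 ties to the nearest even number.
    """
    return (round(easting / cell_m) * cell_m, round(northing / cell_m) * cell_m)

def max_rounding_error(cell_m=1.0):
    """Worst-case distance between a point and its rounded position:
    half the diagonal of one grid cell."""
    return math.hypot(cell_m / 2, cell_m / 2)

# Hypothetical crash location in BNG metres (made-up numbers).
print(snap_to_grid(530123.4, 180456.6))   # (530123.0, 180457.0)
print(round(max_rounding_error(1.0), 2))  # 0.71
```

Storing to 1 m therefore contributes less than 1 m of error, an order of magnitude below the ~10 m accuracy discussed in this thread. The limiting factors are how the point was captured and what it is meant to represent, not how many digits are written down.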
Excellent point as always, @wengraf. When amending inaccurate locations, we go by the first point of impact or, where there was a loss of control preceding impact, the point at which the loss of control occurred. This reflects a focus on the needs of engineers to identify aspects of the highway that could affect collision risk. For example, when I was 26 and foolish, I put my car on its side on the nearside verge of a country lane about 20 m after the exit of a corner. There was nothing wrong with that verge, but on the entrance to the corner there was a lack of kerbing that allowed soil to wash into the road. Combined with my poor driving and poor tyres, this led to a loss of control around the apex. By all accounts this was a popular place to lose control, and kerbing was installed on the verge to help idiots like me keep the rubber side down.
I have no faith whatsoever that this guidance is followed anywhere close to consistently enough for this to be a help when using the data!
Location is what it is, and it is good enough for the vast majority of the location-based analyses anyone might want to do. That's all you can say, really.
😆
I think there is enough great info in this thread to act on: we should mention these points and add the links provided by Ivo and Will. I know it may be a bit late and overkill for your needs (for the academic paper, right @agila5?), but we should incorporate this info somehow into the package documentation.
Hi everyone and good morning. First of all, thank you very much for your comments.
If you want, I can create a PR to add these points to the README sometime in August.
Yes, but I think it doesn't matter 😅
Yes please 🙏 I will assign you, but if you get bogged down with other stuff, don't worry: just let us know and I can give it a go, no problem. It's great to see how open source projects can effectively crowdsource information. Thanks everyone!
Hi everyone, quick question. The README says (stats19/README.Rmd, lines 180 to 181 in 535673a):
Do you remember where (e.g. a DfT document or something similar) it's written that STATS19 data have ~10 m resolution?