Some police forces used alternative grid references for eastings and northings ~1979-1981 and 1986 #101
Comments
Well found @cmcaine. I recall that, at the request of @mem48, we added a warning saying that locations may not be accurate before 2005. It would be amazing if we could rectify the issues in the code. I think that analysing the crashes with clearly errant points could lead to a solution. One question: do the errors also affect the |
There's no longitude or latitude data at all until 1999, by which time the eastings and northings look accurate anyway. Some number of these will be transcription errors, but others (London) certainly look like systematic use of an alternative (or truncated) grid reference system, so I think there is a chance of fixing those. Another challenge is that the data for all accidents has fewer obviously missing areas: It seems unlikely that these places really had no serious or fatal collisions in a year. Perhaps police forces used a different system for recording more serious crashes in those areas? |
We had this problem with the cyipt project: some of the older data uses less precise grid references and lots of data ended up in the sea. The British National Grid has not changed since 1936, and I'm not aware of any regional grids in the UK. So I suspect there is some ad hoc truncation, where police have left out the initial few digits because they would always be the same in their area of interest. |
@cmcaine I've had a deeper dive and this does not seem to be a simple scaling problem. I can't figure out how the coordinates are supposed to map to the BNG. I suggest making an enquiry with the DfT; they may have some historical context that we are missing. |
Thanks for looking at it, Malcolm. Sent to DfT:
|
Just picking up the thread on this after getting back from holiday yesterday. I suspect there are some systematic errors that can be fixed, and likely some random errors that cannot. I think asking the DfT is a good plan (have you heard anything @cmcaine? can follow up if not) and, if they don't know either, would suggest a collaborative project aimed at doing an even deeper dive than @mem48 did to identify dodgy coordinates (please share analysis code you used for this if you have it). Quantifying (e.g. range, standard deviation) and plotting differences between expected (based on recent data) and recorded Easting and Northing distributions for each force/year combination in which dodgy coordinates are found should help at least identify the region/years in which there is a pattern to the error. |
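A minimal sketch of the comparison suggested above, in base R. The data frame and its column names (`force`, `year`, `easting`, `northing`) are made up for illustration and are not the real stats19 schema; 2010 stands in for "recent data" and the 2005 cut-off follows the warning mentioned earlier in the thread.

```r
# Toy data with hypothetical column names (force, year, easting, northing).
crashes = data.frame(
  force = rep(c("A", "B"), each = 4),
  year = rep(c(1979, 1979, 2010, 2010), times = 2),
  easting = c(53000, 52500, 530000, 525000, 410000, 412000, 411000, 413000),
  northing = c(18000, 18100, 180000, 181000, 290000, 291000, 290500, 291500)
)

# Reference location per force: median coordinates in a year assumed reliable.
recent = crashes[crashes$year == 2010, ]
ref = aggregate(cbind(easting, northing) ~ force, data = recent, FUN = median)
names(ref) = c("force", "ref_easting", "ref_northing")

# Difference between recorded and expected coordinates for the early years.
early = merge(crashes[crashes$year < 2005, ], ref, by = "force")
early$d_easting = early$easting - early$ref_easting
early$d_northing = early$northing - early$ref_northing

# Median and standard deviation of the differences per force/year.
offsets = aggregate(cbind(d_easting, d_northing) ~ force + year,
                    data = early, FUN = median)
spread = aggregate(cbind(d_easting, d_northing) ~ force + year,
                   data = early, FUN = sd)
```

A large median difference combined with a small standard deviation (force "A" in the toy data) would hint at a systematic error, whereas a large spread would look more like random miscoding.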
I have not received any response yet. Please feel free to follow up, I
don't have any contacts at the DfT. Contacting the metropolitan police or
the Mayor's Office for Policing and Crime[1] might be sensible, too.
It's pretty clear from the maps that it's mostly crime in London that is
systematically miscoded in the first two years.
[1]:
https://www.london.gov.uk/what-we-do/mayors-office-policing-and-crime-mopac/governance-and-decision-making/mopac-decisions--71
|
I couldn't find any pattern in the London data. It wasn't even roughly in the shape of London, so I think we are going to need expert help. |
I think it should be discoverable, though. 1981 was only 39 years ago. I'm sure the Met Police or MOPAC could reach out to some retired officers for us if they felt like it. I've sent a similar email to the ONS as well and attached a sample CSV of the eastings and northings. I attach here a zip of a CSV of all of the eastings and northings for 1979-1981 in case anyone wants to get the data without using R (mostly for the convenience of our external friends). Each observation includes the easting and northing, local authority district name, road class and road number. The exact text of the email sent to the ONS is:
|
Perhaps the coordinates just assume that they're on the OS map for their particular area of London. If there were enough different maps for different areas of London, then the combined points would not be consistently scaled and the shape of London would not be recognisable. |
That is possible. There may also have been a conversion error that scrambled the data. For example, BNG coordinates can be stored like this: TQ1234. If these had been converted to numbers incorrectly, they may have become garbled. |
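To illustrate the kind of conversion that could go wrong, here is a rough base R sketch of turning a lettered grid reference such as TQ1234 into numeric eastings and northings. The function name `bng_ref_to_en` is hypothetical and not part of stats19; it assumes a two-letter prefix followed by an even number of digits.

```r
# Hypothetical helper: convert a lettered BNG reference to numeric coordinates.
bng_ref_to_en = function(ref) {
  ref = toupper(gsub(" ", "", ref))
  alphabet = setdiff(LETTERS, "I") # the grid lettering skips "I"
  l1 = match(substr(ref, 1, 1), alphabet) - 1
  l2 = match(substr(ref, 2, 2), alphabet) - 1
  # Index of the 100 km square, counted from the grid's false origin.
  e100km = ((l1 - 2) %% 5) * 5 + (l2 %% 5)
  n100km = (19 - (l1 %/% 5) * 5) - (l2 %/% 5)
  digits = substr(ref, 3, nchar(ref))
  half = nchar(digits) / 2
  stopifnot(half == floor(half), half >= 1, half <= 5)
  # Pad the within-square digits out to metres.
  scale = 10^(5 - half)
  e = as.numeric(substr(digits, 1, half)) * scale
  n = as.numeric(substr(digits, half + 1, nchar(digits))) * scale
  c(easting = e100km * 1e5 + e, northing = n100km * 1e5 + n)
}

bng_ref_to_en("TQ1234")
#> easting northing
#>  512000   134000
```

If the letters were dropped and "1234" were read as a plain number, the 100 km square and the split between easting and northing digits would both be lost, which is one way the values could end up garbled.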
Just catching up with this. I think there is also a technical issue. Here is a reprex showing that all London accidents come back empty when using `format_sf`:
dd = "~/code/saferactive/ignored/"
acc7904 = stats19::get_stats19(1979, data_dir = dd)
#> No files of that type found for that year.
#> This will download 240 MB+ (1.8 GB unzipped).
#> Coordinates and other variables may be unreliable in these datasets.
#> See https://github.com/ropensci/stats19/issues/101 and https://github.com/ropensci/stats19/issues/102
#> Files identified: Stats19-Data1979-2004.zip
#> http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/Stats19-Data1979-2004.zip
#> Data already exists in data_dir, not downloading
#> Data saved at ~/code/saferactive/ignored//Stats19-Data1979-2004/Vehicles7904.csv~/code/saferactive/ignored//Stats19-Data1979-2004/Road-Accident-Safety-Data-Guide-1979-2004.xls~/code/saferactive/ignored//Stats19-Data1979-2004/Casualty7904.csv~/code/saferactive/ignored//Stats19-Data1979-2004/Accidents7904.csv
#> No files of that type found for that year.
#> This will download 240 MB+ (1.8 GB unzipped).
#> Coordinates and other variables may be unreliable in these datasets.
#> See https://github.com/ropensci/stats19/issues/101 and https://github.com/ropensci/stats19/issues/102
#> Reading in:
#> /home/layik/code/saferactive/ignored//Stats19-Data1979-2004/Accidents7904.csv
#> date and time columns present, creating formatted datetime column
# acc7904 = stats19::format_sf(acc7904, lonlat = TRUE)
l = acc7904[acc7904$local_authority_district == "London", ]
nrow(l)
#> [1] 638746
l = stats19::format_sf(l, lonlat = TRUE)
#> 638746 rows removed with no coordinates
#> Warning in min(cc[[1]], na.rm = TRUE): no non-missing arguments to min;
#> returning Inf
#> Warning in min(cc[[2]], na.rm = TRUE): no non-missing arguments to min;
#> returning Inf
#> Warning in max(cc[[1]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
#> Warning in max(cc[[2]], na.rm = TRUE): no non-missing arguments to max;
#> returning -Inf
nrow(l) == 0 # TRUE
#> [1] TRUE
Created on 2020-07-01 by the reprex package (v0.3.0) |
Just come to this - just terrible geo-coding, and no error checking at the time, not an alternative CRS. Unless you can match to main road name and work from that, just learn to live with it. I'd be wary of the idea of "correction" too.... |
I think different conventions were used in different forces. Confident there are ways to improve on the assumption of 'bog standard' 27700 (e.g. by dividing coords by 10) for some places, but not a priority! |
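One way to test the "divide by 10" idea is to try a few candidate divisors and see which one puts the most points inside a bounding box for the area. The sketch below uses made-up coordinates and a deliberately rough bounding box for Greater London; both are assumptions for illustration, not values taken from the data.

```r
# Rough, assumed bounding box for Greater London in EPSG:27700 (metres).
london_bbox = c(xmin = 500000, xmax = 565000, ymin = 155000, ymax = 205000)

in_bbox = function(e, n, bbox) {
  e >= bbox["xmin"] & e <= bbox["xmax"] & n >= bbox["ymin"] & n <= bbox["ymax"]
}

# Toy coordinates recorded ten times too large (hypothetical values).
e = c(5300000, 5250000, 5400000)
n = c(1800000, 1810000, 1750000)

# Share of points falling inside the box for each candidate divisor.
divisors = c(1, 10, 100)
share = sapply(divisors, function(d) mean(in_bbox(e / d, n / d, london_bbox)))
names(share) = divisors
share
best = names(share)[which.max(share)]
```

A divisor that recovers most points for one force and year would be worth a closer look, but a high share alone does not prove the rescaled locations are correct.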
A lot would have been found on an old A-Z, then roughly guessed on a paper Landranger, with only the vaguest idea about eastings and northings. Stats19 has duff fields, at points in time, that’s something people have to just come to terms with. |
Some will have been filled with meaningless numbers, like 0,0, just so it passed the check for a filled in field. |
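Placeholder values like these can at least be flagged before any analysis. The following is a small base R sketch; the function name `flag_suspect` and the category labels are hypothetical, and the limits used are the nominal 700 km by 1300 km extent of the British National Grid.

```r
# Hypothetical helper: label coordinates that cannot be real locations.
flag_suspect = function(easting, northing) {
  missing = is.na(easting) | is.na(northing)
  zero = !missing & easting == 0 & northing == 0
  outside = !missing &
    (easting < 0 | easting > 700000 | northing < 0 | northing > 1300000)
  ifelse(missing, "missing",
    ifelse(zero, "placeholder_zero",
      ifelse(outside, "outside_grid", "plausible")))
}

e = c(NA, 0, 530000, 9999999)
n = c(NA, 0, 180000, 180000)
flags = flag_suspect(e, n)
table(flags)
```

A "plausible" label only means the point lies on the grid; as the thread shows, such points can still be in the wrong place.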
Good point Ivo. |
Can we also close this? As we cannot offer any useful solutions to the issue. Use of road names etc are all outside the main issue. I say we close it. |
I think we can close this. We've raised the issue and we even give the user a message telling them to watch out. Good suggestion, thanks @layik.
stats19::get_stats19(year = 1979)
#> No files of that type found for that year.
#> This will download 240 MB+ (1.8 GB unzipped).
#> Coordinates and other variables may be unreliable in these datasets.
#> See https://github.com/ropensci/stats19/issues/101 and https://github.com/ropensci/stats19/issues/102
#> Files identified: Stats19-Data1979-2004.zip
#> http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/Stats19-Data1979-2004.zip
#> Data already exists in data_dir, not downloading
#> Data saved at ~/stats19-data/Stats19-Data1979-2004/Vehicles7904.csv~/stats19-data/Stats19-Data1979-2004/Road-Accident-Safety-Data-Guide-1979-2004.xls~/stats19-data/Stats19-Data1979-2004/Casualty7904.csv~/stats19-data/Stats19-Data1979-2004/Accidents7904.csv
#> No files of that type found for that year.
#> This will download 240 MB+ (1.8 GB unzipped).
#> Coordinates and other variables may be unreliable in these datasets.
#> See https://github.com/ropensci/stats19/issues/101 and https://github.com/ropensci/stats19/issues/102
#> Reading in:
#> /home/robin/stats19-data/Stats19-Data1979-2004/Accidents7904.csv
#> date and time columns present, creating formatted datetime column
#> # A tibble: 6,224,198 x 33
#> accident_index location_eastin… location_northi… longitude latitude
#> <chr> <int> <int> <dbl> <dbl>
#> 1 197901A11AD14 NA NA NA NA
#> 2 197901A1BAW34 198460 894000 NA NA
#> 3 197901A1BFD77 406380 307000 NA NA
#> 4 197901A1BGC20 281680 440000 NA NA
#> 5 197901A1BGF95 153960 795000 NA NA
#> 6 197901A1CBC96 300370 146000 NA NA
#> 7 197901A1DAK71 143370 951000 NA NA
#> 8 197901A1DAP95 471960 845000 NA NA
#> 9 197901A1EAC32 323880 632000 NA NA
#> 10 197901A1FBK75 136380 245000 NA NA
#> # … with 6,224,188 more rows, and 28 more variables: police_force <chr>,
#> # accident_severity <chr>, number_of_vehicles <int>,
#> # number_of_casualties <int>, date <date>, day_of_week <chr>, time <chr>,
#> # local_authority_district <chr>, local_authority_highway <chr>,
#> # first_road_class <chr>, first_road_number <int>, road_type <chr>,
#> # speed_limit <int>, junction_detail <chr>, junction_control <chr>,
#> # second_road_class <chr>, second_road_number <int>,
#> # pedestrian_crossing_human_control <chr>,
#> # pedestrian_crossing_physical_facilities <chr>, light_conditions <chr>,
#> # weather_conditions <chr>, road_surface_conditions <chr>,
#> # special_conditions_at_site <chr>, carriageway_hazards <chr>,
#> # urban_or_rural_area <chr>,
#> # did_police_officer_attend_scene_of_accident <int>,
#> #   lsoa_of_accident_location <chr>, datetime <dttm>
Created on 2020-12-03 by the reprex package (v0.3.0) |
I've looked back at my earlier comments, and perhaps age and fatherhood has mellowed me since then...is there any mileage to be had with reverse geocoding and LA polygon/road/secondary road/junction type etc? Perhaps it is an interesting undergrad or MSc project? |
E.g. Hounslow isn't really in the sea to the west of Glasgow.
The London Boroughs and some other geographical areas did this. If we can find out what system they were using we could fix this.
1986 looks like a CRS issue too, but I don't really know what's going on.
We can also observe a lot of other errors with the early geocoding in this sequence of images:
Source code: