vehicle_type mapped incorrectly for 1979-2004 data #102

Closed
cmcaine opened this issue Aug 9, 2019 · 6 comments

@cmcaine
Contributor

cmcaine commented Aug 9, 2019

$ xsv select Vehicle_Type ~/.cache/R/stats19/data/Stats19-Data1979-2004/Vehicles7904.csv | awk '{A[$1]++}END{for(i in A)print i,A[i]}' | sort -n
-1 4563
Vehicle_Type 1
1 651744
2 183944
3 45262
10 8088
11 311221
16 991
17 5079
18 190
19 539803
20 17148
21 67219
90 99464
103 33317
104 722539
105 2456
106 98506
108 98193
109 7755736
110 35898
113 300607
# This is a large biased sample from the total 1979 dataset
s19_1979_2004_vehicles$vehicle_type %>% fraction_na
0.82

I suspect the package is not using the appropriate schema for these data. It is probably applying the more recent lookup table, whose codes lie in the range -1:99, rather than the old table, whose codes run from -1 to 113.

If this is the case, other variables are probably mapped incorrectly too. The correct schema ships as an XLS file inside the 1979_2004 zip, alongside the CSVs.
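
As a rough check on the lookup-range suspicion, the frequency table above can be used to work out what share of rows carry a code above 99. This is only a sketch: the counts are copied by hand from the xsv/awk output, the CSV header row is dropped, and the names `freq` and `share` are made up for illustration. It says nothing about how the package actually performs its mapping.

```shell
#!/bin/sh
# Frequency table copied from the xsv/awk output above: code, row count.
# The header row ("Vehicle_Type 1") is left out on purpose.
freq='-1 4563
1 651744
2 183944
3 45262
10 8088
11 311221
16 991
17 5079
18 190
19 539803
20 17148
21 67219
90 99464
103 33317
104 722539
105 2456
106 98506
108 98193
109 7755736
110 35898
113 300607'

# Share of rows whose code falls outside the -1:99 range of the newer lookup.
share=$(printf '%s\n' "$freq" | LC_ALL=C awk '
  { total += $2 }
  $1 < -1 || $1 > 99 { outside += $2 }
  END { printf "%.2f", outside / total }
')

echo "$share"
```

This prints 0.82, in line with the NA fraction reported above, which is what you would expect if only codes in -1:99 were being matched.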

@cmcaine
Contributor Author

cmcaine commented Aug 9, 2019

1979-2004 lookup:

code label
1 Pedal cycle
2 Motorcycle 50cc and under
3 Motorcycle 125cc and under
4 Motorcycle over 125cc and up to 500cc
5 Motorcycle over 500cc
8 Taxi/Private hire car
9 Car
10 Minibus (8 - 16 passenger seats)
11 Bus or coach (17 or more pass seats)
16 Ridden horse
17 Agricultural vehicle
18 Tram
19 Van / Goods 3.5 tonnes mgw or under
20 Goods over 3.5t. and under 7.5t
21 Goods 7.5 tonnes mgw and over
22 Mobility scooter
23 Electric motorcycle
90 Other vehicle
97 Motorcycle - unknown cc
98 Goods vehicle - unknown weight
-1 Data missing or out of range
103 Motorcycle - Scooter
104 Motorcycle
105 Motorcycle - Combination
106 Motorcycle over 125cc
108 Taxi (excluding private hire cars)
109 Car (including private hire cars)
110 Minibus/Motor caravan
113 Goods vehicle over 3.5 tonnes

But the frequency table above shows that several of these codes never appear in the data.
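
To make that concrete, here is a small sketch that lists the lookup codes with no rows in the frequency table. Both code lists are typed in by hand from the tables in this thread, and the variable names (`lookup_codes`, `observed_codes`, `unused`) are placeholders, not anything from the package.

```shell
#!/bin/sh
# Codes defined in the 1979-2004 lookup (copied from the table above).
lookup_codes='1 2 3 4 5 8 9 10 11 16 17 18 19 20 21 22 23 90 97 98 -1 103 104 105 106 108 109 110 113'

# Codes that actually occur in Vehicles7904.csv (from the frequency table).
observed_codes='-1 1 2 3 10 11 16 17 18 19 20 21 90 103 104 105 106 108 109 110 113'

# Keep the lookup codes that were never observed, in lookup order.
unused=$(awk -v seen="$observed_codes" -v all="$lookup_codes" 'BEGIN {
  n = split(seen, s, " ")
  for (i = 1; i <= n; i++) present[s[i]] = 1
  m = split(all, a, " ")
  out = ""
  for (i = 1; i <= m; i++)
    if (!(a[i] in present)) out = out (out == "" ? "" : " ") a[i]
  print out
}')

echo "$unused"
```

This prints `4 5 8 9 22 23 97 98`, so every unused code belongs to the part of the table shared with the 2005 onwards lookup.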

Table for 2005 onwards:

code label
1 Pedal cycle
2 Motorcycle 50cc and under
3 Motorcycle 125cc and under
4 Motorcycle over 125cc and up to 500cc
5 Motorcycle over 500cc
8 Taxi/Private hire car
9 Car
10 Minibus (8 - 16 passenger seats)
11 Bus or coach (17 or more pass seats)
16 Ridden horse
17 Agricultural vehicle
18 Tram
19 Van / Goods 3.5 tonnes mgw or under
20 Goods over 3.5t. and under 7.5t
21 Goods 7.5 tonnes mgw and over
22 Mobility scooter
23 Electric motorcycle
90 Other vehicle
97 Motorcycle - unknown cc
98 Goods vehicle - unknown weight
-1 Data missing or out of range
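
One way to picture the fix is to choose the lookup by era before labelling. The sketch below is not the package's code. It uses a hand-picked subset of the two tables above, `|`-separated temporary files, and a hypothetical helper called `label_codes` to show how codes such as 109 come out as NA when the 2005 onwards table is applied to 1979-2004 data.

```shell
#!/bin/sh
# Illustrative only: a subset of each lookup, labels copied from the tables above.
tmp=$(mktemp -d)

cat > "$tmp/lookup_1979_2004.txt" <<'EOF'
-1|Data missing or out of range
9|Car
104|Motorcycle
109|Car (including private hire cars)
EOF

cat > "$tmp/lookup_2005_onwards.txt" <<'EOF'
-1|Data missing or out of range
9|Car
EOF

# Hypothetical helper: $1 is a lookup file, stdin has one code per line.
# Codes missing from the lookup are labelled NA.
label_codes() {
  awk -F'|' '
    NR == FNR { label[$1] = $2; next }   # first input: load the lookup
    { print $1 "|" (($1 in label) ? label[$1] : "NA") }
  ' "$1" -
}

codes='109
104
9'

with_old=$(printf '%s\n' "$codes" | label_codes "$tmp/lookup_1979_2004.txt")
with_new=$(printf '%s\n' "$codes" | label_codes "$tmp/lookup_2005_onwards.txt")

echo "1979-2004 lookup:"; echo "$with_old"
echo "2005 onwards lookup:"; echo "$with_new"

rm -rf "$tmp"
```

With the older lookup all three codes get a label. With the newer one, 109 and 104 fall through to NA, which is the pattern the NA fraction above points to.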

@layik
Member

layik commented Aug 9, 2019

Hi Colin. Greetings from far away.

I was going to say the same thing under #101: one problem with this package is that we never processed or tested the 1979-2004 data, so any contribution on either ticket would be great. As Robin said, @mem48 had warned us about the grid issues, too. At this stage, I think even updating the docs would be a great help for users of the package.

@cmcaine
Contributor Author

cmcaine commented Aug 10, 2019

I figured that this bit was buggy because no one used it.

I'll submit some PRs in a bit :)

@Robinlovelace
Member

Robinlovelace commented Aug 20, 2019

Hey @cmcaine, belated thanks from me too for identifying this issue. Are you still planning to take a look? (I know you have other important priorities, so I'm happy to try to fix this.)

@cmcaine
Contributor Author

cmcaine commented Aug 20, 2019 via email

@layik
Member

layik commented Dec 3, 2020

I would have a go at this if I understood it 100%. Pinging @cmcaine in case he remembers the solution he had in mind at the time.
