Jan open data -- draft for now #1356
base: main
My thoughts from looking at the changes:
Specifically...these logs jump out to me:
Thanks for the pointers!
Hm, I did comment out the data-analyses/gtfs_funnel/vp_keep_usable.py Lines 157 to 162 in d59e058
data-analyses/rt_segment_speeds/segment_speed_utils/vp_transform.py Lines 19 to 24 in 43da8a2
It's not clear to me what the behavior of Yeah, makes sense to look at the logs. Some sort of log checking/comparison step might be cool to make it somewhat automatic (perhaps print something if there's a major deviation like this...). Ahh, ok, specifically the sorting was to guarantee that for each trip, timestamps should be increasing (sorting on
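A quick way to confirm the sorting guarantee mentioned above (timestamps increasing within each trip) could look like the sketch below. The column names `trip_id` and `timestamp` and the helper `trips_with_unsorted_timestamps` are placeholders for illustration, not names taken from the pipeline.

```python
import pandas as pd


def trips_with_unsorted_timestamps(vp, trip_col="trip_id", time_col="timestamp"):
    """Return the trips whose timestamps are not non-decreasing in row order.

    Column names are hypothetical; swap in whatever the vp table uses.
    """
    is_sorted = (
        vp.groupby(trip_col, sort=False)[time_col]
        .apply(lambda s: s.is_monotonic_increasing)
        .astype(bool)
    )
    return sorted(is_sorted[~is_sorted].index)


# Toy vehicle positions: trip "b" has one out-of-order timestamp.
vp = pd.DataFrame({
    "trip_id": ["a", "a", "a", "b", "b", "b"],
    "timestamp": pd.to_datetime([
        "2025-01-15 08:00:00", "2025-01-15 08:00:20", "2025-01-15 08:00:40",
        "2025-01-15 09:00:00", "2025-01-15 09:00:40", "2025-01-15 09:00:20",
    ]),
})

unsorted_before = trips_with_unsorted_timestamps(vp)
unsorted_after = trips_with_unsorted_timestamps(
    vp.sort_values(["trip_id", "timestamp"])
)
```

Running a check like this before and after the commented-out sort would show whether downstream steps are receiving out-of-order rows.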
These all look about the same to me so far: https://github.com/cal-itp/data-analyses/blob/jan-open-data/rt_segment_speeds/45_diff_tables.ipynb
Then I would move further down into Since the logs for the times in If it's not there, then I'd move backwards from there into other staged parquets to see.
This reverts commit a76a6f4.
1238973 to 521af62
Started looking back into this today. For stop_segments: 2,835,752 rows / 78,872 trips in Dec and 191,833 rows / 66,769 trips in Jan. For speedmap_segments: 3,348,200 rows / 79,571 trips in Dec and 3,074,234 rows / 79,946 trips in Jan. So I'm inclined to believe the trouble lies somewhere within these steps, and that they seem to be functioning about the same as December for speedmap segments but not for stop segments. Since both types of segments are concatenated to create the maps (with speedmap segments being the additional proxy stops), it makes sense that a ~93% drop in stop_segments trip speed rows would lead to lots of missing data on the maps.
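For reference, the month-over-month comparison of these counts can be sketched as a small check that flags large deviations. The counts are the ones quoted in this comment; the 25% threshold and the function names are assumptions made up for the example.

```python
def pct_change(before, after):
    """Percent change from `before` to `after` (negative means a drop)."""
    return (after - before) / before * 100


# Row and trip counts quoted in this comment.
counts = {
    "stop_segments": {
        "dec": {"rows": 2_835_752, "trips": 78_872},
        "jan": {"rows": 191_833, "trips": 66_769},
    },
    "speedmap_segments": {
        "dec": {"rows": 3_348_200, "trips": 79_571},
        "jan": {"rows": 3_074_234, "trips": 79_946},
    },
}


def flag_deviations(counts, threshold_pct=25.0):
    """List (table, metric, pct_change) where the change exceeds the threshold.

    The 25% default is an arbitrary illustration, not a pipeline setting.
    """
    flagged = []
    for table, months in counts.items():
        for metric in ("rows", "trips"):
            change = pct_change(months["dec"][metric], months["jan"][metric])
            if abs(change) > threshold_pct:
                flagged.append((table, metric, round(change, 1)))
    return flagged


flagged = flag_deviations(counts)
for table, metric, change in flagged:
    print(f"{table}: {metric} changed by {change}% between Dec and Jan")
```

With these numbers only the stop_segments row count is flagged, at roughly -93%. The stop_segments trip count drops about 15% and both speedmap_segments counts move by less than 10%.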
Narrowed it down to this merge The two tables join fine on
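One generic way to see where a merge loses rows is an outer join with `indicator=True`, run once per candidate key set. The sketch below uses toy tables and hypothetical column names (`trip_id`, `stop_sequence`); it is not the actual merge from the pipeline.

```python
import pandas as pd


def merge_report(left, right, on):
    """Count rows matched on both sides vs. present on only one side."""
    merged = left.merge(right, on=on, how="outer", indicator=True)
    return merged["_merge"].value_counts().to_dict()


# Toy tables with hypothetical columns, for illustration only.
left = pd.DataFrame({
    "trip_id": ["t1", "t1", "t2"],
    "stop_sequence": [1, 2, 1],
    "speed_mph": [12.0, 15.5, 9.8],
})
right = pd.DataFrame({
    "trip_id": ["t1", "t1", "t2"],
    "stop_sequence": [1, 2, 2],
    "segment_id": ["s1", "s2", "s3"],
})

report_one_key = merge_report(left, right, ["trip_id"])
report_two_keys = merge_report(left, right, ["trip_id", "stop_sequence"])
print(report_one_key)
print(report_two_keys)
```

In this toy case every row matches on `trip_id` alone, while adding `stop_sequence` leaves one row unmatched on each side. Comparing the `left_only` and `right_only` counts between Dec and Jan inputs would show which key column stopped lining up.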
think I've got it, will test the speeds pipeline tomorrow AM
Wasn't able to run the entire pipeline before heading out this week, tried running things from the top but got some new errors on the new image. @vevetron if you could make a "legacy power user" that would probably be handy, that way I could focus on this round of debugging and handle any upgrade-induced changes separately...
@edasmalchi Got it, I'll make a legacy power user as well.
Finished running the pipeline with some necessary tweaks but creating this as a draft to get some additional review before considering it done.
Speedmaps had a lot of missing data (and the site will revert back to December for now) but I don't know if this is due to:
I'll try redownloading vehicle positions and rerunning rt_segment_speeds later today, but @tiffanychu90 if anything I did here jumps out at you as a possible cause please let me know.
Otherwise I'll make a separate issue once I learn a little more.
Example of missing data for SFMTA, looked at a few others and the pattern seemed to be the same: