Data upload + massage #4
Comments
Any ideas on best language to use for this? I would think we would need one with some good libraries for converting to The source files on data.okc.gov are zipped
At some point in the process, we will need to have a "cleaner" that goes through and modifies the name of certain polygons and markers (terminology correct?). Some initial transformations that I see based on the data:
In the future, more items could be added to this process to pull in other data about schools, school districts, or whatever points of interest we see fit. cc: @joekarl @DevinClark @jagthedrummer @jvrousseau @makenova |
So kmz == zip file and inside is the kml file so no worries on that. A couple of options for consideration:
Option 1 requires extra moving parts (notably ogr2ogr), while option 2 looks like it would let us deal with streaming data better (i.e. larger data). |
I say that and there's probably a good way to do streaming handling of kml as well, just easier to write a CSV parser than a kml parser. |
@adamveld12 you basically have a good chunk of the CSV parsing done right? (see #1) |
KML to geojson converters are out there (https://github.com/mapbox/togeojson for example), but I know that we would still have to do a good amount of cleanup on the generated geojson. I think csv parsing would give us a bit more flexibility to massage the data during the conversion rather than after. |
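To make the "massage during the conversion" idea concrete, here is a rough Python sketch that reads CSV rows one at a time and emits GeoJSON features, cleaning each name on the way through. The column names (`name`, `lat`, `lon`), the `clean_name` rule, and the sample row are all assumptions for illustration; the actual data.okc.gov columns and the cleanup rules in this project are not shown in this thread, and the real data also includes polygons rather than just points.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator


def clean_name(raw: str) -> str:
    """Hypothetical cleanup rule: collapse whitespace and title-case the name."""
    return " ".join(raw.split()).title()


def rows_to_features(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one GeoJSON Feature per CSV row, cleaning each row as it streams by."""
    for row in csv.DictReader(lines):
        yield {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [longitude, latitude].
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {"name": clean_name(row["name"])},
        }


def to_feature_collection(lines: Iterable[str]) -> Dict:
    """Collect the streamed features into a single FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(rows_to_features(lines))}


# Made-up sample row, not real data from data.okc.gov.
SAMPLE = "name,lat,lon\n  EXAMPLE   ELEMENTARY ,35.5,-97.5\n"

if __name__ == "__main__":
    print(json.dumps(to_feature_collection(io.StringIO(SAMPLE)), indent=2))
```

Because `rows_to_features` is a generator, a large file can be passed in as an open file handle and processed row by row; only `to_feature_collection` holds everything in memory.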
@joekarl Yeah, I put the parser code in a gist and you can get it here. |
@adamveld12 nice thanks |
So the massage part of the data looks pretty straightforward: just convert the CSV to geojson and (maybe) convert the coordinate system if needed.
GH pages - Will allow us free hosting of the data, but will make updating the data a pain (either manually updating the repo, or scripts to update the repo programmatically, neither one fun).
S3 - Have to pay for S3 (though this is a super minuscule cost), but can update data in place programmatically.
Both are ultimately accessible via URL, so the frontend won't care where the data actually lives (it just needs a URL).
Just me, but I would prefer S3 as I trust their hosting a bit more than GH pages. Also, can you do SSL for custom domains with GH pages? //cc @gorsuch |
Naw. Unfortunately SSL on GitHub Pages can only be accomplished via *.github.io domains. If you want to use SSL + custom domains on S3, I believe you'd have to bring CloudFront into the mix w/ an SNI cert. That's pretty straightforward. |
Initial version of data ingest is working (see merged PR #7). It basically takes the CSV input, cleans up the school names, and dumps out geojson for the schools and school districts. There are new issues for cleaning up the data even more to make it a lot smaller (see #8 and #9) as well as adding in school to school district mapping (#10). |
Closing this issue since this part is complete |
** more to come **