Integration with geopandas #588 #818

Closed
wants to merge 24 commits into from
Changes from 7 commits
Commits
d317c98
Merge remote-tracking branch 'altair-viz/master'
iliatimofeev Mar 27, 2018
e297f20
Merge remote-tracking branch 'altair-viz/master'
iliatimofeev Mar 27, 2018
ea65ca1
Merge remote-tracking branch 'altair-viz/master'
iliatimofeev May 4, 2018
f7d66b7
__geo_interface__ in to_geojson_values
iliatimofeev May 5, 2018
e9d28a5
GeoDataFrame support without dependency of GeoPandas
iliatimofeev May 5, 2018
985d3f6
Merge remote-tracking branch 'altair-viz/master' into it-#588-geopandas
iliatimofeev May 5, 2018
b627c6c
Unused test file
iliatimofeev May 6, 2018
2cd9fde
Full __geo_interface__ support, test_geojson
iliatimofeev May 7, 2018
7db9ff8
test update
iliatimofeev May 7, 2018
950eb72
Mistakenly added .vscode files removed
iliatimofeev May 8, 2018
9f91c00
limit_rows two returns, to_* one if "geo" statement, four spaces for…
iliatimofeev May 8, 2018
76a2af8
geojson_feature()
iliatimofeev May 9, 2018
094a2e7
test_geopandas_examples (hacker version)
iliatimofeev May 13, 2018
df84294
travis-ci: move finalized locals outside try
iliatimofeev May 13, 2018
27a3df8
remove python 3 code
iliatimofeev May 13, 2018
301ea23
flat version
iliatimofeev May 16, 2018
c114acc
flat version
iliatimofeev May 16, 2018
143ad04
Merge remote-tracking branch 'altair-viz/master' into it-#588-geopandas
iliatimofeev May 16, 2018
661447d
Merge remote-tracking branch 'altair-viz/master' into it-#588-geopandas
iliatimofeev May 16, 2018
d2b46e0
GeoPandas ref
iliatimofeev May 17, 2018
649fa21
Merge remote-tracking branch 'altair-viz/master' into it-#588-geopandas
iliatimofeev Jun 10, 2018
89a999e
Merge remote-tracking branch 'altair-viz/master' into it-#588-geopandas
iliatimofeev Jun 10, 2018
80c56d6
merge
iliatimofeev Jun 10, 2018
1bb8192
flake8 fix
iliatimofeev Jun 10, 2018
Binary file added .DS_Store
Binary file not shown.
1 change: 1 addition & 0 deletions .gitignore
@@ -62,6 +62,7 @@ target/

.ipynb_checkpoints
.idea/*
.vscode/*
tools/_build
Untitled*.ipynb
.mypy*
Binary file added altair/.DS_Store
Binary file not shown.
8 changes: 6 additions & 2 deletions altair/utils/core.py
@@ -227,9 +227,13 @@ def parse_shorthand(shorthand, data=None):
{'aggregate': 'count', 'type': 'quantitative'}
"""
attrs = _parse_shorthand(shorthand)
if isinstance(data, pd.DataFrame) and 'type' not in attrs:
if isinstance(data, pd.DataFrame):
if 'field' in attrs and attrs['field'] in data.columns:
attrs['type'] = infer_vegalite_type(data[attrs['field']])
if 'type' not in attrs:
attrs['type'] = infer_vegalite_type(data[attrs['field']])
if hasattr(data,'__geo_interface__'): #TODO: Add description in shorthand
attrs['field']='properties.'+attrs['field']

Collaborator

Something strange is going on here: is not checking for type in attrs necessary for this PR? And why is this block doubly indented?

Contributor Author
@iliatimofeev iliatimofeev May 8, 2018

The check 'type' not in attrs is still performed, just on the line below (232).
The idea is to let the user work with a GeoDataFrame just like a regular DataFrame, while showing the geoshapes attached to its rows.
Implementation details:
First of all, GeoDataFrame is a subclass of pd.DataFrame that defines __geo_interface__, so it is an instance of pd.DataFrame with a __geo_interface__ attribute. A GeoDataFrame is serialized as a GeoJSON FeatureCollection in which each row is a Feature object and all non-geometry column values are placed in its properties object. A sample is more informative:

{ /* Vega-Lite data */
    "format": {
        "property": "features", /* generate geoshape for each row*/
        "type": "json"
    },
    "values": { /* valid geojson for all rows*/
        "type": "FeatureCollection",
        "features": [
            { /* valid geojson for each row*/
                "type": "Feature",
                "properties": { /* column values  */
                    "pop_est": 12799293.0,
                    "continent": "Africa"
                },
                "geometry": { /* geometry of the row  */
                    "type": "MultiPolygon",
                    "coordinates": [ /* a lot of numbers*/]
                }
            }
        ]
    }
}

So the first step is to add "property": "features" to the Vega-Lite data format description. That splits the valid GeoJSON stored in "values" back into the rows of the GeoDataFrame. (It would be possible to skip this step by storing the content of "features" directly in "values", but then "values" would no longer be valid GeoJSON.)
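
The wrapping step described above can be sketched without GeoPandas installed. This is only an illustration under stated assumptions: FakeGeoFrame and wrap_geo_interface are hypothetical names, not part of Altair or GeoPandas, and the only thing assumed about the input is that it exposes __geo_interface__ returning a GeoJSON FeatureCollection, as a GeoDataFrame does.

```python
import json


class FakeGeoFrame:
    """Hypothetical stand-in for a GeoDataFrame: it only exposes __geo_interface__."""

    def __init__(self, rows):
        # rows: list of (properties dict, GeoJSON geometry dict) pairs
        self._rows = rows

    @property
    def __geo_interface__(self):
        # One Feature per row; column values go under "properties".
        return {
            "type": "FeatureCollection",
            "features": [
                {"type": "Feature", "properties": props, "geometry": geom}
                for props, geom in self._rows
            ],
        }


def wrap_geo_interface(data, feature="features"):
    """Build a Vega-Lite inline data dict from any object with __geo_interface__."""
    if not hasattr(data, "__geo_interface__"):
        raise TypeError(
            "Expected an object with __geo_interface__, got: {}".format(type(data))
        )
    return {
        # "values" stays a valid GeoJSON FeatureCollection ...
        "values": data.__geo_interface__,
        # ... and "property" tells Vega-Lite which member holds the rows.
        "format": {"type": "json", "property": feature},
    }


frame = FakeGeoFrame([
    ({"name": "a"}, {"type": "Point", "coordinates": [1.0, 0.0]}),
    ({"name": "b"}, {"type": "Point", "coordinates": [0.0, 1.0]}),
])
spec_data = wrap_geo_interface(frame)
print(json.dumps(spec_data["format"], sort_keys=True))
```

Keeping the whole FeatureCollection in "values" and pointing at "features" through the format is the trade-off mentioned in the parenthesis: the embedded data remains valid GeoJSON at the cost of one extra format key.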

The next step is access to the column values of the GeoDataFrame. From the chart, values are accessible as "properties.column_name". I hoped to simplify the user experience by adding "properties." during shorthand parsing. Now a user who uses shorthands gets the same behavior for a GeoDataFrame as for a DataFrame (take a look at the updated description of the PR).

Maybe I should check whether the field name already starts with "properties." to avoid doubling it?
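
One possible shape for the check raised in the question above, as a minimal sketch. The helper name geo_field is hypothetical and is not part of Altair; it only illustrates adding the prefix once.

```python
def geo_field(field, prefix="properties."):
    """Return the path of a column inside a GeoJSON Feature.

    The prefix is added only when it is not already present, so applying
    the helper twice gives the same result as applying it once.
    """
    if field.startswith(prefix):
        return field
    return prefix + field


print(geo_field("pop_est"))
print(geo_field(geo_field("pop_est")))
```

A caveat of this approach: a column whose real name starts with "properties." would be treated as already prefixed, so the check trades one ambiguity for another.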

Collaborator

OK, I see.

We can think about that. In the meantime can you fix the indentation? 😄

Contributor Author

indentation fixed

Contributor Author

Not needed anymore due to the new save format.

return attrs


29 changes: 26 additions & 3 deletions altair/utils/data.py
@@ -85,9 +85,17 @@ def to_json(data, prefix='altair-data'):
check_data_type(data)
ext = '.json'
filename = _compute_filename(prefix=prefix, ext=ext)
data_format = {'type': 'json'}
if isinstance(data, pd.DataFrame):
data = sanitize_dataframe(data)
data.to_json(filename, orient='records')
if not hasattr(data,'__geo_interface__'):
data.to_json(filename, orient='records')
else: #GeoPandas
with open(filename, 'w') as f:  # open for writing; json.dump fails on a read-only handle
json.dump(data.__geo_interface__, f)
data_format['property']='features'


elif isinstance(data, dict):
if 'values' not in data:
raise KeyError('values expected in data dict, but not present.')
@@ -96,9 +104,18 @@ def to_json(data, prefix='altair-data'):
json.dump(values, f)
return {
'url': filename,
'format': {'type': 'json'}
'format': data_format
}

@curry
def to_geojson_values(data, feature="features"):
if not hasattr(data, '__geo_interface__'):
raise TypeError('Expected GeoDataFrame or __geo_interface__, got: {}'.format(type(data)))
if isinstance(data, pd.DataFrame):
data = sanitize_dataframe(data)
return {
'values':data.__geo_interface__,
'format':{'type':'json','property':feature}
}

@curry
def to_csv(data, prefix='altair-data'):
@@ -123,6 +140,12 @@ def to_values(data):
check_data_type(data)
if isinstance(data, pd.DataFrame):
data = sanitize_dataframe(data)
if hasattr(data,'__geo_interface__'):#GeoPandas
return {
'values':data.__geo_interface__,
'format':{'type':'json','property':'features'}
}

return {'values': data.to_dict(orient='records')}
elif isinstance(data, dict):
if 'values' not in data:
12 changes: 11 additions & 1 deletion altair/utils/tests/test_data.py
@@ -2,7 +2,7 @@
import pandas as pd


from ..data import limit_rows, MaxRowsError, sample, pipe, to_values
from ..data import limit_rows, MaxRowsError, sample, pipe, to_values,to_geojson_values


def _create_dataframe(N):
@@ -63,3 +63,13 @@ def test_type_error():
for f in (sample, limit_rows, to_values):
with pytest.raises(TypeError):
pipe(0, f)


def test_to_geojson_values():
gpd = pytest.importorskip('geopandas')
geo_data = gpd.GeoDataFrame({ "name": ['a','b']},
geometry=[gpd.geoseries.Point((1.0, 0.0)),
gpd.geoseries.Point((0.0, 1.0))],index=['i','j'])
result = pipe(geo_data, to_geojson_values)
assert result=={'format': {'property': 'features', 'type': 'json'},
'values': geo_data.__geo_interface__}
5 changes: 3 additions & 2 deletions altair/vegalite/data.py
@@ -2,7 +2,7 @@
from toolz.curried import curry, pipe
from ..utils.core import sanitize_dataframe
from ..utils.data import (
MaxRowsError, limit_rows, sample, to_csv, to_json, to_values,
MaxRowsError, limit_rows, sample, to_csv, to_json, to_values, to_geojson_values,
check_data_type, DataTransformerRegistry
)

@@ -24,5 +24,6 @@ def default_data_transformer(data, max_rows=5000):
'to_csv',
'to_json',
'to_values',
'check_data_type'
'check_data_type',
'to_geojson_values'
)
2 changes: 1 addition & 1 deletion altair/vegalite/v2/__init__.py
@@ -17,7 +17,7 @@
from .data import (
MaxRowsError,
pipe, curry, limit_rows,
sample, to_json, to_csv, to_values,
sample, to_json, to_csv, to_values, to_geojson_values,
default_data_transformer,
data_transformers
)
5 changes: 3 additions & 2 deletions altair/vegalite/v2/data.py
@@ -1,5 +1,5 @@
from ..data import (MaxRowsError, curry, default_data_transformer, limit_rows,
pipe, sample, to_csv, to_json, to_values, DataTransformerRegistry)
pipe, sample, to_csv, to_json, to_values,to_geojson_values, DataTransformerRegistry)


# ==============================================================================
@@ -27,5 +27,6 @@
'to_csv',
'to_json',
'to_values',
'data_transformers'
'data_transformers',
'to_geojson_values'
)