Initial draft of JOSS paper #394

Merged
merged 17 commits into from
Oct 15, 2018
Empty file added paper/figures
Empty file.
76 changes: 76 additions & 0 deletions paper/paper.bib
@@ -0,0 +1,76 @@
@article{2011-d3,
title = {D3: Data-Driven Documents},
author = {Michael Bostock AND Vadim Ogievetsky AND Jeffrey Heer},
journal = {IEEE Trans. Visualization \& Comp. Graphics (Proc. InfoVis)},
year = {2011},
url = {http://idl.cs.washington.edu/papers/d3},
}

@article{2017-vega-lite,
title = {Vega-Lite: A Grammar of Interactive Graphics},
author = {Arvind Satyanarayan AND Dominik Moritz AND Kanit Wongsuphasawat AND Jeffrey Heer},
journal = {IEEE Trans. Visualization \& Comp. Graphics (Proc. InfoVis)},
year = {2017},
url = {http://idl.cs.washington.edu/papers/vega-lite},
}

@InProceedings{ mckinney-proc-scipy-2010,
author = { Wes McKinney },
title = { Data Structures for Statistical Computing in Python },
booktitle = { Proceedings of the 9th Python in Science Conference },
pages = {51--56},
year = { 2010 },
editor = { St\'efan van der Walt and Jarrod Millman }
}

@Manual{,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2013},
url = {http://www.R-project.org/},
}

@Article{Hunter:2007,
Author = {Hunter, J. D.},
Title = {Matplotlib: A 2D graphics environment},
Journal = {Computing In Science \& Engineering},
Volume = {9},
Number = {3},
Pages = {90--95},
abstract = {Matplotlib is a 2D graphics package used for Python
for application development, interactive scripting, and
publication-quality image generation across user
interfaces and operating systems.},
publisher = {IEEE COMPUTER SOC},
doi = {10.1109/MCSE.2007.55},
year = 2007
}

@Book{,
author = {Hadley Wickham},
title = {ggplot2: Elegant Graphics for Data Analysis},
publisher = {Springer-Verlag New York},
year = {2009},
isbn = {978-0-387-98140-6},
url = {http://ggplot2.org},
}

@book{Wilkinson:2005:GG:1088896,
author = {Wilkinson, Leland},
title = {The Grammar of Graphics (Statistics and Computing)},
year = {2005},
isbn = {0387245448},
publisher = {Springer-Verlag New York, Inc.},
address = {Secaucus, NJ, USA},
}

@article{2016-reactive-vega-architecture,
title = {Reactive Vega: A Streaming Dataflow Architecture for Declarative Interactive Visualization},
author = {Arvind Satyanarayan AND Ryan Russell AND Jane Hoffswell AND Jeffrey Heer},
journal = {IEEE Trans. Visualization \& Comp. Graphics (Proc. InfoVis)},
year = {2016},
url = {http://idl.cs.washington.edu/papers/reactive-vega-architecture},
}

72 changes: 72 additions & 0 deletions paper/paper.md
@@ -0,0 +1,72 @@
---
title: 'Altair: a declarative statistical visualization library for Python'
tags:
- Python
- visualization
- statistics
- Jupyter
authors:
- name: Jacob VanderPlas
orcid: 0000-0002-9623-3401
affiliation: 1
- name: Brian E. Granger
orcid: 0000-0002-5223-6168
affiliation: 2
- name: Jeffrey Heer
orcid: 0000-0002-6175-1655
affiliation: 1
- name: Eitan Lees
orcid: 0000-0003-0988-6015
affiliation: 4
- name: Dominik Moritz
orcid: 0000-0002-3110-1053
affiliation: 1
- name: Scott Sievert
orcid: 0000-0002-4275-3452
affiliation: 3
- name: Ben Welsh
orcid: 0000-0002-5200-7269
affiliation: 5
- name: Kanit Wongsuphasawat
orcid: 0000-0001-7231-279X
affiliation: 1
affiliations:
- name: University of Washington
index: 1
- name: California Polytechnic State University, San Luis Obispo
index: 2
- name: University of Wisconsin, Madison
index: 3
- name: Florida State University
index: 4
- name: Los Angeles Times Data Desk
index: 5
date: 01 June 2018
bibliography: paper.bib
---

# Summary

Altair is a statistical visualization library for Python. Statistical visualization is a constrained subset of data visualization focused on creating visualizations
that support statistical modeling. This constrained model is usually expressed in terms of a visualization grammar [] that specifies how input data is transformed and mapped to visual properties (position, color, size, etc.).

Altair is based on the Vega-Lite visualization grammar [], which allows a wide range of statistical
visualizations to be expressed using a small number of grammar primitives. Vega-Lite is declarative; visualizations are specified using JSON data that follows the Vega-Lite JSON schema.
As a Python library, Altair provides an API oriented towards scientists and data scientists
doing exploratory data analysis []. Altair's Python API emits Vega-Lite JSON data, which is then
rendered in a user interface such as the Jupyter Notebook, JupyterLab, or nteract using the
Vega-Lite JavaScript library []. Vega-Lite JSON is compiled to a full Vega specification [], which is then parsed and executed using a reactive runtime that internally makes use of D3.js [].
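
To make the idea of a declarative specification concrete, the following minimal sketch builds a Vega-Lite-style scatter plot specification as a plain Python dictionary and serializes it to JSON with the standard library. It does not use Altair's API: the helper `make_scatter_spec` and the sample records are hypothetical and exist only for illustration, and the `$schema` entry of a full specification is left out.

```python
import json


def make_scatter_spec(records, x_field, y_field):
    """Return a Vega-Lite-style scatter plot specification as a dict.

    Hypothetical helper for illustration only; it is not part of Altair.
    """
    return {
        # Inline data: a list of records, one dict per row.
        "data": {"values": records},
        # The mark is the geometric shape drawn for each record.
        "mark": "point",
        # The encoding maps data fields to visual properties.
        "encoding": {
            "x": {"field": x_field, "type": "quantitative"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


# Made-up sample records, assumed here purely for the example.
records = [
    {"horsepower": 130, "mpg": 18},
    {"horsepower": 165, "mpg": 15},
    {"horsepower": 95, "mpg": 24},
]

spec = make_scatter_spec(records, "horsepower", "mpg")
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The specification states what to draw (points, with two fields mapped to position) rather than how to draw it; rendering is left to whatever consumes the JSON.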

The declarative nature of the Vega-Lite visualization grammar, and its encoding in a formal
JSON schema, provide Altair with a number of benefits. First, much of the Altair Python code,
tests, and examples are autogenerated from the Vega-Lite JSON schema, ensuring strict conformance
remove "examples" here, because we no longer auto-generate them

with the Vega-Lite specification. Second, the JSON data produced by Altair and consumed by Vega-Lite provides a natural serialization and file format for statistical visualizations. This is leveraged by JupyterLab, which provides built-in rendering of these files. Third, the JSON data
provides a clean integration point for non-programming-based visualization user interfaces such as Voyager [].
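
The serialization point can be sketched in a few lines: a specification written to disk as JSON reads back unchanged, so the file itself is a complete description of the chart. This sketch uses only the Python standard library, not Altair; the file name `chart.vl.json` and the sample data are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

# A small Vega-Lite-style bar chart specification with made-up data.
spec = {
    "data": {
        "values": [
            {"category": "a", "count": 4},
            {"category": "b", "count": 7},
        ]
    },
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "count", "type": "quantitative"},
    },
}

with tempfile.TemporaryDirectory() as tmp:
    # The file name is illustrative only.
    path = Path(tmp) / "chart.vl.json"
    path.write_text(json.dumps(spec, indent=2))
    # Any tool that reads JSON can load the chart description back.
    restored = json.loads(path.read_text())

round_trip_ok = restored == spec
print(round_trip_ok)
```

Because the file contains nothing but JSON, it can be produced by one tool and consumed by another without either depending on Python.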

In addition to static documentation [], Altair includes a set of Jupyter Notebooks with examples
and an interactive tutorial. These notebooks can be run by anyone with only a web browser through
Binder [].

What example to show?

# References