Scio

Ecclesiastical Latin IPA: /ˈʃi.o/, [ˈʃiː.o], [ˈʃi.i̯o]

Verb: I can, know, understand, have knowledge.

Scio is a Scala API for Apache Beam and Google Cloud Dataflow inspired by Apache Spark and Scalding. See the current API documentation for more information.

Scio 0.3.0 and future versions depend on Apache Beam (org.apache.beam) while earlier versions depend on Google Cloud Dataflow SDK (com.google.cloud.dataflow). See this page for a list of breaking changes.

Features

Scala API close to that of Spark and Scalding core APIs
Unified batch and streaming programming model¹ ²
Fully managed service²
Integration with Google Cloud products: Cloud Storage, BigQuery, Pub/Sub, Datastore, Bigtable²
HDFS source/sink
Interactive mode with Scio REPL
Type safe BigQuery
Integration with Algebird and Breeze
Pipeline orchestration with Scala Futures
Distributed cache

¹ provided by Apache Beam

² provided by Google Cloud Dataflow

Quick Start

The ubiquitous word count example can be run directly with SBT in local mode, using README.md as input.

sbt "scio-examples/run-main com.spotify.scio.examples.WordCount --input=README.md --output=wc"
cat wc/part-00000-of-00001.txt
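
For orientation, a word count pipeline in Scio looks roughly like the sketch below. It is not copied from the scio-examples source: the object name and the tokenizing regex are illustrative assumptions, and it assumes scio-core is on the classpath. It also uses sc.close() to run the pipeline, which matches Scio releases of the 0.3.x era; check the API documentation for the version you use.

```scala
import com.spotify.scio._

// Hypothetical object name, chosen for illustration only.
object WordCountSketch {
  def main(cmdlineArgs: Array[String]): Unit = {
    // Parse --input and --output plus any runner options from the command line.
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    sc.textFile(args("input"))                            // read lines of text
      .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty)) // split lines into words
      .countByValue                                       // (word, count) pairs
      .map { case (word, count) => s"$word: $count" }     // format output lines
      .saveAsTextFile(args("output"))                     // write sharded text files

    // Submit the pipeline for execution.
    sc.close()
  }
}
```

The chain of flatMap, countByValue and map is meant to show how closely the API follows Spark and Scalding collection operations.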

Documentation

Scio Wiki - wiki page
ScalaDocs - current API documentation
Big Data Rosetta Code - comparison of code snippets in Scio, Scalding and Spark

Artifacts

Scio includes the following artifacts:

scio-core: core library
scio-test: test utilities, add to your project as a "test" dependency
scio-bigquery: add-on for BigQuery, included in scio-core but can also be used standalone
scio-bigtable: add-on for Bigtable
scio-extra: extra utilities for working with collections, Breeze, etc.
scio-hdfs: add-on for HDFS
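
As a sketch of how these artifacts might be added to an sbt build, the fragment below declares scio-core as a regular dependency and scio-test as a "test" dependency. The version shown is an assumption, taken from the 0.3.0 release mentioned above; substitute the release you actually want.

```scala
// build.sbt fragment; the version number is an assumption, not a recommendation.
val scioVersion = "0.3.0"

libraryDependencies ++= Seq(
  "com.spotify" %% "scio-core" % scioVersion,
  // Test utilities are only needed on the test classpath.
  "com.spotify" %% "scio-test" % scioVersion % "test"
)
```

The other add-ons, such as scio-bigtable or scio-hdfs, would be declared the same way when a pipeline needs them.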

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
