Execute `bash master-build.sh` to build and start the containers. Then open the `pyspark_dataframe_jobs.ipynb` notebook from the Jupyter dashboard to review and run the data modelling samples.
The `pyspark_dataframe_jobs` notebook covers the most common PySpark DataFrame operations, along with saving that data to and loading it back from the HDFS filesystem.
Path to the Jupyter notebook with the data aggregation samples: `/JuPyter_HDFS_PySpark_datamodelling_samples/jupyter/workspace/pyspark_dataframe_jobs.ipynb`
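As a quick orientation before opening the notebook, the sketch below shows the kind of DataFrame operations and HDFS round trip the samples cover. It is a minimal, illustrative example, not code from the repo: the Spark master URL `spark://spark-master:7077` and the HDFS URI `hdfs://namenode:9000` are assumed service names for this kind of Docker Compose setup, so adjust them to match your containers.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumed service names/ports for this Docker setup -- verify against
# your docker-compose configuration before running.
SPARK_MASTER = "spark://spark-master:7077"                    # assumption
HDFS_PATH = "hdfs://namenode:9000/user/demo/people_parquet"   # assumption

spark = (
    SparkSession.builder
    .appName("dataframe-samples")
    .master(SPARK_MASTER)
    .getOrCreate()
)

# A small in-memory DataFrame to exercise common operations.
df = spark.createDataFrame(
    [("Alice", "NYC", 34), ("Bob", "NYC", 41), ("Carol", "LA", 29)],
    ["name", "city", "age"],
)

# Typical transformations: filter, derived column, group-by aggregation.
adults_by_city = (
    df.filter(F.col("age") >= 30)
      .withColumn("age_next_year", F.col("age") + 1)
      .groupBy("city")
      .agg(F.count("*").alias("n"), F.avg("age").alias("avg_age"))
)
adults_by_city.show()

# Save the result to HDFS as Parquet, then read it back to verify.
adults_by_city.write.mode("overwrite").parquet(HDFS_PATH)
restored = spark.read.parquet(HDFS_PATH)
restored.show()

spark.stop()
```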
Access the Hadoop UI at http://localhost:9870
Access the Spark Master UI at http://localhost:8080
Access the Jupyter UI at http://localhost:8888
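Once the containers are running, a quick way to confirm that all three UIs respond is a small probe like the one below. The ports are taken from the list above; the script itself is an illustrative helper, not part of the repo.

```python
import urllib.request

# Ports taken from the README above; checks that each UI answers HTTP.
for name, url in [
    ("Hadoop UI", "http://localhost:9870"),
    ("Spark Master UI", "http://localhost:8080"),
    ("Jupyter UI", "http://localhost:8888"),
]:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"{name}: HTTP {resp.status}")
    except OSError as exc:
        print(f"{name}: not reachable ({exc})")
```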
LinkedIn: [martin-karlsson][linkedin-url]
Twitter: @HelloKarlsson
Email: [email protected]
Webpage: www.martinkarlsson.io
Docker Image Project Link: github.com/martinkarlssonio/big-data-solution