Demonstration of a working Spark application combined with HDFS and Jupyter notebook data transformation scripts

Thoughtful1/PySpark_HDFS_demo

Start

Run bash master-build.sh to build and start the containers. Then open the pyspark_dataframe_jobs.ipynb notebook from the Jupyter dashboard to inspect and run the sample data modelling operations.

In the 'pyspark_dataframe_jobs' notebook I cover the most popular PySpark DataFrame operations, as well as saving that data to HDFS and loading it back.

Path to Jupyter Notebook with data aggregation samples: /JuPyter_HDFS_PySpark_datamodelling_samples/jupyter/workspace/pyspark_dataframe_jobs.ipynb
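As a rough illustration of the kind of operation the notebook covers, here is a minimal sketch of an aggregation followed by a write to HDFS and a read back. It is not taken from the notebook: the helper names hdfs_uri and run_demo, the sample rows, and the namenode host and port 9000 are all assumptions, so adjust them to match your container setup.

```python
def hdfs_uri(path, host="namenode", port=9000):
    """Build an hdfs:// URI.

    The default host and port are assumptions, not values from this repo.
    """
    return f"hdfs://{host}:{port}/{path.lstrip('/')}"


def run_demo(spark, out_uri):
    """Aggregate a tiny DataFrame, write it to HDFS as Parquet, read it back."""
    # Imported lazily so this module loads even where PySpark is not installed.
    from pyspark.sql import functions as F

    # Hypothetical sample data: (category, price) rows.
    df = spark.createDataFrame(
        [("books", 12.0), ("books", 8.0), ("games", 30.0)],
        ["category", "price"],
    )

    # Group by category and sum the prices.
    totals = df.groupBy("category").agg(F.sum("price").alias("total_price"))

    # Overwrite any previous output at the target location, then reload it.
    totals.write.mode("overwrite").parquet(out_uri)
    return spark.read.parquet(out_uri)
```

In a notebook cell with an existing SparkSession named spark, something like run_demo(spark, hdfs_uri("/demo/totals")).show() should display the reloaded totals, provided the assumed namenode address is reachable from the Jupyter container.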

Hadoop

Access the Hadoop UI at http://localhost:9870

Spark

Access the Spark Master UI at http://localhost:8080

Jupyter

Access the Jupyter UI at http://localhost:8888
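After starting the containers, it can be handy to confirm that the Hadoop, Spark Master, and Jupyter UIs respond before opening them in a browser. The sketch below uses only the Python standard library and the three URLs listed in this README; the names SERVICES, is_up, and report are hypothetical helpers, not part of this repo.

```python
import http.client
import urllib.error
import urllib.request

# UI endpoints as listed in this README.
SERVICES = {
    "Hadoop": "http://localhost:9870",
    "Spark Master": "http://localhost:8080",
    "Jupyter": "http://localhost:8888",
}


def is_up(url, timeout=3.0):
    """Return True if an HTTP server answers at url within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


def report():
    """Map each service name to whether its UI currently responds."""
    return {name: is_up(url) for name, url in SERVICES.items()}
```

Calling report() from the host machine returns a dictionary of service names to booleans; a False entry usually means the corresponding container is still starting or its port is not published.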

Contact details for the creator of the Docker image. This repo is based on a Docker image configuration created by:

Martin Karlsson

LinkedIn: martin-karlsson
Twitter: @HelloKarlsson
Email: [email protected]
Webpage: www.martinkarlsson.io
Docker Image Project Link: github.com/martinkarlssonio/big-data-solution
