Execute `bash master-build.sh` to build and start the containers. Then open the `pyspark_dataframe_jobs.ipynb` notebook from the Jupyter dashboard to review and run the data modelling samples.
The `pyspark_dataframe_jobs` notebook covers the most common PySpark DataFrame operations, along with saving that data to and loading it back from the HDFS filesystem.
Path to the Jupyter notebook with the data aggregation samples: `/JuPyter_HDFS_PySpark_datamodelling_samples/jupyter/workspace/pyspark_dataframe_jobs.ipynb`
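As a quick orientation before opening the notebook, the sketch below shows the kind of DataFrame operations and HDFS round trip the samples cover. It is a minimal, illustrative example, not code from the repo: the Spark master URL `spark://spark-master:7077` and the HDFS URI `hdfs://namenode:9000` are assumed service names for this kind of Docker Compose setup, so adjust them to match your containers.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumed service names/ports for this Docker setup -- verify against
# your docker-compose configuration before running.
SPARK_MASTER = "spark://spark-master:7077"                    # assumption
HDFS_PATH = "hdfs://namenode:9000/user/demo/people_parquet"   # assumption

spark = (
    SparkSession.builder
    .appName("dataframe-samples")
    .master(SPARK_MASTER)
    .getOrCreate()
)

# A small in-memory DataFrame to exercise common operations.
df = spark.createDataFrame(
    [("Alice", "NYC", 34), ("Bob", "NYC", 41), ("Carol", "LA", 29)],
    ["name", "city", "age"],
)

# Typical transformations: filter, derived column, group-by aggregation.
adults_by_city = (
    df.filter(F.col("age") >= 30)
      .withColumn("age_next_year", F.col("age") + 1)
      .groupBy("city")
      .agg(F.count("*").alias("n"), F.avg("age").alias("avg_age"))
)
adults_by_city.show()

# Save the result to HDFS as Parquet, then read it back to verify.
adults_by_city.write.mode("overwrite").parquet(HDFS_PATH)
restored = spark.read.parquet(HDFS_PATH)
restored.show()

spark.stop()
```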
Access the Hadoop UI at http://localhost:9870
Access the Spark Master UI at http://localhost:8080
Access the Jupyter UI at http://localhost:8888
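Once the containers are running, a quick way to confirm that all three UIs respond is a small probe like the one below. The ports are taken from the list above; the script itself is an illustrative helper, not part of the repo.

```python
import urllib.request

# Ports taken from the README above; checks that each UI answers HTTP.
for name, url in [
    ("Hadoop UI", "http://localhost:9870"),
    ("Spark Master UI", "http://localhost:8080"),
    ("Jupyter UI", "http://localhost:8888"),
]:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"{name}: HTTP {resp.status}")
    except OSError as exc:
        print(f"{name}: not reachable ({exc})")
```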
LinkedIn: [martin-karlsson][linkedin-url]
Twitter: @HelloKarlsson
Email: [email protected]
Webpage: www.martinkarlsson.io
Docker Image Project Link: github.com/martinkarlssonio/big-data-solution