This repository contains the notes collected from the lectures of "Distributed architectures for big data processing and analytics" (DABDPA), held in the Master's Degree in Data Science and Engineering (2022-2023) at Politecnico di Torino.
Quarto is "an open-source scientific and technical publishing system built on Pandoc", mainly maintained by Posit (formerly named RStudio). Find more info here.
To use Quarto, find the installer/package at this link. Download and install the one compatible with your operating system.
To test whether Quarto is correctly installed, run the following command in a terminal:
quarto --version
Install git, move to a target folder, and clone this repository to your local machine with the following command:
git clone https://github.com/Edoch94/DABDPA.git
Install Miniconda (see here for instructions and download), then create a new environment for this project, using the environment.yml configuration file as a "blueprint".
- Move to the local folder where the DABDPA repository was cloned
- Create the environment based on environment.yml
conda env create -f environment.yml
Alternatively, you can install the dependencies with pip, using the requirements.txt file:
pip install -r requirements.txt
This repository is structured as a Quarto project. The configuration file of this project is ./qproject/_quarto.yml.
After changing the Quarto files (.qmd files), the project must be re-rendered so that the .html output files reflect the changes. To render the entire project, follow these steps:
- Move to the local folder where the DABDPA repository was cloned
- Activate the conda environment
conda activate DABDPA
- Launch the Quarto project render
quarto render qproject/
Note that the output of the project is a series of .html files, saved in the ./qproject/output folder. The output type can be changed to Microsoft Word (.docx) or PDF (.pdf) in the format section of the ./qproject/_quarto.yml configuration file.
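As a minimal sketch of such a change, the fragment below shows what the format section might look like when switching to Word output. The actual contents of this repository's _quarto.yml are not reproduced here, so treat this as an illustration under that assumption: only the format key is shown, and every other key in the file should be left as it is.

```yaml
# Hypothetical fragment of ./qproject/_quarto.yml: only the format section is shown.
# Assumption: the file currently declares an html format, which is replaced here.
format:
  docx: default   # Microsoft Word output
  # pdf: default  # alternative: PDF output (requires a LaTeX distribution)
```

After editing the file, re-render the project with quarto render qproject/ as described above. PDF output needs a LaTeX installation; Quarto can install TinyTeX for this purpose with quarto install tinytex. Depending on the project type, other settings may also need adjusting.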