Notes from "Distributed architectures for big data processing and analytics"

DABDPA

This repository contains the notes collected from the lectures of "Distributed architectures for big data processing and analytics" (DABDPA), held in the Master's Degree in Data Science and Engineering (2022-2023) at Politecnico di Torino.

Initial setup

1. Install Quarto

Quarto is "an open-source scientific and technical publishing system built on Pandoc", mainly maintained by Posit (formerly named RStudio). Find more info here.

To use Quarto, find the installer/package at this link. Download and install the one compatible with your operating system.

To test whether Quarto is correctly installed, run the following command in the terminal:

quarto --version

2. Clone this repository

Install git and move to a target folder, then clone this repository to the local machine using the following command:

git clone https://github.com/Edoch94/DABDPA.git

3. Create the conda environment

Install Miniconda (see here for instructions and download), and create a new environment for this project, using the environment.yml configuration file as a "blueprint".

  • Move to the local folder where the DABDPA repository was cloned
  • Create the environment based on environment.yml
conda env create -f environment.yml

Alternatively, you can install the dependencies with pip, using the requirements.txt file:

pip install -r requirements.txt
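As a quick sanity check of the setup steps above, the following minimal sketch reports which of the tools named in this README (quarto, git, conda) are available on the PATH. The helper name check_tool is hypothetical and not part of this repository; if you chose the pip route, conda may legitimately show as missing.

```shell
#!/bin/sh
# check_tool NAME: report whether NAME is on the PATH.
# Hypothetical helper for illustration; not part of the repository.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools mentioned in the setup steps; keep going even if one is missing.
for tool in quarto git conda; do
    check_tool "$tool" || true
done
```

Any line starting with "missing:" points to the setup step that still needs to be completed.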

How to use

This repository is structured as a Quarto project. The configuration file of this project is ./qproject/_quarto.yml.

After performing changes on the Quarto files (.qmd files), the project has to be re-rendered to reflect the changes in the .html output files. To render the entire project, follow these steps:

  1. Move to the local folder where the DABDPA repository was cloned
  2. Activate the conda environment
conda activate DABDPA
  3. Launch the Quarto project render
quarto render qproject/
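The steps above can also be wrapped in a single shell function. This is only a sketch: it assumes the environment is named DABDPA (as in the activation command above) and uses conda run instead of conda activate, since activation often does not work inside non-interactive scripts. The function name render_project and the CONDA variable are hypothetical.

```shell
#!/bin/sh
# CONDA can be overridden, e.g. to point at a specific conda executable.
CONDA="${CONDA:-conda}"

# render_project [REPO_DIR]: render the Quarto project from the cloned
# repository (defaults to the current directory) inside the DABDPA env.
render_project() {
    # The subshell keeps the caller's working directory unchanged.
    ( cd "${1:-.}" && "$CONDA" run -n DABDPA quarto render qproject/ )
}
```

For example, render_project ~/DABDPA would render the project from a clone located in that (assumed) folder.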

Note that the output of the project is a set of .html files, saved in the ./qproject/output folder. The output type can be changed to Microsoft Word (.docx) or PDF (.pdf) in the format section of the ./qproject/_quarto.yml configuration file.
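For a one-off render in another format without editing the configuration file, quarto render also accepts a --to option. The sketch below wraps it in a hypothetical helper, render_to, limited to the three formats mentioned above. Whether this project renders cleanly to docx or pdf has not been verified here, and PDF output additionally requires a TeX installation.

```shell
#!/bin/sh
# QUARTO can be overridden to point at a specific quarto executable.
QUARTO="${QUARTO:-quarto}"

# render_to FORMAT: render the project as html, docx or pdf.
# Hypothetical helper for illustration; not part of the repository.
render_to() {
    case "$1" in
        html|docx|pdf) "$QUARTO" render qproject/ --to "$1" ;;
        *) echo "unsupported format: $1" >&2; return 2 ;;
    esac
}
```

For example, render_to docx runs quarto render qproject/ --to docx from the repository root.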
