You can find the ER diagram and the description of each model in the [datamodel] folder.
A diagram image, infrastructure_diagram.svg, is also provided in the root folder. The recommended architecture rests on the following assumptions:
- The main data source is a PostgreSQL database, which acts as the application's backend transactional database.
- The extraction and loading job is written in Python (a sketch follows this list).
- Our data warehouse is BigQuery.
- Within BigQuery we perform the data modeling and transformations using dbt.
- The presentation layer feeds Looker and other ML applications/notebooks, and a reverse ETL process loads the analytics back into the source database, so the results are available to the couriers within the app.
- The Python scripts and dbt runs are orchestrated with Mage (see the second sketch below).
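
As a rough illustration, the extraction-and-loading step could look like the sketch below, assuming `psycopg2` and `google-cloud-bigquery` as client libraries. The connection string, dataset, table, and column names (`deliveries`, `courier_id`, etc.) are illustrative assumptions, not the actual schema.

```python
# Minimal EL sketch: incrementally copy new rows from PostgreSQL into a raw
# BigQuery dataset. All names and credentials below are illustrative assumptions.
import psycopg2
from google.cloud import bigquery

def extract_and_load(last_loaded_at: str) -> int:
    # Extract only rows created since the previous run (incremental load).
    pg_conn = psycopg2.connect("postgresql://user:password@host:5432/app_db")
    with pg_conn, pg_conn.cursor() as cur:
        cur.execute(
            "SELECT id, courier_id, status, created_at::text AS created_at "
            "FROM deliveries WHERE created_at > %s",
            (last_loaded_at,),
        )
        columns = [desc[0] for desc in cur.description]
        rows = [dict(zip(columns, row)) for row in cur.fetchall()]

    if not rows:
        return 0

    # Stream the batch into the raw dataset; the dbt models build on top of it.
    bq_client = bigquery.Client()
    errors = bq_client.insert_rows_json("my-project.raw.deliveries", rows)
    if errors:
        raise RuntimeError(f"BigQuery load failed: {errors}")
    return len(rows)
```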
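Mage also supports dbt blocks natively; purely as a sketch, a custom transformer block that triggers the dbt run after loading could shell out to the dbt CLI, as below. The decorator-import pattern follows Mage's generated block scaffolding, but the project path and runtime details are assumptions.

```python
# Sketch of a Mage transformer block that triggers the dbt transformations
# after the load step. Mage has native dbt block support; shelling out to the
# dbt CLI here is an illustrative assumption to keep the example self-contained.
import subprocess

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer

@transformer
def run_dbt_models(data, *args, **kwargs):
    # Run the dbt project against BigQuery; check=True fails the pipeline
    # block on a non-zero exit code.
    subprocess.run(
        ["dbt", "run", "--project-dir", "/path/to/dbt_project"],
        check=True,
    )
    return data  # pass the upstream payload through unchanged
```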
Complexities and obstacles:
- Translating business needs into metric definitions requires extra care so the metrics stay as relevant as possible for the company (e.g. what exactly counts as an active user? see the example after this list).
- Data storage and transformation performance is crucial: aggregations over millions of rows need proper partitioning/clustering in BigQuery (or indexing in PostgreSQL) and other optimization techniques (a DDL sketch follows).
- How much latency is acceptable for each metric? Which metrics should be real-time (or near real-time), and which can be refreshed in batch?
- Personal and sensitive information requires security measures to stay compliant with GDPR and other regulations (a pseudonymization sketch follows).
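
On the metric-definition point: for example, an "active courier" could be defined as one who completed at least one delivery in the trailing 28 days. A hedged sketch of that definition as a parameterized BigQuery query, where the dataset, table, and column names are assumptions:

```python
# Hedged sketch: one possible "active courier" definition, expressed as a
# parameterized BigQuery query. Dataset/table/column names are assumptions.
import datetime
from google.cloud import bigquery

def count_active_couriers(as_of: datetime.date) -> int:
    client = bigquery.Client()
    query = """
        SELECT COUNT(DISTINCT courier_id) AS active_couriers
        FROM `my-project.analytics.deliveries`
        WHERE status = 'completed'
          AND delivery_date BETWEEN DATE_SUB(@as_of, INTERVAL 28 DAY) AND @as_of
    """
    job = client.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("as_of", "DATE", as_of)]
        ),
    )
    return next(iter(job.result())).active_couriers
```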
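On the performance point: in BigQuery the closest analogue to indexing is partitioning and clustering, which lets date-ranged aggregations scan only the relevant partitions. A sketch of what that DDL could look like, with all names assumed:

```python
# Hedged sketch: partition + cluster the deliveries table in BigQuery so that
# date-ranged aggregations scan only the relevant partitions.
# Project/dataset/table and column names are illustrative assumptions.
from google.cloud import bigquery

ddl = """
    CREATE TABLE IF NOT EXISTS `my-project.analytics.deliveries_partitioned`
    PARTITION BY DATE(created_at)
    CLUSTER BY courier_id AS
    SELECT * FROM `my-project.raw.deliveries`
"""

client = bigquery.Client()
client.query(ddl).result()  # wait for the DDL job to finish
```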
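On the GDPR point: one common measure is pseudonymizing direct identifiers before they reach the warehouse. A minimal sketch using salted SHA-256; the field names and env-var based salt are assumptions, and in practice the salt would live in a secret manager:

```python
# Hedged sketch: pseudonymize direct identifiers (e.g. email) before loading
# rows into the warehouse. Field names are illustrative assumptions.
import hashlib
import os

SALT = os.environ.get("PII_HASH_SALT", "change-me")

def pseudonymize(value: str) -> str:
    # Salted SHA-256: stable across loads (usable for joins), but not
    # reversible without the salt.
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()

row = {"courier_id": 42, "email": "jane@example.com"}
row["email"] = pseudonymize(row["email"])
```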