Names: Berezantev Mihaela, Sinsoillier Mike Junior, Du Couédic De Kergoualer Sophie Zhuo Ran
This repository contains the code to produce two models dedicated to predicting ion concentrations in water streams, using measurements from in-situ probes.
This repository contains:
- Erlenbach_ion_concentration.csv: the ion concentrations sampled at different times (approximately every two hours). These are the target values to predict.
- Erlenbach_probe_data10min.csv: measurements from low-cost in-situ probes, recorded every 10 minutes. Used as the features for the models (see the loading sketch after this list).
- report.pdf: the report explaining the complete process of this project
- scripts: all the executable code, in particular:
  - Data_preprocessing.ipynb: a notebook illustrating our procedure to preprocess the data
  - Boosting_Regressor.ipynb: predictions using a boosting regressor
  - NN.ipynb: predictions using a recurrent neural network (RNN)
  - preprocessing.py: implementation of the functions actually used to preprocess the data
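As a quick orientation, the sketch below loads the two datasets with pandas. The exact column names and timestamp format depend on the CSV headers, so the `index_col=0, parse_dates=True` arguments are an assumption to be adjusted.

```python
# Minimal loading sketch; assumes the first column of each CSV holds the
# timestamps (adjust index_col / parse_dates to the actual headers).
import pandas as pd

ions = pd.read_csv("Erlenbach_ion_concentration.csv", index_col=0, parse_dates=True)
probes = pd.read_csv("Erlenbach_probe_data10min.csv", index_col=0, parse_dates=True)

# Ion samples arrive roughly every two hours, probe data every 10 minutes,
# so the two frames have different lengths and must be aligned in preprocessing.
print(ions.shape, probes.shape)
```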
The notebooks can be executed as they are in Google Colaboratory. If you want to run them locally, make sure to have the following packages installed.
- python3. All the implementations are coded in Python 3.
- pandas. Used for data exploration and preprocessing.
- PyTorch. Used to compute the fast Fourier transforms during feature creation (see the sketch after this list).
- numpy. For easy manipulation of the data arrays. You can install it via pip: `pip install numpy`
- scikit-learn. A simple machine learning library, used in both the RNN and the boosting regressor notebooks.
- TensorFlow. The backend required by Keras.
- keras. A deep learning API running on top of TensorFlow; required to run the RNN notebook.
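For illustration, here is a minimal sketch of FFT-based feature creation with PyTorch. The window length and the number of frequency components kept are hypothetical placeholders; the actual choices are documented in the notebooks.

```python
# Hypothetical FFT feature sketch; window size and number of kept
# frequencies are placeholders, not the values used in the project.
import torch

window = torch.randn(144)          # e.g. one day of 10-minute probe readings
spectrum = torch.fft.rfft(window)  # one-sided fast Fourier transform
features = spectrum.abs()[:10]     # magnitudes of the lowest frequencies as features
```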
To reproduce the results of the boosting regressor or the RNN, create a folder at the path /content/drive/MyDrive/ML/Project 2/data in your Google Drive and place the two .csv files, Erlenbach_ion_concentration.csv and Erlenbach_probe_data10min.csv, inside it. The two notebooks can then simply be executed in Google Colaboratory.

If run locally, the notebooks should be updated with the appropriate paths to the data files, and the first cell of each notebook removed:
```python
from google.colab import drive
drive.mount('/content/drive')
```
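For a local run, a cell along these lines (with your own path; `DATA_DIR` is a hypothetical name, not one used in the notebooks) can replace the mount cell above:

```python
# Hypothetical replacement for the Colab mount when running locally.
from pathlib import Path

DATA_DIR = Path("data")  # adjust to wherever the two .csv files are stored
ions_path = DATA_DIR / "Erlenbach_ion_concentration.csv"
probes_path = DATA_DIR / "Erlenbach_probe_data10min.csv"
```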