This project implements a Graph Neural Network (GNN) that predicts the water solubility of molecules from the ESOL (Estimated SOLubility) dataset. Each molecule is represented as a graph, with atoms as nodes and bonds as edges, and the model learns to predict solubility from that structure.
The primary goal is to demonstrate how Graph Neural Networks can predict molecular properties, here water solubility. Models of this kind are relevant to drug discovery, materials science, and chemical engineering.
Requirements:
- Python 3.x
- PyTorch
- PyTorch Geometric
- RDKit
- NumPy
- Pandas
- Matplotlib
- Seaborn
The GNN model consists of:
- 3 Graph Convolutional Network (GCN) layers
- Global mean and max pooling
- A final linear layer for prediction
The model uses the ESOL dataset, which contains:
- SMILES representations of molecules
- Experimental water solubility values
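As a rough illustration of that layout, the sketch below reads a two-column table with Pandas. The column names follow the commonly distributed `delaney-processed.csv` version of ESOL, which is an assumption about the file this project uses, and the two rows are made-up placeholders rather than real measurements.

```python
import io

import pandas as pd

# Assumed column names; adjust them to match the actual ESOL file in use.
SMILES_COL = "smiles"
TARGET_COL = "measured log solubility in mols per litre"

# Placeholder rows that only mimic the layout. These are NOT ESOL values.
csv_text = f"""{SMILES_COL},{TARGET_COL}
CCO,0.0
c1ccccc1,-1.0
"""

# With a real file, pass its path to pd.read_csv instead of the StringIO.
df = pd.read_csv(io.StringIO(csv_text))
smiles = df[SMILES_COL].tolist()
targets = df[TARGET_COL].to_numpy(dtype=float)
```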
- Data Preparation:
  - Convert SMILES to RDKit molecules
  - Extract molecular graph features
- Model Training:
  - Initialize the GCN model
  - Train using MSE loss and Adam optimizer
  - Monitor training loss
- Evaluation:
  - Predict solubility on test set
  - Visualize results using scatter plots
The model demonstrates the ability to predict water solubility with reasonable accuracy, as shown in the scatter plot of predicted vs. actual solubility values.
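A scatter plot can be complemented with numeric error metrics. The sketch below computes RMSE, MAE, and R² in plain Python; the function name `regression_metrics` is hypothetical, and the three value pairs are placeholders chosen for easy arithmetic, not results from this model.

```python
import math


def regression_metrics(actual, predicted):
    """Return (rmse, mae, r2) for two equal-length sequences of floats.

    Assumes at least one pair and that `actual` is not constant,
    since R^2 is undefined when the targets have zero variance.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Placeholder values, NOT model output.
actual = [-1.0, -2.0, -3.0]
predicted = [-1.5, -2.0, -2.5]
rmse, mae, r2 = regression_metrics(actual, predicted)
```

Because solubility is a continuous target, RMSE is reported in the same units as the target itself, which makes it easy to read alongside the plot.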
Possible future improvements:
- Experiment with different GNN architectures (e.g., GAT, GraphSAGE)
- Implement cross-validation for more robust evaluation
- Explore hyperparameter tuning
- Incorporate additional molecular features
Acknowledgements:
- ESOL dataset
- DeepFindr