This repository contains a Jupyter Notebook for fine-tuning DistilBERT on a sentiment analysis dataset. The model is trained with TensorFlow and Hugging Face's `transformers` library to classify tweets into sentiment categories.
The dataset used for training is `Tweets.csv`, which contains airline-related tweets labeled with sentiment categories (positive, neutral, negative).
- Load dataset (`Tweets.csv`)
- Check for missing values and class balance
- Convert text to lowercase
- Remove unnecessary columns
- Visualize word frequency using a Word Cloud
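The data-preparation steps above can be sketched as follows. The column names (`text`, `airline_sentiment`) are assumptions based on the standard airline-tweets dataset, and a tiny inline sample stands in for `Tweets.csv` so the sketch runs on its own:

```python
import io
import pandas as pd

# In the notebook this would be pd.read_csv("Tweets.csv"); a small
# inline sample (with assumed column names) stands in here.
raw = io.StringIO(
    "airline_sentiment,text,tweet_id\n"
    "positive,@VirginAmerica Great flight!,1\n"
    "negative,@united Worst service EVER,2\n"
    "neutral,@SouthwestAir Any update on flight 123?,3\n"
)
df = pd.read_csv(raw)

# Check for missing values and class balance
print(df.isnull().sum())
print(df["airline_sentiment"].value_counts())

# Convert text to lowercase and drop unnecessary columns
df["text"] = df["text"].str.lower()
df = df[["text", "airline_sentiment"]]
```

The Word Cloud step would use the separate `wordcloud` package, e.g. `WordCloud().generate(" ".join(df["text"]))` rendered with matplotlib.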
- Convert text into tokenized inputs (`input_ids`, `attention_mask`)
- Use Hugging Face `DistilBertTokenizer`
- Ensure proper padding and truncation
- Map tokenized inputs to a TensorFlow dataset format
- Prepare training and testing sets
- Load `DistilBertForSequenceClassification`
- Define loss function and optimizer
- Train model using TensorFlow/Keras
- Predict sentiment on test data
- Compute accuracy, precision, recall, and F1-score
- Generate a classification report
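The evaluation steps can be sketched with scikit-learn. The `y_true`/`y_pred` arrays below are toy stand-ins for the notebook's test-set labels and predictions:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             classification_report)

y_true = np.array([0, 1, 2, 2, 0, 1])
# In the notebook, predictions would come from something like
# np.argmax(model.predict(test_ds).logits, axis=1)
y_pred = np.array([0, 1, 2, 0, 0, 1])

# Compute accuracy, precision, recall, and F1-score
acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")

# Generate a classification report (class names assumed)
print(classification_report(y_true, y_pred,
                            target_names=["negative", "neutral", "positive"]))
```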
To run this notebook, install the following dependencies:

```shell
pip install numpy pandas matplotlib seaborn nltk tensorflow transformers scikit-learn tqdm plotly
```
- Clone this repository:

  ```shell
  git clone https://github.com/awais-124/fine-tuning-distilbert.git
  cd fine-tuning-distilbert
  ```

- Run the Jupyter Notebook:

  ```shell
  jupyter notebook CODE.ipynb
  ```