GitHub - imbottlebird/aml-nlp-project: Text summarization and sentiment analysis of user reviews on YELP

Evaluating Sentiment Capturing of Text Summarization Models

Project for MIT Course 6.862 Applied Machine Learning

Dataset:

Yelp dataset available on Kaggle https://www.kaggle.com/yelp-dataset/yelp-dataset

5,200,000 user reviews
Information on 174,000 businesses
The data spans 11 metropolitan areas

Problem:

With an overwhelming number of user reviews being generated on online platforms such as Yelp, NLP techniques are increasingly used to extract meaningful information from review data; text summarization is one such technique. However, can we fully trust an ML model to carry the sentiment of the original text over into the summary? Is there a risk that the original sentiment is distorted?

Goal:

The goal in this project is to measure the accuracy of text summarization models in capturing the sentiment information embedded in the original text reviews.

ML algorithms:

TF-IDF
Logistic Regression
CART
Random Forest
XGBoost
Multilayer Perceptron Classifier
Bidirectional Encoder Representations from Transformers (BERT)
Pegasus (Text summarization)
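
TF-IDF in the list above is a term-weighting scheme rather than a classifier; presumably it turns review text into numeric features for the classifiers. The following is a minimal sketch of one common TF-IDF variant (length-normalized term frequency times log(N / df)), written in plain Python for illustration. The function name `tfidf` and the sample reviews are assumptions, not taken from the project code, and library implementations such as scikit-learn's add smoothing and normalization.

```python
import math
from collections import Counter


def tfidf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical reviews, for illustration only.
reviews = [
    "great food great service".split(),
    "terrible service".split(),
    "great location".split(),
]
weights = tfidf(reviews)
```

In the second review, "terrible" receives a higher weight than "service" because it appears in fewer documents, which is the property that makes the weights useful as classifier features.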

Methods:

Sentiment Analysis:

Comparative Analysis of 6 ML models

Performance scores

BERT (a contextual language model) outperformed other models by a large margin

Text Summarization:

The evaluation is based on the reviews of 70 businesses (8 reviews each), with human-generated summaries serving as the benchmark.

Model Evaluation

Evaluation Results

Conclusion

Compared to the baseline (randomly generated sentiments), the ML algorithm does a good job overall of capturing the original sentiments.
However, there is still room for improvement compared to the human-generated summaries.
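
The comparison against a random baseline can be sketched as a label agreement rate: the fraction of summaries whose sentiment label matches that of the original review, computed once for the model's summaries and once for randomly drawn labels. This is a sketch under stated assumptions, not the project's evaluation code; the function names, the two-class label set, and the sample labels are all hypothetical.

```python
import random


def agreement(reference, predicted):
    """Fraction of positions where the two label sequences agree."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def random_baseline(reference, labels=("positive", "negative"), seed=0):
    """Agreement between the reference labels and uniformly random labels."""
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    guesses = [rng.choice(labels) for _ in reference]
    return agreement(reference, guesses)


# Hypothetical labels for illustration only; these are not project results.
original = ["positive"] * 6 + ["negative"] * 4
model_summary = ["positive"] * 5 + ["negative"] * 5

model_score = agreement(original, model_summary)  # 9 of 10 labels match
baseline_score = random_baseline(original)
```

The same `agreement` function can be applied to labels obtained from the human-generated summaries, which gives the upper reference point that the model's score is compared against.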

