GitHub - enesdoruk/Reuters-Text-Classification: Hand Crafted Based Text Classification on Reuters Data

Description

This project comprises the first homework assignment for CS 549, where the renowned Reuters dataset is utilized. The primary objective is to explore various preprocessing techniques applied to the dataset and construct unigram, bigram, and trigram models.

The project involves the following key steps:

Utilizing the Reuters dataset
Applying a variety of preprocessing methods
Building unigram, bigram, and trigram models
Evaluating model performance using a test set
Employing metrics such as recall, precision, and F1 score for assessment
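
As a rough illustration of the preprocessing and n-gram steps listed above, here is a minimal sketch. It is not taken from preprocess.py or model.py: the function names `preprocess`, `ngrams`, and `ngram_counts` are hypothetical, and lowercasing plus keeping only alphabetic tokens is just one assumed preprocessing choice among many.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and keep only alphabetic tokens (an assumed, simple choice)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Count how often each n-gram occurs in the preprocessed text."""
    return Counter(ngrams(preprocess(text), n))


# Toy sentence, not drawn from the dataset.
sample = "Oil prices rose as oil demand rose."
unigrams = ngram_counts(sample, 1)
bigrams = ngram_counts(sample, 2)
trigrams = ngram_counts(sample, 3)

print(unigrams.most_common(2))
print(list(bigrams)[:3])
print(list(trigrams)[:2])
```

Representing each n-gram as a tuple keeps unigrams, bigrams, and trigrams in the same data structure, so the same counting code serves all three model orders.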

This homework assignment serves as an introduction to text data preprocessing and n-gram modeling techniques, with a focus on practical implementation and evaluation using real-world data.

Installation

This project was developed with Python 3.8.

pip install -r requirements.txt

Run

Before running the code, change the dataset path in main.py. The default is path = 'reuters21578'.

python main.py

main.py prints the metrics to the terminal.
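
For reference, precision, recall, and F1 score for a single class can be computed from true-positive, false-positive, and false-negative counts. The sketch below is an assumption about how such metrics might be calculated, not a copy of metrics.py: the function name `precision_recall_f1` is hypothetical, and the labels are a toy single-label example rather than real model output.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)

    # Fall back to 0.0 when a denominator is zero (no predictions or no true instances).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels for illustration only.
y_true = ["earn", "acq", "earn", "acq", "earn"]
y_pred = ["earn", "earn", "earn", "acq", "acq"]

p, r, f = precision_recall_f1(y_true, y_pred, "earn")
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With several classes, one option is to compute these values per class and average them (macro averaging); whether the project does this is not stated here.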

Folders and files

reuters21578
README.md
dataloader.py
main.py
metrics.py
model.py
preprocess.py
requirements.txt
vis.py
