Email Spam Classification

This repository contains code for training a spam classification model using the Naive Bayes algorithm. It also includes functions for evaluating the model's performance and for plotting the spamicity of a given file. An explanation of the algorithm is given on my GitHub page.

Prerequisites

Python 3.x
NLTK library
Matplotlib library
NumPy library

Installation

Clone the repository: git clone https://github.com/your-username/your-repository.git
Install the required dependencies: pip install nltk matplotlib numpy
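Download the NLTK stop words from a Python session: import nltk; nltk.download('stopwords')
The usage below imports WordNetLemmatizer, which relies on the WordNet corpus, so it will likely need to be downloaded as well: nltk.download('wordnet')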

Usage

Import the necessary modules:

import os
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from matplotlib import pyplot as plt
import numpy as np
import re

Train the spam classification model by calling the train_model function:

train_model(training_percent=0.8, SPAM_FOLDER='SPAMS', HAM_FOLDER='HAMS')

This function randomly selects a fraction of the files (training_percent, here 0.8, i.e. 80%) from the provided spam and ham folders for training the model; the remaining files are kept for testing. It stores the training and testing file lists in separate text files.
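The random split described above can be sketched as follows. This is a minimal illustration, not the code in spam_classifier.py: the helper names split_files and save_file_list, the seed parameter, and the one-name-per-line file format are all assumptions made for the example.

```python
import random
from pathlib import Path


def split_files(folder, training_percent=0.8, seed=None):
    """Randomly split the files in `folder` into (training, testing) name lists.

    Hypothetical helper: the real train_model may split differently.
    """
    files = sorted(p.name for p in Path(folder).iterdir() if p.is_file())
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    rng.shuffle(files)
    cut = int(len(files) * training_percent)
    return files[:cut], files[cut:]


def save_file_list(names, path):
    """Write one file name per line so the split can be reloaded later."""
    Path(path).write_text("\n".join(names) + "\n", encoding="utf-8")
```

Calling split_files once per folder (for example 'SPAMS' and 'HAMS') and saving each returned list keeps the testing files separate from the ones used to build the word counts.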

Compute a file's spamicity using the get_file_spamicity function:

spamicity = get_file_spamicity(filename, n=8, plot=False)

This function calculates the spamicity of a given file by comparing the words in the file to the trained word count dictionary. It returns the calculated spamicity value.
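One common way to turn word counts into a single spamicity value is sketched below. It is an assumption about the general approach, not a copy of get_file_spamicity: the function names, the tokenizer, the clamping to [0.01, 0.99], and the rule of keeping the n scores farthest from 0.5 are illustrative choices, and the per-word estimate assumes the spam and ham training sets are of similar size.

```python
import re
from math import prod  # requires Python 3.8+


def word_spamicity(word, spam_counts, ham_counts, unseen_spamicity=0.4):
    """Estimate P(spam | word) from raw counts; unseen words get a fixed value."""
    s = spam_counts.get(word, 0)
    h = ham_counts.get(word, 0)
    if s + h == 0:
        return unseen_spamicity
    # Clamp so a single word can never force the combined score to exactly 0 or 1.
    return min(0.99, max(0.01, s / (s + h)))


def file_spamicity(text, spam_counts, ham_counts, n=8, unseen_spamicity=0.4):
    """Combine the n most decisive word scores into one value between 0 and 1."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = [word_spamicity(w, spam_counts, ham_counts, unseen_spamicity)
              for w in words]
    # "Most decisive" here means farthest from the neutral value 0.5.
    top = sorted(scores, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    if not top:
        return unseen_spamicity
    spam_like = prod(top)
    ham_like = prod(1 - p for p in top)
    # Naive Bayes combination under the word-independence assumption.
    return spam_like / (spam_like + ham_like)
```

With this scheme, a larger n lets more words vote, which tends to smooth the score, while a small n lets a few strongly spam-like or ham-like words dominate.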

Measure misclassification for one or more values of n using the test_misclassification function:

test_misclassification(testing_files_spams, testing_files_hams, n=(8, 16, 32), threshold=0.6, unseen_spamicity=0.4, plot=False, verbose=False)

This function measures the misclassification rate of the spam classification model on the provided testing files. It compares the calculated spamicity of each file to the threshold value and counts the false positives (ham files classified as spam) and false negatives (spam files classified as ham). The optional n parameter specifies the number of words used for classification; in the call above it is a tuple of values (8, 16, 32) to evaluate.
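The threshold comparison can be sketched as below. The helper misclassification_counts is hypothetical and works on precomputed spamicity scores rather than on files; treating a score equal to the threshold as spam is also an assumption, since the README does not say which side the boundary falls on.

```python
def misclassification_counts(spam_scores, ham_scores, threshold=0.6):
    """Count classification errors from precomputed spamicity scores.

    Hypothetical helper: returns (false_positives, false_negatives, error_rate).
    """
    # A ham file scored at or above the threshold is wrongly flagged as spam.
    false_positives = sum(score >= threshold for score in ham_scores)
    # A spam file scored below the threshold slips through as ham.
    false_negatives = sum(score < threshold for score in spam_scores)
    total = len(spam_scores) + len(ham_scores)
    error_rate = (false_positives + false_negatives) / total if total else 0.0
    return false_positives, false_negatives, error_rate
```

Running this once per value of n (recomputing the scores each time) gives one error rate per n, which is the kind of comparison a plot over n would show.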

An0n1mity/SpamClassifierEval
