A Naive-Bayes Text Classifier using Laplace smoothing

Pudu, Prithvidhar

Date

2023-05-03

Author

Pudu, Prithvidhar

Abstract

The text classifier was built on Kaggle (kaggle.com), a platform that provides hosted compute resources, including GPUs, for training models on large datasets. Using this platform, I created a Naive Bayes text classifier in Python that classifies articles according to whether people agree with them. The approach combines the two ideas in its name. Bayes' theorem states that the probability of an event should be updated to account for the evidence observed for that event. The "naive" assumption treats these pieces of evidence, here the words in an article, as independent of one another given the class. Following this logic, I counted the number of times each word appears in a document and the number of times it appears across all documents in the dataset. These counts allow the classifier to categorize articles by the words they contain. My classifier achieves an accuracy of 78%, which could be improved by adding more data and tuning the model's parameters.
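
The counting approach described in the abstract, together with the Laplace smoothing named in the title, can be illustrated with a minimal sketch. This is not the author's code: the function names `train` and `predict`, the whitespace tokenizer, the smoothing constant `alpha=1.0`, the "agree"/"disagree" labels, and the four toy documents are all assumptions made for illustration. With smoothing, each word likelihood is estimated as (count of the word in the class + alpha) / (total words in the class + alpha × vocabulary size), so a word never seen in a class still receives a small nonzero probability.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace (add-alpha) smoothing."""
    class_docs = Counter(labels)         # number of documents per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()    # assumption: simple whitespace tokenizer
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_likelihood": {}}
    for label, n in class_docs.items():
        # Prior: fraction of training documents that belong to this class.
        model["log_prior"][label] = math.log(n / len(docs))
        # Smoothed likelihood: (count + alpha) / (class total + alpha * |V|).
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest log posterior score."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][label]
        score = log_prior
        for w in text.lower().split():
            if w in likelihood:          # words outside the vocabulary are skipped
                score += likelihood[w]   # independence: log-probabilities add
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, not taken from the challenge dataset.
docs = [
    "great insightful article",
    "well argued and great",
    "misleading and wrong",
    "wrong poorly argued",
]
labels = ["agree", "agree", "disagree", "disagree"]

model = train(docs, labels)
print(predict(model, "great article"))
print(predict(model, "wrong and misleading"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and `alpha` is one of the parameters that could be tuned.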

URI

http://hdl.handle.net/10106/31530

Collections

UTA Datathon 2023 Innovative Data Intelligence Research Lab Machine Learning Text Classification Challenge - DO NOT EDIT