A Naive-Bayes Text Classifier Using Laplace Smoothing
Abstract
The text classifier was built on kaggle.com, a website that provides GPU resources for training models on large amounts of data. On this platform, I created a Naive-Bayes text classifier in Python that classifies articles according to whether people agree with them.
The method is based on Naive Bayes, which, as its name suggests, combines two ideas. Bayes' theorem states that the probability of a hypothesis (here, the class of an article) should be updated using the evidence observed for it (here, the words in the article). The "naive" part is not a theorem but a simplifying assumption: the pieces of evidence, that is, the individual words, are treated as independent of each other once the class is known. Following this logic, I counted the number of times a particular word appears in a document and how many times that word is present across all the documents of the dataset.
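For reference, the standard multinomial Naive-Bayes formulation with Laplace (add-one) smoothing can be written as follows. The notation (class c, document d with words w_1, ..., w_n, vocabulary V, and count(w, c) for the number of times word w occurs in training documents of class c) is introduced here for illustration and may not match the exact counts used in the original implementation. The proportionality in the first line applies the naive assumption P(d | c) = ∏ P(w_i | c) and drops P(d), which is the same for every class; the second line is the Laplace-smoothed word estimate; the third picks the highest-scoring class in log space.

$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)} \propto P(c)\prod_{i=1}^{n} P(w_i \mid c)$$

$$\hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\sum_{w' \in V} \mathrm{count}(w', c) + |V|}$$

$$\hat{c} = \arg\max_{c} \Big[\log P(c) + \sum_{i=1}^{n} \log \hat{P}(w_i \mid c)\Big]$$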
These statistics allow me to categorize articles based on the words they contain. The classifier achieves an accuracy of 78%, which could likely be improved by adding more training data and tuning the model's parameters.
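The counting and classification steps described above can be sketched in Python as follows. This is a minimal multinomial Naive-Bayes sketch under stated assumptions: whitespace tokenization, add-one (Laplace) smoothing with `alpha=1.0`, and two hypothetical labels, "agree" and "disagree". The functions `train`, `predict`, and `accuracy` and the toy documents are illustrative only; this is not the classifier that produced the 78% figure.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, alpha=1.0):
    """Fit a multinomial Naive-Bayes model; alpha=1.0 is Laplace (add-one) smoothing."""
    class_counts = Counter(labels)      # number of training documents per class
    word_counts = defaultdict(Counter)  # class -> word frequencies in that class
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()     # assumption: simple whitespace tokenization
        word_counts[label].update(words)
        vocab.update(words)

    n_docs = len(docs)
    log_prior = {c: math.log(k / n_docs) for c, k in class_counts.items()}

    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        denom = total + alpha * len(vocab)  # smoothed denominator
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, doc):
    """Return the class with the highest log posterior score for one document."""
    log_prior, log_likelihood, vocab = model
    words = [w for w in doc.lower().split() if w in vocab]  # skip unseen words
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in words)
        for c in log_prior
    }
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted class matches the true label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy data; the labels "agree"/"disagree" are illustrative.
train_docs = [
    "great article I agree",
    "totally agree well argued",
    "wrong and misleading",
    "I disagree this is wrong",
]
train_labels = ["agree", "agree", "disagree", "disagree"]
model = train(train_docs, train_labels)

print(predict(model, "I agree it is well argued"))  # agree
print(predict(model, "this is misleading"))         # disagree
```

Scoring in log space avoids floating-point underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that never appeared with a class from forcing that class's probability to zero. In this sketch `alpha` is the kind of model parameter that could be tuned.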