NLP Text Classification with Naive Bayes vs Logistic Regression

In this article, we examine the difference between Logistic Regression and Naive Bayes for classifying coffee drink reviews as positive or negative.

The dataset is a corpus of coffee drink reviews: consumers' comments on how the drink tastes. First, a quick reminder about Logistic Regression and Naive Bayes: both are machine learning techniques for binary classification, used here to determine whether a review is positive or negative.

Logistic Regression

Fig 1. Logistic Regression
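Logistic Regression models the probability that a review is positive as the sigmoid of a weighted sum of its features. A minimal NumPy sketch of that computation (the weights and feature vector below are illustrative, not values from the article):

```python
import numpy as np

def sigmoid(z):
    # Map a real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical per-token weights, bias, and bag-of-words counts for one review
w = np.array([1.2, -0.8, 0.5])
b = -0.1
x = np.array([1, 0, 2])

p_positive = sigmoid(w @ x + b)  # P(positive | review)
```

A review is then labeled positive when `p_positive` exceeds 0.5.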

Naive Bayes

Fig 2. Naive Bayes
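Naive Bayes instead scores each class by combining a class prior with per-word likelihoods, assuming the words are conditionally independent given the class. A toy sketch with made-up probabilities (not estimates from the review corpus):

```python
import math

# Hypothetical log-priors and per-word log-likelihoods for two classes
log_prior = {"pos": math.log(0.6), "neg": math.log(0.4)}
log_like = {
    "pos": {"smooth": math.log(0.05), "bitter": math.log(0.01)},
    "neg": {"smooth": math.log(0.01), "bitter": math.log(0.06)},
}

def predict(tokens):
    # Sum log-prior and per-token log-likelihoods for each class,
    # then return the class with the highest score
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens)
        for c in log_prior
    }
    return max(scores, key=scores.get)

label = predict(["smooth", "smooth"])
```

Working in log space avoids numerical underflow when many word probabilities are multiplied.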

Step 1: Prepare the data
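Preparing the data means getting each review and its label into a tabular structure. A sketch with pandas, using a tiny made-up stand-in for the coffee-review corpus:

```python
import pandas as pd

# Toy stand-in for the coffee-review corpus (hypothetical rows)
df = pd.DataFrame({
    "review": [
        "Smooth and rich flavor",
        "Bitter and watery",
        "Lovely aroma, would buy again",
        "Stale and flat",
    ],
    "label": [1, 0, 1, 0],  # 1 = positive, 0 = negative
})
```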

Step 2: Data Processing
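A typical processing step for review text is to lowercase it and strip punctuation and digits before tokenization. One possible cleaning function (the article does not specify its exact preprocessing):

```python
import re

def clean(text):
    # Lowercase, replace everything except letters and spaces,
    # then collapse repeated whitespace
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

cleaned = clean("Great coffee!! 10/10, would buy again.")
```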

Step 3: Split the data set into test and training set
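The split can be done with scikit-learn's `train_test_split`; the library and the 25% hold-out below are assumptions, since the article does not state its split ratio:

```python
from sklearn.model_selection import train_test_split

texts = ["smooth and rich", "bitter aftertaste", "lovely aroma", "stale and flat"]
labels = [1, 0, 1, 0]

# Hold out 25% of the reviews for evaluation; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42
)
```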

Step 4: Numerically encode the input data set
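A common numerical encoding for text classification is bag-of-words counts, e.g. with scikit-learn's `CountVectorizer` (again an assumption about tooling, as the article does not name the encoder):

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ["smooth and rich", "bitter and watery", "rich aroma"]

# Bag-of-words: each column is a vocabulary term, each row a review,
# and each cell counts how often that term appears in that review
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
```

The fitted vocabulary maps each word to a column index, so unseen test reviews can be encoded consistently with `vectorizer.transform`.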

Step 5: Fit the model
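Both classifiers can be fitted on the same encoded matrix. A sketch with scikit-learn's `LogisticRegression` and `MultinomialNB` (a standard pairing for count features, though the article does not name its implementations):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

texts = ["smooth and rich", "lovely aroma", "bitter and watery", "stale and flat"]
labels = [1, 1, 0, 0]

# Encode once, then fit each classifier on the same features
X = CountVectorizer().fit_transform(texts)
lr = LogisticRegression().fit(X, labels)
nb = MultinomialNB().fit(X, labels)

preds = nb.predict(X)
```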

Step 6: Evaluate the model
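The four metrics reported below can be computed with scikit-learn's metric functions. A sketch on illustrative labels (not the article's predictions):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical true and predicted labels for six test reviews
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]

acc = accuracy_score(y_true, y_pred)    # fraction of correct predictions
prec = precision_score(y_true, y_pred)  # of predicted positives, how many are right
rec = recall_score(y_true, y_pred)      # of true positives, how many were found
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
```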

Error Metric with Logistic Regression

Accuracy: 0.866
Precision: 0.874
Recall: 0.944
F1 Score: 0.908

Confusion Matrix with Logistic Regression

Table 1. Logistic Regression Confusion Matrix
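A confusion matrix like Table 1 can be produced with scikit-learn's `confusion_matrix`; the labels below are illustrative, not the article's data:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]

# Rows are true classes (0, 1); columns are predicted classes (0, 1),
# so cm[0][1] counts false positives and cm[1][0] false negatives
cm = confusion_matrix(y_true, y_pred)
```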

Error Metric with Naive Bayes

Accuracy: 0.866
Precision: 0.858
Recall: 0.968
F1 Score: 0.91

Confusion Matrix with Naive Bayes

Comparing the two, accuracy is identical (0.866); Naive Bayes achieves higher recall (0.968 vs 0.944) and a slightly higher F1 score (0.91 vs 0.908), while Logistic Regression has higher precision (0.874 vs 0.858). Overall, the Naive Bayes classifier performs marginally better on this dataset.

To get the source code and the dataset, see my GitHub repository.
