Sentiment Analysis of Genuine Customer Feedback in Amazon Book Reviews — LazyPredict Inspired {Part 3}

Manali_Raut
6 min read · Jul 17, 2023

Welcome back to my blog series on Sentiment Analysis of Genuine Customer Feedback in Amazon Book Reviews! If you’ve been following along, you’ve already gained insights from Part 1, where we explored the world of exploratory data analysis (EDA), and Part 2, where we dived into the fascinating realm of Natural Language Processing (NLP) and the powerful VADER sentiment analysis tool.

Image created by Author

In Part 3, we will shift our focus to a crucial aspect of sentiment analysis: finding the best algorithms for sentiment classification. Sentiment analysis plays a pivotal role in understanding customer opinions and feedback, especially for Amazon reviews, where the insights can shape business decisions.

  1. Introduction
  2. Importance of Algorithm Selection
  3. LazyPredict
  4. LazyPredict Inspired Class for Multiclass Sentiment Analysis
  5. Evaluation of Best Algorithms
  6. Analysis of Algorithm Performance
  7. Conclusion

Note: If you haven’t read Part 1 and Part 2 yet, I highly recommend checking them out to gain a comprehensive understanding of the exploratory data analysis and NLP techniques I have applied to the Amazon books review dataset.

Alright, folks, here we go again, for the third time!🤠

I. Introduction

When it comes to sentiment analysis of text data, choosing the right algorithm can be a daunting task, especially in multiclass classification scenarios. We all want accurate and efficient machine learning models that can handle the complexity of classifying sentiment across multiple categories.

During my exploration, I stumbled upon a fascinating machine learning tool called “LazyPredict”, which caught my attention. However, there was a slight hiccup: it is primarily designed for binary classification tasks. So, I created my own version inspired by LazyPredict, specifically tailored for multiclass classification.

II. Importance of Algorithm Selection

The selection of the right algorithm is crucial in any machine learning task, and sentiment analysis is no exception. The choice of algorithm can significantly impact the accuracy and performance of the sentiment analysis model. Different algorithms have their strengths and weaknesses, making it essential to understand their characteristics and suitability for the specific task at hand. Factors such as the nature of the data, the complexity of the problem, and the available computational resources should all be considered when selecting an algorithm. By choosing the most appropriate algorithm, we can improve the accuracy of sentiment analysis and gain valuable insights from the text data.

III. LazyPredict

LazyPredict is an innovative, time-saving Python library that simplifies algorithm selection for machine learning tasks. It automates training and evaluating multiple machine learning models, giving quick insight into their performance without extensive manual coding. Although it was initially designed with binary classification in mind, multiclass problems can still be handled, for example by treating each class as a separate binary (one-vs-rest) classification task.
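To make the one-vs-rest idea concrete, here is a minimal sketch that uses scikit-learn's `OneVsRestClassifier` rather than LazyPredict itself. It fits one binary classifier per sentiment class on TF-IDF features. The toy headlines and labels are made up for illustration and are not from the Amazon dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Toy headlines and sentiment labels (illustrative only, not from the dataset)
texts = [
    "loved this book", "a wonderful read", "great story and characters",
    "terrible plot", "boring and too long", "waste of money",
    "it was okay", "average book", "neither good nor bad",
]
labels = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3

# TF-IDF features feed a one-vs-rest wrapper: one binary LogisticRegression per class
ovr_model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression()),
)
ovr_model.fit(texts, labels)

# make_pipeline names each step after its lowercased class name
ovr = ovr_model.named_steps["onevsrestclassifier"]
print(ovr.classes_)          # the sentiment classes, sorted alphabetically
print(len(ovr.estimators_))  # one fitted binary classifier per class
print(ovr_model.predict(["what a wonderful story"]))
```

At prediction time, each binary classifier scores "its" class against all the others, and the class with the highest score wins.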


LazyPredict has gained popularity among data enthusiasts and practitioners. It offers a wide range of algorithms, letting users experiment with and compare various models effortlessly. With LazyPredict, you can quickly assess the performance of multiple machine learning models and identify the most promising ones for your task. You should give it a try; the results are fascinating. 😁

IV. LazyPredict Inspired Class for Multiclass Sentiment Analysis

To tackle the challenge of multiclass sentiment analysis in text data, I have developed a custom class inspired by the concept of LazyPredict. While LazyPredict is primarily designed for binary classification, I have expanded upon its capabilities to support multiclass classification scenarios. This custom class simplifies the process of algorithm selection for sentiment analysis by providing an automated way to train and evaluate multiple machine learning models.

The class incorporates popular machine learning algorithms suitable for multiclass sentiment analysis tasks. It leverages techniques such as TF-IDF vectorization and label encoding to preprocess the text data and prepare it for modeling. By utilizing this custom class, you can effortlessly compare the performance of various algorithms and identify the most effective ones for your specific sentiment analysis problem.
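Below is a minimal sketch of what such a class might look like. It is not the author's `MulticlassClassifier`, whose code isn't shown in this post. The class name `MiniLazyTextClassifier`, its `fit_evaluate` method, and the three candidate models are hypothetical choices made for illustration. The sketch fits TF-IDF on the training text, label-encodes the sentiment strings, and then loops over the models to build a small leaderboard.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC


class MiniLazyTextClassifier:
    """Hypothetical, simplified LazyPredict-style runner for multiclass text data."""

    def __init__(self, models=None):
        # Candidate models keyed by display name (an illustrative selection)
        self.models = models or {
            "LogisticRegression": LogisticRegression(max_iter=1000),
            "MultinomialNB": MultinomialNB(),
            "LinearSVC": LinearSVC(),
        }

    def fit_evaluate(self, X_train, y_train, X_test, y_test):
        # TF-IDF is fitted on the training text only, then applied to the test text
        vectorizer = TfidfVectorizer()
        X_train_vec = vectorizer.fit_transform(X_train)
        X_test_vec = vectorizer.transform(X_test)

        # Label encoding maps sentiment strings to integers 0..n_classes-1
        encoder = LabelEncoder().fit(list(y_train) + list(y_test))
        y_train_enc = encoder.transform(y_train)
        y_test_enc = encoder.transform(y_test)

        rows = []
        for name, model in self.models.items():
            model.fit(X_train_vec, y_train_enc)
            rows.append({
                "Model": name,
                # score() returns mean accuracy for scikit-learn classifiers
                "Train Accuracy": model.score(X_train_vec, y_train_enc),
                "Test Accuracy": model.score(X_test_vec, y_test_enc),
            })

        # Rank models from best to worst test accuracy
        return (pd.DataFrame(rows)
                .sort_values("Test Accuracy", ascending=False)
                .reset_index(drop=True))


# Usage on toy data (illustrative only)
train_texts = ["loved it", "great read", "awful book", "boring plot", "it was fine", "okay story"]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
test_texts = ["great story", "awful plot", "fine read"]
test_labels = ["positive", "negative", "neutral"]

leaderboard = MiniLazyTextClassifier().fit_evaluate(
    train_texts, train_labels, test_texts, test_labels
)
print(leaderboard)
```

Fitting the vectorizer on the training split alone keeps the test set's IDF statistics out of training. Label encoding is optional for scikit-learn estimators, which accept string labels directly. It becomes useful if you later add libraries that expect integer classes.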

If you’re still hanging in there with me and finding this blog helpful, how about buying me a book? Yes, you read that right! Just imagine, with a click, you can help expand my knowledge and keep me fueled with new ideas. So go ahead, treat me to a book, and together we’ll continue our quest for learning and exploration! Cheers!

V. Evaluation of Best Algorithms

Once I had developed the custom class for multiclass sentiment analysis, the next step was to evaluate the performance of various algorithms and identify the best ones for the task.

To assess the performance of each algorithm, I considered key evaluation metrics such as accuracy, precision, recall, and F1 score. These metrics show how well each algorithm classifies the sentiment in the review text. Note that I used only a sample of 500 reviews. I also recorded the time each algorithm took, to get a sense of its computational efficiency.
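The metric and timing step can be sketched as a small helper. `evaluate_model` is a hypothetical name. Synthetic three-class data from `make_classification` stands in for the vectorized reviews. Weighted averaging is an assumption, since the post doesn't say which average was used for precision, recall, and F1.

```python
import time

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_model(model, X_train, y_train, X_test, y_test, average="weighted"):
    """Fit one model and return multiclass metrics plus wall-clock time (a sketch)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    elapsed = time.perf_counter() - start  # covers both fitting and predicting

    return {
        "Accuracy": accuracy_score(y_test, y_pred),
        # With more than two classes, precision/recall/F1 need an averaging strategy;
        # "weighted" averages the per-class scores by class frequency (support)
        "Precision": precision_score(y_test, y_pred, average=average, zero_division=0),
        "Recall": recall_score(y_test, y_pred, average=average, zero_division=0),
        "F1 Score": f1_score(y_test, y_pred, average=average, zero_division=0),
        "Time (s)": elapsed,
    }


# Synthetic 3-class data stands in for vectorized reviews (illustrative only)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

scores = evaluate_model(DecisionTreeClassifier(random_state=1), X_train, y_train, X_test, y_test)
print(scores)
```

Weighted averaging has one side effect worth knowing. In single-label multiclass problems, weighted recall always equals accuracy, so `average="macro"` can be more revealing when the sentiment classes are imbalanced.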

By evaluating multiple algorithms, we can gain a comprehensive understanding of their strengths and weaknesses in handling multiclass sentiment analysis. This knowledge will enable us to select the most suitable algorithms for our specific task, ensuring accurate and reliable sentiment analysis of product reviews.

import pandas as pd
from sklearn.model_selection import train_test_split
from lazypredict_inspired_multiclass_classification import MulticlassClassifier

# Load the 500-review sample
amazon_review_sample = pd.read_csv("data/amazon_review_sample500.csv")

# Review headlines: cleaned text as features, sentiment labels as targets
X = amazon_review_sample["cleaned_review_headlines"]
Y = amazon_review_sample["headline_sentiments"]
print(X[0])  # peek at the first cleaned headline

# Hold out 20% of the headlines for testing
X_train_headline, X_test_headline, Y_train_headline, Y_test_headline = train_test_split(
    X, Y, test_size=0.2, random_state=1
)

# Train and evaluate the candidate models on the headlines
MulticlassClassifier.multiclass_classification(
    X_train_headline, Y_train_headline, X_test_headline, Y_test_headline
)

# Review bodies: cleaned text as features, sentiment labels as targets
X1 = amazon_review_sample["cleaned_review_body"]
Y1 = amazon_review_sample["body_sentiments"]
print(X1[0])  # peek at the first cleaned review body

# Hold out 20% of the review bodies for testing
X_train_body, X_test_body, Y_train_body, Y_test_body = train_test_split(
    X1, Y1, test_size=0.2, random_state=1
)

# Train and evaluate the candidate models on the review bodies
MulticlassClassifier.multiclass_classification(
    X_train_body, Y_train_body, X_test_body, Y_test_body
)

Review Headline Sample
Review Body Sample

VI. Analysis of Algorithm Performance

After evaluating multiple algorithms for multiclass sentiment analysis on our Amazon review dataset, it’s time to analyze their performance and identify the top performers.

Multiclass Classification on Amazon Review Headlines
Multiclass Classification on Amazon Review Body

The tables above indicate that all the algorithms achieved high training accuracy, reasonably good test accuracy, and moderate precision and recall on both the headline and body data. The gap between training and test accuracy hints at some overfitting, and the F1 scores reflect the trade-off between precision and recall. These findings suggest that the models can classify the sentiment of Amazon review headlines and bodies. There is still room for further optimization and fine-tuning, especially since we used only a 500-review sample.
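For reference, the F1 score is the harmonic mean of precision $P$ and recall $R$, so it stays low unless both are high. In the multiclass setting, it is computed per class $k$ and then averaged over the $K$ classes. The macro average below is one common choice. The post doesn't state which averaging produced the tables, so treat this as background rather than a description of the exact computation.

```latex
F_1 = \frac{2PR}{P + R}, \qquad
F_1^{\text{macro}} = \frac{1}{K}\sum_{k=1}^{K} \frac{2P_kR_k}{P_k + R_k}

% Worked example: P = 0.8, R = 0.5
F_1 = \frac{2(0.8)(0.5)}{0.8 + 0.5} = \frac{0.8}{1.3} \approx 0.62
```

The arithmetic mean of the same precision and recall would be 0.65. The harmonic mean pulls the score toward the weaker of the two.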

VII. Conclusion

Our LazyPredict inspired class and the analysis of various algorithms have prepared us to effectively perform sentiment analysis on Amazon reviews. By utilizing the most appropriate algorithms to analyze the review body and headlines, we can gain meaningful insights into customer sentiments and use that knowledge to improve customer satisfaction.

Hey, curious mind! We’ve come a long way on our sentiment analysis journey, but we’re not done just yet. In Part 4, we will explore more advanced algorithms in detail. I want your input on the four algorithms you’d like to see in action on the whole dataset. Drop your top picks in the comments below, and let’s unlock even more insights together. Ready to dive deeper? Let’s make it happen!


Written by Manali_Raut

Assistant Professor at MIT ADT University, Pune. Exploring the world of self-learning and machine learning. Wanna explore with me? Hold my hand & let's deep dive.
