Sentiment Analysis of Genuine Customer Feedback in Amazon Book Reviews —NLP, VADER {Part 2}

7 min read · Jun 22, 2023

In the world of online shopping, Amazon is a powerhouse that connects people and businesses across the globe. It’s not just a platform to buy and sell, but a place where customers share their experiences through detailed reviews. These reviews hold a wealth of valuable information, and that’s where sentiment analysis comes in. In this blog, we’ll continue to explore the fascinating world of sentiment analysis.

In our previous blog, we took a deep dive into exploring Amazon book reviews through an exploratory data analysis (EDA). We uncovered interesting patterns and insights about these reviews. Now, it’s time to embark on the next phase of our journey — extracting sentiments from these reviews. So, join me as we unravel the power of sentiment analysis in the world of Amazon book reviews.

Introduction
Preprocessing of Amazon Book Reviews
Natural Language Processing (NLP)
VADER Sentiment Analysis
Conclusion
Reference

Here we go again !!!🤠

1. Introduction

Sentiment analysis uses machine learning and natural language processing techniques to identify the emotions and opinions expressed in customer reviews. Unlike other platforms, where customers need a little push to share their thoughts, Amazon has created a culture of open and honest feedback. By analyzing this feedback, businesses can gain deep insights into their products, even uncovering aspects they might not have considered before.

In our journey, we are specifically focused on Amazon book reviews. Now, we will discover how sentiment analysis can help publishers, authors, and sellers understand what customers love and what they don’t. We will see real examples of how this analysis can reveal hidden insights.

2. Preprocessing of Amazon Book Reviews

Text Preprocessing refers to the initial stage of cleaning and transforming raw text data before it can be used for Natural Language Processing (NLP) tasks. It involves a series of steps that aim to remove irrelevant information, standardize the text, and prepare it for further analysis.

Common preprocessing techniques include removing punctuation, converting text to lowercase, removing stop words, stemming or lemmatizing words, and handling special characters or numerical values. By preprocessing text, we can enhance the quality of the data, improve the efficiency of NLP algorithms, and extract meaningful insights from text-based data sources.

import re

def decontracted(phrase):
    # Specific contractions that do not follow the general patterns below
    phrase = re.sub(r"won't", "will not", phrase)      # "won't" -> "will not"
    phrase = re.sub(r"can\'t", "can not", phrase)      # "can't" -> "can not"
    # General patterns
    phrase = re.sub(r"n\'t", " not", phrase)           # "n't" -> " not" (e.g. "didn't" -> "did not")
    phrase = re.sub(r"\'re", " are", phrase)           # "'re" -> " are"
    phrase = re.sub(r"\'s", " is", phrase)             # "'s" -> " is" (this also rewrites possessives)
    phrase = re.sub(r"\'d", " would", phrase)          # "'d" -> " would"
    phrase = re.sub(r"\'ll", " will", phrase)          # "'ll" -> " will"
    phrase = re.sub(r"\'t", " not", phrase)            # any remaining "'t" -> " not"
    phrase = re.sub(r"\'ve", " have", phrase)          # "'ve" -> " have"
    phrase = re.sub(r"\'m", " am", phrase)             # "'m" -> " am"
    return phrase

import re
from collections import Counter

from nltk.stem import PorterStemmer, WordNetLemmatizer
from tqdm import tqdm

# `stopwords` is assumed to be a collection of lowercase stop words defined earlier,
# for example the list at https://gist.github.com/sebleier/554280
p_stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def preprocess_text(text_data):
    preprocessed_text = []

    for sentence in tqdm(text_data):
        sent = decontracted(sentence)                  # expand contractions in each sentence
        sent = sent.replace('\\r', ' ')                # replace literal "\r" sequences with a space
        sent = sent.replace('\\n', ' ')                # replace literal "\n" sequences with a space
        sent = sent.replace('\\"', ' ')                # replace escaped double quotes with a space
        sent = re.sub('[^A-Za-z]+', ' ', sent)         # replace anything that is not a letter with a space
        # Stem, then lemmatize, word by word.
        # Stemmed forms may no longer match the entries of a sentiment lexicon.
        sent = ' '.join(p_stemmer.stem(token) for token in sent.split())
        sent = ' '.join(lemmatizer.lemmatize(token) for token in sent.split())
        # Keep only words that contain more than two distinct characters
        sent = ' '.join(e for e in sent.split() if len(Counter(e)) > 2)
        sent = ' '.join(e for e in sent.split() if e.lower() not in stopwords)  # remove stop words
        preprocessed_text.append(sent.lower().strip())
    return preprocessed_text
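
To get a feel for what these cleaning steps do, here is a minimal, self-contained sketch that runs a simplified version of them on one made-up review. The `clean_review` helper, the tiny `STOP_WORDS` set, and the sample sentence are assumptions for illustration only. The sketch skips stemming and lemmatization so that it needs nothing beyond the standard library.

```python
import re

# Tiny illustrative stop word list (an assumption, not the list used in this blog)
STOP_WORDS = {"the", "is", "it", "a", "and", "i", "this", "but"}

def clean_review(text):
    """Expand two common contractions, keep letters only, drop stop words."""
    text = re.sub(r"won't", "will not", text)    # specific contraction first
    text = re.sub(r"n't", " not", text)          # general "n't" pattern
    text = re.sub(r"[^A-Za-z]+", " ", text)      # anything that is not a letter becomes a space
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    return " ".join(words)

sample = "I didn't expect much, but I won't forget this 5-star read!"
print(clean_review(sample))
# did not expect much will not forget star read
```

Notice that the digit in "5-star" disappears along with the punctuation, which is worth keeping in mind when ratings are mentioned inside the review text.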

Word Cloud of Review Headline and Review Body


3. Natural Language Processing

Understanding and analyzing the language used by customers is no easy feat. That’s where natural language processing (NLP) comes into play. NLP, a field of artificial intelligence, equips us with the tools and techniques to extract valuable insights from textual data, unraveling the power hidden within Amazon book reviews.

3.1 Part-of-speech Tagging
Part-of-speech (POS) tagging is the process of assigning a specific grammatical category or part of speech (e.g., noun, verb, adjective, adverb) to each word in a text. It helps in determining the syntactic role and function of each word within a sentence.

3.2 Named Entity Recognition
Named Entity Recognition (NER) is a natural language processing (NLP) technique that identifies named entities in text and classifies them into predefined categories. Named entities are specific names of persons, organizations, locations, dates, quantities, and similar items that hold significance in the given context.

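import nltk

# Requires the NLTK tokenizer, POS tagger, and named-entity chunker resources (installed with nltk.download)
def pos_ner_tagging(text_list):
    tagged_sentences = []
    ner_sentences = []
    for sentence in text_list:
        tokens = nltk.word_tokenize(sentence)          # split the sentence into word tokens
        tagged = nltk.pos_tag(tokens)                  # list of (token, POS tag) pairs
        tagged_sentences.append(tagged)

        named_entities = nltk.ne_chunk(tagged)         # nltk Tree with named-entity chunks
        ner_sentences.append(named_entities)
    return tagged_sentences, ner_sentences

# cleaned_headlines and cleaned_body are assumed to be the lists of preprocessed review texts
tagged_headlines, ner_headlines = pos_ner_tagging(cleaned_headlines)
tagged_body, ner_body = pos_ner_tagging(cleaned_body)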

By performing part-of-speech tagging and named entity recognition, we gain valuable insights into the linguistic aspects of the reviews, enabling us to better understand the content and extract meaningful information.
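
One simple way to put the tags to work is to count which words appear with a given part of speech, for example the adjectives readers use most. The sketch below assumes the tagged data is a list of sentences, each a list of `(word, tag)` pairs with Penn Treebank tags, which is the shape `nltk.pos_tag` produces. The sample data and the helper name `count_words_with_tag` are made up for illustration.

```python
from collections import Counter

# Hand-written sample shaped like the output of nltk.pos_tag (illustrative, not real results)
sample_tagged_headlines = [
    [("great", "JJ"), ("story", "NN")],
    [("boring", "JJ"), ("slow", "JJ"), ("plot", "NN")],
    [("loved", "VBD"), ("great", "JJ"), ("characters", "NNS")],
]

def count_words_with_tag(tagged_sentences, tag_prefix):
    """Count the words whose POS tag starts with tag_prefix (e.g. 'JJ' for adjectives)."""
    counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            if tag.startswith(tag_prefix):
                counts[word] += 1
    return counts

adjective_counts = count_words_with_tag(sample_tagged_headlines, "JJ")
noun_counts = count_words_with_tag(sample_tagged_headlines, "NN")
print(adjective_counts.most_common(1))
# [('great', 2)]
```

Matching on a tag prefix means "NN" also picks up plural nouns tagged "NNS", which is usually what we want for a quick frequency overview.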

4. VADER Sentiment Analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a popular lexicon- and rule-based sentiment analysis tool, and it is the one we use here to analyze Amazon book reviews. It looks up each word of a text in a sentiment lexicon to get a valence score, then adjusts that score with rules that account for context, such as the presence of intensifiers or negations. By aggregating these scores, VADER can determine the overall sentiment of a review, whether it is positive, negative, or neutral.

This enables us to gain valuable insights into the sentiments expressed by customers in their book reviews, helping us understand the overall perception of the books and make data-driven decisions. VADER sentiment analysis is a powerful technique that allows us to harness the power of language and sentiments to extract meaningful information from Amazon book reviews.
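
To make the idea of "lexicon plus rules" concrete, here is a deliberately simplified toy scorer. It is not VADER's actual algorithm: the lexicon entries, the boost size, and the negation factor are all assumed values chosen for illustration, and the rules only look at the single word before each sentiment word.

```python
# Toy lexicon: words and valence values are illustrative, not VADER's real lexicon
TOY_LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.5, "boring": -1.5}
INTENSIFIERS = {"very": 0.3, "really": 0.3}   # assumed boost sizes
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALAR = -0.75                       # assumed factor that flips and dampens the valence

def toy_valence(text):
    """Sum word valences, adjusting each one by the word directly before it."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token)
        if valence is None:
            continue                          # word carries no sentiment in the toy lexicon
        previous = tokens[i - 1] if i > 0 else ""
        if previous in INTENSIFIERS:
            boost = INTENSIFIERS[previous]
            valence += boost if valence > 0 else -boost   # push further from zero
        elif previous in NEGATIONS:
            valence *= NEGATION_SCALAR        # "not good" becomes mildly negative
        total += valence
    return total

for phrase in ["good book", "very good book", "not good", "really boring plot"]:
    print(phrase, "->", round(toy_valence(phrase), 2))
```

Even this toy version shows why "not good" and "good" should not receive the same score, which is the kind of context a plain word count would miss.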

For each piece of text, VADER returns four scores: “positive,” “negative,” “neutral,” and “compound.”
a. Positive: The proportion of the text that expresses favorable or optimistic emotions. It indicates a positive perception, satisfaction, or approval of the subject being discussed.
b. Negative: The proportion of the text that expresses unfavorable or pessimistic emotions. It reflects a negative perception, dissatisfaction, or disapproval of the subject being discussed.
c. Neutral: The proportion of the text that carries no strong positive or negative emotion. It indicates a neutral or objective stance where the author does not express a clear opinion or emotion towards the subject.
d. Compound: The compound score represents the overall sentiment of a text. It combines the valence scores of the individual words and phrases into a single value, normalized to lie between -1 (most negative) and +1 (most positive).
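
The normalization behind the compound score can be sketched in a few lines. As far as I know, the reference VADER implementation squashes the summed valence x with x / sqrt(x² + alpha), using alpha = 15 by default. Treat that constant and the helper name `normalize_valence` as assumptions and check them against the version you have installed.

```python
import math

def normalize_valence(raw_score, alpha=15):
    """Squash an unbounded summed valence into the range [-1, 1]."""
    compound = raw_score / math.sqrt(raw_score * raw_score + alpha)
    return max(-1.0, min(1.0, compound))      # guard against rounding outside the range

for raw in [-8.0, -1.0, 0.0, 1.0, 4.0, 8.0]:
    print(raw, "->", round(normalize_valence(raw), 4))
```

Because of the square root, the score grows quickly for small sums and flattens out for large ones, so a review with many strong words approaches +1 or -1 without ever exceeding it.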

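from nltk.sentiment.vader import SentimentIntensityAnalyzer  # assumed import; the vaderSentiment package provides the same class

vader = SentimentIntensityAnalyzer()
# polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
amazon_books_data['headline_score'] = amazon_books_data['cleaned_review_headlines'].apply(vader.polarity_scores)
amazon_books_data['body_score'] = amazon_books_data['cleaned_review_body'].apply(vader.polarity_scores)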

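# Converting compound values into sentiments
def converting_compound_to_sentiments(compound_list):
    sentiment_list = []

    for compound in compound_list:
        if compound >= 0.05:
            sentiment_list.append("Positive")
        elif compound <= -0.05:
            sentiment_list.append("Negative")
        else:
            sentiment_list.append("Neutral")

    return sentiment_list

# Pull the compound value out of each score dictionary
headline_compound = amazon_books_data['headline_score'].apply(lambda score: score['compound'])
body_compound = amazon_books_data['body_score'].apply(lambda score: score['compound'])

sentiments_headline = converting_compound_to_sentiments(headline_compound)
sentiments_body = converting_compound_to_sentiments(body_compound)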

Now, let's look at our updated data with sentiments.

4.1 Review Headline and Review Body Sentiments

We can see that there is a huge gap between the review headline sentiments and review body sentiments.
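
One way to put a number on such a gap is to compare the two labels review by review. The sketch below uses made-up label lists, not results from the actual dataset, and the helper name `label_agreement` is hypothetical. With the real data, the inputs would be the headline and body sentiment lists.

```python
from collections import Counter

# Made-up labels for five reviews (illustrative only, not taken from the dataset)
headline_labels = ["Positive", "Neutral", "Neutral", "Positive", "Negative"]
body_labels = ["Positive", "Positive", "Positive", "Positive", "Negative"]

def label_agreement(first, second):
    """Return the share of matching labels and a count of every (first, second) pair."""
    if len(first) != len(second) or not first:
        raise ValueError("both label lists must be non-empty and of equal length")
    pairs = Counter(zip(first, second))
    matches = sum(count for (a, b), count in pairs.items() if a == b)
    return matches / len(first), pairs

rate, pair_counts = label_agreement(headline_labels, body_labels)
print(f"agreement: {rate:.0%}")
# agreement: 60%
print(pair_counts.most_common(1))
```

The pair counts show where the disagreement comes from. In this made-up sample, the mismatches are short headlines scored as neutral while the longer body text is scored as positive.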

4.2 Books with most Positive Headline Reviews

4.3 Books with most Positive Body Reviews

5. Conclusion

Wow, what an amazing journey it was!! The combination of Natural Language Processing (NLP) techniques and VADER sentiment analysis proved to be a powerful tool for understanding the sentiments expressed in Amazon book reviews. Through NLP preprocessing and tagging, we were able to extract meaningful information from the reviews, such as key words and entities, while VADER sentiment analysis allowed us to quantify the overall sentiment of each review as positive, negative, or neutral.

This analysis provides valuable information for authors, publishers, and businesses, enabling them to make data-driven decisions and improve their products and services based on customer feedback. NLP and VADER offer a unique opportunity to dive deep into the world of Amazon book reviews and uncover valuable insights that can drive success and customer satisfaction.

See you for an epic sequel in part 3!
You can find the code in Python on GitHub.
You can reach me on LinkedIn.
Stay tuned!