Customer Personality Analysis — Part 1

Manali_Raut
7 min read · Aug 3, 2022


Detailed Exploratory Data Analysis

Data science has transformed many industries, and machine learning applications are now a familiar part of our day-to-day lives. What interests me most is how machine learning can classify people based on their personality traits.

https://www.lsretail.com/hubfs/BLOG_Retail-queue-covid-times.jpg

In this article, I will walk through a data analysis of customer personalities to extract meaningful insights from a large volume of marketing campaign data. It is an attempt to understand how a person's characteristics relate to their personality traits and habits.

1. Introduction
2. Understanding the Data
3. Exploratory Data Analysis (Matplotlib, Seaborn, Pandas)
4. Exploratory Data Analysis (Dataprep.eda)
5. Conclusion

1. Introduction

Customer Personality Analysis is a detailed analysis of all of a company's customer types. It helps a business understand customer behavior, increase usage and customer satisfaction, and modify products according to customers' needs. Here I focus on the specific customers whose behavior can guide more effective marketing campaigns. Personality-based analyses like this are highly effective in increasing the popularity and attractiveness of products and services.

2. Understanding The Data

Customer personality analysis helps a business tailor its product to target customers from different customer segments. For example, instead of spending money to market a new product to every customer in the company's database, a company can analyze which customer segment is most likely to buy the product and then market the product only to that segment.

2.1 Content

2.1.1. People

ID: Customer’s unique identifier
Year_Birth: Customer’s birth year
Education: Customer’s education level
Marital_Status: Customer’s marital status
Income: Customer’s yearly household income
Kidhome: Number of children in customer’s household
Teenhome: Number of teenagers in customer’s household
Dt_Customer: Date of customer’s enrollment with the company
Recency: Number of days since customer’s last purchase
Complain: 1 if the customer complained in the last 2 years, 0 otherwise

2.1.2. Products

MntWines: Amount spent on wine in last 2 years
MntFruits: Amount spent on fruits in last 2 years
MntMeatProducts: Amount spent on meat in last 2 years
MntFishProducts: Amount spent on fish in last 2 years
MntSweetProducts: Amount spent on sweets in last 2 years
MntGoldProds: Amount spent on gold in last 2 years

2.1.3. Promotion

NumDealsPurchases: Number of purchases made with a discount
AcceptedCmp1: 1 if customer accepted the offer in the 1st campaign, 0 otherwise
AcceptedCmp2: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
AcceptedCmp3: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
AcceptedCmp4: 1 if customer accepted the offer in the 4th campaign, 0 otherwise
AcceptedCmp5: 1 if customer accepted the offer in the 5th campaign, 0 otherwise
Response: 1 if customer accepted the offer in the last campaign, 0 otherwise

2.1.4. Place

NumWebPurchases: Number of purchases made through the company’s website
NumCatalogPurchases: Number of purchases made using a catalogue
NumStorePurchases: Number of purchases made directly in stores
NumWebVisitsMonth: Number of visits to company’s website in the last month

3. Exploratory Data Analysis (Matplotlib, Seaborn, Pandas)

Let’s look at our data.

The data looks good so far. The first thing I did was check for missing values.

The Income column had 24 missing values, so I filled them with the column median.
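The check-and-fill step can be sketched with pandas as follows. This is a minimal illustration on a made-up five-row table, not the actual dataset or the exact code behind this article.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (values are made up for illustration).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Income": [58000.0, np.nan, 71000.0, 26000.0, np.nan],
})

# Count missing values per column before any cleaning.
missing_before = df.isna().sum()

# Fill missing incomes with the median of the observed incomes.
income_median = df["Income"].median()
df["Income"] = df["Income"].fillna(income_median)
```

The median is less sensitive to a few very high incomes than the mean, which is a common reason to prefer it for income data.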

Next, since there is a birth year column, I converted birth year to age, using 2022 as the reference year.
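A minimal sketch of the age conversion, assuming a new column called Age (the column name and the sample birth years are my assumptions for illustration):

```python
import pandas as pd

REFERENCE_YEAR = 2022  # the year used in this article to compute current age

# Toy birth years, made up for illustration.
df = pd.DataFrame({"Year_Birth": [1957, 1984, 1996]})

# Age is simply the reference year minus the birth year.
df["Age"] = REFERENCE_YEAR - df["Year_Birth"]
```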

For each customer, I summed up total expenses, total number of purchases, total accepted campaigns, and total children at home.
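These row-wise totals could be built roughly like this. The new column names (Total_Expenses, Total_Purchases, Total_Accepted, Total_Kids) are hypothetical, and which source columns go into each total is my assumption: here purchases cover web, catalog, and store, and accepted campaigns cover AcceptedCmp1 to AcceptedCmp5. The two rows of data are made up.

```python
import pandas as pd

# Two made-up customers with the columns described in section 2.1.
df = pd.DataFrame({
    "MntWines": [635, 11], "MntFruits": [88, 1], "MntMeatProducts": [546, 6],
    "MntFishProducts": [172, 2], "MntSweetProducts": [88, 1], "MntGoldProds": [88, 6],
    "NumWebPurchases": [8, 1], "NumCatalogPurchases": [10, 1], "NumStorePurchases": [4, 2],
    "AcceptedCmp1": [1, 0], "AcceptedCmp2": [0, 0], "AcceptedCmp3": [1, 0],
    "AcceptedCmp4": [0, 0], "AcceptedCmp5": [0, 0],
    "Kidhome": [0, 1], "Teenhome": [0, 1],
})

# Column groups (assumed grouping, see the note above).
expense_cols = ["MntWines", "MntFruits", "MntMeatProducts",
                "MntFishProducts", "MntSweetProducts", "MntGoldProds"]
purchase_cols = ["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]
campaign_cols = [f"AcceptedCmp{i}" for i in range(1, 6)]
kid_cols = ["Kidhome", "Teenhome"]

# axis=1 sums across the columns of each row, giving one total per customer.
df["Total_Expenses"] = df[expense_cols].sum(axis=1)
df["Total_Purchases"] = df[purchase_cols].sum(axis=1)
df["Total_Accepted"] = df[campaign_cols].sum(axis=1)
df["Total_Kids"] = df[kid_cols].sum(axis=1)
```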

Then I changed the values of the Marital_Status column.
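The text above does not spell out which values were changed, so the following is only one possible recoding: collapsing the statuses into two hypothetical groups, "In couple" and "Alone". Both the group labels and the mapping are assumptions for illustration.

```python
import pandas as pd

# Toy values, made up for illustration.
df = pd.DataFrame({
    "Marital_Status": ["Married", "Together", "Single", "Divorced", "Widow"],
})

# Hypothetical mapping: the actual recoding used in the analysis may differ.
status_map = {
    "Married": "In couple",
    "Together": "In couple",
    "Single": "Alone",
    "Divorced": "Alone",
    "Widow": "Alone",
}

# replace() leaves any value that is not in the mapping unchanged.
df["Marital_Status"] = df["Marital_Status"].replace(status_map)
```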

Using the customer ID column, I checked for duplicate records.
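A duplicate check on the ID column might look like this sketch. The IDs and incomes are made up, and dropping the later copy is my assumption about how duplicates would be handled if any were found.

```python
import pandas as pd

# Toy data with one repeated customer ID (values made up for illustration).
df = pd.DataFrame({
    "ID": [5524, 2174, 4141, 2174],
    "Income": [58138.0, 46344.0, 71613.0, 46344.0],
})

# duplicated() flags every repeat of an ID after its first occurrence.
n_duplicates = int(df.duplicated(subset="ID").sum())

# Keep the first record for each ID and renumber the rows.
df = df.drop_duplicates(subset="ID", keep="first").reset_index(drop=True)
```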

Now, let's check the data information again.


3.1 Data Visualization

As we can see from the income graph, most customers have an income between 30,000 and 80,000.

According to the Age column, most customers are between 44 and 57 years old.
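Distribution plots like these can be drawn with plain Matplotlib histograms. The sketch below uses randomly generated stand-in data, so its shapes will not match the real charts; the bin count and figure size are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Randomly generated stand-in data (not the real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Income": np.clip(rng.normal(52000, 20000, 200), 2000, None),
    "Age": rng.integers(26, 80, 200),
})

# One histogram per variable, side by side.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
income_counts, _, _ = axes[0].hist(df["Income"], bins=20)
axes[0].set_xlabel("Income")
axes[0].set_ylabel("Number of customers")
age_counts, _, _ = axes[1].hist(df["Age"], bins=20)
axes[1].set_xlabel("Age")
fig.tight_layout()
```

Call plt.show() to display the figure in a script, or let a notebook render it automatically.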

Looking at total expenses by product, wine is the best-selling product.
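One way to compare products is to sum each spending column over all customers and sort the result. The sketch below uses two made-up customers, so the totals are illustrative only.

```python
import pandas as pd

# Two made-up customers (values for illustration only).
df = pd.DataFrame({
    "MntWines": [635, 11], "MntFruits": [88, 1], "MntMeatProducts": [546, 6],
    "MntFishProducts": [172, 2], "MntSweetProducts": [88, 1], "MntGoldProds": [88, 6],
})

# Total spend per product category across all customers, highest first.
product_totals = df.sum().sort_values(ascending=False)
best_seller = product_totals.index[0]
```

product_totals.plot(kind="bar") would turn the same series into a bar chart.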

Income is most strongly correlated with Total Expenses, followed by Total Purchases. Total Expenses and Total Purchases are also correlated with each other.
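A correlation matrix of this kind can be computed with pandas. The sketch below uses five made-up customers and hypothetical column names (Total_Expenses, Total_Purchases), so the numbers it produces are not the ones from the real data.

```python
import pandas as pd

# Five made-up customers (values for illustration only).
df = pd.DataFrame({
    "Income": [20000, 35000, 50000, 65000, 80000],
    "Total_Expenses": [50, 200, 600, 1100, 1700],
    "Total_Purchases": [4, 8, 14, 19, 24],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Pull out the single pair discussed in the text.
income_vs_expenses = corr.loc["Income", "Total_Expenses"]
```

To draw the matrix as a heatmap, seaborn.heatmap(corr, annot=True) is a common choice.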

4. Exploratory Data Analysis (Dataprep.eda)

Exploratory Data Analysis (EDA) is the process of exploring a dataset and gaining insight into its main characteristics. The dataprep.eda package simplifies this process by letting the user explore important characteristics with simple APIs. Each API allows the user to analyze the dataset from a high level to a low level, and from different perspectives.

I used only one API, create_report, which generates a report from a pandas DataFrame. The report includes an overview, per-variable details, quantiles and descriptive statistics, correlations, missing values, and more.
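Calling create_report takes only a couple of lines. In the sketch below, build_report is a hypothetical helper name, the DataFrame is made up, and the import sits inside the function so the rest of the script still runs when dataprep is not installed.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (values made up for illustration).
df = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Master", "Graduation"],
    "Income": [58000.0, np.nan, 71000.0, 26000.0],
    "MntWines": [635, 11, 426, 11],
})


def build_report(frame: pd.DataFrame):
    """Build a dataprep EDA report for a pandas DataFrame."""
    # Imported lazily: dataprep is only needed when a report is requested.
    from dataprep.eda import create_report

    return create_report(frame)
```

With dataprep installed, report = build_report(df) followed by report.show_browser() opens the report in a browser tab, and report.save("customer_report") writes it to an HTML file (the file name here is just an example).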

The report gives a clear overview of the whole dataset, showing the 24 missing values and that almost all variables are skewed.

Here, I have shown all the insights for the Education column. create_report gave me the same level of detail for every column in the dataset, which made the data far easier to understand.

This scatterplot shows the relationship between income and spending on wine. Customers in a specific income range are regular buyers of wine.

create_report provides three kinds of correlation coefficient. The one shown in the image above is Spearman.
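To see how Spearman differs from the default Pearson coefficient, here is a small pandas sketch on made-up data. Spearman works on ranks, so it equals 1 whenever one variable always increases with the other, even if the relationship is not a straight line.

```python
import pandas as pd

# Made-up data: wine spending always rises with income, but not linearly.
df = pd.DataFrame({
    "Income": [20000, 35000, 50000, 65000, 80000],
    "MntWines": [5, 20, 150, 600, 1500],
})

# Pearson measures linear association; Spearman measures monotonic association.
pearson = df.corr(method="pearson").loc["Income", "MntWines"]
spearman = df.corr(method="spearman").loc["Income", "MntWines"]
```

On this toy data Spearman is exactly 1 while Pearson is lower, because the curve bends upward.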

The bar chart of all variables makes the missing values in the Income column easy to spot.

5. Conclusion

This was an attempt to gain insight into how a person's characteristics relate to their personality traits and habits. To summarize my findings: I found and replaced 24 missing values in the Income column; Income is correlated with Total Expenses; there is no correlation between year of birth and the amount spent on wine; and wine has the most buyers, who are mostly graduates with an average income. As for dataprep, I am amazed!! With these demographics in hand, we will be able to make predictions with the help of algorithms.

Phew!!! That was a huge amount of work ;) I am gonna grab some tea and play any song by BTS. See you in Part 2.

You can find the Python code on GitHub.
You can reach me on LinkedIn.
Stay tuned!


Manali_Raut

Assistant Professor at MIT ADT University, Pune. Exploring Self Learning & Machine Learning World. Wanna explore with me ? Hold a hand & let's deep dive.