Customer Personality Analysis — Part 2
Playing with Different Clustering Techniques
Nowadays, with the help of machine learning models, it has become much easier to research and observe human behavior. So it's time to learn the highly complex relationships in Customer Personality Analysis and evaluate the generalizability and robustness of different clustering techniques.
As we studied the personality traits in part 1, we already understand the data well. Hence, the next step is to apply different clustering algorithms and measure their performance.
1. Mapping ML Problem
2. K-Means Clustering
3. Agglomerative Clustering
4. GaussianMixture
5. Association Rule Mining
6. Applications, Advantages & Disadvantages
7. Conclusion
8. References
1. Mapping ML Problem
The core of this case study is to perform customer segmentation with the help of different clustering algorithms. Each of these algorithms has unique strengths and weaknesses, so we will study which of them produce more natural clusters for this data.
1.1 Performance Measures
1.1.1. Silhouette Score
The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample.
Silhouette Coefficient for a sample = (b - a) / max(a, b)
where a is the mean distance between a sample and the other points in its own cluster, and b is the mean distance between a sample and the points in the nearest cluster that the sample is not a part of. Note that the Silhouette Coefficient is only defined if 2 <= n_labels <= n_samples - 1.
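As a quick sanity check on the formula, here is a minimal sketch of computing the score with scikit-learn; the two-blob toy data is just a placeholder, not the case-study data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two well-separated blobs as a stand-in for the customer features
rng = np.random.RandomState(42)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# Mean of (b - a) / max(a, b) over all samples
print("silhouette:", silhouette_score(X, labels))
```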
1.1.2. Support and Confidence
I have explained them in detail here.
2. K-Means Clustering Algorithm
K-means is an unsupervised machine learning algorithm that splits a dataset into K non-overlapping clusters or subgroups. For example, if k = 2 there will be two clusters; if k = 3, three clusters; and so on. K-means is a convenient way to discover the groups in an unlabeled dataset by minimizing the sum of distances between each data point and its corresponding cluster centroid.
The algorithm works as follows:
- First, initialize k cluster centers, called means or centroids, e.g. at random.
- Then assign each item to its closest mean and update that mean's coordinates, which are the averages of the items assigned to the cluster so far.
- Repeat the process for a given number of iterations (or until the assignments stop changing); the resulting clusters are the output.
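Just to make the loop concrete, here is a from-scratch sketch of these steps (Lloyd's algorithm) in NumPy; in practice, scikit-learn's KMeans does the same thing with smarter initialization:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=42):
    rng = np.random.RandomState(seed)
    # Step 1: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # converged early
            break
        centroids = new
    return labels, centroids

X = np.random.RandomState(0).rand(50, 2)  # toy data
labels, centroids = kmeans(X, k=2)
```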
I have used the elbow method to see how many clusters we can form.
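A minimal sketch of the elbow method with scikit-learn follows; the make_blobs toy data stands in for the preprocessed customer features:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=2, random_state=42)  # stand-in data

inertias = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

plt.plot(ks, inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.title("Elbow method")
plt.show()
```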
Based on the elbow method, 2 looks like the best number of clusters. Now let's group the data into those clusters.
- Cluster 0: looks like a higher class than cluster 1: income is higher; expenses, purchases, and campaign responses are higher; and it has fewer kids compared to cluster 1.
- Cluster 1: looks like a middle or lower class compared to cluster 0: income is lower; expenses, purchases, and campaign responses are lower; and it has more kids compared to cluster 0.
Now we can see the customer segmentation with Income and total purchases.
Now, the silhouette score is,
3. Agglomerative Clustering
Agglomerative Clustering, one of the hierarchical clustering types, is a bottom-up strategy in which each data point initially forms its own cluster, and pairs of clusters are merged as one moves up the hierarchy: at each step, the two nearest clusters are joined into a single cluster.
It worked much the same as K-Means: it also formed 2 clusters, and the customer segmentation by Income and total purchases looks the same as well.
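For reference, a minimal sketch of this step with scikit-learn, again on stand-in data rather than the case-study features:

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=2, random_state=42)  # stand-in data

# Bottom-up merging with Ward linkage (minimizes within-cluster variance)
labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)

print("silhouette:", silhouette_score(X, labels))
```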
The silhouette score is,
4. GaussianMixture
It is quite natural and intuitive to assume that the clusters come from different Gaussian distributions (univariate or multivariate) with varying parameters for mean, covariance, and density. Gaussian Mixture models are fit with Expectation-Maximization (EM). Given the number of clusters, the EM algorithm estimates the parameters of these Gaussian distributions through the following steps.
- In the E step, each data point is assigned a probability of belonging to each Gaussian cluster, based on the current parameter estimates.
- In the M step, the mean, covariance, and mixing weight of each cluster are re-estimated from the data points, weighted by those probabilities.
- The two steps are repeated, with the estimates continually updated, until convergence is reached.
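A minimal sketch of fitting a Gaussian Mixture with scikit-learn; note the soft (probabilistic) assignments, which is what distinguishes it from K-Means. The make_blobs data is a placeholder:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # stand-in data

# fit() runs EM: the E step computes soft responsibilities, the M step
# re-estimates means, covariances, and mixing weights
gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=42).fit(X)

hard = gmm.predict(X)        # most likely cluster per point
soft = gmm.predict_proba(X)  # per-cluster membership probabilities
print(soft[:3].round(3))
```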
Often, the best way to find an appropriate cluster number is to try several and see which fits your data best. The two popular evaluation metrics for picking the number of clusters for a Gaussian Mixture model are BIC and AIC: BIC stands for Bayesian information criterion and AIC for Akaike information criterion. Both metrics favor the simplest model that maximizes the likelihood of the data.
BIC and AIC are meant to be minimized, so I would pick 3 as the most appropriate cluster number for this data, as the chart really levels off after that.
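Here is a sketch of how such a chart can be produced with scikit-learn's built-in bic() and aic() methods, again on placeholder data:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # stand-in data

ks = range(1, 11)
bics, aics = [], []
for k in ks:
    gmm = GaussianMixture(n_components=k, random_state=42).fit(X)
    bics.append(gmm.bic(X))  # BIC penalizes extra parameters more heavily
    aics.append(gmm.aic(X))

plt.plot(ks, bics, marker="o", label="BIC")
plt.plot(ks, aics, marker="s", label="AIC")
plt.xlabel("number of components")
plt.ylabel("criterion (lower is better)")
plt.legend()
plt.show()
```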
The silhouette score is,
5. Association Rule Mining
This is my third time working with Association Rule Mining. Now, it’s serious between us :D !!
"Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness." (Wikipedia, 2021)
I have already given an example here of how it works; you will also find Apriori there. Apriori takes input data in the form of 1/0 or True/False values, so I first changed the marketing data into the following format.
And then into something like this.
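As a rough sketch of this kind of transformation: the thresholding scheme below is my own illustrative assumption, not necessarily the exact one used above, though the MntWines/MntMeatProducts column names do come from the Kaggle dataset:

```python
import pandas as pd

# Toy rows standing in for the marketing data; MntWines/MntMeatProducts
# are spending columns from the Kaggle dataset
df = pd.DataFrame({
    "MntWines": [635, 11, 426, 11],
    "MntMeatProducts": [546, 6, 127, 20],
})

# One simple scheme (an assumption): flag customers whose spend on a
# product exceeds the column median, giving the True/False table Apriori needs
onehot = (df > df.median()).astype(bool)
print(onehot)
```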
Now, by applying Apriori, I got the frequent itemsets.
and association rules are formed.
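For readers following along, here is a minimal sketch of this step with mlxtend's apriori and association_rules; the tiny boolean table is a stand-in, not the actual marketing data:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Tiny boolean table standing in for the one-hot marketing data
onehot = pd.DataFrame({
    "Wines":  [True, True, False, True, True],
    "Meat":   [True, False, False, True, True],
    "Fruits": [False, True, True, False, True],
})

# Frequent itemsets above a minimum support, then rules above a
# minimum confidence
frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)

# Itemsets come back as frozensets, each rule with support and confidence
print(rules[["antecedents", "consequents", "support", "confidence"]])
```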
Wow!! The results are amazing. It gave me frozensets with support and confidence values. Now it's time to check for the biggest customers of wines :D
6. Applications, Advantages & Disadvantages
7. Conclusion
Observing all the algorithms' performance, the clustering mostly forms 2 groups. Cluster 0 is high class, with high income and high expenses, mostly having 1 kid. Cluster 1 is not so high class, or we can say middle class, with low income and low expenses. Customers' most-bought products are wines and meat. This is not the only way to turn this analysis into business decisions, as I limited the number of features; still, I find cluster 0 the best target for recommending products and making the marketing effective. In the future, I would like to explore these traits in more depth across the various relationships among features, as well as incorporate different clustering techniques and improve their measures.
You can find the code in python on Github.
You can reach me on LinkedIn.
Stay tuned!
8. References
- https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis
- https://docs.dataprep.ai/user_guide/eda/introduction.html
- https://medium.com/@andhikaw.789/customer-personality-analysis-segmentation-clustering-1b68a62a61a2
- https://towardsdatascience.com/gaussian-mixture-models-for-clustering-3f62d0da675
- https://nycdatascience.com/blog/python/data-based-customer-personality-analysis-for-businesses/
- https://www.cobuildlab.com/blog/ai/customer-personality-analysis-and-machine-learning-introduction
- https://github.com/arienugroho050396/Customer-Personality-Analysis/blob/main/Customer%20Personality%20Analysis%20Fix.ipynb