swechchhapatel/Customer-segmentation-analysis

Customer Segmentation Analysis

Python Version scikit-learn Kaggle License

✦ Project Objective

The goal of this project is to analyze a retail customer dataset and divide the customer base into distinct groups based on their purchasing behavior and personal data. By understanding these segments, businesses can optimize targeted marketing strategies, improve customer retention, and maximize revenue.

✦ Dataset

The data used in this project is the Mall Customer Segmentation Data sourced from Kaggle. It contains synthetic data on supermarket customers designed specifically for learning clustering algorithms.

  • Features used for clustering:
    • Age
    • Annual Income (k$)
    • Spending Score (1-100): A proprietary score assigned to customers based on purchasing behavior and history.
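A minimal sketch of selecting the three clustering features with pandas. The column names follow the feature list above; the `load_features` helper and the inline sample rows are illustrative assumptions, not code or data from this repository.

```python
import io

import pandas as pd

# Column names as listed in the feature list above.
FEATURES = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]


def load_features(source):
    """Read the CSV and return only the clustering columns (hypothetical helper)."""
    df = pd.read_csv(source)
    missing = [c for c in FEATURES if c not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    # Drop rows with a missing value in any clustering feature.
    return df[FEATURES].dropna()


# Made-up sample rows in the same layout, so the sketch runs without the CSV.
SAMPLE = io.StringIO(
    "CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)\n"
    "1,Male,19,15,39\n"
    "2,Female,35,78,88\n"
    "3,Female,52,60,\n"
)

X = load_features(SAMPLE)
print(X.shape)
```

With the real file, the same helper would presumably be called as `load_features("data/Mall_Customers.csv")`.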

✦ Project Structure

customer-segmentation/
│
├── data/
│   └── Mall_Customers.csv         # The dataset
│
├── notebooks/
│   └── eda_and_clustering.ipynb  
│
├── src/
│   ├── eda.py                     # Exploratory Data Analysis
│   ├── kmeans_2d.py               # 2D Clustering (Income vs Spending)
│   └── kmeans_3d.py               # 3D Clustering (Age, Income, Spending)
│
├── visuals/                       
│   ├── elbow_curve.png
│   ├── 3d Scatterplot of Age vs Annual Income vs Spending Score
│   ├── Cluster of Spending Score vs Annual Income
│   ├── Distribution of Age
│   ├── Distribution of Annual Income
│   ├── Distribution of Spending Score
│   ├── Gender Distribution
│   └── Scatterplot of the clusters of Spending Score vs Annual Income
│
├── requirements.txt              
└── README.md                      

✦ Methodology

This project utilizes K-Means Clustering, an unsupervised machine learning algorithm.

  1. Exploratory Data Analysis (EDA): Visualized the distributions of age, annual income, spending score, and gender.
  2. Determining Optimal Clusters: Used the Elbow Method (plotting the Within-Cluster Sum of Squares against the number of clusters) to choose the number of clusters (k=5).
  3. 2D Clustering: Grouped customers based on Annual Income and Spending Score.
  4. 3D Clustering: Added 'Age' as a third dimension for a more granular segmentation.
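The Elbow Method step can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/`: the `elbow_wcss` helper is hypothetical, and the data is a synthetic stand-in (five blobs in income/spending space) rather than the Kaggle file. scikit-learn exposes the Within-Cluster Sum of Squares of a fitted model as `inertia_`.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_wcss(X, k_values, seed=42):
    """Fit K-Means for each k and return the WCSS (scikit-learn's inertia_)."""
    wcss = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        model.fit(X)
        wcss.append(model.inertia_)
    return wcss


# Synthetic stand-in data: five well-separated blobs in (income, spending) space.
rng = np.random.default_rng(0)
centers = np.array([[20, 20], [20, 80], [55, 50], [90, 20], [90, 80]])
X = np.vstack([c + rng.normal(scale=3.0, size=(40, 2)) for c in centers])

k_values = range(1, 9)
wcss = elbow_wcss(X, k_values)
for k, value in zip(k_values, wcss):
    print(f"k={k}: WCSS={value:.1f}")
```

On data like this, the WCSS drops steeply up to k=5 and flattens afterwards; that bend is the "elbow". Passing a three-column array (Age, Annual Income, Spending Score) to the same function would cover the 3D case.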

✦ Key Insights & Customer Segments

Based on the K-Means algorithm, we successfully grouped the customer base into 5 distinct segments:

  • Cluster 0 (Target Customers): High Income, High Spending Score. (Prime targets for premium marketing)
  • Cluster 1 (Careful Customers): High Income, Low Spending Score. (Target for retention and discount campaigns to boost spending)
  • Cluster 2 (Standard Customers): Average Income, Average Spending Score. (The bulk of the customer base)
  • Cluster 3 (Sensible Customers): Low Income, Low Spending Score.
  • Cluster 4 (Careless Customers): Low Income, High Spending Score.
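K-Means assigns cluster indices arbitrarily, so the numbering above can change between runs. One way to keep the segment names stable is to derive them from each centroid's income and spending level. The sketch below assumes this approach; the `name_segment` helper, the cut-off values, and the example centroids are all hypothetical and not taken from this project's results.

```python
def name_segment(income, spending, income_cut=(40, 70), spending_cut=(40, 60)):
    """Map a centroid (income in k$, spending score) to a segment name.

    The cut-offs are illustrative assumptions, not values from this project.
    """

    def level(value, cuts):
        low, high = cuts
        if value < low:
            return "low"
        if value > high:
            return "high"
        return "average"

    # (income level, spending level) -> segment name used in the list above.
    names = {
        ("high", "high"): "Target",
        ("high", "low"): "Careful",
        ("average", "average"): "Standard",
        ("low", "low"): "Sensible",
        ("low", "high"): "Careless",
    }
    key = (level(income, income_cut), level(spending, spending_cut))
    return names.get(key, "Unlabeled")


# Hypothetical centroids (income k$, spending score), for illustration only.
centroids = [(86, 82), (88, 17), (55, 50), (26, 21), (26, 79)]
labels = [name_segment(i, s) for i, s in centroids]
print(labels)
```

Centroids that fall outside the five combinations come back as "Unlabeled", which makes an unexpected clustering result easy to spot.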

3D Cluster View

[Image: 3D scatter plot of the clusters (Age vs Annual Income vs Spending Score)]

✦ Installation & Setup

1. Clone the repository

git clone https://github.com/swechchhapatel/Customer-segmentation-analysis.git
cd Customer-segmentation-analysis

2. Install dependencies

Make sure you have Python installed, then run:

pip install -r requirements.txt

3. Run the scripts

To view the Exploratory Data Analysis:

python src/eda.py

To run the 2-Dimensional Clustering model:

python src/kmeans_2d.py

To run the 3-Dimensional Clustering model and print the customer groups:

python src/kmeans_3d.py

✦ Contribution

Contributions are welcome and appreciated!

If you'd like to improve this project, please follow these steps:

  1. Fork the repository
  2. Create a new branch
    git checkout -b feature/your-feature-name
  3. Make changes and commit
    git commit -m "Add: your message"
  4. Push to your branch and open a Pull Request
  • Feel free to improve features, UI, or model performance.

Made with ❤️

About

Retail customer segmentation using K-Means clustering in Python (EDA, 2D & 3D visualizations) which identifies distinct shopper profiles to help optimize targeted marketing strategies.
