This project analyzes the Credit Card Dataset for Clustering.
The workflow is divided into two main parts:
- Unsupervised Learning (K-Means Clustering)
  - Segmented customers based on their credit card usage patterns.
  - Achieved a silhouette score of 0.32 🎯
- Supervised Learning (Random Forest Classification)
  - Built a classification model on the clustered dataset to predict cluster labels.
  - Achieved 98.88% accuracy 🎯
- Source: Credit Card Dataset for Clustering (Kaggle)
- Description: Contains anonymized credit card usage data for customers.
- Features:
  - TENURE
  - BALANCE_RANGE
  - PURCHASES_RANGE
  - ONEOFF_PURCHASES_RANGE
  - INSTALLMENTS_PURCHASES_RANGE
  - CASH_ADVANCE_RANGE
  - CREDIT_LIMIT_RANGE
  - PAYMENTS_RANGE
  - MINIMUM_PAYMENTS_RANGE
  - BALANCE_FREQUENCY_RANGE
  - PURCHASES_FREQUENCY_RANGE
  - ONEOFF_PURCHASES_FREQUENCY_RANGE
  - PURCHASES_INSTALLMENTS_FREQUENCY_RANGE
  - CASH_ADVANCE_FREQUENCY_RANGE
  - PRC_FULL_PAYMENT_RANGE
  - PURCHASES_TRX_RANGE
  - CASH_ADVANCE_TRX_RANGE
- Handled missing values.
- Standardized numerical features.
- Prepared dataset for clustering.
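
The preprocessing steps above could look roughly like the sketch below. It is not the project's actual code: the toy frame, the column names `feature_a` to `feature_c`, and the median imputation strategy are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame standing in for the real dataset; values are made up.
df = pd.DataFrame({
    "feature_a": [40.9, 3202.5, 2495.1, np.nan, 817.7],
    "feature_b": [95.4, 0.0, 773.2, 1499.0, 16.0],
    "feature_c": [1000.0, 7000.0, np.nan, 7500.0, 1200.0],
})

# Handle missing values: fill each gap with the column median (assumed strategy).
imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(df)

# Standardize numerical features to zero mean and unit variance,
# so that no single feature dominates the Euclidean distances used by K-Means.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_imputed)

print(X_scaled.shape)
```

Standardizing before clustering matters because K-Means relies on Euclidean distance, which is sensitive to feature scale.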
- Chose the optimal K using the Elbow method.
- Segmented customers into clusters.
- Visualized clusters for interpretation.
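
As a rough illustration of the Elbow method and the final clustering step, the sketch below computes the within-cluster sum of squares (WCSS) for a range of K values and then fits K-Means with one chosen K. The synthetic data from `make_blobs`, the K range of 1 to 10, and the choice of `chosen_k = 4` are assumptions, not values taken from the project.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the standardized customer features (assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=42)

# Elbow method: record WCSS (exposed by scikit-learn as inertia_) for each K.
k_values = list(range(1, 11))
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    wcss.append(model.inertia_)

# Plotting wcss against k_values shows where the curve flattens (the "elbow").
for k, value in zip(k_values, wcss):
    print(f"K={k}: WCSS={value:.1f}")

# Fit the final model with the K read off the elbow (hypothetical choice here).
chosen_k = 4
labels = KMeans(n_clusters=chosen_k, n_init=10, random_state=42).fit_predict(X)
```

The elbow is usually read off a plot by eye, so the chosen K remains a judgment call rather than a computed optimum.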
- Used K-Means cluster assignments as pseudo-labels.
- Trained a Random Forest Classifier to predict cluster membership.
- Evaluated performance with accuracy and classification report.
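
The pseudo-labeling approach described above can be sketched as follows. The synthetic data, the 80/20 split, the stratification, and the hyperparameters are assumptions for illustration; the project's own settings may differ.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the standardized customer features (assumption).
X, _ = make_blobs(n_samples=500, centers=4, n_features=5, random_state=0)

# Use K-Means cluster assignments as pseudo-labels.
pseudo_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Hold out a test set, keeping cluster proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, pseudo_labels, test_size=0.2, stratify=pseudo_labels, random_state=0
)

# Train a Random Forest to predict cluster membership from the features.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Evaluate with accuracy and a per-class precision/recall/F1 report.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print(report)
```

Because the pseudo-labels are derived from the same features the classifier sees, the model is effectively learning the K-Means cluster boundaries, which is useful for assigning new customers to existing segments.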
- Silhouette Score for Clustering: 0.32 ✅
- Random Forest Accuracy: 98.88% ✅
- Classification Report:
  - Precision, Recall, and F1-score all ≈ 0.99.
  - Strong evidence that clusters are well-separated and predictable.
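
For reference, a silhouette score like the one reported above can be computed with scikit-learn as in this minimal sketch. The synthetic data and the choice of 4 clusters are assumptions, so the printed value will not match the project's 0.32.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the standardized customer features (assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=1)
labels = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)

# The silhouette score ranges from -1 to 1; higher values mean points sit
# closer to their own cluster than to the nearest neighboring cluster.
score = silhouette_score(X, labels)
print(f"Silhouette score: {score:.2f}")
```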
Determining the optimal number of clusters using the WCSS (Elbow) technique.
Visualization of the dataset after applying K-Means clustering.

