Credit Risk Intelligence Engine with Fairness Constraints & Model Explainability
This project builds an end-to-end credit default prediction system using the Give Me Some Credit Kaggle dataset. Beyond raw predictive performance, the notebook integrates a full model explainability pipeline (SHAP + LIME) and a fairness auditing framework (IBM AIF360) to ensure equitable lending decisions across demographic groups.
The system is designed to answer four core business questions:
- Which customer characteristics are most strongly associated with credit default?
- Can machine learning models reliably identify high-risk applicants before default occurs?
- What are the primary drivers of default risk according to the model?
- How should prediction thresholds be selected to balance risk detection and customer approval rates?
Source: Give Me Some Credit — Kaggle Competition
The dataset contains financial and behavioural attributes of borrowers, where each row represents a credit applicant. The target variable is `SeriousDlqin2yrs` (renamed `Defaulted`): whether the person experienced financial distress within two years.
| Original Column | Simple Name | Description |
|---|---|---|
| `SeriousDlqin2yrs` | Defaulted | Target: 1 if financial distress within 2 years |
| `RevolvingUtilizationOfUnsecuredLines` | Revolving Utilization | Credit card balance as % of credit limit |
| `age` | Age | Borrower's age in years |
| `NumberOfTime30-59DaysPastDueNotWorse` | Past Due 30–59 Days | Times 30–59 days late on payment |
| `DebtRatio` | Debt Ratio | Monthly debt payments / monthly gross income |
| `MonthlyIncome` | Monthly Income | Borrower's monthly income |
| `NumberOfOpenCreditLinesAndLoans` | Open Credit Lines | Number of open credit lines and loans |
| `NumberOfTimes90DaysLate` | Past Due 90+ Days | Times 90+ days late |
| `NumberRealEstateLoansOrLines` | Real Estate Loans | Number of real estate loans |
| `NumberOfTime60-89DaysPastDueNotWorse` | Past Due 60–89 Days | Times 60–89 days late |
| `NumberOfDependents` | Dependents | Number of dependents |
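Loading the Kaggle CSV and applying the renaming above might look like the following sketch. The filename `cs-training.csv` is the conventional name of the Kaggle download and is an assumption, not confirmed by the notebook.

```python
import pandas as pd

# Rename mapping follows the "Simple Name" column in the table above.
RENAME_MAP = {
    "SeriousDlqin2yrs": "Defaulted",
    "RevolvingUtilizationOfUnsecuredLines": "Revolving Utilization",
    "age": "Age",
    "NumberOfTime30-59DaysPastDueNotWorse": "Past Due 30-59 Days",
    "DebtRatio": "Debt Ratio",
    "MonthlyIncome": "Monthly Income",
    "NumberOfOpenCreditLinesAndLoans": "Open Credit Lines",
    "NumberOfTimes90DaysLate": "Past Due 90+ Days",
    "NumberRealEstateLoansOrLines": "Real Estate Loans",
    "NumberOfTime60-89DaysPastDueNotWorse": "Past Due 60-89 Days",
    "NumberOfDependents": "Dependents",
}

def load_credit_data(path: str = "cs-training.csv") -> pd.DataFrame:
    # First CSV column is an unnamed row index in the Kaggle file
    df = pd.read_csv(path, index_col=0)
    return df.rename(columns=RENAME_MAP)
```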
credit_risk_intelligence_engine.ipynb
│
├── 1. Business Questions & Project Scope
├── 2. Library Imports
├── 3. Data Loading & Exploration
├── 4. Data Preprocessing & Feature Engineering
├── 5. Exploratory Data Analysis (EDA)
├── 6. Statistical Testing for Key Risk Drivers
├── 7. Model Development
│ ├── Baseline: Logistic Regression
│ ├── Ensemble: Random Forest
│ └── Primary: XGBoost (Gradient Boosting)
├── 8. Model Evaluation
│ ├── Confusion Matrix
│ ├── ROC & Precision-Recall Curves
│ └── Model Comparison Summary
├── 9. Model Explainability
│ ├── SHAP (Global Feature Importance)
│ └── LIME (Individual Prediction Explanations)
├── 10. Fairness Auditing (IBM AIF360)
│ ├── Pre-Mitigation Bias Assessment
│ ├── Reject Option Classification (ROC) Mitigation
│ └── Post-Mitigation Fairness Audit
├── 11. Final Performance Summary
└── 12. Save Model & Artifacts
- Missing value imputation using a median strategy (`SimpleImputer`)
- Feature scaling with `StandardScaler`
- Class imbalance handling via `scale_pos_weight` (XGBoost) and `class_weight='balanced'` (scikit-learn models)
- Engineered aggregate feature: Total Past Due (sum of all delinquency counts)
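The preprocessing steps above can be sketched as a single function. Column names assume the renamed schema; this is an illustration of the described steps, not the notebook's exact code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame):
    df = df.copy()
    # Engineered aggregate feature: sum of all delinquency counts
    df["Total Past Due"] = (
        df["Past Due 30-59 Days"]
        + df["Past Due 60-89 Days"]
        + df["Past Due 90+ Days"]
    )
    y = df.pop("Defaulted")
    # Median imputation for missing values (e.g. Monthly Income, Dependents)
    imputer = SimpleImputer(strategy="median")
    X = imputer.fit_transform(df)
    # Standardize features to zero mean / unit variance
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    return X, y.values, imputer, scaler
```

The fitted `imputer` and `scaler` are returned so the identical transform can be reused at inference time (the notebook saves the scaler as an artifact).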
Mann-Whitney U tests and KS tests are applied to confirm statistically significant differences (p < 0.05) between default and non-default groups across all key features. Cohen's D effect sizes are also reported to assess practical significance beyond statistical thresholds.
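A minimal sketch of these tests for a single feature, using `scipy.stats`; the function name and return shape are illustrative.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ks_2samp

def compare_groups(defaulted: np.ndarray, non_defaulted: np.ndarray) -> dict:
    # Non-parametric tests for a distributional difference between groups
    _, u_p = mannwhitneyu(defaulted, non_defaulted, alternative="two-sided")
    _, ks_p = ks_2samp(defaulted, non_defaulted)
    # Cohen's d with pooled standard deviation (practical significance)
    n1, n2 = len(defaulted), len(non_defaulted)
    pooled_sd = np.sqrt(
        ((n1 - 1) * defaulted.var(ddof=1) + (n2 - 1) * non_defaulted.var(ddof=1))
        / (n1 + n2 - 2)
    )
    cohens_d = (defaulted.mean() - non_defaulted.mean()) / pooled_sd
    return {"mannwhitney_p": u_p, "ks_p": ks_p, "cohens_d": cohens_d}
```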
| Model | Purpose |
|---|---|
| Logistic Regression | Interpretable baseline |
| Random Forest | Non-linear ensemble benchmark |
| XGBoost | Primary production model |
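A sketch of the model setup with the imbalance handling described above. XGBoost is omitted from the snippet to keep it dependency-light; the comment shows how the computed ratio would be passed to `XGBClassifier`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def build_models(y_train: np.ndarray):
    # scale_pos_weight is conventionally the negative:positive class ratio;
    # pass it as XGBClassifier(scale_pos_weight=...) for the primary model.
    neg, pos = np.bincount(y_train.astype(int))
    scale_pos_weight = neg / pos
    models = {
        "logistic_regression": LogisticRegression(
            class_weight="balanced", max_iter=1000
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=300, class_weight="balanced", random_state=42
        ),
    }
    return models, scale_pos_weight
```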
- SHAP — Global feature importance using TreeExplainer; summary and beeswarm plots to understand model behaviour across the full dataset.
- LIME — Local surrogate model for individual prediction explanations, showing per-feature contribution to a specific applicant's risk score.
- Protected attribute: Age Group (focus on 18–25 as unprivileged group)
- Metrics: Disparate Impact (80% Rule), Equal Opportunity Difference, Statistical Parity Difference
- Mitigation: Reject Option Classification (ROC) via IBM AIF360 — adjusts decision thresholds near the decision boundary to achieve demographic parity without retraining the model
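The first two metrics can be computed directly, which makes the 80% rule concrete; AIF360 provides the same values via `BinaryLabelDatasetMetric`, and the mitigation step via `aif360.algorithms.postprocessing.RejectOptionClassification`. The group encoding below (1 = privileged) is illustrative.

```python
import numpy as np

def fairness_metrics(approved: np.ndarray, privileged: np.ndarray) -> dict:
    # Favourable-outcome (approval) rate per group
    rate_priv = approved[privileged == 1].mean()
    rate_unpriv = approved[privileged == 0].mean()
    return {
        # 80% (four-fifths) rule: ratio should be >= 0.8
        "disparate_impact": rate_unpriv / rate_priv,
        # Statistical parity difference: gap should be close to 0
        "statistical_parity_diff": rate_unpriv - rate_priv,
    }
```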
| Metric | Before Mitigation | After Mitigation |
|---|---|---|
| Disparate Impact | 0.7964 ❌ | ~1.00 ✅ |
| Equal Opportunity Diff | Biased | Equalized ✅ |
| ROC-AUC | ~0.86 | Maintained |
| Recall (Defaults) | High | Maintained |
Top Predictors (SHAP):
- Total Past Due (historical delinquency)
- Revolving Utilization of Unsecured Lines
- Age
- Number of Times 90+ Days Late
- Debt Ratio
After running the full notebook, the following files are saved:
| File | Description |
|---|---|
| `xgboost_credit_model.pkl` | Trained XGBoost model |
| `feature_scaler.pkl` | Fitted StandardScaler |
| `fairness_thresholds.json` | Age-group-specific decision thresholds |
| `feature_columns.json` | List of feature names for inference |
| `model_performance_summary.csv` | Model comparison metrics |
| `fairness_metrics.csv` | Pre/post-mitigation fairness metrics |
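The saving step might look like this sketch; filenames match the table above, and the objects passed in (model, scaler, per-group thresholds, feature list) are assumed to come from earlier sections.

```python
import json
import joblib

def save_artifacts(model, scaler, thresholds: dict, feature_columns: list):
    # Pickled model and scaler for inference-time reuse
    joblib.dump(model, "xgboost_credit_model.pkl")
    joblib.dump(scaler, "feature_scaler.pkl")
    # Age-group-specific decision thresholds from the fairness audit
    with open("fairness_thresholds.json", "w") as f:
        json.dump(thresholds, f, indent=2)
    # Column order expected by the model at inference time
    with open("feature_columns.json", "w") as f:
        json.dump(feature_columns, f, indent=2)
```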
```
pip install numpy pandas matplotlib seaborn scikit-learn xgboost shap lime aif360
```
- Download the Give Me Some Credit dataset from Kaggle and place it in your working directory.
- Open the notebook in Jupyter, JupyterLab, Google Colab, or Kaggle.
- Run all cells sequentially from top to bottom.
Note: The notebook was developed on Python 3.12 in Kaggle's hosted environment. LIME and AIF360 are installed inline via `!pip install` cells.
- Production Integration — Connect to a live credit underwriting pipeline
- Model Drift Monitoring — Set up automated monitoring for performance and fairness metric degradation over time
- A/B Testing Framework — Continuously evaluate threshold strategies and mitigation approaches on real-world outcomes
- IBM AIF360 Documentation
- SHAP Documentation
- LIME Paper — Ribeiro et al. (2016)
- Give Me Some Credit — Kaggle
- EEOC 80% (Four-Fifths) Rule for Disparate Impact Analysis
Srishti Rajput