Credit Risk Intelligence Engine with Fairness Constraints & Model Explainability
This project builds an end-to-end credit default prediction system using the Give Me Some Credit Kaggle dataset. Beyond raw predictive performance, the notebook integrates a full model explainability pipeline (SHAP + LIME) and a fairness auditing framework (IBM AIF360) to ensure equitable lending decisions across demographic groups.
The system is designed to answer four core business questions:
- Which customer characteristics are most strongly associated with credit default?
- Can machine learning models reliably identify high-risk applicants before default occurs?
- What are the primary drivers of default risk according to the model?
- How should prediction thresholds be selected to balance risk detection and customer approval rates?
Source: Give Me Some Credit — Kaggle Competition
The dataset contains financial and behavioural attributes of borrowers, where each row represents a credit applicant. The target variable is `SeriousDlqin2yrs` (renamed `Defaulted`): whether the person experienced financial distress within two years.
| Original Column | Simple Name | Description |
|---|---|---|
| `SeriousDlqin2yrs` | Defaulted | Target: 1 if financial distress within 2 years |
| `RevolvingUtilizationOfUnsecuredLines` | Revolving Utilization | Credit card balance as % of credit limit |
| `age` | Age | Borrower's age in years |
| `NumberOfTime30-59DaysPastDueNotWorse` | Past Due 30–59 Days | Times 30–59 days late on payment |
| `DebtRatio` | Debt Ratio | Monthly debt payments / monthly gross income |
| `MonthlyIncome` | Monthly Income | Borrower's monthly income |
| `NumberOfOpenCreditLinesAndLoans` | Open Credit Lines | Number of open credit lines and loans |
| `NumberOfTimes90DaysLate` | Past Due 90+ Days | Times 90+ days late |
| `NumberRealEstateLoansOrLines` | Real Estate Loans | Number of real estate loans |
| `NumberOfTime60-89DaysPastDueNotWorse` | Past Due 60–89 Days | Times 60–89 days late |
| `NumberOfDependents` | Dependents | Number of dependents |
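Loading the Kaggle CSV and applying the renaming above might look like the following sketch. The filename `cs-training.csv` is the conventional name of the Kaggle download and is an assumption, not confirmed by the notebook.

```python
import pandas as pd

# Rename mapping follows the "Simple Name" column in the table above.
RENAME_MAP = {
    "SeriousDlqin2yrs": "Defaulted",
    "RevolvingUtilizationOfUnsecuredLines": "Revolving Utilization",
    "age": "Age",
    "NumberOfTime30-59DaysPastDueNotWorse": "Past Due 30-59 Days",
    "DebtRatio": "Debt Ratio",
    "MonthlyIncome": "Monthly Income",
    "NumberOfOpenCreditLinesAndLoans": "Open Credit Lines",
    "NumberOfTimes90DaysLate": "Past Due 90+ Days",
    "NumberRealEstateLoansOrLines": "Real Estate Loans",
    "NumberOfTime60-89DaysPastDueNotWorse": "Past Due 60-89 Days",
    "NumberOfDependents": "Dependents",
}

def load_credit_data(path: str = "cs-training.csv") -> pd.DataFrame:
    # First CSV column is an unnamed row index in the Kaggle file
    df = pd.read_csv(path, index_col=0)
    return df.rename(columns=RENAME_MAP)
```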
credit_risk_intelligence_engine.ipynb
│
├── 1. Business Questions & Project Scope
├── 2. Library Imports
├── 3. Data Loading & Exploration
├── 4. Data Preprocessing & Feature Engineering
├── 5. Exploratory Data Analysis (EDA)
├── 6. Statistical Testing for Key Risk Drivers
├── 7. Model Development
│ ├── Baseline: Logistic Regression
│ ├── Ensemble: Random Forest
│ └── Primary: XGBoost (Gradient Boosting)
├── 8. Model Evaluation
│ ├── Confusion Matrix
│ ├── ROC & Precision-Recall Curves
│ └── Model Comparison Summary
├── 9. Model Explainability
│ ├── SHAP (Global Feature Importance)
│ └── LIME (Individual Prediction Explanations)
├── 10. Fairness Auditing (IBM AIF360)
│ ├── Pre-Mitigation Bias Assessment
│ ├── Reject Option Classification (ROC) Mitigation
│ └── Post-Mitigation Fairness Audit
├── 11. Final Performance Summary
└── 12. Save Model & Artifacts
- Missing value imputation using a median strategy (`SimpleImputer`)
- Feature scaling with `StandardScaler`
- Class imbalance handling via `scale_pos_weight` (XGBoost) and `class_weight='balanced'` (scikit-learn models)
- Engineered aggregate feature: Total Past Due (sum of all delinquency counts)
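The preprocessing steps above can be sketched as a single function. Column names assume the renamed schema; this is an illustration of the described steps, not the notebook's exact code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame):
    df = df.copy()
    # Engineered aggregate feature: sum of all delinquency counts
    df["Total Past Due"] = (
        df["Past Due 30-59 Days"]
        + df["Past Due 60-89 Days"]
        + df["Past Due 90+ Days"]
    )
    y = df.pop("Defaulted")
    # Median imputation for missing values (e.g. Monthly Income, Dependents)
    imputer = SimpleImputer(strategy="median")
    X = imputer.fit_transform(df)
    # Standardize features to zero mean / unit variance
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    return X, y.values, imputer, scaler
```

The fitted `imputer` and `scaler` are returned so the identical transform can be reused at inference time (the notebook saves the scaler as an artifact).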
Mann-Whitney U tests and KS tests are applied to confirm statistically significant differences (p < 0.05) between default and non-default groups across all key features. Cohen's D effect sizes are also reported to assess practical significance beyond statistical thresholds.
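A minimal sketch of these tests for a single feature, using `scipy.stats`; the function name and return shape are illustrative.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ks_2samp

def compare_groups(defaulted: np.ndarray, non_defaulted: np.ndarray) -> dict:
    # Non-parametric tests for a distributional difference between groups
    _, u_p = mannwhitneyu(defaulted, non_defaulted, alternative="two-sided")
    _, ks_p = ks_2samp(defaulted, non_defaulted)
    # Cohen's d with pooled standard deviation (practical significance)
    n1, n2 = len(defaulted), len(non_defaulted)
    pooled_sd = np.sqrt(
        ((n1 - 1) * defaulted.var(ddof=1) + (n2 - 1) * non_defaulted.var(ddof=1))
        / (n1 + n2 - 2)
    )
    cohens_d = (defaulted.mean() - non_defaulted.mean()) / pooled_sd
    return {"mannwhitney_p": u_p, "ks_p": ks_p, "cohens_d": cohens_d}
```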
| Model | Purpose |
|---|---|
| Logistic Regression | Interpretable baseline |
| Random Forest | Non-linear ensemble benchmark |
| XGBoost | Primary production model |
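A sketch of the model setup with the imbalance handling described above. XGBoost is omitted from the snippet to keep it dependency-light; the comment shows how the computed ratio would be passed to `XGBClassifier`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def build_models(y_train: np.ndarray):
    # scale_pos_weight is conventionally the negative:positive class ratio;
    # pass it as XGBClassifier(scale_pos_weight=...) for the primary model.
    neg, pos = np.bincount(y_train.astype(int))
    scale_pos_weight = neg / pos
    models = {
        "logistic_regression": LogisticRegression(
            class_weight="balanced", max_iter=1000
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=300, class_weight="balanced", random_state=42
        ),
    }
    return models, scale_pos_weight
```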
- SHAP — Global feature importance using TreeExplainer; summary and beeswarm plots to understand model behaviour across the full dataset.
- LIME — Local surrogate model for individual prediction explanations, showing per-feature contribution to a specific applicant's risk score.
- Protected attribute: Age Group (focus on 18–25 as unprivileged group)
- Metrics: Disparate Impact (80% Rule), Equal Opportunity Difference, Statistical Parity Difference
- Mitigation: Reject Option Classification (ROC) via IBM AIF360 — adjusts decision thresholds near the decision boundary to achieve demographic parity without retraining the model
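The first two metrics can be computed directly, which makes the 80% rule concrete; AIF360 provides the same values via `BinaryLabelDatasetMetric`, and the mitigation step via `aif360.algorithms.postprocessing.RejectOptionClassification`. The group encoding below (1 = privileged) is illustrative.

```python
import numpy as np

def fairness_metrics(approved: np.ndarray, privileged: np.ndarray) -> dict:
    # Favourable-outcome (approval) rate per group
    rate_priv = approved[privileged == 1].mean()
    rate_unpriv = approved[privileged == 0].mean()
    return {
        # 80% (four-fifths) rule: ratio should be >= 0.8
        "disparate_impact": rate_unpriv / rate_priv,
        # Statistical parity difference: gap should be close to 0
        "statistical_parity_diff": rate_unpriv - rate_priv,
    }
```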
| Metric | Before Mitigation | After Mitigation |
|---|---|---|
| Disparate Impact | 0.7964 ❌ | ~1.00 ✅ |
| Equal Opportunity Diff | Biased | Equalized ✅ |
| ROC-AUC | ~0.86 | Maintained |
| Recall (Defaults) | High | Maintained |
Top Predictors (SHAP):
- Total Past Due (historical delinquency)
- Revolving Utilization of Unsecured Lines
- Age
- Number of Times 90+ Days Late
- Debt Ratio
After running the full notebook, the following files are saved:
| File | Description |
|---|---|
| `xgboost_credit_model.pkl` | Trained XGBoost model |
| `feature_scaler.pkl` | Fitted StandardScaler |
| `fairness_thresholds.json` | Age-group-specific decision thresholds |
| `feature_columns.json` | List of feature names for inference |
| `model_performance_summary.csv` | Model comparison metrics |
| `fairness_metrics.csv` | Pre/post-mitigation fairness metrics |
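The saving step might look like this sketch; filenames match the table above, and the objects passed in (model, scaler, per-group thresholds, feature list) are assumed to come from earlier sections.

```python
import json
import joblib

def save_artifacts(model, scaler, thresholds: dict, feature_columns: list):
    # Pickled model and scaler for inference-time reuse
    joblib.dump(model, "xgboost_credit_model.pkl")
    joblib.dump(scaler, "feature_scaler.pkl")
    # Age-group-specific decision thresholds from the fairness audit
    with open("fairness_thresholds.json", "w") as f:
        json.dump(thresholds, f, indent=2)
    # Column order expected by the model at inference time
    with open("feature_columns.json", "w") as f:
        json.dump(feature_columns, f, indent=2)
```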
```
pip install numpy pandas matplotlib seaborn scikit-learn xgboost shap lime aif360
```
- Download the Give Me Some Credit dataset from Kaggle and place it in your working directory.
- Open the notebook in Jupyter, JupyterLab, Google Colab, or Kaggle.
- Run all cells sequentially from top to bottom.
Note: The notebook was developed on Python 3.12 in Kaggle's hosted environment. LIME and AIF360 are installed inline via `!pip install` cells.
- Production Integration — Connect to a live credit underwriting pipeline
- Model Drift Monitoring — Set up automated monitoring for performance and fairness metric degradation over time
- A/B Testing Framework — Continuously evaluate threshold strategies and mitigation approaches on real-world outcomes
- IBM AIF360 Documentation
- SHAP Documentation
- LIME Paper — Ribeiro et al. (2016)
- Give Me Some Credit — Kaggle
- EEOC 80% (Four-Fifths) Rule for Disparate Impact Analysis
Srishti Rajput