A machine learning system that predicts which customers are likely to close their accounts and identifies effective retention strategies. The project targets a 10-15% reduction in churn and an increase in customer lifetime value.
- High Impact: Reduces customer churn by 10-15%
- Increased Revenue: Improves customer lifetime value through targeted retention
- Proactive Approach: Automated alerting system for high-risk customers
- Data-Driven Insights: SHAP-based model interpretability for actionable insights
- Classification: Advanced ML techniques for churn prediction
- Feature Selection: Behavioral pattern analysis and feature engineering
- Model Interpretation: SHAP values for explainable AI
- Customer Segmentation: Risk-based customer categorization
customer-churn-prediction-system/
├── src/
│   ├── models/          # ML model training and prediction
│   ├── features/        # Feature engineering and selection
│   ├── visualization/   # Data visualization and plotting
│   ├── utils/           # Utility functions and helpers
│   └── dashboard/       # Interactive dashboard application
├── data/
│   ├── raw/             # Original datasets
│   ├── processed/       # Cleaned and preprocessed data
│   ├── external/        # External data sources
│   └── models/          # Trained model artifacts
├── notebooks/           # Jupyter notebooks for exploration
├── tests/               # Unit and integration tests
├── config/              # Configuration files
├── scripts/             # Data processing and utility scripts
└── docs/                # Documentation
This project uses Nix flakes for reproducible development environments and dependency management.
- Nix with flakes enabled
- Git
- Clone the repository:
  git clone <repository-url>
  cd customer-churn-prediction-system
- Enter the development environment:
  nix develop
- Initialize project structure:
  make setup
- Download sample data:
  make data-download
If you prefer not to use Nix, you can set up the environment manually:
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -e ".[dev,jupyter]"
# Initialize project structure
make setup
The project supports multiple datasets:
- Kaggle: Telco Customer Churn - Telecommunications customer data
- Bank Customer Churn Dataset - Financial services customer data
Download scripts are provided in scripts/download_data.py.
# Development environment
make setup # Initialize project structure
nix develop # Enter development shell
# Data processing
make data-download # Download datasets
make data-process # Process raw data
# Model development
make train # Train churn prediction model
make predict # Run predictions
make model-evaluate # Evaluate model performance
# Development tools
make test # Run tests
make lint # Check code style
make format # Format code
make type-check # Type checking
# Applications
make jupyter # Start Jupyter Lab
make dashboard # Start interactive dashboard
You can also use Nix apps for common tasks:
nix run .#jupyter # Start Jupyter Lab
nix run .#dashboard # Start dashboard
- Data Analysis (notebooks/01-exploratory-data-analysis.ipynb)
  - Analyze customer behavior patterns
  - Examine transaction history and trends
  - Identify key churn indicators
- Feature Engineering (src/features/build_features.py)
  - Transaction frequency metrics
  - Balance trend analysis
  - Service usage patterns
  - Behavioral change detection
- Model Development (src/models/train_model.py)
  - Ensemble methods (Random Forest, XGBoost, LightGBM)
  - Handle class imbalance
  - Cross-validation and hyperparameter tuning
- Model Interpretation (notebooks/03-model-interpretation.ipynb)
  - SHAP values for feature importance
  - Local and global explanations
  - Business-friendly interpretation
- Customer Segmentation (src/models/segment_customers.py)
  - High/Medium/Low risk categorization
  - Behavioral clustering
  - Personalized retention strategies
- Dashboard Development (src/dashboard/app.py)
  - Interactive Plotly/Dash dashboard
  - Real-time churn monitoring
  - Customer success team interface
- Alerting System (src/utils/alerting.py)
  - Automated high-risk customer detection
  - Email/Slack notifications
  - Integration with CRM systems
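To make the feature engineering step in the workflow above more concrete, here is a minimal sketch of three of the listed features: transaction frequency, balance trend, and behavioral change. It is not the code in src/features/build_features.py. The `Transaction` record, the function names, and the 90-day window are assumptions chosen for illustration, and only the standard library is used.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Transaction(NamedTuple):
    """Hypothetical record type; the real schema depends on the dataset."""
    day: date
    balance_after: float


def monthly_frequency(txns, start, end):
    """Transactions per 30 days within the half-open window [start, end)."""
    days = (end - start).days
    count = sum(1 for t in txns if start <= t.day < end)
    return count * 30.0 / days


def balance_slope(txns):
    """Least-squares slope of balance over time, in currency units per day."""
    if len(txns) < 2:
        return 0.0
    origin = min(t.day for t in txns)
    xs = [(t.day - origin).days for t in txns]
    ys = [t.balance_after for t in txns]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0  # all transactions on the same day: no trend to fit
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov_xy / var_x


def build_features(txns, as_of, window_days=90):
    """Compare the most recent window with the one immediately before it."""
    recent_start = as_of - timedelta(days=window_days)
    earlier_start = recent_start - timedelta(days=window_days)
    recent = monthly_frequency(txns, recent_start, as_of)
    earlier = monthly_frequency(txns, earlier_start, recent_start)
    recent_txns = [t for t in txns if recent_start <= t.day < as_of]
    return {
        "txn_per_month_recent": recent,
        "txn_per_month_earlier": earlier,
        # Negative values indicate the customer is transacting less often.
        "frequency_change": recent - earlier,
        "balance_slope_per_day": balance_slope(recent_txns),
    }
```

Calling `build_features(transactions, as_of=date.today())` returns a plain dictionary per customer. A falling `frequency_change` together with a negative `balance_slope_per_day` is the kind of behavioral shift a churn model could learn from, though which features actually matter has to be established on real data.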
- High Risk: Immediate intervention with personalized offers
- Medium Risk: Proactive engagement and loyalty programs
- Low Risk: Maintain satisfaction with regular check-ins
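The three tiers above could be derived from a model's predicted churn probability roughly as follows. This is a sketch, not the logic in src/models/segment_customers.py: the cutoffs 0.7 and 0.4 and the function names are assumptions, and in practice the thresholds would be tuned against validation data and the available retention budget.

```python
# Hypothetical cutoffs on predicted churn probability (assumptions, not project values).
HIGH_RISK_THRESHOLD = 0.7
MEDIUM_RISK_THRESHOLD = 0.4

# Retention strategy per tier, taken from the list above.
RETENTION_ACTIONS = {
    "high": "Immediate intervention with personalized offers",
    "medium": "Proactive engagement and loyalty programs",
    "low": "Maintain satisfaction with regular check-ins",
}


def risk_tier(churn_probability):
    """Map a churn probability in [0, 1] to 'high', 'medium', or 'low'."""
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    if churn_probability >= HIGH_RISK_THRESHOLD:
        return "high"
    if churn_probability >= MEDIUM_RISK_THRESHOLD:
        return "medium"
    return "low"


def recommend_action(churn_probability):
    """Return the retention strategy for the customer's risk tier."""
    return RETENTION_ACTIONS[risk_tier(churn_probability)]
```

Keeping the thresholds as named constants makes it easy to move them into the config/ directory later without touching the tiering logic.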
- 10-15% reduction in customer churn
- Increased customer lifetime value
- Improved customer satisfaction scores
- Data-driven retention budget allocation
- Python: Core development language
- scikit-learn: Machine learning framework
- SHAP: Model interpretability
- Plotly/Dash: Interactive dashboards
- SQL: Data querying and analysis
- Nix: Reproducible development environment
Run the test suite:
make test
For specific test categories:
pytest tests/unit/ # Unit tests
pytest tests/integration/ # Integration tests
pytest -m "not slow" # Skip slow tests
The system includes:
- Model performance monitoring
- Data drift detection
- Automated retraining pipelines
- A/B testing framework for retention strategies
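As an illustration of the data drift detection item above, one common technique is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in recent data. The README does not say which drift metric the project uses, so treat this as one possible approach; the function name, the bin count, and the `eps` floor are assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a recent sample of one numeric feature.

    Bin edges are equal-width over the range of the reference sample; values
    in `actual` outside that range are counted in the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant reference feature: everything falls in bin 0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        total = len(values)
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / total, eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A frequently quoted rule of thumb reads a PSI below 0.1 as stable and above 0.25 as a significant shift, but those cutoffs are conventions rather than guarantees. A monitoring job could compute the PSI per feature on a schedule and use it as one trigger for the retraining pipeline.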
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
For questions and support, please open an issue on GitHub.