A production-ready framework for A/B testing, causal inference, and uplift modeling in AI SaaS experimentation
This project demonstrates a complete end-to-end workflow for evaluating AI-powered features in a SaaS platform using advanced causal inference techniques. It combines classical A/B testing with modern machine learning approaches to estimate heterogeneous treatment effects and enable personalized feature rollouts.
Key Capabilities:
- Causal Effect Estimation: Move beyond average treatment effects to individual-level predictions
- Uplift Modeling: Identify which users benefit most from AI features
- User Segmentation: Create actionable segments based on treatment response
- Business Insights: Translate statistical findings into strategic recommendations
# Clone the repository
git clone https://github.com/amitabh-7t/ms-experimentation-causal-inference.git
cd ms-experimentation-causal-inference
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start Jupyter
jupyter notebook
# Open and run the comprehensive notebook
notebooks/00_complete_causal_inference_analysis.ipynb
A comprehensive, production-ready notebook covering the entire analysis pipeline:
- Load and validate 1.5M daily observations across 50k users
- Data quality checks and summary statistics
- Cohort distribution analysis
- Cohort-level statistical comparisons
- Pairwise t-tests with lift calculations
- Visualization of treatment effects
- Key Finding: 15-25% revenue lift with AI features
- Conditional Average Treatment Effect (CATE) estimation
- Feature engineering (continuous, binary, categorical)
- Individual-level uplift predictions
- Key Finding: Heterogeneous effects - not all users benefit equally
- Identify drivers of treatment response
- Extract importance from meta-learner models
- Top Drivers: baseline_productivity, churn_risk, user_tenure
- Five-tier user segmentation (Very Low → Very High)
- Segment profiling and characterization
- Business Strategy: Targeted rollout recommendations per segment
- Executive summary with key findings
- Immediate actions and long-term strategy
- Expected business impact (revenue, retention, adoption)
- Limitations and next steps
Output: Production-ready analysis with visualizations, statistical tests, and actionable insights suitable for both technical and non-technical stakeholders.
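The pairwise comparison with lift calculation listed in the notebook outline can be sketched as follows. This is a minimal standard-library illustration, not the notebook's own code: the function name `lift_and_welch`, the synthetic revenue values, and the use of a normal approximation for the p-value (reasonable only for large samples) are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def lift_and_welch(control, treatment):
    """Return relative lift, Welch t-statistic, and a two-sided p-value.

    The p-value uses a normal approximation to the t distribution,
    which is only reasonable for large samples.
    """
    mean_c, mean_t = fmean(control), fmean(treatment)
    # Standard error of the difference, allowing unequal variances (Welch).
    se = math.sqrt(variance(control) / len(control)
                   + variance(treatment) / len(treatment))
    t_stat = (mean_t - mean_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    lift = (mean_t - mean_c) / mean_c
    return lift, t_stat, p_value


# Hypothetical per-user revenue for two cohorts (synthetic, illustration only;
# the 100 vs. 120 means are chosen for the example, not taken from the project).
rng = random.Random(0)
a_control = [rng.gauss(100, 20) for _ in range(5000)]
c_adaptive_v2 = [rng.gauss(120, 20) for _ in range(5000)]

lift, t_stat, p_value = lift_and_welch(a_control, c_adaptive_v2)
print(f"lift={lift:.1%}  t={t_stat:.1f}  p={p_value:.3g}")
```

With three cohorts, the same function would be called once per pair, and a multiple-comparison correction may be worth considering before reading the p-values.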
ms-experimentation-causal-inference/
├── notebooks/
│   ├── 00_complete_causal_inference_analysis.ipynb   # Main comprehensive notebook
│   ├── 01_data_generation.ipynb                      # Synthetic data generation
│   ├── 02_eda.ipynb                                  # Exploratory analysis
│   ├── 03_ab_test_engine.ipynb                       # A/B testing framework
│   ├── 04_causal_inference.ipynb                     # Causal methods
│   ├── 05_uplift_model.ipynb                         # Uplift modeling
│   └── 06_business_report.ipynb                      # Business insights
├── src/
│   ├── ab_test.py                                    # A/B testing utilities
│   ├── causal.py                                     # Causal inference methods
│   ├── uplift_model.py                               # Uplift modeling
│   └── utils.py                                      # Helper functions
├── data/                                             # Generated datasets (gitignored)
├── requirements.txt                                  # Python dependencies
└── README.md
Treatment Cohorts:
- A_control: Baseline (no AI features)
- B_adaptive_v1: First-generation adaptive AI
- C_adaptive_v2: Second-generation adaptive AI
Dataset:
- 50,000 users
- 30-day observation period
- 1.5M daily observations
- Rich feature set (demographics, behavior, confounders)
X-learner Meta-learner:
- Train separate outcome models for the treatment and control groups
- Use each model to impute the counterfactual outcome for users in the other group
- Fit effect models on the imputed individual treatment effects to estimate CATE
- Combine the two effect estimates using propensity-score weighting
Advantages:
- Handles heterogeneous treatment effects
- Efficient with imbalanced groups
- Provides interpretable feature importance
- Enables personalized targeting
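The X-learner steps listed above can be sketched in plain Python. This is a hand-rolled illustration with a single covariate and simple linear models, not the econml implementation the project uses; the helper names `fit_line` and `x_learner`, the constant propensity of 0.5 (as in an evenly randomized experiment), and the synthetic data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def x_learner(xs, treated, ys, propensity=0.5):
    """Return a function estimating CATE(x) with the X-learner recipe."""
    x0 = [x for x, t in zip(xs, treated) if not t]
    y0 = [y for y, t in zip(ys, treated) if not t]
    x1 = [x for x, t in zip(xs, treated) if t]
    y1 = [y for y, t in zip(ys, treated) if t]

    # Stage 1: separate outcome models for control and treatment.
    mu0, mu1 = fit_line(x0, y0), fit_line(x1, y1)

    # Stage 2: impute each user's missing counterfactual, then model the
    # resulting individual effects within each group.
    d1 = [y - mu0(x) for x, y in zip(x1, y1)]  # treated: observed - predicted control
    d0 = [mu1(x) - y for x, y in zip(x0, y0)]  # control: predicted treated - observed
    tau1, tau0 = fit_line(x1, d1), fit_line(x0, d0)

    # Stage 3: propensity-weighted blend of the two effect models.
    return lambda x: propensity * tau0(x) + (1 - propensity) * tau1(x)


# Synthetic, noise-free data in which the true effect is 2 * x (assumption).
xs = [i / 10 for i in range(100)]
treated = [i % 2 == 0 for i in range(100)]
ys = [1 + x + (2 * x if t else 0) for x, t in zip(xs, treated)]

cate = x_learner(xs, treated, ys)
print(cate(3.0))
```

Because the data are noise-free and linear, the estimate should match the true effect `2 * x` up to floating-point error. With real data the base learners would normally be more flexible models and the propensity would be estimated rather than fixed.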
- ai_calls: AI feature usage intensity
- tasks_completed: Productivity measure
- satisfaction_score: User satisfaction (1-5 scale)
- retention_7d: 7-day retention rate
- revenue: Revenue per user
| Library | Version | Purpose |
|---|---|---|
| pandas | 2.3.3 | Data manipulation |
| numpy | 2.3.5 | Numerical computing |
| scikit-learn | 1.7.2 | Machine learning |
| econml | 0.15.5 | Causal inference (X-learner) |
| statsmodels | 0.14.5 | Statistical testing |
| matplotlib | 3.10.7 | Visualization |
| seaborn | 0.13.2 | Statistical plots |
- causalml (0.15.5): Alternative causal inference methods
- xgboost (3.1.2): Gradient boosting
- lightgbm (4.6.0): Gradient boosting
- shap (0.50.0): Model interpretability
- pyarrow (16.1.0): Parquet file support
- Statistically significant improvements across all metrics (p < 0.001)
- Revenue lift: 15-25% depending on cohort
- Retention lift: 10-20%
- C_adaptive_v2 outperforms B_adaptive_v1, validating iterative development
- Top 20% of users show 3-5x higher uplift than average
- Uplift range: Near-zero to 100+ revenue points
- Key drivers: Baseline productivity, churn risk, user tenure
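A statistic like "top 20% versus average uplift" could be computed from predicted uplift scores along these lines. This is only a sketch: the function name `top_share_ratio` and the toy scores are hypothetical, and the ratio is meaningful only when the overall mean uplift is positive.

```python
from statistics import fmean


def top_share_ratio(uplifts, share=0.20):
    """Mean uplift of the top `share` of users divided by the overall mean."""
    ranked = sorted(uplifts, reverse=True)
    k = max(1, int(len(ranked) * share))  # number of users in the top group
    return fmean(ranked[:k]) / fmean(uplifts)


# Hypothetical, right-skewed uplift predictions (not the project's results).
uplifts = [1.0] * 80 + [20.0] * 20
print(top_share_ratio(uplifts))
```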
Targeted Rollout Strategy:
- 50% of resources → Very High uplift segment (maximum ROI)
- 30% of resources → High uplift segment
- 15% of resources → Medium uplift segment
- 5% of resources → Low/Very Low segments (focus on retention basics)
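The allocation above can be turned into a small planning helper. The sketch below assumes the five tiers are equal-sized quintiles of predicted uplift, which the project does not state explicitly; the names `assign_tiers`, `allocate`, and the sample scores are hypothetical.

```python
TIERS = ["Very Low", "Low", "Medium", "High", "Very High"]

# Budget shares from the rollout strategy; Low and Very Low share 5%.
SHARES = {"Very High": 0.50, "High": 0.30, "Medium": 0.15, "Low/Very Low": 0.05}


def assign_tiers(uplift_by_user):
    """Map each user to one of five equal-sized tiers by uplift rank (assumed quintiles)."""
    ranked = sorted(uplift_by_user, key=uplift_by_user.get)
    n = len(ranked)
    return {user: TIERS[min(rank * 5 // n, 4)] for rank, user in enumerate(ranked)}


def allocate(budget):
    """Split a budget across segments according to SHARES."""
    return {segment: budget * share for segment, share in SHARES.items()}


# Hypothetical predicted uplift scores for ten users.
scores = {f"user_{i}": float(i) for i in range(10)}
tiers = assign_tiers(scores)
plan = allocate(100_000)
print(tiers["user_9"], plan)
```

Rank-based tiers keep the segments the same size regardless of how skewed the uplift distribution is; fixed uplift thresholds would be an alternative if absolute effect size matters more than relative rank.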
Expected Outcomes:
- 20-30% increase in incremental revenue vs. blanket rollout
- 5-10% reduction in churn among targeted users
- 2-3x higher AI feature adoption
This framework is applicable to:
- Product Experimentation: Evaluate new features with heterogeneous user bases
- Personalization: Identify which users benefit from specific treatments
- Resource Allocation: Optimize rollout strategies based on predicted uplift
- Retention Programs: Target at-risk users with high-impact interventions
- Pricing Optimization: Estimate willingness to pay across segments
- EconML Documentation
- Künzel et al. (2019): "Metalearners for estimating heterogeneous treatment effects"
- Athey & Imbens (2016): "Recursive partitioning for heterogeneous causal effects"
- CATE: Conditional Average Treatment Effect
- Uplift Modeling: Predicting individual treatment response
- Meta-learners: S-learner, T-learner, X-learner
- Propensity Score: Probability of treatment assignment
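For reference, these concepts have standard definitions in potential-outcomes notation. The symbols below follow common textbook usage and are not taken from the project's code: $Y(1)$ and $Y(0)$ are a user's outcomes with and without treatment, $T$ is the treatment indicator, $X$ the covariates, and $\hat{\tau}_0$, $\hat{\tau}_1$ are the X-learner effect models fitted on the control and treatment groups.

```latex
\begin{align*}
\tau(x) &= \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right] && \text{(CATE)} \\
e(x) &= \Pr(T = 1 \mid X = x) && \text{(propensity score)} \\
\hat{\tau}(x) &= e(x)\,\hat{\tau}_0(x) + \bigl(1 - e(x)\bigr)\,\hat{\tau}_1(x) && \text{(X-learner combination)}
\end{align*}
```

In a randomized experiment with fixed assignment probabilities, $e(x)$ is a known constant, so the last line reduces to a fixed weighted average of the two effect models.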
ArrowKeyError with Parquet files:
pip install pyarrow==16.1.0
Jupyter kernel issues:
python -m ipykernel install --user --name=venv
Missing dependencies:
pip install -r requirements.txt --upgrade
Contributions are welcome! Areas for enhancement:
- Multi-treatment optimization (beyond binary treatment)
- Causal forests implementation
- Real-time uplift scoring API
- Additional meta-learner approaches
- Sensitivity analysis tools
This project is licensed under the MIT License - see the LICENSE file for details.
Author: Amitabh
GitHub: @amitabh-7t
Repository: ms-experimentation-causal-inference
For questions or feedback, please open an issue on GitHub.
Star this repository if you find it useful for your experimentation workflows!