Purpose: End‑to‑end framework for estimating Conditional Average Treatment Effects (CATE) and deploying uplift‑based targeting models that replace or augment traditional A/B testing.
Companies rely on experimentation to optimise user engagement, but fixed‑horizon A/B tests are slow and often under‑powered. This project builds a hybrid causal‑ML pipeline that:
- Ingests raw user‑session logs from a data warehouse.
- Cleans & feature‑engineers covariates suitable for causal analysis.
- Estimates heterogeneous treatment effects (CATE/ITE) with state‑of‑the‑art models (Causal Forests, DragonNet, TARNet).
- Surfaces high‑uplift cohorts via an API for real‑time targeting.
- Tracks experiments & model lineage with MLflow and Airflow.
Outcome: 9.2 % uplift in CTR in offline replay simulation relative to uniform treatment assignment.
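To make the core idea concrete: CATE estimation fits per-user treatment effects rather than a single average effect. The snippet below is a minimal T-learner sketch on synthetic data using plain NumPy — an illustration of the concept only, not this project's actual estimators (those live in `src/models/causal_estimator.py` and use EconML/DoWhy).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))                      # user covariates
T = rng.integers(0, 2, size=n)                   # randomized treatment flag
tau = 0.5 + X[:, 0]                              # true heterogeneous effect
y = X @ np.array([1.0, -0.5, 0.2]) + tau * T + rng.normal(scale=0.1, size=n)

def fit_linear(X, y):
    """Least-squares fit with an intercept column; returns a predictor."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

mu0 = fit_linear(X[T == 0], y[T == 0])           # outcome model, control arm
mu1 = fit_linear(X[T == 1], y[T == 1])           # outcome model, treated arm
cate_hat = mu1(X) - mu0(X)                       # estimated CATE per user
```

The production models (Causal Forests, DragonNet) replace the two linear fits with far more flexible learners, but the output contract is the same: one uplift estimate per user.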
```
.
├── airflow/                      # Orchestration layer
│   ├── dags/
│   │   ├── etl_pipeline.py       # Nightly feature-engineering DAG
│   │   └── train_pipeline.py     # Weekly model-training DAG
│   └── docker/
│       └── Dockerfile            # Slim Airflow image for CI
├── infra/                        # IaC & deployment configs
│   ├── docker-compose.yml        # Local full-stack spin-up
│   └── fly.toml                  # Fly.io production app spec
├── data/
│   ├── raw/                      # Immutable source logs (Parquet)
│   └── processed/                # Feature & label tables
├── notebooks/                    # Exploratory & report notebooks
│   ├── 01_dag_definition.ipynb   # Causal graph & backdoor checks
│   ├── 02_model_compare.ipynb    # Benchmark CATE estimators
│   └── 03_policy_replay.ipynb    # Offline uplift-policy eval
├── src/                          # Project Python package (installable)
│   ├── common/
│   │   └── config.py             # Path & hyper-param management
│   ├── features/
│   │   ├── builder.py            # Column generation & encoders
│   │   └── __init__.py
│   ├── models/
│   │   ├── causal_estimator.py   # Wrapper for EconML/DoWhy
│   │   ├── policy_evaluation.py  # Doubly-robust replay metrics
│   │   └── __init__.py
│   └── inference/
│       ├── app.py                # FastAPI entrypoint
│       └── schemas.py            # Pydantic request/response models
├── tests/                        # Pytest suites
│   ├── test_features.py
│   ├── test_models.py
│   └── test_api.py
├── scripts/                      # One-off utilities (seed, cleanup)
│   ├── bootstrap_db.sh
│   └── seed_data.py
├── .github/
│   └── workflows/
│       ├── ci.yml                # Unit tests, lint, coverage gate
│       └── cd.yml                # Docker build & deploy
├── environment.yml               # Conda spec
├── requirements.txt              # Pip alternative
├── Makefile                      # Common CLI tasks (make train, make test)
├── .pre-commit-config.yaml       # Black, ruff, isort hooks
└── README.md                     # Project documentation root
```
With conda/mamba:

```bash
mamba env create -f environment.yml
mamba activate causal-ml
```

Or with pip:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

GPU users: follow the PyTorch installation matrix for CUDA wheels.
| Source | Format | Notes |
|---|---|---|
| User session logs | Parquet | `page_view_id`, timestamps, features |
| Experiment feed | JSON | Treatment/control assignment metadata |
| Lookup tables | CSV | User demographics, device info |
All personally‑identifiable information (PII) must be removed or hashed before leaving secure storage.
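One common approach to the PII requirement is keyed hashing of identifiers before export, so records remain joinable but irreversible without the key. This is a sketch only — the column names and salt handling below are illustrative assumptions, not this repo's actual ETL logic:

```python
import hashlib
import hmac

# Hypothetical secret: in practice, load from a secret manager, never hardcode.
SALT = b"rotate-me-and-store-in-a-secret-manager"

def pseudonymize(user_id: str) -> str:
    """Keyed SHA-256 hash: stable across runs (joinable), not reversible."""
    return hmac.new(SALT, user_id.encode(), hashlib.sha256).hexdigest()

record = {"user_id": "u-12345", "device": "ios", "clicks": 7}
record["user_id"] = pseudonymize(record["user_id"])  # hash before leaving secure storage
```

A keyed HMAC (rather than a bare hash) prevents dictionary attacks against low-entropy identifiers, provided the key itself stays in secure storage.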
- Define causal graph in `notebooks/01_dag_definition.ipynb` using `DoWhy`.
- Run ETL: `make features` or trigger `airflow dags trigger etl_pipeline`.
- Model training: `make train` or wait for the weekly Airflow schedule.
- Validation: offline policy replay (`src/models/policy_evaluation.py`).
- Register candidate to MLflow if metrics exceed thresholds.
- Deploy via GitHub Actions → Docker Registry → Fly.io / K8s.
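The validation step can be sketched as a doubly-robust (DR) estimate of a targeting policy's value from logged randomized data. This is a simplified NumPy stand-in for what `src/models/policy_evaluation.py` does; the variable names and the toy data are illustrative:

```python
import numpy as np

def dr_policy_value(y, t, propensity, mu0, mu1, policy):
    """Doubly-robust off-policy value estimate from logged data.

    y: observed outcomes; t: logged binary treatments;
    propensity: P(T=1 | x) under the logging policy;
    mu0, mu1: outcome-model predictions per arm; policy: recommended arm.
    """
    mu_pi = np.where(policy == 1, mu1, mu0)      # model value under new policy
    mu_t = np.where(t == 1, mu1, mu0)            # model value under logged arm
    p_t = np.where(t == 1, propensity, 1 - propensity)
    match = (t == policy).astype(float)          # logged arm agrees with policy
    correction = match / p_t * (y - mu_t)        # IPW residual correction
    return float(np.mean(mu_pi + correction))

# Toy replay: uniform logging (p=0.5), treatment adds +1 to the outcome.
rng = np.random.default_rng(1)
n = 20_000
t = rng.integers(0, 2, n)
y = t + rng.normal(scale=0.5, size=n)
v = dr_policy_value(y, t, np.full(n, 0.5),
                    mu0=np.zeros(n), mu1=np.ones(n),
                    policy=np.ones(n, dtype=int))   # evaluate "always treat"
```

The DR estimator stays consistent if either the outcome model or the propensities are correct, which is why it is the standard choice for offline replay of uplift policies.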
| Estimator | Library | Metric (AUC) |
|---|---|---|
| Causal Forest | `econml` | 0.89 |
| DragonNet | `econml` + `torch` | 0.91 (★) |
| S‑Learner (XGB) | `causalml` | 0.84 |
★ Selected for production.
Hyper‑parameters are stored under MLflow run tags; see `notebooks/02_model_compare.ipynb` for the full experiment dashboard.
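For intuition on how such a ranking metric can be computed: rank users by predicted uplift, then measure the area under the cumulative-uplift curve. The sketch below is a generic NumPy implementation of that idea under uniform assignment, not the project's exact scoring code:

```python
import numpy as np

def uplift_curve_auc(uplift_score, y, t):
    """Area under the cumulative-uplift curve, users ranked by score.

    At depth k: (mean y | treated, top-k) - (mean y | control, top-k),
    scaled by k/n; the area is a Riemann sum over depths.
    """
    order = np.argsort(-np.asarray(uplift_score))
    y, t = np.asarray(y)[order], np.asarray(t)[order]
    n = len(y)
    cum_yt, cum_yc = np.cumsum(y * t), np.cumsum(y * (1 - t))
    cum_nt, cum_nc = np.cumsum(t), np.cumsum(1 - t)
    with np.errstate(invalid="ignore", divide="ignore"):
        gain = (cum_yt / cum_nt - cum_yc / cum_nc) * np.arange(1, n + 1) / n
    gain = np.nan_to_num(gain)       # early depths may lack one arm entirely
    return float(gain.sum() / n)

# Toy check: an oracle ranking should beat a random one.
rng = np.random.default_rng(2)
n = 5000
true_uplift = rng.uniform(0.0, 1.0, n)
t = rng.integers(0, 2, n)
y = true_uplift * t + rng.normal(scale=0.1, size=n)
auc_model = uplift_curve_auc(true_uplift, y, t)      # oracle ranking
auc_random = uplift_curve_auc(rng.normal(size=n), y, t)
```

A model that concentrates true responders at the top of the ranking accumulates incremental gain earlier, so its area exceeds the random-ranking baseline.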
```bash
cd infra/
# Build image
docker build -t causal-inference-api .
# Push & deploy
flyctl deploy -i causal-inference-api
```

`src/inference/app.py` exposes `/predict` (batch) and `/score` (single user) endpoints; latency is ~30 ms p95 on a `t4-small` instance.
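A `/score` exchange might look like the following — the field names here are illustrative assumptions; the authoritative request/response shapes are the Pydantic models in `src/inference/schemas.py`:

```json
{"user_id": "u-12345", "device": "ios", "sessions_7d": 4}
```

with a hypothetical response carrying the uplift estimate and targeting decision:

```json
{"user_id": "u-12345", "uplift": 0.031, "treat": true}
```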
- Unit tests: `pytest` in GitHub Actions, with an 85 % coverage gate.
- Linting: `black`, `ruff`, & `pre-commit` hooks.
- Docker publish: images tagged with the MLflow model SHA.
- Infra as Code: `fly.toml` and `docker-compose.yml` for the local stack.
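A minimal shape for the test-and-lint workflow is sketched below — illustrative only (the real workflow lives in `.github/workflows/ci.yml`), and it assumes `pytest-cov` is in `requirements.txt` for the coverage gate:

```yaml
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: ruff check src tests
      - run: pytest --cov=src --cov-fail-under=85
```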
- Fork & branch from `main`.
- Enable pre-commit: `pre-commit install`.
- Run `make test` before opening a PR.
- PRs must include updated docs and passing CI.
MIT © 2025 Causal‑ML Lab, Williams College.
- Kuang et al., Estimating Individual Treatment Effect: A Causal Representation Learning Approach, ICLR 2024.
- EconML team, Orthogonal Random Forest for Heterogeneous Treatment Effect Estimation, 2023.
- Pearl, Causality: Models, Reasoning, and Inference, 2nd ed.