Purpose: End‑to‑end framework for estimating Conditional Average Treatment Effects (CATE) and deploying uplift‑based targeting models that replace or augment traditional A/B testing.
Companies rely on experimentation to optimise user engagement, but fixed‑horizon A/B tests are slow and often under‑powered. This project builds a hybrid causal‑ML pipeline that:
- Ingests raw user‑session logs from a data warehouse.
- Cleans & feature‑engineers covariates suitable for causal analysis.
- Estimates heterogeneous treatment effects (CATE/ITE) with state‑of‑the‑art models (Causal Forests, DragonNet, TARNet).
- Surfaces high‑uplift cohorts via an API for real‑time targeting.
- Tracks experiments & model lineage with MLflow and Airflow.
Outcome: 9.2 % uplift in CTR in offline replay simulation relative to uniform treatment assignment.
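To make the core idea concrete: CATE estimation fits per-user treatment effects rather than a single average effect. The snippet below is a minimal T-learner sketch on synthetic data using plain NumPy — an illustration of the concept only, not this project's actual estimators (those live in `src/models/causal_estimator.py` and use EconML/DoWhy).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))                      # user covariates
T = rng.integers(0, 2, size=n)                   # randomized treatment flag
tau = 0.5 + X[:, 0]                              # true heterogeneous effect
y = X @ np.array([1.0, -0.5, 0.2]) + tau * T + rng.normal(scale=0.1, size=n)

def fit_linear(X, y):
    """Least-squares fit with an intercept column; returns a predictor."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

mu0 = fit_linear(X[T == 0], y[T == 0])           # outcome model, control arm
mu1 = fit_linear(X[T == 1], y[T == 1])           # outcome model, treated arm
cate_hat = mu1(X) - mu0(X)                       # estimated CATE per user
```

The production models (Causal Forests, DragonNet) replace the two linear fits with far more flexible learners, but the output contract is the same: one uplift estimate per user.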
```
.
├── airflow/                      # Orchestration layer
│   ├── dags/
│   │   ├── etl_pipeline.py       # Nightly feature-engineering DAG
│   │   └── train_pipeline.py     # Weekly model-training DAG
│   └── docker/
│       └── Dockerfile            # Slim Airflow image for CI
├── infra/                        # IaC & deployment configs
│   ├── docker-compose.yml        # Local full-stack spin-up
│   └── fly.toml                  # Fly.io production app spec
├── data/
│   ├── raw/                      # Immutable source logs (Parquet)
│   └── processed/                # Feature & label tables
├── notebooks/                    # Exploratory & report notebooks
│   ├── 01_dag_definition.ipynb   # Causal graph & backdoor checks
│   ├── 02_model_compare.ipynb    # Benchmark CATE estimators
│   └── 03_policy_replay.ipynb    # Offline uplift-policy eval
├── src/                          # Project Python package (installable)
│   ├── common/
│   │   └── config.py             # Path & hyper-param management
│   ├── features/
│   │   ├── builder.py            # Column generation & encoders
│   │   └── __init__.py
│   ├── models/
│   │   ├── causal_estimator.py   # Wrapper for EconML/DoWhy
│   │   ├── policy_evaluation.py  # Doubly-robust replay metrics
│   │   └── __init__.py
│   └── inference/
│       ├── app.py                # FastAPI entrypoint
│       └── schemas.py            # Pydantic request/response models
├── tests/                        # Pytest suites
│   ├── test_features.py
│   ├── test_models.py
│   └── test_api.py
├── scripts/                      # One-off utilities (seed, cleanup)
│   ├── bootstrap_db.sh
│   └── seed_data.py
├── .github/
│   └── workflows/
│       ├── ci.yml                # Unit tests, lint, coverage gate
│       └── cd.yml                # Docker build & deploy
├── environment.yml               # Conda spec
├── requirements.txt              # Pip alternative
├── Makefile                      # Common CLI tasks (make train, make test)
├── .pre-commit-config.yaml       # Black, ruff, isort hooks
└── README.md                     # Project documentation root
```
With conda/mamba:

```bash
mamba env create -f environment.yml
mamba activate causal-ml
```

Or with pip:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

GPU users: follow the PyTorch installation matrix for CUDA wheels.
| Source | Format | Notes |
|---|---|---|
| User session logs | Parquet | `page_view_id`, timestamps, features |
| Experiment feed | JSON | Treatment/control assignment metadata |
| Lookup tables | CSV | User demographics, device info |
All personally‑identifiable information (PII) must be removed or hashed before leaving secure storage.
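One common approach to the PII requirement is keyed hashing of identifiers before export, so records remain joinable but irreversible without the key. This is a sketch only — the column names and salt handling below are illustrative assumptions, not this repo's actual ETL logic:

```python
import hashlib
import hmac

# Hypothetical secret: in practice, load from a secret manager, never hardcode.
SALT = b"rotate-me-and-store-in-a-secret-manager"

def pseudonymize(user_id: str) -> str:
    """Keyed SHA-256 hash: stable across runs (joinable), not reversible."""
    return hmac.new(SALT, user_id.encode(), hashlib.sha256).hexdigest()

record = {"user_id": "u-12345", "device": "ios", "clicks": 7}
record["user_id"] = pseudonymize(record["user_id"])  # hash before leaving secure storage
```

A keyed HMAC (rather than a bare hash) prevents dictionary attacks against low-entropy identifiers, provided the key itself stays in secure storage.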
- Define causal graph in `notebooks/01_dag_definition.ipynb` using `DoWhy`.
- Run ETL: `make features` or trigger `airflow dags trigger etl_pipeline`.
- Model training: `make train` or wait for the weekly Airflow schedule.
- Validation: offline policy replay (`src/models/policy_evaluation.py`).
- Register candidate to MLflow if metrics exceed thresholds.
- Deploy via GitHub Actions → Docker Registry → Fly.io / K8s.
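The validation step can be sketched as a doubly-robust (DR) estimate of a targeting policy's value from logged randomized data. This is a simplified NumPy stand-in for what `src/models/policy_evaluation.py` does; the variable names and the toy data are illustrative:

```python
import numpy as np

def dr_policy_value(y, t, propensity, mu0, mu1, policy):
    """Doubly-robust off-policy value estimate from logged data.

    y: observed outcomes; t: logged binary treatments;
    propensity: P(T=1 | x) under the logging policy;
    mu0, mu1: outcome-model predictions per arm; policy: recommended arm.
    """
    mu_pi = np.where(policy == 1, mu1, mu0)      # model value under new policy
    mu_t = np.where(t == 1, mu1, mu0)            # model value under logged arm
    p_t = np.where(t == 1, propensity, 1 - propensity)
    match = (t == policy).astype(float)          # logged arm agrees with policy
    correction = match / p_t * (y - mu_t)        # IPW residual correction
    return float(np.mean(mu_pi + correction))

# Toy replay: uniform logging (p=0.5), treatment adds +1 to the outcome.
rng = np.random.default_rng(1)
n = 20_000
t = rng.integers(0, 2, n)
y = t + rng.normal(scale=0.5, size=n)
v = dr_policy_value(y, t, np.full(n, 0.5),
                    mu0=np.zeros(n), mu1=np.ones(n),
                    policy=np.ones(n, dtype=int))   # evaluate "always treat"
```

The DR estimator stays consistent if either the outcome model or the propensities are correct, which is why it is the standard choice for offline replay of uplift policies.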
| Estimator | Library | Metric (AUC) |
|---|---|---|
| Causal Forest | `econml` | 0.89 |
| DragonNet | `econml` + `torch` | 0.91 (★) |
| S‑Learner (XGB) | `causalml` | 0.84 |
★ Selected for production.
Hyper‑parameters are stored under MLflow run tags; see `notebooks/02_model_compare.ipynb` for the full experiment dashboard.
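For intuition on how such a ranking metric can be computed: rank users by predicted uplift, then measure the area under the cumulative-uplift curve. The sketch below is a generic NumPy implementation of that idea under uniform assignment, not the project's exact scoring code:

```python
import numpy as np

def uplift_curve_auc(uplift_score, y, t):
    """Area under the cumulative-uplift curve, users ranked by score.

    At depth k: (mean y | treated, top-k) - (mean y | control, top-k),
    scaled by k/n; the area is a Riemann sum over depths.
    """
    order = np.argsort(-np.asarray(uplift_score))
    y, t = np.asarray(y)[order], np.asarray(t)[order]
    n = len(y)
    cum_yt, cum_yc = np.cumsum(y * t), np.cumsum(y * (1 - t))
    cum_nt, cum_nc = np.cumsum(t), np.cumsum(1 - t)
    with np.errstate(invalid="ignore", divide="ignore"):
        gain = (cum_yt / cum_nt - cum_yc / cum_nc) * np.arange(1, n + 1) / n
    gain = np.nan_to_num(gain)       # early depths may lack one arm entirely
    return float(gain.sum() / n)

# Toy check: an oracle ranking should beat a random one.
rng = np.random.default_rng(2)
n = 5000
true_uplift = rng.uniform(0.0, 1.0, n)
t = rng.integers(0, 2, n)
y = true_uplift * t + rng.normal(scale=0.1, size=n)
auc_model = uplift_curve_auc(true_uplift, y, t)      # oracle ranking
auc_random = uplift_curve_auc(rng.normal(size=n), y, t)
```

A model that concentrates true responders at the top of the ranking accumulates incremental gain earlier, so its area exceeds the random-ranking baseline.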
```bash
cd infra/
# Build image
docker build -t causal-inference-api .
# Push & deploy
flyctl deploy -i causal-inference-api
```

`src/inference/app.py` exposes `/predict` (batch) and `/score` (single user) endpoints; latency is ~30 ms p95 on a `t4-small` instance.
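A `/score` exchange might look like the following — the field names here are illustrative assumptions; the authoritative request/response shapes are the Pydantic models in `src/inference/schemas.py`:

```json
{"user_id": "u-12345", "device": "ios", "sessions_7d": 4}
```

with a hypothetical response carrying the uplift estimate and targeting decision:

```json
{"user_id": "u-12345", "uplift": 0.031, "treat": true}
```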
- Unit tests: `pytest` in GitHub Actions, with an 85 % coverage gate.
- Linting: `black`, `ruff`, & `pre-commit` hooks.
- Docker publish: images tagged with the MLflow model SHA.
- Infra as Code: `fly.toml` and `docker-compose.yml` for the local stack.
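A minimal shape for the test-and-lint workflow is sketched below — illustrative only (the real workflow lives in `.github/workflows/ci.yml`), and it assumes `pytest-cov` is in `requirements.txt` for the coverage gate:

```yaml
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: ruff check src tests
      - run: pytest --cov=src --cov-fail-under=85
```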
- Fork & branch from `main`.
- Enable pre-commit: `pre-commit install`.
- Run `make test` before opening a PR.
- PRs must include updated docs and passing CI.
MIT © 2025 Causal‑ML Lab, Williams College.
- Kuang et al., Estimating Individual Treatment Effect: A Causal Representation Learning Approach, ICLR 2024.
- EconML team, Orthogonal Random Forest for Heterogeneous Treatment Effect Estimation, 2023.
- Pearl, Causality: Models, Reasoning, and Inference, 2nd ed.