3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- **WooldridgeDiD (ETWFE)** estimator — Extended Two-Way Fixed Effects from Wooldridge (2025, 2023). Supports OLS, logit, and Poisson QMLE paths with ASF-based ATT and delta-method SEs. Four aggregation types (simple, group, calendar, event) matching Stata `jwdid_estat`. Alias: `ETWFE`. (PR #216, thanks @wenddymacro)

## [2.8.4] - 2026-04-04

### Added
3 changes: 2 additions & 1 deletion README.md
@@ -84,7 +84,7 @@ Detailed guide: [`docs/llms-practitioner.txt`](docs/llms-practitioner.txt)
- **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
- **Panel data support**: Two-way fixed effects estimator for panel designs
- **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
- - **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing, Freedman & Hollingsworth 2024), and Efficient DiD (Chen, Sant'Anna & Xie 2025) estimators for heterogeneous treatment timing
+ - **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing, Freedman & Hollingsworth 2024), Efficient DiD (Chen, Sant'Anna & Xie 2025), and Wooldridge ETWFE (2021/2023) estimators for heterogeneous treatment timing
- **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
- **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
- **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
@@ -117,6 +117,7 @@ All estimators have short aliases for convenience:
| `Stacked` | `StackedDiD` | Stacked DiD |
| `Bacon` | `BaconDecomposition` | Goodman-Bacon decomposition |
| `EDiD` | `EfficientDiD` | Efficient DiD |
| `ETWFE` | `WooldridgeDiD` | Wooldridge ETWFE (2021/2023) |

`TROP` already uses its short canonical name and needs no alias.

6 changes: 1 addition & 5 deletions ROADMAP.md
@@ -82,11 +82,7 @@ Implements local projections for dynamic treatment effects. Doesn't require spec

### Nonlinear DiD

- For outcomes where linear models are inappropriate (binary, count, bounded).
-
- - Logit/probit DiD for binary outcomes
- - Poisson DiD for count outcomes
- - Proper handling of incidence rate ratios and odds ratios
+ Implemented in `WooldridgeDiD` (alias `ETWFE`) — OLS, Poisson QMLE, and logit paths with ASF-based ATT. See [Tutorial 16](docs/tutorials/16_wooldridge_etwfe.ipynb).

**Reference**: [Wooldridge (2023)](https://academic.oup.com/ectj/article/26/3/C31/7250479). *The Econometrics Journal*.
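The ASF-based ATT for the exponential-mean (Poisson QMLE) path can be sketched in a few lines. Everything below is a hypothetical illustration with made-up fitted values, not output from the library:

```python
import numpy as np

# Hypothetical fitted quantities for illustration only (not from a real fit)
rng = np.random.default_rng(0)
xb = rng.normal(0.2, 0.1, size=500)  # fitted linear index on treated cells, treatment term excluded
tau = 0.05                           # fitted treatment coefficient on the link scale

# ASF contrast: average the difference of the two implied conditional means
# over treated observations, exp(xb + tau) - exp(xb)
att = float(np.mean(np.exp(xb + tau) - np.exp(xb)))
```

Because the link is exponential, this contrast equals `np.mean(np.exp(xb)) * (np.exp(tau) - 1)`: the link-scale coefficient `tau` translates into a proportional effect on the outcome mean.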

4 changes: 4 additions & 0 deletions TODO.md
@@ -72,6 +72,10 @@ Deferred items from PR reviews that were not addressed before merge.
| StaggeredTripleDifference: per-cohort group-effect SEs include WIF (conservative vs R's wif=NULL). Documented in REGISTRY. Could override mixin for exact R match. | `staggered_triple_diff.py` | #245 | Low |
| HonestDiD Delta^RM: uses naive FLCI instead of paper's ARP conditional/hybrid confidence sets (Sections 3.2.1-3.2.2). ARP infrastructure exists but moment inequality transformation needs calibration. CIs are conservative (wider, valid coverage). | `honest_did.py` | #248 | Medium |
| Replicate weight tests use Fay-like BRR perturbations (0.5/1.5), not true half-sample BRR. Add true BRR regressions per estimator family. Existing `test_survey_phase6.py` covers true BRR at the helper level. | `tests/test_replicate_weight_expansion.py` | #253 | Low |
| WooldridgeDiD: QMLE sandwich uses `aweight` cluster-robust adjustment `(G/(G-1))*(n-1)/(n-k)` vs Stata's `G/(G-1)` only. Conservative (inflates SEs). Add `qmle` weight type if Stata golden values confirm material difference. | `wooldridge.py`, `linalg.py` | #216 | Medium |
| WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
| WooldridgeDiD: canonical link requirement (W2023 Prop 3.1) not enforced — no warning if user applies wrong method to outcome type. Estimator is consistent regardless, but equivalence with imputation breaks. | `wooldridge.py` | #216 | Low |
| WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
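The two small-sample factors in the first WooldridgeDiD row differ only by the `(n-1)/(n-k)` term. A quick numeric check, with G, n, and k chosen purely for illustration (not taken from the codebase):

```python
# Illustrative dimensions only: G clusters, n observations, k regressors
G, n, k = 40, 2000, 12

stata_factor = G / (G - 1)                            # Stata: G/(G-1) only
aweight_factor = (G / (G - 1)) * ((n - 1) / (n - k))  # current aweight adjustment

# The extra (n-1)/(n-k) > 1 term inflates the variance, hence conservative SEs
inflation = aweight_factor / stata_factor
```

At these dimensions the inflation is well under 1%; it grows as k approaches n, which is when the discrepancy against Stata golden values would become material.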

#### Performance

149 changes: 149 additions & 0 deletions benchmarks/python/benchmark_wooldridge.py
@@ -0,0 +1,149 @@
#!/usr/bin/env python3
"""
Benchmark: diff-diff WooldridgeDiD (ETWFE) estimator.

Validates OLS ETWFE ATT(g,t) against Callaway-Sant'Anna on mpdta data
(Proposition 3.1 equivalence), and measures estimation timing.

Usage:
python benchmark_wooldridge.py --data path/to/mpdta.csv --output path/to/results.json
"""

import argparse
import json
import os
import sys
from pathlib import Path

# IMPORTANT: Parse --backend and set environment variable BEFORE importing diff_diff
def _get_backend_from_args():
"""Parse --backend argument without importing diff_diff."""
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("--backend", default="auto", choices=["auto", "python", "rust"])
args, _ = parser.parse_known_args()
return args.backend

_requested_backend = _get_backend_from_args()
if _requested_backend in ("python", "rust"):
os.environ["DIFF_DIFF_BACKEND"] = _requested_backend

# NOW import diff_diff and other dependencies (will see the env var)
import pandas as pd

# Add parent to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from diff_diff import WooldridgeDiD, HAS_RUST_BACKEND
from benchmarks.python.utils import Timer


def parse_args():
parser = argparse.ArgumentParser(
description="Benchmark WooldridgeDiD (ETWFE) estimator"
)
parser.add_argument("--data", required=True, help="Path to input CSV data (mpdta format)")
parser.add_argument("--output", required=True, help="Path to output JSON results")
parser.add_argument(
"--backend", default="auto", choices=["auto", "python", "rust"],
help="Backend to use: auto (default), python (pure Python), rust (Rust backend)"
)
return parser.parse_args()


def get_actual_backend() -> str:
"""Return the actual backend being used based on HAS_RUST_BACKEND."""
return "rust" if HAS_RUST_BACKEND else "python"


def main():
args = parse_args()

actual_backend = get_actual_backend()
print(f"Using backend: {actual_backend}")

print(f"Loading data from: {args.data}")
df = pd.read_csv(args.data)

# Run OLS ETWFE estimation
print("Running WooldridgeDiD (OLS ETWFE) estimation...")
est = WooldridgeDiD(method="ols", control_group="not_yet_treated")

    with Timer() as estimation_timer:
        results = est.fit(
            df,
            outcome="lemp",
            unit="countyreal",
            time="year",
            cohort="first_treat",
        )

    estimation_time = estimation_timer.elapsed

    # Compute event study aggregation, timed with its own Timer so total_time
    # does not depend on whether .elapsed keeps ticking after context exit
    with Timer() as aggregation_timer:
        results.aggregate("event")
    total_time = estimation_time + aggregation_timer.elapsed

# Store data info
n_units = len(df["countyreal"].unique())
n_periods = len(df["year"].unique())
n_obs = len(df)

# Format ATT(g,t) effects
gt_effects_out = []
for (g, t), cell in sorted(results.group_time_effects.items()):
gt_effects_out.append({
"cohort": int(g),
"time": int(t),
"att": float(cell["att"]),
"se": float(cell["se"]),
})

# Format event study effects
es_effects = []
if results.event_study_effects:
for rel_t, effect_data in sorted(results.event_study_effects.items()):
es_effects.append({
"event_time": int(rel_t),
"att": float(effect_data["att"]),
"se": float(effect_data["se"]),
})

output = {
"estimator": "diff_diff.WooldridgeDiD",
"method": "ols",
"control_group": "not_yet_treated",
"backend": actual_backend,
# Overall ATT
"overall_att": float(results.overall_att),
"overall_se": float(results.overall_se),
# Group-time ATT(g,t)
"group_time_effects": gt_effects_out,
# Event study
"event_study": es_effects,
# Timing
"timing": {
"estimation_seconds": estimation_time,
"total_seconds": total_time,
},
# Metadata
"metadata": {
"n_units": n_units,
"n_periods": n_periods,
"n_obs": n_obs,
"n_cohorts": len(results.groups),
},
}

print(f"Writing results to: {args.output}")
output_path = Path(args.output)
output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, "w") as f:
json.dump(output, f, indent=2)

print(f"Overall ATT: {results.overall_att:.6f} (SE: {results.overall_se:.6f})")
print(f"Completed in {total_time:.3f} seconds")
return output


if __name__ == "__main__":
main()
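The event-study aggregation consumed above can be sketched independently: pool the ATT(g,t) cells by event time `t - g` with cell-count weights, as the estimator currently does (the cohort-share alternative is tracked in TODO.md). The cell values below are made up:

```python
from collections import defaultdict

# Made-up ATT(g,t) cells: {(cohort g, period t): (att, cell count n_gt)}
cells = {
    (2004, 2004): (0.02, 100),
    (2006, 2006): (0.01, 80),
    (2004, 2005): (0.04, 90),
}

num = defaultdict(float)
den = defaultdict(float)
for (g, t), (att, n_gt) in cells.items():
    e = t - g  # event time relative to treatment
    num[e] += n_gt * att
    den[e] += n_gt

# Weighted average ATT at each event time
event_att = {e: num[e] / den[e] for e in sorted(num)}
```

Here event time 0 pools the (2004, 2004) and (2006, 2006) cells, weighted 100:80.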
7 changes: 7 additions & 0 deletions diff_diff/__init__.py
@@ -164,6 +164,8 @@
TROPResults,
trop,
)
from diff_diff.wooldridge import WooldridgeDiD
from diff_diff.wooldridge_results import WooldridgeDiDResults
from diff_diff.utils import (
WildBootstrapResults,
check_parallel_trends,
@@ -210,6 +212,7 @@
Stacked = StackedDiD
Bacon = BaconDecomposition
EDiD = EfficientDiD
ETWFE = WooldridgeDiD

__version__ = "2.8.4"
__all__ = [
@@ -276,6 +279,10 @@
"EfficientDiDResults",
"EDiDBootstrapResults",
"EDiD",
# WooldridgeDiD (ETWFE)
"WooldridgeDiD",
"WooldridgeDiDResults",
"ETWFE",
# Visualization
"plot_bacon",
"plot_event_study",
66 changes: 66 additions & 0 deletions diff_diff/linalg.py
@@ -2342,3 +2342,69 @@ def _compute_confidence_interval(
upper = estimate + critical_value * se

return (lower, upper)


def solve_poisson(
X: np.ndarray,
y: np.ndarray,
max_iter: int = 200,
tol: float = 1e-8,
init_beta: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, np.ndarray]:
"""Poisson IRLS (Newton-Raphson with log link).

Does NOT prepend an intercept — caller must include one if needed.
Returns (beta, W_final) where W_final = mu_hat (used for sandwich vcov).

Parameters
----------
X : (n, k) design matrix (caller provides intercept / group FE dummies)
y : (n,) non-negative count outcomes
max_iter : maximum IRLS iterations
tol : convergence threshold on sup-norm of coefficient change
init_beta : optional starting coefficient vector; if None, zeros are used
with the first column treated as the intercept and initialized to
log(mean(y)) to improve convergence for large-scale outcomes.

Returns
-------
beta : (k,) coefficient vector
W : (n,) final fitted means mu_hat (weights for sandwich vcov)
"""
n, k = X.shape
if init_beta is not None:
beta = init_beta.copy()
else:
beta = np.zeros(k)
# Initialise the intercept to log(mean(y)) so the first IRLS step
# starts near the unconditional mean rather than exp(0)=1, which
# causes overflow when y is large (e.g. employment levels).
mean_y = float(np.mean(y))
if mean_y > 0:
beta[0] = np.log(mean_y)
for _ in range(max_iter):
eta = np.clip(X @ beta, -500, 500)
mu = np.exp(eta)
score = X.T @ (y - mu) # gradient of log-likelihood
hess = X.T @ (mu[:, None] * X) # -Hessian = X'WX, W=diag(mu)
try:
delta = np.linalg.solve(hess + 1e-12 * np.eye(k), score)
except np.linalg.LinAlgError:
break
# Damped step: cap the maximum coefficient change to avoid overshooting
max_step = np.max(np.abs(delta))
if max_step > 1.0:
delta = delta / max_step
beta_new = beta + delta
if np.max(np.abs(beta_new - beta)) < tol:
beta = beta_new
break
beta = beta_new
else:
warnings.warn(
"solve_poisson did not converge in {} iterations".format(max_iter),
RuntimeWarning,
stacklevel=2,
)
mu_final = np.exp(np.clip(X @ beta, -500, 500))
return beta, mu_final
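A standalone sanity check of the same IRLS recursion on toy data. This re-implements the undamped Newton loop inline rather than calling `solve_poisson`, so it runs without the package; the warm start mirrors the intercept initialization above:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # caller supplies the intercept column
beta_true = np.array([1.0, 0.5])
y = rng.poisson(np.exp(X @ beta_true)).astype(float)

beta = np.zeros(2)
beta[0] = np.log(y.mean())             # same warm start as solve_poisson
for _ in range(100):
    mu = np.exp(X @ beta)              # current fitted means
    # Newton step: (X'WX)^{-1} X'(y - mu), W = diag(mu)
    step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break
```

On well-scaled data like this the plain recursion converges in a handful of iterations; the damping, clipping, and ridge terms in `solve_poisson` matter only for ill-conditioned designs or extreme outcome scales.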