3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- **WooldridgeDiD (ETWFE)** estimator — Extended Two-Way Fixed Effects from Wooldridge (2025, 2023). Supports OLS, logit, and Poisson QMLE paths with ASF-based ATT and delta-method SEs. Four aggregation types (simple, group, calendar, event) matching Stata `jwdid_estat`. Alias: `ETWFE`. (PR #216, thanks @wenddymacro)

## [2.8.4] - 2026-04-04

### Added
3 changes: 2 additions & 1 deletion README.md
@@ -84,7 +84,7 @@ Detailed guide: [`docs/llms-practitioner.txt`](docs/llms-practitioner.txt)
- **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
- **Panel data support**: Two-way fixed effects estimator for panel designs
- **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
- - **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing, Freedman & Hollingsworth 2024), and Efficient DiD (Chen, Sant'Anna & Xie 2025) estimators for heterogeneous treatment timing
+ - **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing, Freedman & Hollingsworth 2024), Efficient DiD (Chen, Sant'Anna & Xie 2025), and Wooldridge ETWFE (2021/2023) estimators for heterogeneous treatment timing
- **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
- **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
- **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
@@ -117,6 +117,7 @@ All estimators have short aliases for convenience:
| `Stacked` | `StackedDiD` | Stacked DiD |
| `Bacon` | `BaconDecomposition` | Goodman-Bacon decomposition |
| `EDiD` | `EfficientDiD` | Efficient DiD |
| `ETWFE` | `WooldridgeDiD` | Wooldridge ETWFE (2021/2023) |

`TROP` already uses its short canonical name and needs no alias.

6 changes: 1 addition & 5 deletions ROADMAP.md
@@ -82,11 +82,7 @@ Implements local projections for dynamic treatment effects. Doesn't require spec

### Nonlinear DiD

- For outcomes where linear models are inappropriate (binary, count, bounded).
-
- - Logit/probit DiD for binary outcomes
- - Poisson DiD for count outcomes
- - Proper handling of incidence rate ratios and odds ratios
+ Implemented in `WooldridgeDiD` (alias `ETWFE`) — OLS, Poisson QMLE, and logit paths with ASF-based ATT. See [Tutorial 16](docs/tutorials/16_wooldridge_etwfe.ipynb).

**Reference**: [Wooldridge (2023)](https://academic.oup.com/ectj/article/26/3/C31/7250479). *The Econometrics Journal*.
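The ASF-based ATT for the exponential-mean (Poisson QMLE) path can be sketched in a few lines. Everything below is a hypothetical illustration with made-up fitted values, not output from the library:

```python
import numpy as np

# Hypothetical fitted quantities for illustration only (not from a real fit)
rng = np.random.default_rng(0)
xb = rng.normal(0.2, 0.1, size=500)  # fitted linear index on treated cells, treatment term excluded
tau = 0.05                           # fitted treatment coefficient on the link scale

# ASF contrast: average the difference of the two implied conditional means
# over treated observations, exp(xb + tau) - exp(xb)
att = float(np.mean(np.exp(xb + tau) - np.exp(xb)))
```

Because the link is exponential, this contrast equals `np.mean(np.exp(xb)) * (np.exp(tau) - 1)`: the link-scale coefficient `tau` translates into a proportional effect on the outcome mean.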

4 changes: 4 additions & 0 deletions TODO.md
@@ -72,6 +72,10 @@ Deferred items from PR reviews that were not addressed before merge.
| StaggeredTripleDifference: per-cohort group-effect SEs include WIF (conservative vs R's wif=NULL). Documented in REGISTRY. Could override mixin for exact R match. | `staggered_triple_diff.py` | #245 | Low |
| HonestDiD Delta^RM: uses naive FLCI instead of paper's ARP conditional/hybrid confidence sets (Sections 3.2.1-3.2.2). ARP infrastructure exists but moment inequality transformation needs calibration. CIs are conservative (wider, valid coverage). | `honest_did.py` | #248 | Medium |
| Replicate weight tests use Fay-like BRR perturbations (0.5/1.5), not true half-sample BRR. Add true BRR regressions per estimator family. Existing `test_survey_phase6.py` covers true BRR at the helper level. | `tests/test_replicate_weight_expansion.py` | #253 | Low |
| WooldridgeDiD: QMLE sandwich uses `aweight` cluster-robust adjustment `(G/(G-1))*(n-1)/(n-k)` vs Stata's `G/(G-1)` only. Conservative (inflates SEs). Add `qmle` weight type if Stata golden values confirm material difference. | `wooldridge.py`, `linalg.py` | #216 | Medium |
| WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
| WooldridgeDiD: canonical link requirement (W2023 Prop 3.1) not enforced — no warning if user applies wrong method to outcome type. Estimator is consistent regardless, but equivalence with imputation breaks. | `wooldridge.py` | #216 | Low |
| WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
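The two small-sample factors in the first WooldridgeDiD row differ only by the `(n-1)/(n-k)` term. A quick numeric check, with G, n, and k chosen purely for illustration (not taken from the codebase):

```python
# Illustrative dimensions only: G clusters, n observations, k regressors
G, n, k = 40, 2000, 12

stata_factor = G / (G - 1)                            # Stata: G/(G-1) only
aweight_factor = (G / (G - 1)) * ((n - 1) / (n - k))  # current aweight adjustment

# The extra (n-1)/(n-k) > 1 term inflates the variance, hence conservative SEs
inflation = aweight_factor / stata_factor
```

At these dimensions the inflation is well under 1%; it grows as k approaches n, which is when the discrepancy against Stata golden values would become material.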

#### Performance

149 changes: 149 additions & 0 deletions benchmarks/python/benchmark_wooldridge.py
@@ -0,0 +1,149 @@
#!/usr/bin/env python3
"""
Benchmark: diff-diff WooldridgeDiD (ETWFE) estimator.

Validates OLS ETWFE ATT(g,t) against Callaway-Sant'Anna on mpdta data
(Proposition 3.1 equivalence), and measures estimation timing.

Usage:
python benchmark_wooldridge.py --data path/to/mpdta.csv --output path/to/results.json
"""

import argparse
import json
import os
import sys
from pathlib import Path

# IMPORTANT: Parse --backend and set environment variable BEFORE importing diff_diff
def _get_backend_from_args():
"""Parse --backend argument without importing diff_diff."""
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("--backend", default="auto", choices=["auto", "python", "rust"])
args, _ = parser.parse_known_args()
return args.backend

_requested_backend = _get_backend_from_args()
if _requested_backend in ("python", "rust"):
os.environ["DIFF_DIFF_BACKEND"] = _requested_backend

# NOW import diff_diff and other dependencies (will see the env var)
import pandas as pd

# Add parent to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from diff_diff import WooldridgeDiD, HAS_RUST_BACKEND
from benchmarks.python.utils import Timer


def parse_args():
parser = argparse.ArgumentParser(
description="Benchmark WooldridgeDiD (ETWFE) estimator"
)
parser.add_argument("--data", required=True, help="Path to input CSV data (mpdta format)")
parser.add_argument("--output", required=True, help="Path to output JSON results")
parser.add_argument(
"--backend", default="auto", choices=["auto", "python", "rust"],
help="Backend to use: auto (default), python (pure Python), rust (Rust backend)"
)
return parser.parse_args()


def get_actual_backend() -> str:
"""Return the actual backend being used based on HAS_RUST_BACKEND."""
return "rust" if HAS_RUST_BACKEND else "python"


def main():
args = parse_args()

actual_backend = get_actual_backend()
print(f"Using backend: {actual_backend}")

print(f"Loading data from: {args.data}")
df = pd.read_csv(args.data)

# Run OLS ETWFE estimation
print("Running WooldridgeDiD (OLS ETWFE) estimation...")
est = WooldridgeDiD(method="ols", control_group="not_yet_treated")

    with Timer() as estimation_timer:
        results = est.fit(
            df,
            outcome="lemp",
            unit="countyreal",
            time="year",
            cohort="first_treat",
        )

    estimation_time = estimation_timer.elapsed

    # Compute event study aggregation, timed with its own Timer so total_time
    # does not depend on whether .elapsed keeps ticking after context exit
    with Timer() as aggregation_timer:
        results.aggregate("event")
    total_time = estimation_time + aggregation_timer.elapsed

# Store data info
n_units = len(df["countyreal"].unique())
n_periods = len(df["year"].unique())
n_obs = len(df)

# Format ATT(g,t) effects
gt_effects_out = []
for (g, t), cell in sorted(results.group_time_effects.items()):
gt_effects_out.append({
"cohort": int(g),
"time": int(t),
"att": float(cell["att"]),
"se": float(cell["se"]),
})

# Format event study effects
es_effects = []
if results.event_study_effects:
for rel_t, effect_data in sorted(results.event_study_effects.items()):
es_effects.append({
"event_time": int(rel_t),
"att": float(effect_data["att"]),
"se": float(effect_data["se"]),
})

output = {
"estimator": "diff_diff.WooldridgeDiD",
"method": "ols",
"control_group": "not_yet_treated",
"backend": actual_backend,
# Overall ATT
"overall_att": float(results.overall_att),
"overall_se": float(results.overall_se),
# Group-time ATT(g,t)
"group_time_effects": gt_effects_out,
# Event study
"event_study": es_effects,
# Timing
"timing": {
"estimation_seconds": estimation_time,
"total_seconds": total_time,
},
# Metadata
"metadata": {
"n_units": n_units,
"n_periods": n_periods,
"n_obs": n_obs,
"n_cohorts": len(results.groups),
},
}

print(f"Writing results to: {args.output}")
output_path = Path(args.output)
output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, "w") as f:
json.dump(output, f, indent=2)

print(f"Overall ATT: {results.overall_att:.6f} (SE: {results.overall_se:.6f})")
print(f"Completed in {total_time:.3f} seconds")
return output


if __name__ == "__main__":
main()
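The event-study aggregation consumed above can be sketched independently: pool the ATT(g,t) cells by event time `t - g` with cell-count weights, as the estimator currently does (the cohort-share alternative is tracked in TODO.md). The cell values below are made up:

```python
from collections import defaultdict

# Made-up ATT(g,t) cells: {(cohort g, period t): (att, cell count n_gt)}
cells = {
    (2004, 2004): (0.02, 100),
    (2006, 2006): (0.01, 80),
    (2004, 2005): (0.04, 90),
}

num = defaultdict(float)
den = defaultdict(float)
for (g, t), (att, n_gt) in cells.items():
    e = t - g  # event time relative to treatment
    num[e] += n_gt * att
    den[e] += n_gt

# Weighted average ATT at each event time
event_att = {e: num[e] / den[e] for e in sorted(num)}
```

Here event time 0 pools the (2004, 2004) and (2006, 2006) cells, weighted 100:80.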
7 changes: 7 additions & 0 deletions diff_diff/__init__.py
@@ -164,6 +164,8 @@
TROPResults,
trop,
)
from diff_diff.wooldridge import WooldridgeDiD
from diff_diff.wooldridge_results import WooldridgeDiDResults
from diff_diff.utils import (
WildBootstrapResults,
check_parallel_trends,
@@ -210,6 +212,7 @@
Stacked = StackedDiD
Bacon = BaconDecomposition
EDiD = EfficientDiD
ETWFE = WooldridgeDiD

__version__ = "2.8.4"
__all__ = [
@@ -276,6 +279,10 @@
"EfficientDiDResults",
"EDiDBootstrapResults",
"EDiD",
# WooldridgeDiD (ETWFE)
"WooldridgeDiD",
"WooldridgeDiDResults",
"ETWFE",
# Visualization
"plot_bacon",
"plot_event_study",
66 changes: 66 additions & 0 deletions diff_diff/linalg.py
@@ -2342,3 +2342,69 @@ def _compute_confidence_interval(
upper = estimate + critical_value * se

return (lower, upper)


def solve_poisson(
X: np.ndarray,
y: np.ndarray,
max_iter: int = 200,
tol: float = 1e-8,
init_beta: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, np.ndarray]:
"""Poisson IRLS (Newton-Raphson with log link).

Does NOT prepend an intercept — caller must include one if needed.
Returns (beta, W_final) where W_final = mu_hat (used for sandwich vcov).

Parameters
----------
X : (n, k) design matrix (caller provides intercept / group FE dummies)
y : (n,) non-negative count outcomes
max_iter : maximum IRLS iterations
tol : convergence threshold on sup-norm of coefficient change
init_beta : optional starting coefficient vector; if None, zeros are used
with the first column treated as the intercept and initialized to
log(mean(y)) to improve convergence for large-scale outcomes.

Returns
-------
beta : (k,) coefficient vector
W : (n,) final fitted means mu_hat (weights for sandwich vcov)
"""
n, k = X.shape
if init_beta is not None:
beta = init_beta.copy()
else:
beta = np.zeros(k)
# Initialise the intercept to log(mean(y)) so the first IRLS step
# starts near the unconditional mean rather than exp(0)=1, which
# causes overflow when y is large (e.g. employment levels).
mean_y = float(np.mean(y))
if mean_y > 0:
beta[0] = np.log(mean_y)
for _ in range(max_iter):
eta = np.clip(X @ beta, -500, 500)
mu = np.exp(eta)
score = X.T @ (y - mu) # gradient of log-likelihood
hess = X.T @ (mu[:, None] * X) # -Hessian = X'WX, W=diag(mu)
try:
delta = np.linalg.solve(hess + 1e-12 * np.eye(k), score)
except np.linalg.LinAlgError:
break
# Damped step: cap the maximum coefficient change to avoid overshooting
max_step = np.max(np.abs(delta))
if max_step > 1.0:
delta = delta / max_step
beta_new = beta + delta
if np.max(np.abs(beta_new - beta)) < tol:
beta = beta_new
break
beta = beta_new
else:
warnings.warn(
"solve_poisson did not converge in {} iterations".format(max_iter),
RuntimeWarning,
stacklevel=2,
)
mu_final = np.exp(np.clip(X @ beta, -500, 500))
return beta, mu_final
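A standalone sanity check of the same IRLS recursion on toy data. This re-implements the undamped Newton loop inline rather than calling `solve_poisson`, so it runs without the package; the warm start mirrors the intercept initialization above:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # caller supplies the intercept column
beta_true = np.array([1.0, 0.5])
y = rng.poisson(np.exp(X @ beta_true)).astype(float)

beta = np.zeros(2)
beta[0] = np.log(y.mean())             # same warm start as solve_poisson
for _ in range(100):
    mu = np.exp(X @ beta)              # current fitted means
    # Newton step: (X'WX)^{-1} X'(y - mu), W = diag(mu)
    step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break
```

On well-scaled data like this the plain recursion converges in a handful of iterations; the damping, clipping, and ridge terms in `solve_poisson` matter only for ill-conditioned designs or extreme outcome scales.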