From 1983a50c2ab4fc81c6cd07f664e96377779fce2c Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 09:08:52 -0400 Subject: [PATCH 01/12] Update survey documentation: compatibility matrix, roadmap, deferred work Add survey compatibility matrix to choosing_estimator.rst (Phase 8f), fix 11 stale entries in the tutorial table and replace with cross-reference, mark Phase 8a-8e as shipped in survey-roadmap.md, consolidate all remaining NotImplementedError paths into a single deferred work section, add SDR to replicate method lists, and update ROADMAP.md version/status entries. Co-Authored-By: Claude Opus 4.6 (1M context) --- ROADMAP.md | 13 ++- TODO.md | 4 + docs/choosing_estimator.rst | 123 ++++++++++++++++++++- docs/survey-roadmap.md | 171 ++++++++++++++++------------- docs/tutorials/16_survey_did.ipynb | 59 +++++++++- 5 files changed, 280 insertions(+), 90 deletions(-) diff --git a/ROADMAP.md b/ROADMAP.md index 058f2ba1..8f8fce1c 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -8,15 +8,15 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md). ## Current Status -diff-diff v2.7.5 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — design-based variance estimation (Taylor linearization, replicate weights) integrated across all estimators. No R or Python package offers this combination: +diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — design-based variance estimation (Taylor linearization, replicate weights) integrated across all estimators. 
No R or Python package offers this combination: -- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024) +- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024) - **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance - **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022) - **Study design**: Power analysis tools - **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs -- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn), Taylor linearization, DEFF diagnostics, subpopulation analysis — integrated across all estimators (see [survey-roadmap.md](docs/survey-roadmap.md)) +- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn/SDR), Taylor linearization, DEFF diagnostics, subpopulation analysis — integrated across all estimators (see [survey-roadmap.md](docs/survey-roadmap.md)) - **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks) --- @@ -34,19 +34,20 @@ full details. 
- **Repeated Cross-Sections** *(Implemented)*: `panel=False` support for CallawaySantAnna using cross-sectional DRDID (Sant'Anna & Zhao 2020, Section 4). Supports BRFSS, ACS annual, CPS monthly. -- **Survey-Aware DiD Tutorial** *(Open)*: Jupyter notebook demonstrating +- **Survey-Aware DiD Tutorial** *(Implemented)*: Jupyter notebook demonstrating the full workflow with realistic survey data. - **HonestDiD + Survey Variance** *(Implemented)*: Survey df and full event-study VCV propagated to sensitivity analysis, with bootstrap/replicate diagonal fallback. -### Staggered Triple Difference (DDD) +### Staggered Triple Difference (DDD) *(Implemented)* -Extend the existing `TripleDifference` estimator to handle staggered adoption settings. +`StaggeredTripleDifference` estimator for staggered adoption DDD settings. - Group-time ATT(g,t) for DDD designs with variation in treatment timing - Event study aggregation and pre-treatment placebo effects - Multiplier bootstrap for valid inference in staggered settings +- Full survey support (pweight, strata/PSU/FPC, replicate weights) **Reference**: [Ortiz-Villavicencio & Sant'Anna (2025)](https://arxiv.org/abs/2505.09942). "Better Understanding Triple Differences Estimators." *Working Paper*. R package: `triplediff`. diff --git a/TODO.md b/TODO.md index 9c1aad97..748a2dd8 100644 --- a/TODO.md +++ b/TODO.md @@ -15,6 +15,10 @@ Current limitations that may affect users: | MultiPeriodDiD wild bootstrap not supported | `estimators.py:778-784` | Low | Edge case | | `predict()` raises NotImplementedError | `estimators.py:567-588` | Low | Rarely needed | +For survey-specific limitations (NotImplementedError paths), see the +[consolidated deferred list](docs/survey-roadmap.md#deferred-work-consolidated) +in survey-roadmap.md. 
+ ## Code Quality ### Large Module Files diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst index f0c49261..34bbf612 100644 --- a/docs/choosing_estimator.rst +++ b/docs/choosing_estimator.rst @@ -571,5 +571,124 @@ If you're unsure which estimator to use: investigate why (often reveals violations of assumptions) 5. **Using survey data?** - Pass a ``SurveyDesign`` to ``fit()`` for design-based - variance estimation. See the `survey tutorial `_ - for a full walkthrough with strata, PSU, FPC, replicate weights, and subpopulation analysis. + variance estimation. See the :ref:`survey-design-support` section below for + the compatibility matrix, and the `survey tutorial `_ + for a full walkthrough. + +.. _survey-design-support: + +Survey Design Support +--------------------- + +All estimators accept an optional ``survey_design`` parameter in ``fit()``. +Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance +estimation. The depth of support varies by estimator: + +.. 
list-table:: + :header-rows: 1 + :widths: 25 12 18 18 18 + + * - Estimator + - Weights + - Strata/PSU/FPC + - Replicate Weights + - Survey Bootstrap + * - ``DifferenceInDifferences`` + - Full + - Full + - Full + - -- + * - ``TwoWayFixedEffects`` + - Full + - Full + - Full + - -- + * - ``MultiPeriodDiD`` + - Full + - Full + - Full + - -- + * - ``CallawaySantAnna`` + - pweight only + - Full + - Full + - Multiplier at PSU + * - ``TripleDifference`` + - pweight only + - Full + - Full (analytical) + - -- + * - ``StaggeredTripleDifference`` + - pweight only + - Full + - Full + - Multiplier at PSU + * - ``SunAbraham`` + - Full + - Full + - Full + - Rao-Wu rescaled + * - ``StackedDiD`` + - pweight only + - Full (pweight only) + - Full + - -- + * - ``ImputationDiD`` + - pweight only + - Full + - Full (analytical) + - Multiplier at PSU + * - ``TwoStageDiD`` + - pweight only + - Full + - Full (analytical) + - Multiplier at PSU + * - ``ContinuousDiD`` + - Full + - Full + - Full (analytical) + - Multiplier at PSU + * - ``EfficientDiD`` + - Full + - Full + - Full (analytical) + - Multiplier at PSU + * - ``SyntheticDiD`` + - pweight only + - Via bootstrap + - -- + - Rao-Wu rescaled + * - ``TROP`` + - pweight only + - Via bootstrap + - -- + - Rao-Wu rescaled + * - ``BaconDecomposition`` + - Diagnostic + - Diagnostic + - -- + - -- + +**Legend:** + +- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance +- **Full (pweight only)**: Full TSL with strata/PSU/FPC, but only ``pweight`` accepted (``fweight``/``aweight`` rejected because composition changes weight semantics) +- **Via bootstrap**: Strata/PSU/FPC supported only with bootstrap variance. ``SyntheticDiD`` requires ``variance_method='bootstrap'``; ``TROP`` uses bootstrap by default. ``SyntheticDiD`` placebo does not support strata/PSU/FPC. 
+- **pweight only** (Weights column): Only ``pweight`` accepted; ``fweight``/``aweight`` raise an error +- **Diagnostic**: Weighted descriptive statistics only (no inference) +- **--**: Not supported + +.. note:: + + ``EfficientDiD`` does not support ``covariates`` and ``survey_design`` + simultaneously (the DR nuisance path does not yet thread survey weights). + +.. note:: + + ``SyntheticDiD`` with ``variance_method='placebo'`` does not support + strata/PSU/FPC. Use ``variance_method='bootstrap'`` for full survey + design support. + +For the full walkthrough with code examples, see the +`survey tutorial `_. +For deferred work and remaining limitations, see ``docs/survey-roadmap.md``. diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index 1f1a9efe..008101f4 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -1,7 +1,7 @@ # Survey Data Support Roadmap This document captures the survey data support roadmap for diff-diff. -Phases 1-7 are implemented. Phase 8 (maturity refinements) is planned. +All phases (1-8) are implemented. Remaining deferred items are listed at the bottom. 
## Implemented (Phases 1-2) @@ -32,9 +32,9 @@ Phase 5 infrastructure (bootstrap+survey interaction): | Estimator | Deferred Capability | Blocker | |-----------|-------------------|---------| -| SunAbraham | Pairs bootstrap + survey | Phase 5: bootstrap+survey interaction | -| ContinuousDiD | Multiplier bootstrap + survey | Phase 5: bootstrap+survey interaction | -| EfficientDiD | Multiplier bootstrap + survey | Phase 5: bootstrap+survey interaction | +| SunAbraham | Pairs bootstrap + survey | **Resolved** (Phase 6, Rao-Wu rescaled) | +| ContinuousDiD | Multiplier bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | +| EfficientDiD | Multiplier bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | | EfficientDiD | Covariates (DR path) + survey | DR nuisance estimation needs survey weight threading | All blocked combinations raise `NotImplementedError` when attempted, with a @@ -56,12 +56,12 @@ TripleDifference IPW/DR from Phase 3 deferred work. | Estimator | Deferred Capability | Blocker | |-----------|-------------------|---------| -| ImputationDiD | Bootstrap + survey | Phase 5: bootstrap+survey interaction | -| TwoStageDiD | Bootstrap + survey | Phase 5: bootstrap+survey interaction | -| CallawaySantAnna | Bootstrap + survey | Phase 5: bootstrap+survey interaction | -| CallawaySantAnna | Strata/PSU/FPC in SurveyDesign | Phase 5: route combined IF/WIF through `compute_survey_vcov()` for design-based aggregation SEs | -| CallawaySantAnna | Covariates + IPW/DR + survey | Phase 5: DRDID panel nuisance IF corrections | -| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | Phase 5: replace conservative plug-in IF with semiparametrically efficient IF | +| ImputationDiD | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | +| TwoStageDiD | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | +| CallawaySantAnna | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | +| CallawaySantAnna | 
Strata/PSU/FPC in SurveyDesign | **Resolved** (Phase 6, `compute_survey_if_variance()`) | +| CallawaySantAnna | Covariates + IPW/DR + survey | **Resolved** (Phase 7a, DRDID nuisance IF corrections) | +| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | **Resolved** (Phase 7a) | ## Implemented (Phase 5): SyntheticDiD + TROP Survey Support @@ -100,12 +100,13 @@ JKn requires explicit `replicate_strata` (per-replicate stratum assignment). - Dispatch in `LinearRegression.fit()` and `staggered_aggregation.py` - Replicate weights mutually exclusive with strata/PSU/FPC - Survey df = rank(replicate_weights) - 1, matching R's `survey::degf()` -- **Limitations**: Supported in CallawaySantAnna, ContinuousDiD, EfficientDiD, - TripleDifference (analytical only, no bootstrap). Rejected with - `NotImplementedError` in DifferenceInDifferences, TwoWayFixedEffects, - MultiPeriodDiD, StackedDiD, SunAbraham, ImputationDiD, TwoStageDiD, - SyntheticDiD, TROP. Expansion to regression-based estimators (SA, - Imputation, TwoStage, Stacked) is straightforward but deferred. +- **Coverage**: 12 of 15 estimators support replicate weights. + Supported: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD, + CallawaySantAnna, TripleDifference (analytical only), StaggeredTripleDifference, + SunAbraham, StackedDiD, ImputationDiD, TwoStageDiD, ContinuousDiD, EfficientDiD. + Rejected: SyntheticDiD, TROP (no published theory on replicate weights + + unit weight optimization / nuclear norm regularization), BaconDecomposition + (diagnostic tool, no inference). ### DEFF Diagnostics ✅ (2026-03-26) Per-coefficient design effects comparing survey vcov to SRS (HC1) vcov. @@ -210,93 +211,109 @@ variance estimation for staggered triple differences. Refinements to close remaining gaps versus R's `survey` package and improve practitioner experience. Prioritized by user impact. -### 8a. Successive Difference Replication (SDR) +### 8a. 
Successive Difference Replication (SDR) ✅ -**Priority: High.** ACS PUMS — the most common US survey dataset for DiD -policy evaluation — provides 80 SDR replicate weight columns. Without SDR -support, these users can't use their provided replicate weights directly. - -**What's needed:** -- Add `"SDR"` to `valid_rep_methods` in `SurveyDesign` -- Variance formula: `V = 4/R * sum((theta_r - theta)^2)` — a scaling - difference from BRR, not a new algorithm -- Wire through `compute_replicate_vcov()` and `compute_replicate_if_variance()` +**Shipped in v2.8.4.** ACS PUMS — the most common US survey dataset for DiD +policy evaluation — provides 80 SDR replicate weight columns. +`SurveyDesign(replicate_method="SDR")` with variance formula +`V = 4/R * sum((theta_r - theta)^2)`. **Reference:** Fay, R.E. & Train, G.F. (1995). "Aspects of Survey and Model-Based Postcensal Estimation of Income and Poverty Characteristics for States and Counties." ASA Proceedings. -### 8b. FPC in ImputationDiD and TwoStageDiD - -**Priority: High.** Both estimators now support replicate weights and TSL -with strata/PSU, but reject FPC outright (`NotImplementedError`). Adding -FPC is incremental — thread `fpc` through the existing TSL variance path. -Matters for finite population surveys (common in state-level sampling). +### 8b. FPC in ImputationDiD and TwoStageDiD ✅ -**Current gate:** `imputation.py:280`, `two_stage.py:268` +**Shipped in v2.8.4.** Both estimators now have full strata/PSU/FPC +support. FPC is threaded through the existing TSL variance path. -### 8c. Silent Operation Warnings +### 8c. 
Silent Operation Warnings ✅ -**Priority: High.** Add `UserWarning` emissions for operations that -silently alter analysis results: -- TROP lstsq → pseudo-inverse numerical fallback -- TwoStageDiD NaN masking of unidentified fixed effects -- TwoStageDiD always-treated unit removal -- CallawaySantAnna silent (g,t) pair skipping -- TROP missing treatment indicator fill with 0 -- Rust → Python backend fallback (currently debug log only) -- Survey weight normalization (pweights rescaled to mean=1) -- `np.inf` → 0 never-treated conversion +**Shipped in v2.8.3.** Eight operations that previously altered analysis +results without informing the user now emit `UserWarning`: +TROP lstsq fallback, TwoStageDiD NaN masking, TwoStageDiD always-treated +removal, CallawaySantAnna (g,t) pair skipping, TROP treatment indicator +fill, Rust → Python fallback, survey weight normalization, `np.inf` → 0 +never-treated conversion. -### 8d. Lonely PSU "adjust" in Bootstrap +### 8d. Lonely PSU "adjust" in Bootstrap ✅ -**Priority: Medium.** `lonely_psu="adjust"` works for analytical (TSL) -variance but raises `NotImplementedError` for survey-aware bootstrap -(2 raises in `bootstrap_utils.py`). Real survey data regularly has -singleton strata. Users needing bootstrap inference with such data hit -a wall. +**Shipped in v2.8.4.** `lonely_psu="adjust"` now works with survey-aware +bootstrap using Rust & Rao (1996) grand-mean centering. **Reference:** Rust, K.F. & Rao, J.N.K. (1996). "Variance Estimation for Complex Surveys Using Replication Techniques." Statistical Methods in Medical Research 5(3). -### 8e. Survey Diagnostics and Utilities +### 8e. 
Survey Diagnostics and Utilities ✅ -**Priority: Medium.** Small additions that signal maturity to survey -statisticians: -- **CV on estimates**: coefficient of variation (SE/estimate) on results - objects — trivial to add, used by federal agencies for publication - standards (NCHS requires CV < 30% for releasable estimates) -- **Weight trimming**: `trim_weights(data, weight_col, upper=None, - quantile=None)` utility in `prep.py` for capping extreme weights -- **ImputationDiD pretrends + survey**: pre-trends F-test currently - ignores survey variance (`NotImplementedError` at `imputation.py:240`) +**Shipped in v2.8.4.** +- **CV on estimates**: `coef_var` property on all results objects (SE/|estimate|). + Handles edge cases (SE=0, estimate=0). +- **Weight trimming**: `trim_weights(data, weight_col, upper=None, lower=None, + quantile=None)` in `prep.py` for capping extreme survey weights. +- **ImputationDiD pretrends + survey**: pre-trends F-test now survey-aware + using subpopulation approach for correct variance under complex designs. -### 8f. Survey Compatibility Matrix +### 8f. Survey Compatibility Matrix ✅ -**Priority: Medium.** Users discover survey support limits by hitting -`NotImplementedError` at runtime. Add a table to the survey tutorial -or `choosing_estimator.rst` showing which estimator × survey feature -combinations are supported (weights, strata/PSU, FPC, replicate weights, -bootstrap + survey). +**Shipped.** Full compatibility table added to `docs/choosing_estimator.rst` +(Survey Design Support section) showing estimator × survey feature +combinations. Tutorial cross-references this table. ### 8g. Documentation-Only Items -**Priority: Low.** No code changes required: +**Partially addressed.** No code changes required. Remaining items +deferred to the consolidated list below: - **Multi-stage design**: document that single-stage (strata + PSU) is sufficient for variance estimation per Lumley (2004) Section 2.2. 
- Don't implement multi-stage — it adds complexity without changing - results for DiD applications. - **Post-stratification / calibration**: document that `SurveyDesign` expects pre-calibrated weights. Point users to `samplics` or R's - `survey::calibrate()` for weight calibration. This is data prep, - not DiD estimation — out of scope. + `survey::calibrate()` for weight calibration. + +## Deferred Work (Consolidated) + +All items below raise `NotImplementedError` when attempted, with a message +describing the limitation. This is the single source of truth for remaining +survey limitations. + +### Replicate Weights Not Supported + +| Estimator | Reason | +|-----------|--------| +| SyntheticDiD | No published theory on replicate weights + unit weight optimization | +| TROP | No published theory on replicate weights + nuclear norm regularization | +| BaconDecomposition | Diagnostic tool with no inference — replicate weights don't apply | + +### EfficientDiD Survey Limitations -### Deferred +| Limitation | Reason | +|-----------|--------| +| `covariates` + `survey_design` | DR nuisance path doesn't thread survey weights | +| `cluster` + `survey_design` | Use `survey_design` with PSU/strata instead | -| Estimator | Capability | Reason | +### Bootstrap + Replicate Weights (Mutual Exclusion) + +Replicate weights and bootstrap are alternative variance estimation methods. 
+Combining them raises `NotImplementedError`: + +| Estimator | +|-----------| +| CallawaySantAnna | +| ContinuousDiD | +| EfficientDiD | +| StaggeredTripleDifference | + +### Other Limitations + +| Estimator | Limitation | Reason | |-----------|-----------|--------| -| SyntheticDiD | Replicate weights | No published theory on replicate weights + unit weight optimization | -| TROP | Replicate weights | No published theory on replicate weights + nuclear norm regularization | -| BaconDecomposition | Replicate weights | Diagnostic tool with no inference — replicate weights don't apply | -| EfficientDiD | Covariates + survey, cluster + survey, bootstrap + survey | Lower demand, newer estimator; 3 `NotImplementedError` paths | +| SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` | +| ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented | +| ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented | +| (all estimators) | Wild bootstrap + survey weights | Use analytical survey SEs or survey-aware multiplier bootstrap instead | + +### Documentation-Only (Phase 8g) + +- **Multi-stage design**: Document that single-stage (strata + PSU) is sufficient for variance estimation per Lumley (2004) Section 2.2. No code changes needed. +- **Post-stratification / calibration**: Document that `SurveyDesign` expects pre-calibrated weights. Point users to `samplics` or R's `survey::calibrate()` for weight calibration. diff --git a/docs/tutorials/16_survey_did.ipynb b/docs/tutorials/16_survey_did.ipynb index 543e9862..aa1ce756 100644 --- a/docs/tutorials/16_survey_did.ipynb +++ b/docs/tutorials/16_survey_did.ipynb @@ -95,7 +95,7 @@ "**About the normalization warning:** You'll see `pweight weights normalized to mean=1` throughout this tutorial. ", "Survey weights are inverse selection probabilities -- they rarely have mean=1 out of the box. 
", "The library rescales them internally so that weighted estimators are numerically stable. ", - "This is standard practice (Lumley 2004, \u00a72.2). ", + "This is standard practice (Lumley 2004, §2.2). ", "The warning confirms rescaling occurred; it is not an error." ] }, @@ -356,8 +356,9 @@ "- **JKn** (Jackknife delete-n): Stratified jackknife, drops one PSU per stratum.\n", "- **BRR** (Balanced Repeated Replication): Halve each stratum, reweight.\n", "- **Fay's BRR**: Modified BRR with a damping factor (0 < rho < 1).\n", + "- **SDR** (Successive Difference Replication): Used by ACS PUMS (80 replicate columns). Variance: V = 4/R × Σ(θᵣ − θ)².\n", "\n", - "`SurveyDesign` accepts replicate weights as an alternative to strata/PSU/FPC. They are **mutually exclusive** -- use one or the other." + "`SurveyDesign` accepts replicate weights as an alternative to strata/PSU/FPC. They are **mutually exclusive** -- use one or the other.\n" ] }, { @@ -527,13 +528,61 @@ { "cell_type": "markdown", "metadata": {}, - "source": "## 9. 
Which Estimators Support Survey Design?\n\n`diff-diff` supports survey design across all estimators, though the level of support varies:\n\n| Estimator | Weights | Strata/PSU/FPC (TSL) | Replicate Weights | Survey-Aware Bootstrap |\n|-----------|---------|---------------------|-------------------|------------------------|\n| **DifferenceInDifferences** | Full | Full | -- | -- |\n| **TwoWayFixedEffects** | Full | Full | -- | -- |\n| **MultiPeriodDiD** | Full | Full | -- | -- |\n| **CallawaySantAnna** | pweight only | Full | Full | Multiplier at PSU |\n| **TripleDifference** | pweight only | Full | Full (analytical) | -- |\n| **StaggeredTripleDifference** | pweight only | Full | Full | Multiplier at PSU |\n| **SunAbraham** | Full | Full | -- | Rao-Wu rescaled |\n| **StackedDiD** | pweight only | Full (pweight only) | -- | -- |\n| **ImputationDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **TwoStageDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **ContinuousDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **EfficientDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **SyntheticDiD** | pweight only | -- | -- | Rao-Wu rescaled |\n| **TROP** | pweight only | -- | -- | Rao-Wu rescaled |\n| **BaconDecomposition** | Diagnostic | Diagnostic | -- | -- |\n\n**Legend:**\n- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance\n- **Full (pweight only)**: Full TSL support with strata/PSU/FPC, but only accepts `pweight` weight type (`fweight`/`aweight` rejected because Q-weight composition changes their semantics)\n- **Partial (no FPC)**: Weights + strata (for df) + PSU (for clustering); FPC raises `NotImplementedError`\n- **pweight only** (Weights column): Only `pweight` accepted; `fweight`/`aweight` raise an error\n- **pweight only** (TSL column): Sampling weights for point estimates; no strata/PSU/FPC design elements\n- **Diagnostic**: Weighted 
descriptive statistics only (no inference)\n- **--**: Not supported\n\n**Note:** `EfficientDiD` does not support `covariates` and `survey_design` simultaneously (the DR nuisance path does not yet thread survey weights). Use `covariates=None` with survey designs.\n\nFor full details, see `docs/survey-roadmap.md`." + "source": [ + "## 9. Which Estimators Support Survey Design?\n", + "\n", + "All estimators accept `survey_design` in `fit()`. Support depth varies — see the\n", + "[Survey Design Support](https://diff-diff.readthedocs.io/en/latest/choosing_estimator.html#survey-design-support)\n", + "table in the Choosing an Estimator guide for the full compatibility matrix.\n", + "\n", + "**Key highlights:**\n", + "- **CallawaySantAnna** has the most complete support: all design elements, replicate weights, analytical and bootstrap SEs, panel and cross-section modes\n", + "- **12 of 15 estimators** support replicate weights (BRR, Fay, JK1, JKn, SDR)\n", + "- **SyntheticDiD** and **TROP** support strata/PSU/FPC via bootstrap only (not placebo/analytical)\n" + ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Summary\n\n**Key takeaways:**\n\n1. **Always specify the survey design** when working with survey data. Ignoring it leads to incorrect standard errors -- typically too small, leading to false positives.\n\n2. **`SurveyDesign`** encapsulates your survey's sampling structure in one object. Pass column names for weights, strata, PSU, and FPC.\n\n3. **Pass `survey_design` to `fit()`** -- the same API works across all estimators. No changes to your estimation code beyond adding one parameter.\n\n4. **CallawaySantAnna** has the most complete survey support: strata/PSU/FPC, replicate weights, analytical and bootstrap SEs, and both panel and cross-section modes.\n\n5. **Replicate weights** (JK1, JKn, BRR, Fay) are an alternative to strata/PSU/FPC when your survey provides them (e.g., MEPS, ACS PUMS). They are mutually exclusive with strata/PSU/FPC.\n\n6. 
**Use `subpopulation()`** instead of subsetting when estimating effects for a subgroup. Subsetting drops design information and biases variance estimates.\n\n7. **DEFF diagnostics** help you understand *how* the survey design affects precision. DEFF > 1 means clustering costs exceed stratification gains; DEFF < 1 means the design improves precision for that coefficient. DEFF > 2 indicates substantial clustering.\n\n8. **Repeated cross-sections** (`panel=False`) work with survey design for non-panel surveys like BRFSS, CPS, and ACS 1-year.\n\n**Quick reference:**\n\n| Parameter | When to use |\n|-----------|------------|\n| `weights` | Always -- specify the sampling weight column |\n| `strata` | When the survey uses stratified sampling |\n| `psu` | When multi-stage (clustered) sampling is used |\n| `fpc` | When the sampling fraction is non-negligible |\n| `replicate_weights` | When the survey provides replicate weights instead of strata/PSU/FPC |\n\n**References:**\n\n- Lumley, T. (2004). Analysis of Complex Survey Samples. *Journal of Statistical Software* 9(8).\n- Solon, G., Haider, S. J., & Wooldridge, J. M. (2015). What Are We Weighting For? *Journal of Human Resources* 50(2).\n- Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. *International Statistical Review* 51(3).\n- Rao, J. N. K., Wu, C. F. J., & Yue, K. (1992). Some Recent Work on Resampling Methods for Complex Surveys. *Survey Methodology* 18(2).\n- Callaway, B. & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. *Journal of Econometrics* 225(2).\n- Sant'Anna, P. H. C. & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. *Journal of Econometrics* 219(1)." + "## Summary\n", + "\n", + "**Key takeaways:**\n", + "\n", + "1. **Always specify the survey design** when working with survey data. Ignoring it leads to incorrect standard errors -- typically too small, leading to false positives.\n", + "\n", + "2. 
**`SurveyDesign`** encapsulates your survey's sampling structure in one object. Pass column names for weights, strata, PSU, and FPC.\n", + "\n", + "3. **Pass `survey_design` to `fit()`** -- the same API works across all estimators. No changes to your estimation code beyond adding one parameter.\n", + "\n", + "4. **CallawaySantAnna** has the most complete survey support: strata/PSU/FPC, replicate weights, analytical and bootstrap SEs, and both panel and cross-section modes.\n", + "\n", + "5. **Replicate weights** (JK1, JKn, BRR, Fay, SDR) are an alternative to strata/PSU/FPC when your survey provides them (e.g., MEPS, ACS PUMS). They are mutually exclusive with strata/PSU/FPC.\n", + "\n", + "6. **Use `subpopulation()`** instead of subsetting when estimating effects for a subgroup. Subsetting drops design information and biases variance estimates.\n", + "\n", + "7. **DEFF diagnostics** help you understand *how* the survey design affects precision. DEFF > 1 means clustering costs exceed stratification gains; DEFF < 1 means the design improves precision for that coefficient. DEFF > 2 indicates substantial clustering.\n", + "\n", + "8. **Repeated cross-sections** (`panel=False`) work with survey design for non-panel surveys like BRFSS, CPS, and ACS 1-year.\n", + "\n", + "**Quick reference:**\n", + "\n", + "| Parameter | When to use |\n", + "|-----------|------------|\n", + "| `weights` | Always -- specify the sampling weight column |\n", + "| `strata` | When the survey uses stratified sampling |\n", + "| `psu` | When multi-stage (clustered) sampling is used |\n", + "| `fpc` | When the sampling fraction is non-negligible |\n", + "| `replicate_weights` | When the survey provides replicate weights instead of strata/PSU/FPC |\n", + "\n", + "**References:**\n", + "\n", + "- Lumley, T. (2004). Analysis of Complex Survey Samples. *Journal of Statistical Software* 9(8).\n", + "- Solon, G., Haider, S. J., & Wooldridge, J. M. (2015). What Are We Weighting For? 
*Journal of Human Resources* 50(2).\n", + "- Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. *International Statistical Review* 51(3).\n", + "- Rao, J. N. K., Wu, C. F. J., & Yue, K. (1992). Some Recent Work on Resampling Methods for Complex Surveys. *Survey Methodology* 18(2).\n", + "- Callaway, B. & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. *Journal of Econometrics* 225(2).\n", + "- Sant'Anna, P. H. C. & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. *Journal of Econometrics* 219(1).\n" ] } ], @@ -544,4 +593,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} From 8677e76fb856179cbed5a5056fe938c3ac3205c6 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 09:27:58 -0400 Subject: [PATCH 02/12] Address P2 review findings: fix stale notes, deferred IF, missing estimators - Revert efficient DRDID nuisance IF for reg+covariates to deferred status (code and REGISTRY.md still use conservative plug-in IF) - Update phase summary table Notes to reflect resolved bootstrap+survey paths (SA, ContinuousDiD, EfficientDiD, ImputationDiD, TwoStageDiD, CS) - Add SunAbraham, ImputationDiD, TwoStageDiD to bootstrap+replicate mutual exclusion table in consolidated deferred section Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/survey-roadmap.md | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index 008101f4..d329c84e 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -19,11 +19,11 @@ All phases (1-8) are implemented. 
Remaining deferred items are listed at the bot | Estimator | File | Survey Support | Notes | |-----------|------|----------------|-------| | StackedDiD | `stacked_did.py` | pweight only | Q-weights compose multiplicatively with survey weights; TSL vcov on composed weights; fweight/aweight rejected (composition changes weight semantics) | -| SunAbraham | `sun_abraham.py` | Full | Survey weights in LinearRegression + weighted within-transform; bootstrap+survey deferred | +| SunAbraham | `sun_abraham.py` | Full | Survey weights in LinearRegression + weighted within-transform; bootstrap via Rao-Wu rescaled (Phase 6) | | BaconDecomposition | `bacon.py` | Diagnostic | Weighted cell means, weighted within-transform, weighted group shares; no inference (diagnostic only) | | TripleDifference | `triple_diff.py` | Full | Regression, IPW, and DR methods with weighted OLS/logit + TSL on influence functions | -| ContinuousDiD | `continuous_did.py` | Analytical | Weighted B-spline OLS + TSL on influence functions; bootstrap+survey deferred | -| EfficientDiD | `efficient_did.py` | Analytical | Weighted means/covariances in Omega* + TSL on EIF scores; bootstrap+survey deferred | +| ContinuousDiD | `continuous_did.py` | Analytical | Weighted B-spline OLS + TSL on influence functions; bootstrap via multiplier at PSU (Phase 6) | +| EfficientDiD | `efficient_did.py` | Analytical | Weighted means/covariances in Omega* + TSL on EIF scores; bootstrap via multiplier at PSU (Phase 6) | ### Phase 3 Deferred Work @@ -44,9 +44,9 @@ message pointing to the planned phase or describing the limitation. 
| Estimator | File | Survey Support | Notes | |-----------|------|----------------|-------| -| ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap+survey deferred | -| TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap+survey deferred | -| CallawaySantAnna | `staggered.py` | Full | Full SurveyDesign (strata/PSU/FPC/replicate weights); reg supports covariates, IPW/DR no-covariate only; survey-weighted WIF in aggregation; replicate IF variance for analytical SEs | +| ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap via multiplier at PSU (Phase 6) | +| TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap via multiplier at PSU (Phase 6) | +| CallawaySantAnna | `staggered.py` | Full | Full SurveyDesign (strata/PSU/FPC/replicate weights); reg supports covariates, IPW/DR supports covariates (Phase 7a); survey-weighted WIF in aggregation; replicate IF variance for analytical SEs | **Infrastructure**: Weighted `solve_logit()` added to `linalg.py` — survey weights enter the IRLS working weights as `w_survey * mu * (1 - mu)`. This also unblocked @@ -61,7 +61,7 @@ TripleDifference IPW/DR from Phase 3 deferred work. 
| CallawaySantAnna | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | | CallawaySantAnna | Strata/PSU/FPC in SurveyDesign | **Resolved** (Phase 6, `compute_survey_if_variance()`) | | CallawaySantAnna | Covariates + IPW/DR + survey | **Resolved** (Phase 7a, DRDID nuisance IF corrections) | -| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | **Resolved** (Phase 7a) | +| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | Deferred — code uses conservative plug-in IF (see REGISTRY.md) | ## Implemented (Phase 5): SyntheticDiD + TROP Survey Support @@ -285,6 +285,12 @@ survey limitations. | TROP | No published theory on replicate weights + nuclear norm regularization | | BaconDecomposition | Diagnostic tool with no inference — replicate weights don't apply | +### CallawaySantAnna Survey Limitations + +| Limitation | Reason | +|-----------|--------| +| Efficient DRDID nuisance IF for `reg`+covariates | Code uses conservative plug-in IF; efficient correction deferred (see REGISTRY.md) | + ### EfficientDiD Survey Limitations | Limitation | Reason | @@ -302,7 +308,10 @@ Combining them raises `NotImplementedError`: | CallawaySantAnna | | ContinuousDiD | | EfficientDiD | +| ImputationDiD | | StaggeredTripleDifference | +| SunAbraham | +| TwoStageDiD | ### Other Limitations From 38f087baeaa65679ca10eadd45315dd10c50890a Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 09:43:15 -0400 Subject: [PATCH 03/12] Fix P2: clarify error types in consolidated deferred section Some bootstrap+replicate exclusions raise ValueError (not NotImplementedError). Update wording to "raise an error" to accurately reflect the runtime contract. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/survey-roadmap.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index d329c84e..f7f18313 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -273,9 +273,10 @@ deferred to the consolidated list below: ## Deferred Work (Consolidated) -All items below raise `NotImplementedError` when attempted, with a message -describing the limitation. This is the single source of truth for remaining -survey limitations. +All items below raise an error when attempted (`NotImplementedError` or +`ValueError` depending on the estimator), with a message describing the +limitation. This is the single source of truth for remaining survey +limitations. ### Replicate Weights Not Supported @@ -301,7 +302,7 @@ survey limitations. ### Bootstrap + Replicate Weights (Mutual Exclusion) Replicate weights and bootstrap are alternative variance estimation methods. -Combining them raises `NotImplementedError`: +Combining them raises `NotImplementedError` or `ValueError`: | Estimator | |-----------| From 7180d94d3c8748a91218affb426b4beef110bde4 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 09:54:39 -0400 Subject: [PATCH 04/12] Address P2/P3: REGISTRY replicate matrix, ROADMAP qualifier, soften wording - Update REGISTRY.md replicate-weight support matrix: CS now supports covariates with replicate weights (IF-based path is covariate-agnostic, shipped in Phase 7a) - Qualify ROADMAP.md: "replicate weights supported for 12 of 15" instead of "across all estimators" - Soften consolidated deferred section from "single source of truth" to "summary of major remaining limitations" with TODO.md cross-reference Co-Authored-By: Claude Opus 4.6 (1M context) --- ROADMAP.md | 2 +- docs/methodology/REGISTRY.md | 7 ++++--- docs/survey-roadmap.md | 5 +++-- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/ROADMAP.md b/ROADMAP.md index 
8f8fce1c..582de611 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -16,7 +16,7 @@ diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022) - **Study design**: Power analysis tools - **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs -- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn/SDR), Taylor linearization, DEFF diagnostics, subpopulation analysis — integrated across all estimators (see [survey-roadmap.md](docs/survey-roadmap.md)) +- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn/SDR), Taylor linearization, DEFF diagnostics, subpopulation analysis — integrated across all estimators (replicate weights supported for 12 of 15; see [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the compatibility matrix) - **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks) --- diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 554acd28..04f1fb50 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2307,9 +2307,10 @@ variance from the distribution of replicate estimates. design structure is fixed and dropped replicates contribute zero to the sum without changing the scale. Survey df uses `n_valid - 1` for t-based inference. 
-- **Note:** Replicate-weight support matrix: - - **Supported**: CallawaySantAnna (reg/ipw/dr without covariates, no - bootstrap), ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap), +- **Note:** Replicate-weight support matrix (12 of 15 estimators): + - **Supported**: CallawaySantAnna (reg/ipw/dr with or without covariates, + no bootstrap; IF-based replicate variance is covariate-agnostic), + ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap), TripleDifference (all methods), LinearRegression (OLS path), DifferenceInDifferences (no-absorb via LinearRegression dispatch, absorb via estimator-level refit), MultiPeriodDiD (no-absorb via diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index f7f18313..b25b4f74 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -275,8 +275,9 @@ deferred to the consolidated list below: All items below raise an error when attempted (`NotImplementedError` or `ValueError` depending on the estimator), with a message describing the -limitation. This is the single source of truth for remaining survey -limitations. +limitation. This is a summary of the major remaining survey limitations. +See also `TODO.md` for general tech debt items (e.g., multi-absorb + +survey weights). 
### Replicate Weights Not Supported From 1bb37b17e8115e63b3f87755ef2c21774ea7a53d Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 10:02:49 -0400 Subject: [PATCH 05/12] Address P2/P3: fix REGISTRY replicate list, update stale error message - Replace LinearRegression (internal helper) with StaggeredTripleDifference (public estimator) in REGISTRY.md replicate-weight support matrix - Update wild bootstrap + survey error message to remove stale "planned Phase 5 support" reference Co-Authored-By: Claude Opus 4.6 (1M context) --- diff_diff/survey.py | 4 ++-- docs/methodology/REGISTRY.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/diff_diff/survey.py b/diff_diff/survey.py index c717db39..1ad16dd3 100644 --- a/diff_diff/survey.py +++ b/diff_diff/survey.py @@ -1087,8 +1087,8 @@ def _resolve_survey_for_fit(survey_design, data, inference_mode="analytical"): if inference_mode == "wild_bootstrap": raise NotImplementedError( "Wild bootstrap with survey weights is not yet supported. " - "Use inference='analytical' with survey_design, or see " - "docs/survey-roadmap.md for planned Phase 5 support." + "Use inference='analytical' with survey_design, or use " + "survey-aware multiplier bootstrap (n_bootstrap > 0)." ) resolved = survey_design.resolve(data) diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 04f1fb50..77d85feb 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2307,11 +2307,11 @@ variance from the distribution of replicate estimates. design structure is fixed and dropped replicates contribute zero to the sum without changing the scale. Survey df uses `n_valid - 1` for t-based inference. 
-- **Note:** Replicate-weight support matrix (12 of 15 estimators): +- **Note:** Replicate-weight support matrix (12 of 15 public estimators): - **Supported**: CallawaySantAnna (reg/ipw/dr with or without covariates, no bootstrap; IF-based replicate variance is covariate-agnostic), ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap), - TripleDifference (all methods), LinearRegression (OLS path), + TripleDifference (all methods), StaggeredTripleDifference (IF-based), DifferenceInDifferences (no-absorb via LinearRegression dispatch, absorb via estimator-level refit), MultiPeriodDiD (no-absorb via `compute_replicate_vcov`, absorb via estimator-level refit), From 51bff5551cab2f38dc9c0e69ba0a8a2760ca79e3 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 10:11:03 -0400 Subject: [PATCH 06/12] Address P2/P3: tighten ROADMAP wording, add SDR to Phase 6, fix error msg - ROADMAP.md: restructure to say "survey-aware inference across all 15 estimators; replicate weights supported for 12 of 15" - survey-roadmap.md Phase 6: add SDR to replicate method list - survey.py: make wild bootstrap error message generic (not all estimators expose n_bootstrap) Co-Authored-By: Claude Opus 4.6 (1M context) --- ROADMAP.md | 2 +- diff_diff/survey.py | 3 +-- docs/survey-roadmap.md | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/ROADMAP.md b/ROADMAP.md index 582de611..ebfd39e0 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -16,7 +16,7 @@ diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022) - **Study design**: Power analysis tools - **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs -- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn/SDR), Taylor linearization, DEFF diagnostics, subpopulation 
analysis — integrated across all estimators (replicate weights supported for 12 of 15; see [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the compatibility matrix) +- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, Taylor linearization, DEFF diagnostics, subpopulation analysis — survey-aware inference across all 15 estimators; replicate weights (BRR/Fay/JK1/JKn/SDR) supported for 12 of 15 (see [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the compatibility matrix) - **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks) --- diff --git a/diff_diff/survey.py b/diff_diff/survey.py index 1ad16dd3..14ae44d0 100644 --- a/diff_diff/survey.py +++ b/diff_diff/survey.py @@ -1087,8 +1087,7 @@ def _resolve_survey_for_fit(survey_design, data, inference_mode="analytical"): if inference_mode == "wild_bootstrap": raise NotImplementedError( "Wild bootstrap with survey weights is not yet supported. " - "Use inference='analytical' with survey_design, or use " - "survey-aware multiplier bootstrap (n_bootstrap > 0)." + "Use analytical survey inference (the default) instead." ) resolved = survey_design.resolve(data) diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index b25b4f74..ea877689 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -92,7 +92,7 @@ Survey-aware bootstrap for all 8 bootstrap-using estimators. Two strategies: ### Replicate Weight Variance ✅ (2026-03-26) Re-run WLS for each replicate weight column, compute variance from distribution -of estimates. Supports BRR, Fay's BRR, JK1, JKn methods. +of estimates. Supports BRR, Fay's BRR, JK1, JKn, and SDR methods. JKn requires explicit `replicate_strata` (per-replicate stratum assignment). 
- `replicate_weights`, `replicate_method`, `fay_rho` fields on SurveyDesign - `compute_replicate_vcov()` for OLS-based estimators (re-runs WLS per replicate) From 815e0e44adae7523eb25b3b00d1a62d11279457c Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 10:17:04 -0400 Subject: [PATCH 07/12] Address P2/P3: precise ROADMAP survey wording, fix header and wild bootstrap - ROADMAP.md: distinguish survey weights (all 15) from design-based variance (varies by estimator), carve out BaconDecomposition - survey-roadmap.md: header says "Phases 1-8f implemented" (8g partial) instead of "All phases implemented" - Deferred work: scope wild bootstrap row to DiD/TWFE/MultiPeriod (the estimators that expose inference='wild_bootstrap') Co-Authored-By: Claude Opus 4.6 (1M context) --- ROADMAP.md | 2 +- docs/survey-roadmap.md | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/ROADMAP.md b/ROADMAP.md index ebfd39e0..2bb2a752 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -16,7 +16,7 @@ diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022) - **Study design**: Power analysis tools - **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs -- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, Taylor linearization, DEFF diagnostics, subpopulation analysis — survey-aware inference across all 15 estimators; replicate weights (BRR/Fay/JK1/JKn/SDR) supported for 12 of 15 (see [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the compatibility matrix) +- **Survey support**: `SurveyDesign` with strata, PSU, FPC, weight types, DEFF diagnostics, subpopulation analysis. All 15 estimators accept survey weights; design-based variance estimation (TSL, replicate weights, survey-aware bootstrap) varies by estimator. 
Replicate weights (BRR/Fay/JK1/JKn/SDR) supported for 12 of 15; `BaconDecomposition` is diagnostic-only. See [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the full compatibility matrix. - **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks) --- diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index ea877689..050bac58 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -1,7 +1,8 @@ # Survey Data Support Roadmap This document captures the survey data support roadmap for diff-diff. -All phases (1-8) are implemented. Remaining deferred items are listed at the bottom. +Phases 1-8f are implemented. Phase 8g (documentation-only items) is partially +addressed. Remaining deferred items are listed at the bottom. ## Implemented (Phases 1-2) @@ -322,7 +323,7 @@ Combining them raises `NotImplementedError` or `ValueError`: | SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` | | ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented | | ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented | -| (all estimators) | Wild bootstrap + survey weights | Use analytical survey SEs or survey-aware multiplier bootstrap instead | +| DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Use analytical survey inference (the default) instead | ### Documentation-Only (Phase 8g) From 1b174a63ebbdf14c00633c0842e8841800b56d56 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 10:27:27 -0400 Subject: [PATCH 08/12] Address P2/P3: fix ROADMAP opening sentence, move Phase 8g out of deferred MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - ROADMAP.md line 11: remove "Taylor linearization, replicate 
weights integrated across all estimators" — now says "all estimators accept survey weights, with design-based variance varying by estimator" - survey-roadmap.md: move Phase 8g documentation tasks into their own section outside the consolidated runtime-limitations block Co-Authored-By: Claude Opus 4.6 (1M context) --- ROADMAP.md | 2 +- docs/survey-roadmap.md | 6 +++++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/ROADMAP.md b/ROADMAP.md index 2bb2a752..377f8dd8 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -8,7 +8,7 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md). ## Current Status -diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — design-based variance estimation (Taylor linearization, replicate weights) integrated across all estimators. No R or Python package offers this combination: +diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — all estimators accept survey weights, with design-based variance estimation varying by estimator. No R or Python package offers this combination: - **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 
2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024) - **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index 050bac58..c235d3f4 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -325,7 +325,11 @@ Combining them raises `NotImplementedError` or `ValueError`: | ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented | | DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Use analytical survey inference (the default) instead | -### Documentation-Only (Phase 8g) +--- + +## Remaining Documentation Tasks (Phase 8g) + +These are documentation improvements, not runtime limitations: - **Multi-stage design**: Document that single-stage (strata + PSU) is sufficient for variance estimation per Lumley (2004) Section 2.2. No code changes needed. - **Post-stratification / calibration**: Document that `SurveyDesign` expects pre-calibrated weights. Point users to `samplics` or R's `survey::calibrate()` for weight calibration. From 92df37c1c1234227a2f0b39002bbe6592ae7e1ac Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 10:36:40 -0400 Subject: [PATCH 09/12] Address P3: separate documented deviations from runtime limitations Move CallawaySantAnna conservative plug-in IF entry into its own "Documented Deviations" subsection (supported path, not an error). Runtime limitations intro now accurately describes only error-raising items. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/survey-roadmap.md | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index c235d3f4..1d21f5d8 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -274,11 +274,21 @@ deferred to the consolidated list below: ## Deferred Work (Consolidated) +### Documented Deviations + +These are supported paths that use a conservative or simplified approach +rather than the theoretically optimal one. They do not raise errors. + +| Estimator | Deviation | Details | +|-----------|-----------|---------| +| CallawaySantAnna | `reg`+covariates uses conservative plug-in IF | Efficient DRDID nuisance IF correction deferred; see REGISTRY.md | + +### Runtime Limitations + All items below raise an error when attempted (`NotImplementedError` or `ValueError` depending on the estimator), with a message describing the -limitation. This is a summary of the major remaining survey limitations. -See also `TODO.md` for general tech debt items (e.g., multi-absorb + -survey weights). +limitation. See also `TODO.md` for general tech debt items (e.g., +multi-absorb + survey weights). ### Replicate Weights Not Supported @@ -288,12 +298,6 @@ survey weights). 
| TROP | No published theory on replicate weights + nuclear norm regularization | | BaconDecomposition | Diagnostic tool with no inference — replicate weights don't apply | -### CallawaySantAnna Survey Limitations - -| Limitation | Reason | -|-----------|--------| -| Efficient DRDID nuisance IF for `reg`+covariates | Code uses conservative plug-in IF; efficient correction deferred (see REGISTRY.md) | - ### EfficientDiD Survey Limitations | Limitation | Reason | From f54dea835ccebdcbae7642e7f59369e13d458317 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 10:46:45 -0400 Subject: [PATCH 10/12] Address P3: MultiPeriodDiD wild bootstrap warns, not errors Split wild bootstrap row: DiD/TWFE raise NotImplementedError, MultiPeriodDiD warns and falls back to analytical inference. Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/survey-roadmap.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index 1d21f5d8..9c2419f3 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -327,7 +327,8 @@ Combining them raises `NotImplementedError` or `ValueError`: | SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` | | ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented | | ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented | -| DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Use analytical survey inference (the default) instead | +| DifferenceInDifferences, TwoWayFixedEffects | `inference='wild_bootstrap'` + `survey_design` | Raises `NotImplementedError`; use analytical survey inference (the default) instead | +| MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Warns and falls back to analytical inference (no error raised) | --- From 
2168e2c72d275a0845db3106651c2f7f3a6d4703 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 10:59:27 -0400 Subject: [PATCH 11/12] Address P3: move MultiPeriodDiD fallback out of error section MultiPeriodDiD wild bootstrap warns and falls back rather than raising. Move it into its own "Warning/Fallback Behaviors" subsection outside the runtime-error block. Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/survey-roadmap.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index 9c2419f3..2b1a193f 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -328,7 +328,14 @@ Combining them raises `NotImplementedError` or `ValueError`: | ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented | | ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented | | DifferenceInDifferences, TwoWayFixedEffects | `inference='wild_bootstrap'` + `survey_design` | Raises `NotImplementedError`; use analytical survey inference (the default) instead | -| MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Warns and falls back to analytical inference (no error raised) | + +### Warning/Fallback Behaviors + +These do not raise errors; they emit a warning and change behavior: + +| Estimator | Limitation | Behavior | +|-----------|-----------|----------| +| MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Warns and falls back to analytical inference | --- From 2fee9b419ed16626e71fdae9c327a11347d2969b Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 4 Apr 2026 11:15:42 -0400 Subject: [PATCH 12/12] Address P3: update 7c blurb to reflect tutorial cross-reference Tutorial Section 9 now links to the compatibility matrix rather than containing the table itself. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/survey-roadmap.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index 2b1a193f..56311734 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -168,8 +168,8 @@ framed around a state-level preventive care program evaluated with a stratified health survey (ACS/BRFSS-like). Covers: why survey design matters, SurveyDesign setup, basic DiD with survey, staggered DiD (CallawaySantAnna) with survey, replicate weights (JK1), subpopulation -analysis, DEFF diagnostics, repeated cross-sections, and estimator -support reference table. Uses `generate_survey_did_data()` DGP function +analysis, DEFF diagnostics, repeated cross-sections, and a link to the +compatibility matrix in `choosing_estimator.rst`. Uses `generate_survey_did_data()` DGP function added to `diff_diff.prep`. ### 7d. HonestDiD with Survey Variance ✅