diff --git a/ROADMAP.md b/ROADMAP.md index 058f2ba1..377f8dd8 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -8,15 +8,15 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md). ## Current Status -diff-diff v2.7.5 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — design-based variance estimation (Taylor linearization, replicate weights) integrated across all estimators. No R or Python package offers this combination: +diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — all estimators accept survey weights, with design-based variance estimation varying by estimator. No R or Python package offers this combination: -- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024) +- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 
2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024) - **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance - **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022) - **Study design**: Power analysis tools - **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs -- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn), Taylor linearization, DEFF diagnostics, subpopulation analysis — integrated across all estimators (see [survey-roadmap.md](docs/survey-roadmap.md)) +- **Survey support**: `SurveyDesign` with strata, PSU, FPC, weight types, DEFF diagnostics, subpopulation analysis. All 15 estimators accept survey weights; design-based variance estimation (TSL, replicate weights, survey-aware bootstrap) varies by estimator. Replicate weights (BRR/Fay/JK1/JKn/SDR) supported for 12 of 15; `BaconDecomposition` is diagnostic-only. See [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the full compatibility matrix. - **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks) --- @@ -34,19 +34,20 @@ full details. - **Repeated Cross-Sections** *(Implemented)*: `panel=False` support for CallawaySantAnna using cross-sectional DRDID (Sant'Anna & Zhao 2020, Section 4). Supports BRFSS, ACS annual, CPS monthly. -- **Survey-Aware DiD Tutorial** *(Open)*: Jupyter notebook demonstrating +- **Survey-Aware DiD Tutorial** *(Implemented)*: Jupyter notebook demonstrating the full workflow with realistic survey data. 
- **HonestDiD + Survey Variance** *(Implemented)*: Survey df and full event-study VCV propagated to sensitivity analysis, with bootstrap/replicate diagonal fallback. -### Staggered Triple Difference (DDD) +### Staggered Triple Difference (DDD) *(Implemented)* -Extend the existing `TripleDifference` estimator to handle staggered adoption settings. +`StaggeredTripleDifference` estimator for staggered adoption DDD settings. - Group-time ATT(g,t) for DDD designs with variation in treatment timing - Event study aggregation and pre-treatment placebo effects - Multiplier bootstrap for valid inference in staggered settings +- Full survey support (pweight, strata/PSU/FPC, replicate weights) **Reference**: [Ortiz-Villavicencio & Sant'Anna (2025)](https://arxiv.org/abs/2505.09942). "Better Understanding Triple Differences Estimators." *Working Paper*. R package: `triplediff`. diff --git a/TODO.md b/TODO.md index 9c1aad97..748a2dd8 100644 --- a/TODO.md +++ b/TODO.md @@ -15,6 +15,10 @@ Current limitations that may affect users: | MultiPeriodDiD wild bootstrap not supported | `estimators.py:778-784` | Low | Edge case | | `predict()` raises NotImplementedError | `estimators.py:567-588` | Low | Rarely needed | +For survey-specific limitations (NotImplementedError paths), see the +[consolidated deferred list](docs/survey-roadmap.md#deferred-work-consolidated) +in survey-roadmap.md. + ## Code Quality ### Large Module Files diff --git a/diff_diff/survey.py b/diff_diff/survey.py index c717db39..14ae44d0 100644 --- a/diff_diff/survey.py +++ b/diff_diff/survey.py @@ -1087,8 +1087,7 @@ def _resolve_survey_for_fit(survey_design, data, inference_mode="analytical"): if inference_mode == "wild_bootstrap": raise NotImplementedError( "Wild bootstrap with survey weights is not yet supported. " - "Use inference='analytical' with survey_design, or see " - "docs/survey-roadmap.md for planned Phase 5 support." + "Use analytical survey inference (the default) instead." 
) resolved = survey_design.resolve(data) diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst index f0c49261..34bbf612 100644 --- a/docs/choosing_estimator.rst +++ b/docs/choosing_estimator.rst @@ -571,5 +571,124 @@ If you're unsure which estimator to use: investigate why (often reveals violations of assumptions) 5. **Using survey data?** - Pass a ``SurveyDesign`` to ``fit()`` for design-based - variance estimation. See the `survey tutorial `_ - for a full walkthrough with strata, PSU, FPC, replicate weights, and subpopulation analysis. + variance estimation. See the :ref:`survey-design-support` section below for + the compatibility matrix, and the `survey tutorial `_ + for a full walkthrough. + +.. _survey-design-support: + +Survey Design Support +--------------------- + +All estimators accept an optional ``survey_design`` parameter in ``fit()``. +Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance +estimation. The depth of support varies by estimator: + +.. 
list-table:: + :header-rows: 1 + :widths: 25 12 18 18 18 + + * - Estimator + - Weights + - Strata/PSU/FPC + - Replicate Weights + - Survey Bootstrap + * - ``DifferenceInDifferences`` + - Full + - Full + - Full + - -- + * - ``TwoWayFixedEffects`` + - Full + - Full + - Full + - -- + * - ``MultiPeriodDiD`` + - Full + - Full + - Full + - -- + * - ``CallawaySantAnna`` + - pweight only + - Full + - Full + - Multiplier at PSU + * - ``TripleDifference`` + - pweight only + - Full + - Full (analytical) + - -- + * - ``StaggeredTripleDifference`` + - pweight only + - Full + - Full + - Multiplier at PSU + * - ``SunAbraham`` + - Full + - Full + - Full + - Rao-Wu rescaled + * - ``StackedDiD`` + - pweight only + - Full (pweight only) + - Full + - -- + * - ``ImputationDiD`` + - pweight only + - Full + - Full (analytical) + - Multiplier at PSU + * - ``TwoStageDiD`` + - pweight only + - Full + - Full (analytical) + - Multiplier at PSU + * - ``ContinuousDiD`` + - Full + - Full + - Full (analytical) + - Multiplier at PSU + * - ``EfficientDiD`` + - Full + - Full + - Full (analytical) + - Multiplier at PSU + * - ``SyntheticDiD`` + - pweight only + - Via bootstrap + - -- + - Rao-Wu rescaled + * - ``TROP`` + - pweight only + - Via bootstrap + - -- + - Rao-Wu rescaled + * - ``BaconDecomposition`` + - Diagnostic + - Diagnostic + - -- + - -- + +**Legend:** + +- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance +- **Full (pweight only)**: Full TSL with strata/PSU/FPC, but only ``pweight`` accepted (``fweight``/``aweight`` rejected because composition changes weight semantics) +- **Via bootstrap**: Strata/PSU/FPC supported only with bootstrap variance. ``SyntheticDiD`` requires ``variance_method='bootstrap'``; ``TROP`` uses bootstrap by default. ``SyntheticDiD`` placebo does not support strata/PSU/FPC. 
+- **pweight only** (Weights column): Only ``pweight`` accepted; ``fweight``/``aweight`` raise an error +- **Diagnostic**: Weighted descriptive statistics only (no inference) +- **--**: Not supported + +.. note:: + + ``EfficientDiD`` does not support ``covariates`` and ``survey_design`` + simultaneously (the DR nuisance path does not yet thread survey weights). + +.. note:: + + ``SyntheticDiD`` with ``variance_method='placebo'`` does not support + strata/PSU/FPC. Use ``variance_method='bootstrap'`` for full survey + design support. + +For the full walkthrough with code examples, see the +`survey tutorial `_. +For deferred work and remaining limitations, see ``docs/survey-roadmap.md``. diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 554acd28..77d85feb 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2307,10 +2307,11 @@ variance from the distribution of replicate estimates. design structure is fixed and dropped replicates contribute zero to the sum without changing the scale. Survey df uses `n_valid - 1` for t-based inference. 
-- **Note:** Replicate-weight support matrix: - - **Supported**: CallawaySantAnna (reg/ipw/dr without covariates, no - bootstrap), ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap), - TripleDifference (all methods), LinearRegression (OLS path), +- **Note:** Replicate-weight support matrix (12 of 15 public estimators): + - **Supported**: CallawaySantAnna (reg/ipw/dr with or without covariates, + no bootstrap; IF-based replicate variance is covariate-agnostic), + ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap), + TripleDifference (all methods), StaggeredTripleDifference (IF-based), DifferenceInDifferences (no-absorb via LinearRegression dispatch, absorb via estimator-level refit), MultiPeriodDiD (no-absorb via `compute_replicate_vcov`, absorb via estimator-level refit), diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md index 1f1a9efe..56311734 100644 --- a/docs/survey-roadmap.md +++ b/docs/survey-roadmap.md @@ -1,7 +1,8 @@ # Survey Data Support Roadmap This document captures the survey data support roadmap for diff-diff. -Phases 1-7 are implemented. Phase 8 (maturity refinements) is planned. +Phases 1-8f are implemented. Phase 8g (documentation-only items) is partially +addressed. Remaining deferred items are listed at the bottom. ## Implemented (Phases 1-2) @@ -19,11 +20,11 @@ Phases 1-7 are implemented. Phase 8 (maturity refinements) is planned. 
| Estimator | File | Survey Support | Notes | |-----------|------|----------------|-------| | StackedDiD | `stacked_did.py` | pweight only | Q-weights compose multiplicatively with survey weights; TSL vcov on composed weights; fweight/aweight rejected (composition changes weight semantics) | -| SunAbraham | `sun_abraham.py` | Full | Survey weights in LinearRegression + weighted within-transform; bootstrap+survey deferred | +| SunAbraham | `sun_abraham.py` | Full | Survey weights in LinearRegression + weighted within-transform; bootstrap via Rao-Wu rescaled (Phase 6) | | BaconDecomposition | `bacon.py` | Diagnostic | Weighted cell means, weighted within-transform, weighted group shares; no inference (diagnostic only) | | TripleDifference | `triple_diff.py` | Full | Regression, IPW, and DR methods with weighted OLS/logit + TSL on influence functions | -| ContinuousDiD | `continuous_did.py` | Analytical | Weighted B-spline OLS + TSL on influence functions; bootstrap+survey deferred | -| EfficientDiD | `efficient_did.py` | Analytical | Weighted means/covariances in Omega* + TSL on EIF scores; bootstrap+survey deferred | +| ContinuousDiD | `continuous_did.py` | Analytical | Weighted B-spline OLS + TSL on influence functions; bootstrap via multiplier at PSU (Phase 6) | +| EfficientDiD | `efficient_did.py` | Analytical | Weighted means/covariances in Omega* + TSL on EIF scores; bootstrap via multiplier at PSU (Phase 6) | ### Phase 3 Deferred Work @@ -32,9 +33,9 @@ Phase 5 infrastructure (bootstrap+survey interaction): | Estimator | Deferred Capability | Blocker | |-----------|-------------------|---------| -| SunAbraham | Pairs bootstrap + survey | Phase 5: bootstrap+survey interaction | -| ContinuousDiD | Multiplier bootstrap + survey | Phase 5: bootstrap+survey interaction | -| EfficientDiD | Multiplier bootstrap + survey | Phase 5: bootstrap+survey interaction | +| SunAbraham | Pairs bootstrap + survey | **Resolved** (Phase 6, Rao-Wu rescaled) | +| ContinuousDiD | 
Multiplier bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | +| EfficientDiD | Multiplier bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | | EfficientDiD | Covariates (DR path) + survey | DR nuisance estimation needs survey weight threading | All blocked combinations raise `NotImplementedError` when attempted, with a @@ -44,9 +45,9 @@ message pointing to the planned phase or describing the limitation. | Estimator | File | Survey Support | Notes | |-----------|------|----------------|-------| -| ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap+survey deferred | -| TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap+survey deferred | -| CallawaySantAnna | `staggered.py` | Full | Full SurveyDesign (strata/PSU/FPC/replicate weights); reg supports covariates, IPW/DR no-covariate only; survey-weighted WIF in aggregation; replicate IF variance for analytical SEs | +| ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap via multiplier at PSU (Phase 6) | +| TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap via multiplier at PSU (Phase 6) | +| CallawaySantAnna | `staggered.py` | Full | Full SurveyDesign (strata/PSU/FPC/replicate weights); reg, IPW, and DR all support covariates (IPW/DR since Phase 7a); survey-weighted WIF in aggregation; replicate IF variance for analytical SEs | **Infrastructure**: Weighted `solve_logit()` added to `linalg.py` — survey weights enter the IRLS working weights as `w_survey * mu * (1 - mu)`. This also unblocked @@ -56,12 +57,12 @@ TripleDifference IPW/DR from Phase 3 deferred work. 
| Estimator | Deferred Capability | Blocker | |-----------|-------------------|---------| -| ImputationDiD | Bootstrap + survey | Phase 5: bootstrap+survey interaction | -| TwoStageDiD | Bootstrap + survey | Phase 5: bootstrap+survey interaction | -| CallawaySantAnna | Bootstrap + survey | Phase 5: bootstrap+survey interaction | -| CallawaySantAnna | Strata/PSU/FPC in SurveyDesign | Phase 5: route combined IF/WIF through `compute_survey_vcov()` for design-based aggregation SEs | -| CallawaySantAnna | Covariates + IPW/DR + survey | Phase 5: DRDID panel nuisance IF corrections | -| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | Phase 5: replace conservative plug-in IF with semiparametrically efficient IF | +| ImputationDiD | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | +| TwoStageDiD | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | +| CallawaySantAnna | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) | +| CallawaySantAnna | Strata/PSU/FPC in SurveyDesign | **Resolved** (Phase 6, `compute_survey_if_variance()`) | +| CallawaySantAnna | Covariates + IPW/DR + survey | **Resolved** (Phase 7a, DRDID nuisance IF corrections) | +| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | Deferred — code uses conservative plug-in IF (see REGISTRY.md) | ## Implemented (Phase 5): SyntheticDiD + TROP Survey Support @@ -92,7 +93,7 @@ Survey-aware bootstrap for all 8 bootstrap-using estimators. Two strategies: ### Replicate Weight Variance ✅ (2026-03-26) Re-run WLS for each replicate weight column, compute variance from distribution -of estimates. Supports BRR, Fay's BRR, JK1, JKn methods. +of estimates. Supports BRR, Fay's BRR, JK1, JKn, and SDR methods. JKn requires explicit `replicate_strata` (per-replicate stratum assignment). 
- `replicate_weights`, `replicate_method`, `fay_rho` fields on SurveyDesign - `compute_replicate_vcov()` for OLS-based estimators (re-runs WLS per replicate) @@ -100,12 +101,13 @@ JKn requires explicit `replicate_strata` (per-replicate stratum assignment). - Dispatch in `LinearRegression.fit()` and `staggered_aggregation.py` - Replicate weights mutually exclusive with strata/PSU/FPC - Survey df = rank(replicate_weights) - 1, matching R's `survey::degf()` -- **Limitations**: Supported in CallawaySantAnna, ContinuousDiD, EfficientDiD, - TripleDifference (analytical only, no bootstrap). Rejected with - `NotImplementedError` in DifferenceInDifferences, TwoWayFixedEffects, - MultiPeriodDiD, StackedDiD, SunAbraham, ImputationDiD, TwoStageDiD, - SyntheticDiD, TROP. Expansion to regression-based estimators (SA, - Imputation, TwoStage, Stacked) is straightforward but deferred. +- **Coverage**: 12 of 15 estimators support replicate weights. + Supported: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD, + CallawaySantAnna, TripleDifference (analytical only), StaggeredTripleDifference, + SunAbraham, StackedDiD, ImputationDiD, TwoStageDiD, ContinuousDiD, EfficientDiD. + Rejected: SyntheticDiD, TROP (no published theory on replicate weights + + unit weight optimization / nuclear norm regularization), BaconDecomposition + (diagnostic tool, no inference). ### DEFF Diagnostics ✅ (2026-03-26) Per-coefficient design effects comparing survey vcov to SRS (HC1) vcov. @@ -166,8 +168,8 @@ framed around a state-level preventive care program evaluated with a stratified health survey (ACS/BRFSS-like). Covers: why survey design matters, SurveyDesign setup, basic DiD with survey, staggered DiD (CallawaySantAnna) with survey, replicate weights (JK1), subpopulation -analysis, DEFF diagnostics, repeated cross-sections, and estimator -support reference table. 
Uses `generate_survey_did_data()` DGP function +analysis, DEFF diagnostics, repeated cross-sections, and a link to the +compatibility matrix in `choosing_estimator.rst`. Uses `generate_survey_did_data()` DGP function added to `diff_diff.prep`. ### 7d. HonestDiD with Survey Variance ✅ @@ -210,93 +212,136 @@ variance estimation for staggered triple differences. Refinements to close remaining gaps versus R's `survey` package and improve practitioner experience. Prioritized by user impact. -### 8a. Successive Difference Replication (SDR) +### 8a. Successive Difference Replication (SDR) ✅ -**Priority: High.** ACS PUMS — the most common US survey dataset for DiD -policy evaluation — provides 80 SDR replicate weight columns. Without SDR -support, these users can't use their provided replicate weights directly. - -**What's needed:** -- Add `"SDR"` to `valid_rep_methods` in `SurveyDesign` -- Variance formula: `V = 4/R * sum((theta_r - theta)^2)` — a scaling - difference from BRR, not a new algorithm -- Wire through `compute_replicate_vcov()` and `compute_replicate_if_variance()` +**Shipped in v2.8.4.** ACS PUMS — the most common US survey dataset for DiD +policy evaluation — provides 80 SDR replicate weight columns. +`SurveyDesign(replicate_method="SDR")` with variance formula +`V = 4/R * sum((theta_r - theta)^2)`. **Reference:** Fay, R.E. & Train, G.F. (1995). "Aspects of Survey and Model-Based Postcensal Estimation of Income and Poverty Characteristics for States and Counties." ASA Proceedings. -### 8b. FPC in ImputationDiD and TwoStageDiD - -**Priority: High.** Both estimators now support replicate weights and TSL -with strata/PSU, but reject FPC outright (`NotImplementedError`). Adding -FPC is incremental — thread `fpc` through the existing TSL variance path. -Matters for finite population surveys (common in state-level sampling). +### 8b. 
FPC in ImputationDiD and TwoStageDiD ✅ -**Current gate:** `imputation.py:280`, `two_stage.py:268` +**Shipped in v2.8.4.** Both estimators now have full strata/PSU/FPC +support. FPC is threaded through the existing TSL variance path. -### 8c. Silent Operation Warnings +### 8c. Silent Operation Warnings ✅ -**Priority: High.** Add `UserWarning` emissions for operations that -silently alter analysis results: -- TROP lstsq → pseudo-inverse numerical fallback -- TwoStageDiD NaN masking of unidentified fixed effects -- TwoStageDiD always-treated unit removal -- CallawaySantAnna silent (g,t) pair skipping -- TROP missing treatment indicator fill with 0 -- Rust → Python backend fallback (currently debug log only) -- Survey weight normalization (pweights rescaled to mean=1) -- `np.inf` → 0 never-treated conversion +**Shipped in v2.8.3.** Eight operations that previously altered analysis +results without informing the user now emit `UserWarning`: +TROP lstsq fallback, TwoStageDiD NaN masking, TwoStageDiD always-treated +removal, CallawaySantAnna (g,t) pair skipping, TROP treatment indicator +fill, Rust → Python fallback, survey weight normalization, `np.inf` → 0 +never-treated conversion. -### 8d. Lonely PSU "adjust" in Bootstrap +### 8d. Lonely PSU "adjust" in Bootstrap ✅ -**Priority: Medium.** `lonely_psu="adjust"` works for analytical (TSL) -variance but raises `NotImplementedError` for survey-aware bootstrap -(2 raises in `bootstrap_utils.py`). Real survey data regularly has -singleton strata. Users needing bootstrap inference with such data hit -a wall. +**Shipped in v2.8.4.** `lonely_psu="adjust"` now works with survey-aware +bootstrap using Rust & Rao (1996) grand-mean centering. **Reference:** Rust, K.F. & Rao, J.N.K. (1996). "Variance Estimation for Complex Surveys Using Replication Techniques." Statistical Methods in Medical Research 5(3). -### 8e. Survey Diagnostics and Utilities +### 8e. 
Survey Diagnostics and Utilities ✅ -**Priority: Medium.** Small additions that signal maturity to survey -statisticians: -- **CV on estimates**: coefficient of variation (SE/estimate) on results - objects — trivial to add, used by federal agencies for publication - standards (NCHS requires CV < 30% for releasable estimates) -- **Weight trimming**: `trim_weights(data, weight_col, upper=None, - quantile=None)` utility in `prep.py` for capping extreme weights -- **ImputationDiD pretrends + survey**: pre-trends F-test currently - ignores survey variance (`NotImplementedError` at `imputation.py:240`) +**Shipped in v2.8.4.** +- **CV on estimates**: `coef_var` property on all results objects (SE/|estimate|). + Handles edge cases (SE=0, estimate=0). +- **Weight trimming**: `trim_weights(data, weight_col, upper=None, lower=None, + quantile=None)` in `prep.py` for capping extreme survey weights. +- **ImputationDiD pretrends + survey**: pre-trends F-test now survey-aware + using subpopulation approach for correct variance under complex designs. -### 8f. Survey Compatibility Matrix +### 8f. Survey Compatibility Matrix ✅ -**Priority: Medium.** Users discover survey support limits by hitting -`NotImplementedError` at runtime. Add a table to the survey tutorial -or `choosing_estimator.rst` showing which estimator × survey feature -combinations are supported (weights, strata/PSU, FPC, replicate weights, -bootstrap + survey). +**Shipped.** Full compatibility table added to `docs/choosing_estimator.rst` +(Survey Design Support section) showing estimator × survey feature +combinations. Tutorial cross-references this table. ### 8g. Documentation-Only Items -**Priority: Low.** No code changes required: +**Partially addressed.** No code changes required. Remaining items +deferred to the consolidated list below: - **Multi-stage design**: document that single-stage (strata + PSU) is sufficient for variance estimation per Lumley (2004) Section 2.2. 
- Don't implement multi-stage — it adds complexity without changing - results for DiD applications. - **Post-stratification / calibration**: document that `SurveyDesign` expects pre-calibrated weights. Point users to `samplics` or R's - `survey::calibrate()` for weight calibration. This is data prep, - not DiD estimation — out of scope. + `survey::calibrate()` for weight calibration. + +## Deferred Work (Consolidated) + +### Documented Deviations + +These are supported paths that use a conservative or simplified approach +rather than the theoretically optimal one. They do not raise errors. + +| Estimator | Deviation | Details | +|-----------|-----------|---------| +| CallawaySantAnna | `reg`+covariates uses conservative plug-in IF | Efficient DRDID nuisance IF correction deferred; see REGISTRY.md | + +### Runtime Limitations + +All items below raise an error when attempted (`NotImplementedError` or +`ValueError` depending on the estimator), with a message describing the +limitation. See also `TODO.md` for general tech debt items (e.g., +multi-absorb + survey weights). + +### Replicate Weights Not Supported + +| Estimator | Reason | +|-----------|--------| +| SyntheticDiD | No published theory on replicate weights + unit weight optimization | +| TROP | No published theory on replicate weights + nuclear norm regularization | +| BaconDecomposition | Diagnostic tool with no inference — replicate weights don't apply | + +### EfficientDiD Survey Limitations + +| Limitation | Reason | +|-----------|--------| +| `covariates` + `survey_design` | DR nuisance path doesn't thread survey weights | +| `cluster` + `survey_design` | Use `survey_design` with PSU/strata instead | -### Deferred +### Bootstrap + Replicate Weights (Mutual Exclusion) -| Estimator | Capability | Reason | +Replicate weights and bootstrap are alternative variance estimation methods. 
+Combining them raises `NotImplementedError` or `ValueError`: + +| Estimator | +|-----------| +| CallawaySantAnna | +| ContinuousDiD | +| EfficientDiD | +| ImputationDiD | +| StaggeredTripleDifference | +| SunAbraham | +| TwoStageDiD | + +### Other Limitations + +| Estimator | Limitation | Reason | |-----------|-----------|--------| -| SyntheticDiD | Replicate weights | No published theory on replicate weights + unit weight optimization | -| TROP | Replicate weights | No published theory on replicate weights + nuclear norm regularization | -| BaconDecomposition | Replicate weights | Diagnostic tool with no inference — replicate weights don't apply | -| EfficientDiD | Covariates + survey, cluster + survey, bootstrap + survey | Lower demand, newer estimator; 3 `NotImplementedError` paths | +| SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` | +| ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented | +| ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented | +| DifferenceInDifferences, TwoWayFixedEffects | `inference='wild_bootstrap'` + `survey_design` | Raises `NotImplementedError`; use analytical survey inference (the default) instead | + +### Warning/Fallback Behaviors + +These do not raise errors; they emit a warning and change behavior: + +| Estimator | Limitation | Behavior | +|-----------|-----------|----------| +| MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Warns and falls back to analytical inference | + +--- + +## Remaining Documentation Tasks (Phase 8g) + +These are documentation improvements, not runtime limitations: + +- **Multi-stage design**: Document that single-stage (strata + PSU) is sufficient for variance estimation per Lumley (2004) Section 2.2. No code changes needed. +- **Post-stratification / calibration**: Document that `SurveyDesign` expects pre-calibrated weights. 
Point users to `samplics` or R's `survey::calibrate()` for weight calibration. diff --git a/docs/tutorials/16_survey_did.ipynb b/docs/tutorials/16_survey_did.ipynb index 543e9862..aa1ce756 100644 --- a/docs/tutorials/16_survey_did.ipynb +++ b/docs/tutorials/16_survey_did.ipynb @@ -95,7 +95,7 @@ "**About the normalization warning:** You'll see `pweight weights normalized to mean=1` throughout this tutorial. ", "Survey weights are inverse selection probabilities -- they rarely have mean=1 out of the box. ", "The library rescales them internally so that weighted estimators are numerically stable. ", - "This is standard practice (Lumley 2004, \u00a72.2). ", + "This is standard practice (Lumley 2004, §2.2). ", "The warning confirms rescaling occurred; it is not an error." ] }, @@ -356,8 +356,9 @@ "- **JKn** (Jackknife delete-n): Stratified jackknife, drops one PSU per stratum.\n", "- **BRR** (Balanced Repeated Replication): Halve each stratum, reweight.\n", "- **Fay's BRR**: Modified BRR with a damping factor (0 < rho < 1).\n", + "- **SDR** (Successive Difference Replication): Used by ACS PUMS (80 replicate columns). Variance: V = 4/R × Σ(θᵣ − θ)².\n", "\n", - "`SurveyDesign` accepts replicate weights as an alternative to strata/PSU/FPC. They are **mutually exclusive** -- use one or the other." + "`SurveyDesign` accepts replicate weights as an alternative to strata/PSU/FPC. They are **mutually exclusive** -- use one or the other.\n" ] }, { @@ -527,13 +528,61 @@ { "cell_type": "markdown", "metadata": {}, - "source": "## 9. 
Which Estimators Support Survey Design?\n\n`diff-diff` supports survey design across all estimators, though the level of support varies:\n\n| Estimator | Weights | Strata/PSU/FPC (TSL) | Replicate Weights | Survey-Aware Bootstrap |\n|-----------|---------|---------------------|-------------------|------------------------|\n| **DifferenceInDifferences** | Full | Full | -- | -- |\n| **TwoWayFixedEffects** | Full | Full | -- | -- |\n| **MultiPeriodDiD** | Full | Full | -- | -- |\n| **CallawaySantAnna** | pweight only | Full | Full | Multiplier at PSU |\n| **TripleDifference** | pweight only | Full | Full (analytical) | -- |\n| **StaggeredTripleDifference** | pweight only | Full | Full | Multiplier at PSU |\n| **SunAbraham** | Full | Full | -- | Rao-Wu rescaled |\n| **StackedDiD** | pweight only | Full (pweight only) | -- | -- |\n| **ImputationDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **TwoStageDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **ContinuousDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **EfficientDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **SyntheticDiD** | pweight only | -- | -- | Rao-Wu rescaled |\n| **TROP** | pweight only | -- | -- | Rao-Wu rescaled |\n| **BaconDecomposition** | Diagnostic | Diagnostic | -- | -- |\n\n**Legend:**\n- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance\n- **Full (pweight only)**: Full TSL support with strata/PSU/FPC, but only accepts `pweight` weight type (`fweight`/`aweight` rejected because Q-weight composition changes their semantics)\n- **Partial (no FPC)**: Weights + strata (for df) + PSU (for clustering); FPC raises `NotImplementedError`\n- **pweight only** (Weights column): Only `pweight` accepted; `fweight`/`aweight` raise an error\n- **pweight only** (TSL column): Sampling weights for point estimates; no strata/PSU/FPC design elements\n- **Diagnostic**: Weighted 
descriptive statistics only (no inference)\n- **--**: Not supported\n\n**Note:** `EfficientDiD` does not support `covariates` and `survey_design` simultaneously (the DR nuisance path does not yet thread survey weights). Use `covariates=None` with survey designs.\n\nFor full details, see `docs/survey-roadmap.md`." + "source": [ + "## 9. Which Estimators Support Survey Design?\n", + "\n", + "All estimators accept `survey_design` in `fit()`. Support depth varies — see the\n", + "[Survey Design Support](https://diff-diff.readthedocs.io/en/latest/choosing_estimator.html#survey-design-support)\n", + "table in the Choosing an Estimator guide for the full compatibility matrix.\n", + "\n", + "**Key highlights:**\n", + "- **CallawaySantAnna** has the most complete support: all design elements, replicate weights, analytical and bootstrap SEs, panel and cross-section modes\n", + "- **12 of 15 estimators** support replicate weights (BRR, Fay, JK1, JKn, SDR)\n", + "- **SyntheticDiD** and **TROP** support strata/PSU/FPC via bootstrap only (not placebo/analytical)\n" + ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Summary\n\n**Key takeaways:**\n\n1. **Always specify the survey design** when working with survey data. Ignoring it leads to incorrect standard errors -- typically too small, leading to false positives.\n\n2. **`SurveyDesign`** encapsulates your survey's sampling structure in one object. Pass column names for weights, strata, PSU, and FPC.\n\n3. **Pass `survey_design` to `fit()`** -- the same API works across all estimators. No changes to your estimation code beyond adding one parameter.\n\n4. **CallawaySantAnna** has the most complete survey support: strata/PSU/FPC, replicate weights, analytical and bootstrap SEs, and both panel and cross-section modes.\n\n5. **Replicate weights** (JK1, JKn, BRR, Fay) are an alternative to strata/PSU/FPC when your survey provides them (e.g., MEPS, ACS PUMS). They are mutually exclusive with strata/PSU/FPC.\n\n6. 
**Use `subpopulation()`** instead of subsetting when estimating effects for a subgroup. Subsetting drops design information and biases variance estimates.\n\n7. **DEFF diagnostics** help you understand *how* the survey design affects precision. DEFF > 1 means clustering costs exceed stratification gains; DEFF < 1 means the design improves precision for that coefficient. DEFF > 2 indicates substantial clustering.\n\n8. **Repeated cross-sections** (`panel=False`) work with survey design for non-panel surveys like BRFSS, CPS, and ACS 1-year.\n\n**Quick reference:**\n\n| Parameter | When to use |\n|-----------|------------|\n| `weights` | Always -- specify the sampling weight column |\n| `strata` | When the survey uses stratified sampling |\n| `psu` | When multi-stage (clustered) sampling is used |\n| `fpc` | When the sampling fraction is non-negligible |\n| `replicate_weights` | When the survey provides replicate weights instead of strata/PSU/FPC |\n\n**References:**\n\n- Lumley, T. (2004). Analysis of Complex Survey Samples. *Journal of Statistical Software* 9(8).\n- Solon, G., Haider, S. J., & Wooldridge, J. M. (2015). What Are We Weighting For? *Journal of Human Resources* 50(2).\n- Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. *International Statistical Review* 51(3).\n- Rao, J. N. K., Wu, C. F. J., & Yue, K. (1992). Some Recent Work on Resampling Methods for Complex Surveys. *Survey Methodology* 18(2).\n- Callaway, B. & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. *Journal of Econometrics* 225(2).\n- Sant'Anna, P. H. C. & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. *Journal of Econometrics* 219(1)." + "## Summary\n", + "\n", + "**Key takeaways:**\n", + "\n", + "1. **Always specify the survey design** when working with survey data. Ignoring it leads to incorrect standard errors -- typically too small, leading to false positives.\n", + "\n", + "2. 
**`SurveyDesign`** encapsulates your survey's sampling structure in one object. Pass column names for weights, strata, PSU, and FPC.\n", + "\n", + "3. **Pass `survey_design` to `fit()`** -- the same API works across all estimators. No changes to your estimation code beyond adding one parameter.\n", + "\n", + "4. **CallawaySantAnna** has the most complete survey support: strata/PSU/FPC, replicate weights, analytical and bootstrap SEs, and both panel and cross-section modes.\n", + "\n", + "5. **Replicate weights** (JK1, JKn, BRR, Fay, SDR) are an alternative to strata/PSU/FPC when your survey provides them (e.g., MEPS, ACS PUMS). They are mutually exclusive with strata/PSU/FPC.\n", + "\n", + "6. **Use `subpopulation()`** instead of subsetting when estimating effects for a subgroup. Subsetting drops design information and biases variance estimates.\n", + "\n", + "7. **DEFF diagnostics** help you understand *how* the survey design affects precision. DEFF > 1 means clustering costs exceed stratification gains; DEFF < 1 means the design improves precision for that coefficient. DEFF > 2 indicates substantial clustering.\n", + "\n", + "8. **Repeated cross-sections** (`panel=False`) work with survey design for non-panel surveys like BRFSS, CPS, and ACS 1-year.\n", + "\n", + "**Quick reference:**\n", + "\n", + "| Parameter | When to use |\n", + "|-----------|------------|\n", + "| `weights` | Always -- specify the sampling weight column |\n", + "| `strata` | When the survey uses stratified sampling |\n", + "| `psu` | When multi-stage (clustered) sampling is used |\n", + "| `fpc` | When the sampling fraction is non-negligible |\n", + "| `replicate_weights` | When the survey provides replicate weights instead of strata/PSU/FPC |\n", + "\n", + "**References:**\n", + "\n", + "- Lumley, T. (2004). Analysis of Complex Survey Samples. *Journal of Statistical Software* 9(8).\n", + "- Solon, G., Haider, S. J., & Wooldridge, J. M. (2015). What Are We Weighting For? 
*Journal of Human Resources* 50(2).\n", + "- Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. *International Statistical Review* 51(3).\n", + "- Rao, J. N. K., Wu, C. F. J., & Yue, K. (1992). Some Recent Work on Resampling Methods for Complex Surveys. *Survey Methodology* 18(2).\n", + "- Callaway, B. & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. *Journal of Econometrics* 225(2).\n", + "- Sant'Anna, P. H. C. & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. *Journal of Econometrics* 219(1).\n" ] } ], @@ -544,4 +593,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +}