From 1983a50c2ab4fc81c6cd07f664e96377779fce2c Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 09:08:52 -0400
Subject: [PATCH 01/12] Update survey documentation: compatibility matrix,
 roadmap, deferred work

Add survey compatibility matrix to choosing_estimator.rst (Phase 8f),
fix 11 stale entries in the tutorial table and replace with cross-reference,
mark Phase 8a-8e as shipped in survey-roadmap.md, consolidate all remaining
NotImplementedError paths into a single deferred work section, add SDR to
replicate method lists, and update ROADMAP.md version/status entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
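Note (not part of the commit message): the SDR variance formula quoted in
the survey-roadmap.md and tutorial hunks, V = 4/R * sum((theta_r - theta)^2),
can be sketched in NumPy as below. This is a minimal illustration only; the
function name `sdr_variance` and the toy numbers are assumptions made for
this note and are not diff-diff's API or internals.

```python
import numpy as np


def sdr_variance(theta_full, theta_reps):
    """Hypothetical helper: V = 4/R * sum((theta_r - theta)^2).

    theta_full: point estimate computed with the full-sample weights.
    theta_reps: the R estimates recomputed with each replicate weight column.
    """
    theta_reps = np.asarray(theta_reps, dtype=float)
    n_reps = theta_reps.shape[0]  # R, e.g. 80 for ACS PUMS
    return 4.0 / n_reps * np.sum((theta_reps - theta_full) ** 2)


# Toy numbers, purely illustrative (not real survey output).
variance = sdr_variance(10.0, [9.0, 11.0, 10.0, 12.0])
std_error = np.sqrt(variance)
print(variance, std_error)
```

Only the constant factor 4/R is specific to SDR here; the replicate
estimates themselves would come from refitting the estimator once per
replicate weight column.
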
 ROADMAP.md                         |  13 ++-
 TODO.md                            |   4 +
 docs/choosing_estimator.rst        | 123 ++++++++++++++++++++-
 docs/survey-roadmap.md             | 171 ++++++++++++++++-------------
 docs/tutorials/16_survey_did.ipynb |  59 +++++++++-
 5 files changed, 280 insertions(+), 90 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 058f2ba1..8f8fce1c 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -8,15 +8,15 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).
 
 ## Current Status
 
-diff-diff v2.7.5 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — design-based variance estimation (Taylor linearization, replicate weights) integrated across all estimators. No R or Python package offers this combination:
+diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — design-based variance estimation (Taylor linearization, replicate weights) integrated across all estimators. No R or Python package offers this combination:
 
-- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024)
+- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024)
 - **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance
 - **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
 - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
 - **Study design**: Power analysis tools
 - **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs
-- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn), Taylor linearization, DEFF diagnostics, subpopulation analysis — integrated across all estimators (see [survey-roadmap.md](docs/survey-roadmap.md))
+- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn/SDR), Taylor linearization, DEFF diagnostics, subpopulation analysis — integrated across all estimators (see [survey-roadmap.md](docs/survey-roadmap.md))
 - **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks)
 
 ---
@@ -34,19 +34,20 @@ full details.
 - **Repeated Cross-Sections** *(Implemented)*: `panel=False` support for
   CallawaySantAnna using cross-sectional DRDID (Sant'Anna & Zhao 2020,
   Section 4). Supports BRFSS, ACS annual, CPS monthly.
-- **Survey-Aware DiD Tutorial** *(Open)*: Jupyter notebook demonstrating
+- **Survey-Aware DiD Tutorial** *(Implemented)*: Jupyter notebook demonstrating
   the full workflow with realistic survey data.
 - **HonestDiD + Survey Variance** *(Implemented)*: Survey df and full
   event-study VCV propagated to sensitivity analysis, with bootstrap/replicate
   diagonal fallback.
 
-### Staggered Triple Difference (DDD)
+### Staggered Triple Difference (DDD) *(Implemented)*
 
-Extend the existing `TripleDifference` estimator to handle staggered adoption settings.
+`StaggeredTripleDifference` estimator for staggered adoption DDD settings.
 
 - Group-time ATT(g,t) for DDD designs with variation in treatment timing
 - Event study aggregation and pre-treatment placebo effects
 - Multiplier bootstrap for valid inference in staggered settings
+- Full survey support (pweight, strata/PSU/FPC, replicate weights)
 
 **Reference**: [Ortiz-Villavicencio & Sant'Anna (2025)](https://arxiv.org/abs/2505.09942). "Better Understanding Triple Differences Estimators." *Working Paper*. R package: `triplediff`.
 
diff --git a/TODO.md b/TODO.md
index 9c1aad97..748a2dd8 100644
--- a/TODO.md
+++ b/TODO.md
@@ -15,6 +15,10 @@ Current limitations that may affect users:
 | MultiPeriodDiD wild bootstrap not supported | `estimators.py:778-784` | Low | Edge case |
 | `predict()` raises NotImplementedError | `estimators.py:567-588` | Low | Rarely needed |
 
+For survey-specific limitations (NotImplementedError paths), see the
+[consolidated deferred list](docs/survey-roadmap.md#deferred-work-consolidated)
+in survey-roadmap.md.
+
 ## Code Quality
 
 ### Large Module Files
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index f0c49261..34bbf612 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -571,5 +571,124 @@ If you're unsure which estimator to use:
    investigate why (often reveals violations of assumptions)
 
 5. **Using survey data?** - Pass a ``SurveyDesign`` to ``fit()`` for design-based
-   variance estimation. See the `survey tutorial <https://github.com/igerber/diff-diff/blob/main/docs/tutorials/16_survey_did.ipynb>`_
-   for a full walkthrough with strata, PSU, FPC, replicate weights, and subpopulation analysis.
+   variance estimation. See the :ref:`survey-design-support` section below for
+   the compatibility matrix, and the `survey tutorial <https://github.com/igerber/diff-diff/blob/main/docs/tutorials/16_survey_did.ipynb>`_
+   for a full walkthrough.
+
+.. _survey-design-support:
+
+Survey Design Support
+---------------------
+
+All estimators accept an optional ``survey_design`` parameter in ``fit()``.
+Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance
+estimation. The depth of support varies by estimator:
+
+.. list-table::
+   :header-rows: 1
+   :widths: 25 12 18 18 18
+
+   * - Estimator
+     - Weights
+     - Strata/PSU/FPC
+     - Replicate Weights
+     - Survey Bootstrap
+   * - ``DifferenceInDifferences``
+     - Full
+     - Full
+     - Full
+     - --
+   * - ``TwoWayFixedEffects``
+     - Full
+     - Full
+     - Full
+     - --
+   * - ``MultiPeriodDiD``
+     - Full
+     - Full
+     - Full
+     - --
+   * - ``CallawaySantAnna``
+     - pweight only
+     - Full
+     - Full
+     - Multiplier at PSU
+   * - ``TripleDifference``
+     - pweight only
+     - Full
+     - Full (analytical)
+     - --
+   * - ``StaggeredTripleDifference``
+     - pweight only
+     - Full
+     - Full
+     - Multiplier at PSU
+   * - ``SunAbraham``
+     - Full
+     - Full
+     - Full
+     - Rao-Wu rescaled
+   * - ``StackedDiD``
+     - pweight only
+     - Full (pweight only)
+     - Full
+     - --
+   * - ``ImputationDiD``
+     - pweight only
+     - Full
+     - Full (analytical)
+     - Multiplier at PSU
+   * - ``TwoStageDiD``
+     - pweight only
+     - Full
+     - Full (analytical)
+     - Multiplier at PSU
+   * - ``ContinuousDiD``
+     - Full
+     - Full
+     - Full (analytical)
+     - Multiplier at PSU
+   * - ``EfficientDiD``
+     - Full
+     - Full
+     - Full (analytical)
+     - Multiplier at PSU
+   * - ``SyntheticDiD``
+     - pweight only
+     - Via bootstrap
+     - --
+     - Rao-Wu rescaled
+   * - ``TROP``
+     - pweight only
+     - Via bootstrap
+     - --
+     - Rao-Wu rescaled
+   * - ``BaconDecomposition``
+     - Diagnostic
+     - Diagnostic
+     - --
+     - --
+
+**Legend:**
+
+- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance
+- **Full (pweight only)**: Full TSL with strata/PSU/FPC, but only ``pweight`` accepted (``fweight``/``aweight`` rejected because composition changes weight semantics)
+- **Via bootstrap**: Strata/PSU/FPC supported only with bootstrap variance. ``SyntheticDiD`` requires ``variance_method='bootstrap'`` (``variance_method='placebo'`` does not support strata/PSU/FPC); ``TROP`` uses bootstrap by default.
+- **pweight only** (Weights column): Only ``pweight`` accepted; ``fweight``/``aweight`` raise an error
+- **Diagnostic**: Weighted descriptive statistics only (no inference)
+- **--**: Not supported
+
+.. note::
+
+   ``EfficientDiD`` does not support ``covariates`` and ``survey_design``
+   simultaneously (the DR nuisance path does not yet thread survey weights).
+
+.. note::
+
+   ``SyntheticDiD`` with ``variance_method='placebo'`` does not support
+   strata/PSU/FPC. Use ``variance_method='bootstrap'`` for full survey
+   design support.
+
+For the full walkthrough with code examples, see the
+`survey tutorial <https://github.com/igerber/diff-diff/blob/main/docs/tutorials/16_survey_did.ipynb>`_.
+For deferred work and remaining limitations, see ``docs/survey-roadmap.md``.
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index 1f1a9efe..008101f4 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -1,7 +1,7 @@
 # Survey Data Support Roadmap
 
 This document captures the survey data support roadmap for diff-diff.
-Phases 1-7 are implemented. Phase 8 (maturity refinements) is planned.
+All phases (1-8) are implemented. Remaining deferred items are listed at the bottom.
 
 ## Implemented (Phases 1-2)
 
@@ -32,9 +32,9 @@ Phase 5 infrastructure (bootstrap+survey interaction):
 
 | Estimator | Deferred Capability | Blocker |
 |-----------|-------------------|---------|
-| SunAbraham | Pairs bootstrap + survey | Phase 5: bootstrap+survey interaction |
-| ContinuousDiD | Multiplier bootstrap + survey | Phase 5: bootstrap+survey interaction |
-| EfficientDiD | Multiplier bootstrap + survey | Phase 5: bootstrap+survey interaction |
+| SunAbraham | Pairs bootstrap + survey | **Resolved** (Phase 6, Rao-Wu rescaled) |
+| ContinuousDiD | Multiplier bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
+| EfficientDiD | Multiplier bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
 | EfficientDiD | Covariates (DR path) + survey | DR nuisance estimation needs survey weight threading |
 
 All blocked combinations raise `NotImplementedError` when attempted, with a
@@ -56,12 +56,12 @@ TripleDifference IPW/DR from Phase 3 deferred work.
 
 | Estimator | Deferred Capability | Blocker |
 |-----------|-------------------|---------|
-| ImputationDiD | Bootstrap + survey | Phase 5: bootstrap+survey interaction |
-| TwoStageDiD | Bootstrap + survey | Phase 5: bootstrap+survey interaction |
-| CallawaySantAnna | Bootstrap + survey | Phase 5: bootstrap+survey interaction |
-| CallawaySantAnna | Strata/PSU/FPC in SurveyDesign | Phase 5: route combined IF/WIF through `compute_survey_vcov()` for design-based aggregation SEs |
-| CallawaySantAnna | Covariates + IPW/DR + survey | Phase 5: DRDID panel nuisance IF corrections |
-| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | Phase 5: replace conservative plug-in IF with semiparametrically efficient IF |
+| ImputationDiD | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
+| TwoStageDiD | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
+| CallawaySantAnna | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
+| CallawaySantAnna | Strata/PSU/FPC in SurveyDesign | **Resolved** (Phase 6, `compute_survey_if_variance()`) |
+| CallawaySantAnna | Covariates + IPW/DR + survey | **Resolved** (Phase 7a, DRDID nuisance IF corrections) |
+| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | **Resolved** (Phase 7a) |
 
 ## Implemented (Phase 5): SyntheticDiD + TROP Survey Support
 
@@ -100,12 +100,13 @@ JKn requires explicit `replicate_strata` (per-replicate stratum assignment).
 - Dispatch in `LinearRegression.fit()` and `staggered_aggregation.py`
 - Replicate weights mutually exclusive with strata/PSU/FPC
 - Survey df = rank(replicate_weights) - 1, matching R's `survey::degf()`
-- **Limitations**: Supported in CallawaySantAnna, ContinuousDiD, EfficientDiD,
-  TripleDifference (analytical only, no bootstrap). Rejected with
-  `NotImplementedError` in DifferenceInDifferences, TwoWayFixedEffects,
-  MultiPeriodDiD, StackedDiD, SunAbraham, ImputationDiD, TwoStageDiD,
-  SyntheticDiD, TROP. Expansion to regression-based estimators (SA,
-  Imputation, TwoStage, Stacked) is straightforward but deferred.
+- **Coverage**: 12 of 15 estimators support replicate weights.
+  Supported: DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD,
+  CallawaySantAnna, TripleDifference (analytical only), StaggeredTripleDifference,
+  SunAbraham, StackedDiD, ImputationDiD, TwoStageDiD, ContinuousDiD, EfficientDiD.
+  Rejected: SyntheticDiD, TROP (no published theory on replicate weights +
+  unit weight optimization / nuclear norm regularization), BaconDecomposition
+  (diagnostic tool, no inference).
 
 ### DEFF Diagnostics ✅ (2026-03-26)
 Per-coefficient design effects comparing survey vcov to SRS (HC1) vcov.
@@ -210,93 +211,109 @@ variance estimation for staggered triple differences.
 Refinements to close remaining gaps versus R's `survey` package and improve
 practitioner experience. Prioritized by user impact.
 
-### 8a. Successive Difference Replication (SDR)
+### 8a. Successive Difference Replication (SDR) ✅
 
-**Priority: High.** ACS PUMS — the most common US survey dataset for DiD
-policy evaluation — provides 80 SDR replicate weight columns. Without SDR
-support, these users can't use their provided replicate weights directly.
-
-**What's needed:**
-- Add `"SDR"` to `valid_rep_methods` in `SurveyDesign`
-- Variance formula: `V = 4/R * sum((theta_r - theta)^2)` — a scaling
-  difference from BRR, not a new algorithm
-- Wire through `compute_replicate_vcov()` and `compute_replicate_if_variance()`
+**Shipped in v2.8.4.** ACS PUMS — the most common US survey dataset for DiD
+policy evaluation — provides 80 SDR replicate weight columns.
+`SurveyDesign(replicate_method="SDR")` with variance formula
+`V = 4/R * sum((theta_r - theta)^2)`.
 
 **Reference:** Fay, R.E. & Train, G.F. (1995). "Aspects of Survey and
 Model-Based Postcensal Estimation of Income and Poverty Characteristics
 for States and Counties." ASA Proceedings.
 
-### 8b. FPC in ImputationDiD and TwoStageDiD
-
-**Priority: High.** Both estimators now support replicate weights and TSL
-with strata/PSU, but reject FPC outright (`NotImplementedError`). Adding
-FPC is incremental — thread `fpc` through the existing TSL variance path.
-Matters for finite population surveys (common in state-level sampling).
+### 8b. FPC in ImputationDiD and TwoStageDiD ✅
 
-**Current gate:** `imputation.py:280`, `two_stage.py:268`
+**Shipped in v2.8.4.** Both estimators now have full strata/PSU/FPC
+support. FPC is threaded through the existing TSL variance path.
 
-### 8c. Silent Operation Warnings
+### 8c. Silent Operation Warnings ✅
 
-**Priority: High.** Add `UserWarning` emissions for operations that
-silently alter analysis results:
-- TROP lstsq → pseudo-inverse numerical fallback
-- TwoStageDiD NaN masking of unidentified fixed effects
-- TwoStageDiD always-treated unit removal
-- CallawaySantAnna silent (g,t) pair skipping
-- TROP missing treatment indicator fill with 0
-- Rust → Python backend fallback (currently debug log only)
-- Survey weight normalization (pweights rescaled to mean=1)
-- `np.inf` → 0 never-treated conversion
+**Shipped in v2.8.3.** Eight operations that previously altered analysis
+results without informing the user now emit `UserWarning`:
+TROP lstsq fallback, TwoStageDiD NaN masking, TwoStageDiD always-treated
+removal, CallawaySantAnna (g,t) pair skipping, TROP treatment indicator
+fill, Rust → Python fallback, survey weight normalization, `np.inf` → 0
+never-treated conversion.
 
-### 8d. Lonely PSU "adjust" in Bootstrap
+### 8d. Lonely PSU "adjust" in Bootstrap ✅
 
-**Priority: Medium.** `lonely_psu="adjust"` works for analytical (TSL)
-variance but raises `NotImplementedError` for survey-aware bootstrap
-(2 raises in `bootstrap_utils.py`). Real survey data regularly has
-singleton strata. Users needing bootstrap inference with such data hit
-a wall.
+**Shipped in v2.8.4.** `lonely_psu="adjust"` now works with survey-aware
+bootstrap using Rust & Rao (1996) grand-mean centering.
 
 **Reference:** Rust, K.F. & Rao, J.N.K. (1996). "Variance Estimation
 for Complex Surveys Using Replication Techniques." Statistical Methods
 in Medical Research 5(3).
 
-### 8e. Survey Diagnostics and Utilities
+### 8e. Survey Diagnostics and Utilities ✅
 
-**Priority: Medium.** Small additions that signal maturity to survey
-statisticians:
-- **CV on estimates**: coefficient of variation (SE/estimate) on results
-  objects — trivial to add, used by federal agencies for publication
-  standards (NCHS requires CV < 30% for releasable estimates)
-- **Weight trimming**: `trim_weights(data, weight_col, upper=None,
-  quantile=None)` utility in `prep.py` for capping extreme weights
-- **ImputationDiD pretrends + survey**: pre-trends F-test currently
-  ignores survey variance (`NotImplementedError` at `imputation.py:240`)
+**Shipped in v2.8.4.**
+- **CV on estimates**: `coef_var` property on all results objects (SE/|estimate|).
+  Handles edge cases (SE=0, estimate=0).
+- **Weight trimming**: `trim_weights(data, weight_col, upper=None, lower=None,
+  quantile=None)` in `prep.py` for capping extreme survey weights.
+- **ImputationDiD pretrends + survey**: pre-trends F-test now survey-aware
+  using subpopulation approach for correct variance under complex designs.
 
-### 8f. Survey Compatibility Matrix
+### 8f. Survey Compatibility Matrix ✅
 
-**Priority: Medium.** Users discover survey support limits by hitting
-`NotImplementedError` at runtime. Add a table to the survey tutorial
-or `choosing_estimator.rst` showing which estimator × survey feature
-combinations are supported (weights, strata/PSU, FPC, replicate weights,
-bootstrap + survey).
+**Shipped.** Full compatibility table added to `docs/choosing_estimator.rst`
+(Survey Design Support section) showing estimator × survey feature
+combinations. Tutorial cross-references this table.
 
 ### 8g. Documentation-Only Items
 
-**Priority: Low.** No code changes required:
+**Partially addressed.** No code changes required. Remaining items
+deferred to the consolidated list below:
 - **Multi-stage design**: document that single-stage (strata + PSU)
   is sufficient for variance estimation per Lumley (2004) Section 2.2.
-  Don't implement multi-stage — it adds complexity without changing
-  results for DiD applications.
 - **Post-stratification / calibration**: document that `SurveyDesign`
   expects pre-calibrated weights. Point users to `samplics` or R's
-  `survey::calibrate()` for weight calibration. This is data prep,
-  not DiD estimation — out of scope.
+  `survey::calibrate()` for weight calibration.
+
+## Deferred Work (Consolidated)
+
+Unless marked documentation-only, the items below raise `NotImplementedError`
+when attempted, with a message describing the limitation. This is the single
+source of truth for remaining survey limitations.
+
+### Replicate Weights Not Supported
+
+| Estimator | Reason |
+|-----------|--------|
+| SyntheticDiD | No published theory on replicate weights + unit weight optimization |
+| TROP | No published theory on replicate weights + nuclear norm regularization |
+| BaconDecomposition | Diagnostic tool with no inference — replicate weights don't apply |
+
+### EfficientDiD Survey Limitations
 
-### Deferred
+| Limitation | Reason |
+|-----------|--------|
+| `covariates` + `survey_design` | DR nuisance path doesn't thread survey weights |
+| `cluster` + `survey_design` | Use `survey_design` with PSU/strata instead |
 
-| Estimator | Capability | Reason |
+### Bootstrap + Replicate Weights (Mutual Exclusion)
+
+Replicate weights and bootstrap are alternative variance estimation methods.
+Combining them raises `NotImplementedError`:
+
+| Estimator |
+|-----------|
+| CallawaySantAnna |
+| ContinuousDiD |
+| EfficientDiD |
+| StaggeredTripleDifference |
+
+### Other Limitations
+
+| Estimator | Limitation | Reason |
 |-----------|-----------|--------|
-| SyntheticDiD | Replicate weights | No published theory on replicate weights + unit weight optimization |
-| TROP | Replicate weights | No published theory on replicate weights + nuclear norm regularization |
-| BaconDecomposition | Replicate weights | Diagnostic tool with no inference — replicate weights don't apply |
-| EfficientDiD | Covariates + survey, cluster + survey, bootstrap + survey | Lower demand, newer estimator; 3 `NotImplementedError` paths |
+| SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` |
+| ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented |
+| ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented |
+| (all estimators) | Wild bootstrap + survey weights | Use analytical survey SEs or survey-aware multiplier bootstrap instead |
+
+### Documentation-Only (Phase 8g)
+
+- **Multi-stage design**: Document that single-stage (strata + PSU) is sufficient for variance estimation per Lumley (2004) Section 2.2. No code changes needed.
+- **Post-stratification / calibration**: Document that `SurveyDesign` expects pre-calibrated weights. Point users to `samplics` or R's `survey::calibrate()` for weight calibration.
diff --git a/docs/tutorials/16_survey_did.ipynb b/docs/tutorials/16_survey_did.ipynb
index 543e9862..aa1ce756 100644
--- a/docs/tutorials/16_survey_did.ipynb
+++ b/docs/tutorials/16_survey_did.ipynb
@@ -95,7 +95,7 @@
     "**About the normalization warning:** You'll see `pweight weights normalized to mean=1` throughout this tutorial. ",
     "Survey weights are inverse selection probabilities -- they rarely have mean=1 out of the box. ",
     "The library rescales them internally so that weighted estimators are numerically stable. ",
-    "This is standard practice (Lumley 2004, \u00a72.2). ",
+    "This is standard practice (Lumley 2004, §2.2). ",
     "The warning confirms rescaling occurred; it is not an error."
    ]
   },
@@ -356,8 +356,9 @@
     "- **JKn** (Jackknife delete-n): Stratified jackknife, drops one PSU per stratum.\n",
     "- **BRR** (Balanced Repeated Replication): Halve each stratum, reweight.\n",
     "- **Fay's BRR**: Modified BRR with a damping factor (0 < rho < 1).\n",
+    "- **SDR** (Successive Difference Replication): Used by ACS PUMS (80 replicate columns). Variance: V = 4/R × Σ(θᵣ − θ)².\n",
     "\n",
-    "`SurveyDesign` accepts replicate weights as an alternative to strata/PSU/FPC. They are **mutually exclusive** -- use one or the other."
+    "`SurveyDesign` accepts replicate weights as an alternative to strata/PSU/FPC. They are **mutually exclusive** -- use one or the other.\n"
    ]
   },
   {
@@ -527,13 +528,61 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "## 9. Which Estimators Support Survey Design?\n\n`diff-diff` supports survey design across all estimators, though the level of support varies:\n\n| Estimator | Weights | Strata/PSU/FPC (TSL) | Replicate Weights | Survey-Aware Bootstrap |\n|-----------|---------|---------------------|-------------------|------------------------|\n| **DifferenceInDifferences** | Full | Full | -- | -- |\n| **TwoWayFixedEffects** | Full | Full | -- | -- |\n| **MultiPeriodDiD** | Full | Full | -- | -- |\n| **CallawaySantAnna** | pweight only | Full | Full | Multiplier at PSU |\n| **TripleDifference** | pweight only | Full | Full (analytical) | -- |\n| **StaggeredTripleDifference** | pweight only | Full | Full | Multiplier at PSU |\n| **SunAbraham** | Full | Full | -- | Rao-Wu rescaled |\n| **StackedDiD** | pweight only | Full (pweight only) | -- | -- |\n| **ImputationDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **TwoStageDiD** | pweight only | Partial (no FPC) | -- | Multiplier at PSU |\n| **ContinuousDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **EfficientDiD** | Full | Full | Full (analytical) | Multiplier at PSU |\n| **SyntheticDiD** | pweight only | -- | -- | Rao-Wu rescaled |\n| **TROP** | pweight only | -- | -- | Rao-Wu rescaled |\n| **BaconDecomposition** | Diagnostic | Diagnostic | -- | -- |\n\n**Legend:**\n- **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance\n- **Full (pweight only)**: Full TSL support with strata/PSU/FPC, but only accepts `pweight` weight type (`fweight`/`aweight` rejected because Q-weight composition changes their semantics)\n- **Partial (no FPC)**: Weights + strata (for df) + PSU (for clustering); FPC raises `NotImplementedError`\n- **pweight only** (Weights column): Only `pweight` accepted; `fweight`/`aweight` raise an error\n- **pweight only** (TSL column): Sampling weights for point estimates; no strata/PSU/FPC design elements\n- **Diagnostic**: Weighted descriptive statistics only (no inference)\n- **--**: Not supported\n\n**Note:** `EfficientDiD` does not support `covariates` and `survey_design` simultaneously (the DR nuisance path does not yet thread survey weights). Use `covariates=None` with survey designs.\n\nFor full details, see `docs/survey-roadmap.md`."
+   "source": [
+    "## 9. Which Estimators Support Survey Design?\n",
+    "\n",
+    "All estimators accept `survey_design` in `fit()`. Support depth varies — see the\n",
+    "[Survey Design Support](https://diff-diff.readthedocs.io/en/latest/choosing_estimator.html#survey-design-support)\n",
+    "table in the Choosing an Estimator guide for the full compatibility matrix.\n",
+    "\n",
+    "**Key highlights:**\n",
+    "- **CallawaySantAnna** has the most complete support: all design elements, replicate weights, analytical and bootstrap SEs, panel and cross-section modes\n",
+    "- **12 of 15 estimators** support replicate weights (BRR, Fay, JK1, JKn, SDR)\n",
+    "- **SyntheticDiD** and **TROP** support strata/PSU/FPC via bootstrap only (not placebo/analytical)\n"
+   ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Summary\n\n**Key takeaways:**\n\n1. **Always specify the survey design** when working with survey data. Ignoring it leads to incorrect standard errors -- typically too small, leading to false positives.\n\n2. **`SurveyDesign`** encapsulates your survey's sampling structure in one object. Pass column names for weights, strata, PSU, and FPC.\n\n3. **Pass `survey_design` to `fit()`** -- the same API works across all estimators. No changes to your estimation code beyond adding one parameter.\n\n4. **CallawaySantAnna** has the most complete survey support: strata/PSU/FPC, replicate weights, analytical and bootstrap SEs, and both panel and cross-section modes.\n\n5. **Replicate weights** (JK1, JKn, BRR, Fay) are an alternative to strata/PSU/FPC when your survey provides them (e.g., MEPS, ACS PUMS). They are mutually exclusive with strata/PSU/FPC.\n\n6. **Use `subpopulation()`** instead of subsetting when estimating effects for a subgroup. Subsetting drops design information and biases variance estimates.\n\n7. **DEFF diagnostics** help you understand *how* the survey design affects precision. DEFF > 1 means clustering costs exceed stratification gains; DEFF < 1 means the design improves precision for that coefficient. DEFF > 2 indicates substantial clustering.\n\n8. **Repeated cross-sections** (`panel=False`) work with survey design for non-panel surveys like BRFSS, CPS, and ACS 1-year.\n\n**Quick reference:**\n\n| Parameter | When to use |\n|-----------|------------|\n| `weights` | Always -- specify the sampling weight column |\n| `strata` | When the survey uses stratified sampling |\n| `psu` | When multi-stage (clustered) sampling is used |\n| `fpc` | When the sampling fraction is non-negligible |\n| `replicate_weights` | When the survey provides replicate weights instead of strata/PSU/FPC |\n\n**References:**\n\n- Lumley, T. (2004). Analysis of Complex Survey Samples. *Journal of Statistical Software* 9(8).\n- Solon, G., Haider, S. J., & Wooldridge, J. M. (2015). What Are We Weighting For? *Journal of Human Resources* 50(2).\n- Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. *International Statistical Review* 51(3).\n- Rao, J. N. K., Wu, C. F. J., & Yue, K. (1992). Some Recent Work on Resampling Methods for Complex Surveys. *Survey Methodology* 18(2).\n- Callaway, B. & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. *Journal of Econometrics* 225(2).\n- Sant'Anna, P. H. C. & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. *Journal of Econometrics* 219(1)."
+    "## Summary\n",
+    "\n",
+    "**Key takeaways:**\n",
+    "\n",
+    "1. **Always specify the survey design** when working with survey data. Ignoring it leads to incorrect standard errors -- typically too small, leading to false positives.\n",
+    "\n",
+    "2. **`SurveyDesign`** encapsulates your survey's sampling structure in one object. Pass column names for weights, strata, PSU, and FPC.\n",
+    "\n",
+    "3. **Pass `survey_design` to `fit()`** -- the same API works across all estimators. No changes to your estimation code beyond adding one parameter.\n",
+    "\n",
+    "4. **CallawaySantAnna** has the most complete survey support: strata/PSU/FPC, replicate weights, analytical and bootstrap SEs, and both panel and cross-section modes.\n",
+    "\n",
+    "5. **Replicate weights** (JK1, JKn, BRR, Fay, SDR) are an alternative to strata/PSU/FPC when your survey provides them (e.g., MEPS, ACS PUMS). They are mutually exclusive with strata/PSU/FPC.\n",
+    "\n",
+    "6. **Use `subpopulation()`** instead of subsetting when estimating effects for a subgroup. Subsetting drops design information and biases variance estimates.\n",
+    "\n",
+    "7. **DEFF diagnostics** help you understand *how* the survey design affects precision. DEFF > 1 means clustering costs exceed stratification gains; DEFF < 1 means the design improves precision for that coefficient. DEFF > 2 indicates substantial clustering.\n",
+    "\n",
+    "8. **Repeated cross-sections** (`panel=False`) work with survey design for non-panel surveys like BRFSS, CPS, and ACS 1-year.\n",
+    "\n",
+    "**Quick reference:**\n",
+    "\n",
+    "| Parameter | When to use |\n",
+    "|-----------|------------|\n",
+    "| `weights` | Always -- specify the sampling weight column |\n",
+    "| `strata` | When the survey uses stratified sampling |\n",
+    "| `psu` | When multi-stage (clustered) sampling is used |\n",
+    "| `fpc` | When the sampling fraction is non-negligible |\n",
+    "| `replicate_weights` | When the survey provides replicate weights instead of strata/PSU/FPC |\n",
+    "\n",
+    "**References:**\n",
+    "\n",
+    "- Lumley, T. (2004). Analysis of Complex Survey Samples. *Journal of Statistical Software* 9(8).\n",
+    "- Solon, G., Haider, S. J., & Wooldridge, J. M. (2015). What Are We Weighting For? *Journal of Human Resources* 50(2).\n",
+    "- Binder, D. A. (1983). On the Variances of Asymptotically Normal Estimators from Complex Surveys. *International Statistical Review* 51(3).\n",
+    "- Rao, J. N. K., Wu, C. F. J., & Yue, K. (1992). Some Recent Work on Resampling Methods for Complex Surveys. *Survey Methodology* 18(2).\n",
+    "- Callaway, B. & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. *Journal of Econometrics* 225(2).\n",
+    "- Sant'Anna, P. H. C. & Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. *Journal of Econometrics* 219(1).\n"
    ]
   }
  ],
@@ -544,4 +593,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
\ No newline at end of file
+}

From 8677e76fb856179cbed5a5056fe938c3ac3205c6 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 09:27:58 -0400
Subject: [PATCH 02/12] Address P2 review findings: fix stale notes, deferred
 IF, missing estimators

- Revert efficient DRDID nuisance IF for reg+covariates to deferred status
  (code and REGISTRY.md still use conservative plug-in IF)
- Update phase summary table Notes to reflect resolved bootstrap+survey
  paths (SA, ContinuousDiD, EfficientDiD, ImputationDiD, TwoStageDiD, CS)
- Add SunAbraham, ImputationDiD, TwoStageDiD to bootstrap+replicate
  mutual exclusion table in consolidated deferred section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 docs/survey-roadmap.md | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index 008101f4..d329c84e 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -19,11 +19,11 @@ All phases (1-8) are implemented. Remaining deferred items are listed at the bot
 | Estimator | File | Survey Support | Notes |
 |-----------|------|----------------|-------|
 | StackedDiD | `stacked_did.py` | pweight only | Q-weights compose multiplicatively with survey weights; TSL vcov on composed weights; fweight/aweight rejected (composition changes weight semantics) |
-| SunAbraham | `sun_abraham.py` | Full | Survey weights in LinearRegression + weighted within-transform; bootstrap+survey deferred |
+| SunAbraham | `sun_abraham.py` | Full | Survey weights in LinearRegression + weighted within-transform; bootstrap via Rao-Wu rescaled (Phase 6) |
 | BaconDecomposition | `bacon.py` | Diagnostic | Weighted cell means, weighted within-transform, weighted group shares; no inference (diagnostic only) |
 | TripleDifference | `triple_diff.py` | Full | Regression, IPW, and DR methods with weighted OLS/logit + TSL on influence functions |
-| ContinuousDiD | `continuous_did.py` | Analytical | Weighted B-spline OLS + TSL on influence functions; bootstrap+survey deferred |
-| EfficientDiD | `efficient_did.py` | Analytical | Weighted means/covariances in Omega* + TSL on EIF scores; bootstrap+survey deferred |
+| ContinuousDiD | `continuous_did.py` | Analytical | Weighted B-spline OLS + TSL on influence functions; bootstrap via multiplier at PSU (Phase 6) |
+| EfficientDiD | `efficient_did.py` | Analytical | Weighted means/covariances in Omega* + TSL on EIF scores; bootstrap via multiplier at PSU (Phase 6) |
 
 ### Phase 3 Deferred Work
 
@@ -44,9 +44,9 @@ message pointing to the planned phase or describing the limitation.
 
 | Estimator | File | Survey Support | Notes |
 |-----------|------|----------------|-------|
-| ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap+survey deferred |
-| TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap+survey deferred |
-| CallawaySantAnna | `staggered.py` | Full | Full SurveyDesign (strata/PSU/FPC/replicate weights); reg supports covariates, IPW/DR no-covariate only; survey-weighted WIF in aggregation; replicate IF variance for analytical SEs |
+| ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap via multiplier at PSU (Phase 6) |
+| TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap via multiplier at PSU (Phase 6) |
+| CallawaySantAnna | `staggered.py` | Full | Full SurveyDesign (strata/PSU/FPC/replicate weights); reg supports covariates, IPW/DR support covariates (Phase 7a); survey-weighted WIF in aggregation; replicate IF variance for analytical SEs |
 
 **Infrastructure**: Weighted `solve_logit()` added to `linalg.py` — survey weights
 enter the IRLS working weights as `w_survey * mu * (1 - mu)`. This also unblocked
@@ -61,7 +61,7 @@ TripleDifference IPW/DR from Phase 3 deferred work.
 | CallawaySantAnna | Bootstrap + survey | **Resolved** (Phase 6, multiplier at PSU) |
 | CallawaySantAnna | Strata/PSU/FPC in SurveyDesign | **Resolved** (Phase 6, `compute_survey_if_variance()`) |
 | CallawaySantAnna | Covariates + IPW/DR + survey | **Resolved** (Phase 7a, DRDID nuisance IF corrections) |
-| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | **Resolved** (Phase 7a) |
+| CallawaySantAnna | Efficient DRDID nuisance IF for reg+covariates | Deferred — code uses conservative plug-in IF (see REGISTRY.md) |
 
 ## Implemented (Phase 5): SyntheticDiD + TROP Survey Support
 
@@ -285,6 +285,12 @@ survey limitations.
 | TROP | No published theory on replicate weights + nuclear norm regularization |
 | BaconDecomposition | Diagnostic tool with no inference — replicate weights don't apply |
 
+### CallawaySantAnna Survey Limitations
+
+| Limitation | Reason |
+|-----------|--------|
+| Efficient DRDID nuisance IF for `reg`+covariates | Code uses conservative plug-in IF; efficient correction deferred (see REGISTRY.md) |
+
 ### EfficientDiD Survey Limitations
 
 | Limitation | Reason |
@@ -302,7 +308,10 @@ Combining them raises `NotImplementedError`:
 | CallawaySantAnna |
 | ContinuousDiD |
 | EfficientDiD |
+| ImputationDiD |
 | StaggeredTripleDifference |
+| SunAbraham |
+| TwoStageDiD |
 
 ### Other Limitations
 

From 38f087baeaa65679ca10eadd45315dd10c50890a Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 09:43:15 -0400
Subject: [PATCH 03/12] Fix P2: clarify error types in consolidated deferred
 section

Some bootstrap+replicate exclusions raise ValueError (not
NotImplementedError). Update wording to "raise an error" to
accurately reflect the runtime contract.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 docs/survey-roadmap.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index d329c84e..f7f18313 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -273,9 +273,10 @@ deferred to the consolidated list below:
 
 ## Deferred Work (Consolidated)
 
-All items below raise `NotImplementedError` when attempted, with a message
-describing the limitation. This is the single source of truth for remaining
-survey limitations.
+All items below raise an error when attempted (`NotImplementedError` or
+`ValueError` depending on the estimator), with a message describing the
+limitation. This is the single source of truth for remaining survey
+limitations.
 
 ### Replicate Weights Not Supported
 
@@ -301,7 +302,7 @@ survey limitations.
 ### Bootstrap + Replicate Weights (Mutual Exclusion)
 
 Replicate weights and bootstrap are alternative variance estimation methods.
-Combining them raises `NotImplementedError`:
+Combining them raises `NotImplementedError` or `ValueError`:
 
 | Estimator |
 |-----------|

From 7180d94d3c8748a91218affb426b4beef110bde4 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 09:54:39 -0400
Subject: [PATCH 04/12] Address P2/P3: REGISTRY replicate matrix, ROADMAP
 qualifier, soften wording

- Update REGISTRY.md replicate-weight support matrix: CS now supports
  covariates with replicate weights (IF-based path is covariate-agnostic,
  shipped in Phase 7a)
- Qualify ROADMAP.md: "replicate weights supported for 12 of 15" instead
  of "across all estimators"
- Soften consolidated deferred section from "single source of truth" to
  "summary of major remaining limitations" with TODO.md cross-reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 ROADMAP.md                   | 2 +-
 docs/methodology/REGISTRY.md | 7 ++++---
 docs/survey-roadmap.md       | 5 +++--
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 8f8fce1c..582de611 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -16,7 +16,7 @@ diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with
 - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
 - **Study design**: Power analysis tools
 - **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs
-- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn/SDR), Taylor linearization, DEFF diagnostics, subpopulation analysis — integrated across all estimators (see [survey-roadmap.md](docs/survey-roadmap.md))
+- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn/SDR), Taylor linearization, DEFF diagnostics, subpopulation analysis — integrated across all estimators (replicate weights supported for 12 of 15; see [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the compatibility matrix)
 - **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks)
 
 ---
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 554acd28..04f1fb50 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2307,9 +2307,10 @@ variance from the distribution of replicate estimates.
   design structure is fixed and dropped replicates contribute zero to the
   sum without changing the scale. Survey df uses `n_valid - 1` for
   t-based inference.
-- **Note:** Replicate-weight support matrix:
-  - **Supported**: CallawaySantAnna (reg/ipw/dr without covariates, no
-    bootstrap), ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap),
+- **Note:** Replicate-weight support matrix (12 of 15 estimators):
+  - **Supported**: CallawaySantAnna (reg/ipw/dr with or without covariates,
+    no bootstrap; IF-based replicate variance is covariate-agnostic),
+    ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap),
     TripleDifference (all methods), LinearRegression (OLS path),
     DifferenceInDifferences (no-absorb via LinearRegression dispatch,
     absorb via estimator-level refit), MultiPeriodDiD (no-absorb via
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index f7f18313..b25b4f74 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -275,8 +275,9 @@ deferred to the consolidated list below:
 
 All items below raise an error when attempted (`NotImplementedError` or
 `ValueError` depending on the estimator), with a message describing the
-limitation. This is the single source of truth for remaining survey
-limitations.
+limitation. This is a summary of the major remaining survey limitations.
+See also `TODO.md` for general tech debt items (e.g., multi-absorb +
+survey weights).
 
 ### Replicate Weights Not Supported
 

From 1bb37b17e8115e63b3f87755ef2c21774ea7a53d Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 10:02:49 -0400
Subject: [PATCH 05/12] Address P2/P3: fix REGISTRY replicate list, update
 stale error message

- Replace LinearRegression (internal helper) with StaggeredTripleDifference
  (public estimator) in REGISTRY.md replicate-weight support matrix
- Update wild bootstrap + survey error message to remove stale "planned
  Phase 5 support" reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 diff_diff/survey.py          | 4 ++--
 docs/methodology/REGISTRY.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/diff_diff/survey.py b/diff_diff/survey.py
index c717db39..1ad16dd3 100644
--- a/diff_diff/survey.py
+++ b/diff_diff/survey.py
@@ -1087,8 +1087,8 @@ def _resolve_survey_for_fit(survey_design, data, inference_mode="analytical"):
     if inference_mode == "wild_bootstrap":
         raise NotImplementedError(
             "Wild bootstrap with survey weights is not yet supported. "
-            "Use inference='analytical' with survey_design, or see "
-            "docs/survey-roadmap.md for planned Phase 5 support."
+            "Use inference='analytical' with survey_design, or use "
+            "survey-aware multiplier bootstrap (n_bootstrap > 0)."
         )
 
     resolved = survey_design.resolve(data)
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 04f1fb50..77d85feb 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -2307,11 +2307,11 @@ variance from the distribution of replicate estimates.
   design structure is fixed and dropped replicates contribute zero to the
   sum without changing the scale. Survey df uses `n_valid - 1` for
   t-based inference.
-- **Note:** Replicate-weight support matrix (12 of 15 estimators):
+- **Note:** Replicate-weight support matrix (12 of 15 public estimators):
   - **Supported**: CallawaySantAnna (reg/ipw/dr with or without covariates,
     no bootstrap; IF-based replicate variance is covariate-agnostic),
     ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap),
-    TripleDifference (all methods), LinearRegression (OLS path),
+    TripleDifference (all methods), StaggeredTripleDifference (IF-based),
     DifferenceInDifferences (no-absorb via LinearRegression dispatch,
     absorb via estimator-level refit), MultiPeriodDiD (no-absorb via
     `compute_replicate_vcov`, absorb via estimator-level refit),

From 51bff5551cab2f38dc9c0e69ba0a8a2760ca79e3 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 10:11:03 -0400
Subject: [PATCH 06/12] Address P2/P3: tighten ROADMAP wording, add SDR to
 Phase 6, fix error msg

- ROADMAP.md: restructure to say "survey-aware inference across all 15
  estimators; replicate weights supported for 12 of 15"
- survey-roadmap.md Phase 6: add SDR to replicate method list
- survey.py: make wild bootstrap error message generic (not all estimators
  expose n_bootstrap)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 ROADMAP.md             | 2 +-
 diff_diff/survey.py    | 3 +--
 docs/survey-roadmap.md | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 582de611..ebfd39e0 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -16,7 +16,7 @@ diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with
 - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
 - **Study design**: Power analysis tools
 - **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs
-- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, replicate weights (BRR/Fay/JK1/JKn/SDR), Taylor linearization, DEFF diagnostics, subpopulation analysis — integrated across all estimators (replicate weights supported for 12 of 15; see [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the compatibility matrix)
+- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, Taylor linearization, DEFF diagnostics, subpopulation analysis — survey-aware inference across all 15 estimators; replicate weights (BRR/Fay/JK1/JKn/SDR) supported for 12 of 15 (see [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the compatibility matrix)
 - **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks)
 
 ---
diff --git a/diff_diff/survey.py b/diff_diff/survey.py
index 1ad16dd3..14ae44d0 100644
--- a/diff_diff/survey.py
+++ b/diff_diff/survey.py
@@ -1087,8 +1087,7 @@ def _resolve_survey_for_fit(survey_design, data, inference_mode="analytical"):
     if inference_mode == "wild_bootstrap":
         raise NotImplementedError(
             "Wild bootstrap with survey weights is not yet supported. "
-            "Use inference='analytical' with survey_design, or use "
-            "survey-aware multiplier bootstrap (n_bootstrap > 0)."
+            "Use analytical survey inference (the default) instead."
         )
 
     resolved = survey_design.resolve(data)
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index b25b4f74..ea877689 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -92,7 +92,7 @@ Survey-aware bootstrap for all 8 bootstrap-using estimators. Two strategies:
 
 ### Replicate Weight Variance ✅ (2026-03-26)
 Re-run WLS for each replicate weight column, compute variance from distribution
-of estimates. Supports BRR, Fay's BRR, JK1, JKn methods.
+of estimates. Supports BRR, Fay's BRR, JK1, JKn, and SDR methods.
 JKn requires explicit `replicate_strata` (per-replicate stratum assignment).
 - `replicate_weights`, `replicate_method`, `fay_rho` fields on SurveyDesign
 - `compute_replicate_vcov()` for OLS-based estimators (re-runs WLS per replicate)

From 815e0e44adae7523eb25b3b00d1a62d11279457c Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 10:17:04 -0400
Subject: [PATCH 07/12] Address P2/P3: precise ROADMAP survey wording, fix
 header and wild bootstrap

- ROADMAP.md: distinguish survey weights (all 15) from design-based
  variance (varies by estimator), carve out BaconDecomposition
- survey-roadmap.md: header says "Phases 1-8f implemented" (8g partial)
  instead of "All phases implemented"
- Deferred work: scope wild bootstrap row to DiD/TWFE/MultiPeriod
  (the estimators that expose inference='wild_bootstrap')

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 ROADMAP.md             | 2 +-
 docs/survey-roadmap.md | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index ebfd39e0..2bb2a752 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -16,7 +16,7 @@ diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with
 - **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
 - **Study design**: Power analysis tools
 - **Data utilities**: Real-world datasets (Card-Krueger, Castle Doctrine, Divorce Laws, MPDTA), DGP functions for all supported designs
-- **Survey support**: Full `SurveyDesign` with strata, PSU, FPC, weight types, Taylor linearization, DEFF diagnostics, subpopulation analysis — survey-aware inference across all 15 estimators; replicate weights (BRR/Fay/JK1/JKn/SDR) supported for 12 of 15 (see [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the compatibility matrix)
+- **Survey support**: `SurveyDesign` with strata, PSU, FPC, weight types, DEFF diagnostics, subpopulation analysis. All 15 estimators accept survey weights; design-based variance estimation (TSL, replicate weights, survey-aware bootstrap) varies by estimator. Replicate weights (BRR/Fay/JK1/JKn/SDR) supported for 12 of 15; `BaconDecomposition` is diagnostic-only. See [choosing_estimator.rst](docs/choosing_estimator.rst#survey-design-support) for the full compatibility matrix.
 - **Performance**: Optional Rust backend for accelerated computation; faster than R at scale (see [CHANGELOG.md](CHANGELOG.md) for benchmarks)
 
 ---
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index ea877689..050bac58 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -1,7 +1,8 @@
 # Survey Data Support Roadmap
 
 This document captures the survey data support roadmap for diff-diff.
-All phases (1-8) are implemented. Remaining deferred items are listed at the bottom.
+Phases 1-8f are implemented. Phase 8g (documentation-only items) is partially
+addressed. Remaining deferred items are listed at the bottom.
 
 ## Implemented (Phases 1-2)
 
@@ -322,7 +323,7 @@ Combining them raises `NotImplementedError` or `ValueError`:
 | SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` |
 | ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented |
 | ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented |
-| (all estimators) | Wild bootstrap + survey weights | Use analytical survey SEs or survey-aware multiplier bootstrap instead |
+| DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Use analytical survey inference (the default) instead |
 
 ### Documentation-Only (Phase 8g)
 

From 1b174a63ebbdf14c00633c0842e8841800b56d56 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 10:27:27 -0400
Subject: [PATCH 08/12] Address P2/P3: fix ROADMAP opening sentence, move Phase
 8g out of deferred
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- ROADMAP.md line 11: remove "Taylor linearization, replicate weights
  integrated across all estimators" — now says "all estimators accept
  survey weights, with design-based variance varying by estimator"
- survey-roadmap.md: move Phase 8g documentation tasks into their own
  section outside the consolidated runtime-limitations block

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 ROADMAP.md             | 2 +-
 docs/survey-roadmap.md | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 2bb2a752..377f8dd8 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -8,7 +8,7 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).
 
 ## Current Status
 
-diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — design-based variance estimation (Taylor linearization, replicate weights) integrated across all estimators. No R or Python package offers this combination:
+diff-diff v2.8.4 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis, plus **unique survey support** — all estimators accept survey weights, with design-based variance estimation varying by estimator. No R or Python package offers this combination:
 
 - **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), Staggered Triple Difference (Ortiz-Villavicencio & Sant'Anna 2025), TROP, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing et al. 2024), Continuous DiD (Callaway, Goodman-Bacon & Sant'Anna 2024)
 - **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index 050bac58..c235d3f4 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -325,7 +325,11 @@ Combining them raises `NotImplementedError` or `ValueError`:
 | ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented |
 | DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Use analytical survey inference (the default) instead |
 
-### Documentation-Only (Phase 8g)
+---
+
+## Remaining Documentation Tasks (Phase 8g)
+
+These are documentation improvements, not runtime limitations:
 
 - **Multi-stage design**: Document that single-stage (strata + PSU) is sufficient for variance estimation per Lumley (2004) Section 2.2. No code changes needed.
 - **Post-stratification / calibration**: Document that `SurveyDesign` expects pre-calibrated weights. Point users to `samplics` or R's `survey::calibrate()` for weight calibration.

From 92df37c1c1234227a2f0b39002bbe6592ae7e1ac Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 10:36:40 -0400
Subject: [PATCH 09/12] Address P3: separate documented deviations from runtime
 limitations

Move CallawaySantAnna conservative plug-in IF entry into its own
"Documented Deviations" subsection (supported path, not an error).
Runtime limitations intro now accurately describes only error-raising
items.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 docs/survey-roadmap.md | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index c235d3f4..1d21f5d8 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -274,11 +274,21 @@ deferred to the consolidated list below:
 
 ## Deferred Work (Consolidated)
 
+### Documented Deviations
+
+These are supported paths that use a conservative or simplified approach
+rather than the theoretically optimal one. They do not raise errors.
+
+| Estimator | Deviation | Details |
+|-----------|-----------|---------|
+| CallawaySantAnna | `reg`+covariates uses conservative plug-in IF | Efficient DRDID nuisance IF correction deferred; see REGISTRY.md |
+
+### Runtime Limitations
+
 All items below raise an error when attempted (`NotImplementedError` or
 `ValueError` depending on the estimator), with a message describing the
-limitation. This is a summary of the major remaining survey limitations.
-See also `TODO.md` for general tech debt items (e.g., multi-absorb +
-survey weights).
+limitation. See also `TODO.md` for general tech debt items (e.g.,
+multi-absorb + survey weights).
 
 ### Replicate Weights Not Supported
 
@@ -288,12 +298,6 @@ survey weights).
 | TROP | No published theory on replicate weights + nuclear norm regularization |
 | BaconDecomposition | Diagnostic tool with no inference — replicate weights don't apply |
 
-### CallawaySantAnna Survey Limitations
-
-| Limitation | Reason |
-|-----------|--------|
-| Efficient DRDID nuisance IF for `reg`+covariates | Code uses conservative plug-in IF; efficient correction deferred (see REGISTRY.md) |
-
 ### EfficientDiD Survey Limitations
 
 | Limitation | Reason |

From f54dea835ccebdcbae7642e7f59369e13d458317 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 10:46:45 -0400
Subject: [PATCH 10/12] Address P3: MultiPeriodDiD wild bootstrap warns, not
 errors

Split wild bootstrap row: DiD/TWFE raise NotImplementedError,
MultiPeriodDiD warns and falls back to analytical inference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 docs/survey-roadmap.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index 1d21f5d8..9c2419f3 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -327,7 +327,8 @@ Combining them raises `NotImplementedError` or `ValueError`:
 | SyntheticDiD | `variance_method='placebo'` + strata/PSU/FPC | Use `variance_method='bootstrap'` |
 | ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented |
 | ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented |
-| DifferenceInDifferences, TwoWayFixedEffects, MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Use analytical survey inference (the default) instead |
+| DifferenceInDifferences, TwoWayFixedEffects | `inference='wild_bootstrap'` + `survey_design` | Raises `NotImplementedError`; use analytical survey inference (the default) instead |
+| MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Warns and falls back to analytical inference (no error raised) |
 
 ---
 

From 2168e2c72d275a0845db3106651c2f7f3a6d4703 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 10:59:27 -0400
Subject: [PATCH 11/12] Address P3: move MultiPeriodDiD fallback out of error
 section

MultiPeriodDiD wild bootstrap warns and falls back rather than raising.
Move it into its own "Warning/Fallback Behaviors" subsection outside
the runtime-error block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 docs/survey-roadmap.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index 9c2419f3..2b1a193f 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -328,7 +328,14 @@ Combining them raises `NotImplementedError` or `ValueError`:
 | ImputationDiD | `pretrends=True` + replicate weights | Per-replicate lead regression refits not implemented |
 | ImputationDiD | `pretrend_test()` + replicate weights | Per-replicate Equation 9 refits not implemented |
 | DifferenceInDifferences, TwoWayFixedEffects | `inference='wild_bootstrap'` + `survey_design` | Raises `NotImplementedError`; use analytical survey inference (the default) instead |
-| MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Warns and falls back to analytical inference (no error raised) |
+
+### Warning/Fallback Behaviors
+
+These do not raise errors; they emit a warning and change behavior:
+
+| Estimator | Limitation | Behavior |
+|-----------|-----------|----------|
+| MultiPeriodDiD | `inference='wild_bootstrap'` + `survey_design` | Warns and falls back to analytical inference |
 
 ---
 

From 2fee9b419ed16626e71fdae9c327a11347d2969b Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 4 Apr 2026 11:15:42 -0400
Subject: [PATCH 12/12] Address P3: update 7c blurb to reflect tutorial
 cross-reference

Tutorial Section 9 now links to the compatibility matrix rather than
containing the table itself.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 docs/survey-roadmap.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index 2b1a193f..56311734 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -168,8 +168,8 @@ framed around a state-level preventive care program evaluated with a
 stratified health survey (ACS/BRFSS-like). Covers: why survey design
 matters, SurveyDesign setup, basic DiD with survey, staggered DiD
 (CallawaySantAnna) with survey, replicate weights (JK1), subpopulation
-analysis, DEFF diagnostics, repeated cross-sections, and estimator
-support reference table. Uses `generate_survey_did_data()` DGP function
+analysis, DEFF diagnostics, repeated cross-sections, and a link to the
+compatibility matrix in `choosing_estimator.rst`. Uses `generate_survey_did_data()` DGP function
 added to `diff_diff.prep`.
 
 ### 7d. HonestDiD with Survey Variance ✅