igerber · Mar 26, 2026 · Mar 27, 2026
diff --git a/TODO.md b/TODO.md
@@ -52,15 +52,16 @@ Deferred items from PR reviews that were not addressed before merge.
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) |
 | EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
+| Replicate-weight survey df — **Resolved**. `n_replicates` updated to valid count after invalid replicates are dropped, so `df_survey = n_valid - 1`. | `survey.py` | #238 | Resolved |
 | CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
 | CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works (has WLS nuisance IF correction). | `staggered.py` | #233 | Medium |
 | SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved |
-| EfficientDiD hausman_pretest() clustered covariance uses stale `n_cl` after filtering non-finite EIF rows — should recompute effective cluster count and remap indices after `row_finite` filtering | `efficient_did.py` | #230 | Medium |
+| EfficientDiD hausman_pretest() clustered covariance stale `n_cl` — **Resolved**. Recompute `n_cl` and remap indices after `row_finite` filtering via `np.unique(return_inverse=True)`. | `efficient_did.py` | #230 | Resolved |
 | EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
 | TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low |
-| ContinuousDiD event-study aggregation does not filter by `anticipation` — uses all (g,t) cells instead of anticipation-filtered subset; pre-existing in both survey and non-survey paths | `continuous_did.py` | #226 | Medium |
+| ContinuousDiD event-study aggregation anticipation filter — **Resolved**. `_aggregate_event_study()` now filters `e < -anticipation` when `anticipation > 0`, matching CallawaySantAnna behavior. Bootstrap paths also filtered. | `continuous_did.py` | #226 | Resolved |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
-| Duplicated survey metadata summary formatting across 6 results classes — extract shared `_format_survey_metadata(sm, width)` helper to reduce maintenance burden as more estimators gain survey support in Phases 4-5 | `results.py`, `stacked_did_results.py`, `sun_abraham.py`, `bacon.py`, `triple_diff.py`, `continuous_did_results.py`, `efficient_did_results.py` | #226 | Low |
+| Survey metadata formatting dedup — **Resolved**. Extracted `_format_survey_block()` helper in `results.py`, replaced 13 occurrences across 11 files. | `results.py` + 10 results files | — | Resolved |
 | TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` | — | Low |
 
 #### Performance
@@ -78,7 +79,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
 | ~376 `duplicate object description` Sphinx warnings — caused by autodoc `:members:` on dataclass attributes within manual API pages (not from autosummary stubs); fix requires restructuring `docs/api/*.rst` pages to avoid documenting the same attribute via both `:members:` and inline `autosummary` tables | `docs/api/*.rst` | — | Low |
 | Plotly renderers silently ignore styling kwargs (marker, markersize, linewidth, capsize, ci_linewidth) that the matplotlib backend honors; thread them through or reject when `backend="plotly"` | `visualization/_event_study.py`, `_diagnostic.py`, `_power.py` | #222 | Medium |
-| Survey bootstrap test coverage: add FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests | `tests/test_survey_phase*.py` | #237 | Medium |
+| Survey bootstrap test coverage — **Resolved**. Added FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests. | `tests/test_survey_phase*.py` | — | Resolved |
 
 ---
 

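The resolved `hausman_pretest()` entry in the TODO.md diff recomputes the cluster count after dropping non-finite EIF rows and remaps cluster indices with `np.unique(return_inverse=True)`. Below is a minimal sketch of that pattern. It assumes the EIF is an `(n, k)` array and the cluster labels are a length-`n` array. `filtered_cluster_sums` is a hypothetical helper, not the code in `efficient_did.py`.

```python
import numpy as np


def filtered_cluster_sums(eif, cluster_ids):
    """Drop non-finite EIF rows, then recompute n_cl and 0-based cluster indices.

    Hypothetical helper for illustration; not diff_diff's internal code.
    """
    eif = np.asarray(eif, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    # Keep only rows where every EIF column is finite
    row_finite = np.all(np.isfinite(eif), axis=1)
    eif_kept = eif[row_finite]
    # Re-derive clusters from the surviving rows, so a cluster whose rows
    # were all dropped no longer counts toward n_cl
    uniq, cl_idx = np.unique(cluster_ids[row_finite], return_inverse=True)
    n_cl = len(uniq)
    # Sum EIF contributions within each surviving cluster
    cluster_sums = np.zeros((n_cl, eif_kept.shape[1]))
    np.add.at(cluster_sums, cl_idx, eif_kept)
    return n_cl, cl_idx, cluster_sums
```

If cluster `b` loses its only row, counting unique labels on the unfiltered data gives a stale `n_cl`. That leaves an all-zero cluster row and inflates any `G / (G - 1)` small-sample factor.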
diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
@@ -95,8 +95,10 @@
     SyntheticDiDResults,
 )
 from diff_diff.survey import (
+    DEFFDiagnostics,
     SurveyDesign,
     SurveyMetadata,
+    compute_deff_diagnostics,
 )
 from diff_diff.staggered import (
     CallawaySantAnna,
@@ -327,6 +329,8 @@
     # Survey support
     "SurveyDesign",
     "SurveyMetadata",
+    "DEFFDiagnostics",
+    "compute_deff_diagnostics",
     # Rust backend
     "HAS_RUST_BACKEND",
     # Linear algebra helpers

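The `__init__.py` diff exports `DEFFDiagnostics` and `compute_deff_diagnostics`, but it does not show their signatures or formulas. For orientation, here is a sketch of Kish's weighting design effect, which links the "Effective sample size" and "Design effect (DEFF)" lines printed in the survey summary block. Whether `compute_deff_diagnostics` uses this approximation is an assumption; it may instead compare survey and SRS variances per estimate. `kish_design_effect` is a hypothetical helper.

```python
import numpy as np


def kish_design_effect(weights):
    """Kish's weighting design effect and effective sample size.

    Hypothetical helper; not diff_diff's compute_deff_diagnostics.
    """
    w = np.asarray(weights, dtype=float)
    sum_w = w.sum()
    sum_w2 = np.sum(w ** 2)
    if sum_w2 <= 0:
        raise ValueError("weights must have positive mass")
    # Effective n = (sum w)^2 / sum w^2; equals n only when all weights are equal
    effective_n = sum_w ** 2 / sum_w2
    # DEFF = n / effective_n = n * sum(w^2) / (sum w)^2
    deff = w.size / effective_n
    return effective_n, deff
```

For example, weights `[1, 1, 2, 4]` give an effective sample size of about 2.91 from 4 rows and a DEFF of 1.375.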
diff --git a/diff_diff/bacon.py b/diff_diff/bacon.py
@@ -17,6 +17,7 @@
 import numpy as np
 import pandas as pd
 
+from diff_diff.results import _format_survey_block
 from diff_diff.utils import within_transform as _within_transform_util
 
 
@@ -144,23 +145,7 @@ def summary(self) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "-" * 85,
-                    "Survey Design".center(85),
-                    "-" * 85,
-                    f"{'Weight type:':<35} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<35} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<35} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<35} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<35} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<35} {sm.df_survey:>10}")
-            lines.extend(["-" * 85, ""])
+            lines.extend(_format_survey_block(sm, 85))
 
         lines.extend(
             [
@@ -849,13 +834,21 @@ def _compute_treated_vs_never(
         never_post_mask = never_mask & df[time].isin(post_periods)
 
         # Guard against empty cells (unbalanced/filtered panels)
+        # Also check positive weight mass for survey/subpopulation designs
         if not (
             np.any(treated_pre_mask)
             and np.any(treated_post_mask)
             and np.any(never_pre_mask)
             and np.any(never_post_mask)
         ):
             return None
+        if (
+            np.sum(w[treated_pre_mask]) <= 0
+            or np.sum(w[treated_post_mask]) <= 0
+            or np.sum(w[never_pre_mask]) <= 0
+            or np.sum(w[never_post_mask]) <= 0
+        ):
+            return None
 
         treated_pre = np.average(y[treated_pre_mask], weights=w[treated_pre_mask])
         treated_post = np.average(y[treated_post_mask], weights=w[treated_post_mask])
@@ -966,14 +959,21 @@ def _compute_timing_comparison(
         control_pre_mask = control_mask & df[time].isin(pre_periods)
         control_post_mask = control_mask & df[time].isin(post_periods)
 
-        # Skip if any cell is empty
+        # Skip if any cell is empty or has zero effective weight
         if (
             treated_pre_mask.sum() == 0
             or treated_post_mask.sum() == 0
             or control_pre_mask.sum() == 0
             or control_post_mask.sum() == 0
         ):
             return None
+        if (
+            np.sum(w[treated_pre_mask]) <= 0
+            or np.sum(w[treated_post_mask]) <= 0
+            or np.sum(w[control_pre_mask]) <= 0
+            or np.sum(w[control_post_mask]) <= 0
+        ):
+            return None
 
         treated_pre = np.average(y[treated_pre_mask], weights=w[treated_pre_mask])
         treated_post = np.average(y[treated_post_mask], weights=w[treated_post_mask])