diff --git a/TODO.md b/TODO.md
index b2dd16c7..f9f3dc16 100644
--- a/TODO.md
+++ b/TODO.md
@@ -52,15 +52,16 @@ Deferred items from PR reviews that were not addressed before merge.
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) |
 | EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
+| Replicate-weight survey df — **Resolved**. `df_survey = rank(replicate_weights) - 1` matching R's `survey::degf()`. For IF paths, `n_valid - 1` when dropped replicates reduce effective count. | `survey.py` | #238 | Resolved |
 | CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
 | CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works (has WLS nuisance IF correction). | `staggered.py` | #233 | Medium |
 | SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved |
-| EfficientDiD hausman_pretest() clustered covariance uses stale `n_cl` after filtering non-finite EIF rows — should recompute effective cluster count and remap indices after `row_finite` filtering | `efficient_did.py` | #230 | Medium |
+| EfficientDiD hausman_pretest() clustered covariance stale `n_cl` — **Resolved**. Recompute `n_cl` and remap indices after `row_finite` filtering via `np.unique(return_inverse=True)`. | `efficient_did.py` | #230 | Resolved |
 | EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
 | TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low |
-| ContinuousDiD event-study aggregation does not filter by `anticipation` — uses all (g,t) cells instead of anticipation-filtered subset; pre-existing in both survey and non-survey paths | `continuous_did.py` | #226 | Medium |
+| ContinuousDiD event-study aggregation anticipation filter — **Resolved**. `_aggregate_event_study()` now filters `e < -anticipation` when `anticipation > 0`, matching CallawaySantAnna behavior. Bootstrap paths also filtered. | `continuous_did.py` | #226 | Resolved |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
-| Duplicated survey metadata summary formatting across 6 results classes — extract shared `_format_survey_metadata(sm, width)` helper to reduce maintenance burden as more estimators gain survey support in Phases 4-5 | `results.py`, `stacked_did_results.py`, `sun_abraham.py`, `bacon.py`, `triple_diff.py`, `continuous_did_results.py`, `efficient_did_results.py` | #226 | Low |
+| Survey metadata formatting dedup — **Resolved**. Extracted `_format_survey_block()` helper in `results.py`, replaced 13 occurrences across 11 files. | `results.py` + 10 results files | — | Resolved |
 | TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` | — | Low |
 
 #### Performance
@@ -78,7 +79,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
 | ~376 `duplicate object description` Sphinx warnings — caused by autodoc `:members:` on dataclass attributes within manual API pages (not from autosummary stubs); fix requires restructuring `docs/api/*.rst` pages to avoid documenting the same attribute via both `:members:` and inline `autosummary` tables | `docs/api/*.rst` | — | Low |
 | Plotly renderers silently ignore styling kwargs (marker, markersize, linewidth, capsize, ci_linewidth) that the matplotlib backend honors; thread them through or reject when `backend="plotly"` | `visualization/_event_study.py`, `_diagnostic.py`, `_power.py` | #222 | Medium |
-| Survey bootstrap test coverage: add FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests | `tests/test_survey_phase*.py` | #237 | Medium |
+| Survey bootstrap test coverage — **Resolved**. Added FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests. | `tests/test_survey_phase*.py` | — | Resolved |
 
 ---
diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
index b3188440..1e2eaeb8 100644
--- a/diff_diff/__init__.py
+++ b/diff_diff/__init__.py
@@ -95,8 +95,10 @@
     SyntheticDiDResults,
 )
 from diff_diff.survey import (
+    DEFFDiagnostics,
     SurveyDesign,
     SurveyMetadata,
+    compute_deff_diagnostics,
 )
 from diff_diff.staggered import (
     CallawaySantAnna,
@@ -327,6 +329,8 @@
     # Survey support
     "SurveyDesign",
     "SurveyMetadata",
+    "DEFFDiagnostics",
+    "compute_deff_diagnostics",
     # Rust backend
     "HAS_RUST_BACKEND",
     # Linear algebra helpers
diff --git a/diff_diff/bacon.py b/diff_diff/bacon.py
index 67fe794d..7ffbdf88 100644
--- a/diff_diff/bacon.py
+++ b/diff_diff/bacon.py
@@ -17,6 +17,7 @@
 import numpy as np
 import pandas as pd
 
+from diff_diff.results import _format_survey_block
 from diff_diff.utils import within_transform as _within_transform_util
 
@@ -144,23 +145,7 @@ def summary(self) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "-" * 85,
-                    "Survey Design".center(85),
-                    "-" * 85,
-                    f"{'Weight type:':<35} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<35} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<35} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<35} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<35} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<35} {sm.df_survey:>10}")
-            lines.extend(["-" * 85, ""])
+            lines.extend(_format_survey_block(sm, 85))
 
         lines.extend(
             [
@@ -477,6 +462,13 @@ def fit(
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, "analytical")
         )
+        # Reject replicate-weight designs — Bacon decomposition is a
+        # diagnostic that does not compute replicate-based variance
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "BaconDecomposition does not support replicate-weight survey "
+                "designs. Use a TSL-based survey design (strata/psu/fpc)."
+            )
 
         # Validate within-unit constancy for exact survey weights only.
         # The exact-weight path collapses to per-unit weights via groupby().first(),
@@ -593,6 +585,13 @@ def fit(
             weights=survey_weights,
         )
 
+        if not comparisons:
+            raise ValueError(
+                "No valid 2x2 comparisons remain after filtering. "
+                "All cells have zero effective weight or insufficient data. "
+                "Check subpopulation/domain definition."
+            )
+
         # Normalize weights to sum to 1
         total_weight = sum(c.weight for c in comparisons)
         if total_weight > 0:
@@ -849,6 +848,7 @@ def _compute_treated_vs_never(
     never_post_mask = never_mask & df[time].isin(post_periods)
 
     # Guard against empty cells (unbalanced/filtered panels)
+    # Also check positive weight mass for survey/subpopulation designs
     if not (
         np.any(treated_pre_mask)
         and np.any(treated_post_mask)
@@ -856,6 +856,13 @@
         and np.any(never_pre_mask)
         and np.any(never_post_mask)
     ):
         return None
+    if (
+        np.sum(w[treated_pre_mask]) <= 0
+        or np.sum(w[treated_post_mask]) <= 0
+        or np.sum(w[never_pre_mask]) <= 0
+        or np.sum(w[never_post_mask]) <= 0
+    ):
+        return None
 
     treated_pre = np.average(y[treated_pre_mask], weights=w[treated_pre_mask])
     treated_post = np.average(y[treated_post_mask], weights=w[treated_post_mask])
@@ -966,7 +973,7 @@ def _compute_timing_comparison(
     control_pre_mask = control_mask & df[time].isin(pre_periods)
     control_post_mask = control_mask & df[time].isin(post_periods)
 
-    # Skip if any cell is empty
+    # Skip if any cell is empty or has zero effective weight
     if (
         treated_pre_mask.sum() == 0
         or treated_post_mask.sum() == 0
@@ -974,6 +981,13 @@
         or control_pre_mask.sum() == 0
         or control_post_mask.sum() == 0
     ):
         return None
+    if (
+        np.sum(w[treated_pre_mask]) <= 0
+        or np.sum(w[treated_post_mask]) <= 0
+        or np.sum(w[control_pre_mask]) <= 0
+        or np.sum(w[control_post_mask]) <= 0
+    ):
+        return None
 
     treated_pre = np.average(y[treated_pre_mask], weights=w[treated_pre_mask])
     treated_post = np.average(y[treated_post_mask], weights=w[treated_post_mask])
diff --git a/diff_diff/continuous_did.py b/diff_diff/continuous_did.py
index 636fcd6c..019796c3 100644
--- a/diff_diff/continuous_did.py
+++ b/diff_diff/continuous_did.py
@@ -355,6 +355,12 @@ def fit(
             gt_results[(g, t)] = result
             gt_bootstrap_info[(g, t)] = result.get("_bootstrap_info", {})
 
+        # Filter out NaN cells (e.g., from zero effective survey mass)
+        gt_results = {
+            gt: r for gt, r in gt_results.items()
+            if np.isfinite(r.get("att_glob", np.nan))
+        }
+
         if len(gt_results) == 0:
             raise ValueError("No valid (g,t) cells computed.")
 
@@ -390,6 +396,7 @@
             gt_bootstrap_info=gt_bootstrap_info,
             unit_survey_weights=precomp.get("unit_survey_weights"),
             unit_cohorts=precomp["unit_cohorts"],
+            anticipation=self.anticipation,
         )
 
         _survey_df = None  # Set by analytical branch when survey is active
@@ -509,6 +516,11 @@
             # Survey df for t-distribution inference (unit-level, not panel-level)
             _survey_df = analytic.get("df_survey")
+            # Guard: replicate design with undefined df → NaN inference
+            if (_survey_df is None and resolved_survey is not None
+                    and hasattr(resolved_survey, 'uses_replicate_variance')
+                    and resolved_survey.uses_replicate_variance):
+                _survey_df = 0
 
             # Recompute survey_metadata from unit-level design so reported
             # effective_n/n_psu/df_survey match the inference actually run
@@ -519,6 +531,13 @@
                 raw_w_unit = _unit_resolved.weights
                 survey_metadata = compute_survey_metadata(_unit_resolved, raw_w_unit)
 
+                # Propagate replicate df override to survey_metadata for display
+                # (but not the df=0 sentinel — keep metadata as None for undefined df)
+                if (_survey_df is not None and _survey_df != 0
+                        and survey_metadata is not None):
+                    if survey_metadata.df_survey != _survey_df:
+                        survey_metadata.df_survey = _survey_df
+
         overall_att_t, overall_att_p, overall_att_ci = safe_inference(
             overall_att, overall_att_se, self.alpha, df=_survey_df
         )
@@ -571,15 +590,8 @@
             )
             n_strata_u = len(np.unique(us)) if us is not None else 0
             n_psu_u = len(np.unique(up)) if up is not None else 0
-            unit_resolved_es = ResolvedSurveyDesign(
-                weights=uw,
-                weight_type=resolved_survey.weight_type,
-                strata=us,
-                psu=up,
-                fpc=uf,
-                n_strata=n_strata_u,
-                n_psu=n_psu_u,
-                lonely_psu=resolved_survey.lonely_psu,
+            unit_resolved_es = resolved_survey.subset_to_units(
+                row_idx, uw, us, up, uf, n_strata_u, n_psu_u,
             )
 
         for e_val, info_e in event_study_effects.items():
@@ -638,12 +650,19 @@
             # Compute SE: survey-aware TSL or standard sqrt(sum(IF^2))
             if unit_resolved_es is not None:
-                X_ones_es = np.ones((n_units, 1))
-                # Rescale IFs by total survey mass (not n_units) for fweight support
-                tsl_scale_es = float(unit_resolved_es.weights.sum())
-                if_es_tsl = if_es * tsl_scale_es
-                vcov_es = compute_survey_vcov(X_ones_es, if_es_tsl, unit_resolved_es)
-                es_se = float(np.sqrt(np.abs(vcov_es[0, 0])))
+                if unit_resolved_es.uses_replicate_variance:
+                    from diff_diff.survey import compute_replicate_if_variance
+
+                    # Score-scale: psi = w * if_es (matches TSL bread)
+                    psi_es = unit_resolved_es.weights * if_es
+                    variance, _nv = compute_replicate_if_variance(psi_es, unit_resolved_es)
+                    es_se = float(np.sqrt(max(variance, 0.0))) if np.isfinite(variance) else np.nan
+                else:
+                    X_ones_es = np.ones((n_units, 1))
+                    tsl_scale_es = float(unit_resolved_es.weights.sum())
+                    if_es_tsl = if_es * tsl_scale_es
+                    vcov_es = compute_survey_vcov(X_ones_es, if_es_tsl, unit_resolved_es)
+                    es_se = float(np.sqrt(np.abs(vcov_es[0, 0])))
             else:
                 es_se = float(np.sqrt(np.sum(if_es**2)))
@@ -871,6 +890,14 @@ def _compute_dose_response_gt(
         if survey_weights is not None:
             w_treated = survey_weights[treated_mask]
             w_control = survey_weights[control_mask]
+            # Guard against zero effective mass (e.g., after subpopulation)
+            if np.sum(w_treated) <= 0 or np.sum(w_control) <= 0:
+                return {
+                    "att_glob": np.nan, "acrt_glob": np.nan,
+                    "n_treated": 0, "n_control": 0,
+                    "att_d": np.full(len(dvals), np.nan),
+                    "acrt_d": np.full(len(dvals), np.nan),
+                }
 
         # Control counterfactual (weighted mean when survey weights present)
         if w_control is not None:
@@ -897,9 +924,14 @@
         )
 
         # Skip if not enough treated units for OLS (need n > K for residual df)
-        if n_treated <= n_basis:
+        # When survey weights are present, use positive-weight count as
+        # the effective sample size — subpopulation() can zero weights
+        # leaving rows present but the weighted regression underidentified.
+        n_eff = int(np.count_nonzero(w_treated > 0)) if w_treated is not None else n_treated
+        if n_eff <= n_basis:
+            label = "positive-weight treated units" if w_treated is not None else "treated units"
             warnings.warn(
-                f"Not enough treated units ({n_treated}) for {n_basis} basis functions "
+                f"Not enough {label} ({n_eff}) for {n_basis} basis functions "
                 f"in (g={g}, t={t}). Skipping cell.",
                 UserWarning,
                 stacklevel=3,
@@ -1052,12 +1084,15 @@ def _aggregate_event_study(
     gt_bootstrap_info: Dict[Tuple, Dict] = None,
     unit_survey_weights: Optional[np.ndarray] = None,
     unit_cohorts: Optional[np.ndarray] = None,
+    anticipation: int = 0,
 ) -> Dict[int, Dict[str, Any]]:
     """Aggregate binarized ATT_glob by relative period."""
     effects_by_e: Dict[int, List[Tuple[float, float, Tuple]]] = {}
 
     for (g, t), r in gt_results.items():
         e = t - g
+        if anticipation > 0 and e < -anticipation:
+            continue
         if e not in effects_by_e:
             effects_by_e[e] = []
         # Compute weight for this (g,t) cell
@@ -1205,48 +1240,58 @@ def _compute_analytical_se(
         n_strata_unit = len(np.unique(unit_strata)) if unit_strata is not None else 0
         n_psu_unit = len(np.unique(unit_psu)) if unit_psu is not None else 0
-        unit_resolved = ResolvedSurveyDesign(
-            weights=unit_weights,
-            weight_type=resolved_survey.weight_type,
-            strata=unit_strata,
-            psu=unit_psu,
-            fpc=unit_fpc,
-            n_strata=n_strata_unit,
-            n_psu=n_psu_unit,
-            lonely_psu=resolved_survey.lonely_psu,
+        unit_resolved = resolved_survey.subset_to_units(
+            row_idx, unit_weights, unit_strata, unit_psu, unit_fpc,
+            n_strata_unit, n_psu_unit,
         )
 
         X_ones = np.ones((n_units, 1))
-        # Rescale IFs from 1/n convention to score scale for TSL sandwich.
-        # The per-unit IFs contain internal 1/n_t, 1/n_c scaling (for the
-        # unweighted SE = sqrt(sum(IF^2)) convention). compute_survey_vcov
-        # applies its own (X'WX)^{-1} bread, which would double-count.
-        # Rescale by the unit-level total survey mass (= n_units for
-        # pweight/aweight, but can differ for fweight).
-        tsl_scale = float(unit_resolved.weights.sum())
-        if_att_glob_tsl = if_att_glob * tsl_scale
-        if_acrt_glob_tsl = if_acrt_glob * tsl_scale
-        if_att_d_tsl = if_att_d * tsl_scale
-        if_acrt_d_tsl = if_acrt_d * tsl_scale
-
-        # Overall ATT SE via compute_survey_vcov
-        vcov_att = compute_survey_vcov(X_ones, if_att_glob_tsl, unit_resolved)
-        overall_att_se = float(np.sqrt(np.abs(vcov_att[0, 0])))
-
-        # Overall ACRT SE via compute_survey_vcov
-        vcov_acrt = compute_survey_vcov(X_ones, if_acrt_glob_tsl, unit_resolved)
-        overall_acrt_se = float(np.sqrt(np.abs(vcov_acrt[0, 0])))
-
-        # Per-grid-point SEs for dose-response curves
-        att_d_se = np.zeros(n_grid)
-        acrt_d_se = np.zeros(n_grid)
-        for d_idx in range(n_grid):
-            vcov_d = compute_survey_vcov(X_ones, if_att_d_tsl[:, d_idx], unit_resolved)
-            att_d_se[d_idx] = float(np.sqrt(np.abs(vcov_d[0, 0])))
-
-            vcov_d = compute_survey_vcov(X_ones, if_acrt_d_tsl[:, d_idx], unit_resolved)
-            acrt_d_se[d_idx] = float(np.sqrt(np.abs(vcov_d[0, 0])))
+        if unit_resolved.uses_replicate_variance:
+            # Replicate-weight variance: score-scale IFs to match TSL bread.
+            # TSL path does: scores = w * (if * tsl_scale), bread = 1/sum(w)^2
+            # Equivalent psi for replicate: w * if_vals * tsl_scale / sum(w) = w * if_vals
+            from diff_diff.survey import compute_replicate_if_variance
+
+            _w_rep = unit_resolved.weights
+            _rep_n_valid = unit_resolved.n_replicates  # track effective count
+
+            def _rep_se(if_vals):
+                nonlocal _rep_n_valid
+                psi_scaled = _w_rep * if_vals
+                v, nv = compute_replicate_if_variance(psi_scaled, unit_resolved)
+                _rep_n_valid = min(_rep_n_valid, nv)  # worst-case valid count
+                return float(np.sqrt(max(v, 0.0))) if np.isfinite(v) else np.nan
+
+            overall_att_se = _rep_se(if_att_glob)
+            overall_acrt_se = _rep_se(if_acrt_glob)
+            att_d_se = np.zeros(n_grid)
+            acrt_d_se = np.zeros(n_grid)
+            for d_idx in range(n_grid):
+                att_d_se[d_idx] = _rep_se(if_att_d[:, d_idx])
+                acrt_d_se[d_idx] = _rep_se(if_acrt_d[:, d_idx])
+        else:
+            # TSL: rescale IFs from 1/n convention to score scale for sandwich.
+            tsl_scale = float(unit_resolved.weights.sum())
+            if_att_glob_tsl = if_att_glob * tsl_scale
+            if_acrt_glob_tsl = if_acrt_glob * tsl_scale
+            if_att_d_tsl = if_att_d * tsl_scale
+            if_acrt_d_tsl = if_acrt_d * tsl_scale
+
+            vcov_att = compute_survey_vcov(X_ones, if_att_glob_tsl, unit_resolved)
+            overall_att_se = float(np.sqrt(np.abs(vcov_att[0, 0])))
+
+            vcov_acrt = compute_survey_vcov(X_ones, if_acrt_glob_tsl, unit_resolved)
+            overall_acrt_se = float(np.sqrt(np.abs(vcov_acrt[0, 0])))
+
+            att_d_se = np.zeros(n_grid)
+            acrt_d_se = np.zeros(n_grid)
+            for d_idx in range(n_grid):
+                vcov_d = compute_survey_vcov(X_ones, if_att_d_tsl[:, d_idx], unit_resolved)
+                att_d_se[d_idx] = float(np.sqrt(np.abs(vcov_d[0, 0])))
+
+                vcov_d = compute_survey_vcov(X_ones, if_acrt_d_tsl[:, d_idx], unit_resolved)
+                acrt_d_se[d_idx] = float(np.sqrt(np.abs(vcov_d[0, 0])))
     else:
         # SE = sqrt(sum(IF_i^2)), matching CallawaySantAnna's convention
         # (per-unit IFs already contain 1/n_t, 1/n_c scaling)
         acrt_d_se = np.sqrt(np.sum(if_acrt_d**2, axis=0))
 
     # Return unit-level survey df and resolved design for metadata recomputation
-    unit_df_survey = unit_resolved.df_survey if resolved_survey is not None else None
+    # Only override with n_valid-based df when replicates were actually dropped
+    if resolved_survey is not None and hasattr(resolved_survey, 'uses_replicate_variance') and resolved_survey.uses_replicate_variance:
+        if _rep_n_valid < unit_resolved.n_replicates:
+            unit_df_survey = _rep_n_valid - 1 if _rep_n_valid > 1 else None
+        else:
+            unit_df_survey = unit_resolved.df_survey
+    else:
+        unit_df_survey = unit_resolved.df_survey if resolved_survey is not None else None
 
     return {
         "overall_att_se": overall_att_se,
@@ -1294,6 +1346,15 @@ def _run_bootstrap(
             stacklevel=3,
         )
 
+        # Reject replicate-weight designs for bootstrap — replicate variance
+        # is an analytical alternative to bootstrap, not compatible with it
+        if resolved_survey is not None and hasattr(resolved_survey, "uses_replicate_variance") and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "ContinuousDiD bootstrap (n_bootstrap > 0) is not supported "
+                "with replicate-weight survey designs. Replicate weights provide "
+                "analytical variance; use n_bootstrap=0 instead."
+            )
+
         rng = np.random.default_rng(self.seed)
         n_units = precomp["n_units"]
         n_grid = len(dvals)
@@ -1314,15 +1375,9 @@
             unit_fpc = resolved_survey.fpc[row_idx] if resolved_survey.fpc is not None else None
             n_strata_u = len(np.unique(unit_strata)) if unit_strata is not None else 0
             n_psu_u = len(np.unique(unit_psu)) if unit_psu is not None else 0
-            unit_resolved = ResolvedSurveyDesign(
-                weights=unit_weights,
-                weight_type=resolved_survey.weight_type,
-                strata=unit_strata,
-                psu=unit_psu,
-                fpc=unit_fpc,
-                n_strata=n_strata_u,
-                n_psu=n_psu_u,
-                lonely_psu=resolved_survey.lonely_psu,
+            unit_resolved = resolved_survey.subset_to_units(
+                row_idx, unit_weights, unit_strata, unit_psu, unit_fpc,
+                n_strata_u, n_psu_u,
             )
 
         # Generate bootstrap weights — PSU-level when survey design is present
@@ -1374,6 +1429,8 @@
         for gt, r in gt_results.items():
             g_val, t_val = gt
             e = t_val - g_val
+            if self.anticipation > 0 and e < -self.anticipation:
+                continue
             if unit_sw is not None:
                 g_mask = unit_cohorts == g_val
                 cell_mass = float(np.sum(unit_sw[g_mask]))
@@ -1383,6 +1440,8 @@
         for gt, r in gt_results.items():
             g_val, t_val = gt
             e = t_val - g_val
+            if self.anticipation > 0 and e < -self.anticipation:
+                continue
             if unit_sw is not None:
                 g_mask = unit_cohorts == g_val
                 cell_mass = float(np.sum(unit_sw[g_mask]))
diff --git a/diff_diff/continuous_did_results.py b/diff_diff/continuous_did_results.py
index b9f332f4..67a745d4 100644
--- a/diff_diff/continuous_did_results.py
+++ b/diff_diff/continuous_did_results.py
@@ -11,7 +11,7 @@
 import numpy as np
 import pandas as pd
 
-from diff_diff.results import _get_significance_stars
+from diff_diff.results import _format_survey_block, _get_significance_stars
 
 __all__ = ["ContinuousDiDResults", "DoseResponseCurve"]
 
@@ -182,23 +182,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "-" * w,
-                    "Survey Design".center(w),
-                    "-" * w,
-                    f"{'Weight type:':<30} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<30} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}")
-            lines.extend(["-" * w, ""])
+            lines.extend(_format_survey_block(sm, w))
 
         # Overall summary parameters
         lines.extend(
diff --git a/diff_diff/efficient_did.py b/diff_diff/efficient_did.py
index c3a341c6..d25f0512 100644
--- a/diff_diff/efficient_did.py
+++ b/diff_diff/efficient_did.py
@@ -373,6 +373,11 @@ def fit(
         # Store survey df for safe_inference calls (t-distribution with survey df)
         self._survey_df = survey_metadata.df_survey if survey_metadata is not None else None
+        # Guard: replicate design with undefined df → NaN inference
+        if (self._survey_df is None and resolved_survey is not None
+                and hasattr(resolved_survey, 'uses_replicate_variance')
+                and resolved_survey.uses_replicate_variance):
+            self._survey_df = 0
 
         # Bootstrap + survey supported via PSU-level multiplier bootstrap.
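The replicate-weight SE paths above all defer to `compute_replicate_if_variance`, whose exact signature and scaling belong to `diff_diff.survey` and are not shown in this diff. As a generic, illustrative sketch only (hypothetical `replicate_variance` helper, delete-one jackknife weights, JK1-style `(R-1)/R` scaling assumed), replicate variance re-estimates the statistic under each replicate weight column and sums scaled squared deviations, skipping degenerate replicates so the effective count can shrink, which is what drives the `n_valid - 1` df adjustment tracked above:

```python
import numpy as np

def replicate_variance(y, w0, w_reps, scale=None):
    """Replicate-weight variance of a weighted mean (illustrative sketch).

    y      : (n,) outcomes
    w0     : (n,) full-sample weights
    w_reps : (n, R) replicate weight columns
    scale  : multiplier for squared deviations; defaults to JK1's (R-1)/R
    """
    theta0 = np.average(y, weights=w0)
    thetas = []
    for r in range(w_reps.shape[1]):
        wr = w_reps[:, r]
        if wr.sum() <= 0:  # dropped replicate reduces the effective count
            continue
        thetas.append(np.average(y, weights=wr))
    thetas = np.asarray(thetas)
    n_valid = thetas.size
    if scale is None:
        scale = (n_valid - 1) / n_valid
    return scale * np.sum((thetas - theta0) ** 2), n_valid
```

With delete-one jackknife weights this reproduces the classical jackknife variance of the mean, `s^2 / n`.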
@@ -516,18 +521,16 @@
             unit_fpc = resolved_survey.fpc[row_idx] if resolved_survey.fpc is not None else None
             n_strata_u = len(np.unique(unit_strata)) if unit_strata is not None else 0
             n_psu_u = len(np.unique(unit_psu)) if unit_psu is not None else 0
-            self._unit_resolved_survey = ResolvedSurveyDesign(
-                weights=unit_weights_s,
-                weight_type=resolved_survey.weight_type,
-                strata=unit_strata,
-                psu=unit_psu,
-                fpc=unit_fpc,
-                n_strata=n_strata_u,
-                n_psu=n_psu_u,
-                lonely_psu=resolved_survey.lonely_psu,
+            self._unit_resolved_survey = resolved_survey.subset_to_units(
+                row_idx, unit_weights_s, unit_strata, unit_psu, unit_fpc,
+                n_strata_u, n_psu_u,
             )
             # Use unit-level df (not panel-level) for t-distribution
             self._survey_df = self._unit_resolved_survey.df_survey
+            # Re-apply replicate guard: undefined df → NaN inference
+            if (self._survey_df is None
+                    and self._unit_resolved_survey.uses_replicate_variance):
+                self._survey_df = 0
         else:
             self._unit_resolved_survey = None
 
@@ -951,6 +954,18 @@ def fit(
         )
 
         # ----- Bootstrap -----
+        # Reject replicate-weight designs for bootstrap — replicate variance
+        # is an analytical alternative, not compatible with bootstrap
+        if (
+            self.n_bootstrap > 0
+            and self._unit_resolved_survey is not None
+            and self._unit_resolved_survey.uses_replicate_variance
+        ):
+            raise NotImplementedError(
+                "EfficientDiD bootstrap (n_bootstrap > 0) is not supported "
+                "with replicate-weight survey designs. Replicate weights provide "
+                "analytical variance; use n_bootstrap=0 instead."
+            )
         bootstrap_results = None
         if self.n_bootstrap > 0 and eif_by_gt:
             bootstrap_results = self._run_multiplier_bootstrap(
@@ -1061,10 +1076,16 @@ def _recompute_unit_survey_metadata(self, panel_metadata):
         if self._unit_resolved_survey is not None:
             from diff_diff.survey import compute_survey_metadata
 
-            return compute_survey_metadata(
+            meta = compute_survey_metadata(
                 self._unit_resolved_survey,
                 self._unit_resolved_survey.weights,
             )
+            # Propagate effective replicate df if available
+            # (but not the df=0 sentinel — keep metadata as None for undefined df)
+            if (self._survey_df is not None and self._survey_df != 0
+                    and meta.df_survey != self._survey_df):
+                meta.df_survey = self._survey_df
+            return meta
         return panel_metadata
 
     # -- Survey SE helpers ----------------------------------------------------
@@ -1076,6 +1097,18 @@ def _compute_survey_eif_se(self, eif_vals: np.ndarray) -> float:
         once in ``fit()``, ensuring consistent unit-level arrays and avoiding
         repeated subsetting of panel-level survey data.
         """
+        if self._unit_resolved_survey.uses_replicate_variance:
+            from diff_diff.survey import compute_replicate_if_variance
+
+            # Score-scale IFs to match TSL bread: psi = w * eif / sum(w)
+            w = self._unit_resolved_survey.weights
+            psi_scaled = w * eif_vals / w.sum()
+            variance, n_valid = compute_replicate_if_variance(psi_scaled, self._unit_resolved_survey)
+            # Update survey df to reflect effective replicate count
+            if n_valid < self._unit_resolved_survey.n_replicates:
+                self._survey_df = n_valid - 1 if n_valid > 1 else None
+            return float(np.sqrt(max(variance, 0.0))) if np.isfinite(variance) else np.nan
+
         from diff_diff.survey import compute_survey_vcov
 
         X_ones = np.ones((len(eif_vals), 1))
@@ -1577,6 +1610,10 @@ def _aggregate_es(
         n_units = int(np.sum(row_finite))
         if cl_idx is not None:
             cl_idx = cl_idx[row_finite]
+            # Recompute effective cluster count and remap to contiguous
+            # indices — entire clusters may have been dropped by filtering
+            unique_cl, cl_idx = np.unique(cl_idx, return_inverse=True)
+            n_cl = len(unique_cl)
 
         # Compute full covariance matrices
         if cl_idx is not None and n_cl is not None:
diff --git a/diff_diff/efficient_did_results.py b/diff_diff/efficient_did_results.py
index 8e212795..4d10c9b3 100644
--- a/diff_diff/efficient_did_results.py
+++ b/diff_diff/efficient_did_results.py
@@ -11,7 +11,7 @@
 import numpy as np
 import pandas as pd
 
-from diff_diff.results import _get_significance_stars
+from diff_diff.results import _format_survey_block, _get_significance_stars
 
 if TYPE_CHECKING:
     from diff_diff.efficient_did_bootstrap import EDiDBootstrapResults
 
@@ -201,23 +201,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "-" * 85,
-                    "Survey Design".center(85),
-                    "-" * 85,
-                    f"{'Weight type:':<30} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<30} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}")
-            lines.extend(["-" * 85, ""])
+            lines.extend(_format_survey_block(sm, 85))
 
         # Overall ATT
         lines.extend(
diff --git a/diff_diff/estimators.py b/diff_diff/estimators.py
index 88ae9672..1b8a2ac0 100644
--- a/diff_diff/estimators.py
+++ b/diff_diff/estimators.py
@@ -240,6 +240,15 @@ def fit(
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, self.inference)
         )
+        # Reject replicate-weight designs — base DiD uses compute_survey_vcov
+        # (TSL) directly, not LinearRegression's replicate dispatch.
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "DifferenceInDifferences does not yet support replicate-weight "
+                "survey designs. Use CallawaySantAnna, EfficientDiD, "
+                "ContinuousDiD, or TripleDifference for replicate-weight "
+                "inference, or use a TSL-based survey design (strata/psu/fpc)."
+            )
 
         # Handle absorbed fixed effects (within-transformation)
         working_data = data.copy()
@@ -1008,6 +1017,15 @@ def fit(  # type: ignore[override]
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, effective_inference)
         )
+        # Reject replicate-weight designs — MultiPeriodDiD uses
+        # compute_survey_vcov (TSL) directly without replicate dispatch.
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "MultiPeriodDiD does not yet support replicate-weight survey "
+                "designs. Use CallawaySantAnna for staggered adoption with "
+                "replicate weights, or use a TSL-based survey design "
+                "(strata/psu/fpc)."
+ ) # Handle absorbed fixed effects (within-transformation) working_data = data.copy() diff --git a/diff_diff/imputation.py b/diff_diff/imputation.py index 77146d40..4f32a663 100644 --- a/diff_diff/imputation.py +++ b/diff_diff/imputation.py @@ -246,6 +246,11 @@ def fit( # Validate within-unit constancy for panel survey designs if resolved_survey is not None: + if resolved_survey.uses_replicate_variance: + raise NotImplementedError( + "ImputationDiD does not yet support replicate-weight survey " + "designs. Use a TSL-based survey design (strata/psu/fpc)." + ) _validate_unit_constant_survey(data, unit, survey_design) if resolved_survey.weight_type != "pweight": raise ValueError( diff --git a/diff_diff/imputation_results.py b/diff_diff/imputation_results.py index 6589af1d..3a0465e1 100644 --- a/diff_diff/imputation_results.py +++ b/diff_diff/imputation_results.py @@ -11,7 +11,7 @@ import numpy as np import pandas as pd -from diff_diff.results import _get_significance_stars +from diff_diff.results import _format_survey_block, _get_significance_stars __all__ = [ "ImputationBootstrapResults", @@ -187,23 +187,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "-" * 85, - "Survey Design".center(85), - "-" * 85, - f"{'Weight type:':<30} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<30} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}") - lines.extend(["-" * 85, ""]) + lines.extend(_format_survey_block(sm, 85)) # Overall ATT lines.extend( diff --git a/diff_diff/linalg.py b/diff_diff/linalg.py index 819af25f..5b0d5d59 100644 --- 
a/diff_diff/linalg.py +++ b/diff_diff/linalg.py @@ -402,11 +402,16 @@ def _validate_weights(weights, weight_type, n): raise ValueError("Weights contain Inf values") if np.any(weights < 0): raise ValueError("Weights must be non-negative") + if np.sum(weights) <= 0: + raise ValueError( + "Weights sum to zero — no observations have positive weight. " + "Cannot fit a model on an empty effective sample." + ) if weight_type == "fweight": fractional = weights - np.round(weights) if np.any(np.abs(fractional) > 1e-10): raise ValueError( - "Frequency weights (fweight) must be positive integers. " + "Frequency weights (fweight) must be non-negative integers. " "Fractional values detected. Use pweight for non-integer weights." ) return weights @@ -1041,10 +1046,16 @@ def _compute_robust_vcov_numpy( else: bread_matrix = X.T @ X - # Effective n for df computation (fweights use sum(w)) + # Effective n for df computation + # fweights: sum(w) (frequency expansion) + # pweight/aweight with zeros: positive-weight count (zero-weight rows + # contribute nothing to the sandwich and should not inflate df) n_eff = n - if weights is not None and weight_type == "fweight": - n_eff = int(round(np.sum(weights))) + if weights is not None: + if weight_type == "fweight": + n_eff = int(round(np.sum(weights))) + elif np.any(weights == 0): + n_eff = int(np.count_nonzero(weights > 0)) # Compute weighted scores for cluster-robust meat (outer product of sums). # pweight/fweight multiply by w; aweight and unweighted use raw residuals. 
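
The `linalg.py` hunks above make two related bookkeeping changes: weights may now contain zeros (but must be non-negative with positive total mass, and fweights integer-valued), and the degrees-of-freedom count for pweight/aweight vectors containing zeros uses only positive-weight rows, with zero-total-weight clusters excluded from the cluster count. A condensed standalone sketch of both rules — the helper names `check_weights` and `effective_counts` are mine for illustration, not the library's:

```python
import numpy as np
import pandas as pd

def check_weights(weights, weight_type="pweight"):
    # Mirrors the validation order in the diff: NaN/Inf, negativity,
    # zero total mass, then the fweight integrality check.
    w = np.asarray(weights, dtype=np.float64)
    if np.any(np.isnan(w)):
        raise ValueError("Weights contain NaN values")
    if np.any(~np.isfinite(w)):
        raise ValueError("Weights contain Inf values")
    if np.any(w < 0):
        raise ValueError("Weights must be non-negative")
    if np.sum(w) <= 0:
        raise ValueError("Weights sum to zero")
    if weight_type == "fweight" and np.any(np.abs(w - np.round(w)) > 1e-10):
        raise ValueError("Frequency weights must be non-negative integers")
    return w

def effective_counts(weights, weight_type, cluster_ids=None):
    # fweight: frequency expansion, n_eff = sum(w).
    # pweight/aweight with zeros: count positive-weight rows only, and
    # drop clusters whose total weight is zero from the cluster count.
    w = np.asarray(weights, dtype=np.float64)
    if weight_type == "fweight":
        n_eff = int(round(np.sum(w)))
    elif np.any(w == 0):
        n_eff = int(np.count_nonzero(w > 0))
    else:
        n_eff = len(w)
    n_clusters = None
    if cluster_ids is not None:
        cluster_w = pd.Series(w).groupby(np.asarray(cluster_ids)).sum()
        n_clusters = int((cluster_w > 0).sum())
    return n_eff, n_clusters
```

The point of the positive-weight count is that zero-weight rows contribute nothing to the sandwich meat, so leaving them in `n` would inflate the df and understate standard errors on the subpopulation.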
@@ -1053,6 +1064,9 @@ def _compute_robust_vcov_numpy( scores = X * (weights * residuals)[:, np.newaxis] else: scores = X * residuals[:, np.newaxis] + # Zero out scores for zero-weight aweight rows (subpopulation invariance) + if weights is not None and np.any(weights == 0): + scores[weights == 0] = 0.0 if cluster_ids is None: # HC1 (heteroskedasticity-robust) standard errors @@ -1070,6 +1084,11 @@ def _compute_robust_vcov_numpy( unique_clusters = np.unique(cluster_ids) n_clusters = len(unique_clusters) + # Exclude clusters with zero total weight (subpopulation-zeroed) + if weights is not None and weight_type != "fweight" and np.any(weights == 0): + cluster_weights = pd.Series(weights).groupby(cluster_ids).sum() + n_clusters = int((cluster_weights > 0).sum()) + if n_clusters < 2: raise ValueError(f"Need at least 2 clusters for cluster-robust SEs, got {n_clusters}") @@ -1166,8 +1185,12 @@ def solve_logit( raise ValueError("weights contain NaN values") if np.any(~np.isfinite(weights)): raise ValueError("weights contain Inf values") - if np.any(weights <= 0): - raise ValueError("weights must be strictly positive") + if np.any(weights < 0): + raise ValueError("weights must be non-negative") + if np.sum(weights) <= 0: + raise ValueError( + "weights sum to zero — no observations have positive weight" + ) # Validate rank_deficient_action valid_actions = {"warn", "error", "silent"} @@ -1177,7 +1200,57 @@ def solve_logit( f"got '{rank_deficient_action}'" ) - # Check rank deficiency once before iterating + # Track original column count for coefficient expansion at the end + k_original = X_with_intercept.shape[1] + eff_dropped_original: list = [] # indices in original column space + + # Validate effective weighted sample when weights have zeros + if weights is not None and np.any(weights == 0): + pos_mask = weights > 0 + n_pos = int(np.sum(pos_mask)) + y_pos = y[pos_mask] + # Need both outcome classes in the positive-weight subset + unique_y = np.unique(y_pos) + if 
len(unique_y) < 2: + raise ValueError( + f"Positive-weight observations have only {len(unique_y)} " + f"outcome class(es). Logistic regression requires both 0 and 1 " + f"in the effective (positive-weight) sample." + ) + # Check rank deficiency on positive-weight rows FIRST — full design + # may be full rank due to zero-weight padding. Drop columns before + # checking sample-size identification. + X_eff = X_with_intercept[pos_mask] + eff_rank_info = _detect_rank_deficiency(X_eff) + if len(eff_rank_info[1]) > 0: + n_dropped_eff = len(eff_rank_info[1]) + if rank_deficient_action == "error": + raise ValueError( + f"Effective (positive-weight) sample is rank-deficient: " + f"{n_dropped_eff} linearly dependent column(s). " + f"Cannot identify logistic model on this subpopulation." + ) + elif rank_deficient_action == "warn": + warnings.warn( + f"Effective (positive-weight) sample is rank-deficient: " + f"dropping {n_dropped_eff} column(s). Propensity estimates " + f"may be unreliable on this subpopulation.", + UserWarning, + stacklevel=2, + ) + # Drop columns and track original indices for final expansion + eff_dropped_original = list(eff_rank_info[1]) + X_with_intercept = np.delete(X_with_intercept, eff_rank_info[1], axis=1) + k = X_with_intercept.shape[1] + # Check sample-size identification AFTER column dropping + if n_pos <= k: + raise ValueError( + f"Only {n_pos} positive-weight observation(s) for " + f"{k} parameters (after rank reduction). " + f"Cannot identify logistic model." 
+ ) + + # Check rank deficiency once before iterating (on possibly-shrunk matrix) rank_info = _detect_rank_deficiency(X_with_intercept) rank, dropped_cols, _ = rank_info if len(dropped_cols) > 0: @@ -1273,10 +1346,21 @@ def solve_logit( stacklevel=2, ) - # Expand beta back to full size if columns were dropped - if len(dropped_cols) > 0: - beta_full = np.zeros(k) - beta_full[kept_cols] = beta_solve + # Expand beta back to original column count, accounting for columns + # dropped in both the effective-sample check and the full-sample check + if len(dropped_cols) > 0 or len(eff_dropped_original) > 0: + # First expand from X_solve columns back to post-eff-drop columns + # Use NaN for dropped coefficients (R convention: not estimable) + beta_post_eff = np.full(k, np.nan) + beta_post_eff[kept_cols] = beta_solve + + # Then expand from post-eff-drop columns back to original columns + if len(eff_dropped_original) > 0: + beta_full = np.full(k_original, np.nan) + kept_original = [i for i in range(k_original) if i not in eff_dropped_original] + beta_full[kept_original] = beta_post_eff + else: + beta_full = beta_post_eff else: beta_full = beta_solve @@ -1592,6 +1676,9 @@ def fit( X = np.asarray(X, dtype=np.float64) y = np.asarray(y, dtype=np.float64) + # Reset replicate df from any previous fit + self._replicate_df = None + # Add intercept if requested if self.include_intercept: X = np.column_stack([np.ones(X.shape[0]), X]) @@ -1670,10 +1757,14 @@ def fit( nan_mask = np.isnan(coefficients) k_effective = k - np.sum(nan_mask) # Number of identified coefficients - # For fweights, df uses sum(w) - k (effective sample size) + # Effective n for df: fweights use sum(w), pweight/aweight with + # zeros use positive-weight count (zero-weight rows don't contribute) n_eff_df = n - if self.weights is not None and self.weight_type == "fweight": - n_eff_df = int(round(np.sum(self.weights))) + if self.weights is not None: + if self.weight_type == "fweight": + n_eff_df = 
int(round(np.sum(self.weights))) + elif np.any(self.weights == 0): + n_eff_df = int(np.count_nonzero(self.weights > 0)) if k_effective == 0: # All coefficients dropped - no valid inference @@ -1721,20 +1812,54 @@ def fit( # Compute survey vcov if applicable if _use_survey_vcov: - from diff_diff.survey import compute_survey_vcov + from diff_diff.survey import ResolvedSurveyDesign as _RSD - nan_mask = np.isnan(coefficients) - if np.any(nan_mask): - kept_cols = np.where(~nan_mask)[0] - if len(kept_cols) > 0: - vcov_reduced = compute_survey_vcov( - X[:, kept_cols], residuals, _effective_survey_design + _uses_rep = ( + isinstance(_effective_survey_design, _RSD) + and _effective_survey_design.uses_replicate_variance + ) + + if _uses_rep: + from diff_diff.survey import compute_replicate_vcov + + nan_mask = np.isnan(coefficients) + if np.any(nan_mask): + kept_cols = np.where(~nan_mask)[0] + if len(kept_cols) > 0: + vcov_reduced, _n_valid_rep = compute_replicate_vcov( + X[:, kept_cols], y, coefficients[kept_cols], + _effective_survey_design, + weight_type=self.weight_type, + ) + vcov = _expand_vcov_with_nan(vcov_reduced, X.shape[1], kept_cols) + else: + vcov = np.full((X.shape[1], X.shape[1]), np.nan) + _n_valid_rep = 0 + else: + vcov, _n_valid_rep = compute_replicate_vcov( + X, y, coefficients, _effective_survey_design, + weight_type=self.weight_type, ) - vcov = _expand_vcov_with_nan(vcov_reduced, X.shape[1], kept_cols) + # Store effective replicate df only when replicates were dropped + if _n_valid_rep < _effective_survey_design.n_replicates: + self._replicate_df = _n_valid_rep - 1 if _n_valid_rep > 1 else None else: - vcov = np.full((X.shape[1], X.shape[1]), np.nan) + self._replicate_df = None # use rank-based df from design else: - vcov = compute_survey_vcov(X, residuals, _effective_survey_design) + from diff_diff.survey import compute_survey_vcov + + nan_mask = np.isnan(coefficients) + if np.any(nan_mask): + kept_cols = np.where(~nan_mask)[0] + if len(kept_cols) > 0: + 
vcov_reduced = compute_survey_vcov( + X[:, kept_cols], residuals, _effective_survey_design + ) + vcov = _expand_vcov_with_nan(vcov_reduced, X.shape[1], kept_cols) + else: + vcov = np.full((X.shape[1], X.shape[1]), np.nan) + else: + vcov = compute_survey_vcov(X, residuals, _effective_survey_design) # Store fitted attributes self.coefficients_ = coefficients @@ -1750,10 +1875,14 @@ def fit( # This is needed for correct degrees of freedom in inference nan_mask = np.isnan(coefficients) self.n_params_effective_ = int(self.n_params_ - np.sum(nan_mask)) - # For fweights, df uses sum(w) - k (effective sample size) + # Effective n for df: fweights use sum(w), pweight/aweight with + # zeros use positive-weight count (matches compute_robust_vcov) n_eff_df = self.n_obs_ - if self.weights is not None and self.weight_type == "fweight": - n_eff_df = int(round(np.sum(self.weights))) + if self.weights is not None: + if self.weight_type == "fweight": + n_eff_df = int(round(np.sum(self.weights))) + elif np.any(self.weights == 0): + n_eff_df = int(np.count_nonzero(self.weights > 0)) self.df_ = n_eff_df - self.n_params_effective_ - df_adjustment # Survey degrees of freedom: n_PSU - n_strata (overrides standard df) @@ -1763,9 +1892,74 @@ def fit( if isinstance(_effective_survey_design, ResolvedSurveyDesign): self.survey_df_ = _effective_survey_design.df_survey + # Override with effective replicate df if available + if hasattr(self, '_replicate_df') and self._replicate_df is not None: + self.survey_df_ = self._replicate_df return self + def compute_deff(self, coefficient_names=None): + """Compute per-coefficient design effect diagnostics. + + Compares the survey vcov to an SRS (HC1) baseline. Must be called + after ``fit()`` with a survey design. + + Returns + ------- + DEFFDiagnostics + """ + self._check_fitted() + if not (hasattr(self, 'survey_design') and self.survey_design is not None): + raise ValueError( + "compute_deff() requires a survey design. 
" + "Fit with survey_design= first." + ) + from diff_diff.survey import compute_deff_diagnostics + + # Handle rank-deficient fits: compute DEFF only on kept columns, + # then expand back with NaN for dropped columns + nan_mask = np.isnan(self.coefficients_) + if np.any(nan_mask): + kept = np.where(~nan_mask)[0] + if len(kept) == 0: + k = len(self.coefficients_) + nan_arr = np.full(k, np.nan) + from diff_diff.survey import DEFFDiagnostics + return DEFFDiagnostics( + deff=nan_arr, effective_n=nan_arr.copy(), + srs_se=nan_arr.copy(), survey_se=nan_arr.copy(), + coefficient_names=coefficient_names, + ) + # Compute on kept columns only + X_kept = self._X[:, kept] + vcov_kept = self.vcov_[np.ix_(kept, kept)] + deff_kept = compute_deff_diagnostics( + X_kept, self.residuals_, vcov_kept, + self.weights, weight_type=self.weight_type, + ) + # Expand back to full size with NaN for dropped + k = len(self.coefficients_) + full_deff = np.full(k, np.nan) + full_eff_n = np.full(k, np.nan) + full_srs_se = np.full(k, np.nan) + full_survey_se = np.full(k, np.nan) + full_deff[kept] = deff_kept.deff + full_eff_n[kept] = deff_kept.effective_n + full_srs_se[kept] = deff_kept.srs_se + full_survey_se[kept] = deff_kept.survey_se + from diff_diff.survey import DEFFDiagnostics + return DEFFDiagnostics( + deff=full_deff, effective_n=full_eff_n, + srs_se=full_srs_se, survey_se=full_survey_se, + coefficient_names=coefficient_names, + ) + + return compute_deff_diagnostics( + self._X, self.residuals_, self.vcov_, + self.weights, weight_type=self.weight_type, + coefficient_names=coefficient_names, + ) + def _check_fitted(self) -> None: """Raise error if model has not been fitted.""" if self.coefficients_ is None: @@ -1859,11 +2053,25 @@ def get_inference( effective_df = df elif self.survey_df_ is not None: effective_df = self.survey_df_ + elif (hasattr(self, 'survey_design') and self.survey_design is not None + and hasattr(self.survey_design, 'uses_replicate_variance') + and 
self.survey_design.uses_replicate_variance): + # Replicate design with undefined df (rank <= 1) — NaN inference + warnings.warn( + "Replicate design has undefined survey d.f. (rank <= 1). " + "Inference fields will be NaN.", + UserWarning, stacklevel=2, + ) + effective_df = 0 # Forces NaN from t-distribution else: effective_df = self.df_ # Warn if df is non-positive and fall back to normal distribution - if effective_df is not None and effective_df <= 0: + # (skip for replicate designs — df=0 is intentional for NaN inference) + _is_replicate = (hasattr(self, 'survey_design') and self.survey_design is not None + and hasattr(self.survey_design, 'uses_replicate_variance') + and self.survey_design.uses_replicate_variance) + if effective_df is not None and effective_df <= 0 and not _is_replicate: import warnings warnings.warn( diff --git a/diff_diff/results.py b/diff_diff/results.py index 1c62b8b7..bf833192 100644 --- a/diff_diff/results.py +++ b/diff_diff/results.py @@ -11,6 +11,41 @@ import pandas as pd +def _format_survey_block(sm, width: int) -> list: + """Format survey design metadata block for summary() output. + + Parameters + ---------- + sm : SurveyMetadata + Survey metadata from results object. + width : int + Total width for separator lines and centering. 
+ """ + label_width = 30 if width >= 80 else 25 + lines = [ + "", + "-" * width, + "Survey Design".center(width), + "-" * width, + f"{'Weight type:':<{label_width}} {sm.weight_type:>10}", + ] + if getattr(sm, "replicate_method", None) is not None: + lines.append(f"{'Replicate method:':<{label_width}} {sm.replicate_method:>10}") + if getattr(sm, "n_replicates", None) is not None: + lines.append(f"{'Replicates:':<{label_width}} {sm.n_replicates:>10}") + else: + if sm.n_strata is not None: + lines.append(f"{'Strata:':<{label_width}} {sm.n_strata:>10}") + if sm.n_psu is not None: + lines.append(f"{'PSU/Cluster:':<{label_width}} {sm.n_psu:>10}") + lines.append(f"{'Effective sample size:':<{label_width}} {sm.effective_n:>10.1f}") + lines.append(f"{'Kish DEFF (weights):':<{label_width}} {sm.design_effect:>10.2f}") + if sm.df_survey is not None: + lines.append(f"{'Survey d.f.:':<{label_width}} {sm.df_survey:>10}") + lines.append("-" * width) + return lines + + @dataclass class DiDResults: """ @@ -103,24 +138,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Add survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "", - "-" * 70, - "Survey Design".center(70), - "-" * 70, - f"{'Weight type:':<25} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<25} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<25} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<25} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<25} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<25} {sm.df_survey:>10}") - lines.append("-" * 70) + lines.extend(_format_survey_block(sm, 70)) # Add inference method info if self.inference_method != "analytical": @@ -405,24 +423,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Add survey design info if self.survey_metadata is not None: sm = 
self.survey_metadata - lines.extend( - [ - "", - "-" * 80, - "Survey Design".center(80), - "-" * 80, - f"{'Weight type:':<25} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<25} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<25} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<25} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<25} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<25} {sm.df_survey:>10}") - lines.append("-" * 80) + lines.extend(_format_survey_block(sm, 80)) # Pre-period effects (parallel trends test) pre_effects = {p: pe for p, pe in self.period_effects.items() if p in self.pre_periods} @@ -740,24 +741,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Add survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "", - "-" * 75, - "Survey Design".center(75), - "-" * 75, - f"{'Weight type:':<25} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<25} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<25} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<25} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<25} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<25} {sm.df_survey:>10}") - lines.append("-" * 75) + lines.extend(_format_survey_block(sm, 75)) lines.extend( [ diff --git a/diff_diff/stacked_did.py b/diff_diff/stacked_did.py index 4ef6a6ee..d923c707 100644 --- a/diff_diff/stacked_did.py +++ b/diff_diff/stacked_did.py @@ -242,6 +242,15 @@ def fit( resolved_survey, survey_weights, survey_weight_type, survey_metadata = ( _resolve_survey_for_fit(survey_design, data, "analytical") ) + # Reject replicate-weight designs — StackedDiD uses + # compute_survey_vcov (TSL) directly without replicate dispatch. 
+ if resolved_survey is not None and resolved_survey.uses_replicate_variance: + raise NotImplementedError( + "StackedDiD does not yet support replicate-weight survey " + "designs. Use CallawaySantAnna for staggered adoption with " + "replicate weights, or use a TSL-based survey design " + "(strata/psu/fpc)." + ) # Reject fweight and aweight — Q-weight composition is ratio-valued # and breaks both frequency-weight (integer) and analytic-weight diff --git a/diff_diff/stacked_did_results.py b/diff_diff/stacked_did_results.py index 631f3817..20bfb65a 100644 --- a/diff_diff/stacked_did_results.py +++ b/diff_diff/stacked_did_results.py @@ -11,7 +11,7 @@ import numpy as np import pandas as pd -from diff_diff.results import _get_significance_stars +from diff_diff.results import _format_survey_block, _get_significance_stars __all__ = [ "StackedDiDResults", @@ -144,23 +144,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Add survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "-" * 85, - "Survey Design".center(85), - "-" * 85, - f"{'Weight type:':<30} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<30} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}") - lines.extend(["-" * 85, ""]) + lines.extend(_format_survey_block(sm, 85)) # Overall ATT lines.extend( diff --git a/diff_diff/staggered.py b/diff_diff/staggered.py index 2c831668..718dc93b 100644 --- a/diff_diff/staggered.py +++ b/diff_diff/staggered.py @@ -379,6 +379,20 @@ def _collapse_survey_to_unit_level(resolved_survey, df, unit_col, all_units): .values ) + # Collapse replicate weights to unit level (same groupby pattern) + 
rep_weights_unit = None + if resolved_survey.replicate_weights is not None: + R = resolved_survey.replicate_weights.shape[1] + rep_weights_unit = np.zeros((n_units, R)) + for r in range(R): + rep_weights_unit[:, r] = ( + pd.Series(resolved_survey.replicate_weights[:, r], index=df.index) + .groupby(df[unit_col]) + .first() + .reindex(all_units) + .values + ) + return ResolvedSurveyDesign( weights=weights_unit.astype(np.float64), weight_type=resolved_survey.weight_type, @@ -388,6 +402,15 @@ def _collapse_survey_to_unit_level(resolved_survey, df, unit_col, all_units): n_strata=resolved_survey.n_strata, n_psu=resolved_survey.n_psu, lonely_psu=resolved_survey.lonely_psu, + replicate_weights=rep_weights_unit, + replicate_method=resolved_survey.replicate_method, + fay_rho=resolved_survey.fay_rho, + n_replicates=resolved_survey.n_replicates, + replicate_strata=resolved_survey.replicate_strata, + combined_weights=resolved_survey.combined_weights, + replicate_scale=resolved_survey.replicate_scale, + replicate_rscales=resolved_survey.replicate_rscales, + mse=resolved_survey.mse, ) def _precompute_structures( @@ -585,6 +608,12 @@ def _compute_att_gt_fast( sw_treated = survey_w[treated_valid] if survey_w is not None else None sw_control = survey_w[control_valid] if survey_w is not None else None + # Guard against zero effective mass after subpopulation filtering + if sw_treated is not None and np.sum(sw_treated) <= 0: + return np.nan, np.nan, 0, 0, None, None + if sw_control is not None and np.sum(sw_control) <= 0: + return np.nan, np.nan, 0, 0, None, None + # Get covariates if specified (from the base period) X_treated = None X_control = None @@ -781,6 +810,9 @@ def _compute_all_att_gt_vectorized( if survey_w is not None: sw_t = survey_w[treated_valid] sw_c = survey_w[control_valid] + # Guard against zero effective mass + if np.sum(sw_t) <= 0 or np.sum(sw_c) <= 0: + continue sw_t_norm = sw_t / np.sum(sw_t) sw_c_norm = sw_c / np.sum(sw_c) mu_t = float(np.sum(sw_t_norm * 
treated_change)) @@ -842,6 +874,12 @@ def _compute_all_att_gt_vectorized( # Batch inference for all (g,t) pairs at once if task_keys: df_survey_val = precomputed.get("df_survey") + # Guard: replicate design with undefined df → NaN inference + if (df_survey_val is None + and precomputed.get("resolved_survey_unit") is not None + and hasattr(precomputed["resolved_survey_unit"], 'uses_replicate_variance') + and precomputed["resolved_survey_unit"].uses_replicate_variance): + df_survey_val = 0 t_stats, p_values, ci_lowers, ci_uppers = safe_inference_batch( np.array(atts), np.array(ses), @@ -1176,8 +1214,16 @@ def _compute_all_att_gt_covariate_reg( # Batch inference if task_keys: + # Use survey df for replicate designs (propagated from precomputed) + _ipw_dr_df = precomputed.get("df_survey") if precomputed is not None else None + # Guard: replicate design with undefined df → NaN inference + if (_ipw_dr_df is None and precomputed is not None + and precomputed.get("resolved_survey_unit") is not None + and hasattr(precomputed["resolved_survey_unit"], 'uses_replicate_variance') + and precomputed["resolved_survey_unit"].uses_replicate_variance): + _ipw_dr_df = 0 t_stats, p_values, ci_lowers, ci_uppers = safe_inference_batch( - np.array(atts), np.array(ses), alpha=self.alpha + np.array(atts), np.array(ses), alpha=self.alpha, df=_ipw_dr_df ) for idx, key in enumerate(task_keys): group_time_effects[key]["t_stat"] = float(t_stats[idx]) @@ -1372,6 +1418,11 @@ def fit( # Survey df for safe_inference calls — use the unit-level resolved # survey df computed in _precompute_structures for consistency. 
df_survey = precomputed.get("df_survey") + # Guard: replicate design with undefined df (rank <= 1) → NaN inference + if (df_survey is None and resolved_survey is not None + and hasattr(resolved_survey, 'uses_replicate_variance') + and resolved_survey.uses_replicate_variance): + df_survey = 0 # Compute ATT(g,t) for each group-time combination min_period = min(time_periods) @@ -1465,6 +1516,17 @@ def fit( overall_att, overall_se = self._aggregate_simple( group_time_effects, influence_func_info, df, unit, precomputed ) + # Re-read df_survey in case replicate aggregation updated it + df_survey = precomputed.get("df_survey") + # Propagate replicate df override to survey_metadata for display consistency + if df_survey is not None and survey_metadata is not None: + if survey_metadata.df_survey != df_survey: + survey_metadata.df_survey = df_survey + # Guard: replicate design with undefined df (rank <= 1) → NaN inference + if (df_survey is None and resolved_survey is not None + and hasattr(resolved_survey, 'uses_replicate_variance') + and resolved_survey.uses_replicate_variance): + df_survey = 0 overall_t, overall_p, overall_ci = safe_inference( overall_att, overall_se, @@ -1498,6 +1560,20 @@ def fit( unit=unit, ) + # Reject replicate-weight designs for bootstrap — replicate variance + # is an analytical alternative, not compatible with bootstrap + if ( + self.n_bootstrap > 0 + and resolved_survey is not None + and hasattr(resolved_survey, "uses_replicate_variance") + and resolved_survey.uses_replicate_variance + ): + raise NotImplementedError( + "CallawaySantAnna bootstrap (n_bootstrap > 0) is not supported " + "with replicate-weight survey designs. Replicate weights provide " + "analytical variance; use n_bootstrap=0 instead." 
+ ) + # Run bootstrap inference if requested bootstrap_results = None if self.n_bootstrap > 0 and influence_func_info: diff --git a/diff_diff/staggered_aggregation.py b/diff_diff/staggered_aggregation.py index b0c18670..3b75bd8b 100644 --- a/diff_diff/staggered_aggregation.py +++ b/diff_diff/staggered_aggregation.py @@ -473,6 +473,17 @@ def _compute_aggregated_se_with_wif( resolved_survey = ( precomputed.get("resolved_survey_unit") if precomputed is not None else None ) + if resolved_survey is not None and hasattr(resolved_survey, "uses_replicate_variance") and resolved_survey.uses_replicate_variance: + from diff_diff.survey import compute_replicate_if_variance + + variance, n_valid_rep = compute_replicate_if_variance(psi_total, resolved_survey) + # Update precomputed survey df to reflect valid replicate count + if precomputed is not None and n_valid_rep < resolved_survey.n_replicates: + precomputed["df_survey"] = n_valid_rep - 1 if n_valid_rep > 1 else None + if np.isnan(variance): + return np.nan + return np.sqrt(max(variance, 0.0)) + if resolved_survey is not None and ( resolved_survey.strata is not None or resolved_survey.psu is not None @@ -606,6 +617,12 @@ def _aggregate_event_study( if not agg_effects_list: return {} df_survey_val = precomputed.get("df_survey") if precomputed is not None else None + # Guard: replicate design with undefined df → NaN inference + if (df_survey_val is None and precomputed is not None + and precomputed.get("resolved_survey_unit") is not None + and hasattr(precomputed["resolved_survey_unit"], 'uses_replicate_variance') + and precomputed["resolved_survey_unit"].uses_replicate_variance): + df_survey_val = 0 t_stats, p_values, ci_lowers, ci_uppers = safe_inference_batch( np.array(agg_effects_list), np.array(agg_ses_list), @@ -699,6 +716,12 @@ def _aggregate_by_group( agg_effects = np.array([x[1] for x in group_data_list]) agg_ses = np.array([x[2] for x in group_data_list]) df_survey_val = precomputed.get("df_survey") if precomputed 
is not None else None + # Guard: replicate design with undefined df → NaN inference + if (df_survey_val is None and precomputed is not None + and precomputed.get("resolved_survey_unit") is not None + and hasattr(precomputed["resolved_survey_unit"], 'uses_replicate_variance') + and precomputed["resolved_survey_unit"].uses_replicate_variance): + df_survey_val = 0 t_stats, p_values, ci_lowers, ci_uppers = safe_inference_batch( agg_effects, agg_ses, diff --git a/diff_diff/staggered_results.py b/diff_diff/staggered_results.py index 7f47dc31..3fea9cc8 100644 --- a/diff_diff/staggered_results.py +++ b/diff_diff/staggered_results.py @@ -11,7 +11,7 @@ import numpy as np import pandas as pd -from diff_diff.results import _get_significance_stars +from diff_diff.results import _format_survey_block, _get_significance_stars if TYPE_CHECKING: from diff_diff.staggered_bootstrap import CSBootstrapResults @@ -165,23 +165,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "-" * 85, - "Survey Design".center(85), - "-" * 85, - f"{'Weight type:':<30} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<30} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}") - lines.extend(["-" * 85, ""]) + lines.extend(_format_survey_block(sm, 85)) # Overall ATT lines.extend( diff --git a/diff_diff/sun_abraham.py b/diff_diff/sun_abraham.py index c1eada45..8ab8a916 100644 --- a/diff_diff/sun_abraham.py +++ b/diff_diff/sun_abraham.py @@ -18,7 +18,7 @@ from diff_diff.bootstrap_utils import compute_effect_bootstrap_stats from diff_diff.linalg import LinearRegression -from 
diff_diff.results import _get_significance_stars +from diff_diff.results import _format_survey_block, _get_significance_stars from diff_diff.utils import ( safe_inference, ) @@ -131,23 +131,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Add survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "-" * 85, - "Survey Design".center(85), - "-" * 85, - f"{'Weight type:':<30} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<30} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}") - lines.extend(["-" * 85, ""]) + lines.extend(_format_survey_block(sm, 85)) # Overall ATT lines.extend( @@ -517,6 +501,18 @@ def fit( if resolved_survey is not None: _validate_unit_constant_survey(data, unit, survey_design) + # Reject replicate-weight designs — SunAbraham's weighted + # within-transformation bakes survey weights into X and y, so + # replicate refits on the already-transformed design are incorrect. + # Full estimator-level replicate refits are not yet implemented. + if resolved_survey is not None and resolved_survey.uses_replicate_variance: + raise NotImplementedError( + "SunAbraham does not yet support replicate-weight survey designs. " + "The weighted within-transformation must be recomputed for each " + "replicate, which requires estimator-level replicate refits. " + "Use a TSL-based survey design (strata/psu/fpc) instead." + ) + # Bootstrap + survey supported via Rao-Wu rescaled bootstrap. # Determine Rao-Wu eligibility from the *original* survey_design # (before cluster-as-PSU injection which adds PSU to weights-only designs). 
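
The `survey.py` changes that follow add BRR/Fay/JK1/JKn replicate weights. For orientation, these are the textbook replicate-variance estimators the methods are named after (scale constants as in R's `survey` package); this standalone sketch is illustrative and is not the library's `compute_replicate_vcov`:

```python
import numpy as np

def replicate_variance(theta_reps, theta_full, method="JK1", fay_rho=0.0):
    # Standard scale constants (cf. R's survey package):
    #   BRR : V = (1/R) * sum_r (theta_r - theta)^2
    #   Fay : V = 1 / (R * (1 - rho)^2) * sum_r (theta_r - theta)^2
    #   JK1 : V = ((R - 1) / R) * sum_r (theta_r - theta)^2
    # This sketch always centers at the full-sample estimate theta_full
    # (the mse-style convention); the diff's `mse` flag presumably toggles
    # between that and centering at the mean of the replicates.
    theta_reps = np.asarray(theta_reps, dtype=np.float64)
    R = len(theta_reps)
    dev2 = np.sum((theta_reps - theta_full) ** 2)
    if method == "BRR":
        return dev2 / R
    if method == "Fay":
        return dev2 / (R * (1.0 - fay_rho) ** 2)
    if method == "JK1":
        return (R - 1) / R * dev2
    raise ValueError(f"unknown replicate method: {method}")
```

This also explains the `df_survey = rank(replicate_weights) - 1` convention noted in the TODO table: with R usable replicate deviations, inference gets roughly R - 1 degrees of freedom, which is why dropped replicates reduce the effective df.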
diff --git a/diff_diff/survey.py b/diff_diff/survey.py
index 76d21b23..a230d867 100644
--- a/diff_diff/survey.py
+++ b/diff_diff/survey.py
@@ -16,7 +16,7 @@
 import warnings
 from dataclasses import dataclass, field, replace
-from typing import Optional, Tuple
+from typing import List, Optional, Tuple
 
 import numpy as np
 import pandas as pd
@@ -62,6 +62,14 @@ class SurveyDesign:
     weight_type: str = "pweight"
     nest: bool = False
     lonely_psu: str = "remove"
+    replicate_weights: Optional[List[str]] = None
+    replicate_method: Optional[str] = None
+    fay_rho: float = 0.0
+    replicate_strata: Optional[List[int]] = None
+    combined_weights: bool = True
+    replicate_scale: Optional[float] = None
+    replicate_rscales: Optional[List[float]] = None
+    mse: bool = False
 
     def __post_init__(self):
         valid_weight_types = {"pweight", "fweight", "aweight"}
@@ -74,6 +82,73 @@ def __post_init__(self):
             raise ValueError(
                 f"lonely_psu must be one of {valid_lonely}, "
                 f"got '{self.lonely_psu}'"
             )
+        # Replicate weight validation
+        valid_rep_methods = {"BRR", "Fay", "JK1", "JKn"}
+        if self.replicate_method is not None:
+            if self.replicate_method not in valid_rep_methods:
+                raise ValueError(
+                    f"replicate_method must be one of {valid_rep_methods}, "
+                    f"got '{self.replicate_method}'"
+                )
+            if self.replicate_weights is None:
+                raise ValueError(
+                    "replicate_weights must be provided when replicate_method is set"
+                )
+        if self.replicate_weights is not None and self.replicate_method is None:
+            raise ValueError(
+                "replicate_method must be provided when replicate_weights is set"
+            )
+        if self.replicate_method == "Fay":
+            if not (0 < self.fay_rho < 1):
+                raise ValueError(
+                    f"fay_rho must be in (0, 1) for Fay's method, got {self.fay_rho}"
+                )
+        elif self.replicate_method is not None and self.fay_rho != 0.0:
+            raise ValueError(
+                f"fay_rho must be 0 for method '{self.replicate_method}', "
+                f"got {self.fay_rho}"
+            )
+        # Replicate weights are mutually exclusive with strata/psu/fpc
+        if self.replicate_weights is not None:
+            if self.strata is not None or self.psu is not None or self.fpc is not None:
+                raise ValueError(
+                    "replicate_weights cannot be combined with strata/psu/fpc. "
+                    "Replicate weights encode the design structure implicitly."
+                )
+            if self.weights is None:
+                raise ValueError(
+                    "Full-sample weights must be provided alongside replicate_weights"
+                )
+            # JKn requires replicate_strata
+            if self.replicate_method == "JKn":
+                if self.replicate_strata is None:
+                    raise ValueError(
+                        "replicate_strata is required for JKn method. Provide a list "
+                        "of stratum assignments (one per replicate weight column)."
+                    )
+                if len(self.replicate_strata) != len(self.replicate_weights):
+                    raise ValueError(
+                        f"replicate_strata length ({len(self.replicate_strata)}) must "
+                        f"match replicate_weights length ({len(self.replicate_weights)})"
+                    )
+        # Validate scale/rscales values and length
+        if self.replicate_scale is not None:
+            if not (np.isfinite(self.replicate_scale) and self.replicate_scale > 0):
+                raise ValueError(
+                    f"replicate_scale must be a positive finite number, "
+                    f"got {self.replicate_scale}"
+                )
+        if self.replicate_rscales is not None and self.replicate_weights is not None:
+            if len(self.replicate_rscales) != len(self.replicate_weights):
+                raise ValueError(
+                    f"replicate_rscales length ({len(self.replicate_rscales)}) must "
+                    f"match replicate_weights length ({len(self.replicate_weights)})"
+                )
+            rscales_arr = np.asarray(self.replicate_rscales, dtype=float)
+            if not np.all(np.isfinite(rscales_arr)):
+                raise ValueError("replicate_rscales must be finite")
+            if np.any(rscales_arr < 0):
+                raise ValueError("replicate_rscales must be non-negative")
 
     def resolve(self, data: pd.DataFrame) -> "ResolvedSurveyDesign":
         """
@@ -102,26 +177,109 @@ def resolve(self, data: pd.DataFrame) -> "ResolvedSurveyDesign":
                 raise ValueError("Weights contain NaN values")
             if np.any(~np.isfinite(raw_weights)):
                 raise ValueError("Weights contain Inf values")
-            if np.any(raw_weights <= 0):
-                raise ValueError("Weights must be strictly positive")
+            if np.any(raw_weights < 0):
+                raise ValueError("Weights must be non-negative")
+            if np.all(raw_weights == 0):
+                raise ValueError(
+                    "All weights are zero. At least one observation must "
+                    "have a positive weight."
+                )
 
-            # fweight validation: must be positive integers
+            # fweight validation: must be non-negative integers
             if self.weight_type == "fweight":
-                fractional = raw_weights - np.round(raw_weights)
-                if np.any(np.abs(fractional) > 1e-10):
-                    raise ValueError(
-                        "Frequency weights (fweight) must be positive integers. "
-                        "Fractional values detected. Use pweight for non-integer weights."
-                    )
+                pos_mask = raw_weights > 0
+                if np.any(pos_mask):
+                    fractional = raw_weights[pos_mask] - np.round(raw_weights[pos_mask])
+                    if np.any(np.abs(fractional) > 1e-10):
+                        raise ValueError(
+                            "Frequency weights (fweight) must be non-negative integers. "
+                            "Fractional values detected. Use pweight for non-integer weights."
+                        )
 
             # Normalize: pweights/aweights to sum=n (mean=1); fweights unchanged
-            if self.weight_type in ("pweight", "aweight"):
+            # Skip normalization for replicate designs — the IF path uses
+            # w_r / w_full ratios that must be on the same raw scale
+            if self.replicate_weights is not None:
+                weights = raw_weights.copy()
+            elif self.weight_type in ("pweight", "aweight"):
                 weights = raw_weights * (n / np.sum(raw_weights))
             else:
                 weights = raw_weights.copy()
         else:
             weights = np.ones(n, dtype=np.float64)
 
+        # --- Replicate weights (short-circuit strata/psu/fpc) ---
+        if self.replicate_weights is not None:
+            rep_cols = self.replicate_weights
+            for col in rep_cols:
+                if col not in data.columns:
+                    raise ValueError(
+                        f"Replicate weight column '{col}' not found in data"
+                    )
+            rep_arr = np.column_stack(
+                [data[col].values.astype(np.float64) for col in rep_cols]
+            )
+            if np.any(np.isnan(rep_arr)):
+                raise ValueError("Replicate weights contain NaN values")
+            if np.any(~np.isfinite(rep_arr)):
+                raise ValueError("Replicate weights contain Inf values")
+            if np.any(rep_arr < 0):
+                raise ValueError("Replicate weights must be non-negative")
+            # Validate combined_weights contract: when True, replicate columns
+            # include the full-sample weight, so w_r > 0 with w_full == 0 is
+            # malformed (observation excluded from full sample but included in
+            # a replicate).
+            combined = (
+                self.combined_weights
+                if self.combined_weights is not None
+                else True
+            )
+            if combined:
+                zero_full = weights == 0
+                if np.any(zero_full):
+                    rep_positive_on_zero = np.any(rep_arr[zero_full] > 0, axis=1)
+                    if np.any(rep_positive_on_zero):
+                        raise ValueError(
+                            "Malformed combined_weights=True design: some "
+                            "replicate columns have positive weight where "
+                            "full-sample weight is zero. Either fix the "
+                            "replicate columns or use combined_weights=False."
+                        )
+            # Do NOT normalize replicate columns — the IF path uses w_r/w_full
+            # ratios that must reflect the true replicate design, not rescaled sums
+            n_rep = rep_arr.shape[1]
+            if n_rep < 2:
+                raise ValueError(
+                    "At least 2 replicate weight columns are required"
+                )
+            return ResolvedSurveyDesign(
+                weights=weights,
+                weight_type=self.weight_type,
+                strata=None,
+                psu=None,
+                fpc=None,
+                n_strata=0,
+                n_psu=0,
+                lonely_psu=self.lonely_psu,
+                replicate_weights=rep_arr,
+                replicate_method=self.replicate_method,
+                fay_rho=self.fay_rho,
+                n_replicates=n_rep,
+                replicate_strata=(
+                    np.asarray(self.replicate_strata, dtype=int)
+                    if self.replicate_strata is not None
+                    else None
+                ),
+                combined_weights=self.combined_weights,
+                replicate_scale=self.replicate_scale,
+                replicate_rscales=(
+                    np.asarray(self.replicate_rscales, dtype=float)
+                    if self.replicate_rscales is not None
+                    else None
+                ),
+                mse=self.mse,
+            )
+
         # --- Strata ---
         strata_arr = None
         n_strata = 0
@@ -273,6 +431,144 @@ def resolve(self, data: pd.DataFrame) -> "ResolvedSurveyDesign":
             lonely_psu=self.lonely_psu,
         )
 
+    def subpopulation(
+        self,
+        data: pd.DataFrame,
+        mask,
+    ) -> Tuple["SurveyDesign", pd.DataFrame]:
+        """Create a subpopulation design by zeroing out excluded observations.
+
+        Preserves the full survey design structure (strata, PSU) while setting
+        weights to zero for observations outside the subpopulation. This is
+        the correct approach for subpopulation analysis — unlike naive
+        subsetting, it retains design information for variance estimation.
+
+        Parameters
+        ----------
+        mask : array-like of bool, str, or callable
+            Defines the subpopulation:
+            - bool array/Series of length ``len(data)`` — True = included
+            - str — column name in ``data`` containing boolean values
+            - callable — applied to ``data``, must return bool array
+
+        Returns
+        -------
+        (SurveyDesign, pd.DataFrame)
+            A new SurveyDesign pointing to a ``_subpop_weight`` column in the
+            returned DataFrame copy. The pair should be used together: pass
+            the returned DataFrame to ``fit()`` with the returned SurveyDesign.
+        """
+        # Resolve mask to boolean array
+        if callable(mask):
+            raw_mask = np.asarray(mask(data))
+        elif isinstance(mask, str):
+            if mask not in data.columns:
+                raise ValueError(f"Mask column '{mask}' not found in data")
+            raw_mask = np.asarray(data[mask].values)
+        else:
+            raw_mask = np.asarray(mask)
+
+        # Validate: reject pd.NA/pd.NaT/None before bool coercion
+        try:
+            if pd.isna(raw_mask).any():
+                raise ValueError(
+                    "Subpopulation mask contains NA/missing values. "
+                    "Provide a boolean mask with no missing values."
+                )
+        except (TypeError, ValueError) as e:
+            if "NA/missing" in str(e):
+                raise
+            # pd.isna can't handle some dtypes — fall through to specific checks
+        if raw_mask.dtype.kind == 'f' and np.any(np.isnan(raw_mask)):
+            raise ValueError(
+                "Subpopulation mask contains NaN values. "
+                "Provide a boolean mask with no missing values."
+            )
+        if hasattr(raw_mask, 'dtype') and raw_mask.dtype == object:
+            # Check for None values (pd.NA, None, etc.)
+            if any(v is None for v in raw_mask):
+                raise ValueError(
+                    "Subpopulation mask contains None/NA values. "
+                    "Provide a boolean mask with no missing values."
+                )
+            # Reject string/object masks — non-empty strings coerce to True
+            # which silently defines the wrong domain
+            if any(isinstance(v, str) for v in raw_mask):
+                raise ValueError(
+                    "Subpopulation mask has object dtype with string values. "
+                    "Provide a boolean or numeric (0/1) mask, not strings."
+                )
+        if hasattr(raw_mask, 'dtype') and raw_mask.dtype.kind in ('U', 'S'):
+            raise ValueError(
+                "Subpopulation mask contains string values. "
+                "Provide a boolean or numeric (0/1) mask."
+            )
+        # Validate numeric masks: only {0, 1} allowed (not {1, 2}, etc.)
+        if hasattr(raw_mask, 'dtype') and raw_mask.dtype.kind in ('i', 'u', 'f'):
+            unique_vals = set(np.unique(raw_mask[np.isfinite(raw_mask)]).tolist())
+            if not unique_vals.issubset({0, 1, 0.0, 1.0, True, False}):
+                raise ValueError(
+                    f"Subpopulation mask contains non-binary numeric values "
+                    f"{unique_vals - {0, 1, 0.0, 1.0}}. "
+                    f"Provide a boolean or numeric (0/1) mask."
+                )
+        mask_arr = raw_mask.astype(bool)
+
+        if len(mask_arr) != len(data):
+            raise ValueError(
+                f"Mask length ({len(mask_arr)}) does not match data "
+                f"length ({len(data)})"
+            )
+
+        if not np.any(mask_arr):
+            raise ValueError(
+                "Subpopulation mask excludes all observations. "
+                "At least one observation must be included."
+            )
+
+        # Build subpopulation weights
+        if self.weights is not None:
+            if self.weights not in data.columns:
+                raise ValueError(
+                    f"Weight column '{self.weights}' not found in data"
+                )
+            base_weights = data[self.weights].values.astype(np.float64)
+        else:
+            base_weights = np.ones(len(data), dtype=np.float64)
+
+        subpop_weights = np.where(mask_arr, base_weights, 0.0)
+
+        # Create data copy with synthetic weight column
+        data_out = data.copy()
+        data_out["_subpop_weight"] = subpop_weights
+
+        # Zero out replicate weight columns for excluded observations
+        if self.replicate_weights is not None:
+            for col in self.replicate_weights:
+                if col in data.columns:
+                    data_out[col] = np.where(mask_arr, data[col].values, 0.0)
+
+        # Return new SurveyDesign using the synthetic column
+        new_design = SurveyDesign(
+            weights="_subpop_weight",
+            strata=self.strata,
+            psu=self.psu,
+            fpc=self.fpc,
+            weight_type=self.weight_type,
+            nest=self.nest,
+            lonely_psu=self.lonely_psu,
+            replicate_weights=self.replicate_weights,
+            replicate_method=self.replicate_method,
+            fay_rho=self.fay_rho,
+            replicate_strata=self.replicate_strata,
+            combined_weights=self.combined_weights,
+            replicate_scale=self.replicate_scale,
+            replicate_rscales=self.replicate_rscales,
+            mse=self.mse,
+        )
+
+        return new_design, data_out
+
 
 @dataclass
 class ResolvedSurveyDesign:
@@ -290,10 +586,50 @@ class ResolvedSurveyDesign:
     n_strata: int
     n_psu: int
     lonely_psu: str
+    replicate_weights: Optional[np.ndarray] = None  # (n, R) array
+    replicate_method: Optional[str] = None
+    fay_rho: float = 0.0
+    n_replicates: int = 0
+    replicate_strata: Optional[np.ndarray] = None  # (R,) for JKn
+    combined_weights: bool = True
+    replicate_scale: Optional[float] = None
+    replicate_rscales: Optional[np.ndarray] = None  # (R,) per-replicate scales
+    mse: bool = False
+
+    @property
+    def uses_replicate_variance(self) -> bool:
+        """Whether replicate-based variance should be used instead of TSL."""
+        return self.replicate_method is not None
 
     @property
     def df_survey(self) -> Optional[int]:
-        """Survey degrees of freedom: n_PSU - n_strata."""
+        """Survey degrees of freedom.
+
+        For replicate designs: QR-rank of the analysis-weight matrix minus 1,
+        matching R's ``survey::degf()`` which uses ``qr(..., tol=1e-5)$rank``.
+        Returns ``None`` when rank <= 1 (insufficient for t-based inference).
+        For TSL: n_PSU - n_strata.
+        """
+        if self.uses_replicate_variance:
+            if self.replicate_weights is None or self.n_replicates < 2:
+                return None
+            # QR-rank of analysis-weight matrix, matching R's survey::degf()
+            # which uses qr(weights(design, "analysis"), tol=1e-5)$rank.
+            # For combined_weights=True, replicate cols ARE analysis weights.
+            # For combined_weights=False, analysis weights = rep * full-sample.
+            if self.combined_weights:
+                analysis_weights = self.replicate_weights
+            else:
+                analysis_weights = self.replicate_weights * self.weights[:, np.newaxis]
+            # Pivoted QR with R-compatible tolerance, matching R's
+            # qr(..., tol=1e-5) which uses column pivoting (LAPACK dgeqp3)
+            from scipy.linalg import qr as scipy_qr
+            _, R_mat, _ = scipy_qr(analysis_weights, pivoting=True, mode='economic')
+            diag_abs = np.abs(np.diag(R_mat))
+            tol = 1e-5
+            rank = int(np.sum(diag_abs > tol * diag_abs.max())) if diag_abs.max() > 0 else 0
+            df = rank - 1
+            return df if df > 0 else None
         if self.psu is not None and self.n_psu > 0:
             if self.strata is not None and self.n_strata > 0:
                 return self.n_psu - self.n_strata
@@ -304,6 +640,52 @@ def df_survey(self) -> Optional[int]:
             return n_obs - self.n_strata
         return n_obs - 1
 
+    def subset_to_units(
+        self,
+        row_idx: np.ndarray,
+        weights: np.ndarray,
+        strata: Optional[np.ndarray],
+        psu: Optional[np.ndarray],
+        fpc: Optional[np.ndarray],
+        n_strata: int,
+        n_psu: int,
+    ) -> "ResolvedSurveyDesign":
+        """Create a unit-level copy preserving replicate metadata.
+
+        Used by panel estimators (ContinuousDiD, EfficientDiD) that collapse
+        panel-level survey info to one row per unit.
+
+        Parameters
+        ----------
+        row_idx : np.ndarray
+            Indices into the panel-level arrays to select one row per unit.
+        weights, strata, psu, fpc, n_strata, n_psu
+            Already-subsetted TSL fields (computed by the caller).
+        """
+        rep_weights_sub = None
+        if self.replicate_weights is not None:
+            rep_weights_sub = self.replicate_weights[row_idx, :]
+
+        return ResolvedSurveyDesign(
+            weights=weights,
+            weight_type=self.weight_type,
+            strata=strata,
+            psu=psu,
+            fpc=fpc,
+            n_strata=n_strata,
+            n_psu=n_psu,
+            lonely_psu=self.lonely_psu,
+            replicate_weights=rep_weights_sub,
+            replicate_method=self.replicate_method,
+            fay_rho=self.fay_rho,
+            n_replicates=self.n_replicates,
+            replicate_strata=self.replicate_strata,
+            combined_weights=self.combined_weights,
+            replicate_scale=self.replicate_scale,
+            replicate_rscales=self.replicate_rscales,
+            mse=self.mse,
+        )
+
     @property
     def needs_survey_vcov(self) -> bool:
         """Whether survey vcov (not generic sandwich) should be used."""
@@ -343,6 +725,9 @@ class SurveyMetadata:
     n_psu: Optional[int] = None
     weight_range: Tuple[float, float] = field(default=(0.0, 0.0))
     df_survey: Optional[int] = None
+    replicate_method: Optional[str] = None
+    n_replicates: Optional[int] = None
+    deff_diagnostics: Optional["DEFFDiagnostics"] = None
 
 
 def compute_survey_metadata(
@@ -371,13 +756,21 @@ def compute_survey_metadata(
     design_effect = n * sum_w2 / (sum_w**2) if sum_w > 0 else 1.0
 
     n_strata = resolved.n_strata if resolved.strata is not None else None
-    if resolved.psu is not None:
+    if resolved.uses_replicate_variance:
+        # Replicate designs don't have meaningful PSU/strata counts
+        n_psu = None
+        n_strata = None
+    elif resolved.psu is not None:
         n_psu = resolved.n_psu
     else:
         # Implicit PSU: each observation is its own PSU
         n_psu = len(resolved.weights)
 
     df_survey = resolved.df_survey
+    # Replicate info
+    rep_method = resolved.replicate_method if resolved.uses_replicate_variance else None
+    n_rep = resolved.n_replicates if resolved.uses_replicate_variance else None
+
     return SurveyMetadata(
         weight_type=resolved.weight_type,
         effective_n=effective_n,
@@ -387,6 +780,101 @@
         n_psu=n_psu,
         weight_range=(float(np.min(raw_weights)), float(np.max(raw_weights))),
         df_survey=df_survey,
+        replicate_method=rep_method,
+        n_replicates=n_rep,
+    )
+
+
+@dataclass
+class DEFFDiagnostics:
+    """Per-coefficient design effect diagnostics.
+
+    Compares survey-design variance to simple random sampling (SRS)
+    variance for each coefficient, giving the variance inflation factor
+    due to the survey design (clustering, stratification, weighting).
+
+    Attributes
+    ----------
+    deff : np.ndarray
+        Per-coefficient DEFF: survey_var / srs_var. Shape (k,).
+    effective_n : np.ndarray
+        Effective sample size per coefficient: n / DEFF. Shape (k,).
+    srs_se : np.ndarray
+        SRS (HC1) standard errors. Shape (k,).
+    survey_se : np.ndarray
+        Survey standard errors. Shape (k,).
+    coefficient_names : list of str or None
+        Names for display.
+    """
+
+    deff: np.ndarray
+    effective_n: np.ndarray
+    srs_se: np.ndarray
+    survey_se: np.ndarray
+    coefficient_names: Optional[List[str]] = None
+
+
+def compute_deff_diagnostics(
+    X: np.ndarray,
+    residuals: np.ndarray,
+    survey_vcov: np.ndarray,
+    weights: np.ndarray,
+    weight_type: str = "pweight",
+    coefficient_names: Optional[List[str]] = None,
+) -> DEFFDiagnostics:
+    """Compute per-coefficient design effects.
+
+    Compares the survey variance-covariance matrix to a simple random
+    sampling (SRS) baseline (HC1 sandwich, ignoring strata/PSU/FPC).
+
+    Parameters
+    ----------
+    X : np.ndarray
+        Design matrix of shape (n, k).
+    residuals : np.ndarray
+        Residuals from the WLS fit, shape (n,).
+    survey_vcov : np.ndarray
+        Survey variance-covariance matrix, shape (k, k).
+    weights : np.ndarray
+        Observation weights (normalized), shape (n,).
+    weight_type : str, default "pweight"
+        Weight type for SRS computation.
+    coefficient_names : list of str, optional
+        Names for display.
+
+    Returns
+    -------
+    DEFFDiagnostics
+    """
+    from diff_diff.linalg import compute_robust_vcov
+
+    n = X.shape[0]
+    # Use positive-weight count for effective n (zero-weight rows from
+    # subpopulation don't contribute to the effective sample)
+    n_eff = int(np.count_nonzero(weights > 0)) if np.any(weights == 0) else n
+
+    # SRS baseline: HC1 weighted sandwich ignoring design structure
+    srs_vcov = compute_robust_vcov(
+        X, residuals, cluster_ids=None, weights=weights, weight_type=weight_type,
+    )
+
+    survey_var = np.diag(survey_vcov)
+    srs_var = np.diag(srs_vcov)
+
+    # DEFF = survey_var / srs_var
+    with np.errstate(divide="ignore", invalid="ignore"):
+        deff = np.where(srs_var > 0, survey_var / srs_var, np.nan)
+        eff_n = np.where(deff > 0, n_eff / deff, np.nan)
+
+    survey_se = np.sqrt(np.maximum(survey_var, 0.0))
+    srs_se = np.sqrt(np.maximum(srs_var, 0.0))
+
+    return DEFFDiagnostics(
+        deff=deff,
+        effective_n=eff_n,
+        srs_se=srs_se,
+        survey_se=survey_se,
+        coefficient_names=coefficient_names,
+    )
 
 
@@ -417,6 +905,9 @@ def _validate_unit_constant_survey(data, unit_col, survey_design):
         survey_design.psu,
         survey_design.fpc,
     ]
+    # Also validate replicate weight columns for within-unit constancy
+    if survey_design.replicate_weights is not None:
+        cols_to_check.extend(survey_design.replicate_weights)
     for col in cols_to_check:
         if col is not None and col in data.columns:
             n_unique = data.groupby(unit_col)[col].nunique()
@@ -760,6 +1251,9 @@ def compute_survey_vcov(
     # Compute weighted scores per observation: w_i * X_i * u_i
     if resolved.weight_type == "aweight":
         scores = X * residuals[:, np.newaxis]
+        # Zero-weight observations should not contribute to aweight meat
+        if np.any(weights == 0):
+            scores[weights == 0] = 0.0
     else:
         scores = X * (weights * residuals)[:, np.newaxis]
 
@@ -833,6 +1327,292 @@ def compute_survey_if_variance(
     return meat_scalar
 
 
+def _replicate_variance_factor(
+    method: str, n_replicates: int, fay_rho: float,
+) -> float:
+    """Compute the scalar variance factor for replicate methods."""
+    if method == "BRR":
+        return 1.0 / n_replicates
+    elif method == "Fay":
+        return 1.0 / (n_replicates * (1.0 - fay_rho) ** 2)
+    elif method == "JK1":
+        return (n_replicates - 1.0) / n_replicates
+    # JKn handled separately (per-stratum factors)
+    raise ValueError(f"Unknown replicate method: {method}")
+
+
+def compute_replicate_vcov(
+    X: np.ndarray,
+    y: np.ndarray,
+    full_sample_coef: np.ndarray,
+    resolved: "ResolvedSurveyDesign",
+    weight_type: str = "pweight",
+) -> Tuple[np.ndarray, int]:
+    """Compute replicate-weight variance-covariance matrix.
+
+    Re-runs WLS for each replicate weight column and computes variance
+    from the distribution of replicate coefficient vectors.
+
+    Parameters
+    ----------
+    X : np.ndarray
+        Design matrix of shape (n, k).
+    y : np.ndarray
+        Response vector of shape (n,).
+    full_sample_coef : np.ndarray
+        Coefficients from the full-sample fit, shape (k,).
+    resolved : ResolvedSurveyDesign
+        Must have ``uses_replicate_variance == True``.
+    weight_type : str, default "pweight"
+        Weight type for per-replicate WLS.
+
+    Returns
+    -------
+    (np.ndarray, int)
+        Variance-covariance matrix of shape (k, k) and the number of
+        valid replicate solves it was computed from.
+ """ + from diff_diff.linalg import solve_ols + + rep_weights = resolved.replicate_weights + method = resolved.replicate_method + R = resolved.n_replicates + k = X.shape[1] + + # Collect replicate coefficient vectors + coef_reps = np.full((R, k), np.nan) + for r in range(R): + w_r = rep_weights[:, r] + # For non-combined weights, multiply by full-sample weights + if not resolved.combined_weights: + w_r = w_r * resolved.weights + # Skip replicates where all weights are zero + if np.sum(w_r) == 0: + continue + try: + coef_r, _, _ = solve_ols( + X, y, + weights=w_r, + weight_type=weight_type, + rank_deficient_action="silent", + return_vcov=False, + check_finite=False, + ) + coef_reps[r] = coef_r + except (np.linalg.LinAlgError, ValueError): + pass # NaN row for singular/degenerate replicate solve + + # Remove replicates with NaN coefficients + valid = np.all(np.isfinite(coef_reps), axis=1) + n_invalid = int(R - np.sum(valid)) + if n_invalid > 0: + warnings.warn( + f"{n_invalid} of {R} replicate solves failed (singular or degenerate). " + f"Variance computed from {int(np.sum(valid))} valid replicates.", + UserWarning, + stacklevel=2, + ) + n_valid = int(np.sum(valid)) + if n_valid < 2: + if n_valid == 0: + warnings.warn( + "All replicate solves failed. Returning NaN variance.", + UserWarning, stacklevel=2, + ) + else: + warnings.warn( + f"Only {n_valid} valid replicate(s) — variance is not estimable " + f"with fewer than 2. 
Returning NaN.", + UserWarning, stacklevel=2, + ) + return np.full((k, k), np.nan), n_valid + coef_valid = coef_reps[valid] + c = full_sample_coef + + # Compute variance by method + # Support mse=False: center on replicate mean instead of full-sample estimate + # When rscales present and mse=False, center only over rscales > 0 + # (R's svrVar convention — zero-scaled replicates should not shift center) + if resolved.mse: + center = c + else: + if resolved.replicate_rscales is not None: + pos_scale = resolved.replicate_rscales[valid] > 0 + if np.any(pos_scale): + center = np.mean(coef_valid[pos_scale], axis=0) + else: + center = np.mean(coef_valid, axis=0) + else: + center = np.mean(coef_valid, axis=0) + diffs = coef_valid - center[np.newaxis, :] + + outer_sum = diffs.T @ diffs # (k, k) + + # BRR/Fay: use fixed scaling, ignore user-supplied scale/rscales (R convention) + if method in ("BRR", "Fay"): + if resolved.replicate_scale is not None or resolved.replicate_rscales is not None: + warnings.warn( + f"Custom replicate_scale/replicate_rscales ignored for {method} " + f"(BRR/Fay use fixed scaling).", + UserWarning, stacklevel=2, + ) + factor = _replicate_variance_factor(method, R, resolved.fay_rho) + return factor * outer_sum, n_valid + + # JK1/JKn: apply scale * rscales multiplicatively (R's svrVar contract) + scale = resolved.replicate_scale if resolved.replicate_scale is not None else 1.0 + + if resolved.replicate_rscales is not None: + valid_rscales = resolved.replicate_rscales[valid] + V = np.zeros((k, k)) + for i in range(len(diffs)): + V += valid_rscales[i] * np.outer(diffs[i], diffs[i]) + return scale * V, n_valid + + if method == "JK1": + factor = _replicate_variance_factor(method, R, resolved.fay_rho) + return scale * factor * outer_sum, n_valid + elif method == "JKn": + # JKn: V = sum_h ((n_h-1)/n_h) * sum_{r in h} (c_r - c)(c_r - c)^T + rep_strata = resolved.replicate_strata + if rep_strata is None: + raise ValueError("JKn requires replicate_strata") + 
valid_strata = rep_strata[valid] + V = np.zeros((k, k)) + for h in np.unique(rep_strata): + n_h_original = int(np.sum(rep_strata == h)) + mask_h = valid_strata == h + if not np.any(mask_h): + continue + diffs_h = diffs[mask_h] + V += ((n_h_original - 1.0) / n_h_original) * (diffs_h.T @ diffs_h) + return scale * V, n_valid + else: + raise ValueError(f"Unknown replicate method: {method}") + + +def compute_replicate_if_variance( + psi: np.ndarray, + resolved: "ResolvedSurveyDesign", +) -> Tuple[float, int]: + """Compute replicate-based variance for influence-function estimators. + + Instead of re-running the full estimator, reweights the influence + function under each replicate weight set. + + Parameters + ---------- + psi : np.ndarray + Per-unit influence function values, shape (n,). + resolved : ResolvedSurveyDesign + Must have ``uses_replicate_variance == True``. + + Returns + ------- + float + Replicate-based variance estimate. + """ + psi = np.asarray(psi, dtype=np.float64).ravel() + rep_weights = resolved.replicate_weights + method = resolved.replicate_method + R = resolved.n_replicates + + # Match the contract of compute_survey_if_variance(): psi is accepted + # as-is (the combined IF/WIF object), with NO extra weight multiplication. + # Replicate contrasts are formed by rescaling each unit's contribution + # by the ratio w_r/w_full (Rao-Wu reweighting). + full_weights = resolved.weights + theta_full = float(np.sum(psi)) + + # Validate: combined_weights=True requires w_full > 0 wherever w_r > 0 + if resolved.combined_weights: + for r in range(R): + bad = (rep_weights[:, r] > 0) & (full_weights <= 0) + if np.any(bad): + raise ValueError( + f"Replicate column {r} has positive weight where full-sample " + f"weight is zero. With combined_weights=True, every " + f"replicate-positive observation must have a positive " + f"full-sample weight." 
+                )
+
+    # Compute replicate estimates via weight-ratio rescaling
+    theta_reps = np.full(R, np.nan)
+    for r in range(R):
+        w_r = rep_weights[:, r]
+        if np.any(w_r > 0):
+            if resolved.combined_weights:
+                # Combined: w_r already includes full-sample weight
+                ratio = np.divide(
+                    w_r, full_weights,
+                    out=np.zeros_like(w_r, dtype=np.float64),
+                    where=full_weights > 0,
+                )
+            else:
+                # Non-combined: w_r is perturbation factor directly
+                ratio = w_r
+            theta_reps[r] = np.sum(ratio * psi)
+
+    valid = np.isfinite(theta_reps)
+    n_valid = int(np.sum(valid))
+    if n_valid < 2:
+        return np.nan, n_valid
+
+    # Support mse=False: center on replicate mean
+    # When rscales present and mse=False, center only over rscales > 0
+    # (R's svrVar convention — zero-scaled replicates should not shift center)
+    if resolved.mse:
+        center = theta_full
+    else:
+        if resolved.replicate_rscales is not None:
+            pos_scale = resolved.replicate_rscales[valid] > 0
+            if np.any(pos_scale):
+                center = float(np.mean(theta_reps[valid][pos_scale]))
+            else:
+                center = float(np.mean(theta_reps[valid]))
+        else:
+            center = float(np.mean(theta_reps[valid]))
+    diffs = theta_reps[valid] - center
+
+    ss = float(np.sum(diffs**2))
+
+    # BRR/Fay: use fixed scaling, ignore user-supplied scale/rscales (R convention)
+    if method in ("BRR", "Fay"):
+        if resolved.replicate_scale is not None or resolved.replicate_rscales is not None:
+            warnings.warn(
+                f"Custom replicate_scale/replicate_rscales ignored for {method} "
+                f"(BRR/Fay use fixed scaling).",
+                UserWarning, stacklevel=2,
+            )
+        factor = _replicate_variance_factor(method, R, resolved.fay_rho)
+        return factor * ss, n_valid
+
+    # JK1/JKn: apply scale * rscales multiplicatively (R's svrVar contract)
+    scale = resolved.replicate_scale if resolved.replicate_scale is not None else 1.0
+
+    if resolved.replicate_rscales is not None:
+        valid_rscales = resolved.replicate_rscales[valid]
+        return scale * float(np.sum(valid_rscales * diffs**2)), n_valid
+
+    if method == "JK1":
+        factor = _replicate_variance_factor(method, R, resolved.fay_rho)
+        return scale * factor * ss, n_valid
+    elif method == "JKn":
+        rep_strata = resolved.replicate_strata
+        if rep_strata is None:
+            raise ValueError("JKn requires replicate_strata")
+        valid_strata = rep_strata[valid]
+        result = 0.0
+        for h in np.unique(rep_strata):
+            n_h_original = int(np.sum(rep_strata == h))
+            mask_h = valid_strata == h
+            if not np.any(mask_h):
+                continue
+            result += ((n_h_original - 1.0) / n_h_original) * float(np.sum(diffs[mask_h] ** 2))
+        return scale * result, n_valid
+    else:
+        raise ValueError(f"Unknown replicate method: {method}")
+
+
 def aggregate_to_psu(
     values: np.ndarray,
     resolved: "ResolvedSurveyDesign",
diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py
index cea1e481..43f10460 100644
--- a/diff_diff/synthetic_did.py
+++ b/diff_diff/synthetic_did.py
@@ -256,6 +256,12 @@ def fit(  # type: ignore[override]
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, "analytical")
         )
+        # Reject replicate-weight designs — SyntheticDiD uses bootstrap variance
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "SyntheticDiD does not yet support replicate-weight survey "
+                "designs. Use a TSL-based survey design (strata/psu/fpc)."
+            )
         # Validate pweight only (strata/PSU/FPC are allowed for Rao-Wu bootstrap)
         if resolved_survey is not None and resolved_survey.weight_type != "pweight":
             raise ValueError(
diff --git a/diff_diff/triple_diff.py b/diff_diff/triple_diff.py
index 5d9a70ea..6e9b6eb9 100644
--- a/diff_diff/triple_diff.py
+++ b/diff_diff/triple_diff.py
@@ -30,7 +30,7 @@
 import pandas as pd
 
 from diff_diff.linalg import solve_logit, solve_ols
-from diff_diff.results import _get_significance_stars
+from diff_diff.results import _format_survey_block, _get_significance_stars
 from diff_diff.utils import safe_inference
 
 _MIN_CELL_SIZE = 10
@@ -152,24 +152,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "",
-                    "-" * 75,
-                    "Survey Design".center(75),
-                    "-" * 75,
-                    f"{'Weight type:':<30} {sm.weight_type:>15}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<30} {sm.n_strata:>15}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>15}")
-            lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>15.1f}")
-            lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>15.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>15}")
-            lines.append("-" * 75)
+            lines.extend(_format_survey_block(sm, 75))
 
         if self.inference_method != "analytical":
             lines.append(f"{'Inference method:':<30} {self.inference_method:>15}")
@@ -495,6 +478,9 @@ def fit(
         NotImplementedError
             If survey_design is used with wild_bootstrap inference.
""" + # Reset replicate state from any previous fit + self._replicate_n_valid = None + # Resolve survey design if provided from diff_diff.survey import ( _inject_cluster_as_psu, @@ -595,7 +581,21 @@ def fit( # Compute inference # When survey design is active, use survey df (n_PSU - n_strata) if survey_metadata is not None and survey_metadata.df_survey is not None: - df = max(survey_metadata.df_survey, 1) + df = survey_metadata.df_survey + # Override with effective replicate df only when replicates were dropped + if (hasattr(self, '_replicate_n_valid') and self._replicate_n_valid is not None + and resolved_survey is not None + and self._replicate_n_valid < resolved_survey.n_replicates): + df = self._replicate_n_valid - 1 + survey_metadata.df_survey = self._replicate_n_valid - 1 + # df <= 0 means insufficient rank for t-based inference + if df is not None and df <= 0: + df = 0 # Forces NaN from t-distribution + elif (resolved_survey is not None + and hasattr(resolved_survey, 'uses_replicate_variance') + and resolved_survey.uses_replicate_variance): + # Replicate design with undefined df (rank <= 1) — NaN inference + df = 0 # Forces NaN from t-distribution else: df = n_obs - 8 # Approximate df (8 cell means) if covariates: @@ -721,7 +721,14 @@ def _compute_cell_means( mask = (G == g_val) & (P == p_val) & (T == t_val) cell_name = f"{g_name}, {p_name}, {t_name}" if weights is not None: - means[cell_name] = float(np.average(y[mask], weights=weights[mask])) + w_cell = weights[mask] + if np.sum(w_cell) <= 0: + raise ValueError( + f"Cell '{cell_name}' has zero effective survey " + f"weight. Cannot compute weighted cell mean. " + f"Check subpopulation/domain definition." 
+                        )
+                    means[cell_name] = float(np.average(y[mask], weights=w_cell))
                 else:
                     means[cell_name] = float(np.mean(y[mask]))
         return means
@@ -1089,15 +1096,32 @@ def _estimate_ddd_decomposition(
             # For reg: pairwise IFs are already on the unweighted scale (WLS
             # fits use weights but the IF is residual-based, not Riesz-weighted),
             # so pass directly to TSL without de-weighting.
-            from diff_diff.survey import compute_survey_vcov
-
             inf_for_tsl = inf_func.copy()
             if est_method in ("ipw", "dr") and survey_weights is not None:
                 sw = survey_weights
                 nz = sw > 0
                 inf_for_tsl[nz] = inf_for_tsl[nz] / sw[nz]
-            vcov_survey = compute_survey_vcov(np.ones((n, 1)), inf_for_tsl, resolved_survey)
-            se = float(np.sqrt(vcov_survey[0, 0]))
+
+            if resolved_survey.uses_replicate_variance:
+                from diff_diff.survey import compute_replicate_if_variance
+
+                # Score-scale to match TSL bread (1/sum(w)^2):
+                #   reg:    psi = w * inf_func / sum(w)
+                #   ipw/dr: psi = inf_func / sum(w)  (survey weights cancel)
+                w_sum = float(resolved_survey.weights.sum())
+                if est_method in ("ipw", "dr") and survey_weights is not None:
+                    psi_rep = inf_func / w_sum
+                else:
+                    psi_rep = resolved_survey.weights * inf_func / w_sum
+                variance, n_valid_rep = compute_replicate_if_variance(psi_rep, resolved_survey)
+                se = float(np.sqrt(max(variance, 0.0))) if np.isfinite(variance) else np.nan
+                # Store effective replicate count for df update in fit()
+                self._replicate_n_valid = n_valid_rep
+            else:
+                from diff_diff.survey import compute_survey_vcov
+
+                vcov_survey = compute_survey_vcov(np.ones((n, 1)), inf_for_tsl, resolved_survey)
+                se = float(np.sqrt(vcov_survey[0, 0]))
         elif self._cluster_ids is not None:
             # Cluster-robust SE: sum IF within clusters, then Liang-Zeger variance
             unique_clusters = np.unique(self._cluster_ids)
@@ -1187,6 +1211,14 @@ def _fit_predict_mu(
         if weights is not None:
             # WLS: transform by sqrt(weights) for weighted least squares
             w_fit = weights[fit_mask]
+            # Check positive-weight effective sample — subpopulation() can
+            # zero out weights, leaving rows in place but the WLS underidentified.
+            n_eff = int(np.count_nonzero(w_fit > 0))
+            n_cols = X_fit.shape[1]
+            if n_eff <= n_cols:
+                if np.sum(w_fit) > 0:
+                    return np.full(n_total, np.average(y_fit, weights=w_fit))
+                return np.full(n_total, np.mean(y_fit))
             sqrt_w = np.sqrt(w_fit)
             beta, _, _ = solve_ols(
                 X_fit * sqrt_w[:, np.newaxis],
diff --git a/diff_diff/trop.py b/diff_diff/trop.py
index c9ab4286..4d26a69f 100644
--- a/diff_diff/trop.py
+++ b/diff_diff/trop.py
@@ -461,6 +461,12 @@ def fit(
         resolved_survey, _survey_weights, _survey_wt, survey_metadata = _resolve_survey_for_fit(
             survey_design, data, "analytical"
         )
+        # Reject replicate-weight designs — TROP uses Rao-Wu bootstrap
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "TROP does not yet support replicate-weight survey designs. "
+                "Use a TSL-based survey design (strata/psu/fpc)."
+            )
         # Validate weight_type is pweight (keep restriction), but allow
         # strata/PSU/FPC — those are handled via Rao-Wu rescaled bootstrap.
        if resolved_survey is not None and resolved_survey.weight_type != "pweight":
diff --git a/diff_diff/trop_results.py b/diff_diff/trop_results.py
index 94c38657..2123499f 100644
--- a/diff_diff/trop_results.py
+++ b/diff_diff/trop_results.py
@@ -16,7 +16,7 @@ except ImportError:
     from typing_extensions import TypedDict
 
-from diff_diff.results import _get_significance_stars
+from diff_diff.results import _format_survey_block, _get_significance_stars
 
 __all__ = [
     "_LAMBDA_INF",
@@ -208,24 +208,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "",
-                    "-" * 75,
-                    "Survey Design".center(75),
-                    "-" * 75,
-                    f"{'Weight type:':<25} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<25} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<25} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<25} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<25} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<25} {sm.df_survey:>10}")
-            lines.append("-" * 75)
+            lines.extend(_format_survey_block(sm, 75))
 
         lines.extend(
             [
diff --git a/diff_diff/twfe.py b/diff_diff/twfe.py
index 0b49266a..0d581589 100644
--- a/diff_diff/twfe.py
+++ b/diff_diff/twfe.py
@@ -127,6 +127,16 @@ def fit(  # type: ignore[override]
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, self.inference)
         )
+        # Reject replicate-weight designs — TWFE within-transformation must
+        # be recomputed per replicate (same reason as SunAbraham rejection)
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "TwoWayFixedEffects does not yet support replicate-weight "
+                "survey designs. The weighted within-transformation must be "
+                "recomputed for each replicate. Use CallawaySantAnna or "
+                "TripleDifference for replicate-weight inference, or use a "
+                "TSL-based survey design (strata/psu/fpc)."
+            )
         # Use unit-level clustering if not specified (use local variable to avoid mutation)
         cluster_var = self.cluster if self.cluster is not None else unit
diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py
index bfe7b4fd..cd4baa4e 100644
--- a/diff_diff/two_stage.py
+++ b/diff_diff/two_stage.py
@@ -242,6 +242,11 @@ def fit(
         # Validate within-unit constancy for panel survey designs
         if resolved_survey is not None:
+            if resolved_survey.uses_replicate_variance:
+                raise NotImplementedError(
+                    "TwoStageDiD does not yet support replicate-weight survey "
+                    "designs. Use a TSL-based survey design (strata/psu/fpc)."
+                )
             _validate_unit_constant_survey(data, unit, survey_design)
             if resolved_survey.weight_type != "pweight":
                 raise ValueError(
diff --git a/diff_diff/two_stage_results.py b/diff_diff/two_stage_results.py
index b06cd000..bb064258 100644
--- a/diff_diff/two_stage_results.py
+++ b/diff_diff/two_stage_results.py
@@ -11,7 +11,7 @@ import numpy as np
 import pandas as pd
 
-from diff_diff.results import _get_significance_stars
+from diff_diff.results import _format_survey_block, _get_significance_stars
 
 __all__ = [
     "TwoStageBootstrapResults",
@@ -185,23 +185,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
         # Survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "-" * 85,
-                    "Survey Design".center(85),
-                    "-" * 85,
-                    f"{'Weight type:':<30} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<30} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}")
-            lines.extend(["-" * 85, ""])
+            lines.extend(_format_survey_block(sm, 85))
 
         # Overall ATT
         lines.extend(
             [
diff --git a/diff_diff/utils.py b/diff_diff/utils.py
index aeef49a9..c252c418 100644
--- a/diff_diff/utils.py
+++ b/diff_diff/utils.py
@@ -177,6 +177,9 @@ def safe_inference(effect, se, alpha=0.05, df=None):
     """
     if not (np.isfinite(se) and se > 0):
         return np.nan, np.nan, (np.nan, np.nan)
+    if df is not None and df <= 0:
+        # Undefined degrees of freedom (e.g., rank-deficient replicate design)
+        return np.nan, np.nan, (np.nan, np.nan)
     t_stat = effect / se
     p_value = compute_p_value(t_stat, df=df)
     conf_int = compute_confidence_interval(effect, se, alpha, df=df)
@@ -213,6 +216,10 @@ def safe_inference_batch(effects, ses, alpha=0.05, df=None):
     ci_lowers = np.full(n, np.nan)
     ci_uppers = np.full(n, np.nan)
 
+    # Undefined df (e.g., rank-deficient replicate design) → all NaN
+    if df is not None and df <= 0:
+        return t_stats, p_values, ci_lowers, ci_uppers
+
     valid = np.isfinite(ses) & (ses > 0)
     if not np.any(valid):
         return t_stats, p_values, ci_lowers, ci_uppers
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index e2615274..076947ff 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -1907,9 +1907,9 @@ unequal selection probabilities).
 - **Note:** pweight HC1 meat uses score outer products (Σ s_i s_i' where
   s_i = w_i x_i u_i), giving w² in the meat. fweight HC1 meat uses
   X'diag(w u²)X (one power of w), matching frequency-expanded HC1.
-- **Note:** fweights must be positive integers; fractional values are
-  rejected by `_validate_weights()`. This matches Stata's convention and
-  avoids ambiguous frequency semantics.
+- **Note:** fweights must be non-negative integers; fractional values are
+  rejected by `_validate_weights()`. All-zero vectors rejected at solver
+  level. This matches Stata's convention.
 ### Absorbed Fixed Effects with Survey Weights
 
@@ -1993,10 +1993,128 @@ ContinuousDiD, EfficientDiD):
   tau values once and only varies the ATT aggregation weights across draws.
   This is mathematically equivalent to refitting per draw and avoids
   redundant computation.
 
+### Replicate Weight Variance (Phase 6)
+
+Alternative to TSL: re-run WLS for each replicate weight column and compute
+variance from the distribution of replicate estimates.
+
+- **Reference**: Wolter (2007) "Introduction to Variance Estimation", 2nd ed.
+  Rao & Wu (1988).
+- **Supported methods**: BRR, Fay's BRR, JK1, JKn
+- **Formulas**:
+  - BRR: `V = (1/R) * sum_r (theta_r - theta)^2`
+  - Fay: `V = 1/(R*(1-rho)^2) * sum_r (theta_r - theta)^2`
+  - JK1: `V = (R-1)/R * sum_r (theta_r - theta)^2`
+  - JKn: `V = sum_h ((n_h-1)/n_h) * sum_{r in h} (theta_r - theta)^2`
+- **IF-based replicate variance**: For influence-function estimators (CS
+  aggregation, ContinuousDiD, EfficientDiD, TripleDifference), replicate
+  contrasts are formed via weight-ratio rescaling:
+  `theta_r = sum((w_r/w_full) * psi)` when `combined_weights=True`,
+  `theta_r = sum(w_r * psi)` when `combined_weights=False`.
+- **Survey df**: QR-rank of the analysis-weight matrix minus 1,
+  matching R's `survey::degf()` which uses `qr(..., tol=1e-5)$rank`.
+  For `combined_weights=True` (default), analysis weights are the raw
+  replicate columns. For `combined_weights=False`, analysis weights are
+  `replicate_weights * full_sample_weights`. Returns `None` (undefined)
+  when rank <= 1, yielding NaN inference. Replaces `n_PSU - n_strata`.
+- **Mutual exclusion**: Replicate weights cannot be combined with
+  strata/psu/fpc (the replicates encode design structure implicitly)
+- **Design parameters** (matching R `svrepdesign()`):
+  - `combined_weights` (default True): replicate columns include full-sample
+    weight. If False, replicate columns are perturbation factors multiplied
+    by full-sample weight before WLS.
+  - `replicate_scale`: overall variance multiplier, applied multiplicatively
+    with `replicate_rscales` when both are provided (`scale * rscales`)
+  - `replicate_rscales`: per-replicate scaling factors (vector of length R).
+    BRR and Fay ignore custom `replicate_scale`/`replicate_rscales` with a
+    warning (fixed scaling by design); JK1/JKn allow overrides.
+  - `mse` (default False, matching R's `survey::svrepdesign()`): if True,
+    center variance on full-sample estimate; if False, center on mean of
+    replicate estimates. When `replicate_rscales` contains zero entries
+    and `mse=False`, centering excludes zero-scaled replicates, matching
+    R's `survey::svrVar()` convention.
+- **Note:** Replicate columns are NOT normalized — raw values are preserved
+  to maintain correct weight ratios in the IF path.
+- **Note:** JKn requires explicit `replicate_strata` (per-replicate stratum
+  assignment). Auto-derivation from weight patterns is not supported.
+- **Note:** Invalid replicate solves (singular/degenerate) are dropped with
+  a warning. Variance is computed from valid replicates only. Fewer than 2
+  valid replicates returns NaN variance. The variance scaling factor
+  (e.g., `1/R` for BRR, `(R-1)/R` for JK1) uses the original design's `R`,
+  not the valid count — matching R's `survey` package convention where the
+  design structure is fixed and dropped replicates contribute zero to the
+  sum without changing the scale. Survey df uses `n_valid - 1` for
+  t-based inference.
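The scaling formulas above can be sketched in a few lines. This is a standalone illustration, not the library's `compute_replicate_vcov`: `replicate_variance` is a hypothetical helper that takes precomputed replicate estimates, defaults to `mse=False` centering (mean of replicate estimates), and omits the `replicate_scale`/`replicate_rscales` and zero-rscale handling described above.

```python
import numpy as np

def replicate_variance(theta_reps, theta_full, method="jk1",
                       fay_rho=0.5, strata_sizes=None, mse=False):
    """Sketch of BRR/Fay/JK1/JKn variance scaling (hypothetical helper).

    theta_reps   : R replicate point estimates
    theta_full   : full-sample estimate (centering point when mse=True)
    strata_sizes : for JKn, per-replicate stratum size n_h aligned with theta_reps
    """
    theta_reps = np.asarray(theta_reps, dtype=float)
    R = theta_reps.size
    # mse=False centers on the mean of replicate estimates, mse=True on theta_full
    center = theta_full if mse else theta_reps.mean()
    dev2 = (theta_reps - center) ** 2
    if method == "brr":
        return dev2.sum() / R
    if method == "fay":
        return dev2.sum() / (R * (1.0 - fay_rho) ** 2)
    if method == "jk1":
        return (R - 1) / R * dev2.sum()
    if method == "jkn":
        # (n_h - 1)/n_h is constant within a stratum, so a per-replicate
        # factor reproduces the per-stratum double sum
        n_h = np.asarray(strata_sizes, dtype=float)
        return float(np.sum((n_h - 1.0) / n_h * dev2))
    raise ValueError(f"unknown method: {method}")

# JK1 with four replicate estimates around a full-sample estimate of 1.0
v_jk1 = replicate_variance([0.9, 1.1, 1.0, 1.2], 1.0, method="jk1")
```

Note how `mse=True` and `mse=False` give different answers whenever the replicate mean differs from the full-sample estimate, which is exactly the centering choice the `mse` parameter controls.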
+- **Note:** Replicate-weight support matrix:
+  - **Supported**: CallawaySantAnna (reg/ipw/dr without covariates, no
+    bootstrap), ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap),
+    TripleDifference (all methods), LinearRegression (OLS path)
+  - **Rejected with NotImplementedError**: SunAbraham, TwoWayFixedEffects
+    (within-transformation must be recomputed per replicate),
+    DifferenceInDifferences, MultiPeriodDiD, StackedDiD (use
+    compute_survey_vcov directly), ImputationDiD, TwoStageDiD (custom
+    variance), SyntheticDiD, TROP (bootstrap-based variance),
+    BaconDecomposition (diagnostic only)
+  - CS/ContinuousDiD/EfficientDiD reject the combination of replicate
+    weights with `n_bootstrap > 0` (replicate weights provide analytical
+    variance)
+- **Note:** When invalid replicates are dropped in `compute_replicate_vcov`
+  (OLS path), `n_valid` is returned and used for `df_survey = n_valid - 1`
+  in `LinearRegression.fit()`. For IF-based replicate paths, replicates
+  essentially never fail (weighted sums cannot be singular), so `n_valid`
+  equals `R` in practice and df propagation is not needed.
+
+### DEFF Diagnostics (Phase 6)
+
+Per-coefficient design effect comparing survey variance to SRS variance.
+
+- **Reference**: Kish (1965) "Survey Sampling", Wiley. Chapter 8.
+- **Formula**: `DEFF_k = Var_survey(beta_k) / Var_SRS(beta_k)` where
+  SRS baseline uses HC1 sandwich ignoring design structure
+- **Effective n**: `n_eff_k = n / DEFF_k`
+- **Display**: Existing weight-based DEFF labeled "Kish DEFF (weights)";
+  per-coefficient DEFF available via `compute_deff_diagnostics()` or
+  `LinearRegression.compute_deff()` post-fit
+- **Note:** Opt-in computation — not run automatically. Users call standalone
+  function or post-fit method when diagnostics are needed.
+
+### Subpopulation Analysis (Phase 6)
+
+Domain estimation preserving full design structure.
+
+- **Reference**: Lumley (2004) Section 3.4. Stata `svy: subpop`.
+- **Method**: `SurveyDesign.subpopulation(data, mask)` zeros out weights for
+  excluded observations while retaining strata/PSU layout for correct
+  variance estimation
+- **Note:** Unlike naive subsetting, subpopulation analysis preserves design
+  information (PSU structure, strata counts) that would be lost by dropping
+  observations. This is the methodologically correct approach for domain
+  estimation under complex survey designs.
+- **Note:** Weight validation relaxed from "strictly positive" to
+  "non-negative" to support zero-weight observations. Negative weights
+  still rejected. All-zero weight vectors rejected at solver level.
+- **Note:** Survey design df (`n_PSU - n_strata`) uses total design
+  structure (including zero-weight rows), matching R's `survey::degf()`
+  convention after `subset()`. The generic HC1/classical inference paths
+  use positive-weight count for df adjustments, ensuring zero-weight
+  padding is inference-invariant outside the survey vcov path. DEFF
+  effective-n also uses positive-weight count.
+- **Note:** For replicate-weight designs, `subpopulation()` zeros out both
+  full-sample and replicate weight columns for excluded observations,
+  preserving all replicate metadata.
+- **Note:** Defensive enhancement: ContinuousDiD and TripleDifference
+  validate the positive-weight effective sample size before WLS cell fits.
+  After `subpopulation()` zeroes weights, raw row counts may exceed the
+  regression rank requirement while the weighted effective sample does not.
+  Underidentified cells are skipped (ContinuousDiD) or fall back to
+  weighted means (TripleDifference).
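The zero-weight mechanics behind domain estimation can be illustrated with a standalone sketch. `subpop_weights` is a hypothetical helper, not the library's `SurveyDesign.subpopulation()` (which also accepts column-name and callable masks, returns a new `SurveyDesign`, and zeros replicate columns); it only shows why zeroing weights preserves design structure that row subsetting would drop.

```python
import numpy as np
import pandas as pd

def subpop_weights(data, weight_col, mask, out_col="_subpop_weight"):
    """Zero out weights outside the domain while keeping every row.

    Hypothetical sketch: unlike data[mask], all rows — and therefore all
    PSUs and strata — remain in the design; excluded rows simply
    contribute zero weight to every weighted sum.
    """
    out = data.copy()
    m = np.asarray(mask, dtype=bool)
    out[out_col] = np.where(m, out[weight_col].to_numpy(dtype=float), 0.0)
    return out

df = pd.DataFrame({"psu": [1, 1, 2, 2], "w": [2.0, 3.0, 1.0, 4.0]})
dom = subpop_weights(df, "w", [True, False, True, False])
# Both PSUs survive in `dom`, whereas df[[True, False, True, False]]
# could drop a PSU entirely if its rows were all excluded.
```

The design payoff is that PSU counts and stratum counts computed on `dom` match the full sample, so design-based variance formulas keep their correct degrees of freedom.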
+
 ---
 
 # Version History
 
+- **v1.3** (2026-03-26): Added Replicate Weight Variance, DEFF Diagnostics,
+  and Subpopulation Analysis sections (Phase 6 completion)
 - **v1.2** (2026-03-24): Added Survey-Aware Bootstrap section (Phase 6)
 - **v1.1** (2026-03-20): Added Survey Data Support section
 - **v1.0** (2025-01-19): Initial registry with 12 estimators
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index ab95fac0..a521d4ef 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -1,7 +1,7 @@
 # Survey Data Support Roadmap
 
-This document captures planned future work for survey data support in diff-diff.
-Phases 1-4 are implemented. Phase 5 is deferred for future PRs.
+This document captures the survey data support roadmap for diff-diff.
+All phases (1-6) are implemented.
 
 ## Implemented (Phases 1-2)
 
@@ -46,7 +46,7 @@ message pointing to the planned phase or describing the limitation.
 |-----------|------|----------------|-------|
 | ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap+survey deferred |
 | TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap+survey deferred |
-| CallawaySantAnna | `staggered.py` | Weights-only | Weights-only SurveyDesign (strata/PSU/FPC rejected); reg supports covariates, IPW/DR no-covariate only; survey-weighted WIF in aggregation; full design SEs, covariates+IPW/DR, and bootstrap+survey deferred |
+| CallawaySantAnna | `staggered.py` | Full | Full SurveyDesign (strata/PSU/FPC/replicate weights); reg supports covariates, IPW/DR no-covariate only; survey-weighted WIF in aggregation; replicate IF variance for analytical SEs |
This also unblocked @@ -90,16 +90,33 @@ Survey-aware bootstrap for all 8 bootstrap-using estimators. Two strategies: - **TROP**: cross-classified pseudo-strata (survey_stratum × treatment_group). - **Rust TROP**: pweight-only in Rust; full design falls to Python path. -### Replicate Weight Variance +### Replicate Weight Variance ✅ (2026-03-26) Re-run WLS for each replicate weight column, compute variance from distribution -of estimates. Supports BRR, Fay's BRR, JK1, JKn, bootstrap replicate methods. -Add `replicate_weights`, `replicate_type`, `replicate_rho` fields to SurveyDesign. - -### DEFF Diagnostics -Compare survey vcov to SRS vcov element-wise. Report design effect per -coefficient. Effective n = n / DEFF. - -### Subpopulation Analysis +of estimates. Supports BRR, Fay's BRR, JK1, JKn methods. +JKn requires explicit `replicate_strata` (per-replicate stratum assignment). +- `replicate_weights`, `replicate_method`, `fay_rho` fields on SurveyDesign +- `compute_replicate_vcov()` for OLS-based estimators (re-runs WLS per replicate) +- `compute_replicate_if_variance()` for IF-based estimators (reweights IF) +- Dispatch in `LinearRegression.fit()` and `staggered_aggregation.py` +- Replicate weights mutually exclusive with strata/PSU/FPC +- Survey df = rank(replicate_weights) - 1, matching R's `survey::degf()` +- **Limitations**: Supported in CallawaySantAnna, ContinuousDiD, EfficientDiD, + TripleDifference (analytical only, no bootstrap). Rejected with + `NotImplementedError` in DifferenceInDifferences, TwoWayFixedEffects, + MultiPeriodDiD, StackedDiD, SunAbraham, ImputationDiD, TwoStageDiD, + SyntheticDiD, TROP. + +### DEFF Diagnostics ✅ (2026-03-26) +Per-coefficient design effects comparing survey vcov to SRS (HC1) vcov. 
+- `DEFFDiagnostics` dataclass with per-coefficient DEFF, effective n, SRS/survey SE
+- `compute_deff_diagnostics()` standalone function
+- `LinearRegression.compute_deff()` post-fit method
+- Display label "Kish DEFF (weights)" for existing weight-based DEFF
+
+### Subpopulation Analysis ✅ (2026-03-26)
 `SurveyDesign.subpopulation(data, mask)` — zero-out weights for excluded
 observations while preserving the full design structure for correct variance
 estimation (unlike simple subsetting, which would drop design information).
+- Mask: bool array/Series, column name, or callable
+- Returns (SurveyDesign, DataFrame) pair with synthetic `_subpop_weight` column
+- Weight validation relaxed: zero weights allowed (negative still rejected)
diff --git a/tests/test_continuous_did.py b/tests/test_continuous_did.py
index a123361e..c519c3bb 100644
--- a/tests/test_continuous_did.py
+++ b/tests/test_continuous_did.py
@@ -801,6 +801,44 @@ def test_anticipation_event_study(self):
         )
         assert np.isfinite(results.event_study_effects[-1]["effect"])
 
+    def test_anticipation_event_study_excludes_contaminated_periods(self):
+        """With anticipation=2, event study should not contain e < -2."""
+        rng = np.random.default_rng(42)
+        n_per_group = 30
+        periods = list(range(1, 9))  # 8 periods
+
+        rows = []
+        # Never-treated
+        for i in range(n_per_group):
+            for t in periods:
+                rows.append({
+                    "unit": i, "period": t, "first_treat": 0,
+                    "dose": 0.0, "outcome": rng.normal(0, 0.5),
+                })
+        # Cohort g=5 — treatment at t=5, anticipation=2 means post at t>=3
+        for i in range(n_per_group):
+            uid = n_per_group + i
+            d = rng.uniform(0.5, 2.0)
+            for t in periods:
+                y = rng.normal(0, 0.5) + (2.0 * d if t >= 5 else 0)
+                rows.append({
+                    "unit": uid, "period": t, "first_treat": 5,
+                    "dose": d, "outcome": y,
+                })
+
+        data = pd.DataFrame(rows)
+        est = ContinuousDiD(anticipation=2, n_bootstrap=0)
+        results = est.fit(
+            data, "outcome", "unit", "period", "first_treat", "dose",
+            aggregate="eventstudy",
+        )
+        assert results.event_study_effects is not None
+        for e in results.event_study_effects.keys():
+            assert e >= -2, (
+                f"Found relative period e={e} with anticipation=2; "
+                f"expected e >= -2"
+            )
+
     def test_anticipation_not_yet_treated_excludes_anticipation_window(self):
         """Not-yet-treated controls must exclude cohorts in the anticipation window.
 
diff --git a/tests/test_efficient_did.py b/tests/test_efficient_did.py
index 76c8e282..951c46f5 100644
--- a/tests/test_efficient_did.py
+++ b/tests/test_efficient_did.py
@@ -705,6 +705,52 @@ def test_hausman_clustered(self):
         assert pretest.recommendation in ("pt_all", "pt_post", "inconclusive")
         assert pretest.df >= 0
 
+    def test_hausman_clustered_stale_ncl_after_nan_filtering(self):
+        """After filtering NaN EIF rows, n_cl must be recomputed.
+
+        If entire clusters are dropped by the row_finite filter, the original
+        n_cl overcounts clusters, inflating variance via the (n_cl / (n_cl-1))
+        correction and phantom zero rows in _cluster_aggregate.
+        """
+        rng = np.random.default_rng(99)
+        n_clusters = 10
+        units_per_cluster = 8
+        n_units = n_clusters * units_per_cluster
+        n_periods = 7
+
+        cluster_ids = np.repeat(np.arange(n_clusters), units_per_cluster)
+        units = np.repeat(np.arange(n_units), n_periods)
+        times = np.tile(np.arange(1, n_periods + 1), n_units)
+        ft = np.full(n_units, np.inf)
+        # Two small treatment cohorts
+        ft[:8] = 3  # cluster 0 only
+        ft[8:24] = 5  # clusters 1-2
+
+        ft_col = np.repeat(ft, n_periods)
+        unit_fe = np.repeat(rng.normal(0, 0.3, n_units), n_periods)
+        eps = rng.normal(0, 0.3, len(units))
+        tau = np.where((ft_col < np.inf) & (times >= ft_col), 2.0, 0.0)
+        y = unit_fe + tau + eps
+
+        df = pd.DataFrame(
+            {
+                "unit": units,
+                "time": times,
+                "first_treat": ft_col,
+                "y": y,
+                "cluster_id": np.repeat(cluster_ids, n_periods),
+            }
+        )
+
+        pretest = EfficientDiD.hausman_pretest(
+            df, "y", "unit", "time", "first_treat", cluster="cluster_id"
+        )
+        assert pretest.recommendation in ("pt_all", "pt_post", "inconclusive")
+        # Key assertion: if the statistic is finite, df should reflect the
+        # actual number of clusters remaining after NaN filtering
+        if np.isfinite(pretest.statistic):
+            assert pretest.df > 0
+
     def test_hausman_last_cohort(self):
         """Hausman pretest on all-treated panel with last_cohort control."""
         df = _make_staggered_panel(
diff --git a/tests/test_survey.py b/tests/test_survey.py
index f1093064..534379e0 100644
--- a/tests/test_survey.py
+++ b/tests/test_survey.py
@@ -692,7 +692,7 @@ def test_summary_output(self, survey_2x2_data):
         assert "Strata:" in summary_text
         assert "PSU/Cluster:" in summary_text
         assert "Effective sample size:" in summary_text
-        assert "Design effect (DEFF):" in summary_text
+        assert "Kish DEFF (weights):" in summary_text
 
     def test_to_dict_survey_fields(self, survey_2x2_data):
         """Survey metadata fields appear in to_dict()."""
@@ -756,11 +756,18 @@ def test_survey_design_validation_missing_column(self):
         with pytest.raises(ValueError, match="not found"):
             sd.resolve(df)
 
-    def test_survey_design_validation_zero_weights(self):
-        """Zero weights raise ValueError."""
+    def test_survey_design_validation_zero_weights_allowed(self):
+        """Zero weights are allowed (for subpopulation analysis)."""
         df = pd.DataFrame({"y": [1, 2, 3], "w": [1.0, 0.0, 1.0]})
         sd = SurveyDesign(weights="w")
-        with pytest.raises(ValueError, match="strictly positive"):
+        resolved = sd.resolve(df)
+        assert resolved.weights[1] == 0.0
+
+    def test_survey_design_validation_all_zero_weights_rejected(self):
+        """All-zero weights raise ValueError."""
+        df = pd.DataFrame({"y": [1, 2, 3], "w": [0.0, 0.0, 0.0]})
+        sd = SurveyDesign(weights="w")
+        with pytest.raises(ValueError, match="All weights are zero"):
             sd.resolve(df)
 
     def test_survey_design_validation_nan_weights(self):
@@ -947,7 +954,7 @@ def test_fweight_error_for_fractional(self):
             }
         )
         sd = SurveyDesign(weights="w", weight_type="fweight")
-        with pytest.raises(ValueError, match="Frequency weights.*must be positive integers"):
+        with pytest.raises(ValueError, match="Frequency weights.*must be non-negative integers"):
             sd.resolve(df)
 
     def test_lonely_psu_remove_warning(self):
@@ -2360,7 +2367,7 @@ def test_fractional_fweight_rejected_solve_ols(self):
         y = np.random.randn(n)
         w = np.array([1.5, 2.3, 1.0, 2.0, 1.7, 3.0, 1.0, 2.0, 1.0, 1.0])
 
-        with pytest.raises(ValueError, match="positive integers"):
+        with pytest.raises(ValueError, match="non-negative integers"):
             solve_ols(X, y, weights=w, weight_type="fweight")
 
     def test_fractional_fweight_rejected_compute_robust_vcov(self):
@@ -2371,7 +2378,7 @@ def test_fractional_fweight_rejected_compute_robust_vcov(self):
         resid = np.random.randn(n)
         w = np.array([1.5, 2.0, 1.0, 2.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0])
 
-        with pytest.raises(ValueError, match="positive integers"):
+        with pytest.raises(ValueError, match="non-negative integers"):
             compute_robust_vcov(X, resid, weights=w, weight_type="fweight")
 
     def test_integer_fweight_accepted(self):
diff --git a/tests/test_survey_phase3.py b/tests/test_survey_phase3.py
index 830561af..f43c4968 100644
--- a/tests/test_survey_phase3.py
+++ b/tests/test_survey_phase3.py
@@ -1195,3 +1195,83 @@ def test_continuous_did_dose_response_survey_pvalue(self, continuous_survey_data
             if np.isfinite(row["effect"]) and np.isfinite(row["se"]) and row["se"] > 0:
                 _, expected_p, _ = safe_inference(row["effect"], row["se"], df=sm.df_survey)
                 assert row["p_value"] == pytest.approx(expected_p, rel=1e-10)
+
+
+# =============================================================================
+# Survey Edge Case Coverage Gaps
+# =============================================================================
+
+
+class TestSurveyEdgeCases:
+    """Tests for FPC census, single-PSU NaN, and full-design bootstrap."""
+
+    def test_fpc_census_zero_variance_stacked(self, staggered_survey_data):
+        """When FPC == n_PSU (full census), sampling variance should be ~0."""
+        from diff_diff import StackedDiD
+
+        data = staggered_survey_data.copy()
+        # Set FPC = n_PSU per stratum (census)
+        for h in data["stratum"].unique():
+            mask = data["stratum"] == h
+            n_psu_h = data.loc[mask, "psu"].nunique()
+            data.loc[mask, "fpc"] = float(n_psu_h)
+
+        sd = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", fpc="fpc",
+            nest=True,
+        )
+        result = StackedDiD().fit(
+            data, "outcome", "unit", "time", "first_treat",
+            survey_design=sd,
+        )
+        assert result.overall_se == pytest.approx(0.0, abs=1e-10)
+
+    def test_single_psu_lonely_remove_nan_se(self, staggered_survey_data):
+        """All strata with 1 PSU + lonely_psu='remove' -> NaN SE."""
+        from diff_diff import SunAbraham
+
+        data = staggered_survey_data.copy()
+        # Force all units into a single PSU per stratum
+        data["single_psu"] = data["stratum"]
+
+        sd = SurveyDesign(
+            weights="weight", strata="stratum", psu="single_psu",
+            lonely_psu="remove",
+        )
+        with pytest.warns(UserWarning, match="only 1 PSU"):
+            result = SunAbraham().fit(
+                data, "outcome", "unit", "time", "first_treat",
+                survey_design=sd,
+            )
+        # All strata removed -> NaN SE
+        assert np.isnan(result.overall_se)
+
+    def test_full_design_bootstrap_continuous_did(self, staggered_survey_data):
+        """Bootstrap with strata survey design works for ContinuousDiD."""
+        from diff_diff import ContinuousDiD
+
+        data = staggered_survey_data.copy()
+        # Add dose column
+        data["dose"] = np.where(data["first_treat"] > 0, 1.0 + 0.5 * data["unit"] % 3, 0.0)
+
+        sd = SurveyDesign(weights="weight", strata="stratum")
+        result = ContinuousDiD(n_bootstrap=30, seed=42).fit(
+            data, "outcome", "unit", "time", "first_treat", "dose",
+            survey_design=sd,
+        )
+        assert np.isfinite(result.overall_att)
+        assert np.isfinite(result.overall_att_se)
+        assert result.survey_metadata is not None
+
+    def test_full_design_bootstrap_efficient_did(self, staggered_survey_data):
+        """Bootstrap with strata survey design works for EfficientDiD."""
+        from diff_diff import EfficientDiD
+
+        sd = SurveyDesign(weights="weight", strata="stratum")
+        result = EfficientDiD(n_bootstrap=30, seed=42).fit(
+            staggered_survey_data, "outcome", "unit", "time", "first_treat",
+            survey_design=sd,
+        )
+        assert np.isfinite(result.overall_att)
+        assert np.isfinite(result.overall_se)
+        assert result.survey_metadata is not None
diff --git a/tests/test_survey_phase4.py b/tests/test_survey_phase4.py
index a4244b88..f4781b69 100644
--- a/tests/test_survey_phase4.py
+++ b/tests/test_survey_phase4.py
@@ -202,7 +202,7 @@ def test_negative_weights_raises(self):
         y = (X @ [0.5, -0.5] + rng.randn(n) > 0).astype(float)
         weights = np.ones(n)
         weights[0] = -1.0
-        with pytest.raises(ValueError, match="strictly positive"):
+        with pytest.raises(ValueError, match="non-negative"):
             solve_logit(X, y, weights=weights)
 
     def test_wrong_shape_weights_raises(self):
@@ -1509,3 +1509,23 @@ def test_weight_scale_invariance_callaway_santanna(self, staggered_survey_data):
 
         assert abs(r1.overall_att - r2.overall_att) < 1e-10
         assert abs(r1.overall_se - r2.overall_se) < 1e-8
+
+
+class TestCallawaySantAnnaFullDesignBootstrap:
+    """Test full-design bootstrap for CS with strata+PSU+FPC."""
+
+    def test_bootstrap_full_design_cs(self, staggered_survey_data):
+        """Bootstrap + full survey (strata+PSU+FPC) works for CS."""
+        sd = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", fpc="fpc",
+        )
+        result = CallawaySantAnna(
+            estimation_method="reg", n_bootstrap=30, seed=42,
+        ).fit(
+            staggered_survey_data, "outcome", "unit", "period",
+            "first_treat", survey_design=sd,
+        )
+        assert np.isfinite(result.overall_att)
+        assert np.isfinite(result.overall_se)
+        assert result.survey_metadata is not None
+        assert result.survey_metadata.n_strata is not None
diff --git a/tests/test_survey_phase5.py b/tests/test_survey_phase5.py
index fcf7e6e0..3cf80968 100644
--- a/tests/test_survey_phase5.py
+++ b/tests/test_survey_phase5.py
@@ -852,3 +852,38 @@ def test_sdid_to_dict_schema_matches_did(self):
             "df_survey",
         ]:
             assert key in d, f"Missing key: {key}"
+
+
+class TestTROPRaoWuEquivalence:
+    """Test Rao-Wu vs block bootstrap equivalence under degenerate design."""
+
+    def test_rao_wu_approximates_block_no_strata(self, trop_survey_data):
+        """Without real strata variation, Rao-Wu SE ~ block bootstrap SE."""
+        from diff_diff import TROP
+
+        data = trop_survey_data.copy()
+        # Single stratum, PSU = unit (effectively block bootstrap)
+        data["single_stratum"] = 0
+        data["unit_psu"] = data["unit"]
+
+        sd_rw = SurveyDesign(
+            weights="weight", strata="single_stratum", psu="unit_psu",
+        )
+        sd_block = SurveyDesign(weights="weight")
+
+        result_rw = TROP(method="local", n_bootstrap=99, seed=42, max_iter=5).fit(
+            data, "outcome", "D", "unit", "time", survey_design=sd_rw,
+        )
+        result_block = TROP(method="local", n_bootstrap=99, seed=42, max_iter=5).fit(
+            data, "outcome", "D", "unit", "time", survey_design=sd_block,
+        )
+
+        # Point estimates identical (same weights)
+        assert result_rw.att == pytest.approx(result_block.att, abs=1e-10)
+        # SEs should be within factor of 2
+        if result_block.se > 0:
+            ratio = result_rw.se / result_block.se
+            assert 0.5 < ratio < 2.0, (
+                f"Rao-Wu SE ({result_rw.se:.4f}) and block SE "
+                f"({result_block.se:.4f}) differ by > 2x (ratio={ratio:.2f})"
+            )
diff --git a/tests/test_survey_phase6.py b/tests/test_survey_phase6.py
new file mode 100644
index 00000000..bb54ada3
--- /dev/null
+++ b/tests/test_survey_phase6.py
@@ -0,0 +1,1697 @@
+"""Tests for Phase 6 survey support: Subpopulation, DEFF, Replicate Weights."""
+
+import numpy as np
+import pandas as pd
+import pytest
+
+from diff_diff import (
+    DEFFDiagnostics,
+    DifferenceInDifferences,
+    SurveyDesign,
+    compute_deff_diagnostics,
+)
+
+
+# =============================================================================
+# Shared Fixtures
+# =============================================================================
+
+
+@pytest.fixture
+def basic_did_data():
+    """Simple 2x2 DiD panel with survey design columns."""
+    np.random.seed(42)
+    n_units = 80
+    n_periods = 4
+    rows = []
+    for unit in range(n_units):
+        treated = unit < 40
+        ft = 3 if treated else 0
+        stratum = unit // 20  # 4 strata
+        psu = unit // 10  # 8 PSUs
+        wt = 1.0 + 0.5 * (unit % 5)
+
+        for t in range(1, n_periods + 1):
+            y = 10.0 + unit * 0.05 + t * 0.3
+            if treated and t >= 3:
+                y += 2.0
+            y += np.random.normal(0, 0.5)
+            rows.append({
+                "unit": unit, "time": t, "first_treat": ft,
+                "outcome": y, "stratum": stratum, "psu": psu,
+                "weight": wt, "fpc": 200.0,
+            })
+    return pd.DataFrame(rows)
+
+
+# =============================================================================
+# Subpopulation Analysis
+# =============================================================================
+
+
+class TestSubpopulationAnalysis:
+    """Tests for SurveyDesign.subpopulation()."""
+
+    def test_basic_subpopulation(self, basic_did_data):
+        """Subpopulation returns a new SurveyDesign and DataFrame."""
+        sd = SurveyDesign(weights="weight", strata="stratum", psu="psu")
+        mask = basic_did_data["unit"] < 60  # Exclude last 20 units
+        new_sd, new_data = sd.subpopulation(basic_did_data, mask)
+
+        assert new_sd.weights == "_subpop_weight"
+        assert new_sd.strata == "stratum"
+        assert new_sd.psu == "psu"
+        assert "_subpop_weight" in new_data.columns
+
+        # Excluded obs have zero weight
+        excluded = new_data[~mask.values]
+        assert (excluded["_subpop_weight"] == 0.0).all()
+
+        # Included obs have original weights
+        included = new_data[mask.values]
+        np.testing.assert_array_equal(
+            included["_subpop_weight"].values,
+            basic_did_data.loc[mask.values, "weight"].values,
+        )
+
+    def test_subpopulation_preserves_design_structure(self, basic_did_data):
+        """PSU/strata structure is preserved even when some PSUs are zeroed."""
+        sd = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", nest=True,
+        )
+        # Exclude all units in stratum 0
+        mask = basic_did_data["stratum"] != 0
+        new_sd, new_data = sd.subpopulation(basic_did_data, mask)
+
+        # Should resolve without error — stratum 0 PSUs still exist in design
+        resolved = new_sd.resolve(new_data)
+        assert resolved.n_strata == 4  # All strata still present
+        assert resolved.n_psu == 8  # All PSUs still present
+
+    def test_subpopulation_with_did(self, basic_did_data):
+        """Subpopulation works end-to-end with DifferenceInDifferences."""
+        sd = SurveyDesign(weights="weight", strata="stratum", psu="psu")
+        mask = basic_did_data["unit"] < 60
+        new_sd, new_data = sd.subpopulation(basic_did_data, mask)
+
+        # Add treated/post columns for the estimator
+        new_data["treated"] = (new_data["first_treat"] > 0).astype(int)
+        new_data["post"] = (new_data["time"] >= 3).astype(int)
+        est = DifferenceInDifferences()
+        result = est.fit(
+            new_data, outcome="outcome", treatment="treated", time="post",
+            survey_design=new_sd,
+        )
+        assert np.isfinite(result.att)
+        assert np.isfinite(result.se)
+        assert result.survey_metadata is not None
+
+    def test_subpopulation_preserves_more_psus_than_subset(self, basic_did_data):
+        """Subpopulation retains all PSUs in the design; subset drops them."""
+        sd = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", nest=True,
+        )
+        mask = basic_did_data["unit"] < 60
+
+        # Subpopulation: preserves full design
+        new_sd, new_data = sd.subpopulation(basic_did_data, mask)
+        resolved_subpop = new_sd.resolve(new_data)
+
+        # Naive subset: loses PSUs in excluded observations
+        subset = basic_did_data[mask.values].copy()
+        resolved_subset = sd.resolve(subset)
+
+        # Subpopulation preserves the full PSU count
+        assert resolved_subpop.n_psu >= resolved_subset.n_psu
+
+    def test_subpopulation_mask_as_column_name(self, basic_did_data):
+        """Mask can be a column name string."""
+        sd = SurveyDesign(weights="weight")
+        data = basic_did_data.copy()
+        data["include"] = data["unit"] < 60
+        new_sd, new_data = sd.subpopulation(data, "include")
+        assert (new_data.loc[~data["include"], "_subpop_weight"] == 0.0).all()
+
+    def test_subpopulation_mask_as_callable(self, basic_did_data):
+        """Mask can be
a callable.""" + sd = SurveyDesign(weights="weight") + new_sd, new_data = sd.subpopulation( + basic_did_data, lambda df: df["unit"] < 60, + ) + excluded = new_data[basic_did_data["unit"] >= 60] + assert (excluded["_subpop_weight"] == 0.0).all() + + def test_subpopulation_no_weights_uses_uniform(self, basic_did_data): + """When SurveyDesign has no weights, subpopulation uses 1.0.""" + sd = SurveyDesign(strata="stratum") + mask = basic_did_data["unit"] < 60 + new_sd, new_data = sd.subpopulation(basic_did_data, mask) + + included = new_data[mask.values] + assert (included["_subpop_weight"] == 1.0).all() + + def test_subpopulation_all_excluded_raises(self, basic_did_data): + """Excluding all obs should raise ValueError.""" + sd = SurveyDesign(weights="weight") + mask = np.zeros(len(basic_did_data), dtype=bool) + with pytest.raises(ValueError, match="excludes all"): + sd.subpopulation(basic_did_data, mask) + + def test_subpopulation_none_excluded_identity(self, basic_did_data): + """Including all obs should return weights unchanged.""" + sd = SurveyDesign(weights="weight") + mask = np.ones(len(basic_did_data), dtype=bool) + new_sd, new_data = sd.subpopulation(basic_did_data, mask) + np.testing.assert_array_equal( + new_data["_subpop_weight"].values, + basic_did_data["weight"].values, + ) + + def test_subpopulation_mask_length_mismatch_raises(self, basic_did_data): + """Mask length must match data length.""" + sd = SurveyDesign(weights="weight") + mask = np.ones(10, dtype=bool) + with pytest.raises(ValueError, match="length"): + sd.subpopulation(basic_did_data, mask) + + def test_zero_weights_allowed_in_resolve(self): + """resolve() should accept zero weights (no longer strictly positive).""" + data = pd.DataFrame({ + "w": [1.0, 2.0, 0.0, 1.5], + "y": [1, 2, 3, 4], + }) + sd = SurveyDesign(weights="w") + resolved = sd.resolve(data) + assert resolved.weights[2] == 0.0 # Zero preserved after normalization + + def test_negative_weights_rejected(self): + """Negative weights 
should still be rejected.""" + data = pd.DataFrame({"w": [1.0, -1.0, 2.0], "y": [1, 2, 3]}) + sd = SurveyDesign(weights="w") + with pytest.raises(ValueError, match="non-negative"): + sd.resolve(data) + + +# ============================================================================= +# DEFF Diagnostics +# ============================================================================= + + +class TestDEFFDiagnostics: + """Tests for per-coefficient design effect diagnostics.""" + + @staticmethod + def _fit_lr_with_survey(data, sd): + """Fit a LinearRegression with survey design for testing DEFF.""" + from diff_diff.linalg import LinearRegression + from diff_diff.survey import _resolve_survey_for_fit + + resolved, sw, swt, sm = _resolve_survey_for_fit(sd, data, "analytical") + y = data["outcome"].values + X = np.column_stack([ + np.ones(len(data)), + data["treated"].values, + data["post"].values, + (data["treated"] * data["post"]).values, + ]) + reg = LinearRegression( + include_intercept=False, weights=sw, + weight_type=swt, survey_design=resolved, + ) + reg.fit(X, y) + return reg + + def test_uniform_weights_deff_near_one(self, basic_did_data): + """With uniform weights and no clustering, DEFF should be ~1.""" + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + data["uniform_wt"] = 1.0 + + sd = SurveyDesign(weights="uniform_wt") + reg = self._fit_lr_with_survey(data, sd) + deff = reg.compute_deff() + assert isinstance(deff, DEFFDiagnostics) + # With uniform weights, DEFF should be close to 1 + for d in deff.deff: + if np.isfinite(d): + assert d == pytest.approx(1.0, abs=0.15) + + def test_clustered_design_deff_gt_one(self, basic_did_data): + """With clustered design, DEFF should be > 1 for at least some coefs.""" + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + + sd = SurveyDesign( + 
weights="weight", strata="stratum", psu="psu", nest=True, + ) + reg = self._fit_lr_with_survey(data, sd) + deff = reg.compute_deff() + # At least one coefficient should have DEFF > 1 (design effect) + assert np.any(deff.deff[np.isfinite(deff.deff)] > 1.0) + + def test_deff_srs_se_matches_robust_vcov(self, basic_did_data): + """SRS SE in DEFF diagnostics should match compute_robust_vcov.""" + from diff_diff.linalg import compute_robust_vcov + + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") + reg = self._fit_lr_with_survey(data, sd) + deff = reg.compute_deff() + + # Manually compute SRS SE + srs_vcov = compute_robust_vcov( + reg._X, reg.residuals_, + cluster_ids=None, weights=reg.weights, + weight_type=reg.weight_type, + ) + expected_srs_se = np.sqrt(np.diag(srs_vcov)) + np.testing.assert_allclose(deff.srs_se, expected_srs_se, rtol=1e-10) + + def test_standalone_compute_deff_diagnostics(self, basic_did_data): + """compute_deff_diagnostics() works as a standalone function.""" + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") + reg = self._fit_lr_with_survey(data, sd) + deff = compute_deff_diagnostics( + reg._X, reg.residuals_, reg.vcov_, + reg.weights, weight_type=reg.weight_type, + coefficient_names=["intercept", "treated", "post", "treated:post"], + ) + assert deff.coefficient_names is not None + assert len(deff.coefficient_names) == 4 + assert len(deff.deff) == 4 + + def test_deff_effective_n(self, basic_did_data): + """effective_n should be n / DEFF.""" + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + + sd = SurveyDesign(weights="weight", strata="stratum", 
psu="psu") + reg = self._fit_lr_with_survey(data, sd) + deff = reg.compute_deff() + n = reg.n_obs_ + for i in range(len(deff.deff)): + if np.isfinite(deff.deff[i]) and deff.deff[i] > 0: + assert deff.effective_n[i] == pytest.approx( + n / deff.deff[i], rel=1e-10 + ) + + def test_compute_deff_no_survey_raises(self, basic_did_data): + """compute_deff() without survey should raise.""" + from diff_diff.linalg import LinearRegression + + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + + y = data["outcome"].values + X = np.column_stack([np.ones(len(data)), data["treated"].values]) + reg = LinearRegression(include_intercept=False) + reg.fit(X, y) + with pytest.raises(ValueError, match="survey design"): + reg.compute_deff() + + +# ============================================================================= +# Replicate Weight Variance +# ============================================================================= + + +@pytest.fixture +def replicate_data(): + """Data with full-sample + replicate weight columns for testing.""" + np.random.seed(123) + n = 200 + k = 20 # number of replicates + + # Simple regression: y = 1 + 2*x + eps + x = np.random.randn(n) + eps = np.random.randn(n) * 0.5 + y = 1.0 + 2.0 * x + eps + + # Full-sample weights (non-uniform) + w = 1.0 + np.random.exponential(0.5, n) + + data = pd.DataFrame({"x": x, "y": y, "weight": w}) + + # Generate JK1 replicate weights (delete-1 jackknife on PSU clusters) + # Each replicate drops one cluster and rescales remaining weights + cluster_size = n // k + for r in range(k): + w_r = w.copy() + start = r * cluster_size + end = min((r + 1) * cluster_size, n) + w_r[start:end] = 0.0 + # Rescale remaining: multiply by k/(k-1) + w_r[w_r > 0] *= k / (k - 1) + data[f"rep_{r}"] = w_r + + return data, [f"rep_{r}" for r in range(k)] + + +class TestReplicateWeightVariance: + """Tests for replicate weight variance estimation.""" + + def 
test_brr_validation(self): + """BRR validation: method + weights required together.""" + with pytest.raises(ValueError, match="replicate_method must be provided"): + SurveyDesign(weights="w", replicate_weights=["r1", "r2"]) + with pytest.raises(ValueError, match="replicate_weights must be provided"): + SurveyDesign(weights="w", replicate_method="BRR") + + def test_fay_rho_validation(self): + """Fay requires rho in (0, 1).""" + with pytest.raises(ValueError, match="fay_rho"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="Fay", fay_rho=0.0, + ) + with pytest.raises(ValueError, match="fay_rho"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="BRR", fay_rho=0.5, + ) + + def test_replicate_strata_mutual_exclusion(self): + """Replicate weights cannot be combined with strata/psu/fpc.""" + with pytest.raises(ValueError, match="cannot be combined"): + SurveyDesign( + weights="w", strata="s", + replicate_weights=["r1", "r2"], replicate_method="BRR", + ) + + def test_replicate_requires_weights(self): + """Full-sample weights required alongside replicates.""" + with pytest.raises(ValueError, match="Full-sample weights"): + SurveyDesign( + replicate_weights=["r1", "r2"], replicate_method="BRR", + ) + + def test_jk1_resolve(self, replicate_data): + """JK1 replicates resolve correctly.""" + data, rep_cols = replicate_data + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + resolved = sd.resolve(data) + assert resolved.uses_replicate_variance + assert resolved.n_replicates == len(rep_cols) + assert resolved.replicate_weights.shape == (len(data), len(rep_cols)) + assert resolved.strata is None # Short-circuited + assert resolved.df_survey == len(rep_cols) - 1 + + def test_jk1_vcov_finite(self, replicate_data): + """JK1 replicate vcov produces finite results.""" + from diff_diff.linalg import LinearRegression + + data, rep_cols = replicate_data + sd = 
SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + resolved = sd.resolve(data) + y = data["y"].values + X = np.column_stack([np.ones(len(data)), data["x"].values]) + + reg = LinearRegression( + include_intercept=False, weights=resolved.weights, + weight_type="pweight", survey_design=resolved, + ) + reg.fit(X, y) + + assert np.all(np.isfinite(reg.coefficients_)) + assert np.all(np.isfinite(np.diag(reg.vcov_))) + # SE should be positive + se = np.sqrt(np.diag(reg.vcov_)) + assert np.all(se > 0) + + def test_brr_vcov(self, replicate_data): + """BRR replicate vcov (1/R scaling) produces finite results.""" + from diff_diff.survey import compute_replicate_vcov + + data, rep_cols = replicate_data + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="BRR", + ) + resolved = sd.resolve(data) + y = data["y"].values + X = np.column_stack([np.ones(len(data)), data["x"].values]) + + from diff_diff.linalg import solve_ols + coef, _, _ = solve_ols( + X, y, weights=resolved.weights, weight_type="pweight", + ) + + vcov, _nv = compute_replicate_vcov(X, y, coef, resolved) + assert np.all(np.isfinite(np.diag(vcov))) + + def test_fay_inflates_over_brr(self, replicate_data): + """Fay's BRR with rho > 0 produces larger variance than BRR.""" + from diff_diff.survey import compute_replicate_vcov + from diff_diff.linalg import solve_ols + + data, rep_cols = replicate_data + y = data["y"].values + X = np.column_stack([np.ones(len(data)), data["x"].values]) + + sd_brr = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="BRR", + ) + resolved_brr = sd_brr.resolve(data) + coef, _, _ = solve_ols(X, y, weights=resolved_brr.weights) + vcov_brr, _nv = compute_replicate_vcov(X, y, coef, resolved_brr) + + sd_fay = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="Fay", fay_rho=0.5, + ) + resolved_fay = sd_fay.resolve(data) + vcov_fay, _nv = compute_replicate_vcov(X, y, coef, resolved_fay) + + # Fay 
variance = BRR variance / (1-rho)^2 > BRR variance + assert np.all(np.diag(vcov_fay) > np.diag(vcov_brr)) + + def test_replicate_metadata(self, replicate_data): + """SurveyMetadata should include replicate info.""" + from diff_diff.survey import compute_survey_metadata + + data, rep_cols = replicate_data + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + resolved = sd.resolve(data) + sm = compute_survey_metadata(resolved, data["weight"].values) + assert sm.replicate_method == "JK1" + assert sm.n_replicates == len(rep_cols) + assert sm.df_survey == len(rep_cols) - 1 + + def test_replicate_rejected_by_base_did(self, replicate_data): + """DifferenceInDifferences rejects replicate-weight designs.""" + data, rep_cols = replicate_data + n = len(data) + data["treated"] = (np.arange(n) < n // 2).astype(int) + data["post"] = (np.arange(n) % 4 >= 2).astype(int) + data["outcome"] = data["y"] + 1.5 * data["treated"] * data["post"] + + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="DifferenceInDifferences"): + DifferenceInDifferences().fit( + data, outcome="outcome", treatment="treated", time="post", + survey_design=sd, + ) + + def test_replicate_if_variance(self, replicate_data): + """IF-based replicate variance produces finite results.""" + from diff_diff.survey import compute_replicate_if_variance + + data, rep_cols = replicate_data + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + resolved = sd.resolve(data) + + # Synthetic influence function + psi = np.random.randn(len(data)) * 0.1 + var, _nv = compute_replicate_if_variance(psi, resolved) + assert np.isfinite(var) + assert var >= 0 + + def test_min_2_replicates(self, replicate_data): + """Need at least 2 replicate columns.""" + data, _ = replicate_data + sd = SurveyDesign( + weights="weight", replicate_weights=["rep_0"], + 
replicate_method="BRR", + ) + with pytest.raises(ValueError, match="At least 2"): + sd.resolve(data) + + def test_jkn_requires_replicate_strata(self): + """JKn must have replicate_strata.""" + with pytest.raises(ValueError, match="replicate_strata is required"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="JKn", + ) + + def test_jkn_replicate_strata_length_mismatch(self): + """replicate_strata length must match replicate_weights.""" + with pytest.raises(ValueError, match="length"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="JKn", replicate_strata=[0], + ) + + def test_jkn_variance(self, replicate_data): + """JKn with per-stratum scaling produces finite results.""" + from diff_diff.survey import compute_replicate_vcov + from diff_diff.linalg import solve_ols + + data, rep_cols = replicate_data + # Assign first half of replicates to stratum 0, second half to stratum 1 + n_rep = len(rep_cols) + strata = [0] * (n_rep // 2) + [1] * (n_rep - n_rep // 2) + + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JKn", replicate_strata=strata, + ) + resolved = sd.resolve(data) + y = data["y"].values + X = np.column_stack([np.ones(len(data)), data["x"].values]) + + coef, _, _ = solve_ols(X, y, weights=resolved.weights) + vcov, _nv = compute_replicate_vcov(X, y, coef, resolved) + assert np.all(np.isfinite(np.diag(vcov))) + assert np.all(np.diag(vcov) > 0) + + def test_replicate_if_scale_matches_analytical(self): + """Replicate IF variance should match analytical sum(psi^2) for SRS.""" + from diff_diff.survey import compute_replicate_if_variance + + np.random.seed(42) + n = 50 + psi = np.random.randn(n) * 0.1 + + # Build JK1 replicates: delete-1 jackknife + w_full = np.ones(n) + rep_cols = [] + data = pd.DataFrame({"w": w_full}) + for r in range(n): + w_r = np.ones(n) * n / (n - 1) + w_r[r] = 0.0 + col = f"rep_{r}" + data[col] = w_r + rep_cols.append(col) + + sd = 
SurveyDesign( + weights="w", replicate_weights=rep_cols, replicate_method="JK1", + ) + resolved = sd.resolve(data) + + v_rep, _nv = compute_replicate_if_variance(psi, resolved) + v_analytical = float(np.sum(psi**2)) + + # JK1 gives (n-1)/n * sum(...) which should approximate sum(psi^2) + assert v_rep == pytest.approx(v_analytical, rel=0.05), ( + f"Replicate IF variance {v_rep:.6f} does not match analytical " + f"{v_analytical:.6f}" + ) + + def test_replicate_if_matches_survey_if_variance(self): + """Replicate IF variance should approximate compute_survey_if_variance + when JK1 replicates match the PSU structure.""" + from diff_diff.survey import ( + compute_replicate_if_variance, + compute_survey_if_variance, + ResolvedSurveyDesign, + ) + + np.random.seed(123) + n = 60 + n_psu = 15 + psu = np.repeat(np.arange(n_psu), n // n_psu) + psi = np.random.randn(n) * 0.1 + weights = np.ones(n) + + # TSL variance with PSU clustering + resolved_tsl = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=psu, fpc=None, + n_strata=0, n_psu=n_psu, lonely_psu="remove", + ) + v_tsl = compute_survey_if_variance(psi, resolved_tsl) + + # JK1 replicates: delete one PSU at a time, rescale remaining + rep_arr = np.zeros((n, n_psu)) + for r in range(n_psu): + w_r = weights.copy() + w_r[psu == r] = 0.0 + w_r[psu != r] *= n_psu / (n_psu - 1) + rep_arr[:, r] = w_r + # Normalize each column to sum=n (matching resolve() normalization) + for c in range(n_psu): + cs = rep_arr[:, c].sum() + if cs > 0: + rep_arr[:, c] *= n / cs + + resolved_rep = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", + n_replicates=n_psu, + ) + v_rep, _nv = compute_replicate_if_variance(psi, resolved_rep) + + # Should be in the same ballpark (within a factor of 2 — different + # estimators of the same quantity) + assert v_rep > 0 and v_tsl > 0 + ratio = 
v_rep / v_tsl + assert 0.5 < ratio < 2.0, ( + f"Replicate IF var ({v_rep:.6f}) vs TSL IF var ({v_tsl:.6f}) " + f"ratio={ratio:.2f}, expected within [0.5, 2.0]" + ) + + def test_all_zero_weights_rejected_by_solve_ols(self): + """solve_ols should reject all-zero weight vectors.""" + from diff_diff.linalg import solve_ols + + X = np.column_stack([np.ones(10), np.random.randn(10)]) + y = np.random.randn(10) + with pytest.raises(ValueError, match="sum to zero"): + solve_ols(X, y, weights=np.zeros(10)) + + def test_solve_logit_rejects_single_class_positive_weight(self): + """solve_logit should reject when positive-weight obs have one class.""" + from diff_diff.linalg import solve_logit + + n = 20 + X = np.column_stack([np.ones(n), np.random.randn(n)]) + y = np.array([1] * 10 + [0] * 10, dtype=float) + w = np.ones(n) + # Zero out all class-0 obs — only class 1 remains + w[10:] = 0.0 + with pytest.raises(ValueError, match="outcome class"): + solve_logit(X, y, weights=w) + + def test_solve_logit_rejects_too_few_positive_weight_obs(self): + """solve_logit should reject when too few positive-weight obs.""" + from diff_diff.linalg import solve_logit + + n = 20 + X = np.column_stack([np.ones(n), np.random.randn(n)]) + y = np.array([1] * 10 + [0] * 10, dtype=float) + w = np.zeros(n) + # Only 2 positive-weight obs (one per class), but 2 params + w[0] = 1.0 # class 1 + w[10] = 1.0 # class 0 + with pytest.raises(ValueError, match="Cannot identify"): + solve_logit(X, y, weights=w) + + def test_solve_logit_rank_deficient_positive_weight_subset_error_mode(self): + """solve_logit with rank_deficient_action='error' should reject when + positive-weight subset is rank-deficient.""" + from diff_diff.linalg import solve_logit + + n = 20 + x1 = np.random.randn(n) + x2 = np.random.randn(n) + X = np.column_stack([x1, x2]) + y = np.array([1] * 10 + [0] * 10, dtype=float) + w = np.zeros(n) + for i in [0, 1, 2, 10, 11, 12]: + w[i] = 1.0 + X[i, 1] = 5.0 # x2 constant → rank deficient in subset + 
with pytest.raises(ValueError, match="rank-deficient"): + solve_logit(X, y, weights=w, rank_deficient_action="error") + + def test_solve_logit_rank_deficient_positive_weight_subset_warn_mode(self): + """solve_logit with rank_deficient_action='warn' should warn but not + error on rank-deficient positive-weight subset.""" + from diff_diff.linalg import solve_logit + + n = 20 + x1 = np.random.randn(n) + x2 = np.random.randn(n) + X = np.column_stack([x1, x2]) + y = np.array([1] * 10 + [0] * 10, dtype=float) + w = np.zeros(n) + for i in [0, 1, 2, 10, 11, 12]: + w[i] = 1.0 + X[i, 1] = 5.0 + with pytest.warns(UserWarning, match="rank-deficient"): + beta, probs = solve_logit(X, y, weights=w, rank_deficient_action="warn") + # Key assertion: beta must have original p+1 length (intercept + 2 covariates) + assert len(beta) == 3, ( + f"Expected beta length 3 (p+1), got {len(beta)}. " + f"Column dropping broke the coefficient vector shape." + ) + + def test_subpopulation_string_mask_rejected(self, basic_did_data): + """Subpopulation mask with string values should be rejected.""" + sd = SurveyDesign(weights="weight") + mask = ["yes"] * len(basic_did_data) + mask[0] = "no" + with pytest.raises(ValueError, match="string"): + sd.subpopulation(basic_did_data, mask) + + def test_nonbinary_numeric_mask_rejected(self, basic_did_data): + """Subpopulation mask with non-binary numeric codes should be rejected.""" + sd = SurveyDesign(weights="weight") + # Coded domain column {1, 2} — not boolean, should fail + mask = np.ones(len(basic_did_data), dtype=int) + mask[:10] = 2 + with pytest.raises(ValueError, match="non-binary"): + sd.subpopulation(basic_did_data, mask) + + def test_scale_invariance_combined_weights(self): + """Multiplying all replicate columns by a constant should not change SE.""" + from diff_diff.survey import compute_replicate_if_variance, ResolvedSurveyDesign + + np.random.seed(42) + n = 40 + R = 10 + psi = np.random.randn(n) * 0.1 + weights = 1.0 + 
np.random.exponential(0.3, n) + + # Build replicate weights (combined: include full-sample weight) + rep_arr = np.zeros((n, R)) + cluster_size = n // R + for r in range(R): + w_r = weights.copy() + start = r * cluster_size + end = min((r + 1) * cluster_size, n) + w_r[start:end] = 0.0 + w_r[w_r > 0] *= R / (R - 1) + rep_arr[:, r] = w_r + + resolved1 = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=R, + combined_weights=True, + ) + v1, _ = compute_replicate_if_variance(psi, resolved1) + + # Scale all replicate columns by 3x + resolved2 = ResolvedSurveyDesign( + weights=weights * 3, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr * 3, + replicate_method="JK1", n_replicates=R, + combined_weights=True, + ) + v2, _ = compute_replicate_if_variance(psi, resolved2) + + assert v1 == pytest.approx(v2, rel=1e-10), ( + f"Scale invariance violated: v1={v1:.8f} vs v2={v2:.8f}" + ) + + def test_combined_weights_false(self): + """combined_weights=False treats replicate cols as perturbation factors.""" + sd = SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="BRR", combined_weights=False, + ) + assert sd.combined_weights is False + + def test_custom_scale(self): + """Custom replicate_scale overrides default factor.""" + sd = SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="BRR", replicate_scale=0.5, + ) + assert sd.replicate_scale == 0.5 + + def test_scale_rscales_multiplicative(self): + """scale and rscales are applied multiplicatively for JK1.""" + from diff_diff.survey import compute_replicate_vcov, ResolvedSurveyDesign + from diff_diff.linalg import solve_ols + + np.random.seed(42) + n, R = 100, 5 + x = np.random.randn(n) + y = 1.0 + 2.0 * x + np.random.randn(n) * 0.5 + X = 
np.column_stack([np.ones(n), x]) + w = np.ones(n) + + cluster_size = n // R + rep_arr = np.ones((n, R)) + for r in range(R): + start = r * cluster_size + end = min((r + 1) * cluster_size, n) + rep_arr[start:end, r] = 0.0 + rep_arr[~((np.arange(n) >= start) & (np.arange(n) < end)), r] *= R / (R - 1) + + coef, _, _ = solve_ols(X, y, weights=w) + + # rscales only + rscales = np.array([0.5, 1.0, 1.5, 1.0, 0.5]) + resolved_rscales = ResolvedSurveyDesign( + weights=w, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=R, + replicate_rscales=rscales, + ) + vcov_rscales, _ = compute_replicate_vcov(X, y, coef, resolved_rscales) + + # scale * rscales should multiply + scale_val = 2.0 + resolved_both = ResolvedSurveyDesign( + weights=w, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=R, + replicate_scale=scale_val, replicate_rscales=rscales, + ) + vcov_both, _ = compute_replicate_vcov(X, y, coef, resolved_both) + + # V(scale+rscales) == scale * V(rscales) + assert np.allclose(np.diag(vcov_both), scale_val * np.diag(vcov_rscales), rtol=1e-10) + + def test_brr_ignores_custom_scaling(self): + """BRR ignores custom scale/rscales with a warning.""" + import warnings + from diff_diff.survey import compute_replicate_vcov, ResolvedSurveyDesign + from diff_diff.linalg import solve_ols + + np.random.seed(42) + n, R = 100, 5 + x = np.random.randn(n) + y = 1.0 + 2.0 * x + np.random.randn(n) * 0.5 + X = np.column_stack([np.ones(n), x]) + w = np.ones(n) + rep_arr = np.random.uniform(0.8, 1.2, (n, R)) + + coef, _, _ = solve_ols(X, y, weights=w) + + # Default BRR + resolved_default = ResolvedSurveyDesign( + weights=w, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + 
replicate_method="BRR", n_replicates=R, + ) + vcov_default, _ = compute_replicate_vcov(X, y, coef, resolved_default) + + # BRR with custom scale (should be ignored with warning) + resolved_custom = ResolvedSurveyDesign( + weights=w, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="BRR", n_replicates=R, + replicate_scale=99.0, + ) + with warnings.catch_warnings(record=True) as caught: + warnings.simplefilter("always") + vcov_custom, _ = compute_replicate_vcov(X, y, coef, resolved_custom) + assert any("ignored" in str(w.message) for w in caught) + assert np.allclose(np.diag(vcov_default), np.diag(vcov_custom), rtol=1e-10) + + def test_replicate_if_no_divide_by_zero_warning(self): + """compute_replicate_if_variance should not warn on zero weights.""" + from diff_diff.survey import compute_replicate_if_variance, ResolvedSurveyDesign + + n = 20 + psi = np.random.randn(n) + weights = np.ones(n) + weights[:5] = 0.0 # Some zero full-sample weights (subpopulation) + + rep_arr = np.ones((n, 5)) + rep_arr[:5, :] = 0.0 # Match zero pattern + for r in range(5): + cs = rep_arr[:, r].sum() + if cs > 0: + rep_arr[:, r] *= n / cs + + resolved = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=5, + ) + import warnings + with warnings.catch_warnings(): + warnings.simplefilter("error", RuntimeWarning) + # Should NOT raise RuntimeWarning for divide by zero + v, _nv = compute_replicate_if_variance(psi, resolved) + assert np.isfinite(v) + + +class TestReplicateEdgeCases: + """Regression tests for analysis-weight df and rscales centering.""" + + def test_df_survey_combined_weights_false(self): + """df_survey uses analysis-weight rank when combined_weights=False.""" + from diff_diff.survey import ResolvedSurveyDesign + + 
np.random.seed(42) + n = 50 + R = 5 + weights = 1.0 + np.random.exponential(0.5, n) + # Perturbation factors (not full weights) + rep_factors = np.random.uniform(0.8, 1.2, (n, R)) + + resolved = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_factors, + replicate_method="BRR", n_replicates=R, + combined_weights=False, + ) + # df should match pivoted QR rank of analysis weights + analysis_weights = rep_factors * weights[:, np.newaxis] + from scipy.linalg import qr as scipy_qr + _, R_mat, _ = scipy_qr(analysis_weights, pivoting=True, mode='economic') + diag_abs = np.abs(np.diag(R_mat)) + expected_rank = int(np.sum(diag_abs > 1e-5 * diag_abs.max())) + expected_df = expected_rank - 1 if expected_rank > 1 else None + assert resolved.df_survey == expected_df + + # Verify it differs from raw perturbation-factor rank when weights + # cause a rank reduction (e.g., zero full-sample weights) + weights_with_zeros = weights.copy() + weights_with_zeros[:10] = 0.0 # subpopulation-zeroed + resolved2 = ResolvedSurveyDesign( + weights=weights_with_zeros, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_factors, + replicate_method="BRR", n_replicates=R, + combined_weights=False, + ) + analysis_z = rep_factors * weights_with_zeros[:, np.newaxis] + _, R_z, _ = scipy_qr(analysis_z, pivoting=True, mode='economic') + diag_z = np.abs(np.diag(R_z)) + z_rank = int(np.sum(diag_z > 1e-5 * diag_z.max())) if diag_z.max() > 0 else 0 + z_df = z_rank - 1 if z_rank > 1 else None + assert resolved2.df_survey == z_df + + def test_rscales_zero_centering_vcov(self): + """mse=False with zero rscales: center only on rscales > 0 replicates.""" + from diff_diff.survey import compute_replicate_vcov, ResolvedSurveyDesign + from diff_diff.linalg import solve_ols + + np.random.seed(42) + n = 100 + R = 6 + x = 
np.random.randn(n)
+        y = 1.0 + 2.0 * x + np.random.randn(n) * 0.5
+        X = np.column_stack([np.ones(n), x])
+        w = np.ones(n)
+
+        # Build JK1-style replicates: column r zeroes its own block and
+        # upweights every other row by R / (R - 1). (Zeroing all columns
+        # for the block would corrupt previously built columns.)
+        cluster_size = n // R
+        rep_arr = np.ones((n, R))
+        for r in range(R):
+            start = r * cluster_size
+            end = min((r + 1) * cluster_size, n)
+            rep_arr[:, r] = np.where(
+                (np.arange(n) >= start) & (np.arange(n) < end), 0.0,
+                R / (R - 1)
+            )
+
+        # rscales with one zero entry
+        rscales = np.array([1.0, 1.0, 0.0, 1.0, 1.0, 1.0])
+
+        coef, _, _ = solve_ols(X, y, weights=w)
+
+        resolved = ResolvedSurveyDesign(
+            weights=w, weight_type="pweight",
+            strata=None, psu=None, fpc=None,
+            n_strata=0, n_psu=0, lonely_psu="remove",
+            replicate_weights=rep_arr,
+            replicate_method="JK1", n_replicates=R,
+            replicate_rscales=rscales, mse=False,
+        )
+        vcov, _nv = compute_replicate_vcov(X, y, coef, resolved)
+
+        # Manual computation: center only on replicates with rscales > 0
+        coef_reps = []
+        for r in range(R):
+            c_r, _, _ = solve_ols(X, y, weights=rep_arr[:, r])
+            coef_reps.append(c_r)
+        coef_reps = np.array(coef_reps)
+        pos_mask = rscales > 0
+        center = np.mean(coef_reps[pos_mask], axis=0)
+        diffs = coef_reps - center[np.newaxis, :]
+        V_manual = np.zeros((2, 2))
+        for r in range(R):
+            V_manual += rscales[r] * np.outer(diffs[r], diffs[r])
+
+        assert np.allclose(np.diag(vcov), np.diag(V_manual), rtol=1e-10)
+
+    def test_rscales_zero_centering_if(self):
+        """mse=False with zero rscales: IF path centers only on rscales > 0."""
+        from diff_diff.survey import compute_replicate_if_variance, ResolvedSurveyDesign
+
+        np.random.seed(42)
+        n = 50
+        R = 5
+        psi = np.random.randn(n) * 0.1
+        w = np.ones(n)
+
+        # Build simple replicates (the np.where fully defines column r)
+        rep_arr = np.ones((n, R))
+        for r in range(R):
+            start = r * (n // R)
+            end = min((r + 1) * (n // R), n)
+            rep_arr[:, r] = np.where(
+                (np.arange(n) >= start) & (np.arange(n) < end), 0.0,
+                R / (R - 1)
+            )
+
+        rscales = 
np.array([1.0, 0.0, 1.0, 1.0, 1.0]) + + resolved = ResolvedSurveyDesign( + weights=w, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=R, + replicate_rscales=rscales, mse=False, + ) + var, _nv = compute_replicate_if_variance(psi, resolved) + + # Manual: theta_r = sum((w_r/w) * psi), center on rscales > 0 only + theta_full = float(np.sum(psi)) + theta_reps = np.array([ + float(np.sum(np.divide(rep_arr[:, r], w, out=np.zeros(n), where=w > 0) * psi)) + for r in range(R) + ]) + pos_mask = rscales > 0 + center = float(np.mean(theta_reps[pos_mask])) + diffs = theta_reps - center + var_manual = float(np.sum(rscales * diffs**2)) + + assert var == pytest.approx(var_manual, rel=1e-10) + + def test_rank_one_replicate_nan_inference(self): + """Rank-1 replicate design yields df_survey=None and NaN inference.""" + from diff_diff.survey import ResolvedSurveyDesign + from diff_diff.utils import safe_inference + + np.random.seed(42) + n = 50 + R = 3 + weights = np.ones(n) + # All replicate columns identical → rank 1 + col = 1.0 + np.random.randn(n) * 0.1 + rep_arr = np.column_stack([col, col, col]) + + resolved = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=R, + ) + # Rank-1 → df_survey should be None + assert resolved.df_survey is None + + # Passing df=0 (the guard value) should produce NaN p-value and CI + _, p_val, ci = safe_inference(1.0, 0.5, df=0) + assert np.isnan(p_val) + assert np.isnan(ci[0]) and np.isnan(ci[1]) + + +# ============================================================================= +# Estimator-Level Replicate Weight Tests +# ============================================================================= + + +class TestEstimatorReplicateWeights: + """End-to-end tests for 
replicate weights with each estimator.""" + + @staticmethod + def _make_staggered_replicate_data(): + """Create staggered panel with replicate weight columns.""" + np.random.seed(42) + n_units, n_periods = 60, 6 + n_rep = 15 + rows = [] + for unit in range(n_units): + if unit < 20: + ft = 3 + elif unit < 40: + ft = 5 + else: + ft = 0 + wt = 1.0 + 0.3 * (unit % 5) + for t in range(1, n_periods + 1): + y = 10.0 + unit * 0.05 + t * 0.2 + if ft > 0 and t >= ft: + y += 2.0 + y += np.random.normal(0, 0.5) + rows.append({ + "unit": unit, "time": t, "first_treat": ft, + "outcome": y, "weight": wt, + }) + data = pd.DataFrame(rows) + # Generate JK1 replicates (delete-cluster jackknife) + cluster_size = n_units // n_rep + rep_cols = [] + for r in range(n_rep): + start = r * cluster_size + end = min((r + 1) * cluster_size, n_units) + w_r = data["weight"].values.copy() + mask = (data["unit"] >= start) & (data["unit"] < end) + w_r[mask] = 0.0 + w_r[~mask] *= n_rep / (n_rep - 1) + col = f"rep_{r}" + data[col] = w_r + rep_cols.append(col) + return data, rep_cols + + def test_callaway_santanna_replicate(self): + """CallawaySantAnna with replicate weights produces finite results.""" + from diff_diff import CallawaySantAnna + + data, rep_cols = self._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + result = CallawaySantAnna(estimation_method="reg", n_bootstrap=0).fit( + data, "outcome", "unit", "time", "first_treat", + survey_design=sd, + ) + assert np.isfinite(result.overall_att) + assert np.isfinite(result.overall_se) + assert result.survey_metadata is not None + + def test_efficient_did_replicate(self): + """EfficientDiD with replicate weights produces finite results.""" + from diff_diff import EfficientDiD + + data, rep_cols = self._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + result = 
EfficientDiD(n_bootstrap=0).fit(
+            data, "outcome", "unit", "time", "first_treat",
+            survey_design=sd,
+        )
+        assert np.isfinite(result.overall_att)
+        assert np.isfinite(result.overall_se)
+        assert result.survey_metadata is not None
+
+    def test_subpopulation_preserves_replicate_metadata(self):
+        """subpopulation() should preserve replicate design fields."""
+        data, rep_cols = self._make_staggered_replicate_data()
+        sd = SurveyDesign(
+            weights="weight", replicate_weights=rep_cols,
+            replicate_method="JK1",
+        )
+        mask = data["unit"] < 40
+        new_sd, new_data = sd.subpopulation(data, mask)
+
+        assert new_sd.replicate_method == "JK1"
+        assert new_sd.replicate_weights == rep_cols
+        # Excluded obs have zero replicate weights
+        excluded = new_data[~mask.values]
+        for col in rep_cols:
+            assert (excluded[col] == 0.0).all()
+
+    def test_continuous_did_replicate_bootstrap_rejected(self):
+        """ContinuousDiD should reject replicate weights + n_bootstrap > 0."""
+        from diff_diff import ContinuousDiD
+
+        data, rep_cols = self._make_staggered_replicate_data()
+        data["dose"] = np.where(
+            data["first_treat"] > 0, 1.0 + 0.5 * (data["unit"] % 3), 0.0
+        )
+        sd = SurveyDesign(
+            weights="weight", replicate_weights=rep_cols,
+            replicate_method="JK1",
+        )
+        with pytest.raises(NotImplementedError, match="not supported"):
+            ContinuousDiD(n_bootstrap=30, seed=42).fit(
+                data, "outcome", "unit", "time", "first_treat", "dose",
+                survey_design=sd,
+            )
+
+    @pytest.mark.parametrize("est_method", ["reg", "ipw", "dr"])
+    def test_triple_diff_replicate_all_methods(self, est_method):
+        """TripleDifference with replicate weights works for reg/ipw/dr."""
+        from diff_diff import TripleDifference
+
+        np.random.seed(42)
+        n = 200
+        n_rep = 10
+        d1 = np.repeat([0, 1], n // 2)
+        d2 = np.tile([0, 1], n // 2)
+        post = np.random.choice([0, 1], n)
+        y = 1.0 + 0.5 * d1 + 0.3 * d2 + 2.0 * d1 * d2 * post + np.random.randn(n) * 0.5
+        w = 1.0 + np.random.exponential(0.3, n)
+
+        data = pd.DataFrame({
+            "y": y, "d1": d1, 
"d2": d2, "post": post, "weight": w,
+        })
+        # Build JK1 replicates
+        cluster_size = n // n_rep
+        rep_cols = []
+        for r in range(n_rep):
+            w_r = w.copy()
+            start = r * cluster_size
+            end = min((r + 1) * cluster_size, n)
+            w_r[start:end] = 0.0
+            w_r[w_r > 0] *= n_rep / (n_rep - 1)
+            col = f"rep_{r}"
+            data[col] = w_r
+            rep_cols.append(col)
+
+        sd = SurveyDesign(
+            weights="weight", replicate_weights=rep_cols,
+            replicate_method="JK1",
+        )
+        result = TripleDifference(estimation_method=est_method).fit(
+            data, outcome="y", group="d1", partition="d2",
+            time="post", survey_design=sd,
+        )
+        assert np.isfinite(result.att)
+        assert np.isfinite(result.se)
+        assert result.survey_metadata is not None
+
+    def test_sun_abraham_replicate_rejected(self):
+        """SunAbraham should reject replicate-weight survey designs."""
+        from diff_diff import SunAbraham
+
+        data, rep_cols = self._make_staggered_replicate_data()
+        sd = SurveyDesign(
+            weights="weight", replicate_weights=rep_cols,
+            replicate_method="JK1",
+        )
+        with pytest.raises(NotImplementedError, match="SunAbraham"):
+            SunAbraham(n_bootstrap=0).fit(
+                data, "outcome", "unit", "time", "first_treat",
+                survey_design=sd,
+            )
+
+
+class TestSubpopulationMaskValidation:
+    """Subpopulation mask edge cases and replicate-design rejection paths."""
+
+    def test_nan_mask_rejected(self, basic_did_data):
+        """Subpopulation mask with NaN should be rejected."""
+        sd = SurveyDesign(weights="weight")
+        nan_mask = np.ones(len(basic_did_data))
+        nan_mask[0] = np.nan
+        with pytest.raises(ValueError, match="NA|NaN|missing"):
+            sd.subpopulation(basic_did_data, nan_mask)
+
+    def test_none_mask_rejected(self, basic_did_data):
+        """Subpopulation mask with None should be rejected."""
+        sd = SurveyDesign(weights="weight")
+        mask = [True] * len(basic_did_data)
+        mask[0] = None
+        with pytest.raises(ValueError, match="None|NA|missing"):
+            sd.subpopulation(basic_did_data, mask)
+
+    def test_pd_na_mask_rejected(self, basic_did_data):
+        """Subpopulation mask with pd.NA should be 
rejected.""" + sd = SurveyDesign(weights="weight") + mask = pd.array([True] * len(basic_did_data), dtype=pd.BooleanDtype()) + mask[0] = pd.NA + with pytest.raises(ValueError, match="NA|missing"): + sd.subpopulation(basic_did_data, mask) + + def test_efficient_did_replicate_bootstrap_rejected(self): + """EfficientDiD should reject replicate weights + n_bootstrap > 0.""" + from diff_diff import EfficientDiD + + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="not supported"): + EfficientDiD(n_bootstrap=30, seed=42).fit( + data, "outcome", "unit", "time", "first_treat", + survey_design=sd, + ) + + def test_callaway_santanna_replicate_bootstrap_rejected(self): + """CallawaySantAnna should reject replicate weights + n_bootstrap > 0.""" + from diff_diff import CallawaySantAnna + + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="not supported"): + CallawaySantAnna(estimation_method="reg", n_bootstrap=30, seed=42).fit( + data, "outcome", "unit", "time", "first_treat", + survey_design=sd, + ) + + def test_twfe_replicate_rejected(self): + """TwoWayFixedEffects should reject replicate-weight designs.""" + from diff_diff.twfe import TwoWayFixedEffects + + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="TwoWayFixedEffects"): + TwoWayFixedEffects().fit( + data, outcome="outcome", treatment="first_treat", + unit="unit", time="time", survey_design=sd, + ) + + def test_stacked_did_replicate_rejected(self): + """StackedDiD should reject replicate-weight 
designs.""" + from diff_diff import StackedDiD + + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="StackedDiD"): + StackedDiD().fit( + data, outcome="outcome", unit="unit", time="time", + first_treat="first_treat", survey_design=sd, + ) + + def test_invalid_replicate_scale_rejected(self): + """Negative or zero replicate_scale should be rejected.""" + with pytest.raises(ValueError, match="positive finite"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="JK1", replicate_scale=-1.0, + ) + with pytest.raises(ValueError, match="positive finite"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="JK1", replicate_scale=0.0, + ) + + def test_invalid_replicate_rscales_rejected(self): + """Negative replicate_rscales should be rejected.""" + with pytest.raises(ValueError, match="non-negative"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="JK1", replicate_rscales=[-1.0, 1.0], + ) + + def _replicate_sd_and_data(self): + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + return data, sd + + def test_multi_period_did_replicate_rejected(self): + """MultiPeriodDiD rejects replicate-weight designs.""" + from diff_diff.estimators import MultiPeriodDiD + data, sd = self._replicate_sd_and_data() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + with pytest.raises(NotImplementedError): + MultiPeriodDiD().fit( + data, outcome="outcome", treatment="treated", + time="post", survey_design=sd, + ) + + def test_imputation_did_replicate_rejected(self): + """ImputationDiD rejects replicate-weight designs.""" + from 
diff_diff.imputation import ImputationDiD + data, sd = self._replicate_sd_and_data() + with pytest.raises(NotImplementedError): + ImputationDiD().fit( + data, outcome="outcome", unit="unit", time="time", + first_treat="first_treat", survey_design=sd, + ) + + def test_two_stage_did_replicate_rejected(self): + """TwoStageDiD rejects replicate-weight designs.""" + from diff_diff.two_stage import TwoStageDiD + data, sd = self._replicate_sd_and_data() + with pytest.raises(NotImplementedError): + TwoStageDiD().fit( + data, outcome="outcome", unit="unit", time="time", + first_treat="first_treat", survey_design=sd, + ) + + def test_bacon_replicate_rejected(self): + """BaconDecomposition rejects replicate-weight designs.""" + from diff_diff.bacon import BaconDecomposition + data, sd = self._replicate_sd_and_data() + with pytest.raises(NotImplementedError): + BaconDecomposition().fit( + data, outcome="outcome", unit="unit", time="time", + first_treat="first_treat", survey_design=sd, + ) + + def test_synthetic_did_replicate_rejected(self): + """SyntheticDiD rejects replicate-weight designs.""" + from diff_diff.synthetic_did import SyntheticDiD + data, sd = self._replicate_sd_and_data() + data["treated"] = (data["first_treat"] > 0).astype(int) + with pytest.raises(NotImplementedError): + SyntheticDiD().fit( + data, outcome="outcome", unit="unit", time="time", + treatment="treated", survey_design=sd, + ) + + @pytest.mark.slow + def test_trop_replicate_rejected(self): + """TROP rejects replicate-weight designs.""" + from diff_diff.trop import TROP + data, sd = self._replicate_sd_and_data() + data["treated"] = (data["first_treat"] > 0).astype(int) + with pytest.raises(NotImplementedError): + TROP().fit( + data, outcome="outcome", unit="unit", time="time", + treatment="treated", survey_design=sd, + ) + + +class TestCSReplicateMethodCoverage: + """Test that CS replicate weights work for all supported estimation methods.""" + + @staticmethod + def _make_cs_replicate_data(): + data, 
rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + return data, sd + + @pytest.mark.parametrize("method", ["ipw", "dr"]) + def test_cs_replicate_ipw_dr_no_covariates(self, method): + """CS replicate weights work for ipw/dr without covariates.""" + from diff_diff import CallawaySantAnna + data, sd = self._make_cs_replicate_data() + result = CallawaySantAnna( + estimation_method=method, n_bootstrap=0, + ).fit( + data, "outcome", "unit", "time", "first_treat", + survey_design=sd, + ) + assert np.isfinite(result.overall_att) + assert np.isfinite(result.overall_se) + assert result.survey_metadata is not None + + +# ============================================================================= +# Effective-sample and d.f. consistency tests +# ============================================================================= + + +class TestEffectiveSampleAndDfConsistency: + """Tests for P1 effective-sample gates and P2 d.f. 
metadata consistency.""" + + def test_continuous_did_positive_weight_gate(self): + """ContinuousDiD skips cells where positive-weight count < n_basis.""" + import warnings + from diff_diff import ContinuousDiD + + np.random.seed(42) + n_units, n_periods = 60, 6 + n_rep = 10 + rows = [] + for unit in range(n_units): + if unit < 20: + ft = 3 + elif unit < 40: + ft = 5 + else: + ft = 0 + dose = float(unit % 3 + 1) if ft > 0 else 0.0 + wt = 1.0 + 0.3 * (unit % 5) + for t in range(1, n_periods + 1): + y = 10.0 + unit * 0.05 + t * 0.2 + if ft > 0 and t >= ft: + y += 2.0 * dose + y += np.random.normal(0, 0.5) + rows.append({ + "unit": unit, "time": t, "first_treat": ft, + "outcome": y, "weight": wt, "dose": dose, + }) + data = pd.DataFrame(rows) + + # Build replicate columns + cluster_size = n_units // n_rep + rep_cols = [] + for r in range(n_rep): + w_r = data["weight"].values.copy() + start = r * cluster_size + end = min((r + 1) * cluster_size, n_units) + mask = (data["unit"] >= start) & (data["unit"] < end) + w_r[mask] = 0.0 + w_r[~mask] *= n_rep / (n_rep - 1) + col = f"rep_{r}" + data[col] = w_r + rep_cols.append(col) + + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + + # Zero weights for most of cohort ft=5 (units 20-39) while keeping + # cohort ft=3 (units 0-19) and controls intact. This leaves too few + # positive-weight treated units in ft=5 cells, triggering the warning, + # but ft=3 cells remain valid so the estimator still produces a result. 
+ subpop_mask = (data["first_treat"] != 5) | (data["unit"] < 22) + new_sd, new_data = sd.subpopulation(data, subpop_mask) + + # Should emit warning about not enough positive-weight treated units + # for the cohort-5 cells, but still produce a result from cohort-3 + with warnings.catch_warnings(record=True) as caught: + warnings.simplefilter("always") + result = ContinuousDiD(n_bootstrap=0, num_knots=2).fit( + new_data, "outcome", "unit", "time", "first_treat", "dose", + survey_design=new_sd, + ) + weight_warnings = [ + w for w in caught + if "positive-weight" in str(w.message) + ] + assert len(weight_warnings) > 0, "Expected warning about insufficient positive-weight units" + # Result should still be finite (from the valid cohort-3 cells) + assert result is not None + assert np.isfinite(result.overall_att) + + def test_triple_diff_positive_weight_gate(self): + """TripleDifference falls back to weighted mean when n_eff < n_cols.""" + from diff_diff.triple_diff import TripleDifference + + np.random.seed(42) + n = 200 + d1 = np.repeat([0, 1], n // 2) + d2 = np.tile([0, 1], n // 2) + post = np.tile([0, 0, 1, 1], n // 4) + x1 = np.random.randn(n) + x2 = np.random.randn(n) + x3 = np.random.randn(n) + y = 1.0 + 0.5 * d1 + 0.3 * d2 + 2.0 * d1 * d2 * post + np.random.randn(n) * 0.5 + w = 1.0 + np.random.exponential(0.3, n) + + # Create a cell (d1=1, d2=1, post=1) where most obs have zero weight + # but each cell still has SOME positive weight for cell means. 
+ # With 3 covariates + intercept = 4 columns, n_eff=2 < 4 triggers fallback + cell_target = (d1 == 1) & (d2 == 1) & (post == 1) + cell_indices = np.where(cell_target)[0] + # Zero all but 2 obs in this cell + for idx in cell_indices[2:]: + w[idx] = 0.0 + + data = pd.DataFrame({ + "y": y, "d1": d1, "d2": d2, "post": post, + "weight": w, "x1": x1, "x2": x2, "x3": x3, + }) + + sd = SurveyDesign(weights="weight") + + # Should not crash — falls back to weighted mean for the thin cell + result = TripleDifference(estimation_method="reg").fit( + data, outcome="y", group="d1", partition="d2", + time="post", covariates=["x1", "x2", "x3"], survey_design=sd, + ) + assert result is not None + assert np.isfinite(result.att) + + def test_cs_replicate_df_metadata_consistency(self): + """CS survey_metadata.df_survey matches the d.f. used for inference.""" + from diff_diff import CallawaySantAnna + + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + n_rep = len(rep_cols) + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + result = CallawaySantAnna(estimation_method="reg", n_bootstrap=0).fit( + data, "outcome", "unit", "time", "first_treat", + survey_design=sd, + ) + sm = result.survey_metadata + assert sm is not None + assert sm.df_survey is not None + assert sm.df_survey == n_rep - 1 + + # summary() should include the df line + summary_str = result.summary() + assert "Survey d.f." 
in summary_str + + def test_triple_diff_replicate_df_metadata_consistency(self): + """TripleDifference survey_metadata.df_survey matches inference d.f.""" + from diff_diff import TripleDifference + + np.random.seed(42) + n = 200 + n_rep = 10 + d1 = np.repeat([0, 1], n // 2) + d2 = np.tile([0, 1], n // 2) + post = np.random.choice([0, 1], n) + y = 1.0 + 0.5 * d1 + 0.3 * d2 + 2.0 * d1 * d2 * post + np.random.randn(n) * 0.5 + w = 1.0 + np.random.exponential(0.3, n) + + data = pd.DataFrame({ + "y": y, "d1": d1, "d2": d2, "post": post, "weight": w, + }) + cluster_size = n // n_rep + rep_cols = [] + for r in range(n_rep): + w_r = w.copy() + start = r * cluster_size + end = min((r + 1) * cluster_size, n) + w_r[start:end] = 0.0 + w_r[w_r > 0] *= n_rep / (n_rep - 1) + col = f"rep_{r}" + data[col] = w_r + rep_cols.append(col) + + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + result = TripleDifference(estimation_method="reg").fit( + data, outcome="y", group="d1", partition="d2", + time="post", survey_design=sd, + ) + sm = result.survey_metadata + assert sm is not None + assert sm.df_survey is not None + assert sm.df_survey == n_rep - 1 + + d = result.to_dict() + assert d["df_survey"] == sm.df_survey
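The tests above hand-build delete-one-group (JK1) replicate weights in several fixtures: column r zeroes group r and upweights every remaining row by R/(R-1), and the replicate variance is the rescaled sum of squared deviations of replicate estimates. A minimal standalone sketch of that construction and the textbook (R-1)/R variance formula for a weighted mean — this is not the library's `compute_replicate_vcov` API, just the estimator the fixtures approximate:

```python
import numpy as np

def jk1_replicates(weights, n_groups):
    """Delete-one-group jackknife: column r zeroes group r and upweights
    every other row by R / (R - 1), matching the fixtures' construction."""
    n = len(weights)
    size = n // n_groups
    rep = np.repeat(np.asarray(weights, dtype=float)[:, None], n_groups, axis=1)
    for r in range(n_groups):
        start, end = r * size, min((r + 1) * size, n)
        dropped = np.zeros(n, dtype=bool)
        dropped[start:end] = True
        rep[dropped, r] = 0.0
        rep[~dropped, r] *= n_groups / (n_groups - 1)
    return rep

def jk1_variance(y, weights, rep):
    """Replicate variance of the weighted mean with the standard JK1
    scale (R - 1) / R, centered on the full-sample estimate (mse-style)."""
    R = rep.shape[1]
    theta_full = np.average(y, weights=weights)
    theta = np.array([np.average(y, weights=rep[:, r]) for r in range(R)])
    return (R - 1) / R * np.sum((theta - theta_full) ** 2)

rng = np.random.default_rng(0)
y = rng.normal(10.0, 1.0, 60)
w = np.ones(60)
rep = jk1_replicates(w, 6)
var = jk1_variance(y, w, rep)
```

With uniform weights each replicate column sums back to the full-sample weight total, which is why the fixtures' `w_r[~mask] *= n_rep / (n_rep - 1)` rescaling preserves the population size.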
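The `combined_weights=False` df test computes the survey degrees of freedom as the pivoted-QR numerical rank of the analysis-weight matrix minus one, mirroring the `rank(replicate_weights) - 1` rule (the `survey::degf()` analog noted in the TODO table). A standalone sketch of that rule, using the same relative tolerance the test uses inline; the function name is illustrative, not the library's:

```python
import numpy as np
from scipy.linalg import qr

def replicate_df(rep_weights, tol_factor=1e-5):
    """Survey d.f. = numerical rank of the replicate-weight matrix minus one,
    or None when rank <= 1 (degenerate design -> NaN inference downstream).
    Rank via pivoted QR with a relative tolerance on |diag(R)|."""
    _, R_mat, _ = qr(np.asarray(rep_weights, dtype=float),
                     pivoting=True, mode='economic')
    diag_abs = np.abs(np.diag(R_mat))
    if diag_abs.size == 0 or diag_abs.max() == 0.0:
        return None
    rank = int(np.sum(diag_abs > tol_factor * diag_abs.max()))
    return rank - 1 if rank > 1 else None

rng = np.random.default_rng(0)
full_rank = rng.standard_normal((50, 5))     # generic matrix: rank 5, df 4
col = 1.0 + 0.1 * rng.standard_normal(50)
rank_one = np.column_stack([col, col, col])  # identical columns: rank 1, df None
```

The `None` return is the guard value that `test_rank_one_replicate_nan_inference` exercises: a rank-1 replicate design carries no usable variance information, so inference falls through to NaN.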