diff --git a/TODO.md b/TODO.md
index b2dd16c7..f9f3dc16 100644
--- a/TODO.md
+++ b/TODO.md
@@ -52,15 +52,16 @@ Deferred items from PR reviews that were not addressed before merge.
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) |
 | EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
+| Replicate-weight survey df — **Resolved**. `df_survey = rank(replicate_weights) - 1` matching R's `survey::degf()`. For IF paths, `n_valid - 1` when dropped replicates reduce effective count. | `survey.py` | #238 | Resolved |
 | CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
 | CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works (has WLS nuisance IF correction). | `staggered.py` | #233 | Medium |
 | SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved |
-| EfficientDiD hausman_pretest() clustered covariance uses stale `n_cl` after filtering non-finite EIF rows — should recompute effective cluster count and remap indices after `row_finite` filtering | `efficient_did.py` | #230 | Medium |
+| EfficientDiD hausman_pretest() clustered covariance stale `n_cl` — **Resolved**. Recompute `n_cl` and remap indices after `row_finite` filtering via `np.unique(return_inverse=True)`. | `efficient_did.py` | #230 | Resolved |
 | EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
 | TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low |
-| ContinuousDiD event-study aggregation does not filter by `anticipation` — uses all (g,t) cells instead of anticipation-filtered subset; pre-existing in both survey and non-survey paths | `continuous_did.py` | #226 | Medium |
+| ContinuousDiD event-study aggregation anticipation filter — **Resolved**. `_aggregate_event_study()` now filters `e < -anticipation` when `anticipation > 0`, matching CallawaySantAnna behavior. Bootstrap paths also filtered. | `continuous_did.py` | #226 | Resolved |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
-| Duplicated survey metadata summary formatting across 6 results classes — extract shared `_format_survey_metadata(sm, width)` helper to reduce maintenance burden as more estimators gain survey support in Phases 4-5 | `results.py`, `stacked_did_results.py`, `sun_abraham.py`, `bacon.py`, `triple_diff.py`, `continuous_did_results.py`, `efficient_did_results.py` | #226 | Low |
+| Survey metadata formatting dedup — **Resolved**. Extracted `_format_survey_block()` helper in `results.py`, replaced 13 occurrences across 11 files. | `results.py` + 10 results files | — | Resolved |
 | TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` | — | Low |
 
 #### Performance
@@ -78,7 +79,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
 | ~376 `duplicate object description` Sphinx warnings — caused by autodoc `:members:` on dataclass attributes within manual API pages (not from autosummary stubs); fix requires restructuring `docs/api/*.rst` pages to avoid documenting the same attribute via both `:members:` and inline `autosummary` tables | `docs/api/*.rst` | — | Low |
 | Plotly renderers silently ignore styling kwargs (marker, markersize, linewidth, capsize, ci_linewidth) that the matplotlib backend honors; thread them through or reject when `backend="plotly"` | `visualization/_event_study.py`, `_diagnostic.py`, `_power.py` | #222 | Medium |
-| Survey bootstrap test coverage: add FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests | `tests/test_survey_phase*.py` | #237 | Medium |
+| Survey bootstrap test coverage — **Resolved**. Added FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests. | `tests/test_survey_phase*.py` | — | Resolved |
 
 ---
diff --git a/diff_diff/__init__.py b/diff_diff/__init__.py
index b3188440..1e2eaeb8 100644
--- a/diff_diff/__init__.py
+++ b/diff_diff/__init__.py
@@ -95,8 +95,10 @@
     SyntheticDiDResults,
 )
 from diff_diff.survey import (
+    DEFFDiagnostics,
     SurveyDesign,
     SurveyMetadata,
+    compute_deff_diagnostics,
 )
 from diff_diff.staggered import (
     CallawaySantAnna,
@@ -327,6 +329,8 @@
     # Survey support
     "SurveyDesign",
     "SurveyMetadata",
+    "DEFFDiagnostics",
+    "compute_deff_diagnostics",
     # Rust backend
     "HAS_RUST_BACKEND",
     # Linear algebra helpers
diff --git a/diff_diff/bacon.py b/diff_diff/bacon.py
index 67fe794d..7ffbdf88 100644
--- a/diff_diff/bacon.py
+++ b/diff_diff/bacon.py
@@ -17,6 +17,7 @@
 import numpy as np
 import pandas as pd
 
+from diff_diff.results import _format_survey_block
 from diff_diff.utils import within_transform as _within_transform_util
 
@@ -144,23 +145,7 @@ def summary(self) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "-" * 85,
-                    "Survey Design".center(85),
-                    "-" * 85,
-                    f"{'Weight type:':<35} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<35} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<35} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<35} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<35} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<35} {sm.df_survey:>10}")
-            lines.extend(["-" * 85, ""])
+            lines.extend(_format_survey_block(sm, 85))
 
         lines.extend(
             [
@@ -477,6 +462,13 @@ def fit(
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, "analytical")
         )
+        # Reject replicate-weight designs — Bacon decomposition is a
+        # diagnostic that does not compute replicate-based variance
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "BaconDecomposition does not support replicate-weight survey "
+                "designs. Use a TSL-based survey design (strata/psu/fpc)."
+            )
 
         # Validate within-unit constancy for exact survey weights only.
         # The exact-weight path collapses to per-unit weights via groupby().first(),
@@ -593,6 +585,13 @@ def fit(
             weights=survey_weights,
         )
 
+        if not comparisons:
+            raise ValueError(
+                "No valid 2x2 comparisons remain after filtering. "
+                "All cells have zero effective weight or insufficient data. "
+                "Check subpopulation/domain definition."
+            )
+
         # Normalize weights to sum to 1
         total_weight = sum(c.weight for c in comparisons)
         if total_weight > 0:
@@ -849,6 +848,7 @@ def _compute_treated_vs_never(
     never_post_mask = never_mask & df[time].isin(post_periods)
 
     # Guard against empty cells (unbalanced/filtered panels)
+    # Also check positive weight mass for survey/subpopulation designs
     if not (
         np.any(treated_pre_mask)
         and np.any(treated_post_mask)
@@ -856,6 +856,13 @@
         and np.any(never_pre_mask)
         and np.any(never_post_mask)
     ):
         return None
+    if (
+        np.sum(w[treated_pre_mask]) <= 0
+        or np.sum(w[treated_post_mask]) <= 0
+        or np.sum(w[never_pre_mask]) <= 0
+        or np.sum(w[never_post_mask]) <= 0
+    ):
+        return None
 
     treated_pre = np.average(y[treated_pre_mask], weights=w[treated_pre_mask])
     treated_post = np.average(y[treated_post_mask], weights=w[treated_post_mask])
@@ -966,7 +973,7 @@ def _compute_timing_comparison(
     control_pre_mask = control_mask & df[time].isin(pre_periods)
     control_post_mask = control_mask & df[time].isin(post_periods)
 
-    # Skip if any cell is empty
+    # Skip if any cell is empty or has zero effective weight
     if (
         treated_pre_mask.sum() == 0
         or treated_post_mask.sum() == 0
@@ -974,6 +981,13 @@
         or control_pre_mask.sum() == 0
         or control_post_mask.sum() == 0
     ):
         return None
+    if (
+        np.sum(w[treated_pre_mask]) <= 0
+        or np.sum(w[treated_post_mask]) <= 0
+        or np.sum(w[control_pre_mask]) <= 0
+        or np.sum(w[control_post_mask]) <= 0
+    ):
+        return None
 
     treated_pre = np.average(y[treated_pre_mask], weights=w[treated_pre_mask])
     treated_post = np.average(y[treated_post_mask], weights=w[treated_post_mask])
diff --git a/diff_diff/continuous_did.py b/diff_diff/continuous_did.py
index 636fcd6c..019796c3 100644
--- a/diff_diff/continuous_did.py
+++ b/diff_diff/continuous_did.py
@@ -355,6 +355,12 @@ def fit(
             gt_results[(g, t)] = result
             gt_bootstrap_info[(g, t)] = result.get("_bootstrap_info", {})
 
+        # Filter out NaN cells (e.g., from zero effective survey mass)
+        gt_results = {
+            gt: r for gt, r in gt_results.items()
+            if np.isfinite(r.get("att_glob", np.nan))
+        }
+
         if len(gt_results) == 0:
             raise ValueError("No valid (g,t) cells computed.")
 
@@ -390,6 +396,7 @@
             gt_bootstrap_info=gt_bootstrap_info,
             unit_survey_weights=precomp.get("unit_survey_weights"),
             unit_cohorts=precomp["unit_cohorts"],
+            anticipation=self.anticipation,
         )
 
         _survey_df = None  # Set by analytical branch when survey is active
@@ -509,6 +516,11 @@
             # Survey df for t-distribution inference (unit-level, not panel-level)
             _survey_df = analytic.get("df_survey")
+            # Guard: replicate design with undefined df → NaN inference
+            if (_survey_df is None and resolved_survey is not None
+                    and hasattr(resolved_survey, 'uses_replicate_variance')
+                    and resolved_survey.uses_replicate_variance):
+                _survey_df = 0
 
             # Recompute survey_metadata from unit-level design so reported
             # effective_n/n_psu/df_survey match the inference actually run
@@ -519,6 +531,13 @@
                 raw_w_unit = _unit_resolved.weights
                 survey_metadata = compute_survey_metadata(_unit_resolved, raw_w_unit)
 
+                # Propagate replicate df override to survey_metadata for display
+                # (but not the df=0 sentinel — keep metadata as None for undefined df)
+                if (_survey_df is not None and _survey_df != 0
+                        and survey_metadata is not None):
+                    if survey_metadata.df_survey != _survey_df:
+                        survey_metadata.df_survey = _survey_df
+
         overall_att_t, overall_att_p, overall_att_ci = safe_inference(
             overall_att, overall_att_se, self.alpha, df=_survey_df
         )
@@ -571,15 +590,8 @@
             )
             n_strata_u = len(np.unique(us)) if us is not None else 0
             n_psu_u = len(np.unique(up)) if up is not None else 0
-            unit_resolved_es = ResolvedSurveyDesign(
-                weights=uw,
-                weight_type=resolved_survey.weight_type,
-                strata=us,
-                psu=up,
-                fpc=uf,
-                n_strata=n_strata_u,
-                n_psu=n_psu_u,
-                lonely_psu=resolved_survey.lonely_psu,
+            unit_resolved_es = resolved_survey.subset_to_units(
+                row_idx, uw, us, up, uf, n_strata_u, n_psu_u,
             )
 
         for e_val, info_e in event_study_effects.items():
@@ -638,12 +650,19 @@
             # Compute SE: survey-aware TSL or standard sqrt(sum(IF^2))
             if unit_resolved_es is not None:
-                X_ones_es = np.ones((n_units, 1))
-                # Rescale IFs by total survey mass (not n_units) for fweight support
-                tsl_scale_es = float(unit_resolved_es.weights.sum())
-                if_es_tsl = if_es * tsl_scale_es
-                vcov_es = compute_survey_vcov(X_ones_es, if_es_tsl, unit_resolved_es)
-                es_se = float(np.sqrt(np.abs(vcov_es[0, 0])))
+                if unit_resolved_es.uses_replicate_variance:
+                    from diff_diff.survey import compute_replicate_if_variance
+
+                    # Score-scale: psi = w * if_es (matches TSL bread)
+                    psi_es = unit_resolved_es.weights * if_es
+                    variance, _nv = compute_replicate_if_variance(psi_es, unit_resolved_es)
+                    es_se = float(np.sqrt(max(variance, 0.0))) if np.isfinite(variance) else np.nan
+                else:
+                    X_ones_es = np.ones((n_units, 1))
+                    tsl_scale_es = float(unit_resolved_es.weights.sum())
+                    if_es_tsl = if_es * tsl_scale_es
+                    vcov_es = compute_survey_vcov(X_ones_es, if_es_tsl, unit_resolved_es)
+                    es_se = float(np.sqrt(np.abs(vcov_es[0, 0])))
             else:
                 es_se = float(np.sqrt(np.sum(if_es**2)))
@@ -871,6 +890,14 @@ def _compute_dose_response_gt(
         if survey_weights is not None:
             w_treated = survey_weights[treated_mask]
             w_control = survey_weights[control_mask]
+            # Guard against zero effective mass (e.g., after subpopulation)
+            if np.sum(w_treated) <= 0 or np.sum(w_control) <= 0:
+                return {
+                    "att_glob": np.nan, "acrt_glob": np.nan,
+                    "n_treated": 0, "n_control": 0,
+                    "att_d": np.full(len(dvals), np.nan),
+                    "acrt_d": np.full(len(dvals), np.nan),
+                }
 
         # Control counterfactual (weighted mean when survey weights present)
         if w_control is not None:
@@ -897,9 +924,14 @@
         )
 
         # Skip if not enough treated units for OLS (need n > K for residual df)
-        if n_treated <= n_basis:
+        # When survey weights are present, use positive-weight count as
+        # the effective sample size — subpopulation() can zero weights
+        # leaving rows present but the weighted regression underidentified.
+        n_eff = int(np.count_nonzero(w_treated > 0)) if w_treated is not None else n_treated
+        if n_eff <= n_basis:
+            label = "positive-weight treated units" if w_treated is not None else "treated units"
             warnings.warn(
-                f"Not enough treated units ({n_treated}) for {n_basis} basis functions "
+                f"Not enough {label} ({n_eff}) for {n_basis} basis functions "
                 f"in (g={g}, t={t}). Skipping cell.",
                 UserWarning,
                 stacklevel=3,
@@ -1052,12 +1084,15 @@ def _aggregate_event_study(
     gt_bootstrap_info: Dict[Tuple, Dict] = None,
     unit_survey_weights: Optional[np.ndarray] = None,
     unit_cohorts: Optional[np.ndarray] = None,
+    anticipation: int = 0,
 ) -> Dict[int, Dict[str, Any]]:
     """Aggregate binarized ATT_glob by relative period."""
     effects_by_e: Dict[int, List[Tuple[float, float, Tuple]]] = {}
 
     for (g, t), r in gt_results.items():
         e = t - g
+        if anticipation > 0 and e < -anticipation:
+            continue
         if e not in effects_by_e:
             effects_by_e[e] = []
         # Compute weight for this (g,t) cell
@@ -1205,48 +1240,58 @@ def _compute_analytical_se(
         n_strata_unit = len(np.unique(unit_strata)) if unit_strata is not None else 0
         n_psu_unit = len(np.unique(unit_psu)) if unit_psu is not None else 0
-        unit_resolved = ResolvedSurveyDesign(
-            weights=unit_weights,
-            weight_type=resolved_survey.weight_type,
-            strata=unit_strata,
-            psu=unit_psu,
-            fpc=unit_fpc,
-            n_strata=n_strata_unit,
-            n_psu=n_psu_unit,
-            lonely_psu=resolved_survey.lonely_psu,
+        unit_resolved = resolved_survey.subset_to_units(
+            row_idx, unit_weights, unit_strata, unit_psu, unit_fpc,
+            n_strata_unit, n_psu_unit,
         )
 
         X_ones = np.ones((n_units, 1))
-        # Rescale IFs from 1/n convention to score scale for TSL sandwich.
-        # The per-unit IFs contain internal 1/n_t, 1/n_c scaling (for the
-        # unweighted SE = sqrt(sum(IF^2)) convention). compute_survey_vcov
-        # applies its own (X'WX)^{-1} bread, which would double-count.
-        # Rescale by the unit-level total survey mass (= n_units for
-        # pweight/aweight, but can differ for fweight).
-        tsl_scale = float(unit_resolved.weights.sum())
-        if_att_glob_tsl = if_att_glob * tsl_scale
-        if_acrt_glob_tsl = if_acrt_glob * tsl_scale
-        if_att_d_tsl = if_att_d * tsl_scale
-        if_acrt_d_tsl = if_acrt_d * tsl_scale
-
-        # Overall ATT SE via compute_survey_vcov
-        vcov_att = compute_survey_vcov(X_ones, if_att_glob_tsl, unit_resolved)
-        overall_att_se = float(np.sqrt(np.abs(vcov_att[0, 0])))
-
-        # Overall ACRT SE via compute_survey_vcov
-        vcov_acrt = compute_survey_vcov(X_ones, if_acrt_glob_tsl, unit_resolved)
-        overall_acrt_se = float(np.sqrt(np.abs(vcov_acrt[0, 0])))
-
-        # Per-grid-point SEs for dose-response curves
-        att_d_se = np.zeros(n_grid)
-        acrt_d_se = np.zeros(n_grid)
-        for d_idx in range(n_grid):
-            vcov_d = compute_survey_vcov(X_ones, if_att_d_tsl[:, d_idx], unit_resolved)
-            att_d_se[d_idx] = float(np.sqrt(np.abs(vcov_d[0, 0])))
-
-            vcov_d = compute_survey_vcov(X_ones, if_acrt_d_tsl[:, d_idx], unit_resolved)
-            acrt_d_se[d_idx] = float(np.sqrt(np.abs(vcov_d[0, 0])))
+        if unit_resolved.uses_replicate_variance:
+            # Replicate-weight variance: score-scale IFs to match TSL bread.
+            # TSL path does: scores = w * (if * tsl_scale), bread = 1/sum(w)^2
+            # Equivalent psi for replicate: w * if_vals * tsl_scale / sum(w) = w * if_vals
+            from diff_diff.survey import compute_replicate_if_variance
+
+            _w_rep = unit_resolved.weights
+            _rep_n_valid = unit_resolved.n_replicates  # track effective count
+
+            def _rep_se(if_vals):
+                nonlocal _rep_n_valid
+                psi_scaled = _w_rep * if_vals
+                v, nv = compute_replicate_if_variance(psi_scaled, unit_resolved)
+                _rep_n_valid = min(_rep_n_valid, nv)  # worst-case valid count
+                return float(np.sqrt(max(v, 0.0))) if np.isfinite(v) else np.nan
+
+            overall_att_se = _rep_se(if_att_glob)
+            overall_acrt_se = _rep_se(if_acrt_glob)
+            att_d_se = np.zeros(n_grid)
+            acrt_d_se = np.zeros(n_grid)
+            for d_idx in range(n_grid):
+                att_d_se[d_idx] = _rep_se(if_att_d[:, d_idx])
+                acrt_d_se[d_idx] = _rep_se(if_acrt_d[:, d_idx])
+        else:
+            # TSL: rescale IFs from 1/n convention to score scale for sandwich.
+            tsl_scale = float(unit_resolved.weights.sum())
+            if_att_glob_tsl = if_att_glob * tsl_scale
+            if_acrt_glob_tsl = if_acrt_glob * tsl_scale
+            if_att_d_tsl = if_att_d * tsl_scale
+            if_acrt_d_tsl = if_acrt_d * tsl_scale
+
+            vcov_att = compute_survey_vcov(X_ones, if_att_glob_tsl, unit_resolved)
+            overall_att_se = float(np.sqrt(np.abs(vcov_att[0, 0])))
+
+            vcov_acrt = compute_survey_vcov(X_ones, if_acrt_glob_tsl, unit_resolved)
+            overall_acrt_se = float(np.sqrt(np.abs(vcov_acrt[0, 0])))
+
+            att_d_se = np.zeros(n_grid)
+            acrt_d_se = np.zeros(n_grid)
+            for d_idx in range(n_grid):
+                vcov_d = compute_survey_vcov(X_ones, if_att_d_tsl[:, d_idx], unit_resolved)
+                att_d_se[d_idx] = float(np.sqrt(np.abs(vcov_d[0, 0])))
+
+                vcov_d = compute_survey_vcov(X_ones, if_acrt_d_tsl[:, d_idx], unit_resolved)
+                acrt_d_se[d_idx] = float(np.sqrt(np.abs(vcov_d[0, 0])))
     else:
         # SE = sqrt(sum(IF_i^2)), matching CallawaySantAnna's convention
         # (per-unit IFs already contain 1/n_t, 1/n_c scaling)
         acrt_d_se = np.sqrt(np.sum(if_acrt_d**2, axis=0))
 
     # Return unit-level survey df and resolved design for metadata recomputation
-    unit_df_survey = unit_resolved.df_survey if resolved_survey is not None else None
+    # Only override with n_valid-based df when replicates were actually dropped
+    if resolved_survey is not None and hasattr(resolved_survey, 'uses_replicate_variance') and resolved_survey.uses_replicate_variance:
+        if _rep_n_valid < unit_resolved.n_replicates:
+            unit_df_survey = _rep_n_valid - 1 if _rep_n_valid > 1 else None
+        else:
+            unit_df_survey = unit_resolved.df_survey
+    else:
+        unit_df_survey = unit_resolved.df_survey if resolved_survey is not None else None
 
     return {
         "overall_att_se": overall_att_se,
@@ -1294,6 +1346,15 @@ def _run_bootstrap(
             stacklevel=3,
         )
 
+        # Reject replicate-weight designs for bootstrap — replicate variance
+        # is an analytical alternative to bootstrap, not compatible with it
+        if resolved_survey is not None and hasattr(resolved_survey, "uses_replicate_variance") and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "ContinuousDiD bootstrap (n_bootstrap > 0) is not supported "
+                "with replicate-weight survey designs. Replicate weights provide "
+                "analytical variance; use n_bootstrap=0 instead."
+            )
+
         rng = np.random.default_rng(self.seed)
         n_units = precomp["n_units"]
         n_grid = len(dvals)
@@ -1314,15 +1375,9 @@
             unit_fpc = resolved_survey.fpc[row_idx] if resolved_survey.fpc is not None else None
             n_strata_u = len(np.unique(unit_strata)) if unit_strata is not None else 0
             n_psu_u = len(np.unique(unit_psu)) if unit_psu is not None else 0
-            unit_resolved = ResolvedSurveyDesign(
-                weights=unit_weights,
-                weight_type=resolved_survey.weight_type,
-                strata=unit_strata,
-                psu=unit_psu,
-                fpc=unit_fpc,
-                n_strata=n_strata_u,
-                n_psu=n_psu_u,
-                lonely_psu=resolved_survey.lonely_psu,
+            unit_resolved = resolved_survey.subset_to_units(
+                row_idx, unit_weights, unit_strata, unit_psu, unit_fpc,
+                n_strata_u, n_psu_u,
             )
 
         # Generate bootstrap weights — PSU-level when survey design is present
@@ -1374,6 +1429,8 @@
         for gt, r in gt_results.items():
             g_val, t_val = gt
             e = t_val - g_val
+            if self.anticipation > 0 and e < -self.anticipation:
+                continue
             if unit_sw is not None:
                 g_mask = unit_cohorts == g_val
                 cell_mass = float(np.sum(unit_sw[g_mask]))
@@ -1383,6 +1440,8 @@
         for gt, r in gt_results.items():
             g_val, t_val = gt
             e = t_val - g_val
+            if self.anticipation > 0 and e < -self.anticipation:
+                continue
             if unit_sw is not None:
                 g_mask = unit_cohorts == g_val
                 cell_mass = float(np.sum(unit_sw[g_mask]))
diff --git a/diff_diff/continuous_did_results.py b/diff_diff/continuous_did_results.py
index b9f332f4..67a745d4 100644
--- a/diff_diff/continuous_did_results.py
+++ b/diff_diff/continuous_did_results.py
@@ -11,7 +11,7 @@
 import numpy as np
 import pandas as pd
 
-from diff_diff.results import _get_significance_stars
+from diff_diff.results import _format_survey_block, _get_significance_stars
 
 __all__ = ["ContinuousDiDResults", "DoseResponseCurve"]
 
@@ -182,23 +182,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "-" * w,
-                    "Survey Design".center(w),
-                    "-" * w,
-                    f"{'Weight type:':<30} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<30} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}")
-            lines.extend(["-" * w, ""])
+            lines.extend(_format_survey_block(sm, w))
 
         # Overall summary parameters
         lines.extend(
diff --git a/diff_diff/efficient_did.py b/diff_diff/efficient_did.py
index c3a341c6..d25f0512 100644
--- a/diff_diff/efficient_did.py
+++ b/diff_diff/efficient_did.py
@@ -373,6 +373,11 @@ def fit(
         # Store survey df for safe_inference calls (t-distribution with survey df)
         self._survey_df = survey_metadata.df_survey if survey_metadata is not None else None
+        # Guard: replicate design with undefined df → NaN inference
+        if (self._survey_df is None and resolved_survey is not None
+                and hasattr(resolved_survey, 'uses_replicate_variance')
+                and resolved_survey.uses_replicate_variance):
+            self._survey_df = 0
 
         # Bootstrap + survey supported via PSU-level multiplier bootstrap.
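The replicate-weight SE paths above all defer to `compute_replicate_if_variance`, whose exact signature and scaling belong to `diff_diff.survey` and are not shown in this diff. As a generic, illustrative sketch only (hypothetical `replicate_variance` helper, delete-one jackknife weights, JK1-style `(R-1)/R` scaling assumed), replicate variance re-estimates the statistic under each replicate weight column and sums scaled squared deviations, skipping degenerate replicates so the effective count can shrink, which is what drives the `n_valid - 1` df adjustment tracked above:

```python
import numpy as np

def replicate_variance(y, w0, w_reps, scale=None):
    """Replicate-weight variance of a weighted mean (illustrative sketch).

    y      : (n,) outcomes
    w0     : (n,) full-sample weights
    w_reps : (n, R) replicate weight columns
    scale  : multiplier for squared deviations; defaults to JK1's (R-1)/R
    """
    theta0 = np.average(y, weights=w0)
    thetas = []
    for r in range(w_reps.shape[1]):
        wr = w_reps[:, r]
        if wr.sum() <= 0:  # dropped replicate reduces the effective count
            continue
        thetas.append(np.average(y, weights=wr))
    thetas = np.asarray(thetas)
    n_valid = thetas.size
    if scale is None:
        scale = (n_valid - 1) / n_valid
    return scale * np.sum((thetas - theta0) ** 2), n_valid
```

With delete-one jackknife weights this reproduces the classical jackknife variance of the mean, `s^2 / n`.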
@@ -516,18 +521,16 @@
             unit_fpc = resolved_survey.fpc[row_idx] if resolved_survey.fpc is not None else None
             n_strata_u = len(np.unique(unit_strata)) if unit_strata is not None else 0
             n_psu_u = len(np.unique(unit_psu)) if unit_psu is not None else 0
-            self._unit_resolved_survey = ResolvedSurveyDesign(
-                weights=unit_weights_s,
-                weight_type=resolved_survey.weight_type,
-                strata=unit_strata,
-                psu=unit_psu,
-                fpc=unit_fpc,
-                n_strata=n_strata_u,
-                n_psu=n_psu_u,
-                lonely_psu=resolved_survey.lonely_psu,
+            self._unit_resolved_survey = resolved_survey.subset_to_units(
+                row_idx, unit_weights_s, unit_strata, unit_psu, unit_fpc,
+                n_strata_u, n_psu_u,
             )
             # Use unit-level df (not panel-level) for t-distribution
             self._survey_df = self._unit_resolved_survey.df_survey
+            # Re-apply replicate guard: undefined df → NaN inference
+            if (self._survey_df is None
+                    and self._unit_resolved_survey.uses_replicate_variance):
+                self._survey_df = 0
         else:
             self._unit_resolved_survey = None
 
@@ -951,6 +954,18 @@ def fit(
         )
 
         # ----- Bootstrap -----
+        # Reject replicate-weight designs for bootstrap — replicate variance
+        # is an analytical alternative, not compatible with bootstrap
+        if (
+            self.n_bootstrap > 0
+            and self._unit_resolved_survey is not None
+            and self._unit_resolved_survey.uses_replicate_variance
+        ):
+            raise NotImplementedError(
+                "EfficientDiD bootstrap (n_bootstrap > 0) is not supported "
+                "with replicate-weight survey designs. Replicate weights provide "
+                "analytical variance; use n_bootstrap=0 instead."
+            )
         bootstrap_results = None
         if self.n_bootstrap > 0 and eif_by_gt:
             bootstrap_results = self._run_multiplier_bootstrap(
@@ -1061,10 +1076,16 @@ def _recompute_unit_survey_metadata(self, panel_metadata):
         if self._unit_resolved_survey is not None:
             from diff_diff.survey import compute_survey_metadata
 
-            return compute_survey_metadata(
+            meta = compute_survey_metadata(
                 self._unit_resolved_survey,
                 self._unit_resolved_survey.weights,
             )
+            # Propagate effective replicate df if available
+            # (but not the df=0 sentinel — keep metadata as None for undefined df)
+            if (self._survey_df is not None and self._survey_df != 0
+                    and meta.df_survey != self._survey_df):
+                meta.df_survey = self._survey_df
+            return meta
         return panel_metadata
 
     # -- Survey SE helpers ----------------------------------------------------
@@ -1076,6 +1097,18 @@ def _compute_survey_eif_se(self, eif_vals: np.ndarray) -> float:
         once in ``fit()``, ensuring consistent unit-level arrays and avoiding
         repeated subsetting of panel-level survey data.
         """
+        if self._unit_resolved_survey.uses_replicate_variance:
+            from diff_diff.survey import compute_replicate_if_variance
+
+            # Score-scale IFs to match TSL bread: psi = w * eif / sum(w)
+            w = self._unit_resolved_survey.weights
+            psi_scaled = w * eif_vals / w.sum()
+            variance, n_valid = compute_replicate_if_variance(psi_scaled, self._unit_resolved_survey)
+            # Update survey df to reflect effective replicate count
+            if n_valid < self._unit_resolved_survey.n_replicates:
+                self._survey_df = n_valid - 1 if n_valid > 1 else None
+            return float(np.sqrt(max(variance, 0.0))) if np.isfinite(variance) else np.nan
+
         from diff_diff.survey import compute_survey_vcov
 
         X_ones = np.ones((len(eif_vals), 1))
@@ -1577,6 +1610,10 @@ def _aggregate_es(
         n_units = int(np.sum(row_finite))
         if cl_idx is not None:
             cl_idx = cl_idx[row_finite]
+            # Recompute effective cluster count and remap to contiguous
+            # indices — entire clusters may have been dropped by filtering
+            unique_cl, cl_idx = np.unique(cl_idx, return_inverse=True)
+            n_cl = len(unique_cl)
 
         # Compute full covariance matrices
         if cl_idx is not None and n_cl is not None:
diff --git a/diff_diff/efficient_did_results.py b/diff_diff/efficient_did_results.py
index 8e212795..4d10c9b3 100644
--- a/diff_diff/efficient_did_results.py
+++ b/diff_diff/efficient_did_results.py
@@ -11,7 +11,7 @@
 import numpy as np
 import pandas as pd
 
-from diff_diff.results import _get_significance_stars
+from diff_diff.results import _format_survey_block, _get_significance_stars
 
 if TYPE_CHECKING:
     from diff_diff.efficient_did_bootstrap import EDiDBootstrapResults
 
@@ -201,23 +201,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "-" * 85,
-                    "Survey Design".center(85),
-                    "-" * 85,
-                    f"{'Weight type:':<30} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<30} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}")
-            lines.extend(["-" * 85, ""])
+            lines.extend(_format_survey_block(sm, 85))
 
         # Overall ATT
         lines.extend(
diff --git a/diff_diff/estimators.py b/diff_diff/estimators.py
index 88ae9672..1b8a2ac0 100644
--- a/diff_diff/estimators.py
+++ b/diff_diff/estimators.py
@@ -240,6 +240,15 @@ def fit(
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, self.inference)
         )
+        # Reject replicate-weight designs — base DiD uses compute_survey_vcov
+        # (TSL) directly, not LinearRegression's replicate dispatch.
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "DifferenceInDifferences does not yet support replicate-weight "
+                "survey designs. Use CallawaySantAnna, EfficientDiD, "
+                "ContinuousDiD, or TripleDifference for replicate-weight "
+                "inference, or use a TSL-based survey design (strata/psu/fpc)."
+            )
 
         # Handle absorbed fixed effects (within-transformation)
         working_data = data.copy()
@@ -1008,6 +1017,15 @@ def fit(  # type: ignore[override]
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, effective_inference)
         )
+        # Reject replicate-weight designs — MultiPeriodDiD uses
+        # compute_survey_vcov (TSL) directly without replicate dispatch.
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "MultiPeriodDiD does not yet support replicate-weight survey "
+                "designs. Use CallawaySantAnna for staggered adoption with "
+                "replicate weights, or use a TSL-based survey design "
+                "(strata/psu/fpc)."
+ ) # Handle absorbed fixed effects (within-transformation) working_data = data.copy() diff --git a/diff_diff/imputation.py b/diff_diff/imputation.py index 77146d40..4f32a663 100644 --- a/diff_diff/imputation.py +++ b/diff_diff/imputation.py @@ -246,6 +246,11 @@ def fit( # Validate within-unit constancy for panel survey designs if resolved_survey is not None: + if resolved_survey.uses_replicate_variance: + raise NotImplementedError( + "ImputationDiD does not yet support replicate-weight survey " + "designs. Use a TSL-based survey design (strata/psu/fpc)." + ) _validate_unit_constant_survey(data, unit, survey_design) if resolved_survey.weight_type != "pweight": raise ValueError( diff --git a/diff_diff/imputation_results.py b/diff_diff/imputation_results.py index 6589af1d..3a0465e1 100644 --- a/diff_diff/imputation_results.py +++ b/diff_diff/imputation_results.py @@ -11,7 +11,7 @@ import numpy as np import pandas as pd -from diff_diff.results import _get_significance_stars +from diff_diff.results import _format_survey_block, _get_significance_stars __all__ = [ "ImputationBootstrapResults", @@ -187,23 +187,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "-" * 85, - "Survey Design".center(85), - "-" * 85, - f"{'Weight type:':<30} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<30} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}") - lines.extend(["-" * 85, ""]) + lines.extend(_format_survey_block(sm, 85)) # Overall ATT lines.extend( diff --git a/diff_diff/linalg.py b/diff_diff/linalg.py index 819af25f..5b0d5d59 100644 --- 
a/diff_diff/linalg.py +++ b/diff_diff/linalg.py @@ -402,11 +402,16 @@ def _validate_weights(weights, weight_type, n): raise ValueError("Weights contain Inf values") if np.any(weights < 0): raise ValueError("Weights must be non-negative") + if np.sum(weights) <= 0: + raise ValueError( + "Weights sum to zero — no observations have positive weight. " + "Cannot fit a model on an empty effective sample." + ) if weight_type == "fweight": fractional = weights - np.round(weights) if np.any(np.abs(fractional) > 1e-10): raise ValueError( - "Frequency weights (fweight) must be positive integers. " + "Frequency weights (fweight) must be non-negative integers. " "Fractional values detected. Use pweight for non-integer weights." ) return weights @@ -1041,10 +1046,16 @@ def _compute_robust_vcov_numpy( else: bread_matrix = X.T @ X - # Effective n for df computation (fweights use sum(w)) + # Effective n for df computation + # fweights: sum(w) (frequency expansion) + # pweight/aweight with zeros: positive-weight count (zero-weight rows + # contribute nothing to the sandwich and should not inflate df) n_eff = n - if weights is not None and weight_type == "fweight": - n_eff = int(round(np.sum(weights))) + if weights is not None: + if weight_type == "fweight": + n_eff = int(round(np.sum(weights))) + elif np.any(weights == 0): + n_eff = int(np.count_nonzero(weights > 0)) # Compute weighted scores for cluster-robust meat (outer product of sums). # pweight/fweight multiply by w; aweight and unweighted use raw residuals. 
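
The `linalg.py` hunks above make two related bookkeeping changes: weights may now contain zeros (but must be non-negative with positive total mass, and fweights integer-valued), and the degrees-of-freedom count for pweight/aweight vectors containing zeros uses only positive-weight rows, with zero-total-weight clusters excluded from the cluster count. A condensed standalone sketch of both rules — the helper names `check_weights` and `effective_counts` are mine for illustration, not the library's:

```python
import numpy as np
import pandas as pd

def check_weights(weights, weight_type="pweight"):
    # Mirrors the validation order in the diff: NaN/Inf, negativity,
    # zero total mass, then the fweight integrality check.
    w = np.asarray(weights, dtype=np.float64)
    if np.any(np.isnan(w)):
        raise ValueError("Weights contain NaN values")
    if np.any(~np.isfinite(w)):
        raise ValueError("Weights contain Inf values")
    if np.any(w < 0):
        raise ValueError("Weights must be non-negative")
    if np.sum(w) <= 0:
        raise ValueError("Weights sum to zero")
    if weight_type == "fweight" and np.any(np.abs(w - np.round(w)) > 1e-10):
        raise ValueError("Frequency weights must be non-negative integers")
    return w

def effective_counts(weights, weight_type, cluster_ids=None):
    # fweight: frequency expansion, n_eff = sum(w).
    # pweight/aweight with zeros: count positive-weight rows only, and
    # drop clusters whose total weight is zero from the cluster count.
    w = np.asarray(weights, dtype=np.float64)
    if weight_type == "fweight":
        n_eff = int(round(np.sum(w)))
    elif np.any(w == 0):
        n_eff = int(np.count_nonzero(w > 0))
    else:
        n_eff = len(w)
    n_clusters = None
    if cluster_ids is not None:
        cluster_w = pd.Series(w).groupby(np.asarray(cluster_ids)).sum()
        n_clusters = int((cluster_w > 0).sum())
    return n_eff, n_clusters
```

The point of the positive-weight count is that zero-weight rows contribute nothing to the sandwich meat, so leaving them in `n` would inflate the df and understate standard errors on the subpopulation.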
@@ -1053,6 +1064,9 @@ def _compute_robust_vcov_numpy( scores = X * (weights * residuals)[:, np.newaxis] else: scores = X * residuals[:, np.newaxis] + # Zero out scores for zero-weight aweight rows (subpopulation invariance) + if weights is not None and np.any(weights == 0): + scores[weights == 0] = 0.0 if cluster_ids is None: # HC1 (heteroskedasticity-robust) standard errors @@ -1070,6 +1084,11 @@ def _compute_robust_vcov_numpy( unique_clusters = np.unique(cluster_ids) n_clusters = len(unique_clusters) + # Exclude clusters with zero total weight (subpopulation-zeroed) + if weights is not None and weight_type != "fweight" and np.any(weights == 0): + cluster_weights = pd.Series(weights).groupby(cluster_ids).sum() + n_clusters = int((cluster_weights > 0).sum()) + if n_clusters < 2: raise ValueError(f"Need at least 2 clusters for cluster-robust SEs, got {n_clusters}") @@ -1166,8 +1185,12 @@ def solve_logit( raise ValueError("weights contain NaN values") if np.any(~np.isfinite(weights)): raise ValueError("weights contain Inf values") - if np.any(weights <= 0): - raise ValueError("weights must be strictly positive") + if np.any(weights < 0): + raise ValueError("weights must be non-negative") + if np.sum(weights) <= 0: + raise ValueError( + "weights sum to zero — no observations have positive weight" + ) # Validate rank_deficient_action valid_actions = {"warn", "error", "silent"} @@ -1177,7 +1200,57 @@ def solve_logit( f"got '{rank_deficient_action}'" ) - # Check rank deficiency once before iterating + # Track original column count for coefficient expansion at the end + k_original = X_with_intercept.shape[1] + eff_dropped_original: list = [] # indices in original column space + + # Validate effective weighted sample when weights have zeros + if weights is not None and np.any(weights == 0): + pos_mask = weights > 0 + n_pos = int(np.sum(pos_mask)) + y_pos = y[pos_mask] + # Need both outcome classes in the positive-weight subset + unique_y = np.unique(y_pos) + if 
len(unique_y) < 2: + raise ValueError( + f"Positive-weight observations have only {len(unique_y)} " + f"outcome class(es). Logistic regression requires both 0 and 1 " + f"in the effective (positive-weight) sample." + ) + # Check rank deficiency on positive-weight rows FIRST — full design + # may be full rank due to zero-weight padding. Drop columns before + # checking sample-size identification. + X_eff = X_with_intercept[pos_mask] + eff_rank_info = _detect_rank_deficiency(X_eff) + if len(eff_rank_info[1]) > 0: + n_dropped_eff = len(eff_rank_info[1]) + if rank_deficient_action == "error": + raise ValueError( + f"Effective (positive-weight) sample is rank-deficient: " + f"{n_dropped_eff} linearly dependent column(s). " + f"Cannot identify logistic model on this subpopulation." + ) + elif rank_deficient_action == "warn": + warnings.warn( + f"Effective (positive-weight) sample is rank-deficient: " + f"dropping {n_dropped_eff} column(s). Propensity estimates " + f"may be unreliable on this subpopulation.", + UserWarning, + stacklevel=2, + ) + # Drop columns and track original indices for final expansion + eff_dropped_original = list(eff_rank_info[1]) + X_with_intercept = np.delete(X_with_intercept, eff_rank_info[1], axis=1) + k = X_with_intercept.shape[1] + # Check sample-size identification AFTER column dropping + if n_pos <= k: + raise ValueError( + f"Only {n_pos} positive-weight observation(s) for " + f"{k} parameters (after rank reduction). " + f"Cannot identify logistic model." 
+ ) + + # Check rank deficiency once before iterating (on possibly-shrunk matrix) rank_info = _detect_rank_deficiency(X_with_intercept) rank, dropped_cols, _ = rank_info if len(dropped_cols) > 0: @@ -1273,10 +1346,21 @@ def solve_logit( stacklevel=2, ) - # Expand beta back to full size if columns were dropped - if len(dropped_cols) > 0: - beta_full = np.zeros(k) - beta_full[kept_cols] = beta_solve + # Expand beta back to original column count, accounting for columns + # dropped in both the effective-sample check and the full-sample check + if len(dropped_cols) > 0 or len(eff_dropped_original) > 0: + # First expand from X_solve columns back to post-eff-drop columns + # Use NaN for dropped coefficients (R convention: not estimable) + beta_post_eff = np.full(k, np.nan) + beta_post_eff[kept_cols] = beta_solve + + # Then expand from post-eff-drop columns back to original columns + if len(eff_dropped_original) > 0: + beta_full = np.full(k_original, np.nan) + kept_original = [i for i in range(k_original) if i not in eff_dropped_original] + beta_full[kept_original] = beta_post_eff + else: + beta_full = beta_post_eff else: beta_full = beta_solve @@ -1592,6 +1676,9 @@ def fit( X = np.asarray(X, dtype=np.float64) y = np.asarray(y, dtype=np.float64) + # Reset replicate df from any previous fit + self._replicate_df = None + # Add intercept if requested if self.include_intercept: X = np.column_stack([np.ones(X.shape[0]), X]) @@ -1670,10 +1757,14 @@ def fit( nan_mask = np.isnan(coefficients) k_effective = k - np.sum(nan_mask) # Number of identified coefficients - # For fweights, df uses sum(w) - k (effective sample size) + # Effective n for df: fweights use sum(w), pweight/aweight with + # zeros use positive-weight count (zero-weight rows don't contribute) n_eff_df = n - if self.weights is not None and self.weight_type == "fweight": - n_eff_df = int(round(np.sum(self.weights))) + if self.weights is not None: + if self.weight_type == "fweight": + n_eff_df = 
int(round(np.sum(self.weights))) + elif np.any(self.weights == 0): + n_eff_df = int(np.count_nonzero(self.weights > 0)) if k_effective == 0: # All coefficients dropped - no valid inference @@ -1721,20 +1812,54 @@ def fit( # Compute survey vcov if applicable if _use_survey_vcov: - from diff_diff.survey import compute_survey_vcov + from diff_diff.survey import ResolvedSurveyDesign as _RSD - nan_mask = np.isnan(coefficients) - if np.any(nan_mask): - kept_cols = np.where(~nan_mask)[0] - if len(kept_cols) > 0: - vcov_reduced = compute_survey_vcov( - X[:, kept_cols], residuals, _effective_survey_design + _uses_rep = ( + isinstance(_effective_survey_design, _RSD) + and _effective_survey_design.uses_replicate_variance + ) + + if _uses_rep: + from diff_diff.survey import compute_replicate_vcov + + nan_mask = np.isnan(coefficients) + if np.any(nan_mask): + kept_cols = np.where(~nan_mask)[0] + if len(kept_cols) > 0: + vcov_reduced, _n_valid_rep = compute_replicate_vcov( + X[:, kept_cols], y, coefficients[kept_cols], + _effective_survey_design, + weight_type=self.weight_type, + ) + vcov = _expand_vcov_with_nan(vcov_reduced, X.shape[1], kept_cols) + else: + vcov = np.full((X.shape[1], X.shape[1]), np.nan) + _n_valid_rep = 0 + else: + vcov, _n_valid_rep = compute_replicate_vcov( + X, y, coefficients, _effective_survey_design, + weight_type=self.weight_type, ) - vcov = _expand_vcov_with_nan(vcov_reduced, X.shape[1], kept_cols) + # Store effective replicate df only when replicates were dropped + if _n_valid_rep < _effective_survey_design.n_replicates: + self._replicate_df = _n_valid_rep - 1 if _n_valid_rep > 1 else None else: - vcov = np.full((X.shape[1], X.shape[1]), np.nan) + self._replicate_df = None # use rank-based df from design else: - vcov = compute_survey_vcov(X, residuals, _effective_survey_design) + from diff_diff.survey import compute_survey_vcov + + nan_mask = np.isnan(coefficients) + if np.any(nan_mask): + kept_cols = np.where(~nan_mask)[0] + if len(kept_cols) > 0: + 
vcov_reduced = compute_survey_vcov( + X[:, kept_cols], residuals, _effective_survey_design + ) + vcov = _expand_vcov_with_nan(vcov_reduced, X.shape[1], kept_cols) + else: + vcov = np.full((X.shape[1], X.shape[1]), np.nan) + else: + vcov = compute_survey_vcov(X, residuals, _effective_survey_design) # Store fitted attributes self.coefficients_ = coefficients @@ -1750,10 +1875,14 @@ def fit( # This is needed for correct degrees of freedom in inference nan_mask = np.isnan(coefficients) self.n_params_effective_ = int(self.n_params_ - np.sum(nan_mask)) - # For fweights, df uses sum(w) - k (effective sample size) + # Effective n for df: fweights use sum(w), pweight/aweight with + # zeros use positive-weight count (matches compute_robust_vcov) n_eff_df = self.n_obs_ - if self.weights is not None and self.weight_type == "fweight": - n_eff_df = int(round(np.sum(self.weights))) + if self.weights is not None: + if self.weight_type == "fweight": + n_eff_df = int(round(np.sum(self.weights))) + elif np.any(self.weights == 0): + n_eff_df = int(np.count_nonzero(self.weights > 0)) self.df_ = n_eff_df - self.n_params_effective_ - df_adjustment # Survey degrees of freedom: n_PSU - n_strata (overrides standard df) @@ -1763,9 +1892,74 @@ def fit( if isinstance(_effective_survey_design, ResolvedSurveyDesign): self.survey_df_ = _effective_survey_design.df_survey + # Override with effective replicate df if available + if hasattr(self, '_replicate_df') and self._replicate_df is not None: + self.survey_df_ = self._replicate_df return self + def compute_deff(self, coefficient_names=None): + """Compute per-coefficient design effect diagnostics. + + Compares the survey vcov to an SRS (HC1) baseline. Must be called + after ``fit()`` with a survey design. + + Returns + ------- + DEFFDiagnostics + """ + self._check_fitted() + if not (hasattr(self, 'survey_design') and self.survey_design is not None): + raise ValueError( + "compute_deff() requires a survey design. 
" + "Fit with survey_design= first." + ) + from diff_diff.survey import compute_deff_diagnostics + + # Handle rank-deficient fits: compute DEFF only on kept columns, + # then expand back with NaN for dropped columns + nan_mask = np.isnan(self.coefficients_) + if np.any(nan_mask): + kept = np.where(~nan_mask)[0] + if len(kept) == 0: + k = len(self.coefficients_) + nan_arr = np.full(k, np.nan) + from diff_diff.survey import DEFFDiagnostics + return DEFFDiagnostics( + deff=nan_arr, effective_n=nan_arr.copy(), + srs_se=nan_arr.copy(), survey_se=nan_arr.copy(), + coefficient_names=coefficient_names, + ) + # Compute on kept columns only + X_kept = self._X[:, kept] + vcov_kept = self.vcov_[np.ix_(kept, kept)] + deff_kept = compute_deff_diagnostics( + X_kept, self.residuals_, vcov_kept, + self.weights, weight_type=self.weight_type, + ) + # Expand back to full size with NaN for dropped + k = len(self.coefficients_) + full_deff = np.full(k, np.nan) + full_eff_n = np.full(k, np.nan) + full_srs_se = np.full(k, np.nan) + full_survey_se = np.full(k, np.nan) + full_deff[kept] = deff_kept.deff + full_eff_n[kept] = deff_kept.effective_n + full_srs_se[kept] = deff_kept.srs_se + full_survey_se[kept] = deff_kept.survey_se + from diff_diff.survey import DEFFDiagnostics + return DEFFDiagnostics( + deff=full_deff, effective_n=full_eff_n, + srs_se=full_srs_se, survey_se=full_survey_se, + coefficient_names=coefficient_names, + ) + + return compute_deff_diagnostics( + self._X, self.residuals_, self.vcov_, + self.weights, weight_type=self.weight_type, + coefficient_names=coefficient_names, + ) + def _check_fitted(self) -> None: """Raise error if model has not been fitted.""" if self.coefficients_ is None: @@ -1859,11 +2053,25 @@ def get_inference( effective_df = df elif self.survey_df_ is not None: effective_df = self.survey_df_ + elif (hasattr(self, 'survey_design') and self.survey_design is not None + and hasattr(self.survey_design, 'uses_replicate_variance') + and 
self.survey_design.uses_replicate_variance): + # Replicate design with undefined df (rank <= 1) — NaN inference + warnings.warn( + "Replicate design has undefined survey d.f. (rank <= 1). " + "Inference fields will be NaN.", + UserWarning, stacklevel=2, + ) + effective_df = 0 # Forces NaN from t-distribution else: effective_df = self.df_ # Warn if df is non-positive and fall back to normal distribution - if effective_df is not None and effective_df <= 0: + # (skip for replicate designs — df=0 is intentional for NaN inference) + _is_replicate = (hasattr(self, 'survey_design') and self.survey_design is not None + and hasattr(self.survey_design, 'uses_replicate_variance') + and self.survey_design.uses_replicate_variance) + if effective_df is not None and effective_df <= 0 and not _is_replicate: import warnings warnings.warn( diff --git a/diff_diff/results.py b/diff_diff/results.py index 1c62b8b7..bf833192 100644 --- a/diff_diff/results.py +++ b/diff_diff/results.py @@ -11,6 +11,41 @@ import pandas as pd +def _format_survey_block(sm, width: int) -> list: + """Format survey design metadata block for summary() output. + + Parameters + ---------- + sm : SurveyMetadata + Survey metadata from results object. + width : int + Total width for separator lines and centering. 
+ """ + label_width = 30 if width >= 80 else 25 + lines = [ + "", + "-" * width, + "Survey Design".center(width), + "-" * width, + f"{'Weight type:':<{label_width}} {sm.weight_type:>10}", + ] + if getattr(sm, "replicate_method", None) is not None: + lines.append(f"{'Replicate method:':<{label_width}} {sm.replicate_method:>10}") + if getattr(sm, "n_replicates", None) is not None: + lines.append(f"{'Replicates:':<{label_width}} {sm.n_replicates:>10}") + else: + if sm.n_strata is not None: + lines.append(f"{'Strata:':<{label_width}} {sm.n_strata:>10}") + if sm.n_psu is not None: + lines.append(f"{'PSU/Cluster:':<{label_width}} {sm.n_psu:>10}") + lines.append(f"{'Effective sample size:':<{label_width}} {sm.effective_n:>10.1f}") + lines.append(f"{'Kish DEFF (weights):':<{label_width}} {sm.design_effect:>10.2f}") + if sm.df_survey is not None: + lines.append(f"{'Survey d.f.:':<{label_width}} {sm.df_survey:>10}") + lines.append("-" * width) + return lines + + @dataclass class DiDResults: """ @@ -103,24 +138,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Add survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "", - "-" * 70, - "Survey Design".center(70), - "-" * 70, - f"{'Weight type:':<25} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<25} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<25} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<25} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<25} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<25} {sm.df_survey:>10}") - lines.append("-" * 70) + lines.extend(_format_survey_block(sm, 70)) # Add inference method info if self.inference_method != "analytical": @@ -405,24 +423,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Add survey design info if self.survey_metadata is not None: sm = 
self.survey_metadata - lines.extend( - [ - "", - "-" * 80, - "Survey Design".center(80), - "-" * 80, - f"{'Weight type:':<25} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<25} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<25} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<25} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<25} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<25} {sm.df_survey:>10}") - lines.append("-" * 80) + lines.extend(_format_survey_block(sm, 80)) # Pre-period effects (parallel trends test) pre_effects = {p: pe for p, pe in self.period_effects.items() if p in self.pre_periods} @@ -740,24 +741,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Add survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "", - "-" * 75, - "Survey Design".center(75), - "-" * 75, - f"{'Weight type:':<25} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<25} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<25} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<25} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<25} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<25} {sm.df_survey:>10}") - lines.append("-" * 75) + lines.extend(_format_survey_block(sm, 75)) lines.extend( [ diff --git a/diff_diff/stacked_did.py b/diff_diff/stacked_did.py index 4ef6a6ee..d923c707 100644 --- a/diff_diff/stacked_did.py +++ b/diff_diff/stacked_did.py @@ -242,6 +242,15 @@ def fit( resolved_survey, survey_weights, survey_weight_type, survey_metadata = ( _resolve_survey_for_fit(survey_design, data, "analytical") ) + # Reject replicate-weight designs — StackedDiD uses + # compute_survey_vcov (TSL) directly without replicate dispatch. 
+ if resolved_survey is not None and resolved_survey.uses_replicate_variance: + raise NotImplementedError( + "StackedDiD does not yet support replicate-weight survey " + "designs. Use CallawaySantAnna for staggered adoption with " + "replicate weights, or use a TSL-based survey design " + "(strata/psu/fpc)." + ) # Reject fweight and aweight — Q-weight composition is ratio-valued # and breaks both frequency-weight (integer) and analytic-weight diff --git a/diff_diff/stacked_did_results.py b/diff_diff/stacked_did_results.py index 631f3817..20bfb65a 100644 --- a/diff_diff/stacked_did_results.py +++ b/diff_diff/stacked_did_results.py @@ -11,7 +11,7 @@ import numpy as np import pandas as pd -from diff_diff.results import _get_significance_stars +from diff_diff.results import _format_survey_block, _get_significance_stars __all__ = [ "StackedDiDResults", @@ -144,23 +144,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Add survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "-" * 85, - "Survey Design".center(85), - "-" * 85, - f"{'Weight type:':<30} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<30} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}") - lines.extend(["-" * 85, ""]) + lines.extend(_format_survey_block(sm, 85)) # Overall ATT lines.extend( diff --git a/diff_diff/staggered.py b/diff_diff/staggered.py index 2c831668..718dc93b 100644 --- a/diff_diff/staggered.py +++ b/diff_diff/staggered.py @@ -379,6 +379,20 @@ def _collapse_survey_to_unit_level(resolved_survey, df, unit_col, all_units): .values ) + # Collapse replicate weights to unit level (same groupby pattern) + 
rep_weights_unit = None + if resolved_survey.replicate_weights is not None: + R = resolved_survey.replicate_weights.shape[1] + rep_weights_unit = np.zeros((n_units, R)) + for r in range(R): + rep_weights_unit[:, r] = ( + pd.Series(resolved_survey.replicate_weights[:, r], index=df.index) + .groupby(df[unit_col]) + .first() + .reindex(all_units) + .values + ) + return ResolvedSurveyDesign( weights=weights_unit.astype(np.float64), weight_type=resolved_survey.weight_type, @@ -388,6 +402,15 @@ def _collapse_survey_to_unit_level(resolved_survey, df, unit_col, all_units): n_strata=resolved_survey.n_strata, n_psu=resolved_survey.n_psu, lonely_psu=resolved_survey.lonely_psu, + replicate_weights=rep_weights_unit, + replicate_method=resolved_survey.replicate_method, + fay_rho=resolved_survey.fay_rho, + n_replicates=resolved_survey.n_replicates, + replicate_strata=resolved_survey.replicate_strata, + combined_weights=resolved_survey.combined_weights, + replicate_scale=resolved_survey.replicate_scale, + replicate_rscales=resolved_survey.replicate_rscales, + mse=resolved_survey.mse, ) def _precompute_structures( @@ -585,6 +608,12 @@ def _compute_att_gt_fast( sw_treated = survey_w[treated_valid] if survey_w is not None else None sw_control = survey_w[control_valid] if survey_w is not None else None + # Guard against zero effective mass after subpopulation filtering + if sw_treated is not None and np.sum(sw_treated) <= 0: + return np.nan, np.nan, 0, 0, None, None + if sw_control is not None and np.sum(sw_control) <= 0: + return np.nan, np.nan, 0, 0, None, None + # Get covariates if specified (from the base period) X_treated = None X_control = None @@ -781,6 +810,9 @@ def _compute_all_att_gt_vectorized( if survey_w is not None: sw_t = survey_w[treated_valid] sw_c = survey_w[control_valid] + # Guard against zero effective mass + if np.sum(sw_t) <= 0 or np.sum(sw_c) <= 0: + continue sw_t_norm = sw_t / np.sum(sw_t) sw_c_norm = sw_c / np.sum(sw_c) mu_t = float(np.sum(sw_t_norm * 
treated_change)) @@ -842,6 +874,12 @@ def _compute_all_att_gt_vectorized( # Batch inference for all (g,t) pairs at once if task_keys: df_survey_val = precomputed.get("df_survey") + # Guard: replicate design with undefined df → NaN inference + if (df_survey_val is None + and precomputed.get("resolved_survey_unit") is not None + and hasattr(precomputed["resolved_survey_unit"], 'uses_replicate_variance') + and precomputed["resolved_survey_unit"].uses_replicate_variance): + df_survey_val = 0 t_stats, p_values, ci_lowers, ci_uppers = safe_inference_batch( np.array(atts), np.array(ses), @@ -1176,8 +1214,16 @@ def _compute_all_att_gt_covariate_reg( # Batch inference if task_keys: + # Use survey df for replicate designs (propagated from precomputed) + _ipw_dr_df = precomputed.get("df_survey") if precomputed is not None else None + # Guard: replicate design with undefined df → NaN inference + if (_ipw_dr_df is None and precomputed is not None + and precomputed.get("resolved_survey_unit") is not None + and hasattr(precomputed["resolved_survey_unit"], 'uses_replicate_variance') + and precomputed["resolved_survey_unit"].uses_replicate_variance): + _ipw_dr_df = 0 t_stats, p_values, ci_lowers, ci_uppers = safe_inference_batch( - np.array(atts), np.array(ses), alpha=self.alpha + np.array(atts), np.array(ses), alpha=self.alpha, df=_ipw_dr_df ) for idx, key in enumerate(task_keys): group_time_effects[key]["t_stat"] = float(t_stats[idx]) @@ -1372,6 +1418,11 @@ def fit( # Survey df for safe_inference calls — use the unit-level resolved # survey df computed in _precompute_structures for consistency. 
df_survey = precomputed.get("df_survey") + # Guard: replicate design with undefined df (rank <= 1) → NaN inference + if (df_survey is None and resolved_survey is not None + and hasattr(resolved_survey, 'uses_replicate_variance') + and resolved_survey.uses_replicate_variance): + df_survey = 0 # Compute ATT(g,t) for each group-time combination min_period = min(time_periods) @@ -1465,6 +1516,17 @@ def fit( overall_att, overall_se = self._aggregate_simple( group_time_effects, influence_func_info, df, unit, precomputed ) + # Re-read df_survey in case replicate aggregation updated it + df_survey = precomputed.get("df_survey") + # Propagate replicate df override to survey_metadata for display consistency + if df_survey is not None and survey_metadata is not None: + if survey_metadata.df_survey != df_survey: + survey_metadata.df_survey = df_survey + # Guard: replicate design with undefined df (rank <= 1) → NaN inference + if (df_survey is None and resolved_survey is not None + and hasattr(resolved_survey, 'uses_replicate_variance') + and resolved_survey.uses_replicate_variance): + df_survey = 0 overall_t, overall_p, overall_ci = safe_inference( overall_att, overall_se, @@ -1498,6 +1560,20 @@ def fit( unit=unit, ) + # Reject replicate-weight designs for bootstrap — replicate variance + # is an analytical alternative, not compatible with bootstrap + if ( + self.n_bootstrap > 0 + and resolved_survey is not None + and hasattr(resolved_survey, "uses_replicate_variance") + and resolved_survey.uses_replicate_variance + ): + raise NotImplementedError( + "CallawaySantAnna bootstrap (n_bootstrap > 0) is not supported " + "with replicate-weight survey designs. Replicate weights provide " + "analytical variance; use n_bootstrap=0 instead." 
+ ) + # Run bootstrap inference if requested bootstrap_results = None if self.n_bootstrap > 0 and influence_func_info: diff --git a/diff_diff/staggered_aggregation.py b/diff_diff/staggered_aggregation.py index b0c18670..3b75bd8b 100644 --- a/diff_diff/staggered_aggregation.py +++ b/diff_diff/staggered_aggregation.py @@ -473,6 +473,17 @@ def _compute_aggregated_se_with_wif( resolved_survey = ( precomputed.get("resolved_survey_unit") if precomputed is not None else None ) + if resolved_survey is not None and hasattr(resolved_survey, "uses_replicate_variance") and resolved_survey.uses_replicate_variance: + from diff_diff.survey import compute_replicate_if_variance + + variance, n_valid_rep = compute_replicate_if_variance(psi_total, resolved_survey) + # Update precomputed survey df to reflect valid replicate count + if precomputed is not None and n_valid_rep < resolved_survey.n_replicates: + precomputed["df_survey"] = n_valid_rep - 1 if n_valid_rep > 1 else None + if np.isnan(variance): + return np.nan + return np.sqrt(max(variance, 0.0)) + if resolved_survey is not None and ( resolved_survey.strata is not None or resolved_survey.psu is not None @@ -606,6 +617,12 @@ def _aggregate_event_study( if not agg_effects_list: return {} df_survey_val = precomputed.get("df_survey") if precomputed is not None else None + # Guard: replicate design with undefined df → NaN inference + if (df_survey_val is None and precomputed is not None + and precomputed.get("resolved_survey_unit") is not None + and hasattr(precomputed["resolved_survey_unit"], 'uses_replicate_variance') + and precomputed["resolved_survey_unit"].uses_replicate_variance): + df_survey_val = 0 t_stats, p_values, ci_lowers, ci_uppers = safe_inference_batch( np.array(agg_effects_list), np.array(agg_ses_list), @@ -699,6 +716,12 @@ def _aggregate_by_group( agg_effects = np.array([x[1] for x in group_data_list]) agg_ses = np.array([x[2] for x in group_data_list]) df_survey_val = precomputed.get("df_survey") if precomputed 
is not None else None + # Guard: replicate design with undefined df → NaN inference + if (df_survey_val is None and precomputed is not None + and precomputed.get("resolved_survey_unit") is not None + and hasattr(precomputed["resolved_survey_unit"], 'uses_replicate_variance') + and precomputed["resolved_survey_unit"].uses_replicate_variance): + df_survey_val = 0 t_stats, p_values, ci_lowers, ci_uppers = safe_inference_batch( agg_effects, agg_ses, diff --git a/diff_diff/staggered_results.py b/diff_diff/staggered_results.py index 7f47dc31..3fea9cc8 100644 --- a/diff_diff/staggered_results.py +++ b/diff_diff/staggered_results.py @@ -11,7 +11,7 @@ import numpy as np import pandas as pd -from diff_diff.results import _get_significance_stars +from diff_diff.results import _format_survey_block, _get_significance_stars if TYPE_CHECKING: from diff_diff.staggered_bootstrap import CSBootstrapResults @@ -165,23 +165,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "-" * 85, - "Survey Design".center(85), - "-" * 85, - f"{'Weight type:':<30} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<30} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}") - lines.extend(["-" * 85, ""]) + lines.extend(_format_survey_block(sm, 85)) # Overall ATT lines.extend( diff --git a/diff_diff/sun_abraham.py b/diff_diff/sun_abraham.py index c1eada45..8ab8a916 100644 --- a/diff_diff/sun_abraham.py +++ b/diff_diff/sun_abraham.py @@ -18,7 +18,7 @@ from diff_diff.bootstrap_utils import compute_effect_bootstrap_stats from diff_diff.linalg import LinearRegression -from 
diff_diff.results import _get_significance_stars +from diff_diff.results import _format_survey_block, _get_significance_stars from diff_diff.utils import ( safe_inference, ) @@ -131,23 +131,7 @@ def summary(self, alpha: Optional[float] = None) -> str: # Add survey design info if self.survey_metadata is not None: sm = self.survey_metadata - lines.extend( - [ - "-" * 85, - "Survey Design".center(85), - "-" * 85, - f"{'Weight type:':<30} {sm.weight_type:>10}", - ] - ) - if sm.n_strata is not None: - lines.append(f"{'Strata:':<30} {sm.n_strata:>10}") - if sm.n_psu is not None: - lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}") - lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}") - lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}") - if sm.df_survey is not None: - lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}") - lines.extend(["-" * 85, ""]) + lines.extend(_format_survey_block(sm, 85)) # Overall ATT lines.extend( @@ -517,6 +501,18 @@ def fit( if resolved_survey is not None: _validate_unit_constant_survey(data, unit, survey_design) + # Reject replicate-weight designs — SunAbraham's weighted + # within-transformation bakes survey weights into X and y, so + # replicate refits on the already-transformed design are incorrect. + # Full estimator-level replicate refits are not yet implemented. + if resolved_survey is not None and resolved_survey.uses_replicate_variance: + raise NotImplementedError( + "SunAbraham does not yet support replicate-weight survey designs. " + "The weighted within-transformation must be recomputed for each " + "replicate, which requires estimator-level replicate refits. " + "Use a TSL-based survey design (strata/psu/fpc) instead." + ) + # Bootstrap + survey supported via Rao-Wu rescaled bootstrap. # Determine Rao-Wu eligibility from the *original* survey_design # (before cluster-as-PSU injection which adds PSU to weights-only designs). 
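
The `survey.py` changes that follow add BRR/Fay/JK1/JKn replicate weights. For orientation, these are the textbook replicate-variance estimators the methods are named after (scale constants as in R's `survey` package); this standalone sketch is illustrative and is not the library's `compute_replicate_vcov`:

```python
import numpy as np

def replicate_variance(theta_reps, theta_full, method="JK1", fay_rho=0.0):
    # Standard scale constants (cf. R's survey package):
    #   BRR : V = (1/R) * sum_r (theta_r - theta)^2
    #   Fay : V = 1 / (R * (1 - rho)^2) * sum_r (theta_r - theta)^2
    #   JK1 : V = ((R - 1) / R) * sum_r (theta_r - theta)^2
    # This sketch always centers at the full-sample estimate theta_full
    # (the mse-style convention); the diff's `mse` flag presumably toggles
    # between that and centering at the mean of the replicates.
    theta_reps = np.asarray(theta_reps, dtype=np.float64)
    R = len(theta_reps)
    dev2 = np.sum((theta_reps - theta_full) ** 2)
    if method == "BRR":
        return dev2 / R
    if method == "Fay":
        return dev2 / (R * (1.0 - fay_rho) ** 2)
    if method == "JK1":
        return (R - 1) / R * dev2
    raise ValueError(f"unknown replicate method: {method}")
```

This also explains the `df_survey = rank(replicate_weights) - 1` convention noted in the TODO table: with R usable replicate deviations, inference gets roughly R - 1 degrees of freedom, which is why dropped replicates reduce the effective df.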
diff --git a/diff_diff/survey.py b/diff_diff/survey.py
index 76d21b23..a230d867 100644
--- a/diff_diff/survey.py
+++ b/diff_diff/survey.py
@@ -16,7 +16,7 @@
 import warnings
 from dataclasses import dataclass, field, replace
-from typing import Optional, Tuple
+from typing import List, Optional, Tuple
 
 import numpy as np
 import pandas as pd
@@ -62,6 +62,14 @@ class SurveyDesign:
     weight_type: str = "pweight"
     nest: bool = False
     lonely_psu: str = "remove"
+    replicate_weights: Optional[List[str]] = None
+    replicate_method: Optional[str] = None
+    fay_rho: float = 0.0
+    replicate_strata: Optional[List[int]] = None
+    combined_weights: bool = True
+    replicate_scale: Optional[float] = None
+    replicate_rscales: Optional[List[float]] = None
+    mse: bool = False
 
     def __post_init__(self):
         valid_weight_types = {"pweight", "fweight", "aweight"}
@@ -74,6 +82,73 @@ def __post_init__(self):
             raise ValueError(
                 f"lonely_psu must be one of {valid_lonely}, "
                 f"got '{self.lonely_psu}'"
             )
+        # Replicate weight validation
+        valid_rep_methods = {"BRR", "Fay", "JK1", "JKn"}
+        if self.replicate_method is not None:
+            if self.replicate_method not in valid_rep_methods:
+                raise ValueError(
+                    f"replicate_method must be one of {valid_rep_methods}, "
+                    f"got '{self.replicate_method}'"
+                )
+            if self.replicate_weights is None:
+                raise ValueError(
+                    "replicate_weights must be provided when replicate_method is set"
+                )
+        if self.replicate_weights is not None and self.replicate_method is None:
+            raise ValueError(
+                "replicate_method must be provided when replicate_weights is set"
+            )
+        if self.replicate_method == "Fay":
+            if not (0 < self.fay_rho < 1):
+                raise ValueError(
+                    f"fay_rho must be in (0, 1) for Fay's method, got {self.fay_rho}"
+                )
+        elif self.replicate_method is not None and self.fay_rho != 0.0:
+            raise ValueError(
+                f"fay_rho must be 0 for method '{self.replicate_method}', "
+                f"got {self.fay_rho}"
+            )
+        # Replicate weights are mutually exclusive with strata/psu/fpc
+        if self.replicate_weights is not None:
+            if self.strata is not None or self.psu is not None or self.fpc is not None:
+                raise ValueError(
+                    "replicate_weights cannot be combined with strata/psu/fpc. "
+                    "Replicate weights encode the design structure implicitly."
+                )
+            if self.weights is None:
+                raise ValueError(
+                    "Full-sample weights must be provided alongside replicate_weights"
+                )
+            # JKn requires replicate_strata
+            if self.replicate_method == "JKn":
+                if self.replicate_strata is None:
+                    raise ValueError(
+                        "replicate_strata is required for JKn method. Provide a list "
+                        "of stratum assignments (one per replicate weight column)."
+                    )
+                if len(self.replicate_strata) != len(self.replicate_weights):
+                    raise ValueError(
+                        f"replicate_strata length ({len(self.replicate_strata)}) must "
+                        f"match replicate_weights length ({len(self.replicate_weights)})"
+                    )
+        # Validate scale/rscales values and length
+        if self.replicate_scale is not None:
+            if not (np.isfinite(self.replicate_scale) and self.replicate_scale > 0):
+                raise ValueError(
+                    f"replicate_scale must be a positive finite number, "
+                    f"got {self.replicate_scale}"
+                )
+        if self.replicate_rscales is not None and self.replicate_weights is not None:
+            if len(self.replicate_rscales) != len(self.replicate_weights):
+                raise ValueError(
+                    f"replicate_rscales length ({len(self.replicate_rscales)}) must "
+                    f"match replicate_weights length ({len(self.replicate_weights)})"
+                )
+            rscales_arr = np.asarray(self.replicate_rscales, dtype=float)
+            if not np.all(np.isfinite(rscales_arr)):
+                raise ValueError("replicate_rscales must be finite")
+            if np.any(rscales_arr < 0):
+                raise ValueError("replicate_rscales must be non-negative")
 
     def resolve(self, data: pd.DataFrame) -> "ResolvedSurveyDesign":
         """
@@ -102,26 +177,109 @@ def resolve(self, data: pd.DataFrame) -> "ResolvedSurveyDesign":
                 raise ValueError("Weights contain NaN values")
             if np.any(~np.isfinite(raw_weights)):
                 raise ValueError("Weights contain Inf values")
-            if np.any(raw_weights <= 0):
-                raise ValueError("Weights must be strictly positive")
+            if np.any(raw_weights < 0):
+                raise ValueError("Weights must be non-negative")
+            if np.all(raw_weights == 0):
+                raise ValueError(
+                    "All weights are zero. At least one observation must "
+                    "have a positive weight."
+                )
 
-            # fweight validation: must be positive integers
+            # fweight validation: must be non-negative integers
             if self.weight_type == "fweight":
-                fractional = raw_weights - np.round(raw_weights)
-                if np.any(np.abs(fractional) > 1e-10):
-                    raise ValueError(
-                        "Frequency weights (fweight) must be positive integers. "
-                        "Fractional values detected. Use pweight for non-integer weights."
-                    )
+                pos_mask = raw_weights > 0
+                if np.any(pos_mask):
+                    fractional = raw_weights[pos_mask] - np.round(raw_weights[pos_mask])
+                    if np.any(np.abs(fractional) > 1e-10):
+                        raise ValueError(
+                            "Frequency weights (fweight) must be non-negative integers. "
+                            "Fractional values detected. Use pweight for non-integer weights."
+                        )
 
             # Normalize: pweights/aweights to sum=n (mean=1); fweights unchanged
-            if self.weight_type in ("pweight", "aweight"):
+            # Skip normalization for replicate designs — the IF path uses
+            # w_r / w_full ratios that must be on the same raw scale
+            if self.replicate_weights is not None:
+                weights = raw_weights.copy()
+            elif self.weight_type in ("pweight", "aweight"):
                 weights = raw_weights * (n / np.sum(raw_weights))
             else:
                 weights = raw_weights.copy()
         else:
             weights = np.ones(n, dtype=np.float64)
 
+        # --- Replicate weights (short-circuit strata/psu/fpc) ---
+        if self.replicate_weights is not None:
+            rep_cols = self.replicate_weights
+            for col in rep_cols:
+                if col not in data.columns:
+                    raise ValueError(
+                        f"Replicate weight column '{col}' not found in data"
+                    )
+            rep_arr = np.column_stack(
+                [data[col].values.astype(np.float64) for col in rep_cols]
+            )
+            if np.any(np.isnan(rep_arr)):
+                raise ValueError("Replicate weights contain NaN values")
+            if np.any(~np.isfinite(rep_arr)):
+                raise ValueError("Replicate weights contain Inf values")
+            if np.any(rep_arr < 0):
+                raise ValueError("Replicate weights must be non-negative")
+            # Validate combined_weights contract: when True, replicate columns
+            # include the full-sample weight, so w_r > 0 with w_full == 0 is
+            # malformed (observation excluded from full sample but included in
+            # a replicate).
+            combined = (
+                self.combined_weights
+                if self.combined_weights is not None
+                else True
+            )
+            if combined:
+                zero_full = weights == 0
+                if np.any(zero_full):
+                    rep_positive_on_zero = np.any(rep_arr[zero_full] > 0, axis=1)
+                    if np.any(rep_positive_on_zero):
+                        raise ValueError(
+                            "Malformed combined_weights=True design: some "
+                            "replicate columns have positive weight where "
+                            "full-sample weight is zero. Either fix the "
+                            "replicate columns or use combined_weights=False."
+                        )
+            # Do NOT normalize replicate columns — the IF path uses w_r/w_full
+            # ratios that must reflect the true replicate design, not rescaled sums
+            n_rep = rep_arr.shape[1]
+            if n_rep < 2:
+                raise ValueError(
+                    "At least 2 replicate weight columns are required"
+                )
+            return ResolvedSurveyDesign(
+                weights=weights,
+                weight_type=self.weight_type,
+                strata=None,
+                psu=None,
+                fpc=None,
+                n_strata=0,
+                n_psu=0,
+                lonely_psu=self.lonely_psu,
+                replicate_weights=rep_arr,
+                replicate_method=self.replicate_method,
+                fay_rho=self.fay_rho,
+                n_replicates=n_rep,
+                replicate_strata=(
+                    np.asarray(self.replicate_strata, dtype=int)
+                    if self.replicate_strata is not None
+                    else None
+                ),
+                combined_weights=self.combined_weights,
+                replicate_scale=self.replicate_scale,
+                replicate_rscales=(
+                    np.asarray(self.replicate_rscales, dtype=float)
+                    if self.replicate_rscales is not None
+                    else None
+                ),
+                mse=self.mse,
+            )
+
         # --- Strata ---
         strata_arr = None
         n_strata = 0
@@ -273,6 +431,144 @@ def resolve(self, data: pd.DataFrame) -> "ResolvedSurveyDesign":
             lonely_psu=self.lonely_psu,
         )
 
+    def subpopulation(
+        self,
+        data: pd.DataFrame,
+        mask,
+    ) -> Tuple["SurveyDesign", pd.DataFrame]:
+        """Create a subpopulation design by zeroing out excluded observations.
+
+        Preserves the full survey design structure (strata, PSU) while setting
+        weights to zero for observations outside the subpopulation. This is
+        the correct approach for subpopulation analysis — unlike naive
+        subsetting, it retains design information for variance estimation.
+
+        Parameters
+        ----------
+        mask : array-like of bool, str, or callable
+            Defines the subpopulation:
+            - bool array/Series of length ``len(data)`` — True = included
+            - str — column name in ``data`` containing boolean values
+            - callable — applied to ``data``, must return bool array
+
+        Returns
+        -------
+        (SurveyDesign, pd.DataFrame)
+            A new SurveyDesign pointing to a ``_subpop_weight`` column in the
+            returned DataFrame copy. The pair should be used together: pass
+            the returned DataFrame to ``fit()`` with the returned SurveyDesign.
+        """
+        # Resolve mask to boolean array
+        if callable(mask):
+            raw_mask = np.asarray(mask(data))
+        elif isinstance(mask, str):
+            if mask not in data.columns:
+                raise ValueError(f"Mask column '{mask}' not found in data")
+            raw_mask = np.asarray(data[mask].values)
+        else:
+            raw_mask = np.asarray(mask)
+
+        # Validate: reject pd.NA/pd.NaT/None before bool coercion
+        try:
+            if pd.isna(raw_mask).any():
+                raise ValueError(
+                    "Subpopulation mask contains NA/missing values. "
+                    "Provide a boolean mask with no missing values."
+                )
+        except (TypeError, ValueError) as e:
+            if "NA/missing" in str(e):
+                raise
+            # pd.isna can't handle some dtypes — fall through to specific checks
+        if raw_mask.dtype.kind == 'f' and np.any(np.isnan(raw_mask)):
+            raise ValueError(
+                "Subpopulation mask contains NaN values. "
+                "Provide a boolean mask with no missing values."
+            )
+        if hasattr(raw_mask, 'dtype') and raw_mask.dtype == object:
+            # Check for None values (pd.NA, None, etc.)
+            if any(v is None for v in raw_mask):
+                raise ValueError(
+                    "Subpopulation mask contains None/NA values. "
+                    "Provide a boolean mask with no missing values."
+                )
+            # Reject string/object masks — non-empty strings coerce to True
+            # which silently defines the wrong domain
+            if any(isinstance(v, str) for v in raw_mask):
+                raise ValueError(
+                    "Subpopulation mask has object dtype with string values. "
+                    "Provide a boolean or numeric (0/1) mask, not strings."
+                )
+        if hasattr(raw_mask, 'dtype') and raw_mask.dtype.kind in ('U', 'S'):
+            raise ValueError(
+                "Subpopulation mask contains string values. "
+                "Provide a boolean or numeric (0/1) mask."
+            )
+        # Validate numeric masks: only {0, 1} allowed (not {1, 2}, etc.)
+        if hasattr(raw_mask, 'dtype') and raw_mask.dtype.kind in ('i', 'u', 'f'):
+            unique_vals = set(np.unique(raw_mask[np.isfinite(raw_mask)]).tolist())
+            if not unique_vals.issubset({0, 1, 0.0, 1.0, True, False}):
+                raise ValueError(
+                    f"Subpopulation mask contains non-binary numeric values "
+                    f"{unique_vals - {0, 1, 0.0, 1.0}}. "
+                    f"Provide a boolean or numeric (0/1) mask."
+                )
+        mask_arr = raw_mask.astype(bool)
+
+        if len(mask_arr) != len(data):
+            raise ValueError(
+                f"Mask length ({len(mask_arr)}) does not match data "
+                f"length ({len(data)})"
+            )
+
+        if not np.any(mask_arr):
+            raise ValueError(
+                "Subpopulation mask excludes all observations. "
+                "At least one observation must be included."
+            )
+
+        # Build subpopulation weights
+        if self.weights is not None:
+            if self.weights not in data.columns:
+                raise ValueError(
+                    f"Weight column '{self.weights}' not found in data"
+                )
+            base_weights = data[self.weights].values.astype(np.float64)
+        else:
+            base_weights = np.ones(len(data), dtype=np.float64)
+
+        subpop_weights = np.where(mask_arr, base_weights, 0.0)
+
+        # Create data copy with synthetic weight column
+        data_out = data.copy()
+        data_out["_subpop_weight"] = subpop_weights
+
+        # Zero out replicate weight columns for excluded observations
+        if self.replicate_weights is not None:
+            for col in self.replicate_weights:
+                if col in data.columns:
+                    data_out[col] = np.where(mask_arr, data[col].values, 0.0)
+
+        # Return new SurveyDesign using the synthetic column
+        new_design = SurveyDesign(
+            weights="_subpop_weight",
+            strata=self.strata,
+            psu=self.psu,
+            fpc=self.fpc,
+            weight_type=self.weight_type,
+            nest=self.nest,
+            lonely_psu=self.lonely_psu,
+            replicate_weights=self.replicate_weights,
+            replicate_method=self.replicate_method,
+            fay_rho=self.fay_rho,
+            replicate_strata=self.replicate_strata,
+            combined_weights=self.combined_weights,
+            replicate_scale=self.replicate_scale,
+            replicate_rscales=self.replicate_rscales,
+            mse=self.mse,
+        )
+
+        return new_design, data_out
+
 
 @dataclass
 class ResolvedSurveyDesign:
@@ -290,10 +586,50 @@ class ResolvedSurveyDesign:
     n_strata: int
     n_psu: int
     lonely_psu: str
+    replicate_weights: Optional[np.ndarray] = None  # (n, R) array
+    replicate_method: Optional[str] = None
+    fay_rho: float = 0.0
+    n_replicates: int = 0
+    replicate_strata: Optional[np.ndarray] = None  # (R,) for JKn
+    combined_weights: bool = True
+    replicate_scale: Optional[float] = None
+    replicate_rscales: Optional[np.ndarray] = None  # (R,) per-replicate scales
+    mse: bool = False
+
+    @property
+    def uses_replicate_variance(self) -> bool:
+        """Whether replicate-based variance should be used instead of TSL."""
+        return self.replicate_method is not None
 
     @property
     def df_survey(self) -> Optional[int]:
-        """Survey degrees of freedom: n_PSU - n_strata."""
+        """Survey degrees of freedom.
+
+        For replicate designs: QR-rank of the analysis-weight matrix minus 1,
+        matching R's ``survey::degf()`` which uses ``qr(..., tol=1e-5)$rank``.
+        Returns ``None`` when rank <= 1 (insufficient for t-based inference).
+        For TSL: n_PSU - n_strata.
+        """
+        if self.uses_replicate_variance:
+            if self.replicate_weights is None or self.n_replicates < 2:
+                return None
+            # QR-rank of analysis-weight matrix, matching R's survey::degf()
+            # which uses qr(weights(design, "analysis"), tol=1e-5)$rank.
+            # For combined_weights=True, replicate cols ARE analysis weights.
+            # For combined_weights=False, analysis weights = rep * full-sample.
+            if self.combined_weights:
+                analysis_weights = self.replicate_weights
+            else:
+                analysis_weights = self.replicate_weights * self.weights[:, np.newaxis]
+            # Pivoted QR with R-compatible tolerance, matching R's
+            # qr(..., tol=1e-5) which uses column pivoting (LAPACK dgeqp3)
+            from scipy.linalg import qr as scipy_qr
+            _, R_mat, _ = scipy_qr(analysis_weights, pivoting=True, mode='economic')
+            diag_abs = np.abs(np.diag(R_mat))
+            tol = 1e-5
+            rank = int(np.sum(diag_abs > tol * diag_abs.max())) if diag_abs.max() > 0 else 0
+            df = rank - 1
+            return df if df > 0 else None
         if self.psu is not None and self.n_psu > 0:
             if self.strata is not None and self.n_strata > 0:
                 return self.n_psu - self.n_strata
@@ -304,6 +640,52 @@ def df_survey(self) -> Optional[int]:
             return n_obs - self.n_strata
         return n_obs - 1
 
+    def subset_to_units(
+        self,
+        row_idx: np.ndarray,
+        weights: np.ndarray,
+        strata: Optional[np.ndarray],
+        psu: Optional[np.ndarray],
+        fpc: Optional[np.ndarray],
+        n_strata: int,
+        n_psu: int,
+    ) -> "ResolvedSurveyDesign":
+        """Create a unit-level copy preserving replicate metadata.
+
+        Used by panel estimators (ContinuousDiD, EfficientDiD) that collapse
+        panel-level survey info to one row per unit.
+
+        Parameters
+        ----------
+        row_idx : np.ndarray
+            Indices into the panel-level arrays to select one row per unit.
+        weights, strata, psu, fpc, n_strata, n_psu
+            Already-subsetted TSL fields (computed by the caller).
+        """
+        rep_weights_sub = None
+        if self.replicate_weights is not None:
+            rep_weights_sub = self.replicate_weights[row_idx, :]
+
+        return ResolvedSurveyDesign(
+            weights=weights,
+            weight_type=self.weight_type,
+            strata=strata,
+            psu=psu,
+            fpc=fpc,
+            n_strata=n_strata,
+            n_psu=n_psu,
+            lonely_psu=self.lonely_psu,
+            replicate_weights=rep_weights_sub,
+            replicate_method=self.replicate_method,
+            fay_rho=self.fay_rho,
+            n_replicates=self.n_replicates,
+            replicate_strata=self.replicate_strata,
+            combined_weights=self.combined_weights,
+            replicate_scale=self.replicate_scale,
+            replicate_rscales=self.replicate_rscales,
+            mse=self.mse,
+        )
+
     @property
     def needs_survey_vcov(self) -> bool:
         """Whether survey vcov (not generic sandwich) should be used."""
@@ -343,6 +725,9 @@ class SurveyMetadata:
     n_psu: Optional[int] = None
     weight_range: Tuple[float, float] = field(default=(0.0, 0.0))
     df_survey: Optional[int] = None
+    replicate_method: Optional[str] = None
+    n_replicates: Optional[int] = None
+    deff_diagnostics: Optional["DEFFDiagnostics"] = None
 
 
 def compute_survey_metadata(
@@ -371,13 +756,21 @@ def compute_survey_metadata(
     design_effect = n * sum_w2 / (sum_w**2) if sum_w > 0 else 1.0
 
     n_strata = resolved.n_strata if resolved.strata is not None else None
-    if resolved.psu is not None:
+    if resolved.uses_replicate_variance:
+        # Replicate designs don't have meaningful PSU/strata counts
+        n_psu = None
+        n_strata = None
+    elif resolved.psu is not None:
         n_psu = resolved.n_psu
     else:
         # Implicit PSU: each observation is its own PSU
         n_psu = len(resolved.weights)
 
     df_survey = resolved.df_survey
+    # Replicate info
+    rep_method = resolved.replicate_method if resolved.uses_replicate_variance else None
+    n_rep = resolved.n_replicates if resolved.uses_replicate_variance else None
+
     return SurveyMetadata(
         weight_type=resolved.weight_type,
         effective_n=effective_n,
@@ -387,6 +780,101 @@
         n_psu=n_psu,
         weight_range=(float(np.min(raw_weights)), float(np.max(raw_weights))),
         df_survey=df_survey,
+        replicate_method=rep_method,
+        n_replicates=n_rep,
+    )
+
+
+@dataclass
+class DEFFDiagnostics:
+    """Per-coefficient design effect diagnostics.
+
+    Compares survey-design variance to simple random sampling (SRS)
+    variance for each coefficient, giving the variance inflation factor
+    due to the survey design (clustering, stratification, weighting).
+
+    Attributes
+    ----------
+    deff : np.ndarray
+        Per-coefficient DEFF: survey_var / srs_var. Shape (k,).
+    effective_n : np.ndarray
+        Effective sample size per coefficient: n / DEFF. Shape (k,).
+    srs_se : np.ndarray
+        SRS (HC1) standard errors. Shape (k,).
+    survey_se : np.ndarray
+        Survey standard errors. Shape (k,).
+    coefficient_names : list of str or None
+        Names for display.
+    """
+
+    deff: np.ndarray
+    effective_n: np.ndarray
+    srs_se: np.ndarray
+    survey_se: np.ndarray
+    coefficient_names: Optional[List[str]] = None
+
+
+def compute_deff_diagnostics(
+    X: np.ndarray,
+    residuals: np.ndarray,
+    survey_vcov: np.ndarray,
+    weights: np.ndarray,
+    weight_type: str = "pweight",
+    coefficient_names: Optional[List[str]] = None,
+) -> DEFFDiagnostics:
+    """Compute per-coefficient design effects.
+
+    Compares the survey variance-covariance matrix to a simple random
+    sampling (SRS) baseline (HC1 sandwich, ignoring strata/PSU/FPC).
+
+    Parameters
+    ----------
+    X : np.ndarray
+        Design matrix of shape (n, k).
+    residuals : np.ndarray
+        Residuals from the WLS fit, shape (n,).
+    survey_vcov : np.ndarray
+        Survey variance-covariance matrix, shape (k, k).
+    weights : np.ndarray
+        Observation weights (normalized), shape (n,).
+    weight_type : str, default "pweight"
+        Weight type for SRS computation.
+    coefficient_names : list of str, optional
+        Names for display.
+
+    Returns
+    -------
+    DEFFDiagnostics
+    """
+    from diff_diff.linalg import compute_robust_vcov
+
+    n = X.shape[0]
+    # Use positive-weight count for effective n (zero-weight rows from
+    # subpopulation don't contribute to the effective sample)
+    n_eff = int(np.count_nonzero(weights > 0)) if np.any(weights == 0) else n
+
+    # SRS baseline: HC1 weighted sandwich ignoring design structure
+    srs_vcov = compute_robust_vcov(
+        X, residuals, cluster_ids=None, weights=weights, weight_type=weight_type,
+    )
+
+    survey_var = np.diag(survey_vcov)
+    srs_var = np.diag(srs_vcov)
+
+    # DEFF = survey_var / srs_var
+    with np.errstate(divide="ignore", invalid="ignore"):
+        deff = np.where(srs_var > 0, survey_var / srs_var, np.nan)
+        eff_n = np.where(deff > 0, n_eff / deff, np.nan)
+
+    survey_se = np.sqrt(np.maximum(survey_var, 0.0))
+    srs_se = np.sqrt(np.maximum(srs_var, 0.0))
+
+    return DEFFDiagnostics(
+        deff=deff,
+        effective_n=eff_n,
+        srs_se=srs_se,
+        survey_se=survey_se,
+        coefficient_names=coefficient_names,
+    )
 
 
@@ -417,6 +905,9 @@ def _validate_unit_constant_survey(data, unit_col, survey_design):
         survey_design.psu,
         survey_design.fpc,
     ]
+    # Also validate replicate weight columns for within-unit constancy
+    if survey_design.replicate_weights is not None:
+        cols_to_check.extend(survey_design.replicate_weights)
     for col in cols_to_check:
         if col is not None and col in data.columns:
             n_unique = data.groupby(unit_col)[col].nunique()
@@ -760,6 +1251,9 @@ def compute_survey_vcov(
     # Compute weighted scores per observation: w_i * X_i * u_i
     if resolved.weight_type == "aweight":
         scores = X * residuals[:, np.newaxis]
+        # Zero-weight observations should not contribute to aweight meat
+        if np.any(weights == 0):
+            scores[weights == 0] = 0.0
     else:
         scores = X * (weights * residuals)[:, np.newaxis]
 
@@ -833,6 +1327,292 @@ def compute_survey_if_variance(
     return meat_scalar
 
 
+def _replicate_variance_factor(
+    method: str, n_replicates: int, fay_rho: float,
+) -> float:
+    """Compute the scalar variance factor for replicate methods."""
+    if method == "BRR":
+        return 1.0 / n_replicates
+    elif method == "Fay":
+        return 1.0 / (n_replicates * (1.0 - fay_rho) ** 2)
+    elif method == "JK1":
+        return (n_replicates - 1.0) / n_replicates
+    # JKn handled separately (per-stratum factors)
+    raise ValueError(f"Unknown replicate method: {method}")
+
+
+def compute_replicate_vcov(
+    X: np.ndarray,
+    y: np.ndarray,
+    full_sample_coef: np.ndarray,
+    resolved: "ResolvedSurveyDesign",
+    weight_type: str = "pweight",
+) -> Tuple[np.ndarray, int]:
+    """Compute replicate-weight variance-covariance matrix.
+
+    Re-runs WLS for each replicate weight column and computes variance
+    from the distribution of replicate coefficient vectors.
+
+    Parameters
+    ----------
+    X : np.ndarray
+        Design matrix of shape (n, k).
+    y : np.ndarray
+        Response vector of shape (n,).
+    full_sample_coef : np.ndarray
+        Coefficients from the full-sample fit, shape (k,).
+    resolved : ResolvedSurveyDesign
+        Must have ``uses_replicate_variance == True``.
+    weight_type : str, default "pweight"
+        Weight type for per-replicate WLS.
+
+    Returns
+    -------
+    (np.ndarray, int)
+        Variance-covariance matrix of shape (k, k) and the number of
+        valid replicate solves it was computed from.
+ """ + from diff_diff.linalg import solve_ols + + rep_weights = resolved.replicate_weights + method = resolved.replicate_method + R = resolved.n_replicates + k = X.shape[1] + + # Collect replicate coefficient vectors + coef_reps = np.full((R, k), np.nan) + for r in range(R): + w_r = rep_weights[:, r] + # For non-combined weights, multiply by full-sample weights + if not resolved.combined_weights: + w_r = w_r * resolved.weights + # Skip replicates where all weights are zero + if np.sum(w_r) == 0: + continue + try: + coef_r, _, _ = solve_ols( + X, y, + weights=w_r, + weight_type=weight_type, + rank_deficient_action="silent", + return_vcov=False, + check_finite=False, + ) + coef_reps[r] = coef_r + except (np.linalg.LinAlgError, ValueError): + pass # NaN row for singular/degenerate replicate solve + + # Remove replicates with NaN coefficients + valid = np.all(np.isfinite(coef_reps), axis=1) + n_invalid = int(R - np.sum(valid)) + if n_invalid > 0: + warnings.warn( + f"{n_invalid} of {R} replicate solves failed (singular or degenerate). " + f"Variance computed from {int(np.sum(valid))} valid replicates.", + UserWarning, + stacklevel=2, + ) + n_valid = int(np.sum(valid)) + if n_valid < 2: + if n_valid == 0: + warnings.warn( + "All replicate solves failed. Returning NaN variance.", + UserWarning, stacklevel=2, + ) + else: + warnings.warn( + f"Only {n_valid} valid replicate(s) — variance is not estimable " + f"with fewer than 2. 
Returning NaN.", + UserWarning, stacklevel=2, + ) + return np.full((k, k), np.nan), n_valid + coef_valid = coef_reps[valid] + c = full_sample_coef + + # Compute variance by method + # Support mse=False: center on replicate mean instead of full-sample estimate + # When rscales present and mse=False, center only over rscales > 0 + # (R's svrVar convention — zero-scaled replicates should not shift center) + if resolved.mse: + center = c + else: + if resolved.replicate_rscales is not None: + pos_scale = resolved.replicate_rscales[valid] > 0 + if np.any(pos_scale): + center = np.mean(coef_valid[pos_scale], axis=0) + else: + center = np.mean(coef_valid, axis=0) + else: + center = np.mean(coef_valid, axis=0) + diffs = coef_valid - center[np.newaxis, :] + + outer_sum = diffs.T @ diffs # (k, k) + + # BRR/Fay: use fixed scaling, ignore user-supplied scale/rscales (R convention) + if method in ("BRR", "Fay"): + if resolved.replicate_scale is not None or resolved.replicate_rscales is not None: + warnings.warn( + f"Custom replicate_scale/replicate_rscales ignored for {method} " + f"(BRR/Fay use fixed scaling).", + UserWarning, stacklevel=2, + ) + factor = _replicate_variance_factor(method, R, resolved.fay_rho) + return factor * outer_sum, n_valid + + # JK1/JKn: apply scale * rscales multiplicatively (R's svrVar contract) + scale = resolved.replicate_scale if resolved.replicate_scale is not None else 1.0 + + if resolved.replicate_rscales is not None: + valid_rscales = resolved.replicate_rscales[valid] + V = np.zeros((k, k)) + for i in range(len(diffs)): + V += valid_rscales[i] * np.outer(diffs[i], diffs[i]) + return scale * V, n_valid + + if method == "JK1": + factor = _replicate_variance_factor(method, R, resolved.fay_rho) + return scale * factor * outer_sum, n_valid + elif method == "JKn": + # JKn: V = sum_h ((n_h-1)/n_h) * sum_{r in h} (c_r - c)(c_r - c)^T + rep_strata = resolved.replicate_strata + if rep_strata is None: + raise ValueError("JKn requires replicate_strata") + 
valid_strata = rep_strata[valid] + V = np.zeros((k, k)) + for h in np.unique(rep_strata): + n_h_original = int(np.sum(rep_strata == h)) + mask_h = valid_strata == h + if not np.any(mask_h): + continue + diffs_h = diffs[mask_h] + V += ((n_h_original - 1.0) / n_h_original) * (diffs_h.T @ diffs_h) + return scale * V, n_valid + else: + raise ValueError(f"Unknown replicate method: {method}") + + +def compute_replicate_if_variance( + psi: np.ndarray, + resolved: "ResolvedSurveyDesign", +) -> Tuple[float, int]: + """Compute replicate-based variance for influence-function estimators. + + Instead of re-running the full estimator, reweights the influence + function under each replicate weight set. + + Parameters + ---------- + psi : np.ndarray + Per-unit influence function values, shape (n,). + resolved : ResolvedSurveyDesign + Must have ``uses_replicate_variance == True``. + + Returns + ------- + float + Replicate-based variance estimate. + """ + psi = np.asarray(psi, dtype=np.float64).ravel() + rep_weights = resolved.replicate_weights + method = resolved.replicate_method + R = resolved.n_replicates + + # Match the contract of compute_survey_if_variance(): psi is accepted + # as-is (the combined IF/WIF object), with NO extra weight multiplication. + # Replicate contrasts are formed by rescaling each unit's contribution + # by the ratio w_r/w_full (Rao-Wu reweighting). + full_weights = resolved.weights + theta_full = float(np.sum(psi)) + + # Validate: combined_weights=True requires w_full > 0 wherever w_r > 0 + if resolved.combined_weights: + for r in range(R): + bad = (rep_weights[:, r] > 0) & (full_weights <= 0) + if np.any(bad): + raise ValueError( + f"Replicate column {r} has positive weight where full-sample " + f"weight is zero. With combined_weights=True, every " + f"replicate-positive observation must have a positive " + f"full-sample weight." 
+                )
+
+    # Compute replicate estimates via weight-ratio rescaling
+    theta_reps = np.full(R, np.nan)
+    for r in range(R):
+        w_r = rep_weights[:, r]
+        if np.any(w_r > 0):
+            if resolved.combined_weights:
+                # Combined: w_r already includes full-sample weight
+                ratio = np.divide(
+                    w_r, full_weights,
+                    out=np.zeros_like(w_r, dtype=np.float64),
+                    where=full_weights > 0,
+                )
+            else:
+                # Non-combined: w_r is perturbation factor directly
+                ratio = w_r
+            theta_reps[r] = np.sum(ratio * psi)
+
+    valid = np.isfinite(theta_reps)
+    n_valid = int(np.sum(valid))
+    if n_valid < 2:
+        return np.nan, n_valid
+
+    # Support mse=False: center on replicate mean
+    # When rscales present and mse=False, center only over rscales > 0
+    # (R's svrVar convention — zero-scaled replicates should not shift center)
+    if resolved.mse:
+        center = theta_full
+    else:
+        if resolved.replicate_rscales is not None:
+            pos_scale = resolved.replicate_rscales[valid] > 0
+            if np.any(pos_scale):
+                center = float(np.mean(theta_reps[valid][pos_scale]))
+            else:
+                center = float(np.mean(theta_reps[valid]))
+        else:
+            center = float(np.mean(theta_reps[valid]))
+    diffs = theta_reps[valid] - center
+
+    ss = float(np.sum(diffs**2))
+
+    # BRR/Fay: use fixed scaling, ignore user-supplied scale/rscales (R convention)
+    if method in ("BRR", "Fay"):
+        if resolved.replicate_scale is not None or resolved.replicate_rscales is not None:
+            warnings.warn(
+                f"Custom replicate_scale/replicate_rscales ignored for {method} "
+                f"(BRR/Fay use fixed scaling).",
+                UserWarning, stacklevel=2,
+            )
+        factor = _replicate_variance_factor(method, R, resolved.fay_rho)
+        return factor * ss, n_valid
+
+    # JK1/JKn: apply scale * rscales multiplicatively (R's svrVar contract)
+    scale = resolved.replicate_scale if resolved.replicate_scale is not None else 1.0
+
+    if resolved.replicate_rscales is not None:
+        valid_rscales = resolved.replicate_rscales[valid]
+        return scale * float(np.sum(valid_rscales * diffs**2)), n_valid
+
+    if method == "JK1":
+        factor = _replicate_variance_factor(method, R, resolved.fay_rho)
+        return scale * factor * ss, n_valid
+    elif method == "JKn":
+        rep_strata = resolved.replicate_strata
+        if rep_strata is None:
+            raise ValueError("JKn requires replicate_strata")
+        valid_strata = rep_strata[valid]
+        result = 0.0
+        for h in np.unique(rep_strata):
+            n_h_original = int(np.sum(rep_strata == h))
+            mask_h = valid_strata == h
+            if not np.any(mask_h):
+                continue
+            result += ((n_h_original - 1.0) / n_h_original) * float(np.sum(diffs[mask_h] ** 2))
+        return scale * result, n_valid
+    else:
+        raise ValueError(f"Unknown replicate method: {method}")
+
+
 def aggregate_to_psu(
     values: np.ndarray,
     resolved: "ResolvedSurveyDesign",
diff --git a/diff_diff/synthetic_did.py b/diff_diff/synthetic_did.py
index cea1e481..43f10460 100644
--- a/diff_diff/synthetic_did.py
+++ b/diff_diff/synthetic_did.py
@@ -256,6 +256,12 @@ def fit(  # type: ignore[override]
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, "analytical")
         )
+        # Reject replicate-weight designs — SyntheticDiD uses bootstrap variance
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "SyntheticDiD does not yet support replicate-weight survey "
+                "designs. Use a TSL-based survey design (strata/psu/fpc)."
+            )
         # Validate pweight only (strata/PSU/FPC are allowed for Rao-Wu bootstrap)
         if resolved_survey is not None and resolved_survey.weight_type != "pweight":
             raise ValueError(
diff --git a/diff_diff/triple_diff.py b/diff_diff/triple_diff.py
index 5d9a70ea..6e9b6eb9 100644
--- a/diff_diff/triple_diff.py
+++ b/diff_diff/triple_diff.py
@@ -30,7 +30,7 @@
 import pandas as pd
 
 from diff_diff.linalg import solve_logit, solve_ols
-from diff_diff.results import _get_significance_stars
+from diff_diff.results import _format_survey_block, _get_significance_stars
 from diff_diff.utils import safe_inference
 
 _MIN_CELL_SIZE = 10
@@ -152,24 +152,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "",
-                    "-" * 75,
-                    "Survey Design".center(75),
-                    "-" * 75,
-                    f"{'Weight type:':<30} {sm.weight_type:>15}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<30} {sm.n_strata:>15}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>15}")
-            lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>15.1f}")
-            lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>15.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>15}")
-            lines.append("-" * 75)
+            lines.extend(_format_survey_block(sm, 75))
 
         if self.inference_method != "analytical":
             lines.append(f"{'Inference method:':<30} {self.inference_method:>15}")
@@ -495,6 +478,9 @@ def fit(
         NotImplementedError
             If survey_design is used with wild_bootstrap inference.
""" + # Reset replicate state from any previous fit + self._replicate_n_valid = None + # Resolve survey design if provided from diff_diff.survey import ( _inject_cluster_as_psu, @@ -595,7 +581,21 @@ def fit( # Compute inference # When survey design is active, use survey df (n_PSU - n_strata) if survey_metadata is not None and survey_metadata.df_survey is not None: - df = max(survey_metadata.df_survey, 1) + df = survey_metadata.df_survey + # Override with effective replicate df only when replicates were dropped + if (hasattr(self, '_replicate_n_valid') and self._replicate_n_valid is not None + and resolved_survey is not None + and self._replicate_n_valid < resolved_survey.n_replicates): + df = self._replicate_n_valid - 1 + survey_metadata.df_survey = self._replicate_n_valid - 1 + # df <= 0 means insufficient rank for t-based inference + if df is not None and df <= 0: + df = 0 # Forces NaN from t-distribution + elif (resolved_survey is not None + and hasattr(resolved_survey, 'uses_replicate_variance') + and resolved_survey.uses_replicate_variance): + # Replicate design with undefined df (rank <= 1) — NaN inference + df = 0 # Forces NaN from t-distribution else: df = n_obs - 8 # Approximate df (8 cell means) if covariates: @@ -721,7 +721,14 @@ def _compute_cell_means( mask = (G == g_val) & (P == p_val) & (T == t_val) cell_name = f"{g_name}, {p_name}, {t_name}" if weights is not None: - means[cell_name] = float(np.average(y[mask], weights=weights[mask])) + w_cell = weights[mask] + if np.sum(w_cell) <= 0: + raise ValueError( + f"Cell '{cell_name}' has zero effective survey " + f"weight. Cannot compute weighted cell mean. " + f"Check subpopulation/domain definition." 
+                        )
+                    means[cell_name] = float(np.average(y[mask], weights=w_cell))
                 else:
                     means[cell_name] = float(np.mean(y[mask]))
         return means
@@ -1089,15 +1096,32 @@ def _estimate_ddd_decomposition(
             # For reg: pairwise IFs are already on the unweighted scale (WLS
             # fits use weights but the IF is residual-based, not Riesz-weighted),
             # so pass directly to TSL without de-weighting.
-            from diff_diff.survey import compute_survey_vcov
-
             inf_for_tsl = inf_func.copy()
             if est_method in ("ipw", "dr") and survey_weights is not None:
                 sw = survey_weights
                 nz = sw > 0
                 inf_for_tsl[nz] = inf_for_tsl[nz] / sw[nz]
-            vcov_survey = compute_survey_vcov(np.ones((n, 1)), inf_for_tsl, resolved_survey)
-            se = float(np.sqrt(vcov_survey[0, 0]))
+
+            if resolved_survey.uses_replicate_variance:
+                from diff_diff.survey import compute_replicate_if_variance
+
+                # Score-scale to match TSL bread (1/sum(w)^2):
+                #   reg:    psi = w * inf_func / sum(w)
+                #   ipw/dr: psi = inf_func / sum(w)  (survey weights cancel)
+                w_sum = float(resolved_survey.weights.sum())
+                if est_method in ("ipw", "dr") and survey_weights is not None:
+                    psi_rep = inf_func / w_sum
+                else:
+                    psi_rep = resolved_survey.weights * inf_func / w_sum
+                variance, n_valid_rep = compute_replicate_if_variance(psi_rep, resolved_survey)
+                se = float(np.sqrt(max(variance, 0.0))) if np.isfinite(variance) else np.nan
+                # Store effective replicate count for df update in fit()
+                self._replicate_n_valid = n_valid_rep
+            else:
+                from diff_diff.survey import compute_survey_vcov
+
+                vcov_survey = compute_survey_vcov(np.ones((n, 1)), inf_for_tsl, resolved_survey)
+                se = float(np.sqrt(vcov_survey[0, 0]))
         elif self._cluster_ids is not None:
             # Cluster-robust SE: sum IF within clusters, then Liang-Zeger variance
             unique_clusters = np.unique(self._cluster_ids)
@@ -1187,6 +1211,14 @@ def _fit_predict_mu(
         if weights is not None:
             # WLS: transform by sqrt(weights) for weighted least squares
             w_fit = weights[fit_mask]
+            # Check positive-weight effective sample — subpopulation() can
+            # zero out weights, leaving rows in place but the WLS underidentified.
+            n_eff = int(np.count_nonzero(w_fit > 0))
+            n_cols = X_fit.shape[1]
+            if n_eff <= n_cols:
+                if np.sum(w_fit) > 0:
+                    return np.full(n_total, np.average(y_fit, weights=w_fit))
+                return np.full(n_total, np.mean(y_fit))
             sqrt_w = np.sqrt(w_fit)
             beta, _, _ = solve_ols(
                 X_fit * sqrt_w[:, np.newaxis],
diff --git a/diff_diff/trop.py b/diff_diff/trop.py
index c9ab4286..4d26a69f 100644
--- a/diff_diff/trop.py
+++ b/diff_diff/trop.py
@@ -461,6 +461,12 @@ def fit(
         resolved_survey, _survey_weights, _survey_wt, survey_metadata = _resolve_survey_for_fit(
             survey_design, data, "analytical"
         )
+        # Reject replicate-weight designs — TROP uses Rao-Wu bootstrap
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "TROP does not yet support replicate-weight survey designs. "
+                "Use a TSL-based survey design (strata/psu/fpc)."
+            )
         # Validate weight_type is pweight (keep restriction), but allow
         # strata/PSU/FPC — those are handled via Rao-Wu rescaled bootstrap.
        if resolved_survey is not None and resolved_survey.weight_type != "pweight":
diff --git a/diff_diff/trop_results.py b/diff_diff/trop_results.py
index 94c38657..2123499f 100644
--- a/diff_diff/trop_results.py
+++ b/diff_diff/trop_results.py
@@ -16,7 +16,7 @@ except ImportError:
     from typing_extensions import TypedDict
 
-from diff_diff.results import _get_significance_stars
+from diff_diff.results import _format_survey_block, _get_significance_stars
 
 __all__ = [
     "_LAMBDA_INF",
@@ -208,24 +208,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
         # Add survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "",
-                    "-" * 75,
-                    "Survey Design".center(75),
-                    "-" * 75,
-                    f"{'Weight type:':<25} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<25} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<25} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<25} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<25} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<25} {sm.df_survey:>10}")
-            lines.append("-" * 75)
+            lines.extend(_format_survey_block(sm, 75))
 
         lines.extend(
             [
diff --git a/diff_diff/twfe.py b/diff_diff/twfe.py
index 0b49266a..0d581589 100644
--- a/diff_diff/twfe.py
+++ b/diff_diff/twfe.py
@@ -127,6 +127,16 @@ def fit(  # type: ignore[override]
         resolved_survey, survey_weights, survey_weight_type, survey_metadata = (
             _resolve_survey_for_fit(survey_design, data, self.inference)
         )
+        # Reject replicate-weight designs — TWFE within-transformation must
+        # be recomputed per replicate (same reason as SunAbraham rejection)
+        if resolved_survey is not None and resolved_survey.uses_replicate_variance:
+            raise NotImplementedError(
+                "TwoWayFixedEffects does not yet support replicate-weight "
+                "survey designs. The weighted within-transformation must be "
+                "recomputed for each replicate. Use CallawaySantAnna or "
+                "TripleDifference for replicate-weight inference, or use a "
+                "TSL-based survey design (strata/psu/fpc)."
+            )
         # Use unit-level clustering if not specified (use local variable to avoid mutation)
         cluster_var = self.cluster if self.cluster is not None else unit
diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py
index bfe7b4fd..cd4baa4e 100644
--- a/diff_diff/two_stage.py
+++ b/diff_diff/two_stage.py
@@ -242,6 +242,11 @@ def fit(
         # Validate within-unit constancy for panel survey designs
         if resolved_survey is not None:
+            if resolved_survey.uses_replicate_variance:
+                raise NotImplementedError(
+                    "TwoStageDiD does not yet support replicate-weight survey "
+                    "designs. Use a TSL-based survey design (strata/psu/fpc)."
+                )
             _validate_unit_constant_survey(data, unit, survey_design)
             if resolved_survey.weight_type != "pweight":
                 raise ValueError(
diff --git a/diff_diff/two_stage_results.py b/diff_diff/two_stage_results.py
index b06cd000..bb064258 100644
--- a/diff_diff/two_stage_results.py
+++ b/diff_diff/two_stage_results.py
@@ -11,7 +11,7 @@ import numpy as np
 import pandas as pd
 
-from diff_diff.results import _get_significance_stars
+from diff_diff.results import _format_survey_block, _get_significance_stars
 
 __all__ = [
     "TwoStageBootstrapResults",
@@ -185,23 +185,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
         # Survey design info
         if self.survey_metadata is not None:
             sm = self.survey_metadata
-            lines.extend(
-                [
-                    "-" * 85,
-                    "Survey Design".center(85),
-                    "-" * 85,
-                    f"{'Weight type:':<30} {sm.weight_type:>10}",
-                ]
-            )
-            if sm.n_strata is not None:
-                lines.append(f"{'Strata:':<30} {sm.n_strata:>10}")
-            if sm.n_psu is not None:
-                lines.append(f"{'PSU/Cluster:':<30} {sm.n_psu:>10}")
-            lines.append(f"{'Effective sample size:':<30} {sm.effective_n:>10.1f}")
-            lines.append(f"{'Design effect (DEFF):':<30} {sm.design_effect:>10.2f}")
-            if sm.df_survey is not None:
-                lines.append(f"{'Survey d.f.:':<30} {sm.df_survey:>10}")
-            lines.extend(["-" * 85, ""])
+            lines.extend(_format_survey_block(sm, 85))
 
         # Overall ATT
         lines.extend(
             [
diff --git a/diff_diff/utils.py b/diff_diff/utils.py
index aeef49a9..c252c418 100644
--- a/diff_diff/utils.py
+++ b/diff_diff/utils.py
@@ -177,6 +177,9 @@ def safe_inference(effect, se, alpha=0.05, df=None):
     """
     if not (np.isfinite(se) and se > 0):
         return np.nan, np.nan, (np.nan, np.nan)
+    if df is not None and df <= 0:
+        # Undefined degrees of freedom (e.g., rank-deficient replicate design)
+        return np.nan, np.nan, (np.nan, np.nan)
     t_stat = effect / se
     p_value = compute_p_value(t_stat, df=df)
     conf_int = compute_confidence_interval(effect, se, alpha, df=df)
@@ -213,6 +216,10 @@ def safe_inference_batch(effects, ses, alpha=0.05, df=None):
     ci_lowers = np.full(n, np.nan)
     ci_uppers = np.full(n, np.nan)
 
+    # Undefined df (e.g., rank-deficient replicate design) → all NaN
+    if df is not None and df <= 0:
+        return t_stats, p_values, ci_lowers, ci_uppers
+
     valid = np.isfinite(ses) & (ses > 0)
     if not np.any(valid):
         return t_stats, p_values, ci_lowers, ci_uppers
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index e2615274..076947ff 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -1907,9 +1907,9 @@ unequal selection probabilities).
 - **Note:** pweight HC1 meat uses score outer products (Σ s_i s_i' where
   s_i = w_i x_i u_i), giving w² in the meat. fweight HC1 meat uses
   X'diag(w u²)X (one power of w), matching frequency-expanded HC1.
-- **Note:** fweights must be positive integers; fractional values are
-  rejected by `_validate_weights()`. This matches Stata's convention and
-  avoids ambiguous frequency semantics.
+- **Note:** fweights must be non-negative integers; fractional values are
+  rejected by `_validate_weights()`. All-zero vectors rejected at solver
+  level. This matches Stata's convention.
 ### Absorbed Fixed Effects with Survey Weights
 
@@ -1993,10 +1993,128 @@ ContinuousDiD, EfficientDiD):
   tau values once and only varies the ATT aggregation weights across draws.
   This is mathematically equivalent to refitting per draw and avoids
   redundant computation.
 
+### Replicate Weight Variance (Phase 6)
+
+Alternative to TSL: re-run WLS for each replicate weight column and compute
+variance from the distribution of replicate estimates.
+
+- **Reference**: Wolter (2007) "Introduction to Variance Estimation", 2nd ed.
+  Rao & Wu (1988).
+- **Supported methods**: BRR, Fay's BRR, JK1, JKn
+- **Formulas**:
+  - BRR: `V = (1/R) * sum_r (theta_r - theta)^2`
+  - Fay: `V = 1/(R*(1-rho)^2) * sum_r (theta_r - theta)^2`
+  - JK1: `V = (R-1)/R * sum_r (theta_r - theta)^2`
+  - JKn: `V = sum_h ((n_h-1)/n_h) * sum_{r in h} (theta_r - theta)^2`
+- **IF-based replicate variance**: For influence-function estimators (CS
+  aggregation, ContinuousDiD, EfficientDiD, TripleDifference), replicate
+  contrasts are formed via weight-ratio rescaling:
+  `theta_r = sum((w_r/w_full) * psi)` when `combined_weights=True`,
+  `theta_r = sum(w_r * psi)` when `combined_weights=False`.
+- **Survey df**: QR-rank of the analysis-weight matrix minus 1,
+  matching R's `survey::degf()` which uses `qr(..., tol=1e-5)$rank`.
+  For `combined_weights=True` (default), analysis weights are the raw
+  replicate columns. For `combined_weights=False`, analysis weights are
+  `replicate_weights * full_sample_weights`. Returns `None` (undefined)
+  when rank <= 1, yielding NaN inference. Replaces `n_PSU - n_strata`.
+- **Mutual exclusion**: Replicate weights cannot be combined with
+  strata/psu/fpc (the replicates encode design structure implicitly)
+- **Design parameters** (matching R `svrepdesign()`):
+  - `combined_weights` (default True): replicate columns include full-sample
+    weight. If False, replicate columns are perturbation factors multiplied
+    by full-sample weight before WLS.
+  - `replicate_scale`: overall variance multiplier, applied multiplicatively
+    with `replicate_rscales` when both are provided (`scale * rscales`)
+  - `replicate_rscales`: per-replicate scaling factors (vector of length R).
+    BRR and Fay ignore custom `replicate_scale`/`replicate_rscales` with a
+    warning (fixed scaling by design); JK1/JKn allow overrides.
+  - `mse` (default False, matching R's `survey::svrepdesign()`): if True,
+    center variance on full-sample estimate; if False, center on mean of
+    replicate estimates. When `replicate_rscales` contains zero entries
+    and `mse=False`, centering excludes zero-scaled replicates, matching
+    R's `survey::svrVar()` convention.
+- **Note:** Replicate columns are NOT normalized — raw values are preserved
+  to maintain correct weight ratios in the IF path.
+- **Note:** JKn requires explicit `replicate_strata` (per-replicate stratum
+  assignment). Auto-derivation from weight patterns is not supported.
+- **Note:** Invalid replicate solves (singular/degenerate) are dropped with
+  a warning. Variance is computed from valid replicates only. Fewer than 2
+  valid replicates returns NaN variance. The variance scaling factor
+  (e.g., `1/R` for BRR, `(R-1)/R` for JK1) uses the original design's `R`,
+  not the valid count — matching R's `survey` package convention where the
+  design structure is fixed and dropped replicates contribute zero to the
+  sum without changing the scale. Survey df uses `n_valid - 1` for
+  t-based inference.
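The scaling formulas above can be sketched in a few lines. This is a standalone illustration, not the library's `compute_replicate_vcov`: `replicate_variance` is a hypothetical helper that takes precomputed replicate estimates, defaults to `mse=False` centering (mean of replicate estimates), and omits the `replicate_scale`/`replicate_rscales` and zero-rscale handling described above.

```python
import numpy as np

def replicate_variance(theta_reps, theta_full, method="jk1",
                       fay_rho=0.5, strata_sizes=None, mse=False):
    """Sketch of BRR/Fay/JK1/JKn variance scaling (hypothetical helper).

    theta_reps   : R replicate point estimates
    theta_full   : full-sample estimate (centering point when mse=True)
    strata_sizes : for JKn, per-replicate stratum size n_h aligned with theta_reps
    """
    theta_reps = np.asarray(theta_reps, dtype=float)
    R = theta_reps.size
    # mse=False centers on the mean of replicate estimates, mse=True on theta_full
    center = theta_full if mse else theta_reps.mean()
    dev2 = (theta_reps - center) ** 2
    if method == "brr":
        return dev2.sum() / R
    if method == "fay":
        return dev2.sum() / (R * (1.0 - fay_rho) ** 2)
    if method == "jk1":
        return (R - 1) / R * dev2.sum()
    if method == "jkn":
        # (n_h - 1)/n_h is constant within a stratum, so a per-replicate
        # factor reproduces the per-stratum double sum
        n_h = np.asarray(strata_sizes, dtype=float)
        return float(np.sum((n_h - 1.0) / n_h * dev2))
    raise ValueError(f"unknown method: {method}")

# JK1 with four replicate estimates around a full-sample estimate of 1.0
v_jk1 = replicate_variance([0.9, 1.1, 1.0, 1.2], 1.0, method="jk1")
```

Note how `mse=True` and `mse=False` give different answers whenever the replicate mean differs from the full-sample estimate, which is exactly the centering choice the `mse` parameter controls.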
+- **Note:** Replicate-weight support matrix:
+  - **Supported**: CallawaySantAnna (reg/ipw/dr without covariates, no
+    bootstrap), ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap),
+    TripleDifference (all methods), LinearRegression (OLS path)
+  - **Rejected with NotImplementedError**: SunAbraham, TwoWayFixedEffects
+    (within-transformation must be recomputed per replicate),
+    DifferenceInDifferences, MultiPeriodDiD, StackedDiD (use
+    compute_survey_vcov directly), ImputationDiD, TwoStageDiD (custom
+    variance), SyntheticDiD, TROP (bootstrap-based variance),
+    BaconDecomposition (diagnostic only)
+  - CS/ContinuousDiD/EfficientDiD reject the combination of replicate
+    weights with `n_bootstrap > 0` (replicate weights provide analytical
+    variance)
+- **Note:** When invalid replicates are dropped in `compute_replicate_vcov`
+  (OLS path), `n_valid` is returned and used for `df_survey = n_valid - 1`
+  in `LinearRegression.fit()`. For IF-based replicate paths, replicates
+  essentially never fail (weighted sums cannot be singular), so `n_valid`
+  equals `R` in practice and df propagation is not needed.
+
+### DEFF Diagnostics (Phase 6)
+
+Per-coefficient design effect comparing survey variance to SRS variance.
+
+- **Reference**: Kish (1965) "Survey Sampling", Wiley. Chapter 8.
+- **Formula**: `DEFF_k = Var_survey(beta_k) / Var_SRS(beta_k)` where
+  SRS baseline uses HC1 sandwich ignoring design structure
+- **Effective n**: `n_eff_k = n / DEFF_k`
+- **Display**: Existing weight-based DEFF labeled "Kish DEFF (weights)";
+  per-coefficient DEFF available via `compute_deff_diagnostics()` or
+  `LinearRegression.compute_deff()` post-fit
+- **Note:** Opt-in computation — not run automatically. Users call standalone
+  function or post-fit method when diagnostics are needed.
+
+### Subpopulation Analysis (Phase 6)
+
+Domain estimation preserving full design structure.
+
+- **Reference**: Lumley (2004) Section 3.4. Stata `svy: subpop`.
+- **Method**: `SurveyDesign.subpopulation(data, mask)` zeros out weights for
+  excluded observations while retaining strata/PSU layout for correct
+  variance estimation
+- **Note:** Unlike naive subsetting, subpopulation analysis preserves design
+  information (PSU structure, strata counts) that would be lost by dropping
+  observations. This is the methodologically correct approach for domain
+  estimation under complex survey designs.
+- **Note:** Weight validation relaxed from "strictly positive" to
+  "non-negative" to support zero-weight observations. Negative weights
+  still rejected. All-zero weight vectors rejected at solver level.
+- **Note:** Survey design df (`n_PSU - n_strata`) uses total design
+  structure (including zero-weight rows), matching R's `survey::degf()`
+  convention after `subset()`. The generic HC1/classical inference paths
+  use positive-weight count for df adjustments, ensuring zero-weight
+  padding is inference-invariant outside the survey vcov path. DEFF
+  effective-n also uses positive-weight count.
+- **Note:** For replicate-weight designs, `subpopulation()` zeros out both
+  full-sample and replicate weight columns for excluded observations,
+  preserving all replicate metadata.
+- **Note:** Defensive enhancement: ContinuousDiD and TripleDifference
+  validate the positive-weight effective sample size before WLS cell fits.
+  After `subpopulation()` zeroes weights, raw row counts may exceed the
+  regression rank requirement while the weighted effective sample does not.
+  Underidentified cells are skipped (ContinuousDiD) or fall back to
+  weighted means (TripleDifference).
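The zero-weight mechanics behind domain estimation can be illustrated with a standalone sketch. `subpop_weights` is a hypothetical helper, not the library's `SurveyDesign.subpopulation()` (which also accepts column-name and callable masks, returns a new `SurveyDesign`, and zeros replicate columns); it only shows why zeroing weights preserves design structure that row subsetting would drop.

```python
import numpy as np
import pandas as pd

def subpop_weights(data, weight_col, mask, out_col="_subpop_weight"):
    """Zero out weights outside the domain while keeping every row.

    Hypothetical sketch: unlike data[mask], all rows — and therefore all
    PSUs and strata — remain in the design; excluded rows simply
    contribute zero weight to every weighted sum.
    """
    out = data.copy()
    m = np.asarray(mask, dtype=bool)
    out[out_col] = np.where(m, out[weight_col].to_numpy(dtype=float), 0.0)
    return out

df = pd.DataFrame({"psu": [1, 1, 2, 2], "w": [2.0, 3.0, 1.0, 4.0]})
dom = subpop_weights(df, "w", [True, False, True, False])
# Both PSUs survive in `dom`, whereas df[[True, False, True, False]]
# could drop a PSU entirely if its rows were all excluded.
```

The design payoff is that PSU counts and stratum counts computed on `dom` match the full sample, so design-based variance formulas keep their correct degrees of freedom.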
+
 ---
 
 # Version History
 
+- **v1.3** (2026-03-26): Added Replicate Weight Variance, DEFF Diagnostics,
+  and Subpopulation Analysis sections (Phase 6 completion)
 - **v1.2** (2026-03-24): Added Survey-Aware Bootstrap section (Phase 6)
 - **v1.1** (2026-03-20): Added Survey Data Support section
 - **v1.0** (2025-01-19): Initial registry with 12 estimators
diff --git a/docs/survey-roadmap.md b/docs/survey-roadmap.md
index ab95fac0..a521d4ef 100644
--- a/docs/survey-roadmap.md
+++ b/docs/survey-roadmap.md
@@ -1,7 +1,7 @@
 # Survey Data Support Roadmap
 
-This document captures planned future work for survey data support in diff-diff.
-Phases 1-4 are implemented. Phase 5 is deferred for future PRs.
+This document captures the survey data support roadmap for diff-diff.
+All phases (1-6) are implemented.
 
 ## Implemented (Phases 1-2)
 
@@ -46,7 +46,7 @@ message pointing to the planned phase or describing the limitation.
 |-----------|------|----------------|-------|
 | ImputationDiD | `imputation.py` | Analytical | Weighted iterative FE, weighted ATT aggregation, weighted conservative variance (Theorem 3); bootstrap+survey deferred |
 | TwoStageDiD | `two_stage.py` | Analytical | Weighted iterative FE, weighted Stage 2 OLS, weighted GMM sandwich variance; bootstrap+survey deferred |
-| CallawaySantAnna | `staggered.py` | Weights-only | Weights-only SurveyDesign (strata/PSU/FPC rejected); reg supports covariates, IPW/DR no-covariate only; survey-weighted WIF in aggregation; full design SEs, covariates+IPW/DR, and bootstrap+survey deferred |
+| CallawaySantAnna | `staggered.py` | Full | Full SurveyDesign (strata/PSU/FPC/replicate weights); reg supports covariates, IPW/DR no-covariate only; survey-weighted WIF in aggregation; replicate IF variance for analytical SEs |
This also unblocked @@ -90,16 +90,33 @@ Survey-aware bootstrap for all 8 bootstrap-using estimators. Two strategies: - **TROP**: cross-classified pseudo-strata (survey_stratum × treatment_group). - **Rust TROP**: pweight-only in Rust; full design falls to Python path. -### Replicate Weight Variance +### Replicate Weight Variance ✅ (2026-03-26) Re-run WLS for each replicate weight column, compute variance from distribution -of estimates. Supports BRR, Fay's BRR, JK1, JKn, bootstrap replicate methods. -Add `replicate_weights`, `replicate_type`, `replicate_rho` fields to SurveyDesign. - -### DEFF Diagnostics -Compare survey vcov to SRS vcov element-wise. Report design effect per -coefficient. Effective n = n / DEFF. - -### Subpopulation Analysis +of estimates. Supports BRR, Fay's BRR, JK1, JKn methods. +JKn requires explicit `replicate_strata` (per-replicate stratum assignment). +- `replicate_weights`, `replicate_method`, `fay_rho` fields on SurveyDesign +- `compute_replicate_vcov()` for OLS-based estimators (re-runs WLS per replicate) +- `compute_replicate_if_variance()` for IF-based estimators (reweights IF) +- Dispatch in `LinearRegression.fit()` and `staggered_aggregation.py` +- Replicate weights mutually exclusive with strata/PSU/FPC +- Survey df = rank(replicate_weights) - 1, matching R's `survey::degf()` +- **Limitations**: Supported in CallawaySantAnna, ContinuousDiD, EfficientDiD, + TripleDifference (analytical only, no bootstrap). Rejected with + `NotImplementedError` in DifferenceInDifferences, TwoWayFixedEffects, + MultiPeriodDiD, StackedDiD, SunAbraham, ImputationDiD, TwoStageDiD, + SyntheticDiD, TROP. + +### DEFF Diagnostics ✅ (2026-03-26) +Per-coefficient design effects comparing survey vcov to SRS (HC1) vcov. 
+- `DEFFDiagnostics` dataclass with per-coefficient DEFF, effective n, SRS/survey SE
+- `compute_deff_diagnostics()` standalone function
+- `LinearRegression.compute_deff()` post-fit method
+- Display label "Kish DEFF (weights)" for existing weight-based DEFF
+
+### Subpopulation Analysis ✅ (2026-03-26)
 `SurveyDesign.subpopulation(data, mask)` — zero-out weights for excluded
 observations while preserving the full design structure for correct variance
 estimation (unlike simple subsetting, which would drop design information).
+- Mask: bool array/Series, column name, or callable
+- Returns (SurveyDesign, DataFrame) pair with synthetic `_subpop_weight` column
+- Weight validation relaxed: zero weights allowed (negative still rejected)
diff --git a/tests/test_continuous_did.py b/tests/test_continuous_did.py
index a123361e..c519c3bb 100644
--- a/tests/test_continuous_did.py
+++ b/tests/test_continuous_did.py
@@ -801,6 +801,44 @@ def test_anticipation_event_study(self):
         )
         assert np.isfinite(results.event_study_effects[-1]["effect"])
 
+    def test_anticipation_event_study_excludes_contaminated_periods(self):
+        """With anticipation=2, event study should not contain e < -2."""
+        rng = np.random.default_rng(42)
+        n_per_group = 30
+        periods = list(range(1, 9))  # 8 periods
+
+        rows = []
+        # Never-treated
+        for i in range(n_per_group):
+            for t in periods:
+                rows.append({
+                    "unit": i, "period": t, "first_treat": 0,
+                    "dose": 0.0, "outcome": rng.normal(0, 0.5),
+                })
+        # Cohort g=5 — treatment at t=5, anticipation=2 means post at t>=3
+        for i in range(n_per_group):
+            uid = n_per_group + i
+            d = rng.uniform(0.5, 2.0)
+            for t in periods:
+                y = rng.normal(0, 0.5) + (2.0 * d if t >= 5 else 0)
+                rows.append({
+                    "unit": uid, "period": t, "first_treat": 5,
+                    "dose": d, "outcome": y,
+                })
+
+        data = pd.DataFrame(rows)
+        est = ContinuousDiD(anticipation=2, n_bootstrap=0)
+        results = est.fit(
+            data, "outcome", "unit", "period", "first_treat", "dose",
+            aggregate="eventstudy",
+        )
+        assert results.event_study_effects is not None
+        for e in results.event_study_effects.keys():
+            assert e >= -2, (
+                f"Found relative period e={e} with anticipation=2; "
+                f"expected e >= -2"
+            )
+
     def test_anticipation_not_yet_treated_excludes_anticipation_window(self):
         """Not-yet-treated controls must exclude cohorts in the anticipation window.
 
diff --git a/tests/test_efficient_did.py b/tests/test_efficient_did.py
index 76c8e282..951c46f5 100644
--- a/tests/test_efficient_did.py
+++ b/tests/test_efficient_did.py
@@ -705,6 +705,52 @@ def test_hausman_clustered(self):
         assert pretest.recommendation in ("pt_all", "pt_post", "inconclusive")
         assert pretest.df >= 0
 
+    def test_hausman_clustered_stale_ncl_after_nan_filtering(self):
+        """After filtering NaN EIF rows, n_cl must be recomputed.
+
+        If entire clusters are dropped by the row_finite filter, the original
+        n_cl overcounts clusters, inflating variance via the (n_cl / (n_cl-1))
+        correction and phantom zero rows in _cluster_aggregate.
+        """
+        rng = np.random.default_rng(99)
+        n_clusters = 10
+        units_per_cluster = 8
+        n_units = n_clusters * units_per_cluster
+        n_periods = 7
+
+        cluster_ids = np.repeat(np.arange(n_clusters), units_per_cluster)
+        units = np.repeat(np.arange(n_units), n_periods)
+        times = np.tile(np.arange(1, n_periods + 1), n_units)
+        ft = np.full(n_units, np.inf)
+        # Two small treatment cohorts
+        ft[:8] = 3  # cluster 0 only
+        ft[8:24] = 5  # clusters 1-2
+
+        ft_col = np.repeat(ft, n_periods)
+        unit_fe = np.repeat(rng.normal(0, 0.3, n_units), n_periods)
+        eps = rng.normal(0, 0.3, len(units))
+        tau = np.where((ft_col < np.inf) & (times >= ft_col), 2.0, 0.0)
+        y = unit_fe + tau + eps
+
+        df = pd.DataFrame(
+            {
+                "unit": units,
+                "time": times,
+                "first_treat": ft_col,
+                "y": y,
+                "cluster_id": np.repeat(cluster_ids, n_periods),
+            }
+        )
+
+        pretest = EfficientDiD.hausman_pretest(
+            df, "y", "unit", "time", "first_treat", cluster="cluster_id"
+        )
+        assert pretest.recommendation in ("pt_all", "pt_post", "inconclusive")
+        # Key assertion: if the statistic is finite, df should reflect the
+        # actual number of clusters remaining after NaN filtering
+        if np.isfinite(pretest.statistic):
+            assert pretest.df > 0
+
     def test_hausman_last_cohort(self):
         """Hausman pretest on all-treated panel with last_cohort control."""
         df = _make_staggered_panel(
diff --git a/tests/test_survey.py b/tests/test_survey.py
index f1093064..534379e0 100644
--- a/tests/test_survey.py
+++ b/tests/test_survey.py
@@ -692,7 +692,7 @@ def test_summary_output(self, survey_2x2_data):
         assert "Strata:" in summary_text
         assert "PSU/Cluster:" in summary_text
         assert "Effective sample size:" in summary_text
-        assert "Design effect (DEFF):" in summary_text
+        assert "Kish DEFF (weights):" in summary_text
 
     def test_to_dict_survey_fields(self, survey_2x2_data):
         """Survey metadata fields appear in to_dict()."""
@@ -756,11 +756,18 @@ def test_survey_design_validation_missing_column(self):
         with pytest.raises(ValueError, match="not found"):
             sd.resolve(df)
 
-    def test_survey_design_validation_zero_weights(self):
-        """Zero weights raise ValueError."""
+    def test_survey_design_validation_zero_weights_allowed(self):
+        """Zero weights are allowed (for subpopulation analysis)."""
         df = pd.DataFrame({"y": [1, 2, 3], "w": [1.0, 0.0, 1.0]})
         sd = SurveyDesign(weights="w")
-        with pytest.raises(ValueError, match="strictly positive"):
+        resolved = sd.resolve(df)
+        assert resolved.weights[1] == 0.0
+
+    def test_survey_design_validation_all_zero_weights_rejected(self):
+        """All-zero weights raise ValueError."""
+        df = pd.DataFrame({"y": [1, 2, 3], "w": [0.0, 0.0, 0.0]})
+        sd = SurveyDesign(weights="w")
+        with pytest.raises(ValueError, match="All weights are zero"):
             sd.resolve(df)
 
     def test_survey_design_validation_nan_weights(self):
@@ -947,7 +954,7 @@ def test_fweight_error_for_fractional(self):
             }
         )
         sd = SurveyDesign(weights="w", weight_type="fweight")
-        with pytest.raises(ValueError, match="Frequency weights.*must be positive integers"):
+        with pytest.raises(ValueError, match="Frequency weights.*must be non-negative integers"):
             sd.resolve(df)
 
     def test_lonely_psu_remove_warning(self):
@@ -2360,7 +2367,7 @@ def test_fractional_fweight_rejected_solve_ols(self):
         y = np.random.randn(n)
         w = np.array([1.5, 2.3, 1.0, 2.0, 1.7, 3.0, 1.0, 2.0, 1.0, 1.0])
 
-        with pytest.raises(ValueError, match="positive integers"):
+        with pytest.raises(ValueError, match="non-negative integers"):
             solve_ols(X, y, weights=w, weight_type="fweight")
 
     def test_fractional_fweight_rejected_compute_robust_vcov(self):
@@ -2371,7 +2378,7 @@ def test_fractional_fweight_rejected_compute_robust_vcov(self):
         resid = np.random.randn(n)
         w = np.array([1.5, 2.0, 1.0, 2.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0])
 
-        with pytest.raises(ValueError, match="positive integers"):
+        with pytest.raises(ValueError, match="non-negative integers"):
             compute_robust_vcov(X, resid, weights=w, weight_type="fweight")
 
     def test_integer_fweight_accepted(self):
diff --git a/tests/test_survey_phase3.py b/tests/test_survey_phase3.py
index 830561af..f43c4968 100644
--- a/tests/test_survey_phase3.py
+++ b/tests/test_survey_phase3.py
@@ -1195,3 +1195,83 @@ def test_continuous_did_dose_response_survey_pvalue(self, continuous_survey_data
             if np.isfinite(row["effect"]) and np.isfinite(row["se"]) and row["se"] > 0:
                 _, expected_p, _ = safe_inference(row["effect"], row["se"], df=sm.df_survey)
                 assert row["p_value"] == pytest.approx(expected_p, rel=1e-10)
+
+
+# =============================================================================
+# Survey Edge Case Coverage Gaps
+# =============================================================================
+
+
+class TestSurveyEdgeCases:
+    """Tests for FPC census, single-PSU NaN, and full-design bootstrap."""
+
+    def test_fpc_census_zero_variance_stacked(self, staggered_survey_data):
+        """When FPC == n_PSU (full census), sampling variance should be ~0."""
+        from diff_diff import StackedDiD
+
+        data = staggered_survey_data.copy()
+        # Set FPC = n_PSU per stratum (census)
+        for h in data["stratum"].unique():
+            mask = data["stratum"] == h
+            n_psu_h = data.loc[mask, "psu"].nunique()
+            data.loc[mask, "fpc"] = float(n_psu_h)
+
+        sd = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", fpc="fpc",
+            nest=True,
+        )
+        result = StackedDiD().fit(
+            data, "outcome", "unit", "time", "first_treat",
+            survey_design=sd,
+        )
+        assert result.overall_se == pytest.approx(0.0, abs=1e-10)
+
+    def test_single_psu_lonely_remove_nan_se(self, staggered_survey_data):
+        """All strata with 1 PSU + lonely_psu='remove' -> NaN SE."""
+        from diff_diff import SunAbraham
+
+        data = staggered_survey_data.copy()
+        # Force all units into a single PSU per stratum
+        data["single_psu"] = data["stratum"]
+
+        sd = SurveyDesign(
+            weights="weight", strata="stratum", psu="single_psu",
+            lonely_psu="remove",
+        )
+        with pytest.warns(UserWarning, match="only 1 PSU"):
+            result = SunAbraham().fit(
+                data, "outcome", "unit", "time", "first_treat",
+                survey_design=sd,
+            )
+        # All strata removed -> NaN SE
+        assert np.isnan(result.overall_se)
+
+    def test_full_design_bootstrap_continuous_did(self, staggered_survey_data):
+        """Bootstrap with strata survey design works for ContinuousDiD."""
+        from diff_diff import ContinuousDiD
+
+        data = staggered_survey_data.copy()
+        # Add dose column
+        data["dose"] = np.where(data["first_treat"] > 0, 1.0 + 0.5 * data["unit"] % 3, 0.0)
+
+        sd = SurveyDesign(weights="weight", strata="stratum")
+        result = ContinuousDiD(n_bootstrap=30, seed=42).fit(
+            data, "outcome", "unit", "time", "first_treat", "dose",
+            survey_design=sd,
+        )
+        assert np.isfinite(result.overall_att)
+        assert np.isfinite(result.overall_att_se)
+        assert result.survey_metadata is not None
+
+    def test_full_design_bootstrap_efficient_did(self, staggered_survey_data):
+        """Bootstrap with strata survey design works for EfficientDiD."""
+        from diff_diff import EfficientDiD
+
+        sd = SurveyDesign(weights="weight", strata="stratum")
+        result = EfficientDiD(n_bootstrap=30, seed=42).fit(
+            staggered_survey_data, "outcome", "unit", "time", "first_treat",
+            survey_design=sd,
+        )
+        assert np.isfinite(result.overall_att)
+        assert np.isfinite(result.overall_se)
+        assert result.survey_metadata is not None
diff --git a/tests/test_survey_phase4.py b/tests/test_survey_phase4.py
index a4244b88..f4781b69 100644
--- a/tests/test_survey_phase4.py
+++ b/tests/test_survey_phase4.py
@@ -202,7 +202,7 @@ def test_negative_weights_raises(self):
         y = (X @ [0.5, -0.5] + rng.randn(n) > 0).astype(float)
         weights = np.ones(n)
         weights[0] = -1.0
-        with pytest.raises(ValueError, match="strictly positive"):
+        with pytest.raises(ValueError, match="non-negative"):
             solve_logit(X, y, weights=weights)
 
     def test_wrong_shape_weights_raises(self):
@@ -1509,3 +1509,23 @@ def test_weight_scale_invariance_callaway_santanna(self, staggered_survey_data):
 
         assert abs(r1.overall_att - r2.overall_att) < 1e-10
         assert abs(r1.overall_se - r2.overall_se) < 1e-8
+
+
+class TestCallawaySantAnnaFullDesignBootstrap:
+    """Test full-design bootstrap for CS with strata+PSU+FPC."""
+
+    def test_bootstrap_full_design_cs(self, staggered_survey_data):
+        """Bootstrap + full survey (strata+PSU+FPC) works for CS."""
+        sd = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", fpc="fpc",
+        )
+        result = CallawaySantAnna(
+            estimation_method="reg", n_bootstrap=30, seed=42,
+        ).fit(
+            staggered_survey_data, "outcome", "unit", "period",
+            "first_treat", survey_design=sd,
+        )
+        assert np.isfinite(result.overall_att)
+        assert np.isfinite(result.overall_se)
+        assert result.survey_metadata is not None
+        assert result.survey_metadata.n_strata is not None
diff --git a/tests/test_survey_phase5.py b/tests/test_survey_phase5.py
index fcf7e6e0..3cf80968 100644
--- a/tests/test_survey_phase5.py
+++ b/tests/test_survey_phase5.py
@@ -852,3 +852,38 @@ def test_sdid_to_dict_schema_matches_did(self):
             "df_survey",
         ]:
             assert key in d, f"Missing key: {key}"
+
+
+class TestTROPRaoWuEquivalence:
+    """Test Rao-Wu vs block bootstrap equivalence under degenerate design."""
+
+    def test_rao_wu_approximates_block_no_strata(self, trop_survey_data):
+        """Without real strata variation, Rao-Wu SE ~ block bootstrap SE."""
+        from diff_diff import TROP
+
+        data = trop_survey_data.copy()
+        # Single stratum, PSU = unit (effectively block bootstrap)
+        data["single_stratum"] = 0
+        data["unit_psu"] = data["unit"]
+
+        sd_rw = SurveyDesign(
+            weights="weight", strata="single_stratum", psu="unit_psu",
+        )
+        sd_block = SurveyDesign(weights="weight")
+
+        result_rw = TROP(method="local", n_bootstrap=99, seed=42, max_iter=5).fit(
+            data, "outcome", "D", "unit", "time", survey_design=sd_rw,
+        )
+        result_block = TROP(method="local", n_bootstrap=99, seed=42, max_iter=5).fit(
+            data, "outcome", "D", "unit", "time", survey_design=sd_block,
+        )
+
+        # Point estimates identical (same weights)
+        assert result_rw.att == pytest.approx(result_block.att, abs=1e-10)
+        # SEs should be within factor of 2
+        if result_block.se > 0:
+            ratio = result_rw.se / result_block.se
+            assert 0.5 < ratio < 2.0, (
+                f"Rao-Wu SE ({result_rw.se:.4f}) and block SE "
+                f"({result_block.se:.4f}) differ by > 2x (ratio={ratio:.2f})"
+            )
diff --git a/tests/test_survey_phase6.py b/tests/test_survey_phase6.py
new file mode 100644
index 00000000..bb54ada3
--- /dev/null
+++ b/tests/test_survey_phase6.py
@@ -0,0 +1,1697 @@
+"""Tests for Phase 6 survey support: Subpopulation, DEFF, Replicate Weights."""
+
+import numpy as np
+import pandas as pd
+import pytest
+
+from diff_diff import (
+    DEFFDiagnostics,
+    DifferenceInDifferences,
+    SurveyDesign,
+    compute_deff_diagnostics,
+)
+
+
+# =============================================================================
+# Shared Fixtures
+# =============================================================================
+
+
+@pytest.fixture
+def basic_did_data():
+    """Simple 2x2 DiD panel with survey design columns."""
+    np.random.seed(42)
+    n_units = 80
+    n_periods = 4
+    rows = []
+    for unit in range(n_units):
+        treated = unit < 40
+        ft = 3 if treated else 0
+        stratum = unit // 20  # 4 strata
+        psu = unit // 10  # 8 PSUs
+        wt = 1.0 + 0.5 * (unit % 5)
+
+        for t in range(1, n_periods + 1):
+            y = 10.0 + unit * 0.05 + t * 0.3
+            if treated and t >= 3:
+                y += 2.0
+            y += np.random.normal(0, 0.5)
+            rows.append({
+                "unit": unit, "time": t, "first_treat": ft,
+                "outcome": y, "stratum": stratum, "psu": psu,
+                "weight": wt, "fpc": 200.0,
+            })
+    return pd.DataFrame(rows)
+
+
+# =============================================================================
+# Subpopulation Analysis
+# =============================================================================
+
+
+class TestSubpopulationAnalysis:
+    """Tests for SurveyDesign.subpopulation()."""
+
+    def test_basic_subpopulation(self, basic_did_data):
+        """Subpopulation returns a new SurveyDesign and DataFrame."""
+        sd = SurveyDesign(weights="weight", strata="stratum", psu="psu")
+        mask = basic_did_data["unit"] < 60  # Exclude last 20 units
+        new_sd, new_data = sd.subpopulation(basic_did_data, mask)
+
+        assert new_sd.weights == "_subpop_weight"
+        assert new_sd.strata == "stratum"
+        assert new_sd.psu == "psu"
+        assert "_subpop_weight" in new_data.columns
+
+        # Excluded obs have zero weight
+        excluded = new_data[~mask.values]
+        assert (excluded["_subpop_weight"] == 0.0).all()
+
+        # Included obs have original weights
+        included = new_data[mask.values]
+        np.testing.assert_array_equal(
+            included["_subpop_weight"].values,
+            basic_did_data.loc[mask.values, "weight"].values,
+        )
+
+    def test_subpopulation_preserves_design_structure(self, basic_did_data):
+        """PSU/strata structure is preserved even when some PSUs are zeroed."""
+        sd = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", nest=True,
+        )
+        # Exclude all units in stratum 0
+        mask = basic_did_data["stratum"] != 0
+        new_sd, new_data = sd.subpopulation(basic_did_data, mask)
+
+        # Should resolve without error — stratum 0 PSUs still exist in design
+        resolved = new_sd.resolve(new_data)
+        assert resolved.n_strata == 4  # All strata still present
+        assert resolved.n_psu == 8  # All PSUs still present
+
+    def test_subpopulation_with_did(self, basic_did_data):
+        """Subpopulation works end-to-end with DifferenceInDifferences."""
+        sd = SurveyDesign(weights="weight", strata="stratum", psu="psu")
+        mask = basic_did_data["unit"] < 60
+        new_sd, new_data = sd.subpopulation(basic_did_data, mask)
+
+        # Add treated/post columns for the estimator
+        new_data["treated"] = (new_data["first_treat"] > 0).astype(int)
+        new_data["post"] = (new_data["time"] >= 3).astype(int)
+        est = DifferenceInDifferences()
+        result = est.fit(
+            new_data, outcome="outcome", treatment="treated", time="post",
+            survey_design=new_sd,
+        )
+        assert np.isfinite(result.att)
+        assert np.isfinite(result.se)
+        assert result.survey_metadata is not None
+
+    def test_subpopulation_preserves_more_psus_than_subset(self, basic_did_data):
+        """Subpopulation retains all PSUs in the design; subset drops them."""
+        sd = SurveyDesign(
+            weights="weight", strata="stratum", psu="psu", nest=True,
+        )
+        mask = basic_did_data["unit"] < 60
+
+        # Subpopulation: preserves full design
+        new_sd, new_data = sd.subpopulation(basic_did_data, mask)
+        resolved_subpop = new_sd.resolve(new_data)
+
+        # Naive subset: loses PSUs in excluded observations
+        subset = basic_did_data[mask.values].copy()
+        resolved_subset = sd.resolve(subset)
+
+        # Subpopulation preserves the full PSU count
+        assert resolved_subpop.n_psu >= resolved_subset.n_psu
+
+    def test_subpopulation_mask_as_column_name(self, basic_did_data):
+        """Mask can be a column name string."""
+        sd = SurveyDesign(weights="weight")
+        data = basic_did_data.copy()
+        data["include"] = data["unit"] < 60
+        new_sd, new_data = sd.subpopulation(data, "include")
+        assert (new_data.loc[~data["include"], "_subpop_weight"] == 0.0).all()
+
+    def test_subpopulation_mask_as_callable(self, basic_did_data):
+        """Mask can be
a callable.""" + sd = SurveyDesign(weights="weight") + new_sd, new_data = sd.subpopulation( + basic_did_data, lambda df: df["unit"] < 60, + ) + excluded = new_data[basic_did_data["unit"] >= 60] + assert (excluded["_subpop_weight"] == 0.0).all() + + def test_subpopulation_no_weights_uses_uniform(self, basic_did_data): + """When SurveyDesign has no weights, subpopulation uses 1.0.""" + sd = SurveyDesign(strata="stratum") + mask = basic_did_data["unit"] < 60 + new_sd, new_data = sd.subpopulation(basic_did_data, mask) + + included = new_data[mask.values] + assert (included["_subpop_weight"] == 1.0).all() + + def test_subpopulation_all_excluded_raises(self, basic_did_data): + """Excluding all obs should raise ValueError.""" + sd = SurveyDesign(weights="weight") + mask = np.zeros(len(basic_did_data), dtype=bool) + with pytest.raises(ValueError, match="excludes all"): + sd.subpopulation(basic_did_data, mask) + + def test_subpopulation_none_excluded_identity(self, basic_did_data): + """Including all obs should return weights unchanged.""" + sd = SurveyDesign(weights="weight") + mask = np.ones(len(basic_did_data), dtype=bool) + new_sd, new_data = sd.subpopulation(basic_did_data, mask) + np.testing.assert_array_equal( + new_data["_subpop_weight"].values, + basic_did_data["weight"].values, + ) + + def test_subpopulation_mask_length_mismatch_raises(self, basic_did_data): + """Mask length must match data length.""" + sd = SurveyDesign(weights="weight") + mask = np.ones(10, dtype=bool) + with pytest.raises(ValueError, match="length"): + sd.subpopulation(basic_did_data, mask) + + def test_zero_weights_allowed_in_resolve(self): + """resolve() should accept zero weights (no longer strictly positive).""" + data = pd.DataFrame({ + "w": [1.0, 2.0, 0.0, 1.5], + "y": [1, 2, 3, 4], + }) + sd = SurveyDesign(weights="w") + resolved = sd.resolve(data) + assert resolved.weights[2] == 0.0 # Zero preserved after normalization + + def test_negative_weights_rejected(self): + """Negative weights 
should still be rejected.""" + data = pd.DataFrame({"w": [1.0, -1.0, 2.0], "y": [1, 2, 3]}) + sd = SurveyDesign(weights="w") + with pytest.raises(ValueError, match="non-negative"): + sd.resolve(data) + + +# ============================================================================= +# DEFF Diagnostics +# ============================================================================= + + +class TestDEFFDiagnostics: + """Tests for per-coefficient design effect diagnostics.""" + + @staticmethod + def _fit_lr_with_survey(data, sd): + """Fit a LinearRegression with survey design for testing DEFF.""" + from diff_diff.linalg import LinearRegression + from diff_diff.survey import _resolve_survey_for_fit + + resolved, sw, swt, sm = _resolve_survey_for_fit(sd, data, "analytical") + y = data["outcome"].values + X = np.column_stack([ + np.ones(len(data)), + data["treated"].values, + data["post"].values, + (data["treated"] * data["post"]).values, + ]) + reg = LinearRegression( + include_intercept=False, weights=sw, + weight_type=swt, survey_design=resolved, + ) + reg.fit(X, y) + return reg + + def test_uniform_weights_deff_near_one(self, basic_did_data): + """With uniform weights and no clustering, DEFF should be ~1.""" + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + data["uniform_wt"] = 1.0 + + sd = SurveyDesign(weights="uniform_wt") + reg = self._fit_lr_with_survey(data, sd) + deff = reg.compute_deff() + assert isinstance(deff, DEFFDiagnostics) + # With uniform weights, DEFF should be close to 1 + for d in deff.deff: + if np.isfinite(d): + assert d == pytest.approx(1.0, abs=0.15) + + def test_clustered_design_deff_gt_one(self, basic_did_data): + """With clustered design, DEFF should be > 1 for at least some coefs.""" + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + + sd = SurveyDesign( + 
weights="weight", strata="stratum", psu="psu", nest=True, + ) + reg = self._fit_lr_with_survey(data, sd) + deff = reg.compute_deff() + # At least one coefficient should have DEFF > 1 (design effect) + assert np.any(deff.deff[np.isfinite(deff.deff)] > 1.0) + + def test_deff_srs_se_matches_robust_vcov(self, basic_did_data): + """SRS SE in DEFF diagnostics should match compute_robust_vcov.""" + from diff_diff.linalg import compute_robust_vcov + + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") + reg = self._fit_lr_with_survey(data, sd) + deff = reg.compute_deff() + + # Manually compute SRS SE + srs_vcov = compute_robust_vcov( + reg._X, reg.residuals_, + cluster_ids=None, weights=reg.weights, + weight_type=reg.weight_type, + ) + expected_srs_se = np.sqrt(np.diag(srs_vcov)) + np.testing.assert_allclose(deff.srs_se, expected_srs_se, rtol=1e-10) + + def test_standalone_compute_deff_diagnostics(self, basic_did_data): + """compute_deff_diagnostics() works as a standalone function.""" + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + + sd = SurveyDesign(weights="weight", strata="stratum", psu="psu") + reg = self._fit_lr_with_survey(data, sd) + deff = compute_deff_diagnostics( + reg._X, reg.residuals_, reg.vcov_, + reg.weights, weight_type=reg.weight_type, + coefficient_names=["intercept", "treated", "post", "treated:post"], + ) + assert deff.coefficient_names is not None + assert len(deff.coefficient_names) == 4 + assert len(deff.deff) == 4 + + def test_deff_effective_n(self, basic_did_data): + """effective_n should be n / DEFF.""" + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + + sd = SurveyDesign(weights="weight", strata="stratum", 
psu="psu") + reg = self._fit_lr_with_survey(data, sd) + deff = reg.compute_deff() + n = reg.n_obs_ + for i in range(len(deff.deff)): + if np.isfinite(deff.deff[i]) and deff.deff[i] > 0: + assert deff.effective_n[i] == pytest.approx( + n / deff.deff[i], rel=1e-10 + ) + + def test_compute_deff_no_survey_raises(self, basic_did_data): + """compute_deff() without survey should raise.""" + from diff_diff.linalg import LinearRegression + + data = basic_did_data.copy() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + + y = data["outcome"].values + X = np.column_stack([np.ones(len(data)), data["treated"].values]) + reg = LinearRegression(include_intercept=False) + reg.fit(X, y) + with pytest.raises(ValueError, match="survey design"): + reg.compute_deff() + + +# ============================================================================= +# Replicate Weight Variance +# ============================================================================= + + +@pytest.fixture +def replicate_data(): + """Data with full-sample + replicate weight columns for testing.""" + np.random.seed(123) + n = 200 + k = 20 # number of replicates + + # Simple regression: y = 1 + 2*x + eps + x = np.random.randn(n) + eps = np.random.randn(n) * 0.5 + y = 1.0 + 2.0 * x + eps + + # Full-sample weights (non-uniform) + w = 1.0 + np.random.exponential(0.5, n) + + data = pd.DataFrame({"x": x, "y": y, "weight": w}) + + # Generate JK1 replicate weights (delete-1 jackknife on PSU clusters) + # Each replicate drops one cluster and rescales remaining weights + cluster_size = n // k + for r in range(k): + w_r = w.copy() + start = r * cluster_size + end = min((r + 1) * cluster_size, n) + w_r[start:end] = 0.0 + # Rescale remaining: multiply by k/(k-1) + w_r[w_r > 0] *= k / (k - 1) + data[f"rep_{r}"] = w_r + + return data, [f"rep_{r}" for r in range(k)] + + +class TestReplicateWeightVariance: + """Tests for replicate weight variance estimation.""" + + def 
test_brr_validation(self): + """BRR validation: method + weights required together.""" + with pytest.raises(ValueError, match="replicate_method must be provided"): + SurveyDesign(weights="w", replicate_weights=["r1", "r2"]) + with pytest.raises(ValueError, match="replicate_weights must be provided"): + SurveyDesign(weights="w", replicate_method="BRR") + + def test_fay_rho_validation(self): + """Fay requires rho in (0, 1).""" + with pytest.raises(ValueError, match="fay_rho"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="Fay", fay_rho=0.0, + ) + with pytest.raises(ValueError, match="fay_rho"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="BRR", fay_rho=0.5, + ) + + def test_replicate_strata_mutual_exclusion(self): + """Replicate weights cannot be combined with strata/psu/fpc.""" + with pytest.raises(ValueError, match="cannot be combined"): + SurveyDesign( + weights="w", strata="s", + replicate_weights=["r1", "r2"], replicate_method="BRR", + ) + + def test_replicate_requires_weights(self): + """Full-sample weights required alongside replicates.""" + with pytest.raises(ValueError, match="Full-sample weights"): + SurveyDesign( + replicate_weights=["r1", "r2"], replicate_method="BRR", + ) + + def test_jk1_resolve(self, replicate_data): + """JK1 replicates resolve correctly.""" + data, rep_cols = replicate_data + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + resolved = sd.resolve(data) + assert resolved.uses_replicate_variance + assert resolved.n_replicates == len(rep_cols) + assert resolved.replicate_weights.shape == (len(data), len(rep_cols)) + assert resolved.strata is None # Short-circuited + assert resolved.df_survey == len(rep_cols) - 1 + + def test_jk1_vcov_finite(self, replicate_data): + """JK1 replicate vcov produces finite results.""" + from diff_diff.linalg import LinearRegression + + data, rep_cols = replicate_data + sd = 
SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + resolved = sd.resolve(data) + y = data["y"].values + X = np.column_stack([np.ones(len(data)), data["x"].values]) + + reg = LinearRegression( + include_intercept=False, weights=resolved.weights, + weight_type="pweight", survey_design=resolved, + ) + reg.fit(X, y) + + assert np.all(np.isfinite(reg.coefficients_)) + assert np.all(np.isfinite(np.diag(reg.vcov_))) + # SE should be positive + se = np.sqrt(np.diag(reg.vcov_)) + assert np.all(se > 0) + + def test_brr_vcov(self, replicate_data): + """BRR replicate vcov (1/R scaling) produces finite results.""" + from diff_diff.survey import compute_replicate_vcov + + data, rep_cols = replicate_data + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="BRR", + ) + resolved = sd.resolve(data) + y = data["y"].values + X = np.column_stack([np.ones(len(data)), data["x"].values]) + + from diff_diff.linalg import solve_ols + coef, _, _ = solve_ols( + X, y, weights=resolved.weights, weight_type="pweight", + ) + + vcov, _nv = compute_replicate_vcov(X, y, coef, resolved) + assert np.all(np.isfinite(np.diag(vcov))) + + def test_fay_inflates_over_brr(self, replicate_data): + """Fay's BRR with rho > 0 produces larger variance than BRR.""" + from diff_diff.survey import compute_replicate_vcov + from diff_diff.linalg import solve_ols + + data, rep_cols = replicate_data + y = data["y"].values + X = np.column_stack([np.ones(len(data)), data["x"].values]) + + sd_brr = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="BRR", + ) + resolved_brr = sd_brr.resolve(data) + coef, _, _ = solve_ols(X, y, weights=resolved_brr.weights) + vcov_brr, _nv = compute_replicate_vcov(X, y, coef, resolved_brr) + + sd_fay = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="Fay", fay_rho=0.5, + ) + resolved_fay = sd_fay.resolve(data) + vcov_fay, _nv = compute_replicate_vcov(X, y, coef, resolved_fay) + + # Fay 
variance = BRR variance / (1-rho)^2 > BRR variance + assert np.all(np.diag(vcov_fay) > np.diag(vcov_brr)) + + def test_replicate_metadata(self, replicate_data): + """SurveyMetadata should include replicate info.""" + from diff_diff.survey import compute_survey_metadata + + data, rep_cols = replicate_data + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + resolved = sd.resolve(data) + sm = compute_survey_metadata(resolved, data["weight"].values) + assert sm.replicate_method == "JK1" + assert sm.n_replicates == len(rep_cols) + assert sm.df_survey == len(rep_cols) - 1 + + def test_replicate_rejected_by_base_did(self, replicate_data): + """DifferenceInDifferences rejects replicate-weight designs.""" + data, rep_cols = replicate_data + n = len(data) + data["treated"] = (np.arange(n) < n // 2).astype(int) + data["post"] = (np.arange(n) % 4 >= 2).astype(int) + data["outcome"] = data["y"] + 1.5 * data["treated"] * data["post"] + + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="DifferenceInDifferences"): + DifferenceInDifferences().fit( + data, outcome="outcome", treatment="treated", time="post", + survey_design=sd, + ) + + def test_replicate_if_variance(self, replicate_data): + """IF-based replicate variance produces finite results.""" + from diff_diff.survey import compute_replicate_if_variance + + data, rep_cols = replicate_data + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + resolved = sd.resolve(data) + + # Synthetic influence function + psi = np.random.randn(len(data)) * 0.1 + var, _nv = compute_replicate_if_variance(psi, resolved) + assert np.isfinite(var) + assert var >= 0 + + def test_min_2_replicates(self, replicate_data): + """Need at least 2 replicate columns.""" + data, _ = replicate_data + sd = SurveyDesign( + weights="weight", replicate_weights=["rep_0"], + 
replicate_method="BRR", + ) + with pytest.raises(ValueError, match="At least 2"): + sd.resolve(data) + + def test_jkn_requires_replicate_strata(self): + """JKn must have replicate_strata.""" + with pytest.raises(ValueError, match="replicate_strata is required"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="JKn", + ) + + def test_jkn_replicate_strata_length_mismatch(self): + """replicate_strata length must match replicate_weights.""" + with pytest.raises(ValueError, match="length"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="JKn", replicate_strata=[0], + ) + + def test_jkn_variance(self, replicate_data): + """JKn with per-stratum scaling produces finite results.""" + from diff_diff.survey import compute_replicate_vcov + from diff_diff.linalg import solve_ols + + data, rep_cols = replicate_data + # Assign first half of replicates to stratum 0, second half to stratum 1 + n_rep = len(rep_cols) + strata = [0] * (n_rep // 2) + [1] * (n_rep - n_rep // 2) + + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JKn", replicate_strata=strata, + ) + resolved = sd.resolve(data) + y = data["y"].values + X = np.column_stack([np.ones(len(data)), data["x"].values]) + + coef, _, _ = solve_ols(X, y, weights=resolved.weights) + vcov, _nv = compute_replicate_vcov(X, y, coef, resolved) + assert np.all(np.isfinite(np.diag(vcov))) + assert np.all(np.diag(vcov) > 0) + + def test_replicate_if_scale_matches_analytical(self): + """Replicate IF variance should match analytical sum(psi^2) for SRS.""" + from diff_diff.survey import compute_replicate_if_variance + + np.random.seed(42) + n = 50 + psi = np.random.randn(n) * 0.1 + + # Build JK1 replicates: delete-1 jackknife + w_full = np.ones(n) + rep_cols = [] + data = pd.DataFrame({"w": w_full}) + for r in range(n): + w_r = np.ones(n) * n / (n - 1) + w_r[r] = 0.0 + col = f"rep_{r}" + data[col] = w_r + rep_cols.append(col) + + sd = 
SurveyDesign( + weights="w", replicate_weights=rep_cols, replicate_method="JK1", + ) + resolved = sd.resolve(data) + + v_rep, _nv = compute_replicate_if_variance(psi, resolved) + v_analytical = float(np.sum(psi**2)) + + # JK1 gives (n-1)/n * sum(...) which should approximate sum(psi^2) + assert v_rep == pytest.approx(v_analytical, rel=0.05), ( + f"Replicate IF variance {v_rep:.6f} does not match analytical " + f"{v_analytical:.6f}" + ) + + def test_replicate_if_matches_survey_if_variance(self): + """Replicate IF variance should approximate compute_survey_if_variance + when JK1 replicates match the PSU structure.""" + from diff_diff.survey import ( + compute_replicate_if_variance, + compute_survey_if_variance, + ResolvedSurveyDesign, + ) + + np.random.seed(123) + n = 60 + n_psu = 15 + psu = np.repeat(np.arange(n_psu), n // n_psu) + psi = np.random.randn(n) * 0.1 + weights = np.ones(n) + + # TSL variance with PSU clustering + resolved_tsl = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=psu, fpc=None, + n_strata=0, n_psu=n_psu, lonely_psu="remove", + ) + v_tsl = compute_survey_if_variance(psi, resolved_tsl) + + # JK1 replicates: delete one PSU at a time, rescale remaining + rep_arr = np.zeros((n, n_psu)) + for r in range(n_psu): + w_r = weights.copy() + w_r[psu == r] = 0.0 + w_r[psu != r] *= n_psu / (n_psu - 1) + rep_arr[:, r] = w_r + # Normalize each column to sum=n (matching resolve() normalization) + for c in range(n_psu): + cs = rep_arr[:, c].sum() + if cs > 0: + rep_arr[:, c] *= n / cs + + resolved_rep = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", + n_replicates=n_psu, + ) + v_rep, _nv = compute_replicate_if_variance(psi, resolved_rep) + + # Should be in the same ballpark (within a factor of 2 — different + # estimators of the same quantity) + assert v_rep > 0 and v_tsl > 0 + ratio = 
v_rep / v_tsl + assert 0.5 < ratio < 2.0, ( + f"Replicate IF var ({v_rep:.6f}) vs TSL IF var ({v_tsl:.6f}) " + f"ratio={ratio:.2f}, expected within [0.5, 2.0]" + ) + + def test_all_zero_weights_rejected_by_solve_ols(self): + """solve_ols should reject all-zero weight vectors.""" + from diff_diff.linalg import solve_ols + + X = np.column_stack([np.ones(10), np.random.randn(10)]) + y = np.random.randn(10) + with pytest.raises(ValueError, match="sum to zero"): + solve_ols(X, y, weights=np.zeros(10)) + + def test_solve_logit_rejects_single_class_positive_weight(self): + """solve_logit should reject when positive-weight obs have one class.""" + from diff_diff.linalg import solve_logit + + n = 20 + X = np.column_stack([np.ones(n), np.random.randn(n)]) + y = np.array([1] * 10 + [0] * 10, dtype=float) + w = np.ones(n) + # Zero out all class-0 obs — only class 1 remains + w[10:] = 0.0 + with pytest.raises(ValueError, match="outcome class"): + solve_logit(X, y, weights=w) + + def test_solve_logit_rejects_too_few_positive_weight_obs(self): + """solve_logit should reject when too few positive-weight obs.""" + from diff_diff.linalg import solve_logit + + n = 20 + X = np.column_stack([np.ones(n), np.random.randn(n)]) + y = np.array([1] * 10 + [0] * 10, dtype=float) + w = np.zeros(n) + # Only 2 positive-weight obs (one per class), but 2 params + w[0] = 1.0 # class 1 + w[10] = 1.0 # class 0 + with pytest.raises(ValueError, match="Cannot identify"): + solve_logit(X, y, weights=w) + + def test_solve_logit_rank_deficient_positive_weight_subset_error_mode(self): + """solve_logit with rank_deficient_action='error' should reject when + positive-weight subset is rank-deficient.""" + from diff_diff.linalg import solve_logit + + n = 20 + x1 = np.random.randn(n) + x2 = np.random.randn(n) + X = np.column_stack([x1, x2]) + y = np.array([1] * 10 + [0] * 10, dtype=float) + w = np.zeros(n) + for i in [0, 1, 2, 10, 11, 12]: + w[i] = 1.0 + X[i, 1] = 5.0 # x2 constant → rank deficient in subset + 
with pytest.raises(ValueError, match="rank-deficient"): + solve_logit(X, y, weights=w, rank_deficient_action="error") + + def test_solve_logit_rank_deficient_positive_weight_subset_warn_mode(self): + """solve_logit with rank_deficient_action='warn' should warn but not + error on rank-deficient positive-weight subset.""" + from diff_diff.linalg import solve_logit + + n = 20 + x1 = np.random.randn(n) + x2 = np.random.randn(n) + X = np.column_stack([x1, x2]) + y = np.array([1] * 10 + [0] * 10, dtype=float) + w = np.zeros(n) + for i in [0, 1, 2, 10, 11, 12]: + w[i] = 1.0 + X[i, 1] = 5.0 + with pytest.warns(UserWarning, match="rank-deficient"): + beta, probs = solve_logit(X, y, weights=w, rank_deficient_action="warn") + # Key assertion: beta must have original p+1 length (intercept + 2 covariates) + assert len(beta) == 3, ( + f"Expected beta length 3 (p+1), got {len(beta)}. " + f"Column dropping broke the coefficient vector shape." + ) + + def test_subpopulation_string_mask_rejected(self, basic_did_data): + """Subpopulation mask with string values should be rejected.""" + sd = SurveyDesign(weights="weight") + mask = ["yes"] * len(basic_did_data) + mask[0] = "no" + with pytest.raises(ValueError, match="string"): + sd.subpopulation(basic_did_data, mask) + + def test_nonbinary_numeric_mask_rejected(self, basic_did_data): + """Subpopulation mask with non-binary numeric codes should be rejected.""" + sd = SurveyDesign(weights="weight") + # Coded domain column {1, 2} — not boolean, should fail + mask = np.ones(len(basic_did_data), dtype=int) + mask[:10] = 2 + with pytest.raises(ValueError, match="non-binary"): + sd.subpopulation(basic_did_data, mask) + + def test_scale_invariance_combined_weights(self): + """Multiplying all replicate columns by a constant should not change SE.""" + from diff_diff.survey import compute_replicate_if_variance, ResolvedSurveyDesign + + np.random.seed(42) + n = 40 + R = 10 + psi = np.random.randn(n) * 0.1 + weights = 1.0 + 
np.random.exponential(0.3, n) + + # Build replicate weights (combined: include full-sample weight) + rep_arr = np.zeros((n, R)) + cluster_size = n // R + for r in range(R): + w_r = weights.copy() + start = r * cluster_size + end = min((r + 1) * cluster_size, n) + w_r[start:end] = 0.0 + w_r[w_r > 0] *= R / (R - 1) + rep_arr[:, r] = w_r + + resolved1 = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=R, + combined_weights=True, + ) + v1, _ = compute_replicate_if_variance(psi, resolved1) + + # Scale all replicate columns by 3x + resolved2 = ResolvedSurveyDesign( + weights=weights * 3, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr * 3, + replicate_method="JK1", n_replicates=R, + combined_weights=True, + ) + v2, _ = compute_replicate_if_variance(psi, resolved2) + + assert v1 == pytest.approx(v2, rel=1e-10), ( + f"Scale invariance violated: v1={v1:.8f} vs v2={v2:.8f}" + ) + + def test_combined_weights_false(self): + """combined_weights=False treats replicate cols as perturbation factors.""" + sd = SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="BRR", combined_weights=False, + ) + assert sd.combined_weights is False + + def test_custom_scale(self): + """Custom replicate_scale overrides default factor.""" + sd = SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="BRR", replicate_scale=0.5, + ) + assert sd.replicate_scale == 0.5 + + def test_scale_rscales_multiplicative(self): + """scale and rscales are applied multiplicatively for JK1.""" + from diff_diff.survey import compute_replicate_vcov, ResolvedSurveyDesign + from diff_diff.linalg import solve_ols + + np.random.seed(42) + n, R = 100, 5 + x = np.random.randn(n) + y = 1.0 + 2.0 * x + np.random.randn(n) * 0.5 + X = 
np.column_stack([np.ones(n), x]) + w = np.ones(n) + + cluster_size = n // R + rep_arr = np.ones((n, R)) + for r in range(R): + start = r * cluster_size + end = min((r + 1) * cluster_size, n) + rep_arr[start:end, r] = 0.0 + rep_arr[~((np.arange(n) >= start) & (np.arange(n) < end)), r] *= R / (R - 1) + + coef, _, _ = solve_ols(X, y, weights=w) + + # rscales only + rscales = np.array([0.5, 1.0, 1.5, 1.0, 0.5]) + resolved_rscales = ResolvedSurveyDesign( + weights=w, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=R, + replicate_rscales=rscales, + ) + vcov_rscales, _ = compute_replicate_vcov(X, y, coef, resolved_rscales) + + # scale * rscales should multiply + scale_val = 2.0 + resolved_both = ResolvedSurveyDesign( + weights=w, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=R, + replicate_scale=scale_val, replicate_rscales=rscales, + ) + vcov_both, _ = compute_replicate_vcov(X, y, coef, resolved_both) + + # V(scale+rscales) == scale * V(rscales) + assert np.allclose(np.diag(vcov_both), scale_val * np.diag(vcov_rscales), rtol=1e-10) + + def test_brr_ignores_custom_scaling(self): + """BRR ignores custom scale/rscales with a warning.""" + import warnings + from diff_diff.survey import compute_replicate_vcov, ResolvedSurveyDesign + from diff_diff.linalg import solve_ols + + np.random.seed(42) + n, R = 100, 5 + x = np.random.randn(n) + y = 1.0 + 2.0 * x + np.random.randn(n) * 0.5 + X = np.column_stack([np.ones(n), x]) + w = np.ones(n) + rep_arr = np.random.uniform(0.8, 1.2, (n, R)) + + coef, _, _ = solve_ols(X, y, weights=w) + + # Default BRR + resolved_default = ResolvedSurveyDesign( + weights=w, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + 
replicate_method="BRR", n_replicates=R, + ) + vcov_default, _ = compute_replicate_vcov(X, y, coef, resolved_default) + + # BRR with custom scale (should be ignored with warning) + resolved_custom = ResolvedSurveyDesign( + weights=w, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="BRR", n_replicates=R, + replicate_scale=99.0, + ) + with warnings.catch_warnings(record=True) as caught: + warnings.simplefilter("always") + vcov_custom, _ = compute_replicate_vcov(X, y, coef, resolved_custom) + assert any("ignored" in str(w.message) for w in caught) + assert np.allclose(np.diag(vcov_default), np.diag(vcov_custom), rtol=1e-10) + + def test_replicate_if_no_divide_by_zero_warning(self): + """compute_replicate_if_variance should not warn on zero weights.""" + from diff_diff.survey import compute_replicate_if_variance, ResolvedSurveyDesign + + n = 20 + psi = np.random.randn(n) + weights = np.ones(n) + weights[:5] = 0.0 # Some zero full-sample weights (subpopulation) + + rep_arr = np.ones((n, 5)) + rep_arr[:5, :] = 0.0 # Match zero pattern + for r in range(5): + cs = rep_arr[:, r].sum() + if cs > 0: + rep_arr[:, r] *= n / cs + + resolved = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=5, + ) + import warnings + with warnings.catch_warnings(): + warnings.simplefilter("error", RuntimeWarning) + # Should NOT raise RuntimeWarning for divide by zero + v, _nv = compute_replicate_if_variance(psi, resolved) + assert np.isfinite(v) + + +class TestReplicateEdgeCases: + """Regression tests for analysis-weight df and rscales centering.""" + + def test_df_survey_combined_weights_false(self): + """df_survey uses analysis-weight rank when combined_weights=False.""" + from diff_diff.survey import ResolvedSurveyDesign + + 
np.random.seed(42) + n = 50 + R = 5 + weights = 1.0 + np.random.exponential(0.5, n) + # Perturbation factors (not full weights) + rep_factors = np.random.uniform(0.8, 1.2, (n, R)) + + resolved = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_factors, + replicate_method="BRR", n_replicates=R, + combined_weights=False, + ) + # df should match pivoted QR rank of analysis weights + analysis_weights = rep_factors * weights[:, np.newaxis] + from scipy.linalg import qr as scipy_qr + _, R_mat, _ = scipy_qr(analysis_weights, pivoting=True, mode='economic') + diag_abs = np.abs(np.diag(R_mat)) + expected_rank = int(np.sum(diag_abs > 1e-5 * diag_abs.max())) + expected_df = expected_rank - 1 if expected_rank > 1 else None + assert resolved.df_survey == expected_df + + # Verify it differs from raw perturbation-factor rank when weights + # cause a rank reduction (e.g., zero full-sample weights) + weights_with_zeros = weights.copy() + weights_with_zeros[:10] = 0.0 # subpopulation-zeroed + resolved2 = ResolvedSurveyDesign( + weights=weights_with_zeros, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_factors, + replicate_method="BRR", n_replicates=R, + combined_weights=False, + ) + analysis_z = rep_factors * weights_with_zeros[:, np.newaxis] + _, R_z, _ = scipy_qr(analysis_z, pivoting=True, mode='economic') + diag_z = np.abs(np.diag(R_z)) + z_rank = int(np.sum(diag_z > 1e-5 * diag_z.max())) if diag_z.max() > 0 else 0 + z_df = z_rank - 1 if z_rank > 1 else None + assert resolved2.df_survey == z_df + + def test_rscales_zero_centering_vcov(self): + """mse=False with zero rscales: center only on rscales > 0 replicates.""" + from diff_diff.survey import compute_replicate_vcov, ResolvedSurveyDesign + from diff_diff.linalg import solve_ols + + np.random.seed(42) + n = 100 + R = 6 + x = 
np.random.randn(n)
+        y = 1.0 + 2.0 * x + np.random.randn(n) * 0.5
+        X = np.column_stack([np.ones(n), x])
+        w = np.ones(n)
+
+        # Build JK1-style replicates: column r zeroes its own block and
+        # upweights every other row by R / (R - 1). (Zeroing all columns
+        # for the block would corrupt previously built columns.)
+        cluster_size = n // R
+        rep_arr = np.ones((n, R))
+        for r in range(R):
+            start = r * cluster_size
+            end = min((r + 1) * cluster_size, n)
+            rep_arr[:, r] = np.where(
+                (np.arange(n) >= start) & (np.arange(n) < end), 0.0,
+                R / (R - 1)
+            )
+
+        # rscales with one zero entry
+        rscales = np.array([1.0, 1.0, 0.0, 1.0, 1.0, 1.0])
+
+        coef, _, _ = solve_ols(X, y, weights=w)
+
+        resolved = ResolvedSurveyDesign(
+            weights=w, weight_type="pweight",
+            strata=None, psu=None, fpc=None,
+            n_strata=0, n_psu=0, lonely_psu="remove",
+            replicate_weights=rep_arr,
+            replicate_method="JK1", n_replicates=R,
+            replicate_rscales=rscales, mse=False,
+        )
+        vcov, _nv = compute_replicate_vcov(X, y, coef, resolved)
+
+        # Manual computation: center only on replicates with rscales > 0
+        coef_reps = []
+        for r in range(R):
+            c_r, _, _ = solve_ols(X, y, weights=rep_arr[:, r])
+            coef_reps.append(c_r)
+        coef_reps = np.array(coef_reps)
+        pos_mask = rscales > 0
+        center = np.mean(coef_reps[pos_mask], axis=0)
+        diffs = coef_reps - center[np.newaxis, :]
+        V_manual = np.zeros((2, 2))
+        for r in range(R):
+            V_manual += rscales[r] * np.outer(diffs[r], diffs[r])
+
+        assert np.allclose(np.diag(vcov), np.diag(V_manual), rtol=1e-10)
+
+    def test_rscales_zero_centering_if(self):
+        """mse=False with zero rscales: IF path centers only on rscales > 0."""
+        from diff_diff.survey import compute_replicate_if_variance, ResolvedSurveyDesign
+
+        np.random.seed(42)
+        n = 50
+        R = 5
+        psi = np.random.randn(n) * 0.1
+        w = np.ones(n)
+
+        # Build simple replicates (the np.where fully defines column r)
+        rep_arr = np.ones((n, R))
+        for r in range(R):
+            start = r * (n // R)
+            end = min((r + 1) * (n // R), n)
+            rep_arr[:, r] = np.where(
+                (np.arange(n) >= start) & (np.arange(n) < end), 0.0,
+                R / (R - 1)
+            )
+
+        rscales = 
np.array([1.0, 0.0, 1.0, 1.0, 1.0]) + + resolved = ResolvedSurveyDesign( + weights=w, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=R, + replicate_rscales=rscales, mse=False, + ) + var, _nv = compute_replicate_if_variance(psi, resolved) + + # Manual: theta_r = sum((w_r/w) * psi), center on rscales > 0 only + theta_full = float(np.sum(psi)) + theta_reps = np.array([ + float(np.sum(np.divide(rep_arr[:, r], w, out=np.zeros(n), where=w > 0) * psi)) + for r in range(R) + ]) + pos_mask = rscales > 0 + center = float(np.mean(theta_reps[pos_mask])) + diffs = theta_reps - center + var_manual = float(np.sum(rscales * diffs**2)) + + assert var == pytest.approx(var_manual, rel=1e-10) + + def test_rank_one_replicate_nan_inference(self): + """Rank-1 replicate design yields df_survey=None and NaN inference.""" + from diff_diff.survey import ResolvedSurveyDesign + from diff_diff.utils import safe_inference + + np.random.seed(42) + n = 50 + R = 3 + weights = np.ones(n) + # All replicate columns identical → rank 1 + col = 1.0 + np.random.randn(n) * 0.1 + rep_arr = np.column_stack([col, col, col]) + + resolved = ResolvedSurveyDesign( + weights=weights, weight_type="pweight", + strata=None, psu=None, fpc=None, + n_strata=0, n_psu=0, lonely_psu="remove", + replicate_weights=rep_arr, + replicate_method="JK1", n_replicates=R, + ) + # Rank-1 → df_survey should be None + assert resolved.df_survey is None + + # Passing df=0 (the guard value) should produce NaN p-value and CI + _, p_val, ci = safe_inference(1.0, 0.5, df=0) + assert np.isnan(p_val) + assert np.isnan(ci[0]) and np.isnan(ci[1]) + + +# ============================================================================= +# Estimator-Level Replicate Weight Tests +# ============================================================================= + + +class TestEstimatorReplicateWeights: + """End-to-end tests for 
replicate weights with each estimator.""" + + @staticmethod + def _make_staggered_replicate_data(): + """Create staggered panel with replicate weight columns.""" + np.random.seed(42) + n_units, n_periods = 60, 6 + n_rep = 15 + rows = [] + for unit in range(n_units): + if unit < 20: + ft = 3 + elif unit < 40: + ft = 5 + else: + ft = 0 + wt = 1.0 + 0.3 * (unit % 5) + for t in range(1, n_periods + 1): + y = 10.0 + unit * 0.05 + t * 0.2 + if ft > 0 and t >= ft: + y += 2.0 + y += np.random.normal(0, 0.5) + rows.append({ + "unit": unit, "time": t, "first_treat": ft, + "outcome": y, "weight": wt, + }) + data = pd.DataFrame(rows) + # Generate JK1 replicates (delete-cluster jackknife) + cluster_size = n_units // n_rep + rep_cols = [] + for r in range(n_rep): + start = r * cluster_size + end = min((r + 1) * cluster_size, n_units) + w_r = data["weight"].values.copy() + mask = (data["unit"] >= start) & (data["unit"] < end) + w_r[mask] = 0.0 + w_r[~mask] *= n_rep / (n_rep - 1) + col = f"rep_{r}" + data[col] = w_r + rep_cols.append(col) + return data, rep_cols + + def test_callaway_santanna_replicate(self): + """CallawaySantAnna with replicate weights produces finite results.""" + from diff_diff import CallawaySantAnna + + data, rep_cols = self._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + result = CallawaySantAnna(estimation_method="reg", n_bootstrap=0).fit( + data, "outcome", "unit", "time", "first_treat", + survey_design=sd, + ) + assert np.isfinite(result.overall_att) + assert np.isfinite(result.overall_se) + assert result.survey_metadata is not None + + def test_efficient_did_replicate(self): + """EfficientDiD with replicate weights produces finite results.""" + from diff_diff import EfficientDiD + + data, rep_cols = self._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + result = 
EfficientDiD(n_bootstrap=0).fit(
+            data, "outcome", "unit", "time", "first_treat",
+            survey_design=sd,
+        )
+        assert np.isfinite(result.overall_att)
+        assert np.isfinite(result.overall_se)
+        assert result.survey_metadata is not None
+
+    def test_subpopulation_preserves_replicate_metadata(self):
+        """subpopulation() should preserve replicate design fields."""
+        data, rep_cols = self._make_staggered_replicate_data()
+        sd = SurveyDesign(
+            weights="weight", replicate_weights=rep_cols,
+            replicate_method="JK1",
+        )
+        mask = data["unit"] < 40
+        new_sd, new_data = sd.subpopulation(data, mask)
+
+        assert new_sd.replicate_method == "JK1"
+        assert new_sd.replicate_weights == rep_cols
+        # Excluded obs have zero replicate weights
+        excluded = new_data[~mask.values]
+        for col in rep_cols:
+            assert (excluded[col] == 0.0).all()
+
+    def test_continuous_did_replicate_bootstrap_rejected(self):
+        """ContinuousDiD should reject replicate weights + n_bootstrap > 0."""
+        from diff_diff import ContinuousDiD
+
+        data, rep_cols = self._make_staggered_replicate_data()
+        data["dose"] = np.where(
+            data["first_treat"] > 0, 1.0 + 0.5 * (data["unit"] % 3), 0.0
+        )
+        sd = SurveyDesign(
+            weights="weight", replicate_weights=rep_cols,
+            replicate_method="JK1",
+        )
+        with pytest.raises(NotImplementedError, match="not supported"):
+            ContinuousDiD(n_bootstrap=30, seed=42).fit(
+                data, "outcome", "unit", "time", "first_treat", "dose",
+                survey_design=sd,
+            )
+
+    @pytest.mark.parametrize("est_method", ["reg", "ipw", "dr"])
+    def test_triple_diff_replicate_all_methods(self, est_method):
+        """TripleDifference with replicate weights works for reg/ipw/dr."""
+        from diff_diff import TripleDifference
+
+        np.random.seed(42)
+        n = 200
+        n_rep = 10
+        d1 = np.repeat([0, 1], n // 2)
+        d2 = np.tile([0, 1], n // 2)
+        post = np.random.choice([0, 1], n)
+        y = 1.0 + 0.5 * d1 + 0.3 * d2 + 2.0 * d1 * d2 * post + np.random.randn(n) * 0.5
+        w = 1.0 + np.random.exponential(0.3, n)
+
+        data = pd.DataFrame({
+            "y": y, "d1": d1, 
"d2": d2, "post": post, "weight": w,
+        })
+        # Build JK1 replicates
+        cluster_size = n // n_rep
+        rep_cols = []
+        for r in range(n_rep):
+            w_r = w.copy()
+            start = r * cluster_size
+            end = min((r + 1) * cluster_size, n)
+            w_r[start:end] = 0.0
+            w_r[w_r > 0] *= n_rep / (n_rep - 1)
+            col = f"rep_{r}"
+            data[col] = w_r
+            rep_cols.append(col)
+
+        sd = SurveyDesign(
+            weights="weight", replicate_weights=rep_cols,
+            replicate_method="JK1",
+        )
+        result = TripleDifference(estimation_method=est_method).fit(
+            data, outcome="y", group="d1", partition="d2",
+            time="post", survey_design=sd,
+        )
+        assert np.isfinite(result.att)
+        assert np.isfinite(result.se)
+        assert result.survey_metadata is not None
+
+    def test_sun_abraham_replicate_rejected(self):
+        """SunAbraham should reject replicate-weight survey designs."""
+        from diff_diff import SunAbraham
+
+        data, rep_cols = self._make_staggered_replicate_data()
+        sd = SurveyDesign(
+            weights="weight", replicate_weights=rep_cols,
+            replicate_method="JK1",
+        )
+        with pytest.raises(NotImplementedError, match="SunAbraham"):
+            SunAbraham(n_bootstrap=0).fit(
+                data, "outcome", "unit", "time", "first_treat",
+                survey_design=sd,
+            )
+
+
+class TestSubpopulationMaskValidation:
+    """Subpopulation mask edge cases and replicate-design rejection paths."""
+
+    def test_nan_mask_rejected(self, basic_did_data):
+        """Subpopulation mask with NaN should be rejected."""
+        sd = SurveyDesign(weights="weight")
+        nan_mask = np.ones(len(basic_did_data))
+        nan_mask[0] = np.nan
+        with pytest.raises(ValueError, match="NA|NaN|missing"):
+            sd.subpopulation(basic_did_data, nan_mask)
+
+    def test_none_mask_rejected(self, basic_did_data):
+        """Subpopulation mask with None should be rejected."""
+        sd = SurveyDesign(weights="weight")
+        mask = [True] * len(basic_did_data)
+        mask[0] = None
+        with pytest.raises(ValueError, match="None|NA|missing"):
+            sd.subpopulation(basic_did_data, mask)
+
+    def test_pd_na_mask_rejected(self, basic_did_data):
+        """Subpopulation mask with pd.NA should be 
rejected.""" + sd = SurveyDesign(weights="weight") + mask = pd.array([True] * len(basic_did_data), dtype=pd.BooleanDtype()) + mask[0] = pd.NA + with pytest.raises(ValueError, match="NA|missing"): + sd.subpopulation(basic_did_data, mask) + + def test_efficient_did_replicate_bootstrap_rejected(self): + """EfficientDiD should reject replicate weights + n_bootstrap > 0.""" + from diff_diff import EfficientDiD + + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="not supported"): + EfficientDiD(n_bootstrap=30, seed=42).fit( + data, "outcome", "unit", "time", "first_treat", + survey_design=sd, + ) + + def test_callaway_santanna_replicate_bootstrap_rejected(self): + """CallawaySantAnna should reject replicate weights + n_bootstrap > 0.""" + from diff_diff import CallawaySantAnna + + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="not supported"): + CallawaySantAnna(estimation_method="reg", n_bootstrap=30, seed=42).fit( + data, "outcome", "unit", "time", "first_treat", + survey_design=sd, + ) + + def test_twfe_replicate_rejected(self): + """TwoWayFixedEffects should reject replicate-weight designs.""" + from diff_diff.twfe import TwoWayFixedEffects + + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="TwoWayFixedEffects"): + TwoWayFixedEffects().fit( + data, outcome="outcome", treatment="first_treat", + unit="unit", time="time", survey_design=sd, + ) + + def test_stacked_did_replicate_rejected(self): + """StackedDiD should reject replicate-weight 
designs.""" + from diff_diff import StackedDiD + + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + with pytest.raises(NotImplementedError, match="StackedDiD"): + StackedDiD().fit( + data, outcome="outcome", unit="unit", time="time", + first_treat="first_treat", survey_design=sd, + ) + + def test_invalid_replicate_scale_rejected(self): + """Negative or zero replicate_scale should be rejected.""" + with pytest.raises(ValueError, match="positive finite"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="JK1", replicate_scale=-1.0, + ) + with pytest.raises(ValueError, match="positive finite"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="JK1", replicate_scale=0.0, + ) + + def test_invalid_replicate_rscales_rejected(self): + """Negative replicate_rscales should be rejected.""" + with pytest.raises(ValueError, match="non-negative"): + SurveyDesign( + weights="w", replicate_weights=["r1", "r2"], + replicate_method="JK1", replicate_rscales=[-1.0, 1.0], + ) + + def _replicate_sd_and_data(self): + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + return data, sd + + def test_multi_period_did_replicate_rejected(self): + """MultiPeriodDiD rejects replicate-weight designs.""" + from diff_diff.estimators import MultiPeriodDiD + data, sd = self._replicate_sd_and_data() + data["treated"] = (data["first_treat"] > 0).astype(int) + data["post"] = (data["time"] >= 3).astype(int) + with pytest.raises(NotImplementedError): + MultiPeriodDiD().fit( + data, outcome="outcome", treatment="treated", + time="post", survey_design=sd, + ) + + def test_imputation_did_replicate_rejected(self): + """ImputationDiD rejects replicate-weight designs.""" + from 
diff_diff.imputation import ImputationDiD + data, sd = self._replicate_sd_and_data() + with pytest.raises(NotImplementedError): + ImputationDiD().fit( + data, outcome="outcome", unit="unit", time="time", + first_treat="first_treat", survey_design=sd, + ) + + def test_two_stage_did_replicate_rejected(self): + """TwoStageDiD rejects replicate-weight designs.""" + from diff_diff.two_stage import TwoStageDiD + data, sd = self._replicate_sd_and_data() + with pytest.raises(NotImplementedError): + TwoStageDiD().fit( + data, outcome="outcome", unit="unit", time="time", + first_treat="first_treat", survey_design=sd, + ) + + def test_bacon_replicate_rejected(self): + """BaconDecomposition rejects replicate-weight designs.""" + from diff_diff.bacon import BaconDecomposition + data, sd = self._replicate_sd_and_data() + with pytest.raises(NotImplementedError): + BaconDecomposition().fit( + data, outcome="outcome", unit="unit", time="time", + first_treat="first_treat", survey_design=sd, + ) + + def test_synthetic_did_replicate_rejected(self): + """SyntheticDiD rejects replicate-weight designs.""" + from diff_diff.synthetic_did import SyntheticDiD + data, sd = self._replicate_sd_and_data() + data["treated"] = (data["first_treat"] > 0).astype(int) + with pytest.raises(NotImplementedError): + SyntheticDiD().fit( + data, outcome="outcome", unit="unit", time="time", + treatment="treated", survey_design=sd, + ) + + @pytest.mark.slow + def test_trop_replicate_rejected(self): + """TROP rejects replicate-weight designs.""" + from diff_diff.trop import TROP + data, sd = self._replicate_sd_and_data() + data["treated"] = (data["first_treat"] > 0).astype(int) + with pytest.raises(NotImplementedError): + TROP().fit( + data, outcome="outcome", unit="unit", time="time", + treatment="treated", survey_design=sd, + ) + + +class TestCSReplicateMethodCoverage: + """Test that CS replicate weights work for all supported estimation methods.""" + + @staticmethod + def _make_cs_replicate_data(): + data, 
rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + return data, sd + + @pytest.mark.parametrize("method", ["ipw", "dr"]) + def test_cs_replicate_ipw_dr_no_covariates(self, method): + """CS replicate weights work for ipw/dr without covariates.""" + from diff_diff import CallawaySantAnna + data, sd = self._make_cs_replicate_data() + result = CallawaySantAnna( + estimation_method=method, n_bootstrap=0, + ).fit( + data, "outcome", "unit", "time", "first_treat", + survey_design=sd, + ) + assert np.isfinite(result.overall_att) + assert np.isfinite(result.overall_se) + assert result.survey_metadata is not None + + +# ============================================================================= +# Effective-sample and d.f. consistency tests +# ============================================================================= + + +class TestEffectiveSampleAndDfConsistency: + """Tests for P1 effective-sample gates and P2 d.f. 
metadata consistency.""" + + def test_continuous_did_positive_weight_gate(self): + """ContinuousDiD skips cells where positive-weight count < n_basis.""" + import warnings + from diff_diff import ContinuousDiD + + np.random.seed(42) + n_units, n_periods = 60, 6 + n_rep = 10 + rows = [] + for unit in range(n_units): + if unit < 20: + ft = 3 + elif unit < 40: + ft = 5 + else: + ft = 0 + dose = float(unit % 3 + 1) if ft > 0 else 0.0 + wt = 1.0 + 0.3 * (unit % 5) + for t in range(1, n_periods + 1): + y = 10.0 + unit * 0.05 + t * 0.2 + if ft > 0 and t >= ft: + y += 2.0 * dose + y += np.random.normal(0, 0.5) + rows.append({ + "unit": unit, "time": t, "first_treat": ft, + "outcome": y, "weight": wt, "dose": dose, + }) + data = pd.DataFrame(rows) + + # Build replicate columns + cluster_size = n_units // n_rep + rep_cols = [] + for r in range(n_rep): + w_r = data["weight"].values.copy() + start = r * cluster_size + end = min((r + 1) * cluster_size, n_units) + mask = (data["unit"] >= start) & (data["unit"] < end) + w_r[mask] = 0.0 + w_r[~mask] *= n_rep / (n_rep - 1) + col = f"rep_{r}" + data[col] = w_r + rep_cols.append(col) + + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + + # Zero weights for most of cohort ft=5 (units 20-39) while keeping + # cohort ft=3 (units 0-19) and controls intact. This leaves too few + # positive-weight treated units in ft=5 cells, triggering the warning, + # but ft=3 cells remain valid so the estimator still produces a result. 
+ subpop_mask = (data["first_treat"] != 5) | (data["unit"] < 22) + new_sd, new_data = sd.subpopulation(data, subpop_mask) + + # Should emit warning about not enough positive-weight treated units + # for the cohort-5 cells, but still produce a result from cohort-3 + with warnings.catch_warnings(record=True) as caught: + warnings.simplefilter("always") + result = ContinuousDiD(n_bootstrap=0, num_knots=2).fit( + new_data, "outcome", "unit", "time", "first_treat", "dose", + survey_design=new_sd, + ) + weight_warnings = [ + w for w in caught + if "positive-weight" in str(w.message) + ] + assert len(weight_warnings) > 0, "Expected warning about insufficient positive-weight units" + # Result should still be finite (from the valid cohort-3 cells) + assert result is not None + assert np.isfinite(result.overall_att) + + def test_triple_diff_positive_weight_gate(self): + """TripleDifference falls back to weighted mean when n_eff < n_cols.""" + from diff_diff.triple_diff import TripleDifference + + np.random.seed(42) + n = 200 + d1 = np.repeat([0, 1], n // 2) + d2 = np.tile([0, 1], n // 2) + post = np.tile([0, 0, 1, 1], n // 4) + x1 = np.random.randn(n) + x2 = np.random.randn(n) + x3 = np.random.randn(n) + y = 1.0 + 0.5 * d1 + 0.3 * d2 + 2.0 * d1 * d2 * post + np.random.randn(n) * 0.5 + w = 1.0 + np.random.exponential(0.3, n) + + # Create a cell (d1=1, d2=1, post=1) where most obs have zero weight + # but each cell still has SOME positive weight for cell means. 
+ # With 3 covariates + intercept = 4 columns, n_eff=2 < 4 triggers fallback + cell_target = (d1 == 1) & (d2 == 1) & (post == 1) + cell_indices = np.where(cell_target)[0] + # Zero all but 2 obs in this cell + for idx in cell_indices[2:]: + w[idx] = 0.0 + + data = pd.DataFrame({ + "y": y, "d1": d1, "d2": d2, "post": post, + "weight": w, "x1": x1, "x2": x2, "x3": x3, + }) + + sd = SurveyDesign(weights="weight") + + # Should not crash — falls back to weighted mean for the thin cell + result = TripleDifference(estimation_method="reg").fit( + data, outcome="y", group="d1", partition="d2", + time="post", covariates=["x1", "x2", "x3"], survey_design=sd, + ) + assert result is not None + assert np.isfinite(result.att) + + def test_cs_replicate_df_metadata_consistency(self): + """CS survey_metadata.df_survey matches the d.f. used for inference.""" + from diff_diff import CallawaySantAnna + + data, rep_cols = TestEstimatorReplicateWeights._make_staggered_replicate_data() + n_rep = len(rep_cols) + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + result = CallawaySantAnna(estimation_method="reg", n_bootstrap=0).fit( + data, "outcome", "unit", "time", "first_treat", + survey_design=sd, + ) + sm = result.survey_metadata + assert sm is not None + assert sm.df_survey is not None + assert sm.df_survey == n_rep - 1 + + # summary() should include the df line + summary_str = result.summary() + assert "Survey d.f." 
in summary_str + + def test_triple_diff_replicate_df_metadata_consistency(self): + """TripleDifference survey_metadata.df_survey matches inference d.f.""" + from diff_diff import TripleDifference + + np.random.seed(42) + n = 200 + n_rep = 10 + d1 = np.repeat([0, 1], n // 2) + d2 = np.tile([0, 1], n // 2) + post = np.random.choice([0, 1], n) + y = 1.0 + 0.5 * d1 + 0.3 * d2 + 2.0 * d1 * d2 * post + np.random.randn(n) * 0.5 + w = 1.0 + np.random.exponential(0.3, n) + + data = pd.DataFrame({ + "y": y, "d1": d1, "d2": d2, "post": post, "weight": w, + }) + cluster_size = n // n_rep + rep_cols = [] + for r in range(n_rep): + w_r = w.copy() + start = r * cluster_size + end = min((r + 1) * cluster_size, n) + w_r[start:end] = 0.0 + w_r[w_r > 0] *= n_rep / (n_rep - 1) + col = f"rep_{r}" + data[col] = w_r + rep_cols.append(col) + + sd = SurveyDesign( + weights="weight", replicate_weights=rep_cols, + replicate_method="JK1", + ) + result = TripleDifference(estimation_method="reg").fit( + data, outcome="y", group="d1", partition="d2", + time="post", survey_design=sd, + ) + sm = result.survey_metadata + assert sm is not None + assert sm.df_survey is not None + assert sm.df_survey == n_rep - 1 + + d = result.to_dict() + assert d["df_survey"] == sm.df_survey
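The tests above hand-build delete-one-group (JK1) replicate weights in several fixtures: column r zeroes group r and upweights every remaining row by R/(R-1), and the replicate variance is the rescaled sum of squared deviations of replicate estimates. A minimal standalone sketch of that construction and the textbook (R-1)/R variance formula for a weighted mean — this is not the library's `compute_replicate_vcov` API, just the estimator the fixtures approximate:

```python
import numpy as np

def jk1_replicates(weights, n_groups):
    """Delete-one-group jackknife: column r zeroes group r and upweights
    every other row by R / (R - 1), matching the fixtures' construction."""
    n = len(weights)
    size = n // n_groups
    rep = np.repeat(np.asarray(weights, dtype=float)[:, None], n_groups, axis=1)
    for r in range(n_groups):
        start, end = r * size, min((r + 1) * size, n)
        dropped = np.zeros(n, dtype=bool)
        dropped[start:end] = True
        rep[dropped, r] = 0.0
        rep[~dropped, r] *= n_groups / (n_groups - 1)
    return rep

def jk1_variance(y, weights, rep):
    """Replicate variance of the weighted mean with the standard JK1
    scale (R - 1) / R, centered on the full-sample estimate (mse-style)."""
    R = rep.shape[1]
    theta_full = np.average(y, weights=weights)
    theta = np.array([np.average(y, weights=rep[:, r]) for r in range(R)])
    return (R - 1) / R * np.sum((theta - theta_full) ** 2)

rng = np.random.default_rng(0)
y = rng.normal(10.0, 1.0, 60)
w = np.ones(60)
rep = jk1_replicates(w, 6)
var = jk1_variance(y, w, rep)
```

With uniform weights each replicate column sums back to the full-sample weight total, which is why the fixtures' `w_r[~mask] *= n_rep / (n_rep - 1)` rescaling preserves the population size.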
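The `combined_weights=False` df test computes the survey degrees of freedom as the pivoted-QR numerical rank of the analysis-weight matrix minus one, mirroring the `rank(replicate_weights) - 1` rule (the `survey::degf()` analog noted in the TODO table). A standalone sketch of that rule, using the same relative tolerance the test uses inline; the function name is illustrative, not the library's:

```python
import numpy as np
from scipy.linalg import qr

def replicate_df(rep_weights, tol_factor=1e-5):
    """Survey d.f. = numerical rank of the replicate-weight matrix minus one,
    or None when rank <= 1 (degenerate design -> NaN inference downstream).
    Rank via pivoted QR with a relative tolerance on |diag(R)|."""
    _, R_mat, _ = qr(np.asarray(rep_weights, dtype=float),
                     pivoting=True, mode='economic')
    diag_abs = np.abs(np.diag(R_mat))
    if diag_abs.size == 0 or diag_abs.max() == 0.0:
        return None
    rank = int(np.sum(diag_abs > tol_factor * diag_abs.max()))
    return rank - 1 if rank > 1 else None

rng = np.random.default_rng(0)
full_rank = rng.standard_normal((50, 5))     # generic matrix: rank 5, df 4
col = 1.0 + 0.1 * rng.standard_normal(50)
rank_one = np.column_stack([col, col, col])  # identical columns: rank 1, df None
```

The `None` return is the guard value that `test_rank_one_replicate_nan_inference` exercises: a rank-1 replicate design carries no usable variance information, so inference falls through to NaN.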