33 commits
29eae1e
Add survey Phase 6: replicate weights, DEFF diagnostics, subpopulatio…
igerber Mar 26, 2026
7bd49f2
Address AI review P1/P2 findings for replicate weights
igerber Mar 26, 2026
0a11b76
Fix panel→unit replicate propagation, weight normalization, validation
igerber Mar 27, 2026
927f6cd
Fix replicate IF variance contract, add zero-weight guards, numerical…
igerber Mar 27, 2026
da98004
Add replicate weights, DEFF, subpopulation to REGISTRY.md
igerber Mar 27, 2026
94b6153
Fix solve_logit effective-sample validation, np.divide warning
igerber Mar 27, 2026
10c733c
Add effective-sample rank check in solve_logit for zero-weight designs
igerber Mar 27, 2026
2cd91b9
Fix TripleDiff replicate IF scale, reject replicate+bootstrap, harden…
igerber Mar 27, 2026
0af9e55
Fix compute_deff all-NaN branch, parameterize TripleDiff replicate test
igerber Mar 27, 2026
a3ec51a
Reject replicate designs for SunAbraham, validate subpopulation masks
igerber Mar 27, 2026
b063f5e
Fix replicate IF variance score scaling for EfficientDiD, TripleDiff,…
igerber Mar 27, 2026
2c2ef14
Fix subpopulation mask validation, EfficientDiD bootstrap guard, logi…
igerber Mar 27, 2026
657d918
Guard zero-weight cells in TripleDiff and ContinuousDiD
igerber Mar 27, 2026
50d8314
Guard <2 valid replicates, drop logit columns, filter NaN cells
igerber Mar 27, 2026
14f2110
Reject CS replicate+bootstrap, guard Bacon zero-weight cells
igerber Mar 27, 2026
200745c
Address CI review: fix logit coef expansion, reject string masks
igerber Mar 27, 2026
a8b3e4c
Address CI review round 2: reject non-binary masks, fix logit test, d…
igerber Mar 27, 2026
8a757b1
Propagate replicate metadata in CS unit-level survey collapse
igerber Mar 27, 2026
bc9a94d
Document stale replicate df as Note in REGISTRY.md
igerber Mar 27, 2026
91d778f
Fix replicate df: update n_replicates to valid count after drops
igerber Mar 27, 2026
84809cd
Return n_valid from replicate variance functions, fix df properly
igerber Mar 27, 2026
bb8edfc
Clear _replicate_df on refit, add CS zero-mass guards, document IF df
igerber Mar 27, 2026
5434007
Fix CS zero-mass return type, vectorized guard, propagate IF replicat…
igerber Mar 27, 2026
5930286
Fix CS df key name, re-read df after aggregation, propagate Continuou…
igerber Mar 27, 2026
86f09f2
Propagate replicate df through EfficientDiD and TripleDifference IF p…
igerber Mar 27, 2026
978e866
Reset _replicate_n_valid at start of TripleDifference.fit()
igerber Mar 27, 2026
088bdc2
Expand replicate weight API: combined_weights, scale, rscales, mse
igerber Mar 27, 2026
aabb404
Propagate replicate design params through subpopulation, subset_to_un…
igerber Mar 27, 2026
0bb5953
Use original R for replicate variance scaling, not n_valid
igerber Mar 27, 2026
cb0880a
Fix IF replicate weight ratio: skip full-sample normalization for rep…
igerber Mar 27, 2026
aefa7f3
Validate combined-weight IF ratio, zero aweight scores for zero-weigh…
igerber Mar 27, 2026
3efa276
Rank-based replicate df, mse=False default, update docs
igerber Mar 28, 2026
e3c8f0e
Document replicate scaling convention: original R for scale, n_valid …
igerber Mar 28, 2026
TODO.md: 9 changes (5 additions, 4 deletions)
@@ -52,15 +52,16 @@ Deferred items from PR reviews that were not addressed before merge.
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) |
| EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium |
| Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
| Replicate-weight survey df — **Resolved**. `n_replicates` updated to valid count after invalid replicates are dropped, so `df_survey = n_valid - 1`. | `survey.py` | #238 | Resolved |
| CallawaySantAnna survey: strata/PSU/FPC — **Resolved**. Aggregated SEs (overall, event study, group) use `compute_survey_if_variance()`. Bootstrap uses PSU-level multiplier weights. | `staggered.py` | #237 | Resolved |
| CallawaySantAnna survey + covariates + IPW/DR: DRDID panel nuisance-estimation IF corrections not implemented. Currently gated with NotImplementedError. Regression method with covariates works (has WLS nuisance IF correction). | `staggered.py` | #233 | Medium |
| SyntheticDiD/TROP survey: strata/PSU/FPC — **Resolved**. Rao-Wu rescaled bootstrap implemented for both. TROP uses cross-classified pseudo-strata. Rust TROP remains pweight-only (Python fallback for full design). | `synthetic_did.py`, `trop.py` | — | Resolved |
| EfficientDiD hausman_pretest() clustered covariance uses stale `n_cl` after filtering non-finite EIF rows — should recompute effective cluster count and remap indices after `row_finite` filtering | `efficient_did.py` | #230 | Medium |
| EfficientDiD hausman_pretest() clustered covariance stale `n_cl` — **Resolved**. Recompute `n_cl` and remap indices after `row_finite` filtering via `np.unique(return_inverse=True)`. | `efficient_did.py` | #230 | Resolved |
| EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
| TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low |
| ContinuousDiD event-study aggregation does not filter by `anticipation` — uses all (g,t) cells instead of anticipation-filtered subset; pre-existing in both survey and non-survey paths | `continuous_did.py` | #226 | Medium |
| ContinuousDiD event-study aggregation anticipation filter — **Resolved**. `_aggregate_event_study()` now filters `e < -anticipation` when `anticipation > 0`, matching CallawaySantAnna behavior. Bootstrap paths also filtered. | `continuous_did.py` | #226 | Resolved |
| Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
| Duplicated survey metadata summary formatting across 6 results classes — extract shared `_format_survey_metadata(sm, width)` helper to reduce maintenance burden as more estimators gain survey support in Phases 4-5 | `results.py`, `stacked_did_results.py`, `sun_abraham.py`, `bacon.py`, `triple_diff.py`, `continuous_did_results.py`, `efficient_did_results.py` | #226 | Low |
| Survey metadata formatting dedup — **Resolved**. Extracted `_format_survey_block()` helper in `results.py`, replaced 13 occurrences across 11 files. | `results.py` + 10 results files | — | Resolved |
| TROP: `fit()` and `_fit_global()` share ~150 lines of near-identical data setup (panel pivoting, absorbing-state validation, first-treatment detection, effective rank, NaN warnings). Both bootstrap methods also duplicate the stratified resampling loop. Extract shared helpers to eliminate cross-file sync risk. | `trop.py`, `trop_global.py`, `trop_local.py` | — | Low |

#### Performance
@@ -78,7 +79,7 @@ Deferred items from PR reviews that were not addressed before merge.
| CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
| ~376 `duplicate object description` Sphinx warnings — caused by autodoc `:members:` on dataclass attributes within manual API pages (not from autosummary stubs); fix requires restructuring `docs/api/*.rst` pages to avoid documenting the same attribute via both `:members:` and inline `autosummary` tables | `docs/api/*.rst` | — | Low |
| Plotly renderers silently ignore styling kwargs (marker, markersize, linewidth, capsize, ci_linewidth) that the matplotlib backend honors; thread them through or reject when `backend="plotly"` | `visualization/_event_study.py`, `_diagnostic.py`, `_power.py` | #222 | Medium |
| Survey bootstrap test coverage: add FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests | `tests/test_survey_phase*.py` | #237 | Medium |
| Survey bootstrap test coverage — **Resolved**. Added FPC census zero-variance, single-PSU NaN, full-design bootstrap for CS/ContinuousDiD/EfficientDiD, and TROP Rao-Wu vs block bootstrap equivalence tests. | `tests/test_survey_phase*.py` | | Resolved |

---

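The resolved `hausman_pretest()` entry in TODO.md describes recomputing the cluster count and remapping indices after `row_finite` filtering via `np.unique(return_inverse=True)`. Below is a minimal sketch of that remapping. It assumes clusters arrive as a 1-D label array aligned with the EIF rows. `remap_clusters` and `cluster_score_sums` are hypothetical helpers, not functions from `efficient_did.py`.

```python
import numpy as np


def remap_clusters(cluster_ids, row_finite):
    """Hypothetical helper: drop non-finite rows, then recount and recode clusters."""
    kept = np.asarray(cluster_ids)[np.asarray(row_finite, dtype=bool)]
    # return_inverse maps each kept row to the position of its cluster among the
    # surviving unique labels, so codes are contiguous in 0..n_cl-1.
    uniques, codes = np.unique(kept, return_inverse=True)
    n_cl = len(uniques)
    return n_cl, codes


def cluster_score_sums(scores, codes, n_cl):
    """Sum per-row scores within each surviving cluster (one entry per cluster)."""
    return np.bincount(codes, weights=scores, minlength=n_cl)
```

Suppose the labels are `[10, 10, 20, 30, 30]` and the third row is filtered out. Cluster 20 disappears, `n_cl` drops from 3 to 2, and the surviving rows get codes `[0, 0, 1, 1]`. Keeping the stale count would carry an empty cluster into the clustered covariance.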
diff_diff/__init__.py: 4 changes (4 additions, 0 deletions)
@@ -95,8 +95,10 @@
SyntheticDiDResults,
)
from diff_diff.survey import (
DEFFDiagnostics,
SurveyDesign,
SurveyMetadata,
compute_deff_diagnostics,
)
from diff_diff.staggered import (
CallawaySantAnna,
@@ -327,6 +329,8 @@
# Survey support
"SurveyDesign",
"SurveyMetadata",
"DEFFDiagnostics",
"compute_deff_diagnostics",
# Rust backend
"HAS_RUST_BACKEND",
# Linear algebra helpers
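The two new top-level exports, `DEFFDiagnostics` and `compute_deff_diagnostics`, make design-effect diagnostics public. The sketch below computes the textbook quantities behind a DEFF report:

- the per-coefficient ratio of a design-based variance to a simple-random-sampling variance;
- the implied effective sample size n / DEFF;
- Kish's weighting-only design effect.

`DeffSketch` and `deff_sketch` are hypothetical stand-ins. This diff does not show the fields or signature of the real `DEFFDiagnostics` and `compute_deff_diagnostics`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DeffSketch:
    """Hypothetical container; not the library's DEFFDiagnostics."""
    deff: np.ndarray         # per-coefficient design effect
    effective_n: np.ndarray  # n / deff for each coefficient
    kish_deff: float         # weighting-only design effect (Kish)


def deff_sketch(var_design, var_srs, weights):
    var_design = np.asarray(var_design, dtype=float)
    var_srs = np.asarray(var_srs, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = w.size
    # Design-based variance over SRS variance; NaN where the SRS variance is not positive.
    with np.errstate(divide="ignore", invalid="ignore"):
        deff = np.where(var_srs > 0, var_design / var_srs, np.nan)
        eff_n = n / deff
    # Kish's approximation n * sum(w^2) / (sum w)^2 equals 1 for equal weights.
    kish = n * np.sum(w**2) / np.sum(w) ** 2
    return DeffSketch(deff=deff, effective_n=eff_n, kish_deff=float(kish))
```

Under this definition, a coefficient whose design variance is twice its SRS variance behaves as if the sample were half its nominal size.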
diff_diff/bacon.py: 36 changes (18 additions, 18 deletions)
@@ -17,6 +17,7 @@
import numpy as np
import pandas as pd

from diff_diff.results import _format_survey_block
from diff_diff.utils import within_transform as _within_transform_util


@@ -144,23 +145,7 @@ def summary(self) -> str:
# Add survey design info
if self.survey_metadata is not None:
sm = self.survey_metadata
lines.extend(
[
"-" * 85,
"Survey Design".center(85),
"-" * 85,
f"{'Weight type:':<35} {sm.weight_type:>10}",
]
)
if sm.n_strata is not None:
lines.append(f"{'Strata:':<35} {sm.n_strata:>10}")
if sm.n_psu is not None:
lines.append(f"{'PSU/Cluster:':<35} {sm.n_psu:>10}")
lines.append(f"{'Effective sample size:':<35} {sm.effective_n:>10.1f}")
lines.append(f"{'Design effect (DEFF):':<35} {sm.design_effect:>10.2f}")
if sm.df_survey is not None:
lines.append(f"{'Survey d.f.:':<35} {sm.df_survey:>10}")
lines.extend(["-" * 85, ""])
lines.extend(_format_survey_block(sm, 85))

lines.extend(
[
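This hunk replaces an inline survey-design block in `summary()` with a single `_format_survey_block(sm, 85)` call. The sketch below reassembles that inline block as a standalone function that takes the width as a parameter, reproducing its format strings. `MetadataStub` is a hypothetical stand-in for `SurveyMetadata` that carries only the attributes the inline code read. The real helper may print additional rows, such as replicate-design details, and this sketch does not attempt to guess them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetadataStub:
    """Hypothetical stand-in for SurveyMetadata with only the fields used below."""
    weight_type: str
    effective_n: float
    design_effect: float
    n_strata: Optional[int] = None
    n_psu: Optional[int] = None
    df_survey: Optional[int] = None


def format_survey_block(sm, width: int) -> List[str]:
    """Return the survey-design summary lines, rebuilt from the inline code."""
    rule = "-" * width
    lines = [rule, "Survey Design".center(width), rule,
             f"{'Weight type:':<35} {sm.weight_type:>10}"]
    # Optional rows are emitted only when the design defines them.
    if sm.n_strata is not None:
        lines.append(f"{'Strata:':<35} {sm.n_strata:>10}")
    if sm.n_psu is not None:
        lines.append(f"{'PSU/Cluster:':<35} {sm.n_psu:>10}")
    lines.append(f"{'Effective sample size:':<35} {sm.effective_n:>10.1f}")
    lines.append(f"{'Design effect (DEFF):':<35} {sm.design_effect:>10.2f}")
    if sm.df_survey is not None:
        lines.append(f"{'Survey d.f.:':<35} {sm.df_survey:>10}")
    lines.extend([rule, ""])
    return lines
```

Passing `width` explicitly lets results classes with different summary widths share one formatter, which is the deduplication the TODO.md entry describes.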
@@ -849,13 +834,21 @@ def _compute_treated_vs_never(
never_post_mask = never_mask & df[time].isin(post_periods)

# Guard against empty cells (unbalanced/filtered panels)
# Also check positive weight mass for survey/subpopulation designs
if not (
np.any(treated_pre_mask)
and np.any(treated_post_mask)
and np.any(never_pre_mask)
and np.any(never_post_mask)
):
return None
if (
np.sum(w[treated_pre_mask]) <= 0
or np.sum(w[treated_post_mask]) <= 0
or np.sum(w[never_pre_mask]) <= 0
or np.sum(w[never_post_mask]) <= 0
):
return None

treated_pre = np.average(y[treated_pre_mask], weights=w[treated_pre_mask])
treated_post = np.average(y[treated_post_mask], weights=w[treated_post_mask])
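The new guard in `_compute_treated_vs_never` matters because `np.average` raises `ZeroDivisionError` when the weights sum to zero. A cell can hit this while still being non-empty if its weights have been zeroed. Below is a minimal sketch of that interaction. It assumes a subpopulation is applied by zeroing weights outside the domain rather than dropping rows, and that masks must be boolean or 0/1, following commits 200745c and a8b3e4c. `subpopulation_weights` and `did_2x2` are hypothetical names, not diff_diff functions.

```python
import numpy as np


def subpopulation_weights(w, mask):
    """Hypothetical: zero weights outside the domain; accept only boolean or 0/1 masks."""
    mask = np.asarray(mask)
    if mask.dtype.kind == "b":
        keep = mask
    elif mask.dtype.kind in "iuf" and np.isin(mask, [0, 1]).all():
        keep = mask.astype(bool)
    else:
        # Strings, NaN, or values other than 0/1 are rejected.
        raise ValueError("subpopulation mask must be boolean or 0/1")
    return np.where(keep, np.asarray(w, dtype=float), 0.0)


def did_2x2(y, w, treated, post):
    """Weighted 2x2 DiD; returns None if any cell is empty or has no weight mass."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)
    cells = [treated & ~post, treated & post, ~treated & ~post, ~treated & post]
    # A non-empty cell can still have zero mass after subpopulation zeroing,
    # and np.average would raise ZeroDivisionError on it.
    if any(not c.any() or w[c].sum() <= 0 for c in cells):
        return None
    t_pre, t_post, c_pre, c_post = (np.average(y[c], weights=w[c]) for c in cells)
    return (t_post - t_pre) - (c_post - c_pre)
```

Restricting the domain so that the only control-post row gets weight zero turns an otherwise valid comparison into `None` instead of an exception. This is the behavior the added `np.sum(w[...]) <= 0` checks provide.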
@@ -966,14 +959,21 @@ def _compute_timing_comparison(
control_pre_mask = control_mask & df[time].isin(pre_periods)
control_post_mask = control_mask & df[time].isin(post_periods)

# Skip if any cell is empty
# Skip if any cell is empty or has zero effective weight
if (
treated_pre_mask.sum() == 0
or treated_post_mask.sum() == 0
or control_pre_mask.sum() == 0
or control_post_mask.sum() == 0
):
return None
if (
np.sum(w[treated_pre_mask]) <= 0
or np.sum(w[treated_post_mask]) <= 0
or np.sum(w[control_pre_mask]) <= 0
or np.sum(w[control_post_mask]) <= 0
):
return None

treated_pre = np.average(y[treated_pre_mask], weights=w[treated_pre_mask])
treated_post = np.average(y[treated_post_mask], weights=w[treated_post_mask])
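`_compute_timing_comparison` now runs the same two-stage check as `_compute_treated_vs_never`: a row-count test followed by a weight-mass test over four cells. One possible consolidation is a single helper that accepts any number of masks, including the pandas boolean Series that `df[time].isin(...)` produces. `cells_have_mass` is a hypothetical name, and the sketch assumes non-negative weights.

```python
import numpy as np
import pandas as pd


def cells_have_mass(w, *masks):
    """Hypothetical: True if every mask selects rows with positive total weight."""
    w = np.asarray(w, dtype=float)
    for m in masks:
        m = np.asarray(m, dtype=bool)  # works for numpy arrays and pandas Series
        # An empty selection sums to 0.0, so this also rejects empty cells.
        if w[m].sum() <= 0:
            return False
    return True


# Usage: masks built the way the diff builds them, from pandas Series.
panel = pd.DataFrame({"unit": [1, 1, 2, 2], "time": [1, 2, 1, 2]})
pre, post = panel["time"].isin([1]), panel["time"].isin([2])
treated, control = panel["unit"].eq(1), panel["unit"].eq(2)
subpop_w = np.array([1.0, 1.0, 0.0, 0.0])  # control unit zeroed out
ok = cells_have_mass(subpop_w, treated & pre, treated & post, control & pre, control & post)
```

With non-negative weights, the mass test alone already rejects empty cells, so the separate count check is redundant. Keeping both, as the diff does, arguably makes the intent easier to read.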