Survey Phase 7: CS IPW/DR covariates, repeated cross-sections, HonestDiD survey variance#240
Survey Phase 7: CS IPW/DR covariates, repeated cross-sections, HonestDiD survey variance#240
Conversation
…DiD survey variance Phase 7a: Remove NotImplementedError gate for IPW/DR + covariates + survey. Add DRDID panel nuisance IF corrections (PS + OR) for both survey and non-survey DR paths. Extract _safe_inv helper for matrix inversions. Phase 7d: Thread survey df through HonestDiD for t-distribution critical values. Compute full event-study VCV from influence function vectors. Add event_study_vcov to CallawaySantAnnaResults. Phase 7b: Add panel=False for repeated cross-section support in CallawaySantAnna. New _precompute_structures_rc, _compute_att_gt_rc, and three RC estimation methods (reg, ipw, dr) with covariates and survey weights. Canonical index abstraction in aggregation/bootstrap. RCS data generator in generate_staggered_data(panel=False). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR ReviewOverall assessment
Executive summary
Methodology Cross-check basis:
Code Quality No additional findings beyond the methodology issues above. Performance No findings. Maintainability
Tech Debt No separate findings. The new Security No findings. Documentation/Tests
Path to Approval
Static review only; I was not able to do runtime spot-checks in this environment because the Python runtime here is missing required dependencies such as |
… VCV, panel on results Fix 5 findings from PR #240 CI review: - Add cross-sectional nuisance IF corrections (PS + OR) to _ipw_estimation_rc and _doubly_robust_rc, matching panel path methodology - Use fixed full-sample cohort masses for unweighted RCS aggregation weights (consistency with WIF group-share denominator) - Guard replicate-weight designs from full event-study VCV (diagonal fallback) - Add panel field to CallawaySantAnnaResults, fix summary labels for RCS - Add panel to class docstring, replicate VCV test, RCS IF correction test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: PR ReviewOverall assessment
Executive Summary
Methodology Affected methods: Callaway-Sant'Anna repeated cross sections (
Code Quality No findings beyond the methodology issues above. Performance No findings. Maintainability No findings. Tech Debt No separate findings. Neither remaining P1 issue is tracked in TODO.md:L48-L60, so Security No findings. Documentation/Tests
Path to Approval
Static review only: I could not execute the test suite here because this environment lacks |
…equal-count tests - Use separate normalizer_t/normalizer_s in _doubly_robust_rc() IF corrections (post vs base period treated denominators must match the DR estimator) - Precompute rcs_cohort_masses in _precompute_structures_rc() and return cohort mass as n_treated from _compute_att_gt_rc() — fixes all downstream consumers (aggregation, bootstrap, balance_e) at the source instead of per-consumer patches - Remove now-unnecessary rcs_cohort_counts blocks from aggregation - Add unequal cohort count test fixture and regression tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: PR ReviewOverall assessment Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…ulas
_outcome_regression_rc: Pool all treated obs for OR correction term
(was: separate per-period averages). Period-specific treated means for Y.
Matches Sant'Anna & Zhao (2020) Eq 2.2 / R reg_did_rc exactly.
_doubly_robust_rc: Fit 4 OLS models (control+treated, pre+post) for
locally efficient DR estimator (was: 2 control-only). Implements tau_1
(AIPW) + tau_2 (local efficiency adjustment) with full 11-component IF.
Matches Sant'Anna & Zhao (2020) Eq 3.3+3.4 / R drdid_rc exactly.
Add agg_weight field to group_time_effects for RCS aggregation weight
(cohort mass), separate from n_treated (per-cell display count).
Aggregation uses data.get("agg_weight", data["n_treated"]) for
backward compatibility with panel data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability
Tech Debt No separate findings. The P1 bootstrap-weight issue above is not currently tracked in TODO.md:L48-L60 or TODO.md:L167-L170, so it remains unmitigated. Security No findings. Documentation/Tests
Path to Approval
|
- Bootstrap overall/event-study reaggregation now uses agg_weight (fixed cohort mass) for panel=False, matching the analytical aggregation path - Reset self._event_study_vcov = None at start of fit() to prevent stale VCV from prior fit leaking into reused estimator objects Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Static review only: I could not execute the changed tests in this environment because the default Python interpreter here does not have the project dependencies available. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…dy SEs Prevents HonestDiD from mixing analytical IF-based VCV with bootstrap SEs on bootstrap-fit CallawaySantAnna results. When n_bootstrap>0, the event_study_vcov is set to None so HonestDiD falls back to diagonal from the bootstrap SEs (consistent variance path). Add regression test: bootstrap CS → HonestDiD asserts vcov is None. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Static review only: I could not execute the added tests here because Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…s sum not mean - _outcome_regression_rc: M1 denominator changed from sum_w_D to n_all (matching R colMeans convention); inf_cont_2 / sum_w_D then gives correct single normalization by mean_w_D * n_all = sum_w_D - _ipw_estimation_rc: PS M2 uses np.sum/n_all instead of np.mean (which divided by n_ct/n_cs instead of n_all, under-scaling the correction) - _doubly_robust_rc: PS M2 already correct (np.sum/n_all), no change Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review plus source cross-check only. I could not run Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
|
/ai-review |
…convergence test REGISTRY.md: Document that RCS IFs use phi=psi/n convention (SE = sqrt(sum(phi^2))), algebraically equivalent to R's sd(psi)/sqrt(n). The 1/n_all denominator in gradient terms is the colMeans -> phi conversion, not extra shrinkage. Add test proving correctness: analytical SE within 20% of bootstrap SE (499 iters) for RCS reg with covariates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cfcc441 to
c2f8fdc
Compare
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only. I could not run the test suite here because this environment does not have Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Restructure _outcome_regression_rc, _ipw_estimation_rc, _doubly_robust_rc to compute leading IF terms in R's unnormalized psi convention (using mean_w_* = sum_w_*/n_all normalizers matching R's mean()), then convert to library phi = psi/n_all at the boundary. Makes DRDID correspondence explicit with R variable name comments. Fix HonestDiD event_study_vcov subsetting: when filtering NaN-SE event times, subset the VCV matrix to match the surviving rel_times (was using the full unfiltered matrix, causing dimension mismatch on interior drops). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only. I could not run the suite here because this environment is missing Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt No findings. The Security No findings. Documentation/Tests
Path to Approval
|
…2 gradients Resolve TODO.md merge conflict (keep our resolved items from Phase 7). Panel IPW/DR PS nuisance gradients (M2, M2_dr): change np.mean() over control rows to np.sum() / (n_t + n_c), matching R's colMeans() over ALL n observations (zero for treated). This makes the panel paths consistent with the RC paths' R-convention refactoring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only. I could not run the suite here because this environment is missing Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt No separate findings. The Security No findings. Documentation/Tests
Path to Approval
|
…s; fix VCV index subsetting Panel IPW/DR PS corrections: restructure to match R's std_ipw_did_panel / drdid_panel convention: H = X'WX/n, asy_lin_rep = score @ solve(H) / n, M2 = colMeans(). Algebraically equivalent but mirrors R source literally. HonestDiD VCV subsetting: store event_study_vcov_index (the exact event-time ordering matching VCV columns) so subsetting works correctly even when universal base period injects a reference row into event_study_effects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only. I could not run the suite here because this environment is missing Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
VCV index: Build event_study_vcov_index from event times that actually contributed psi vectors (skipping NaN-only periods), not from all sorted_periods. Fixes misalignment when interior event times drop out. Stationarity: Document "stationary repeated cross-sections" in panel parameter docstring. Emit UserWarning on panel=False noting the stationarity assumption is not data-checkable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only. I could not execute the added tests here because this environment is missing both Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…RDID Panel DR normalizes control augmentation by treated mass (sw_t_sum/n_t) rather than control IPW mass (sum(w_cont)) as in DRDID::drdid_panel. Both are asymptotically equivalent; finite-sample difference documented as intentional deviation in REGISTRY.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only. I could not execute the affected tests here because this environment is missing both Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt No findings. Security No findings. Documentation/Tests
Path to Approval
|
Add UserWarning when CallawaySantAnna results with base_period != "universal" are passed to HonestDiD (R requires universal; we warn). Document as deviation in REGISTRY.md. Update Phase 7d HonestDiD tests to use base_period="universal" for methodologically valid bounds. Fix VCV diagonal test index for universal base reference period row. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only. I could not run the changed tests here because this environment is missing Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…me guard Panel IPW/DR M2 gradients: change np.mean(control_slice) to np.sum(control_slice) / n_all_panel, matching R's colMeans over ALL n observations (zero for treated rows). Previous np.mean divided by n_c instead of n_all. HonestDiD: warn when retained CS event-study horizons are not consecutive (interior gaps change smoothness/RM restriction geometry). R requires a consecutive grid. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
1 similar comment
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only. I could not execute the Python suite here because the repo dependencies are unavailable in this environment ( Executive Summary
Methodology
Code Quality
Performance No findings. Maintainability No findings. Tech Debt No findings. The blocking items above are Security No findings. Documentation/Tests
Path to Approval
|
…strap for zero-mass cells HonestDiD: validate pre and post blocks separately for consecutive horizons, allowing the expected reference-period gap between them. Only warns on true interior gaps within a block (e.g., missing -2 in [-3,-1,0,1]). Bootstrap: filter gt_pairs to only those with influence_func_info, skipping zero-mass (g,t) cells that have NaN ATT but no IF record. Prevents KeyError when RCS survey weights zero out a cell. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only. I could not execute the changed suites here because Executive Summary
Methodology
Code Quality No unmitigated findings. Performance No findings. Maintainability No findings. Tech Debt No new deferrable item stood out. The remaining Security No findings. Documentation/Tests
Path to Approval
|
…arning) Match R's honest_did.AGGTEobj behavior: refuse to construct bounds when retained event-study horizons have interior gaps within pre or post blocks. The single gap for the omitted reference period is still allowed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Static review only. I could not run the changed suites here because Executive Summary
Methodology
Code Quality No unmitigated findings. Performance No findings. Maintainability No unmitigated findings. Tech Debt No new deferrable item stood out. The remaining HonestDiD issue is a methodology blocker and is not TODO-mitigable under the stated review rules. Security No findings. Documentation/Tests
Path to Approval
|
…n-block Check that post_times[0] - pre_times[-1] == 2 (exactly one gap for the omitted reference period). Catches missing e=0 or missing boundary horizons that the within-block check missed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ⛔ Blocker — highest unmitigated severity: Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt No new deferrable item stood out. The remaining issues are inference/methodology defects and are not TODO-mitigable under the review rubric. Security No findings. Documentation/Tests
Path to Approval
|
P0 fix: When survey_metadata has replicate_method but df_survey=None (rank-deficient replicate design), set df_survey=0 sentinel so _get_critical_value returns NaN. Prevents finite HonestDiD bounds when CS inference is NaN. P1 fix: Infer omitted reference period from n_groups=0 entry in event_study_effects (handles anticipation>0 where ref=-1-anticipation). Split pre/post relative to reference, not hardcoded at t<0/t>=0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
/ai-review |
Summary
NotImplementedErrorgate for CallawaySantAnna IPW/DR + covariates + survey. Implement DRDID panel nuisance IF corrections (propensity score + outcome regression) for both survey-weighted and non-survey DR paths (Sant'Anna & Zhao 2020, Theorem 3.1). Extract_safe_inv()helper for matrix inversions.event_study_vcovfield toCallawaySantAnnaResultsandsurvey_metadata/df_surveytoHonestDiDResults.panel=Falsefor repeated cross-section support in CallawaySantAnna. New_precompute_structures_rc(),_compute_att_gt_rc(), and three RC estimation methods (_outcome_regression_rc,_ipw_estimation_rc,_doubly_robust_rc) with covariates and survey weights. Canonical index abstraction in aggregation/bootstrap mixins. RCS data generator viagenerate_staggered_data(panel=False).Methodology references
Validation
tests/test_survey_phase7a.py(22 tests): smoke, scale invariance, uniform-weight equivalence, IF correction, aggregation, bootstrap, edge casestests/test_staggered_rc.py(23 tests): all methods, covariates, survey, aggregation, bootstrap, control groups, base periods, data generator, edge casestests/test_honest_did.py(+4 tests): survey df extraction, VCV computation, bounds widening, no-survey baselinetests/test_survey_phase4.py: 2 negative tests converted to positive assertionsSecurity / privacy
Generated with Claude Code