Update survey documentation: compatibility matrix, roadmap, deferred work #263
…work
Add survey compatibility matrix to choosing_estimator.rst (Phase 8f), fix 11 stale entries in the tutorial table and replace with cross-reference, mark Phase 8a-8e as shipped in survey-roadmap.md, consolidate all remaining NotImplementedError paths into a single deferred work section, add SDR to replicate method lists, and update ROADMAP.md version/status entries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. This PR is documentation-only, and the estimator implementations I spot-checked remain consistent with the methodology registry. I did find a few P2 documentation inconsistencies that are worth fixing if these docs are meant to be the canonical survey-support reference.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
…imators
- Revert efficient DRDID nuisance IF for reg+covariates to deferred status (code and REGISTRY.md still use conservative plug-in IF)
- Update phase summary table Notes to reflect resolved bootstrap+survey paths (SA, ContinuousDiD, EfficientDiD, ImputationDiD, TwoStageDiD, CS)
- Add SunAbraham, ImputationDiD, TwoStageDiD to bootstrap+replicate mutual exclusion table in consolidated deferred section
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. This re-review resolves the prior methodology/documentation mismatch on
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Some bootstrap+replicate exclusions raise ValueError (not NotImplementedError). Update wording to "raise an error" to accurately reflect the runtime contract.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
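For readers wondering why the wording change matters to calling code: NotImplementedError is a subclass of RuntimeError, not of ValueError, so a handler written for one will not catch the other. A minimal sketch, assuming two hypothetical stand-in functions (they are not part of diff_diff) that mimic the two failure modes:

```python
# Hypothetical stand-ins for illustration only; not functions from diff_diff.
def fit_unsupported_feature():
    # Mimics a path that is deferred and not yet implemented.
    raise NotImplementedError("replicate weights are not supported by this estimator")


def fit_conflicting_options():
    # Mimics a path that rejects a mutually exclusive option combination.
    raise ValueError("bootstrap and replicate weights cannot be combined")


def describe_failure(fit):
    """Return a short label for how a fit call failed, or 'ok' if it ran."""
    try:
        fit()
    except (NotImplementedError, ValueError) as exc:
        # Both types must be listed: neither is a subclass of the other.
        return f"{type(exc).__name__}: {exc}"
    return "ok"


print(describe_failure(fit_unsupported_feature))
print(describe_failure(fit_conflicting_options))
```

Documentation that says "raise an error" stays accurate for either branch, which is the point of this commit.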
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. The prior re-review finding on exception-type wording is resolved; the remaining issues are documentation consistency/accuracy only.
Executive Summary
Methodology
Code Quality
No findings. Documentation-only PR.
Performance
No findings. No runtime path changed.
Maintainability
No findings in executable code.
Tech Debt
Security
No findings. I did not see secrets, unsafe executable content, or other security-relevant changes.
Documentation/Tests
…ording
- Update REGISTRY.md replicate-weight support matrix: CS now supports covariates with replicate weights (IF-based path is covariate-agnostic, shipped in Phase 7a)
- Qualify ROADMAP.md: "replicate weights supported for 12 of 15" instead of "across all estimators"
- Soften consolidated deferred section from "single source of truth" to "summary of major remaining limitations" with TODO.md cross-reference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. The prior roadmap-coverage issue is fixed, and the prior CallawaySantAnna support-boundary mismatch is largely resolved. One P2 documentation inconsistency remains in the Methodology Registry, plus one minor stale runtime-message mismatch.
Executive Summary
Methodology
No other methodology findings. This PR is documentation-only; no estimator implementation or variance logic changed.
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings beyond the documentation drift noted below.
Security
No findings.
Documentation/Tests
- Replace LinearRegression (internal helper) with StaggeredTripleDifference (public estimator) in REGISTRY.md replicate-weight support matrix
- Update wild bootstrap + survey error message to remove stale "planned Phase 5 support" reference
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. The prior registry-support mismatch is fixed. Remaining issues are limited to documentation consistency and one user-facing runtime-message mismatch.
Executive Summary
Methodology
No findings. Cross-checking the changed registry note against the current compatibility matrix shows the previous public-contract mismatch is resolved: docs/methodology/REGISTRY.md:L2310-L2325, docs/choosing_estimator.rst:L596-L670.
Code Quality
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
Security
No findings.
Documentation/Tests
… msg
- ROADMAP.md: restructure to say "survey-aware inference across all 15 estimators; replicate weights supported for 12 of 15"
- survey-roadmap.md Phase 6: add SDR to replicate method list
- survey.py: make wild bootstrap error message generic (not all estimators expose n_bootstrap)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
No findings. This PR does not change estimator math, SE/VCOV computation, weighting, assumptions, or defaults, and the updated replicate-weight documentation is now aligned with the methodology registry (REGISTRY:L2310-L2325, choosing_estimator.rst:L596-L670).
Code Quality
No findings. The changed runtime message in survey.py:L1087-L1090 is consistent with the current API and removes the stale “planned Phase 5” wording.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings. The new cross-reference in
Security
No findings.
Documentation/Tests
…otstrap
- ROADMAP.md: distinguish survey weights (all 15) from design-based variance (varies by estimator), carve out BaconDecomposition
- survey-roadmap.md: header says "Phases 1-8f implemented" (8g partial) instead of "All phases implemented"
- Deferred work: scope wild bootstrap row to DiD/TWFE/MultiPeriod (the estimators that expose inference='wild_bootstrap')
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Highest unmitigated issue is P2 documentation accuracy. I did not find any P0/P1 methodology, inference, or correctness defects in the changed code.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
…erred
- ROADMAP.md line 11: remove "Taylor linearization, replicate weights integrated across all estimators"; it now says "all estimators accept survey weights, with design-based variance varying by estimator"
- survey-roadmap.md: move Phase 8g documentation tasks into their own section outside the consolidated runtime-limitations block
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. This re-review resolves the prior documentation-overstatement issues. The only remaining issue I found is a P3 documentation wording mismatch in the consolidated survey deferred-work intro.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Move CallawaySantAnna conservative plug-in IF entry into its own "Documented Deviations" subsection (supported path, not an error). Runtime limitations intro now accurately describes only error-raising items.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. This re-review resolves the prior deferred-work wording issue. I found one remaining P3 documentation mismatch in the consolidated survey limitations list.
Executive Summary
Methodology
No findings. The updated survey-support notes remain aligned with the Methodology Registry, including the documented Callaway-Sant’Anna survey deviation and the expanded replicate-weight matrix. docs/methodology/REGISTRY.md:L434 docs/methodology/REGISTRY.md:L2310
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings. The new
Security
No findings.
Documentation/Tests
Split wild bootstrap row: DiD/TWFE raise NotImplementedError, MultiPeriodDiD warns and falls back to analytical inference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Looks good
Highest unmitigated severity: P3.
Executive Summary
Methodology
No findings. I cross-checked the changed survey-support claims in docs/choosing_estimator.rst:583, docs/methodology/REGISTRY.md:2310, diff_diff/staggered.py:1431, diff_diff/imputation.py:213, diff_diff/two_stage.py:209, and diff_diff/survey.py:1087; the updated docs are consistent on the materially changed claims.
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings. The new cross-reference in TODO.md:18 appropriately points readers to the consolidated deferred-work section in docs/survey-roadmap.md:275.
Security
No findings.
Documentation/Tests
MultiPeriodDiD wild bootstrap warns and falls back rather than raising. Move it into its own "Warning/Fallback Behaviors" subsection outside the runtime-error block.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
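To make the distinction between the two documented behaviors concrete, here is a rough sketch of "raise" versus "warn and fall back". The function names, the `has_survey_design` flag, and the "analytical" return value are assumptions made for illustration; they are not taken from the diff_diff source.

```python
import warnings


def resolve_inference_strict(inference, has_survey_design):
    """Raise when wild bootstrap is requested together with a survey design."""
    if inference == "wild_bootstrap" and has_survey_design:
        raise NotImplementedError(
            "wild bootstrap is not supported with survey designs"
        )
    return inference


def resolve_inference_fallback(inference, has_survey_design):
    """Warn and switch to analytical inference instead of raising."""
    if inference == "wild_bootstrap" and has_survey_design:
        warnings.warn(
            "wild bootstrap is not supported with survey designs; "
            "falling back to analytical inference",
            UserWarning,
            stacklevel=2,
        )
        return "analytical"  # assumed label for the fallback mode
    return inference
```

The first pattern belongs in a list of error-raising limitations; the second still returns a result, which is why it is listed separately from the runtime-error block.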
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Highest unmitigated severity: P3.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Tutorial Section 9 now links to the compatibility matrix rather than containing the table itself.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
Highest unmitigated severity: P3.
Executive Summary
Methodology
No findings.
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
Security
No findings.
Documentation/Tests
Summary
- Add survey compatibility matrix to choosing_estimator.rst (Phase 8f) with 15-estimator × 4-feature table, legend, and notes
- Mark Phase 8a-8e as shipped in survey-roadmap.md with version numbers
- Consolidate all remaining NotImplementedError paths into a single "Deferred Work" section in survey-roadmap.md
Methodology references (required if estimator / math changes)
Validation
- pytest tests/test_doc_snippets.py passes (RST syntax validated)
- NotImplementedError paths in diff_diff/*.py verified against consolidated deferred list
Security / privacy
Generated with Claude Code
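The PR does not show how the NotImplementedError paths were enumerated for the validation step. One way such a check could be approximated is with Python's standard `ast` module; the helper name and the sample source below are hypothetical and only illustrate the idea.

```python
import ast


def find_not_implemented_raises(source, filename="<string>"):
    """Return (filename, line number) for each `raise NotImplementedError` in source."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Raise) or node.exc is None:
            continue
        # `raise NotImplementedError(...)` is a Call node;
        # a bare `raise NotImplementedError` is a Name node.
        target = node.exc.func if isinstance(node.exc, ast.Call) else node.exc
        if isinstance(target, ast.Name) and target.id == "NotImplementedError":
            hits.append((filename, node.lineno))
    return sorted(hits)


# Hypothetical sample source, not taken from diff_diff.
sample = '''
def fit(inference):
    if inference == "wild_bootstrap":
        raise NotImplementedError("not supported with survey designs")
    if inference == "bad":
        raise ValueError("unknown inference")
'''

print(find_not_implemented_raises(sample))
```

To scan a package, one could loop over `pathlib.Path("diff_diff").glob("*.py")`, pass each file's `read_text()` to the helper, and compare the resulting locations with the entries in the deferred-work list by hand.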