Update survey documentation: compatibility matrix, roadmap, deferred work by igerber · Pull Request #263 · igerber/diff-diff

igerber · 2026-04-04T13:09:12Z

Summary

Add canonical survey compatibility matrix to choosing_estimator.rst (Phase 8f) with 15-estimator × 4-feature table, legend, and notes
Replace stale tutorial table (11 of 15 rows had errors) with cross-reference to the docs-hosted matrix
Mark Phase 8a–8e as shipped in survey-roadmap.md with version numbers
Consolidate all remaining NotImplementedError paths into a single "Deferred Work" section in survey-roadmap.md
Update replicate weight coverage from 4/15 → 12/15 estimators (v2.8.3 expansion) and add SDR to replicate method lists
Fix stale ROADMAP.md entries (version v2.7.5→v2.8.4, tutorial Open→Implemented, Staggered DDD marked Implemented)
Add cross-reference from TODO.md to consolidated deferred list

Methodology references (required if estimator / math changes)

N/A — documentation-only changes, no methodology or code changes

Validation

Tests added/updated: No test changes (documentation only)
pytest tests/test_doc_snippets.py passes — RST syntax validated
Notebook JSON validated (37 cells, all assertions pass)
All NotImplementedError paths in diff_diff/*.py verified against consolidated deferred list

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…work Add survey compatibility matrix to choosing_estimator.rst (Phase 8f), fix 11 stale entries in the tutorial table and replace with cross-reference, mark Phase 8a-8e as shipped in survey-roadmap.md, consolidate all remaining NotImplementedError paths into a single deferred work section, add SDR to replicate method lists, and update ROADMAP.md version/status entries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-04T13:15:33Z

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This PR is documentation-only, and the estimator implementations I spot-checked remain consistent with the methodology registry. I did find a few P2 documentation inconsistencies that are worth fixing if these docs are meant to be the canonical survey-support reference.

Executive Summary

No estimator, weighting, variance, or inference code changed, so I found no silent-correctness or methodology regressions in implementation.
docs/survey-roadmap.md now says the Callaway-Sant'Anna survey reg+covariates efficient DRDID nuisance correction is resolved, but the registry and code still document/use a conservative plug-in IF for that path.
The same roadmap is internally inconsistent: older phase tables still say several survey-bootstrap paths are deferred and still describe CS survey IPW/DR as no-covariate-only, while later rows mark those items resolved.
The new consolidated deferred-work section is not actually exhaustive; it omits the still-unsupported replicate-weight + bootstrap combinations in SunAbraham, ImputationDiD, and TwoStageDiD.

Methodology

Severity: P2. Impact: Affected method is CallawaySantAnna survey estimation_method="reg" with covariates. The roadmap now marks “Efficient DRDID nuisance IF for reg+covariates” as resolved in docs/survey-roadmap.md:L64, but the canonical registry still says this path uses a conservative plug-in IF and that the efficient correction is deferred in docs/methodology/REGISTRY.md:L437. The code also still computes residual-based plug-in IF terms in diff_diff/staggered.py:L2094. Because the deviation is still documented in REGISTRY.md, I’m treating this as a documentation-accuracy issue, not a P1 methodology defect. Concrete fix: change that roadmap row back to deferred/documented-deviation status and link to the registry note instead of calling it shipped.

Code Quality

No findings. The PR does not change executable code.

Performance

No findings. No runtime path changed.

Maintainability

Severity: P2. Impact: docs/survey-roadmap.md is now internally contradictory. Early summary tables still say SunAbraham/ContinuousDiD/EfficientDiD bootstrap+survey is deferred and ImputationDiD/TwoStageDiD bootstrap+survey is deferred, and still describe Callaway-Sant'Anna survey IPW/DR as no-covariate-only in docs/survey-roadmap.md:L22, docs/survey-roadmap.md:L47, and docs/survey-roadmap.md:L49, while later sections mark those same capabilities resolved in docs/survey-roadmap.md:L35, docs/survey-roadmap.md:L59, and docs/survey-roadmap.md:L63. Concrete fix: update or remove the stale phase-summary rows so each capability has exactly one status in the roadmap.

Tech Debt

No findings. The TODO.md cross-reference is reasonable and does not hide a correctness issue.

Security

No findings. I did not see secrets, unsafe executable content, or other security-relevant changes.

Documentation/Tests

Severity: P2. Impact: The new “Deferred Work (Consolidated)” section claims to be the “single source of truth” for remaining survey limitations in docs/survey-roadmap.md:L276, but it omits the still-unsupported n_bootstrap > 0 + replicate-weight combinations enforced in diff_diff/sun_abraham.py:L519, diff_diff/imputation.py:L272, and diff_diff/two_stage.py:L251. Users relying on the docs will still hit unsupported combinations that are not listed. Concrete fix: add SunAbraham, ImputationDiD, and TwoStageDiD to the mutual-exclusion table in docs/survey-roadmap.md:L297, or soften the “single source of truth” wording; if the section is meant to describe exception behavior precisely, note that these three currently raise ValueError, not NotImplementedError.

…imators - Revert efficient DRDID nuisance IF for reg+covariates to deferred status (code and REGISTRY.md still use conservative plug-in IF) - Update phase summary table Notes to reflect resolved bootstrap+survey paths (SA, ContinuousDiD, EfficientDiD, ImputationDiD, TwoStageDiD, CS) - Add SunAbraham, ImputationDiD, TwoStageDiD to bootstrap+replicate mutual exclusion table in consolidated deferred section Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T13:28:22Z

/ai-review

github-actions · 2026-04-04T13:33:57Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 8677e76fb856179cbed5a5056fe938c3ac3205c6

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This re-review resolves the prior methodology/documentation mismatch on CallawaySantAnna; one P2 documentation-accuracy issue remains in the new consolidated deferred-work section.

Executive Summary

P2: the new “single source of truth” section says all listed limitations raise NotImplementedError, but several listed bootstrap+replicate exclusions actually raise ValueError in code, so the docs still misstate the runtime contract.
The previous CallawaySantAnna survey reg+covariates roadmap mismatch is fixed: the roadmap now correctly defers the efficient nuisance IF and matches the documented registry deviation.
The previous omission of missing bootstrap+replicate exclusions is largely fixed: the consolidated deferred-work section now includes the previously missing estimators.
No estimator, weighting, variance, inference, performance, or security code changed, and I found no new methodology regressions in implementation.

Methodology

No findings. The changed roadmap entry in docs/survey-roadmap.md:L61-L64 now matches the documented deviation in docs/methodology/REGISTRY.md:L434-L437.

Code Quality

No findings. This PR does not change executable code.

Performance

No findings. No runtime path changed.

Maintainability

No findings.

Tech Debt

No findings. The new TODO.md cross-reference is reasonable and does not mask a correctness issue.

Security

No findings. I did not see secrets, unsafe executable content, or other security-relevant changes.

Documentation/Tests

Severity: P2. Impact: docs/survey-roadmap.md:L276-L314 says the remaining survey limitations “raise NotImplementedError when attempted,” and the bootstrap+replicate subsection states the listed combinations raise NotImplementedError. That is not true for at least diff_diff/sun_abraham.py:L518-L523, diff_diff/imputation.py:L269-L275, and diff_diff/two_stage.py:L249-L254, which raise ValueError for the documented n_bootstrap > 0 + replicate-weight exclusion. Because this section is now presented as the canonical deferred-work reference, readers or tests relying on exception type will still be misled. Concrete fix: change the wording to “raise an error” or list the exact exception class per estimator; alternatively, standardize those guards to NotImplementedError if that is the intended public contract.

Some bootstrap+replicate exclusions raise ValueError (not NotImplementedError). Update wording to "raise an error" to accurately reflect the runtime contract. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T13:43:33Z

/ai-review

github-actions · 2026-04-04T13:49:47Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 38f087baeaa65679ca10eadd45315dd10c50890a

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. The prior re-review finding on exception-type wording is resolved; the remaining issues are documentation consistency/accuracy only.

Executive Summary

The prior P2 is fixed: the new consolidated deferred-work section now correctly says unsupported survey combinations may raise either NotImplementedError or ValueError, which matches live code in docs/survey-roadmap.md:276, diff_diff/sun_abraham.py:519, diff_diff/imputation.py:272, and diff_diff/two_stage.py:250.
P2: the new survey-support docs now present broader CallawaySantAnna replicate-weight support than the Methodology Registry still records, so the public docs and the methodology source-of-truth disagree on the supported contract for that method.
P2: ROADMAP.md:19 still says replicate-weight survey support is integrated “across all estimators,” but the new matrix/deferred-work docs in this PR explicitly say replicate weights are supported for only 12 of 15 estimators.
P3: the new “single source of truth” wording for remaining survey limitations is too strong; at least one live survey restriction (absorb with multiple variables plus survey weights) still lives only in TODO.md/code, not in the consolidated list.
No estimator code, variance math, weighting logic, or inference code changed in this PR.

Methodology

Severity: P2. Impact: the new docs imply full/canonical CallawaySantAnna replicate-weight survey support in docs/choosing_estimator.rst:611 and docs/survey-roadmap.md:49, but the Methodology Registry’s replicate-weight support matrix still narrows CallawaySantAnna to replicate support “without covariates, no bootstrap” at docs/methodology/REGISTRY.md:2310. That leaves the canonical public docs and the registry out of sync on a methodology-adjacent support boundary. Concrete fix: either update the registry replicate-support matrix to the intended current contract, or qualify the new public docs to the narrower support. If covariate+replicate support is intended, add a targeted coverage test; current replicate tests only pin no-covariate IPW/DR coverage at tests/test_survey_phase6.py:1525.

Code Quality

No findings. Documentation-only PR.

Performance

No findings. No runtime path changed.

Maintainability

No findings in executable code.

Tech Debt

Severity: P3. Impact: the new consolidated section says it is “the single source of truth for remaining survey limitations” at docs/survey-roadmap.md:276, and TODO.md now points readers there at TODO.md:18, but the live survey limitation for multiple absorbed fixed effects with survey weights is still tracked separately in TODO.md:62 and enforced in code at diff_diff/estimators.py:264 and diff_diff/estimators.py:1102. Because that limitation is already tracked, this is informational only, but the new wording overstates completeness. Concrete fix: either add this restriction to the consolidated list or soften the wording to “summary of major remaining survey limitations.”

Security

No findings. I did not see secrets, unsafe executable content, or other security-relevant changes.

Documentation/Tests

Severity: P2. Impact: ROADMAP.md:19 now says survey support with replicate weights is integrated “across all estimators,” but the compatibility matrix and deferred-work section added/updated in this PR say replicate weights are supported for only 12 of 15 estimators and are rejected for SyntheticDiD, TROP, and BaconDecomposition at docs/choosing_estimator.rst:596, docs/survey-roadmap.md:103, and docs/survey-roadmap.md:281. The top-level roadmap now contradicts the more precise canonical docs in the same PR. Concrete fix: reword the roadmap bullet to say survey support exists across all estimators, with replicate weights supported for 12 of 15 estimators, and link readers to the compatibility matrix for details.

…ording - Update REGISTRY.md replicate-weight support matrix: CS now supports covariates with replicate weights (IF-based path is covariate-agnostic, shipped in Phase 7a) - Qualify ROADMAP.md: "replicate weights supported for 12 of 15" instead of "across all estimators" - Soften consolidated deferred section from "single source of truth" to "summary of major remaining limitations" with TODO.md cross-reference Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T13:55:37Z

/ai-review

github-actions · 2026-04-04T14:00:43Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7180d94d3c8748a91218affb426b4beef110bde4

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. The prior roadmap-coverage issue is fixed, and the prior CallawaySantAnna support-boundary mismatch is largely resolved. One P2 documentation inconsistency remains in the Methodology Registry, plus one minor stale runtime-message mismatch.

Executive Summary

The prior ROADMAP.md overstatement on replicate-weight coverage is resolved; the roadmap now points readers to the 12-of-15 compatibility matrix.
The prior CallawaySantAnna registry/public-doc mismatch is mostly resolved; REGISTRY.md now reflects survey support for covariates in reg/ipw/dr.
P2: the updated replicate-support note in docs/methodology/REGISTRY.md still misstates the actual 12-estimator set by counting LinearRegression and omitting StaggeredTripleDifference.
P3: the new deferred-work docs are current, but the live wild-bootstrap-with-survey error message still points users to future “planned Phase 5 support.”
No estimator code, variance math, weighting logic, or inference paths changed in this PR.

Methodology

Severity: P2. Impact: docs/methodology/REGISTRY.md:L2310-L2324 says “12 of 15 estimators” support replicate weights, but the enumerated supported set includes LinearRegression and omits StaggeredTripleDifference. That conflicts with the public compatibility matrix in docs/choosing_estimator.rst:L591-L650, the roadmap coverage list in docs/survey-roadmap.md:L103-L109, and the existing staggered DDD replicate-support tests in tests/test_survey_staggered_ddd.py:L376-L445. This leaves the Methodology Registry internally inconsistent about the actual supported estimator contract. Concrete fix: replace LinearRegression (OLS path) with StaggeredTripleDifference, or rewrite the note to explicitly distinguish helper-level implementation paths from public-estimator coverage.

No other methodology findings. This PR is documentation-only; no estimator implementation or variance logic changed.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings beyond the documentation drift noted below.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the new consolidated deferred-work section in docs/survey-roadmap.md:L274-L325 presents remaining survey limitations as current deferred items, but the live wild-bootstrap guard in diff_diff/survey.py:L1087-L1091 still tells users to see the roadmap for “planned Phase 5 support.” This is minor, but it leaves one user-facing error path out of sync with the updated roadmap status. Concrete fix: update that message to say wild bootstrap + survey weights is currently unsupported and point to the consolidated deferred-work section instead.

- Replace LinearRegression (internal helper) with StaggeredTripleDifference (public estimator) in REGISTRY.md replicate-weight support matrix - Update wild bootstrap + survey error message to remove stale "planned Phase 5 support" reference Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T14:02:56Z

/ai-review

github-actions · 2026-04-04T14:08:50Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 1bb37b17e8115e63b3f87755ef2c21774ea7a53d

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. The prior registry-support mismatch is fixed. Remaining issues are limited to documentation consistency and one user-facing runtime-message mismatch.

Executive Summary

The prior P2 is resolved: the replicate-weight note in docs/methodology/REGISTRY.md:L2310-L2325 now matches the 12-of-15 public support boundary in docs/choosing_estimator.rst:L596-L670.
The prior stale “planned Phase 5 support” message is gone, but the replacement text in diff_diff/survey.py:L1087-L1091 now overgeneralizes by recommending n_bootstrap > 0 even on estimators that do not expose that API.
Severity P2: the survey docs still disagree on replicate-weight support. ROADMAP.md:L11-L19 says replicate-weight inference is integrated across all estimators, while docs/survey-roadmap.md:L93-L96 still omits SDR.
No estimator implementation, identification assumptions, weighting logic, variance/SE math, or defaults changed in this PR.
No P0/P1 findings identified.

Methodology

No findings. Cross-checking the changed registry note against the current compatibility matrix shows the previous public-contract mismatch is resolved: docs/methodology/REGISTRY.md:L2310-L2325, docs/choosing_estimator.rst:L596-L670.

Code Quality

Severity: P3. Impact: The shared survey guard in diff_diff/survey.py:L1087-L1091 tells users to “use survey-aware multiplier bootstrap (n_bootstrap > 0)”, but that helper is invoked from estimators that only expose inference='wild_bootstrap' and have no n_bootstrap parameter, including diff_diff/estimators.py:L237-L242 and diff_diff/twfe.py:L124-L129. That makes the new runtime guidance inaccurate for DifferenceInDifferences, MultiPeriodDiD, and TwoWayFixedEffects; the compatibility matrix also marks those rows as having no survey bootstrap path in docs/choosing_estimator.rst:L596-L610. Concrete fix: make the shared message generic (“use analytical survey inference; survey-aware bootstrap is estimator-specific”) or move estimator-specific bootstrap guidance into callers that actually support n_bootstrap.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P2. Impact: The survey-support docs are still internally inconsistent about replicate-weight support. ROADMAP.md:L11-L19 says design-based variance estimation, including replicate weights, is integrated across all estimators, and docs/survey-roadmap.md:L93-L96 still lists only BRR/Fay/JK1/JKn, while docs/survey-roadmap.md:L214-L219, docs/choosing_estimator.rst:L596-L670, and docs/methodology/REGISTRY.md:L2263-L2270 all reflect SDR support and the 12-of-15 boundary. Concrete fix: rephrase the roadmap summary to distinguish “survey-aware inference across all estimators” from “replicate weights on 12 of 15,” and update the replicate-weight variance section to list SDR explicitly.

… msg - ROADMAP.md: restructure to say "survey-aware inference across all 15 estimators; replicate weights supported for 12 of 15" - survey-roadmap.md Phase 6: add SDR to replicate method list - survey.py: make wild bootstrap error message generic (not all estimators expose n_bootstrap) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T14:11:10Z

/ai-review

github-actions · 2026-04-04T14:15:32Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 51bff5551cab2f38dc9c0e69ba0a8a2760ca79e3

Overall Assessment

✅ Looks good

Executive Summary

The prior replicate-weight documentation mismatch is resolved: the registry, survey roadmap, estimator matrix, and tutorial now all agree on SDR support and the 12-of-15 replicate-weight boundary (REGISTRY:L2310-L2325, survey-roadmap.md:L93-L109, choosing_estimator.rst:L596-L670, 16_survey_did.ipynb:L532-L541).
No estimator implementation, weighting logic, identification assumptions, variance formulas, or defaults changed in this PR. The only code change is the survey wild-bootstrap guard message, and that prior guidance issue is fixed (survey.py:L1087-L1090).
Remaining issues are documentation-only: ROADMAP.md still overclaims survey inference coverage, docs/survey-roadmap.md now contradicts itself about whether Phase 8 is fully done, and the new wild-bootstrap deferred-work row still points users toward a bootstrap fallback that the compatibility matrix says does not exist for DID/TWFE/MultiPeriod.

Methodology

No findings. This PR does not change estimator math, SE/VCOV computation, weighting, assumptions, or defaults, and the updated replicate-weight documentation is now aligned with the methodology registry (REGISTRY:L2310-L2325, choosing_estimator.rst:L596-L670).

Code Quality

No findings. The changed runtime message in survey.py:L1087-L1090 is consistent with the current API and removes the stale “planned Phase 5” wording.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The new cross-reference in TODO.md is appropriate tracking, not a blocker.

Security

No findings.

Documentation/Tests

Severity: P2. Impact: ROADMAP.md still overstates survey methodology coverage. It says Taylor linearization and replicate weights are integrated across all estimators and that survey-aware inference exists across all 15 estimators, but the canonical matrix shows SyntheticDiD and TROP are bootstrap-only for full-design surveys and BaconDecomposition is diagnostic-only; the Bacon implementation also explicitly says no survey vcov is computed and replicate-weight designs are rejected (ROADMAP.md:L11-L19, choosing_estimator.rst:L596-L670, bacon.py:L438-L470). Concrete fix: reword the roadmap summary to say survey support exists across all 15 estimators, with TSL/replicate/bootstrap support varying by estimator, and explicitly carve out BaconDecomposition as diagnostic-only.
Severity: P2. Impact: docs/survey-roadmap.md now opens with “All phases (1-8) are implemented,” but later in the same file marks Phase 8g as only “Partially addressed” and keeps those items in the deferred section, so the roadmap contradicts itself (survey-roadmap.md:L1-L4, survey-roadmap.md:L264-L330). Concrete fix: change the intro to something like “Phases 1-8f are implemented; remaining Phase 8g documentation items are deferred below.”
Severity: P3. Impact: The new deferred-work row for “Wild bootstrap + survey weights” says “(all estimators)” should use analytical SEs or “survey-aware multiplier bootstrap instead,” but the compatibility matrix shows DifferenceInDifferences, TwoWayFixedEffects, and MultiPeriodDiD have no survey-bootstrap path at all (--), and the shared guard now correctly tells users to fall back to analytical survey inference (survey-roadmap.md:L303-L325, choosing_estimator.rst:L596-L610, survey.py:L1087-L1090). Concrete fix: scope that row to the estimators that actually expose inference='wild_bootstrap', and state that their supported fallback is analytical survey inference; if users want bootstrap + survey, point them to estimators whose matrix row lists a survey-bootstrap method.

…otstrap - ROADMAP.md: distinguish survey weights (all 15) from design-based variance (varies by estimator), carve out BaconDecomposition - survey-roadmap.md: header says "Phases 1-8f implemented" (8g partial) instead of "All phases implemented" - Deferred work: scope wild bootstrap row to DiD/TWFE/MultiPeriod (the estimators that expose inference='wild_bootstrap') Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T14:17:09Z

/ai-review

github-actions · 2026-04-04T14:25:00Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 815e0e44adae7523eb25b3b00d1a62d11279457c

Overall Assessment

✅ Looks good

Highest unmitigated issue is P2 documentation accuracy. I did not find any P0/P1 methodology, inference, or correctness defects in the changed code.

Executive Summary

Re-review status: 2 of the 3 prior findings are resolved. The survey-roadmap intro no longer contradicts its deferred section, and the unsupported wild_bootstrap + survey_design guidance is now correctly scoped to the base estimators that actually use that path (docs/survey-roadmap.md:L4, docs/survey-roadmap.md:L326, diff_diff/survey.py:L1087, diff_diff/estimators.py:L240, diff_diff/estimators.py:L1079).
No estimator math, weighting, identification assumptions, variance formulas, or defaults changed in this PR. The methodology registry remains aligned with the documented survey deviations and the current replicate-weight support matrix (docs/methodology/REGISTRY.md:L434, docs/methodology/REGISTRY.md:L437, docs/methodology/REGISTRY.md:L2310).
The new canonical compatibility matrix, registry entry, and tutorial now agree on SDR support and 12-of-15 replicate-weight coverage (docs/choosing_estimator.rst:L580, docs/methodology/REGISTRY.md:L2310, docs/tutorials/16_survey_did.ipynb:L532, docs/tutorials/16_survey_did.ipynb:L540).
[Previously raised, still open] ROADMAP.md still opens with a sentence that implies Taylor linearization and replicate weights are integrated across all estimators, which conflicts with the new matrix and registry exceptions (ROADMAP.md:L11, ROADMAP.md:L19, docs/choosing_estimator.rst:L656, docs/choosing_estimator.rst:L661, docs/choosing_estimator.rst:L666, docs/methodology/REGISTRY.md:L2310).
Minor new doc issue: the consolidated deferred-work section says all items below are runtime errors, but it ends with documentation-only tasks (docs/survey-roadmap.md:L277, docs/survey-roadmap.md:L328).

Methodology

No findings. This PR does not change estimator implementations or inference math, and the only methodology-adjacent updates are documentation changes that remain consistent with the registry, including the documented CallawaySantAnna survey reg+covariates deviation from R (docs/methodology/REGISTRY.md:L437, docs/methodology/REGISTRY.md:L2310).

Code Quality

No findings. The only runtime change in _resolve_survey_for_fit() is an error-message update, and it matches the actual base-estimator control flow for survey wild bootstrap rejection (diff_diff/survey.py:L1087, diff_diff/estimators.py:L240, diff_diff/estimators.py:L1079).

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The new cross-reference from TODO.md to the consolidated survey deferred-work list is appropriate tracking, not a mitigation attempt for a correctness issue (TODO.md:L18, docs/survey-roadmap.md:L275).

Security

No findings.

Documentation/Tests

Severity: P2. Impact: [Previously raised, still open] ROADMAP.md still overstates survey inference coverage. The opening sentence says Taylor linearization and replicate weights are integrated across all estimators, but the compatibility matrix and registry now say support varies by estimator, with replicate weights available for only 12 of 15 public estimators and SyntheticDiD, TROP, and BaconDecomposition excluded. Concrete fix: rewrite the opening sentence to say survey support is library-wide, while TSL / replicate / survey-bootstrap support varies by estimator, and point directly to the compatibility matrix (ROADMAP.md:L11, ROADMAP.md:L19, docs/choosing_estimator.rst:L656, docs/choosing_estimator.rst:L661, docs/choosing_estimator.rst:L666, docs/methodology/REGISTRY.md:L2310).
Severity: P3. Impact: the new “Deferred Work (Consolidated)” intro overpromises runtime-error semantics. It says “All items below raise an error when attempted,” but the final “Documentation-Only (Phase 8g)” subsection is not a runtime limitation. Concrete fix: move the Phase 8g bullets outside the consolidated runtime-limitation section, or change the intro to distinguish runtime limitations from documentation tasks (docs/survey-roadmap.md:L277, docs/survey-roadmap.md:L328).

…erred - ROADMAP.md line 11: remove "Taylor linearization, replicate weights integrated across all estimators" — now says "all estimators accept survey weights, with design-based variance varying by estimator" - survey-roadmap.md: move Phase 8g documentation tasks into their own section outside the consolidated runtime-limitations block Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T14:27:33Z

/ai-review

github-actions · 2026-04-04T14:34:40Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 1b174a63ebbdf14c00633c0842e8841800b56d56

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This re-review resolves the prior documentation-overstatement issues. The only remaining issue I found is a P3 documentation wording mismatch in the consolidated survey deferred-work intro.

Executive Summary

Re-review status: the prior documentation findings are resolved. ROADMAP.md now says survey variance support varies by estimator, and docs/survey-roadmap.md now separates remaining documentation tasks from runtime limitations (ROADMAP.md:L11-L19, docs/survey-roadmap.md:L1-L5, docs/survey-roadmap.md:L330-L335).
No estimator math, weighting, identification assumptions, variance formulas, or defaults changed in this PR.
The new canonical survey matrix, registry note, and tutorial are consistent on SDR support and 12-of-15 replicate-weight coverage (docs/choosing_estimator.rst:L580-L690, docs/methodology/REGISTRY.md:L2310-L2325, docs/tutorials/16_survey_did.ipynb:L532-L541).
The only runtime code change is the wild_bootstrap + survey error message, and it matches the documented limitation for the base estimators (diff_diff/survey.py:L1087-L1090, docs/survey-roadmap.md:L323-L326).
Severity: P3. Impact: the consolidated deferred-work intro still says every listed item raises an error, but the listed Callaway-Sant’Anna reg+covariates item is a documented conservative-IF deviation, not a rejection path. Concrete fix: reword the intro or move that row into a separate documented-deviations subsection (docs/survey-roadmap.md:L277-L295, diff_diff/staggered.py:L1431-L1437, docs/methodology/REGISTRY.md:L437-L438).

Methodology

No findings. No estimator or inference logic changed, and the updated survey-support claims remain aligned with the Methodology Registry, including the documented Callaway-Sant’Anna survey deviations and the replicate-weight support matrix (docs/methodology/REGISTRY.md:L434-L438, docs/methodology/REGISTRY.md:L2310-L2325, docs/choosing_estimator.rst:L596-L690).

Code Quality

No findings. The lone code change is an error-message simplification in the shared survey resolver, and it is consistent with the documented wild_bootstrap + survey_design rejection for DifferenceInDifferences, TwoWayFixedEffects, and MultiPeriodDiD (diff_diff/survey.py:L1087-L1090, docs/survey-roadmap.md:L323-L326).

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The new TODO.md cross-reference is appropriate tracking and does not attempt to TODO-away a correctness issue (TODO.md:L18-L20, docs/survey-roadmap.md:L275-L326).

Security

No findings.

Documentation/Tests

Severity: P3. Impact: docs/survey-roadmap.md overstates the runtime semantics of the consolidated deferred-work section. The intro says all listed items raise NotImplementedError/ValueError, but the Callaway-Sant’Anna reg+covariates row is explicitly a supported path with a documented conservative SE deviation, not an error condition. That can mislead readers into thinking the feature is unavailable (docs/survey-roadmap.md:L277-L295, diff_diff/staggered.py:L1431-L1437, docs/methodology/REGISTRY.md:L437-L438).
Severity: P3. Concrete fix: either change the intro to say the section includes both runtime rejections and documented survey limitations, or move the Callaway-Sant’Anna row into its own “Documented deviations” subsection that points directly to REGISTRY.md (docs/survey-roadmap.md:L277-L295, docs/methodology/REGISTRY.md:L437-L438).

Move CallawaySantAnna conservative plug-in IF entry into its own "Documented Deviations" subsection (supported path, not an error). Runtime limitations intro now accurately describes only error-raising items. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T14:37:41Z

/ai-review

github-actions · 2026-04-04T14:43:09Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 92df37c1c1234227a2f0b39002bbe6592ae7e1ac

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This re-review resolves the prior deferred-work wording issue. I found one remaining P3 documentation mismatch in the consolidated survey limitations list.

Executive Summary

Re-review status: the prior documentation issue is resolved. docs/survey-roadmap.md now separates supported documented deviations from actual runtime limitations in distinct sections. docs/survey-roadmap.md:L277 docs/survey-roadmap.md:L286
No estimator math, weighting formulas, identification assumptions, variance/SE formulas, or defaults changed in this PR. The only runtime code delta is a shortened survey+wild-bootstrap error message. diff_diff/survey.py:L1087
The new SDR and replicate-weight coverage claims are consistent with the implementation and registry: SurveyDesign accepts SDR, the variance factor is 4/R, and the registry now documents 12-of-15 estimator coverage. diff_diff/survey.py:L86 diff_diff/survey.py:L1510 docs/methodology/REGISTRY.md:L2310
Severity: P3. docs/survey-roadmap.md says all listed runtime limitations raise an error, but MultiPeriodDiD with inference='wild_bootstrap' warns and falls back to analytical inference instead of rejecting. docs/survey-roadmap.md:L288 docs/survey-roadmap.md:L330 diff_diff/estimators.py:L923 diff_diff/estimators.py:L1227

Methodology

No findings. The updated survey-support notes remain aligned with the Methodology Registry, including the documented Callaway-Sant’Anna survey deviation and the expanded replicate-weight matrix. docs/methodology/REGISTRY.md:L434 docs/methodology/REGISTRY.md:L2310

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The new TODO.md cross-reference is appropriate tracking and does not attempt to TODO-away a correctness issue. TODO.md:L18

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the consolidated runtime-limitations section overstates one runtime behavior. It says all listed items raise NotImplementedError/ValueError, but the MultiPeriodDiD case listed there does not error; fit() warns and uses analytical inference when inference='wild_bootstrap'. That can mislead readers about actual API behavior. Concrete fix: split that row so DifferenceInDifferences and TwoWayFixedEffects remain under runtime rejections, and either remove MultiPeriodDiD from the error list or reword it as a warning/fallback limitation. docs/survey-roadmap.md:L288 docs/survey-roadmap.md:L330 diff_diff/estimators.py:L923 diff_diff/estimators.py:L1227
No testing blocker identified from the diff itself. I did not rerun the notebook/RST validation commands cited in the PR body.

Split wild bootstrap row: DiD/TWFE raise NotImplementedError, MultiPeriodDiD warns and falls back to analytical inference. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T14:46:50Z

/ai-review

github-actions · 2026-04-04T14:57:28Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f54dea835ccebdcbae7642e7f59369e13d458317

Overall Assessment

Looks good

Highest unmitigated severity: P3.

Executive Summary

Re-review status: the prior documentation issue is only partially resolved. The MultiPeriodDiD row now describes the fallback correctly, but the enclosing “Runtime Limitations” intro still says every listed item raises an error.
No estimator math, weighting formulas, identification assumptions, variance/SE logic, or default behaviors changed in this PR. The only runtime code delta is the shortened survey wild-bootstrap error message in diff_diff/survey.py:1087, and it remains consistent with current behavior.
The new survey compatibility matrix is otherwise aligned with the Methodology Registry and the estimator docstrings, including SDR support and 12-of-15 replicate-weight coverage.
No unmitigated P0/P1 findings.

Methodology

No findings. I cross-checked the changed survey-support claims in docs/choosing_estimator.rst:583, docs/methodology/REGISTRY.md:2310, diff_diff/staggered.py:1431, diff_diff/imputation.py:213, diff_diff/two_stage.py:209, and diff_diff/survey.py:1087; the updated docs are consistent on the materially changed claims.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The new cross-reference in TODO.md:18 appropriately points readers to the consolidated deferred-work section in docs/survey-roadmap.md:275.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the consolidated runtime-limitations section still contradicts itself. The intro says all listed items raise NotImplementedError/ValueError, but the MultiPeriodDiD row directly below says inference='wild_bootstrap' with survey_design warns and falls back to analytical inference instead of raising. That leaves the prior re-review item only partially resolved and can still misstate actual API behavior. Concrete fix: either move MultiPeriodDiD into a separate warning/fallback subsection, or soften the intro so only the true error cases are described as raising. Refs: docs/survey-roadmap.md:288, docs/survey-roadmap.md:331, diff_diff/estimators.py:923, diff_diff/estimators.py:927.
I did not rerun the notebook or RST validation commands cited in the PR body.

MultiPeriodDiD wild bootstrap warns and falls back rather than raising. Move it into its own "Warning/Fallback Behaviors" subsection outside the runtime-error block. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T14:59:32Z

/ai-review

github-actions · 2026-04-04T15:13:30Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2168e2c72d275a0845db3106651c2f7f3a6d4703

Overall Assessment

✅ Looks good

Highest unmitigated severity: P3.

Executive Summary

Re-review status: the prior documentation issue is resolved. The consolidated deferred-work section now separates true runtime errors from the MultiPeriodDiD warning/fallback path, so it no longer contradicts current behavior.
No estimator math, weighting formulas, identification assumptions, variance/SE logic, or default behaviors changed in this PR. The only runtime code delta is the shortened error text in diff_diff/survey.py:1087, and behavior is unchanged.
The changed survey-support claims are consistent with the Methodology Registry and current code on the material points: SDR support, 12-of-15 replicate-weight coverage, CallawaySantAnna survey covariates, and StaggeredTripleDifference survey support.
One minor documentation sync issue remains: the roadmap still describes the survey tutorial as containing an estimator support table, but the notebook now links to the canonical matrix instead of embedding that table.

Methodology

No findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the tutorial blurb in docs/survey-roadmap.md:166 is now stale relative to the notebook change. It still says the tutorial covers an estimator support reference table, but Section 9 of the notebook now only cross-references the canonical matrix and lists highlights in docs/tutorials/16_survey_did.ipynb:532. Concrete fix: update the roadmap blurb to say the notebook links to the central compatibility matrix, rather than saying it includes the table itself.
No test findings from the diff itself. I did not rerun the notebook or RST validation commands cited in the PR body.

Tutorial Section 9 now links to the compatibility matrix rather than containing the table itself. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-04T15:15:47Z

/ai-review

github-actions · 2026-04-04T15:21:42Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2fee9b419ed16626e71fdae9c327a11347d2969b

Overall Assessment

✅ Looks good

Highest unmitigated severity: P3.

Executive Summary

Re-review status: the prior documentation issue is resolved. The roadmap/tutorial now point readers to the canonical survey compatibility matrix instead of saying the notebook embeds that table.
No estimator logic, weighting math, variance/SE computation, identification assumptions, or default behaviors changed in this PR. The only runtime code delta is the shortened NotImplementedError text in diff_diff/survey.py:L1087-L1091, and behavior is unchanged.
The new survey-support claims are consistent with the Methodology Registry and current runtime gates, including SDR support and the 12-of-15 replicate-weight coverage summarized in docs/methodology/REGISTRY.md:L2310-L2325 and docs/choosing_estimator.rst:L583-L690.
One minor documentation sync issue remains: the roadmap still says the notebook is a 35-cell tutorial, but the notebook in this PR now has 37 cells.

Methodology

No findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: docs/survey-roadmap.md:L166-L173 still describes 16_survey_did.ipynb as a “35-cell tutorial,” but the notebook changed in this PR now contains 37 cells in docs/tutorials/16_survey_did.ipynb. That makes the roadmap stale again immediately after merge. Concrete fix: update the blurb to 37-cell tutorial, or better, remove the hardcoded cell count so future notebook edits do not require another sync change.
I did not rerun pytest tests/test_doc_snippets.py or notebook validation from the PR body.

Address P3: MultiPeriodDiD wild bootstrap warns, not errors

f54dea8

Split wild bootstrap row: DiD/TWFE raise NotImplementedError, MultiPeriodDiD warns and falls back to analytical inference. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Address P3: update 7c blurb to reflect tutorial cross-reference

2fee9b4

Tutorial Section 9 now links to the compatibility matrix rather than containing the table itself. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber merged commit c184e14 into main Apr 4, 2026
14 checks passed

igerber deleted the survey-docs-update branch April 4, 2026 16:14

Conversation

igerber commented Apr 4, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions bot commented Apr 4, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

igerber commented Apr 4, 2026

Uh oh!

github-actions bot commented Apr 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant