Skip to content

Update survey documentation: compatibility matrix, roadmap, deferred work#263

Merged
igerber merged 12 commits intomainfrom
survey-docs-update
Apr 4, 2026
Merged

Update survey documentation: compatibility matrix, roadmap, deferred work#263
igerber merged 12 commits intomainfrom
survey-docs-update

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 4, 2026

Summary

  • Add canonical survey compatibility matrix to choosing_estimator.rst (Phase 8f) with 15-estimator × 4-feature table, legend, and notes
  • Replace stale tutorial table (11 of 15 rows had errors) with cross-reference to the docs-hosted matrix
  • Mark Phase 8a–8e as shipped in survey-roadmap.md with version numbers
  • Consolidate all remaining NotImplementedError paths into a single "Deferred Work" section in survey-roadmap.md
  • Update replicate weight coverage from 4/15 → 12/15 estimators (v2.8.3 expansion) and add SDR to replicate method lists
  • Fix stale ROADMAP.md entries (version v2.7.5→v2.8.4, tutorial Open→Implemented, Staggered DDD marked Implemented)
  • Add cross-reference from TODO.md to consolidated deferred list

Methodology references (required if estimator / math changes)

  • N/A — documentation-only changes, no methodology or code changes

Validation

  • Tests added/updated: No test changes (documentation only)
  • pytest tests/test_doc_snippets.py passes — RST syntax validated
  • Notebook JSON validated (37 cells, all assertions pass)
  • All NotImplementedError paths in diff_diff/*.py verified against consolidated deferred list

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…work

Add survey compatibility matrix to choosing_estimator.rst (Phase 8f),
fix 11 stale entries in the tutorial table and replace with cross-reference,
mark Phase 8a-8e as shipped in survey-roadmap.md, consolidate all remaining
NotImplementedError paths into a single deferred work section, add SDR to
replicate method lists, and update ROADMAP.md version/status entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This PR is documentation-only, and the estimator implementations I spot-checked remain consistent with the methodology registry. I did find a few P2 documentation inconsistencies that are worth fixing if these docs are meant to be the canonical survey-support reference.

Executive Summary

  • No estimator, weighting, variance, or inference code changed, so I found no silent-correctness or methodology regressions in implementation.
  • docs/survey-roadmap.md now says the Callaway-Sant'Anna survey reg+covariates efficient DRDID nuisance correction is resolved, but the registry and code still document/use a conservative plug-in IF for that path.
  • The same roadmap is internally inconsistent: older phase tables still say several survey-bootstrap paths are deferred and still describe CS survey IPW/DR as no-covariate-only, while later rows mark those items resolved.
  • The new consolidated deferred-work section is not actually exhaustive; it omits the still-unsupported replicate-weight + bootstrap combinations in SunAbraham, ImputationDiD, and TwoStageDiD.

Methodology

  • Severity: P2. Impact: Affected method is CallawaySantAnna survey estimation_method="reg" with covariates. The roadmap now marks “Efficient DRDID nuisance IF for reg+covariates” as resolved in docs/survey-roadmap.md:L64, but the canonical registry still says this path uses a conservative plug-in IF and that the efficient correction is deferred in docs/methodology/REGISTRY.md:L437. The code also still computes residual-based plug-in IF terms in diff_diff/staggered.py:L2094. Because the deviation is still documented in REGISTRY.md, I’m treating this as a documentation-accuracy issue, not a P1 methodology defect. Concrete fix: change that roadmap row back to deferred/documented-deviation status and link to the registry note instead of calling it shipped.

Code Quality

  • No findings. The PR does not change executable code.

Performance

  • No findings. No runtime path changed.

Maintainability

Tech Debt

  • No findings. The TODO.md cross-reference is reasonable and does not hide a correctness issue.

Security

  • No findings. I did not see secrets, unsafe executable content, or other security-relevant changes.

Documentation/Tests

  • Severity: P2. Impact: The new “Deferred Work (Consolidated)” section claims to be the “single source of truth” for remaining survey limitations in docs/survey-roadmap.md:L276, but it omits the still-unsupported n_bootstrap > 0 + replicate-weight combinations enforced in diff_diff/sun_abraham.py:L519, diff_diff/imputation.py:L272, and diff_diff/two_stage.py:L251. Users relying on the docs will still hit unsupported combinations that are not listed. Concrete fix: add SunAbraham, ImputationDiD, and TwoStageDiD to the mutual-exclusion table in docs/survey-roadmap.md:L297, or soften the “single source of truth” wording; if the section is meant to describe exception behavior precisely, note that these three currently raise ValueError, not NotImplementedError.

…imators

- Revert efficient DRDID nuisance IF for reg+covariates to deferred status
  (code and REGISTRY.md still use conservative plug-in IF)
- Update phase summary table Notes to reflect resolved bootstrap+survey
  paths (SA, ContinuousDiD, EfficientDiD, ImputationDiD, TwoStageDiD, CS)
- Add SunAbraham, ImputationDiD, TwoStageDiD to bootstrap+replicate
  mutual exclusion table in consolidated deferred section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 8677e76fb856179cbed5a5056fe938c3ac3205c6


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This re-review resolves the prior methodology/documentation mismatch on CallawaySantAnna; one P2 documentation-accuracy issue remains in the new consolidated deferred-work section.

Executive Summary

  • P2: the new “single source of truth” section says all listed limitations raise NotImplementedError, but several listed bootstrap+replicate exclusions actually raise ValueError in code, so the docs still misstate the runtime contract.
  • The previous CallawaySantAnna survey reg+covariates roadmap mismatch is fixed: the roadmap now correctly defers the efficient nuisance IF and matches the documented registry deviation.
  • The previous omission of missing bootstrap+replicate exclusions is largely fixed: the consolidated deferred-work section now includes the previously missing estimators.
  • No estimator, weighting, variance, inference, performance, or security code changed, and I found no new methodology regressions in implementation.

Methodology

Code Quality

  • No findings. This PR does not change executable code.

Performance

  • No findings. No runtime path changed.

Maintainability

  • No findings.

Tech Debt

  • No findings. The new TODO.md cross-reference is reasonable and does not mask a correctness issue.

Security

  • No findings. I did not see secrets, unsafe executable content, or other security-relevant changes.

Documentation/Tests

  • Severity: P2. Impact: docs/survey-roadmap.md:L276-L314 says the remaining survey limitations “raise NotImplementedError when attempted,” and the bootstrap+replicate subsection states the listed combinations raise NotImplementedError. That is not true for at least diff_diff/sun_abraham.py:L518-L523, diff_diff/imputation.py:L269-L275, and diff_diff/two_stage.py:L249-L254, which raise ValueError for the documented n_bootstrap > 0 + replicate-weight exclusion. Because this section is now presented as the canonical deferred-work reference, readers or tests relying on exception type will still be misled. Concrete fix: change the wording to “raise an error” or list the exact exception class per estimator; alternatively, standardize those guards to NotImplementedError if that is the intended public contract.

Some bootstrap+replicate exclusions raise ValueError (not
NotImplementedError). Update wording to "raise an error" to
accurately reflect the runtime contract.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 38f087baeaa65679ca10eadd45315dd10c50890a


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. The prior re-review finding on exception-type wording is resolved; the remaining issues are documentation consistency/accuracy only.

Executive Summary

  • The prior P2 is fixed: the new consolidated deferred-work section now correctly says unsupported survey combinations may raise either NotImplementedError or ValueError, which matches live code in docs/survey-roadmap.md:276, diff_diff/sun_abraham.py:519, diff_diff/imputation.py:272, and diff_diff/two_stage.py:250.
  • P2: the new survey-support docs now present broader CallawaySantAnna replicate-weight support than the Methodology Registry still records, so the public docs and the methodology source-of-truth disagree on the supported contract for that method.
  • P2: ROADMAP.md:19 still says replicate-weight survey support is integrated “across all estimators,” but the new matrix/deferred-work docs in this PR explicitly say replicate weights are supported for only 12 of 15 estimators.
  • P3: the new “single source of truth” wording for remaining survey limitations is too strong; at least one live survey restriction (absorb with multiple variables plus survey weights) still lives only in TODO.md/code, not in the consolidated list.
  • No estimator code, variance math, weighting logic, or inference code changed in this PR.

Methodology

  • Severity: P2. Impact: the new docs imply full/canonical CallawaySantAnna replicate-weight survey support in docs/choosing_estimator.rst:611 and docs/survey-roadmap.md:49, but the Methodology Registry’s replicate-weight support matrix still narrows CallawaySantAnna to replicate support “without covariates, no bootstrap” at docs/methodology/REGISTRY.md:2310. That leaves the canonical public docs and the registry out of sync on a methodology-adjacent support boundary. Concrete fix: either update the registry replicate-support matrix to the intended current contract, or qualify the new public docs to the narrower support. If covariate+replicate support is intended, add a targeted coverage test; current replicate tests only pin no-covariate IPW/DR coverage at tests/test_survey_phase6.py:1525.

Code Quality

No findings. Documentation-only PR.

Performance

No findings. No runtime path changed.

Maintainability

No findings in executable code.

Tech Debt

  • Severity: P3. Impact: the new consolidated section says it is “the single source of truth for remaining survey limitations” at docs/survey-roadmap.md:276, and TODO.md now points readers there at TODO.md:18, but the live survey limitation for multiple absorbed fixed effects with survey weights is still tracked separately in TODO.md:62 and enforced in code at diff_diff/estimators.py:264 and diff_diff/estimators.py:1102. Because that limitation is already tracked, this is informational only, but the new wording overstates completeness. Concrete fix: either add this restriction to the consolidated list or soften the wording to “summary of major remaining survey limitations.”

Security

No findings. I did not see secrets, unsafe executable content, or other security-relevant changes.

Documentation/Tests

  • Severity: P2. Impact: ROADMAP.md:19 now says survey support with replicate weights is integrated “across all estimators,” but the compatibility matrix and deferred-work section added/updated in this PR say replicate weights are supported for only 12 of 15 estimators and are rejected for SyntheticDiD, TROP, and BaconDecomposition at docs/choosing_estimator.rst:596, docs/survey-roadmap.md:103, and docs/survey-roadmap.md:281. The top-level roadmap now contradicts the more precise canonical docs in the same PR. Concrete fix: reword the roadmap bullet to say survey support exists across all estimators, with replicate weights supported for 12 of 15 estimators, and link readers to the compatibility matrix for details.

…ording

- Update REGISTRY.md replicate-weight support matrix: CS now supports
  covariates with replicate weights (IF-based path is covariate-agnostic,
  shipped in Phase 7a)
- Qualify ROADMAP.md: "replicate weights supported for 12 of 15" instead
  of "across all estimators"
- Soften consolidated deferred section from "single source of truth" to
  "summary of major remaining limitations" with TODO.md cross-reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 7180d94d3c8748a91218affb426b4beef110bde4


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. The prior roadmap-coverage issue is fixed, and the prior CallawaySantAnna support-boundary mismatch is largely resolved. One P2 documentation inconsistency remains in the Methodology Registry, plus one minor stale runtime-message mismatch.

Executive Summary

  • The prior ROADMAP.md overstatement on replicate-weight coverage is resolved; the roadmap now points readers to the 12-of-15 compatibility matrix.
  • The prior CallawaySantAnna registry/public-doc mismatch is mostly resolved; REGISTRY.md now reflects survey support for covariates in reg/ipw/dr.
  • P2: the updated replicate-support note in docs/methodology/REGISTRY.md still misstates the actual 12-estimator set by counting LinearRegression and omitting StaggeredTripleDifference.
  • P3: the new deferred-work docs are current, but the live wild-bootstrap-with-survey error message still points users to future “planned Phase 5 support.”
  • No estimator code, variance math, weighting logic, or inference paths changed in this PR.

Methodology

  • Severity: P2. Impact: docs/methodology/REGISTRY.md:L2310-L2324 says “12 of 15 estimators” support replicate weights, but the enumerated supported set includes LinearRegression and omits StaggeredTripleDifference. That conflicts with the public compatibility matrix in docs/choosing_estimator.rst:L591-L650, the roadmap coverage list in docs/survey-roadmap.md:L103-L109, and the existing staggered DDD replicate-support tests in tests/test_survey_staggered_ddd.py:L376-L445. This leaves the Methodology Registry internally inconsistent about the actual supported estimator contract. Concrete fix: replace LinearRegression (OLS path) with StaggeredTripleDifference, or rewrite the note to explicitly distinguish helper-level implementation paths from public-estimator coverage.

No other methodology findings. This PR is documentation-only; no estimator implementation or variance logic changed.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings beyond the documentation drift noted below.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: the new consolidated deferred-work section in docs/survey-roadmap.md:L274-L325 presents remaining survey limitations as current deferred items, but the live wild-bootstrap guard in diff_diff/survey.py:L1087-L1091 still tells users to see the roadmap for “planned Phase 5 support.” This is minor, but it leaves one user-facing error path out of sync with the updated roadmap status. Concrete fix: update that message to say wild bootstrap + survey weights is currently unsupported and point to the consolidated deferred-work section instead.

- Replace LinearRegression (internal helper) with StaggeredTripleDifference
  (public estimator) in REGISTRY.md replicate-weight support matrix
- Update wild bootstrap + survey error message to remove stale "planned
  Phase 5 support" reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 1bb37b17e8115e63b3f87755ef2c21774ea7a53d


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. The prior registry-support mismatch is fixed. Remaining issues are limited to documentation consistency and one user-facing runtime-message mismatch.

Executive Summary

  • The prior P2 is resolved: the replicate-weight note in docs/methodology/REGISTRY.md:L2310-L2325 now matches the 12-of-15 public support boundary in docs/choosing_estimator.rst:L596-L670.
  • The prior stale “planned Phase 5 support” message is gone, but the replacement text in diff_diff/survey.py:L1087-L1091 now overgeneralizes by recommending n_bootstrap > 0 even on estimators that do not expose that API.
  • Severity P2: the survey docs still disagree on replicate-weight support. ROADMAP.md:L11-L19 says replicate-weight inference is integrated across all estimators, while docs/survey-roadmap.md:L93-L96 still omits SDR.
  • No estimator implementation, identification assumptions, weighting logic, variance/SE math, or defaults changed in this PR.
  • No P0/P1 findings identified.

Methodology

No findings. Cross-checking the changed registry note against the current compatibility matrix shows the previous public-contract mismatch is resolved: docs/methodology/REGISTRY.md:L2310-L2325, docs/choosing_estimator.rst:L596-L670.

Code Quality

  • Severity: P3. Impact: The shared survey guard in diff_diff/survey.py:L1087-L1091 tells users to “use survey-aware multiplier bootstrap (n_bootstrap > 0)”, but that helper is invoked from estimators that only expose inference='wild_bootstrap' and have no n_bootstrap parameter, including diff_diff/estimators.py:L237-L242 and diff_diff/twfe.py:L124-L129. That makes the new runtime guidance inaccurate for DifferenceInDifferences, MultiPeriodDiD, and TwoWayFixedEffects; the compatibility matrix also marks those rows as having no survey bootstrap path in docs/choosing_estimator.rst:L596-L610. Concrete fix: make the shared message generic (“use analytical survey inference; survey-aware bootstrap is estimator-specific”) or move estimator-specific bootstrap guidance into callers that actually support n_bootstrap.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

… msg

- ROADMAP.md: restructure to say "survey-aware inference across all 15
  estimators; replicate weights supported for 12 of 15"
- survey-roadmap.md Phase 6: add SDR to replicate method list
- survey.py: make wild bootstrap error message generic (not all estimators
  expose n_bootstrap)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 51bff5551cab2f38dc9c0e69ba0a8a2760ca79e3


Overall Assessment

✅ Looks good

Executive Summary

  • The prior replicate-weight documentation mismatch is resolved: the registry, survey roadmap, estimator matrix, and tutorial now all agree on SDR support and the 12-of-15 replicate-weight boundary (REGISTRY:L2310-L2325, survey-roadmap.md:L93-L109, choosing_estimator.rst:L596-L670, 16_survey_did.ipynb:L532-L541).
  • No estimator implementation, weighting logic, identification assumptions, variance formulas, or defaults changed in this PR. The only code change is the survey wild-bootstrap guard message, and that prior guidance issue is fixed (survey.py:L1087-L1090).
  • Remaining issues are documentation-only: ROADMAP.md still overclaims survey inference coverage, docs/survey-roadmap.md now contradicts itself about whether Phase 8 is fully done, and the new wild-bootstrap deferred-work row still points users toward a bootstrap fallback that the compatibility matrix says does not exist for DID/TWFE/MultiPeriod.

Methodology

No findings. This PR does not change estimator math, SE/VCOV computation, weighting, assumptions, or defaults, and the updated replicate-weight documentation is now aligned with the methodology registry (REGISTRY:L2310-L2325, choosing_estimator.rst:L596-L670).

Code Quality

No findings. The changed runtime message in survey.py:L1087-L1090 is consistent with the current API and removes the stale “planned Phase 5” wording.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The new cross-reference in TODO.md is appropriate tracking, not a blocker.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: ROADMAP.md still overstates survey methodology coverage. It says Taylor linearization and replicate weights are integrated across all estimators and that survey-aware inference exists across all 15 estimators, but the canonical matrix shows SyntheticDiD and TROP are bootstrap-only for full-design surveys and BaconDecomposition is diagnostic-only; the Bacon implementation also explicitly says no survey vcov is computed and replicate-weight designs are rejected (ROADMAP.md:L11-L19, choosing_estimator.rst:L596-L670, bacon.py:L438-L470). Concrete fix: reword the roadmap summary to say survey support exists across all 15 estimators, with TSL/replicate/bootstrap support varying by estimator, and explicitly carve out BaconDecomposition as diagnostic-only.
  • Severity: P2. Impact: docs/survey-roadmap.md now opens with “All phases (1-8) are implemented,” but later in the same file marks Phase 8g as only “Partially addressed” and keeps those items in the deferred section, so the roadmap contradicts itself (survey-roadmap.md:L1-L4, survey-roadmap.md:L264-L330). Concrete fix: change the intro to something like “Phases 1-8f are implemented; remaining Phase 8g documentation items are deferred below.”
  • Severity: P3. Impact: The new deferred-work row for “Wild bootstrap + survey weights” says “(all estimators)” should use analytical SEs or “survey-aware multiplier bootstrap instead,” but the compatibility matrix shows DifferenceInDifferences, TwoWayFixedEffects, and MultiPeriodDiD have no survey-bootstrap path at all (--), and the shared guard now correctly tells users to fall back to analytical survey inference (survey-roadmap.md:L303-L325, choosing_estimator.rst:L596-L610, survey.py:L1087-L1090). Concrete fix: scope that row to the estimators that actually expose inference='wild_bootstrap', and state that their supported fallback is analytical survey inference; if users want bootstrap + survey, point them to estimators whose matrix row lists a survey-bootstrap method.

…otstrap

- ROADMAP.md: distinguish survey weights (all 15) from design-based
  variance (varies by estimator), carve out BaconDecomposition
- survey-roadmap.md: header says "Phases 1-8f implemented" (8g partial)
  instead of "All phases implemented"
- Deferred work: scope wild bootstrap row to DiD/TWFE/MultiPeriod
  (the estimators that expose inference='wild_bootstrap')

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 815e0e44adae7523eb25b3b00d1a62d11279457c


Overall Assessment

✅ Looks good

Highest unmitigated issue is P2 documentation accuracy. I did not find any P0/P1 methodology, inference, or correctness defects in the changed code.

Executive Summary

Methodology

  • No findings. This PR does not change estimator implementations or inference math, and the only methodology-adjacent updates are documentation changes that remain consistent with the registry, including the documented CallawaySantAnna survey reg+covariates deviation from R (docs/methodology/REGISTRY.md:L437, docs/methodology/REGISTRY.md:L2310).

Code Quality

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. The new cross-reference from TODO.md to the consolidated survey deferred-work list is appropriate tracking, not a mitigation attempt for a correctness issue (TODO.md:L18, docs/survey-roadmap.md:L275).

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: [Previously raised, still open] ROADMAP.md still overstates survey inference coverage. The opening sentence says Taylor linearization and replicate weights are integrated across all estimators, but the compatibility matrix and registry now say support varies by estimator, with replicate weights available for only 12 of 15 public estimators and SyntheticDiD, TROP, and BaconDecomposition excluded. Concrete fix: rewrite the opening sentence to say survey support is library-wide, while TSL / replicate / survey-bootstrap support varies by estimator, and point directly to the compatibility matrix (ROADMAP.md:L11, ROADMAP.md:L19, docs/choosing_estimator.rst:L656, docs/choosing_estimator.rst:L661, docs/choosing_estimator.rst:L666, docs/methodology/REGISTRY.md:L2310).
  • Severity: P3. Impact: the new “Deferred Work (Consolidated)” intro overpromises runtime-error semantics. It says “All items below raise an error when attempted,” but the final “Documentation-Only (Phase 8g)” subsection is not a runtime limitation. Concrete fix: move the Phase 8g bullets outside the consolidated runtime-limitation section, or change the intro to distinguish runtime limitations from documentation tasks (docs/survey-roadmap.md:L277, docs/survey-roadmap.md:L328).

…erred

- ROADMAP.md line 11: remove "Taylor linearization, replicate weights
  integrated across all estimators" — now says "all estimators accept
  survey weights, with design-based variance varying by estimator"
- survey-roadmap.md: move Phase 8g documentation tasks into their own
  section outside the consolidated runtime-limitations block

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 1b174a63ebbdf14c00633c0842e8841800b56d56


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This re-review resolves the prior documentation-overstatement issues. The only remaining issue I found is a P3 documentation wording mismatch in the consolidated survey deferred-work intro.

Executive Summary

Methodology

Code Quality

  • No findings. The lone code change is an error-message simplification in the shared survey resolver, and it is consistent with the documented wild_bootstrap + survey_design rejection for DifferenceInDifferences, TwoWayFixedEffects, and MultiPeriodDiD (diff_diff/survey.py:L1087-L1090, docs/survey-roadmap.md:L323-L326).

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

Security

  • No findings.

Documentation/Tests

Move CallawaySantAnna conservative plug-in IF entry into its own
"Documented Deviations" subsection (supported path, not an error).
Runtime limitations intro now accurately describes only error-raising
items.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 92df37c1c1234227a2f0b39002bbe6592ae7e1ac


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This re-review resolves the prior deferred-work wording issue. I found one remaining P3 documentation mismatch in the consolidated survey limitations list.

Executive Summary

Methodology

No findings. The updated survey-support notes remain aligned with the Methodology Registry, including the documented Callaway-Sant’Anna survey deviation and the expanded replicate-weight matrix. docs/methodology/REGISTRY.md:L434 docs/methodology/REGISTRY.md:L2310

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The new TODO.md cross-reference is appropriate tracking and does not attempt to TODO-away a correctness issue. TODO.md:L18

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: the consolidated runtime-limitations section overstates one runtime behavior. It says all listed items raise NotImplementedError/ValueError, but the MultiPeriodDiD case listed there does not error; fit() warns and uses analytical inference when inference='wild_bootstrap'. That can mislead readers about actual API behavior. Concrete fix: split that row so DifferenceInDifferences and TwoWayFixedEffects remain under runtime rejections, and either remove MultiPeriodDiD from the error list or reword it as a warning/fallback limitation. docs/survey-roadmap.md:L288 docs/survey-roadmap.md:L330 diff_diff/estimators.py:L923 diff_diff/estimators.py:L1227
  • No testing blocker identified from the diff itself. I did not rerun the notebook/RST validation commands cited in the PR body.

Split wild bootstrap row: DiD/TWFE raise NotImplementedError,
MultiPeriodDiD warns and falls back to analytical inference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: f54dea835ccebdcbae7642e7f59369e13d458317


Overall Assessment

Looks good

Highest unmitigated severity: P3.

Executive Summary

  • Re-review status: the prior documentation issue is only partially resolved. The MultiPeriodDiD row now describes the fallback correctly, but the enclosing “Runtime Limitations” intro still says every listed item raises an error.
  • No estimator math, weighting formulas, identification assumptions, variance/SE logic, or default behaviors changed in this PR. The only runtime code delta is the shortened survey wild-bootstrap error message in diff_diff/survey.py:1087, and it remains consistent with current behavior.
  • The new survey compatibility matrix is otherwise aligned with the Methodology Registry and the estimator docstrings, including SDR support and 12-of-15 replicate-weight coverage.
  • No unmitigated P0/P1 findings.

Methodology

No findings. I cross-checked the changed survey-support claims in docs/choosing_estimator.rst:583, docs/methodology/REGISTRY.md:2310, diff_diff/staggered.py:1431, diff_diff/imputation.py:213, diff_diff/two_stage.py:209, and diff_diff/survey.py:1087; the updated docs are consistent on the materially changed claims.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The new cross-reference in TODO.md:18 appropriately points readers to the consolidated deferred-work section in docs/survey-roadmap.md:275.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: the consolidated runtime-limitations section still contradicts itself. The intro says all listed items raise NotImplementedError/ValueError, but the MultiPeriodDiD row directly below says inference='wild_bootstrap' with survey_design warns and falls back to analytical inference instead of raising. That leaves the prior re-review item only partially resolved and can still misstate actual API behavior. Concrete fix: either move MultiPeriodDiD into a separate warning/fallback subsection, or soften the intro so only the true error cases are described as raising. Refs: docs/survey-roadmap.md:288, docs/survey-roadmap.md:331, diff_diff/estimators.py:923, diff_diff/estimators.py:927.
  • I did not rerun the notebook or RST validation commands cited in the PR body.

MultiPeriodDiD wild bootstrap warns and falls back rather than raising.
Move it into its own "Warning/Fallback Behaviors" subsection outside
the runtime-error block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 2168e2c72d275a0845db3106651c2f7f3a6d4703


Overall Assessment

✅ Looks good

Highest unmitigated severity: P3.

Executive Summary

  • Re-review status: the prior documentation issue is resolved. The consolidated deferred-work section now separates true runtime errors from the MultiPeriodDiD warning/fallback path, so it no longer contradicts current behavior.
  • No estimator math, weighting formulas, identification assumptions, variance/SE logic, or default behaviors changed in this PR. The only runtime code delta is the shortened error text in diff_diff/survey.py:1087, and behavior is unchanged.
  • The changed survey-support claims are consistent with the Methodology Registry and current code on the material points: SDR support, 12-of-15 replicate-weight coverage, CallawaySantAnna survey covariates, and StaggeredTripleDifference survey support.
  • One minor documentation sync issue remains: the roadmap still describes the survey tutorial as containing an estimator support table, but the notebook now links to the canonical matrix instead of embedding that table.

Methodology

  • No findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: the tutorial blurb in docs/survey-roadmap.md:166 is now stale relative to the notebook change. It still says the tutorial covers an estimator support reference table, but Section 9 of the notebook now only cross-references the canonical matrix and lists highlights in docs/tutorials/16_survey_did.ipynb:532. Concrete fix: update the roadmap blurb to say the notebook links to the central compatibility matrix, rather than saying it includes the table itself.
  • No test findings from the diff itself. I did not rerun the notebook or RST validation commands cited in the PR body.

Tutorial Section 9 now links to the compatibility matrix rather than
containing the table itself.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 4, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 2fee9b419ed16626e71fdae9c327a11347d2969b


Overall Assessment

✅ Looks good

Highest unmitigated severity: P3.

Executive Summary

  • Re-review status: the prior documentation issue is resolved. The roadmap/tutorial now point readers to the canonical survey compatibility matrix instead of saying the notebook embeds that table.
  • No estimator logic, weighting math, variance/SE computation, identification assumptions, or default behaviors changed in this PR. The only runtime code delta is the shortened NotImplementedError text in diff_diff/survey.py:L1087-L1091, and behavior is unchanged.
  • The new survey-support claims are consistent with the Methodology Registry and current runtime gates, including SDR support and the 12-of-15 replicate-weight coverage summarized in docs/methodology/REGISTRY.md:L2310-L2325 and docs/choosing_estimator.rst:L583-L690.
  • One minor documentation sync issue remains: the roadmap still says the notebook is a 35-cell tutorial, but the notebook in this PR now has 37 cells.

Methodology

No findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: docs/survey-roadmap.md:L166-L173 still describes 16_survey_did.ipynb as a “35-cell tutorial,” but the notebook changed in this PR now contains 37 cells in docs/tutorials/16_survey_did.ipynb. That makes the roadmap stale again immediately after merge. Concrete fix: update the blurb to 37-cell tutorial, or better, remove the hardcoded cell count so future notebook edits do not require another sync change.
  • I did not rerun pytest tests/test_doc_snippets.py or notebook validation from the PR body.

@igerber igerber merged commit c184e14 into main Apr 4, 2026
14 checks passed
@igerber igerber deleted the survey-docs-update branch April 4, 2026 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant