Presence-claim audit workflow + README/lead-paragraph scoring fixes #25

Merged: saagpatel merged 1 commit into main from feat/portfolio-claim-audit on May 30, 2026

@saagpatel
Owner

What this is

An external dynamic-workflow audit that independently re-checks the portfolio-truth snapshot's six presence claims against on-disk ground truth — and uses it to find and verify two scoring fixes in the auditor itself. Read-only; never mutates repos, the snapshot, or git.

The audit (read-only)

  • src/run_instructions_audit.py — deterministic pre-step: stratified pilot selection, evidence prep, live tool_today recompute, git-drift detection, and the bucket logic (TDD'd).
  • scripts/presence-claims-audit.workflow.js — the Workflow: fans out one Haiku verifier per repo (judging all 6 claims in one read, blind to the tool's answer) → deterministic per-(repo, claim) tally → one Sonnet synthesis.
  • scripts/run-instructions-audit.workflow.js — original single-claim version, superseded by presence-claims (kept as the simpler example).
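
The deterministic per-(repo, claim) tally described above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the function name `tally_agreement`, the dictionary shapes, and the two example repos are all hypothetical, and only the claim names `project_summary` and `stack` come from this description.

```python
from collections import defaultdict


def tally_agreement(tool_claims, verifier_verdicts):
    """Count, per claim, how often the tool's answer matches the blind verifier's.

    Both arguments are assumed to be {repo: {claim: bool}} mappings (hypothetical shape).
    Returns {claim: (agreements, total_compared)}.
    """
    counts = defaultdict(lambda: [0, 0])  # claim -> [agreements, total]
    for repo, claims in tool_claims.items():
        for claim, tool_value in claims.items():
            verdict = verifier_verdicts.get(repo, {}).get(claim)
            if verdict is None:
                continue  # no verifier verdict for this pair; skip rather than guess
            counts[claim][1] += 1
            counts[claim][0] += int(verdict == tool_value)
    return {claim: (agree, total) for claim, (agree, total) in counts.items()}


# Hypothetical two-repo example (not pilot data).
tool = {
    "repo-a": {"project_summary": True, "stack": True},
    "repo-b": {"project_summary": False, "stack": True},
}
verifier = {
    "repo-a": {"project_summary": True, "stack": True},
    "repo-b": {"project_summary": True, "stack": True},
}
result = tally_agreement(tool, verifier)
```

Because the tally is a pure function of the two inputs, holding the verifier verdicts constant and recomputing only the tool's answers isolates the effect of a scoring change.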

The fixes (analyze_project_context), both verified by the audit

  1. README fallback — presence claims now consider the top-level README.md, not only the primary context file. Wires the previously-dormant readme_text parameter; primary-file identity unchanged (surgical).
  2. Lead-paragraph fallback — a project summary is detected as the prose under the # Title, not only under an ## Overview section.
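
The two fallbacks can be sketched together as below. This is an illustration under stated assumptions, not the implementation in `analyze_project_context`: the helper names (`section_body`, `summary_present`), the regular expressions, and the four-word threshold are assumptions made for the example, and headings inside fenced code are not handled.

```python
import re
from typing import Optional

H1 = re.compile(r"#\s+\S")                         # "# Title"
OVERVIEW = re.compile(r"##\s+Overview\s*$", re.IGNORECASE)
ANY_HEADING = re.compile(r"#{1,6}\s")
MIN_WORDS = 4  # assumed threshold for "non-trivial" text


def section_body(markdown: str, heading: re.Pattern) -> str:
    """Return the text between the first line matching `heading` and the next heading."""
    collected, inside = [], False
    for line in markdown.splitlines():
        if not inside:
            inside = bool(heading.match(line))
            continue
        if ANY_HEADING.match(line):
            break
        collected.append(line)
    return "\n".join(collected).strip()


def has_summary(markdown: str) -> bool:
    """True if an '## Overview' body or the lead paragraph under the H1 is non-trivial."""
    candidates = (section_body(markdown, OVERVIEW), section_body(markdown, H1))
    return any(len(text.split()) >= MIN_WORDS for text in candidates)


def summary_present(primary_text: str, readme_text: Optional[str] = None) -> bool:
    """Check the primary context file first, then fall back to the README text."""
    if has_summary(primary_text):
        return True
    return readme_text is not None and has_summary(readme_text)


primary = "# Tool\n\n## Install\n\npip install tool\n"
readme = "# Tool\n\nA small CLI that audits repository context files.\n\n## Usage\n"
```

With these inputs, `summary_present(primary)` is false, while `summary_present(primary, readme)` is true because the README's lead paragraph supplies the summary.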

Verification (deterministic — verifier verdicts held constant, only tool_today recomputed)

| step | overall agreement | project_summary |
| --- | --- | --- |
| baseline | 76/96 = 79% | 12/16 (75%) |
| + README fallback | 82/96 = 85% | 12/16 (75%) |
| + lead-paragraph | 86/96 = 90% | 16/16 (100%) |
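
For reference, the denominators follow from the pilot size: 16 repos times 6 claims gives 96 (repo, claim) pairs overall, and 16 pairs per individual claim. A short check of the rounded percentages, using only the counts reported in this PR:

```python
REPOS, CLAIMS = 16, 6
TOTAL_PAIRS = REPOS * CLAIMS  # 96 (repo, claim) pairs

# Agreement counts as reported in this PR description.
overall = {"baseline": 76, "+ README fallback": 82, "+ lead-paragraph": 86}

overall_pct = {step: round(100 * agree / TOTAL_PAIRS) for step, agree in overall.items()}
summary_before_pct = round(100 * 12 / REPOS)
summary_after_pct = round(100 * 16 / REPOS)

for step, pct in overall_pct.items():
    print(f"{step}: {overall[step]}/{TOTAL_PAIRS} = {pct}%")
```

The README fallback adds 6 agreeing pairs and the lead-paragraph fallback adds 4 more, all 4 of which land on `project_summary` (12/16 to 16/16).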

stack and project_summary reach 100%. Adds the first direct unit coverage for analyze_project_context (it had none).

Test plan

  • pytest -q: 2091 passed, 2 skipped, 0 regressions
  • ruff check clean
  • Every audit disagreement hand-validated against the files on disk

Notes / out of scope

  • Canonical portfolio-truth-latest.json is not regenerated — these fixes shift context_quality portfolio-wide; that actualization (plus merge-gate/tier review) is a separate step.
  • Residual ~10% is mostly non-heuristic: a malformed-AGENTS.md-generator fence, a boilerplate-vs-real judgment, an auditor-audits-itself branch confound, and deferred bespoke-heading cases.
  • Full design + pilot results in docs/plans/2026-05-29-run-instructions-external-audit.md.

…ixes

Adds an external dynamic-workflow audit that independently re-checks the
portfolio-truth snapshot's six presence claims against on-disk ground truth,
and uses it to find and verify two scoring fixes in the auditor itself.

Audit (read-only):
- src/run_instructions_audit.py: deterministic pre-step (stratified pilot
  selection, evidence prep, live tool_today recompute, git drift) + bucket logic
- scripts/presence-claims-audit.workflow.js: Workflow that fans out one Haiku
  verifier per repo (judging all 6 claims), deterministic tally, Sonnet synthesis
- scripts/run-instructions-audit.workflow.js: original single-claim version,
  superseded by presence-claims (kept as the simpler example)

Auditor fixes in analyze_project_context, both verified by the audit:
- README fallback: presence claims now consider the top-level README, not only
  the primary context file (wires the previously-dormant readme_text param)
- lead-paragraph fallback: a project summary is detected as the prose under the
  H1 title, not only under an "## Overview" section

Verified deterministically on a 16-repo pilot: overall agreement 79% -> 90%,
project_summary 75% -> 100%, stack 75% -> 100%. Adds direct unit coverage for
analyze_project_context (previously untested). Full suite: 2091 passed.

Canonical portfolio-truth-latest.json intentionally NOT regenerated (the fixes
shift context_quality portfolio-wide; that actualization is a separate step).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c101c6058

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +152 to +153
if readme_path.is_file():
    readme_text = _read_small_text(readme_path)

P2: Keep README fallback within context-file limits

When a top-level README.md exists but was excluded by _collect_context_files because it exceeds MAX_CONTEXT_BYTES, this fallback still reads the entire file and lets it drive all presence claims and context_quality. In the normal discovery path, that means oversized READMEs that the collector intentionally filtered out can now be scored as valid context, and the read is no longer bounded; gate this fallback on README.md being in context_file_names or apply the same size limit before reading.
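
One way the second option (apply the same size limit before reading) might look is sketched below. It is an assumption-laden illustration: `read_readme_if_small` is a hypothetical helper, and the 64 KiB value given to `MAX_CONTEXT_BYTES` here is a placeholder, since the real limit is defined in the auditor and is not shown in this PR page.

```python
from pathlib import Path
from typing import Optional

MAX_CONTEXT_BYTES = 64 * 1024  # placeholder value; the auditor defines the real limit


def read_readme_if_small(repo_root: Path, limit: int = MAX_CONTEXT_BYTES) -> Optional[str]:
    """Return README.md text only if the file exists and is within the size limit."""
    readme_path = repo_root / "README.md"
    if not readme_path.is_file():
        return None
    if readme_path.stat().st_size > limit:
        return None  # oversized README: treat it like a file the collector filtered out
    return readme_path.read_text(encoding="utf-8", errors="replace")
```

Checking `st_size` before reading keeps the read bounded, so an oversized README cannot drive the presence claims. The other option, gating on `README.md` being in `context_file_names`, would reuse the collector's decision instead of repeating the size check.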

Comment on lines +322 to +323
def _has_lead_summary(text: str) -> bool:
    return _is_nontrivial_text(_lead_paragraph_text(text))

P2: Don't treat list-only leads as summaries

For READMEs whose area before the first ## is only a table of contents or navigation list, _lead_paragraph_text preserves the link/list text and _is_nontrivial_text will mark project_summary_present=True once it has four words. That over-claims the project summary and can promote context_quality even though there is no prose saying what the project is; strip list-only/TOC leads or require at least one non-list prose sentence before accepting the fallback.
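
A rough sketch of the "require at least one non-list prose sentence" idea follows. The names `LIST_ITEM`, `prose_lines`, and `has_prose_summary` are hypothetical, the four-word threshold mirrors the one mentioned in this comment, and the list-item pattern is a simplification that ignores badges and link-only lines.

```python
import re

# Bullet ("-", "*", "+") or numbered ("1.", "2)") list items; a simplification.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


def prose_lines(lead: str) -> list:
    """Keep only non-empty lines of the lead that are not list items."""
    return [line for line in lead.splitlines() if line.strip() and not LIST_ITEM.match(line)]


def has_prose_summary(lead: str, min_words: int = 4) -> bool:
    """Accept the lead as a summary only if its non-list lines reach the word threshold."""
    words = " ".join(prose_lines(lead)).split()
    return len(words) >= min_words


toc_only = "- [Install](#install)\n- [Usage](#usage)\n- [Contributing guide](#contributing)\n- [License](#license)"
with_prose = "Audits repository context files against disk.\n\n- [Install](#install)"
```

Here `has_prose_summary(toc_only)` is false because every line is a list item, while `has_prose_summary(with_prose)` is true because the first line is a six-word prose sentence.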

@saagpatel saagpatel merged commit d41e46a into main May 30, 2026
3 checks passed
@saagpatel saagpatel deleted the feat/portfolio-claim-audit branch May 30, 2026 08:55