Skip to content

feat(capture): add the failed-then-fixed learning trigger#3568

Merged
marcusrbrown merged 7 commits into
mainfrom
feat/c2-failed-then-fixed
Jun 24, 2026
Merged

feat(capture): add the failed-then-fixed learning trigger#3568
marcusrbrown merged 7 commits into
mainfrom
feat/c2-failed-then-fixed

Conversation

@marcusrbrown

Copy link
Copy Markdown
Collaborator

Adds a second source to the learning-capture run: merged PRs whose CI failed then passed before merge. The fixing change becomes the learning, distilled from real evidence rather than the PR title.

What it does

  • Detection is one GraphQL query per PR — per-commit check conclusions across the PR's history — walked chronologically to find the last-failing and first-passing commits for a required check. Only real failures count (cancelled runs and policy gates are excluded).
  • Evidence is the fixing diff (primary, always available) plus a best-effort failing-log excerpt that degrades to a placeholder when logs are purged.
  • Privacy layers a new secret scan (PATs, cloud/LLM keys, JWTs, connection strings, generic credentials) and structural redaction (paths, hostnames, bearer tokens) on top of the existing private-name gate. Every evidence field — diff, log, and the failing check name — is redacted and scanned before it reaches the agent, fail-closed, with the open step re-checking the authored body as a final gate before posting.
  • Both triggers share the same dedup, per-run cap, and proposal flow; a PR matching both yields one proposal.

Hardening from review

Bounded gh timeouts and capped pagination so a hung process can't stall the run; recognized errored status contexts; widened the secret patterns; and added precedence, scan-availability, and log-redaction tests.

The brainstorm and plan ship in this branch. The accepted value-gate risk (corpus adoption not yet measured) is recorded in the requirements doc.

Add the second capture trigger: PRs whose CI failed then passed before merge feed
the existing propose-only pipeline. Transition detection is one GraphQL query per
PR; the fixing diff is the primary signal with best-effort failing logs; a new
secret/redaction scan layers on the private-name gate because logs and diffs leak
more than repo names; a discriminated candidate union keeps dedup, cap, and
privacy shared across both triggers.
…vacy module

CI logs and diffs leak more than repo names — tokens, keys, connection strings,
internal hostnames, and paths. Add a block scan for hard secrets (PATs, private
keys, credential-bearing connection strings, cloud/LLM keys) and a structural
redaction for paths, internal hostnames, and bearer/authorization values, both
pure and callable from the same shared module the existing private-name gate uses.
Make Candidate a trigger-discriminated union so a second source can share the
digest's dedup, cap, and privacy path without overloading review-specific fields.
The privacy step now redacts structural secrets and drops on a residual hard
secret as well as a private name. Collapse candidates to one per merge SHA within
a run so a PR matching both triggers yields a single proposal, review prose
taking precedence. The review source emits the same shape as before.
…ence

Add the CI fail-then-pass harvester: one GraphQL query per PR yields per-commit
check conclusions, a chronological walk finds the last-failing and first-passing
commits for a required check, and the fixing diff between them becomes the primary
evidence. A best-effort failing-log excerpt enriches it and degrades to a
placeholder when logs are gone. The diff and log pass the same redact-and-scan
privacy gate as review prose, dropping on a private name or residual secret. A PR
with no fixing diff (a bare green re-run) is not a candidate.
Fetch merged PRs once and feed both harvesters, resolve the required-check set
from branch protection (falling back to all checks when none are configured),
and concatenate both sources into the digest. The agent prompt now branches on
the trigger so a failed-then-fixed candidate is distilled from its fixing diff
and failure log, and the run summary reports the failed-then-fixed counts and the
secret-blocked count.
Bound each gh invocation with a timeout and cap commit pagination so a hung
process can't stall the run. Stop treating cancelled and action-required checks
as failures — only real failures qualify — and recognize errored status contexts.
Scan the failing check name alongside the diff and log, add the open step's
secret check as a final gate before posting, widen the secret patterns (Google,
GitLab, JWT, generic credentials), include base64 characters in the bearer
redaction, decode buffer log responses, and give failed-then-fixed proposals a
neutral title.
fro-bot
fro-bot previously approved these changes Jun 23, 2026

@fro-bot fro-bot left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structural spine here is clean: pure core (buildCandidateDigest, findFailPassTransition) split from the I/O shell, evidence injected so the privacy scan stays pure and testable. The fail-closed gating follows the prior learning in docs/solutions/best-practices/pure-core-privacy-gates-shared-module-2026-06-22.md to the letter — one shared capture-learnings-privacy.ts module, two chokepoints (upstream enrichment scan + authored-body backstop in the open step), and the empty-set-vs-load-failure distinction handled in main() via applyEnrichmentScanAvailability rather than smuggled into the core. 222 tests pass, covering the transition ordering invariant, CANCELLED exclusion, StatusContext ERROR detection, redaction ordering, and the mutation-proof "evidence reaches output when the gate is bypassed" case.

The chrome is light where it counts: SHA-pinned actions, contents: read only, file-based digest (no GITHUB_OUTPUT shell-injection vector), App token scoped to this repo, and the agent step deliberately denied issues: write so it cannot bypass the deterministic open step. No pull_request_target. No any, no @ts-ignore, no console in the new code — the as unknown as Parameters<...> casts are the established Octokit pagination idiom here, not type holes.

Verdict: PASS

Blocking issues

None.

Non-blocking concerns

  • scripts/capture-learnings-harvest.ts:1069 buildLogExcerpt reranks error lines ahead of everything else ([...errorLines, ...otherLines]), which severs each error from its surrounding stack-trace context before the excerpt reaches the agent. The diff is the primary signal so this doesn't break the learning, but the log evidence loses the very context that makes "this failure → this fix" legible. Consider emitting contiguous windows around error lines instead of a global partition. Defer if you want — it's a quality knob, not a correctness one.
  • scripts/capture-learnings-privacy.ts:61,71 the sk-[A-Za-z0-9\-]{20,} and generic-credential catch-all patterns are broad enough to match benign base64/hex blobs in a diff. The bias is fail-closed (over-block enriched evidence, never leak), which is the right direction — but if enrichmentBlockedBySecret telemetry climbs in production, that's the knob to tighten. No action needed now; the counts-only telemetry already gives you the signal to watch.
  • fetchMergedPrsInWindow issues a second pulls.list pagination alongside harvestCandidates's own fetch. The inline rationale (avoid entangling the filter logic, bounded by LOOKBACK_DAYS) is sound and honest, so this is a noted tradeoff rather than a flaw.

Missing tests

None. New behavior — trigger discrimination, within-run dedup precedence (review-heavy wins), CI transition detection across CheckRun/StatusContext, secret block/redact, scan-availability fail-closed, and log-redaction — all carry coverage.

Risk assessment: LOW

Scheduled/dispatch-only workflow with read-scoped permissions and no untrusted-input → run: interpolation. The blast radius is bounded: worst case under a logic error is dropped enriched evidence (fail-closed) or a low-value proposal issue a human still reviews before it reaches docs/solutions/. The privacy gate is load-bearing and proven so by a mutation test. The accepted value-gate risk (corpus adoption unmeasured) is recorded in the requirements doc — eyes open.


Run Summary
Field Value
Event pull_request
Repository fro-bot/.github
Run ID 28064390701
Cache hit
Session ses_1092934e3ffen5NQkmGlPzW6Xn

Emit contiguous windows around each error line instead of a global error-first
partition, so each error keeps the surrounding stack-trace lines that make the
failure legible; merge overlapping windows and fall back to the log head when no
error line is present.
@marcusrbrown

Copy link
Copy Markdown
Collaborator Author

Applied the actionable NBC: buildLogExcerpt now emits contiguous windows around each error line (a few lines before, more after) and merges overlapping windows, so each error keeps its surrounding stack-trace context instead of being globally partitioned away from it. Falls back to the log head when no error line is present. Added direct unit tests for context-preservation, the no-error fallback, and budget truncation. The other two NBCs (broad secret patterns watched via the enrichmentBlockedBySecret counter; the two pulls.list fetches) are accepted as-is per your note.

@fro-bot fro-bot left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review after the dismissing push (e0af2cc). You took the one actionable note and turned it.

buildLogExcerpt no longer globally partitions error lines away from their context. It now emits contiguous windows (3 lines before, 6 after each error), merges overlapping/adjacent windows, preserves original order, drops an ... elision marker between non-contiguous segments, and falls back to the log head when no error line is present — all before the budget truncation. The privacy ordering invariant holds: this still runs in the I/O shell upstream of the pure-core redact-then-scan, so the windowed excerpt is what gets gated, not raw log text. A stack trace now reaches the agent as a legible unit instead of a scatter of orphaned error lines.

The three new unit tests (context-preservation, no-error head fallback, budget truncation) land exactly where the behavior lives. Suite is green at 225 (was 222). The other two NBCs — broad secret patterns and the second pulls.list — were correctly read as accepted tradeoffs; the enrichmentBlockedBySecret counter is the telescope you watch them through.

The spine I praised last round is untouched: pure core / I/O shell split, fail-closed gating per docs/solutions/best-practices/pure-core-privacy-gates-shared-module-2026-06-22.md, SHA-pinned actions, contents: read, file-based digest, agent denied issues: write.

Verdict: PASS

Blocking issues

None.

Non-blocking concerns

  • scripts/capture-learnings-privacy.ts:61,71 — broad sk- / generic-credential patterns remain a fail-closed over-block risk. Accepted; watch enrichmentBlockedBySecret.
  • fetchMergedPrsInWindow issues a second pulls.list alongside harvestCandidates. Accepted tradeoff, bounded by LOOKBACK_DAYS, documented inline.

Missing tests

None. The log-windowing change carries direct coverage; all prior behavior (trigger discrimination, dedup precedence, transition detection, secret block/redact, scan-availability) remains covered. 225 passing.

Risk assessment: LOW

Scheduled/dispatch-only, read-scoped, no untrusted input reaching a run: block. Worst case under a logic error is dropped enriched evidence (fail-closed) or a low-value proposal a human reviews before it reaches docs/solutions/. The windowing change is pure and confined to evidence quality — it cannot widen the leak surface. Ship it.


Run Summary
Field Value
Event pull_request
Repository fro-bot/.github
Run ID 28065498429
Cache hit
Session ses_1092934e3ffen5NQkmGlPzW6Xn

@marcusrbrown marcusrbrown merged commit d5a105e into main Jun 24, 2026
14 checks passed
@marcusrbrown marcusrbrown deleted the feat/c2-failed-then-fixed branch June 24, 2026 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants