perf(d-3 followup): harden compare.exs against schema drift (standards#99) by hyperpolymath · Pull Request #30 · hyperpolymath/http-capability-gateway

hyperpolymath · 2026-06-02T06:19:41Z

Summary

Phase D-3 follow-up under the single-lane HCG tier-2 channel (standards#91). PR #26 (D-4 bootstrap) deferred this as a "separate defensive D-3 follow-up, not coupled to D-4 collection": once bench/baseline.json _status is flipped to active (which the D-4 ritual eventually does), two directions of schema drift between bench/results.json and bench/baseline.json would silently pass the gate. This PR closes both.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#99

NOT Closes #99: joint-close is owner-only; the D-4 maintainer-dispatch rebaseline workflow plus the _status flip to active still pend under #99 after this lands. Same posture as PRs #14 (D-2), #22 (D-3), #26 (D-4 bootstrap).

The gap

The comparator's old emit_table/3 iterated over results scenarios only and let check_regression(nil, …) fall through to "no baseline", which never bubbled up to the :regressed exit:

- Results-only scenario (new harness scenario landed without a rebaseline) → Map.get(baseline, name) returns nil → check_regression(nil, …) returns "no baseline" → reduce keeps acc = :ok → exit 0. The gate has no anchor for the new scenario, so it cannot meaningfully report regression for it, but it should say so instead of silently passing.
- Baseline-only scenario (the harness dropped a scenario the baseline still claims) → never enters the Enum.reduce(stats, …) body at all → invisible in the report and exit 0. A scenario was removed (or the harness crashed before emitting it) and the gate did not notice.

In scaffold-placeholder mode the existing code was fine — every row was tagged "scaffold" and the build exited 0 unconditionally — but it gave operators no preview of how the eventual active-mode verdict would look.

What changed

bench/compare.exs
- emit_table/4 now iterates the union of scenario names from results.statistics and baseline.scenarios (sorted lexicographically), so both drift directions are visible.
- Both directions are surfaced inline:
  - results-only → MISSING IN BASELINE
  - baseline-only → MISSING IN RESULTS
- enforce: bool opt replaces the previous nil baseline sentinel — compare/2 now always passes Map.get(baseline, "scenarios", %{}) and uses enforce: false in scaffold-placeholder mode, enforce: true in active mode.
- With enforce: false, rows are displayed as scaffold (would fail: MISSING IN BASELINE), scaffold (would fail: MISSING IN RESULTS), or scaffold (would fail: REGRESSED), so a rebaseline PR previews the active-mode verdict; the build still exits 0.
- With enforce: true, rows are displayed as bare MISSING IN BASELINE / MISSING IN RESULTS / REGRESSED, and the comparator exits 1 if any row carries one of those statuses.
- The now-unreachable check_regression(nil, _, _, _, _) -> "no baseline" clause is removed.
- Latent crash fixed: when baseline values are TODO sentinels (or any non-number), num/1 returns nil, and the old (bp50 && p50 && p50 > bp50 * t50) or (…) or (…) chain raised BadBooleanError, because or requires a boolean on its left-hand side and nil is not one. The inner && already short-circuits to nil; the outer joins are now ||, so the whole expression short-circuits consistently. The crash was previously masked because scaffold mode never reached check_regression; the new flow does, so it had to be fixed in the same PR.
docs/perf-contract.md
- New ## Schema drift section between ## Regression-alert tolerance and ## Baseline lifecycle documents the two directions, the active vs scaffold display difference, and the fail-closed semantic.
- Updated SCAFFOLD-MODE banner inside compare.exs to mention that drift is now surfaced inline.
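
The latent crash listed under bench/compare.exs comes down to the difference between Elixir's strict `or` and the truthiness-based `||`. The following is a minimal sketch of that difference, not the code from compare.exs: the module name, the `breach/3` helper, and the map-shaped inputs are all hypothetical.

```elixir
defmodule OrChainSketch do
  # Hypothetical helper standing in for one percentile check.
  # Returns nil when either value is unknown, otherwise true/false.
  def breach(baseline, observed, tolerance) do
    baseline && observed && observed > baseline * tolerance
  end

  # Old shape: strict `or` insists on a boolean left operand,
  # so a nil from the first check raises BadBooleanError.
  def old_any_breach(base, obs, tol) do
    breach(base.p50, obs.p50, tol) or breach(base.p99, obs.p99, tol)
  end

  # New shape: `||` accepts any term and treats nil as falsy.
  def new_any_breach(base, obs, tol) do
    breach(base.p50, obs.p50, tol) || breach(base.p99, obs.p99, tol)
  end
end

# A baseline whose values are still unknown (e.g. TODO sentinels parsed to nil).
todo_baseline = %{p50: nil, p99: nil}
observed = %{p50: 1.0, p99: 2.0}

outcome =
  try do
    OrChainSketch.old_any_breach(todo_baseline, observed, 1.1)
  rescue
    BadBooleanError -> :raised
  end

IO.inspect(outcome, label: "old chain")
IO.inspect(OrChainSketch.new_any_breach(todo_baseline, observed, 1.1), label: "new chain")
```

With an unknown baseline the old chain raises, while the new chain evaluates to nil, which callers can treat as "no breach".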

Behaviour matrix (smoke-tested)

| Mode (`_status`) | results.json | baseline.json | Status column | Exit |
|---|---|---|---|---|
| scaffold | A only | (A absent) | `scaffold (would fail: MISSING IN BASELINE)` | 0 |
| scaffold | (B absent) | B only | `scaffold (would fail: MISSING IN RESULTS)` | 0 |
| scaffold | C | C (TODO) | `scaffold` | 0 |
| scaffold | D | D (real, over tol.) | `scaffold (would fail: REGRESSED)` | 0 |
| active | A only | (A absent) | `MISSING IN BASELINE` | 1 |
| active | (B absent) | B only | `MISSING IN RESULTS` | 1 |
| active | C | C (TODO) | `ok` (TODO parses as nil → no breach) | 0 |
| active | D | D (real, over tol.) | `REGRESSED` | 1 |
| active | E | E (real, within tol.) | `ok` | 0 |

The behaviour pivots on _status in bench/baseline.json — no code change is needed to arm the schema checks once the D-4 rebaseline + active flip lands.
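
The matrix above can be reproduced with a small sketch of the union iteration and the enforce switch. This is an illustration under simplifying assumptions rather than the actual comparator: the `DriftSketch` module, the single number per scenario, and the fixed 10% tolerance are all hypothetical.

```elixir
defmodule DriftSketch do
  # Hypothetical simplification: one number per scenario and one fixed tolerance.
  @tolerance 1.10

  # Returns {:ok | :regressed, [{scenario_name, status_string}]}.
  def compare(results, baseline, opts) do
    enforce = Keyword.fetch!(opts, :enforce)

    # Union of scenario names from both sides, sorted lexicographically,
    # so results-only and baseline-only scenarios both get a row.
    names = (Map.keys(results) ++ Map.keys(baseline)) |> Enum.uniq() |> Enum.sort()

    verdicts =
      Enum.map(names, fn name ->
        {name, verdict(Map.fetch(results, name), Map.fetch(baseline, name))}
      end)

    rows = Enum.map(verdicts, fn {name, v} -> {name, label(v, enforce)} end)

    # Only enforce mode turns a non-ok row into a failing outcome.
    failed? = enforce and Enum.any?(verdicts, fn {_name, v} -> v != :ok end)
    {if(failed?, do: :regressed, else: :ok), rows}
  end

  defp verdict({:ok, _}, :error), do: :missing_in_baseline
  defp verdict(:error, {:ok, _}), do: :missing_in_results

  defp verdict({:ok, observed}, {:ok, base}) when is_number(observed) and is_number(base) do
    if observed > base * @tolerance, do: :regressed, else: :ok
  end

  # Non-numeric values (for example a TODO sentinel) cannot breach a tolerance.
  defp verdict({:ok, _}, {:ok, _}), do: :ok

  defp label(:ok, true), do: "ok"
  defp label(:ok, false), do: "scaffold"
  defp label(v, true), do: text(v)
  defp label(v, false), do: "scaffold (would fail: #{text(v)})"

  defp text(:missing_in_baseline), do: "MISSING IN BASELINE"
  defp text(:missing_in_results), do: "MISSING IN RESULTS"
  defp text(:regressed), do: "REGRESSED"
end

results = %{"A" => 1.0, "C" => 3.0, "D" => 20.0, "E" => 10.5}
baseline = %{"B" => 1.0, "C" => "TODO", "D" => 10.0, "E" => 10.0}

IO.inspect(DriftSketch.compare(results, baseline, enforce: false), label: "scaffold")
IO.inspect(DriftSketch.compare(results, baseline, enforce: true), label: "active")
```

Keeping the verdict separate from its display label means the same classification drives both modes; only the label and the exit decision depend on enforce.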

Local verification

Smoke-tested via a synthetic-fixture harness against the four named cases (active-with-drift → :regressed, scaffold-with-drift → :ok with (would fail: …) rows, active-clean → :ok, active-TODO-sentinels → :ok no crash). Output matched expected status strings and exit returns in every case.

Build was not verified end-to-end — the session environment has Elixir 1.14 only, no Elixir 1.19 / OTP 28 toolchain — but Code.format_string!/1 reports the file is already formatted and Code.string_to_quoted!/1 round-trips under 1.14. The existing perf-regression.yml workflow exercises the comparator end-to-end on CI.
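
For readers who want to repeat the parse and format check without the full toolchain, a minimal sketch follows. The `FormatCheckSketch` module is hypothetical, and the comparison assumes the source ends with a single trailing newline, as files written by `mix format` do.

```elixir
defmodule FormatCheckSketch do
  # Returns :formatted or :needs_formatting; raises if the source does not parse.
  def check(source) do
    # Code.string_to_quoted!/1 raises on a syntax error.
    _ast = Code.string_to_quoted!(source)

    # Code.format_string!/1 returns iodata without a trailing newline.
    formatted = source |> Code.format_string!() |> IO.iodata_to_binary()

    if formatted <> "\n" == source, do: :formatted, else: :needs_formatting
  end
end

IO.inspect(FormatCheckSketch.check("x = 1 + 2\n"))
IO.inspect(FormatCheckSketch.check("x=1+2\n"))
```

In practice the input would be read with `File.read!/1` from the script under review; this only confirms syntax and formatting, not runtime behaviour.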

Test plan

- CI green: Perf Regression workflow runs mix run bench/compare.exs end-to-end and posts a scaffold-mode markdown report (still non-blocking, since _status is scaffold-placeholder).
- CI green: existing workflows (governance, hypatia-scan, dogfood-gate, codeql, scorecard) unaffected.
- CI green: mix test still passes; bench/compare.exs is not exercised by mix test, and there is no production-code change in this PR.
- Manual (post-merge, owner): when D-4 rebaseline PR lands real numbers, the scaffold-mode Status column should still read scaffold for every scenario unless a true drift is present.
- Manual (post-active-flip): when _status is flipped to active, the comparator exits 1 on any MISSING IN BASELINE, MISSING IN RESULTS, or REGRESSED row.

Downstream unblock

The boj-server rollout-prerequisite checklist in docs/integration/hcg-tier2-rollout-runbook.md §1.1 lists "Phase D-3 (gate armed)" and "Phase D-4 (numbers populated)" as the remaining open items gating Phase E rollout. This PR doesn't tick either box directly — neither requires schema-drift hardening — but it hardens the gate before it gets armed, so the first time _status flips to active the gate already covers the failure modes a future scenario rename / removal would otherwise hide.

Owner merges; not for admin-merge.

🤖 Generated with Claude Code

…s#99)

Phase D-3 follow-up under the single-lane HCG tier-2 channel (standards#91). PR #26 (D-4 bootstrap) deferred this as a "separate defensive D-3 follow-up, not coupled to D-4 collection": when bench/baseline.json `_status` is flipped to `active`, a scenario present in results.json but absent from baseline.json (a new harness scenario landed without rebaseline) silently passed the gate, and a scenario present in baseline.json but absent from results.json (the harness dropped a scenario without rebaselining) was never even checked.

Both directions of schema drift now fail-closed in active mode and surface as informational "scaffold (would fail: ...)" rows in scaffold-placeholder mode so a rebaseline PR previews the active-mode verdict before the gate is armed.

The comparator now iterates the union of scenario names across results and baseline rather than the results map alone, and uses a single `enforce: bool` opt to pivot between scaffold and active mode (replaces the previous `nil` sentinel).

check_regression/5 also has a latent crash fixed in the process: when baseline values are TODO sentinels (or any non-number), num/1 returns nil and the `or` chain raises BadBooleanError; the inner `&&` short-circuit already returns nil for unknowns, so the outer joins are switched from `or` to `||` to match. Previously this was masked by scaffold mode never reaching check_regression at all (the `nil` sentinel skipped it); the new flow exposes that path in scaffold mode too.

docs/perf-contract.md gains a "Schema drift" section explaining the two directions, the active vs scaffold display difference, and the fail-closed semantic.

The behaviour pivots on `_status` in bench/baseline.json; no code change is needed to arm the schema checks once Phase D-4 maintainer-only rebaseline + active flip lands.

Smoke-tested locally against synthetic results/baseline fixtures (four cases: active+drift→regressed, scaffold+drift→ok-with-warnings, active+clean→ok, active+TODO-sentinels→ok-no-crash). Build is not verified end-to-end (the session environment has Elixir 1.14 only, no Elixir 1.19 / OTP 28 toolchain), but Code.format_string!/1 reports the file is already formatted and Code.string_to_quoted!/1 round-trips under 1.14. Repo CI (`Perf Regression`, governance, hypatia-scan, dogfood-gate, codeql, scorecard) is the verification gate.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#99

NOT Closes #99: joint-close is owner-only; D-4 baseline collection plus the `_status` flip to active still pend under #99 after this lands. Same posture as PRs #14, #22, #26.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-06-02T06:20:30Z

🔍 Hypatia Security Scan

Findings: 65 issues detected

| Severity | Count |
|---|---|
| 🔴 Critical | 6 |
| 🟠 High | 17 |
| 🟡 Medium | 42 |

⚠️ Action Required: Critical security issues found!

View findings

[
  {
    "reason": "Issue in boj-build.yml",
    "type": "missing_timeout_minutes",
    "file": "boj-build.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in casket-pages.yml",
    "type": "missing_timeout_minutes",
    "file": "casket-pages.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in casket-pages.yml",
    "type": "missing_timeout_minutes",
    "file": "casket-pages.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in codeql.yml",
    "type": "missing_timeout_minutes",
    "file": "codeql.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in governance.yml",
    "type": "missing_timeout_minutes",
    "file": "governance.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

hyperpolymath marked this pull request as ready for review June 2, 2026 09:05

hyperpolymath merged commit 4cda8d7 into main Jun 2, 2026
18 checks passed

hyperpolymath deleted the perf/d-3-compare-schema-drift branch June 2, 2026 09:05

hyperpolymath merged 1 commit into main from perf/d-3-compare-schema-drift