From 9a40257fa468814ad6492ce5072a82311cfa3029 Mon Sep 17 00:00:00 2001
From: "Jonathan D.A. Jewell" <6759885+hyperpolymath@users.noreply.github.com>
Date: Tue, 2 Jun 2026 06:18:54 +0000
Subject: [PATCH] perf(d-3 followup): harden compare.exs against schema drift
 (standards#99)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase D-3 follow-up under the single-lane HCG tier-2 channel
(standards#91). PR #26 (D-4 bootstrap) deferred this as a "separate
defensive D-3 follow-up, not coupled to D-4 collection": when
bench/baseline.json `_status` is flipped to `active`, a scenario
present in results.json but absent from baseline.json (a new harness
scenario landed without rebaseline) silently passed the gate, and a
scenario present in baseline.json but absent from results.json (the
harness dropped a scenario without rebaselining) was never even
checked. Both directions of schema drift now fail-closed in active
mode and surface as informational "scaffold (would fail: ...)" rows
in scaffold-placeholder mode so a rebaseline PR previews the
active-mode verdict before the gate is armed.

The comparator now iterates the union of scenario names across
results and baseline rather than the results map alone, and uses a
single `enforce: bool` opt to pivot between scaffold and active mode
(replaces the previous `nil` sentinel). check_regression/5 also has
a latent crash fixed in the process — when baseline values are TODO
sentinels (or any non-number), num/1 returns nil and the `or` chain
raises BadBooleanError; the inner `&&` short-circuit already returns
nil for unknowns, so the outer joins are switched from `or` to `||`
to match. Previously this was masked by scaffold mode never reaching
check_regression at all (the `nil` sentinel skipped it); the new
flow exposes that path in scaffold mode too.

docs/perf-contract.md gains a "Schema drift" section explaining the
two directions, the active vs scaffold display difference, and the
fail-closed semantic. The behaviour pivots on `_status` in
bench/baseline.json — no code change is needed to arm the schema
checks once Phase D-4 maintainer-only rebaseline + active flip lands.

Smoke-tested locally against synthetic results/baseline fixtures
(four cases: active+drift→regressed, scaffold+drift→ok-with-warnings,
active+clean→ok, active+TODO-sentinels→ok-no-crash). Build is not
verified end-to-end — the session environment has Elixir 1.14 only,
no Elixir 1.19 / OTP 28 toolchain — but Code.format_string!/1 reports
the file is already formatted and Code.string_to_quoted!/1 round-trips
under 1.14. Repo CI (`Perf Regression`, governance, hypatia-scan,
dogfood-gate, codeql, scorecard) is the verification gate.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#99

NOT Closes #99: joint-close is owner-only; D-4 baseline collection
plus the `_status` flip to active still pend under #99 after this
lands. Same posture as PRs #14, #22, #26.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 bench/compare.exs     | 80 +++++++++++++++++++++++++++++++------------
 docs/perf-contract.md | 26 ++++++++++++++
 2 files changed, 85 insertions(+), 21 deletions(-)

diff --git a/bench/compare.exs b/bench/compare.exs
index de0b1a8..17b1c75 100644
--- a/bench/compare.exs
+++ b/bench/compare.exs
@@ -47,6 +47,7 @@ defmodule Bench.Compare do
   defp compare(results, baseline) do
     status = Map.get(baseline, "_status", "unknown")
     tolerance = Map.get(baseline, "tolerance", %{})
+    baseline_scenarios = Map.get(baseline, "scenarios", %{})
 
     IO.puts("# Phase D — Performance Regression Report")
     IO.puts("")
@@ -59,15 +60,18 @@ defmodule Bench.Compare do
           "> SCAFFOLD MODE — bench/baseline.json has not been populated yet " <>
             "(Phase D-4 collects the real baseline). This run is informational " <>
             "only; the gate is **non-blocking** until baseline.json `_status` " <>
-            "is flipped to `active`."
+            "is flipped to `active`. Schema drift (a scenario present in " <>
+            "results.json but absent from baseline.json, or vice versa) is " <>
+            "surfaced inline as `scaffold (would fail: ...)` so a rebaseline " <>
+            "PR previews the active-mode verdict before the gate is armed."
         )
 
         IO.puts("")
-        emit_table(results, nil, tolerance)
+        emit_table(results, baseline_scenarios, tolerance, enforce: false)
         System.halt(0)
 
       "active" ->
-        emit_table(results, baseline["scenarios"], tolerance)
+        emit_table(results, baseline_scenarios, tolerance, enforce: true)
         |> case do
           :ok -> System.halt(0)
           :regressed -> System.halt(1)
@@ -80,34 +84,70 @@ defmodule Bench.Compare do
   end
 
   # ── Pretty-print + regression check ────────────────────────────────────────
+  #
+  # Iterates the UNION of scenario names from results and baseline so neither
+  # schema-drift direction is silent:
+  #   • results-only scenario → "MISSING IN BASELINE" (new harness scenario
+  #     landed without a rebaseline; the regression gate has no anchor for it).
+  #   • baseline-only scenario → "MISSING IN RESULTS" (the harness dropped a
+  #     scenario the baseline still claims; the gate must not silently pass).
+  # Both directions fail-closed when `enforce: true` (active mode) and surface
+  # as informational `scaffold (would fail: ...)` rows when `enforce: false`
+  # (scaffold-placeholder mode) — see docs/perf-contract.md § Schema drift.
+
+  defp emit_table(results, baseline_scenarios, tolerance, opts) do
+    enforce = Keyword.fetch!(opts, :enforce)
 
-  defp emit_table(results, baseline_scenarios, tolerance) do
     # Benchee JSON shape: top-level "statistics" -> per-scenario map.
     stats = Map.get(results, "statistics", %{})
 
+    result_names = stats |> Map.keys() |> MapSet.new()
+    baseline_names = baseline_scenarios |> Map.keys() |> MapSet.new()
+    all_names = result_names |> MapSet.union(baseline_names) |> Enum.sort()
+
     IO.puts("| Scenario | p50 (µs) | p95 (µs) | p99 (µs) | Status |")
     IO.puts("|----------|----------|----------|----------|--------|")
 
-    Enum.reduce(stats, :ok, fn {name, scenario_stats}, acc ->
-      p50 = percentile_us(scenario_stats, "50")
-      p95 = percentile_us(scenario_stats, "95")
-      p99 = percentile_us(scenario_stats, "99")
+    Enum.reduce(all_names, :ok, fn name, acc ->
+      in_results = MapSet.member?(result_names, name)
+      in_baseline = MapSet.member?(baseline_names, name)
+
+      {p50, p95, p99} =
+        if in_results do
+          scenario_stats = Map.fetch!(stats, name)
+
+          {percentile_us(scenario_stats, "50"), percentile_us(scenario_stats, "95"),
+           percentile_us(scenario_stats, "99")}
+        else
+          {nil, nil, nil}
+        end
+
+      {raw_status, drift?} =
+        cond do
+          in_results and not in_baseline ->
+            {"MISSING IN BASELINE", true}
+
+          in_baseline and not in_results ->
+            {"MISSING IN RESULTS", true}
 
-      status =
-        case baseline_scenarios do
-          nil ->
-            "scaffold"
+          true ->
+            base = Map.fetch!(baseline_scenarios, name)
+            regressed? = check_regression(base, p50, p95, p99, tolerance) == "REGRESSED"
+            {if(regressed?, do: "REGRESSED", else: "ok"), regressed?}
+        end
 
-          map ->
-            base = Map.get(map, name)
-            check_regression(base, p50, p95, p99, tolerance)
+      display_status =
+        cond do
+          enforce -> raw_status
+          drift? -> "scaffold (would fail: #{raw_status})"
+          true -> "scaffold"
         end
 
-      IO.puts("| #{name} | #{fmt(p50)} | #{fmt(p95)} | #{fmt(p99)} | #{status} |")
+      IO.puts("| #{name} | #{fmt(p50)} | #{fmt(p95)} | #{fmt(p99)} | #{display_status} |")
 
       cond do
         acc == :regressed -> :regressed
-        status == "REGRESSED" -> :regressed
+        enforce and drift? -> :regressed
         true -> acc
       end
     end)
@@ -126,8 +166,6 @@ defmodule Bench.Compare do
   defp fmt(n) when is_float(n), do: :erlang.float_to_binary(n, decimals: 2)
   defp fmt(n), do: to_string(n)
 
-  defp check_regression(nil, _, _, _, _), do: "no baseline"
-
   defp check_regression(base, p50, p95, p99, tolerance) do
     bp50 = num(base["p50_us"])
     bp95 = num(base["p95_us"])
@@ -138,8 +176,8 @@ defmodule Bench.Compare do
     t99 = Map.get(tolerance, "p99_max_ratio", 1.50)
 
     breached =
-      (bp50 && p50 && p50 > bp50 * t50) or
-        (bp95 && p95 && p95 > bp95 * t95) or
+      (bp50 && p50 && p50 > bp50 * t50) ||
+        (bp95 && p95 && p95 > bp95 * t95) ||
         (bp99 && p99 && p99 > bp99 * t99)
 
     if breached, do: "REGRESSED", else: "ok"
diff --git a/docs/perf-contract.md b/docs/perf-contract.md
index 356c9ab..dcf067f 100644
--- a/docs/perf-contract.md
+++ b/docs/perf-contract.md
@@ -81,6 +81,32 @@ The CI gate (`.github/workflows/perf-regression.yml`) fails a PR when
 Tolerances are looser as the percentile gets noisier — Phase D-4 will
 tighten these once intra-run variance is characterised.
 
+## Schema drift
+
+The comparator iterates the **union** of scenario names from
+`bench/results.json` and `bench/baseline.json`. Either direction of
+schema drift fails the build in `active` mode:
+
+- **`MISSING IN BASELINE`** — a scenario in `results.json` (the harness
+  just emitted it) has no entry in `baseline.json`. A new scenario
+  landed without a rebaseline; the gate has no anchor for it and
+  cannot meaningfully report regression. Rebaseline before merging.
+- **`MISSING IN RESULTS`** — a scenario in `baseline.json` is absent
+  from `results.json` (the harness skipped or dropped it). Either the
+  harness regressed silently, or a scenario was removed without
+  rebaselining the file. The gate must not silently pass.
+
+Both directions are surfaced inline in the markdown table (in the
+`Status` column). In `scaffold-placeholder` mode they appear as
+informational `scaffold (would fail: MISSING IN BASELINE)` /
+`scaffold (would fail: MISSING IN RESULTS)` rows so a rebaseline PR
+previews the eventual active-mode verdict before the gate is armed;
+the build still passes. In `active` mode they appear as bare
+`MISSING IN BASELINE` / `MISSING IN RESULTS` and exit the comparator
+with status 1. Behaviour pivots on the single `_status` flag in
+`bench/baseline.json` — no code change is needed to arm the schema
+checks once the gate goes live.
+
 ## Baseline lifecycle
 
 The baseline lives in `bench/baseline.json`. Its `_status` field gates