ci: add index-integrity probe to distribution install smoke (#3218 regression guard)#3234
Conversation
…gression guard) The daily install-smoke installs the CLI per channel and asserts the version, but that only samples once at 14:00 UTC. The #3218 cross-channel clobber empties the stable apt index for most of the day (a bleeding-edge snapshot wipes it minutes after each stable publish; it's only briefly populated), so a clobber can land outside the install legs' sample window and slip past them — and when they do fail, "apt install failed" doesn't name the cause. Add a fast, container-free `index-integrity` job that probes the PUBLISHED apt and yum dist indexes directly: stable must be non-empty and name the current GA (per arch for apt), and bleeding-edge must stay non-empty (the clobber was bidirectional — apt-wheels#5 scopes regen per-channel to prevent both directions). An empty/missing/stale index fails with a message that names the regression. Reliability: fetches use `curl --retry ... --retry-all-errors` so a transient blip can't false-red the guardian, and all grep checks feed from a here-string (`grep -q PAT <<<"$body"`) rather than `printf | grep -q` — under `set -o pipefail` the latter false-fails when grep -q early-exits and SIGPIPEs the upstream printf of a 100KB index. Verified: 3/3 green against the live indexes, and a negative test (bogus version) correctly fails. Refs: #3218, wheels-dev/apt-wheels#5 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Peter Amiri <petera@pai.com>
There was a problem hiding this comment.
Wheels Bot — Reviewer
TL;DR — This PR adds an index-integrity job to the distribution install smoke workflow that probes the published apt/yum dist indexes directly, catching the #3218 cross-channel clobber deterministically rather than relying on the once-a-day install legs to happen to sample inside the brief post-publish window. The script is defensively written (injection-safe EXPECTED via env, retrying curl, here-strings to dodge the pipefail/SIGPIPE false-red), self-tests via the existing pull_request trigger, and is documented thoroughly inline. Verdict: comment — no correctness, cross-engine, or security issues; two minor regex-hardening nits below, neither blocking.
Correctness
No blocking issues. The accumulate-then-exit 1 pattern (set -uo pipefail without -e, note() setting fail=1, final [ "$fail" -ne 0 ]) correctly reports all failing indexes in one run instead of bailing on the first. The body="$(fetch … || true)" + [ -z "$body" ] guard correctly turns a post-retry curl failure into a named EMPTY failure rather than a bare non-zero exit. The grep … | head -1 || true on line 158 has the same SIGPIPE-under-pipefail shape the comment block warns about, but it is correctly neutralised by || true and the value is already captured by the command substitution — consistent, not a bug.
Conventions
Two minor, non-blocking hardening nits — both about treating a version string as a regex:
-
Line 136 (
grep -q "Version: ${EXPECTED}") and line 166 (grep -q "ver=\"${assert_ver}\""):EXPECTEDis semver, so its dots are regex any-char metacharacters. Since it isresolve-validated to^[0-9]+\.[0-9]+\.[0-9]+$there is no injection risk, but the unescaped dots mean4.0.5would also match a hypothetical4x0y5stanza — a (very contrived) false-positive that could let a bad index pass. Usinggrep -qFmakes the match exact (grep -qF "Version: ${EXPECTED}"/grep -qF "ver=\"${assert_ver}\""). -
Line 158 (
grep -oE 'repodata/[a-f0-9]+-primary\.xml\.gz') hardcodes gzip. Moderncreaterepo_ccan emit.zstprimary metadata, which would make this probe false-red (noloc→ "no primary metadata"). Not actionable today — you verified 3/3 green against the live.gzindexes — but worth a one-line comment so a future yum-wheels compression switch points back here rather than looking like a real clobber.
Cross-engine
N/A — pure GitHub Actions / bash, no CFML touched.
Tests
Appropriate for a CI-only change: the existing pull_request: paths: [distribution-install-smoke.yml] self-test trigger (lines 50–54) runs the new job live against the real indexes on this very PR, and the PR body reports 3/3 green plus a negative test (bogus version correctly fails). No framework spec applies.
Docs
No changelog fragment needed — this is workflow-internal tooling, not a user-facing fix/feat. The inline comment block (lines 17–32, 95–101) is updated to describe both layers accurately.
Commits
ci: add index-integrity probe to distribution install smoke (#3218 regression guard) — valid ci type, no scope, subject not ALL-CAPS, header well under 100 chars. Conforms to commitlint.config.js.
Security
Clean. EXPECTED is semver-validated in resolve and passed via env: (line 105), never interpolated into the shell; it only reaches a grep pattern (regex-pattern context, hardened further by the -F suggestion above). CB derives from github.run_id/run_attempt. No secrets, no untrusted interpolation, permissions: contents: read only.
…ctly Two corrections to the index-integrity job after finding the real #3218 failure mode is CDN edge-cache staleness, not just the origin clobber: 1. Drop the `?cb=` cache-buster. apt/dnf fetch the plain URLs that hit Cloudflare's edge; a cache-buster hit R2 origin instead, so the probe went green while real `apt install` failed on a stale edge-cached Packages.gz. Test the SAME plain URLs clients use. 2. Verify hash-consistency, exactly what the package managers verify: parse the SHA256 the (plain) Release records for each binary-<arch>/Packages.gz, fetch the (plain) Packages.gz, and compare — this is precisely the "File has unexpected size" check that failed in #3218. yum: verify the served primary.xml.gz matches the hash repomd references. Then gunzip and confirm the content lists the GA (catches an empty-but-internally-consistent index). Binary correctness: gzip blobs are fetched to a temp file and hashed/gunzipped from the file — never captured into a shell variable, since `$(...)` is text-only and silently corrupts binary (strips trailing newlines, drops NULs), which would compute a bogus hash. Paired with apt-wheels#6 / yum-wheels#6 (set `no-store` on metadata so the edge stops caching it). Verified against the live indexes: correctly RED on the currently-stale apt stable amd64 edge, GREEN on arm64 / bleeding-edge / yum. Refs: #3218, wheels-dev/apt-wheels#6, wheels-dev/yum-wheels#6 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Peter Amiri <petera@pai.com>
There was a problem hiding this comment.
Wheels Bot — Reviewer
TL;DR — This re-review covers the new 61103bb9 commit, which upgrades the index-integrity job from a plain fetch-and-grep into a hash-consistency probe that mirrors what apt-get update / dnf actually verify: it parses the SHA256 the (plain) Release records for binary-<arch>/Packages.gz, fetches that same plain URL, compares the served hash to the recorded one, then gunzips and asserts the GA is listed — and does the dnf analogue via repomd.xml → <sha>-primary.xml.gz. This directly reproduces the #3218 "File has unexpected size" failure mode (stale edge-cached index vs. fresh Release) that a naive non-empty check would miss, and the deliberate "no cache-buster, fetch the plain edge URLs" decision is now correct and well-documented. Verdict: comment — no correctness, cross-engine, or security issues; the two regex-hardening nits from the prior round still stand plus one trivial dead-variable cleanup, none blocking.
Correctness
No blocking issues. The new hash-consistency layer is sound:
- The
awkextraction of the per-archPackages.gzSHA256 (line 152) correctly gates on the^SHA256:section header, resets on the next^[A-Z]section, and matches the path in$3(data lines begin with a space, so the path is reliably field 3). The binary-to-temp-file discipline (sha256fon a path, never piped — comment at lines 132–136) correctly avoids command-substitution corrupting the gzip blob. - The failure ordering is right: a failed
Packages.gzfetch leaves$tmpempty, soexp != actfires the hash-mismatch branch (line 161) before the version branch — i.e. a torn/stale publish is reported as exactly that, not as "doesn't list the GA." Theact:-<empty>placeholder is cosmetic (an empty file still hashes to a value), not a bug. set -uo pipefailwithout-eplus the accumulate-then-exit 1pattern still correctly reports all failing indexes in one run. Theawk … | head -1on line 152 has no|| true, but with-eabsent the captured value is unaffected and the ignored pipeline status is harmless — consistent with the SIGPIPE reasoning in the comment block.
Conventions
Three minor, non-blocking nits — two carry over from the prior round and remain accurate against the new code:
- Line 162 (
grep -q "Version: ${EXPECTED}") and line 204 (grep -q "ver=\"${assert_ver}\""): both treat a semver string as a regex, so its dots are any-char metacharacters.EXPECTEDisresolve-validated to^[0-9]+\.[0-9]+\.[0-9]+$(no injection risk), but the unescaped dots mean4.0.5would also match a hypothetical4x0y5stanza — a contrived false-positive that could let a bad index pass. Both patterns are fixed strings, sogrep -qF "Version: ${EXPECTED}"/grep -qF "ver=\"${assert_ver}\""makes the match exact at no cost. - Line 187 (
grep -oE 'repodata/[a-f0-9]+-primary\.xml\.gz') and the paired filename-as-sha256 assumption hardcode gzip. Moderncreaterepo_ccan emit.zstprimary metadata, which would make this probe false-red ("no primary metadata"). Not actionable today — verified green against the live.gzindexes — but a one-line comment would point a future yum-wheels compression switch back here instead of looking like a real clobber. - Line 185 (
local repomd loc exp gz act content):gzis declared but never assigned or read. Trivial — drop it from thelocallist.
Cross-engine
N/A — pure GitHub Actions / bash, no CFML touched.
Tests
Appropriate for a CI-only change. The pull_request: branches:[develop] paths:[distribution-install-smoke.yml] self-test trigger (lines 50–54) runs the rewritten index-integrity job live against the real indexes on this very PR — needs: resolve, no if: gate, so it executes here. The PR body reports 3/3 green plus a negative test (bogus version fails), so the guard isn't a no-op. No framework spec applies.
Docs
No changelog fragment needed — this is workflow-internal CI tooling, not a user-facing fix/feat. The header comment block (lines 12–32) and the inline env:/run: comments accurately describe both the hash-consistency logic and the deliberate plain-URL / no-cache-buster decision.
Commits
Both commits conform to commitlint.config.js: ci: add index-integrity probe … and ci: probe plain (edge) URLs + hash-consistency, mirroring apt/dnf exactly — valid ci type, no scope, subjects not ALL-CAPS, headers under 100 chars. Subjects describe the "why" (reproduce the real apt/dnf verification path) rather than just the "what."
Security
Clean. EXPECTED is semver-validated in resolve (line 83) and reaches the script only via env: (line 105), never interpolated into a shell command — it lands in a grep regex-pattern context (hardened further by the -F suggestion above). The workflow_dispatch INPUT is likewise env-passed and semver-gated before use. No secrets, no untrusted interpolation, permissions: contents: read only.
Why
apt install wheelson the stable channel was failing (#3218) because a cross-channel index clobber empties the stable aptPackagesindex for most of the day. Root cause fixed in wheels-dev/apt-wheels#5 (per-channel regen). This PR adds the regression guard to the daily smoke so it can't silently come back.The gap in the existing smoke
The install legs install the CLI per channel and assert the version — but they sample once at 14:00 UTC. The clobber is transient: a bleeding-edge snapshot wipes the stable index minutes after each stable publish, so stable is empty most of the day and only briefly populated. A clobber can land outside the install legs' window and slip past them; and "apt install failed" doesn't name the cause.
What this adds
A fast, container-free
index-integrityjob that probes the published apt + yum dist indexes directly:Version: <GA>).repomd.xml→primary.xml.gzlists the GA.A missing/empty/stale index fails with a message that names the #3218 regression.
Reliability (verified locally against the live indexes)
curl --retry 4 --retry-all-errors --max-time 60so a transient blip can't false-red a daily guardian.grep -q PAT <<<"$body"), notprintf | grep -q. Underset -o pipefailthe pipe form false-fails:grep -qearly-exits on first match → SIGPIPEs the upstreamprintfof the 100KB index → pipefail reports the pipeline as failed even though grep matched. (This bit me while building it — caught and fixed.)pipefail; negative test (bogus expected version) correctly fails, so the guard isn't a no-op.The
pull_requestself-test trigger runs this workflow on this PR, so CI here exercises the new job live.Refs: #3218, wheels-dev/apt-wheels#5