Skip to content

ci: add index-integrity probe to distribution install smoke (#3218 regression guard)#3234

Merged
bpamiri merged 2 commits into
developfrom
peter/smoke-index-integrity
Jun 19, 2026
Merged

ci: add index-integrity probe to distribution install smoke (#3218 regression guard)#3234
bpamiri merged 2 commits into
developfrom
peter/smoke-index-integrity

Conversation

@bpamiri

@bpamiri bpamiri commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Why

apt install wheels on the stable channel was failing (#3218) because a cross-channel index clobber empties the stable apt Packages index for most of the day. Root cause fixed in wheels-dev/apt-wheels#5 (per-channel regen). This PR adds the regression guard to the daily smoke so it can't silently come back.

The gap in the existing smoke

The install legs install the CLI per channel and assert the version — but they sample once at 14:00 UTC. The clobber is transient: a bleeding-edge snapshot wipes the stable index minutes after each stable publish, so stable is empty most of the day and only briefly populated. A clobber can land outside the install legs' window and slip past them; and "apt install failed" doesn't name the cause.

What this adds

A fast, container-free index-integrity job that probes the published apt + yum dist indexes directly:

  • apt stable (amd64 and arm64): index non-empty and lists the current GA (Version: <GA>).
  • yum stable: repomd.xmlprimary.xml.gz lists the GA.
  • apt + yum bleeding-edge: non-empty (versions float, so non-empty only) — the clobber was bidirectional, so both directions are guarded.

A missing/empty/stale index fails with a message that names the #3218 regression.

Reliability (verified locally against the live indexes)

  • Fetches use curl --retry 4 --retry-all-errors --max-time 60 so a transient blip can't false-red a daily guardian.
  • All grep checks feed from a here-string (grep -q PAT <<<"$body"), not printf | grep -q. Under set -o pipefail the pipe form false-fails: grep -q early-exits on first match → SIGPIPEs the upstream printf of the 100KB index → pipefail reports the pipeline as failed even though grep matched. (This bit me while building it — caught and fixed.)
  • 3/3 green against the live indexes under pipefail; negative test (bogus expected version) correctly fails, so the guard isn't a no-op.

The pull_request self-test trigger runs this workflow on this PR, so CI here exercises the new job live.

Refs: #3218, wheels-dev/apt-wheels#5

…gression guard)

The daily install-smoke installs the CLI per channel and asserts the version,
but that only samples once at 14:00 UTC. The #3218 cross-channel clobber empties
the stable apt index for most of the day (a bleeding-edge snapshot wipes it
minutes after each stable publish; it's only briefly populated), so a clobber
can land outside the install legs' sample window and slip past them — and when
they do fail, "apt install failed" doesn't name the cause.

Add a fast, container-free `index-integrity` job that probes the PUBLISHED apt
and yum dist indexes directly: stable must be non-empty and name the current GA
(per arch for apt), and bleeding-edge must stay non-empty (the clobber was
bidirectional — apt-wheels#5 scopes regen per-channel to prevent both
directions). An empty/missing/stale index fails with a message that names the
regression.

Reliability: fetches use `curl --retry ... --retry-all-errors` so a transient
blip can't false-red the guardian, and all grep checks feed from a here-string
(`grep -q PAT <<<"$body"`) rather than `printf | grep -q` — under `set -o
pipefail` the latter false-fails when grep -q early-exits and SIGPIPEs the
upstream printf of a 100KB index. Verified: 3/3 green against the live indexes,
and a negative test (bogus version) correctly fails.

Refs: #3218, wheels-dev/apt-wheels#5

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Peter Amiri <petera@pai.com>

@wheels-bot wheels-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot — Reviewer

TL;DR — This PR adds an index-integrity job to the distribution install smoke workflow that probes the published apt/yum dist indexes directly, catching the #3218 cross-channel clobber deterministically rather than relying on the once-a-day install legs to happen to sample inside the brief post-publish window. The script is defensively written (injection-safe EXPECTED via env, retrying curl, here-strings to dodge the pipefail/SIGPIPE false-red), self-tests via the existing pull_request trigger, and is documented thoroughly inline. Verdict: comment — no correctness, cross-engine, or security issues; two minor regex-hardening nits below, neither blocking.

Correctness

No blocking issues. The accumulate-then-exit 1 pattern (set -uo pipefail without -e, note() setting fail=1, final [ "$fail" -ne 0 ]) correctly reports all failing indexes in one run instead of bailing on the first. The body="$(fetch … || true)" + [ -z "$body" ] guard correctly turns a post-retry curl failure into a named EMPTY failure rather than a bare non-zero exit. The grep … | head -1 || true on line 158 has the same SIGPIPE-under-pipefail shape the comment block warns about, but it is correctly neutralised by || true and the value is already captured by the command substitution — consistent, not a bug.

Conventions

Two minor, non-blocking hardening nits — both about treating a version string as a regex:

  • Line 136 (grep -q "Version: ${EXPECTED}") and line 166 (grep -q "ver=\"${assert_ver}\""): EXPECTED is semver, so its dots are regex any-char metacharacters. Since it is resolve-validated to ^[0-9]+\.[0-9]+\.[0-9]+$ there is no injection risk, but the unescaped dots mean 4.0.5 would also match a hypothetical 4x0y5 stanza — a (very contrived) false-positive that could let a bad index pass. Using grep -qF makes the match exact (grep -qF "Version: ${EXPECTED}" / grep -qF "ver=\"${assert_ver}\"").

  • Line 158 (grep -oE 'repodata/[a-f0-9]+-primary\.xml\.gz') hardcodes gzip. Modern createrepo_c can emit .zst primary metadata, which would make this probe false-red (no loc → "no primary metadata"). Not actionable today — you verified 3/3 green against the live .gz indexes — but worth a one-line comment so a future yum-wheels compression switch points back here rather than looking like a real clobber.

Cross-engine

N/A — pure GitHub Actions / bash, no CFML touched.

Tests

Appropriate for a CI-only change: the existing pull_request: paths: [distribution-install-smoke.yml] self-test trigger (lines 50–54) runs the new job live against the real indexes on this very PR, and the PR body reports 3/3 green plus a negative test (bogus version correctly fails). No framework spec applies.

Docs

No changelog fragment needed — this is workflow-internal tooling, not a user-facing fix/feat. The inline comment block (lines 17–32, 95–101) is updated to describe both layers accurately.

Commits

ci: add index-integrity probe to distribution install smoke (#3218 regression guard) — valid ci type, no scope, subject not ALL-CAPS, header well under 100 chars. Conforms to commitlint.config.js.

Security

Clean. EXPECTED is semver-validated in resolve and passed via env: (line 105), never interpolated into the shell; it only reaches a grep pattern (regex-pattern context, hardened further by the -F suggestion above). CB derives from github.run_id/run_attempt. No secrets, no untrusted interpolation, permissions: contents: read only.

…ctly

Two corrections to the index-integrity job after finding the real #3218
failure mode is CDN edge-cache staleness, not just the origin clobber:

1. Drop the `?cb=` cache-buster. apt/dnf fetch the plain URLs that hit
   Cloudflare's edge; a cache-buster hit R2 origin instead, so the probe went
   green while real `apt install` failed on a stale edge-cached Packages.gz.
   Test the SAME plain URLs clients use.

2. Verify hash-consistency, exactly what the package managers verify: parse the
   SHA256 the (plain) Release records for each binary-<arch>/Packages.gz, fetch
   the (plain) Packages.gz, and compare — this is precisely the "File has
   unexpected size" check that failed in #3218. yum: verify the served
   primary.xml.gz matches the hash repomd references. Then gunzip and confirm
   the content lists the GA (catches an empty-but-internally-consistent index).

Binary correctness: gzip blobs are fetched to a temp file and hashed/gunzipped
from the file — never captured into a shell variable, since `$(...)` is
text-only and silently corrupts binary (strips trailing newlines, drops NULs),
which would compute a bogus hash.

Paired with apt-wheels#6 / yum-wheels#6 (set `no-store` on metadata so the edge
stops caching it). Verified against the live indexes: correctly RED on the
currently-stale apt stable amd64 edge, GREEN on arm64 / bleeding-edge / yum.

Refs: #3218, wheels-dev/apt-wheels#6, wheels-dev/yum-wheels#6

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Peter Amiri <petera@pai.com>

@wheels-bot wheels-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wheels Bot — Reviewer

TL;DR — This re-review covers the new 61103bb9 commit, which upgrades the index-integrity job from a plain fetch-and-grep into a hash-consistency probe that mirrors what apt-get update / dnf actually verify: it parses the SHA256 the (plain) Release records for binary-<arch>/Packages.gz, fetches that same plain URL, compares the served hash to the recorded one, then gunzips and asserts the GA is listed — and does the dnf analogue via repomd.xml<sha>-primary.xml.gz. This directly reproduces the #3218 "File has unexpected size" failure mode (stale edge-cached index vs. fresh Release) that a naive non-empty check would miss, and the deliberate "no cache-buster, fetch the plain edge URLs" decision is now correct and well-documented. Verdict: comment — no correctness, cross-engine, or security issues; the two regex-hardening nits from the prior round still stand plus one trivial dead-variable cleanup, none blocking.

Correctness

No blocking issues. The new hash-consistency layer is sound:

  • The awk extraction of the per-arch Packages.gz SHA256 (line 152) correctly gates on the ^SHA256: section header, resets on the next ^[A-Z] section, and matches the path in $3 (data lines begin with a space, so the path is reliably field 3). The binary-to-temp-file discipline (sha256f on a path, never piped — comment at lines 132–136) correctly avoids command-substitution corrupting the gzip blob.
  • The failure ordering is right: a failed Packages.gz fetch leaves $tmp empty, so exp != act fires the hash-mismatch branch (line 161) before the version branch — i.e. a torn/stale publish is reported as exactly that, not as "doesn't list the GA." The act:-<empty> placeholder is cosmetic (an empty file still hashes to a value), not a bug.
  • set -uo pipefail without -e plus the accumulate-then-exit 1 pattern still correctly reports all failing indexes in one run. The awk … | head -1 on line 152 has no || true, but with -e absent the captured value is unaffected and the ignored pipeline status is harmless — consistent with the SIGPIPE reasoning in the comment block.

Conventions

Three minor, non-blocking nits — two carry over from the prior round and remain accurate against the new code:

  • Line 162 (grep -q "Version: ${EXPECTED}") and line 204 (grep -q "ver=\"${assert_ver}\""): both treat a semver string as a regex, so its dots are any-char metacharacters. EXPECTED is resolve-validated to ^[0-9]+\.[0-9]+\.[0-9]+$ (no injection risk), but the unescaped dots mean 4.0.5 would also match a hypothetical 4x0y5 stanza — a contrived false-positive that could let a bad index pass. Both patterns are fixed strings, so grep -qF "Version: ${EXPECTED}" / grep -qF "ver=\"${assert_ver}\"" makes the match exact at no cost.
  • Line 187 (grep -oE 'repodata/[a-f0-9]+-primary\.xml\.gz') and the paired filename-as-sha256 assumption hardcode gzip. Modern createrepo_c can emit .zst primary metadata, which would make this probe false-red ("no primary metadata"). Not actionable today — verified green against the live .gz indexes — but a one-line comment would point a future yum-wheels compression switch back here instead of looking like a real clobber.
  • Line 185 (local repomd loc exp gz act content): gz is declared but never assigned or read. Trivial — drop it from the local list.

Cross-engine

N/A — pure GitHub Actions / bash, no CFML touched.

Tests

Appropriate for a CI-only change. The pull_request: branches:[develop] paths:[distribution-install-smoke.yml] self-test trigger (lines 50–54) runs the rewritten index-integrity job live against the real indexes on this very PR — needs: resolve, no if: gate, so it executes here. The PR body reports 3/3 green plus a negative test (bogus version fails), so the guard isn't a no-op. No framework spec applies.

Docs

No changelog fragment needed — this is workflow-internal CI tooling, not a user-facing fix/feat. The header comment block (lines 12–32) and the inline env:/run: comments accurately describe both the hash-consistency logic and the deliberate plain-URL / no-cache-buster decision.

Commits

Both commits conform to commitlint.config.js: ci: add index-integrity probe … and ci: probe plain (edge) URLs + hash-consistency, mirroring apt/dnf exactly — valid ci type, no scope, subjects not ALL-CAPS, headers under 100 chars. Subjects describe the "why" (reproduce the real apt/dnf verification path) rather than just the "what."

Security

Clean. EXPECTED is semver-validated in resolve (line 83) and reaches the script only via env: (line 105), never interpolated into a shell command — it lands in a grep regex-pattern context (hardened further by the -F suggestion above). The workflow_dispatch INPUT is likewise env-passed and semver-gated before use. No secrets, no untrusted interpolation, permissions: contents: read only.

@bpamiri bpamiri merged commit 61c594e into develop Jun 19, 2026
20 of 22 checks passed
@bpamiri bpamiri deleted the peter/smoke-index-integrity branch June 19, 2026 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant