
feat(install): opt-in ANYSCAN_INSTALL_KERNEL_BACKPORT for kernel 6.16+ ena_xdp_zc#73

Merged
skullcrushercmd merged 1 commit into main from feat/anyscan-install-kernel-backport-knob on Apr 28, 2026


@skullcrushercmd
Contributor

Summary

Adds an opt-in kernel backport upgrade path so an AnyScan worker host can run kernel 6.16+, which the in-flight ena_xdp_zc ENA driver patches target; AF_XDP driver-mode zerocopy on ENA needs those patches. Mirrors the PR #71 wire-up pattern: a single env knob declared in install-external-deps.sh, package-worker-bundle.sh, and deploy.sh, and documented in runtime.worker.env.template.

Why

PR #65 §10 plus the c6in.metal live bench in PR #65 issuecomment-4336192354 (anygpt-42) showed:

| Config | Aggregate | Avg vs projection |
| --- | --- | --- |
| AF_XDP 8-NIC, cap=4, threads=8 | 22.43 M peak / 19.20 M avg pps | Below the 30–50 M projection |

The cause is in the bench notes: "ENA on kernel 6.12.74 supports only drv+copy; zerocopy was tested across all 8 NICs and rejected with Operation not supported. Driver-side zerocopy patches in newer kernels (6.16+ ena_xdp_zc, in-flight upstream) would close the projection gap."

Debian bookworm stock kernel is 6.12. Running 6.16+ on a worker today means installing a backport image. This PR adds the opt-in path; the kernel patches themselves are out of scope.

Design

Single env knob: ANYSCAN_INSTALL_KERNEL_BACKPORT (default 0).

  • 0 → install/deploy paths leave the running kernel untouched. Existing AMIs are unchanged.
  • 1 → install/deploy paths install linux-image-cloud-amd64 from bookworm-backports when the running kernel is older than 6.16. The script never auto-reboots: the kernel image is staged on disk and the operator schedules the reboot. After the install (or when the install is skipped because the running kernel is already 6.16+), the script probes /sys/module/ena/version and dmesg for ena_xdp_zc support on the running kernel and warns if it is absent.
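The "older than 6.16" gate can be sketched as a numeric comparison of the dotted prefix of uname -r against the minimum version. Only the helper name kernel_version_at_least comes from this PR; the body below is an illustrative assumption, not the code in install-external-deps.sh.

```shell
#!/usr/bin/env bash
# Sketch: return 0 when a kernel release string is >= a minimum version.
# Only the helper name comes from the PR; the body is illustrative.
kernel_version_at_least() {
  local release="$1" minimum="$2"
  local -a have want
  local i h w
  # Keep the leading dotted-numeric part: "6.12.74-cloud-amd64" -> "6.12.74".
  release="${release%%[!0-9.]*}"
  IFS=. read -r -a have <<< "$release"
  IFS=. read -r -a want <<< "$minimum"
  # Compare major, minor, patch in order; missing fields count as 0.
  for i in 0 1 2; do
    h="${have[i]:-0}"
    w="${want[i]:-0}"
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0
}

if kernel_version_at_least "$(uname -r)" 6.16; then
  echo "running kernel already meets 6.16"
else
  echo "running kernel is older than 6.16"
fi
```

Comparing fields numerically (rather than as strings) matters here: a plain string comparison would rank "6.9" above "6.16".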

Override knobs for non-default channels (e.g. an internal Debian mirror): ANYSCAN_KERNEL_BACKPORT_{MIN_VERSION,PACKAGE,SUITE,SOURCES_LIST,MIRROR}.
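One way the knob and its overrides could resolve, sketched under assumptions: the PR names the variables and the default package (linux-image-cloud-amd64), suite (bookworm-backports), and minimum (6.16), but not the default mirror or sources-list path, so the deb.debian.org mirror and the /etc/apt/sources.list.d/ file name below are placeholders. print_backport_plan is a hypothetical helper that only prints the apt source line and the apt-get commands from test case 3; it runs nothing.

```shell
#!/usr/bin/env bash
# Defaults named in the PR:
: "${ANYSCAN_INSTALL_KERNEL_BACKPORT:=0}"
: "${ANYSCAN_KERNEL_BACKPORT_MIN_VERSION:=6.16}"
: "${ANYSCAN_KERNEL_BACKPORT_PACKAGE:=linux-image-cloud-amd64}"
: "${ANYSCAN_KERNEL_BACKPORT_SUITE:=bookworm-backports}"
# Placeholder defaults (assumptions, not taken from the PR):
: "${ANYSCAN_KERNEL_BACKPORT_MIRROR:=http://deb.debian.org/debian}"
: "${ANYSCAN_KERNEL_BACKPORT_SOURCES_LIST:=/etc/apt/sources.list.d/${ANYSCAN_KERNEL_BACKPORT_SUITE}.list}"

# Print (do not execute) what the opt-in path would do.
print_backport_plan() {
  printf 'write %s: deb %s %s main\n' \
    "$ANYSCAN_KERNEL_BACKPORT_SOURCES_LIST" \
    "$ANYSCAN_KERNEL_BACKPORT_MIRROR" "$ANYSCAN_KERNEL_BACKPORT_SUITE"
  printf 'apt-get update\n'
  printf 'apt-get install -t %s %s\n' \
    "$ANYSCAN_KERNEL_BACKPORT_SUITE" "$ANYSCAN_KERNEL_BACKPORT_PACKAGE"
}

# Default 0: print nothing, leave the kernel alone.
if [[ "$ANYSCAN_INSTALL_KERNEL_BACKPORT" == 1 ]]; then
  print_backport_plan
fi
```

Because every default uses :=, an operator-supplied value wins; for example, running the sketch with ANYSCAN_INSTALL_KERNEL_BACKPORT=1 and ANYSCAN_KERNEL_BACKPORT_MIRROR set to an internal mirror URL would print the plan against that mirror.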

Files touched

| File | Change |
| --- | --- |
| install-external-deps.sh | Env knob; kernel_version_at_least() helper; probe_ena_xdp_zc() helper; install_kernel_backport_if_requested() helper called right after install_afxdp_build_deps. Handles root, non-interactive sudo, and non-Debian (no apt-get) hosts gracefully. |
| package-worker-bundle.sh | Env knob declared (mirrors the AF_XDP pattern); the bundle README records install_kernel_backport so a downstream operator sees the producer's intent. Bundles do not contain a kernel image: the install happens at deploy time on the target host. |
| deploy.sh | Env knob, mirrored helpers, install call placed after the EUID==0 root check. deploy.sh enforces root, so the helper takes a simpler path (no sudo branch). |
| runtime.worker.env.template | Documents the knob and its overrides in a block alongside the existing ANYSCAN_USE_AF_XDP block. |
| tools/test-install-external-deps-kernel-backport.sh | New bash unit test, 18 assertions across 4 cases. |
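The privilege handling described for install-external-deps.sh (root, non-interactive sudo, or no apt-get at all) might be structured as below. This is a sketch: apt_privilege_prefix is a hypothetical helper name, and the real function may order or word its checks differently. deploy.sh, which already requires EUID 0, would only need the root branch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command prefix for running apt-get
# ("" when root, "sudo -n" when passwordless sudo works), or return 1
# with a one-line note when the install should be skipped.
apt_privilege_prefix() {
  local euid="${1:-$EUID}"
  if ! command -v apt-get >/dev/null 2>&1; then
    echo "note: apt-get not found (non-Debian host?); skipping kernel backport" >&2
    return 1
  fi
  if (( euid == 0 )); then
    return 0
  fi
  # -n makes sudo fail instead of prompting for a password.
  if command -v sudo >/dev/null 2>&1 && sudo -n true 2>/dev/null; then
    echo "sudo -n"
    return 0
  fi
  echo "note: not root and sudo needs a password; skipping kernel backport" >&2
  return 1
}

if prefix="$(apt_privilege_prefix)"; then
  # A real run would execute: $prefix apt-get update
  # (left unquoted so "sudo -n" splits into two words; empty when root).
  echo "would run: ${prefix:+$prefix }apt-get update"
fi
```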

Test plan

  • bash -n install-external-deps.sh package-worker-bundle.sh deploy.sh tools/test-install-external-deps-afxdp.sh tools/test-install-external-deps-kernel-backport.sh → 5/5 ok
  • tools/test-install-external-deps-kernel-backport.sh → 18/18 assertions pass across the four operator paths:
    1. ANYSCAN_INSTALL_KERNEL_BACKPORT unset (default 0) → no apt-get install, no apt source list.
    2. ANYSCAN_INSTALL_KERNEL_BACKPORT=1 + running kernel already 6.16 → no apt-get install; "already meets" message printed; ena_xdp_zc probe runs.
    3. ANYSCAN_INSTALL_KERNEL_BACKPORT=1 + running kernel 6.12 + apt-get on PATH → apt-get update + apt-get install -t bookworm-backports linux-image-cloud-amd64 fire; apt source list written; REBOOT REQUIRED notice printed.
    4. ANYSCAN_INSTALL_KERNEL_BACKPORT=1 + running kernel 6.12 + apt-get NOT on PATH → graceful skip, no apt-get invocations, no source list written.
  • tools/test-install-external-deps-afxdp.sh → 10/10 pass (regression check: PR #71, "fix(build): wire ANYSCAN_USE_AF_XDP=1 through install-external-deps + package-worker-bundle + deploy", still green).
  • cargo build --workspace → clean (only pre-existing dead-code warnings on anyscan-api.rs, unchanged from main).
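The N/M tallies above come from the PR's own bash tests. A minimal sketch of assertion helpers that produce that kind of summary follows; assert_contains, assert_not_contains, and report are hypothetical names, not necessarily what tools/test-install-external-deps-kernel-backport.sh defines.

```shell
#!/usr/bin/env bash
# Hypothetical assertion helpers that tally passes for an "N/M" summary.
pass=0
total=0

assert_contains() {
  local haystack="$1" needle="$2" label="$3"
  total=$((total + 1))
  if [[ "$haystack" == *"$needle"* ]]; then
    pass=$((pass + 1))
  else
    printf 'FAIL: %s: expected %q\n' "$label" "$needle" >&2
  fi
}

assert_not_contains() {
  local haystack="$1" needle="$2" label="$3"
  total=$((total + 1))
  if [[ "$haystack" != *"$needle"* ]]; then
    pass=$((pass + 1))
  else
    printf 'FAIL: %s: did not expect %q\n' "$label" "$needle" >&2
  fi
}

# Print the tally; succeed only when every assertion passed.
report() {
  printf '%d/%d assertions pass\n' "$pass" "$total"
  (( pass == total ))
}
```

A case-3 check would pass the recorded apt-get log to assert_contains with "-t bookworm-backports", a case-1 check would use assert_not_contains with "apt-get install", and the script would end with report so its exit status reflects the tally.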

Out of scope

Per the anygpt-44 task brief:

  • AMI rebuild
  • auto-reboot
  • the ena driver patches themselves (in-flight upstream)

References

🤖 Generated with Claude Code

…+ ena_xdp_zc

PR 65 issuecomment-4336192354 (anygpt-42 live bench on c6in.metal)
showed ENA on kernel 6.12.74 only supports AF_XDP drv+copy mode, capping
the 8-NIC cap=4 throughput at 22.43M peak / 19.20M avg pps — vs the
30-50M projection in plans/2026-04-27-portscan-afxdp-plan-v1.md §10
that AF_XDP zerocopy was supposed to deliver. The ena_xdp_zc patch
series is in-flight upstream against kernel 6.16+; running it requires
a newer kernel than the Debian bookworm stock 6.12.

This adds an opt-in upgrade path so an operator can stage the backport
kernel image at build/deploy time. The wiring follows the PR 71 shape
(knob declared in install-external-deps.sh, package-worker-bundle.sh,
deploy.sh; documented in runtime.worker.env.template).

Design:
- ANYSCAN_INSTALL_KERNEL_BACKPORT defaults to 0 so existing AMIs are
  unchanged.
- ANYSCAN_INSTALL_KERNEL_BACKPORT=1 makes the install/deploy paths
  install linux-image-cloud-amd64 from bookworm-backports when the
  running kernel is older than 6.16. The script never auto-reboots —
  the new image is staged on disk and the operator schedules the
  reboot themselves.
- Post-install, probes /sys/module/ena/version + dmesg for ena_xdp_zc
  support and warns if it is absent, so the operator knows whether the
  CURRENTLY-RUNNING kernel will deliver zerocopy. If the running kernel
  is already 6.16+, the install is skipped and only the probe runs.
- Override the package / suite / source list path / mirror via
  matching ANYSCAN_KERNEL_BACKPORT_* variables.
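A sketch of what a probe_ena_xdp_zc-style check could look like. Assumptions: the PR only says the helper reads /sys/module/ena/version and dmesg, so the grep pattern for zerocopy support is a placeholder, and the optional second argument (log text to scan instead of calling dmesg) exists only to make the sketch testable. Like the PR's probe, it reports on the currently running kernel, not on a staged image.

```shell
#!/usr/bin/env bash
# Hypothetical probe: report the loaded ENA driver version and look for a
# zerocopy hint in the kernel log. The grep pattern is an assumption.
probe_ena_xdp_zc() {
  local version_file="${1:-/sys/module/ena/version}"
  local log_text="${2-}"
  if [[ ! -r "$version_file" ]]; then
    echo "warning: $version_file not readable (ena module not loaded?)" >&2
    return 1
  fi
  echo "ena driver version: $(<"$version_file")"
  # dmesg may need privileges (kernel.dmesg_restrict); treat failure as "no log".
  if [[ -z "$log_text" ]]; then
    log_text="$(dmesg 2>/dev/null || true)"
  fi
  if grep -Eqi 'xdp_zc|zero-?copy' <<< "$log_text"; then
    echo "ena_xdp_zc: zerocopy hint found in kernel log"
    return 0
  fi
  echo "warning: no ena_xdp_zc hint for the running kernel; expect AF_XDP drv+copy" >&2
  return 1
}

# Warn-only: a missing hint never fails the install path.
probe_ena_xdp_zc || true
```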

Files touched:
- install-external-deps.sh: env knob, kernel_version_at_least() helper,
  probe_ena_xdp_zc() helper, install_kernel_backport_if_requested()
  helper called right after install_afxdp_build_deps. The function
  takes the root or non-interactive-sudo branch; on a non-Debian host
  (no apt-get) it skips with a one-line note.
- package-worker-bundle.sh: same env knob declaration; bundle README
  records install_kernel_backport so a downstream operator can see
  the producer's intent. Bundles do not contain a kernel image — the
  install happens at deploy time on the target host.
- deploy.sh: same env knob, mirrored helpers, install call placed
  after the EUID==0 root check. deploy.sh enforces root, so the helper
  takes a simpler path (no sudo branch).
- runtime.worker.env.template: documents the knob and its overrides
  alongside the existing ANYSCAN_USE_AF_XDP block.
- tools/test-install-external-deps-kernel-backport.sh: new bash unit
  test that stubs uname/apt-get/id/tee/make/git/ldd/readelf on PATH
  and asserts the four operator paths (18 assertions, all pass):
    1. knob=0 (default)                              -> no apt-get
       install for the backport package, no apt source list.
    2. knob=1 + running kernel already 6.16+         -> no apt-get
       install; "already meets" message; ena_xdp_zc probe runs.
    3. knob=1 + running kernel 6.12 + apt-get on PATH -> apt-get
       update + apt-get install -t bookworm-backports
       linux-image-cloud-amd64 fire; apt source list written;
       REBOOT REQUIRED notice printed.
    4. knob=1 + running kernel 6.12 + apt-get NOT on PATH -> graceful
       skip, no apt-get invocations recorded, no source list written.
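The stub-on-PATH technique described for the new test can be sketched as follows. Assumptions: make_stubs is a hypothetical helper, the stubs are plain /bin/sh scripts, and this version prepends the stub directory to PATH for brevity, whereas the PR's harness replaces PATH with the stub directory (which is what lets case 4 hide apt-get).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a fake `uname` that reports a chosen kernel
# release and a fake `apt-get` that records its arguments instead of
# touching the host.
make_stubs() {
  local stub_dir="$1" fake_release="$2" call_log="$3"
  mkdir -p "$stub_dir"
  # Unquoted heredoc: $fake_release and $call_log expand now; \$* stays literal.
  cat > "$stub_dir/uname" <<EOF
#!/bin/sh
echo "$fake_release"
EOF
  cat > "$stub_dir/apt-get" <<EOF
#!/bin/sh
echo "apt-get \$*" >> "$call_log"
EOF
  chmod +x "$stub_dir/uname" "$stub_dir/apt-get"
}

work="$(mktemp -d)"
make_stubs "$work/bin" "6.12.74-cloud-amd64" "$work/apt.log"
(
  # Assigning PATH in the subshell also clears bash's command hash table.
  PATH="$work/bin:$PATH"
  echo "stubbed kernel: $(uname -r)"
  apt-get install -t bookworm-backports linux-image-cloud-amd64
)
cat "$work/apt.log"
```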

Verification:
- bash -n install-external-deps.sh package-worker-bundle.sh deploy.sh
  tools/test-install-external-deps-afxdp.sh
  tools/test-install-external-deps-kernel-backport.sh -> 5/5 ok
- tools/test-install-external-deps-kernel-backport.sh -> 18/18 pass
- tools/test-install-external-deps-afxdp.sh             -> 10/10 pass
  (regression check: PR 71 still green)
- cargo build --workspace -> clean (only pre-existing dead-code
  warnings on anyscan-api.rs, unchanged from main)
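The bash -n step parses each script without executing it. A hypothetical wrapper that turns it into the "5/5 ok" style summary quoted above might look like this; check_syntax is an illustrative name, not a script from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: parse-check each script with `bash -n` (no execution)
# and print an "ok/total ok" summary; succeed only if every file parses.
check_syntax() {
  local ok=0 f
  for f in "$@"; do
    if bash -n "$f"; then
      ok=$((ok + 1))
    else
      echo "syntax error: $f" >&2
    fi
  done
  printf '%d/%d ok\n' "$ok" "$#"
  (( ok == $# ))
}
```

From the repository root it would be called with the five script paths listed above.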

Out of scope (per anygpt-44 task brief):
- AMI rebuild
- auto-reboot
- the ena driver patches themselves (in-flight upstream)

References: PR 65, PR 65 issuecomment-4336192354 (constraint trace),
PR 71 (AF_XDP build wire-up pattern this mirrors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
skullcrushercmd merged commit 2acb8bc into main on Apr 28, 2026
skullcrushercmd deleted the feat/anyscan-install-kernel-backport-knob branch on April 28, 2026 at 17:15

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 13b415cba3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +120 to +124
```shell
for cmd in bash env mktemp install sed cat dirname basename \
    python3 printf chmod cp rm mkdir touch ln awk grep sort \
    head tail tr find readlink openssl true false; do
  if resolved="$(command -v "$cmd" 2>/dev/null)"; then
    ln -sf "$resolved" "$stub_dir/$cmd"
  fi
done
```

P2: Preserve host PATH when building test stub PATH

The test harness replaces PATH with stub_dir only, but it seeds that directory using raw command -v results (including python3). On hosts where python3 resolves to a shim (for example pyenv/asdf), the shim needs helper binaries that are no longer on PATH, so install-external-deps.sh can hang or fail before assertions run. This makes the new test non-portable and can break CI depending on how Python is installed.
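One possible way to address this, sketched under assumptions (link_host_tool is a hypothetical name, and this is not a change the PR made): resolve python3 to the interpreter that actually runs, via sys.executable, before symlinking it, so a pyenv or asdf shim that depends on the host PATH never lands in stub_dir. The sketch also uses type -P instead of command -v, because for shell builtins such as printf, true, and false, command -v prints the bare name, and ln -sf would then create a symlink that points at itself.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the stub-dir seeding loop.
link_host_tool() {
  local cmd="$1" stub_dir="$2" resolved
  # type -P forces a PATH search, so builtins like printf resolve to a file.
  resolved="$(type -P "$cmd")" || return 0   # tool absent: skip it
  if [[ "$cmd" == python3 ]]; then
    # A shim re-execs the real interpreter; sys.executable names that binary.
    resolved="$("$resolved" -c 'import sys; print(sys.executable)')" || return 1
    [[ -n "$resolved" ]] || return 1
  fi
  ln -sf "$resolved" "$stub_dir/$cmd"
}

stub_dir="$(mktemp -d)"
for cmd in bash sed python3 printf true false; do
  link_host_tool "$cmd" "$stub_dir"
done
ls -l "$stub_dir"
```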

Useful? React with 👍 / 👎.
