
fix(build): wire ANYSCAN_USE_DPDK=1 through install-external-deps + bundle + deploy + adapter#81

Merged: skullcrushercmd merged 1 commit into main from perf/dpdk-impl-wireup on Apr 28, 2026

@skullcrushercmd

Summary

Phase 2 wire-up for the DPDK io_engine landing in AnyVM-Tech/anyscan-engine-c#4. Mirrors PR #71's AF_XDP wire-up shape across the install / bundle / deploy / adapter / install-time-probe chain so the engine repo's USE_DPDK=1 build flag actually reaches every producer of a worker bundle, and so the runtime --io-engine=dpdk knob plumbed through ANYSCAN_SCANNER_IO_ENGINE has DPDK code to dispatch to.

Why DPDK now: AWS ENA on kernel ≤6.12.74 forces AF_XDP into drv+copy mode, capping c6in.metal at ~22M pps aggregate (memory: anyscan_afxdp_ena_constraint, also PR #65 issuecomment-4338158487 — 6.19.11 STILL does not have ena_xdp_zc). DPDK bypasses the kernel ENA driver entirely via vfio-pci and removes the syscall-kick + lower-half-channels-only ZC constraint. Plan: plans/2026-04-28-portscan-dpdk-impl-v1.md (merged in #72).

Companion engine PR

This PR is the AnyScan-side half. The engine-side half is AnyVM-Tech/anyscan-engine-c#4 — that's the PR that adds --io-engine=dpdk support to the bundled scanner. Without it landing first, this PR's USE_DPDK=1 build flag has nothing to compile in.

What's in the PR

Build-flag plumbing (mirrors PR #71)

  • install-external-deps.sh: ANYSCAN_USE_DPDK env knob, binary_has_dpdk_linkage probe (librte_eal.so via ldd → readelf -d), install_dpdk_build_deps (libdpdk-dev + dpdk apt-get, fail-open), cache short-circuit invalidation when cached binary lacks DPDK linkage, vulnscanner_make_args extension, post-build assertion.
  • package-worker-bundle.sh — same env knob, linkage probe, rebuild_scanner_with_dpdk helper, bundle_engine_make_args, README.txt use_dpdk field. Composes with USE_AF_XDP=1 USE_PFRING_ZC=1 — earliest matching rebuild block produces a binary linked against every requested engine in a single make invocation.
  • deploy.sh — same env knob, linkage probe, make_args extension, pre-DPDK cached-binary drop, post-build assertion.
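
The linkage probe shared by the three scripts can be sketched as follows. This is a minimal Python illustration, not the shell implementation in the PR: only the `librte_eal.so` marker and the ldd-then-`readelf -d` order come from the description above, while the helper names `run_tool` and `output_mentions_dpdk` are hypothetical.

```python
import subprocess

DPDK_MARKER = "librte_eal.so"  # library name taken from the PR description


def output_mentions_dpdk(tool_output):
    """Return True if any line of ldd/readelf output names librte_eal.so."""
    return any(DPDK_MARKER in line for line in tool_output.splitlines())


def run_tool(argv):
    """Run a tool and return its stdout, or '' if it is missing or fails."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return proc.stdout if proc.returncode == 0 else ""


def binary_has_dpdk_linkage(path, run=run_tool):
    """Try ldd first, then fall back to readelf -d (the order named in the PR)."""
    if output_mentions_dpdk(run(["ldd", path])):
        return True
    return output_mentions_dpdk(run(["readelf", "-d", path]))
```

The `run` parameter exists only so the probe can be exercised with stubbed tool output, in the same spirit as the hermetic shell tests that stub ldd and readelf.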

Runtime probe

  • install-worker-bundle.sh::probe_dpdk_runtime_available — five gates:
    1. installed scanner binary at $VULNSCANNER_BIN_DEST was USE_DPDK-built (closes the same gap the PR #75 review flagged for pfring_zc)
    2. librte_eal.so loadable via ldconfig
    3. vfio_pci kernel module loaded
    4. at least one hugepage reserved in /sys/kernel/mm/hugepages/*
    5. /dev/vfio/vfio present (kernel-side prerequisite that vfio-pci's bind step would have created)
  • apply_dpdk_availability writes ANYSCAN_DPDK_AVAILABLE to /etc/agentd/runtime.env always (true OR false) so a partial upgrade can't leave a stale true in place.
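
A rough sketch of the gate-then-write shape described above, in Python rather than the shell used by install-worker-bundle.sh. The function names mirror the PR, but the signatures, the gate-list representation, and the `example_gates` helper are assumptions made for illustration; only two of the five gates are shown with real paths.

```python
import os


def probe_dpdk_runtime_available(gates):
    """Evaluate (name, check) pairs in order; stop at the first failing gate.

    Returns (True, None) when every gate passes, else (False, failed_name).
    """
    for name, check in gates:
        if not check():
            return False, name
    return True, None


def apply_dpdk_availability(env_text, available):
    """Return env_text with exactly one ANYSCAN_DPDK_AVAILABLE line.

    The key is rewritten on every run (true or false), so a stale value
    from an earlier install cannot survive a partial upgrade.
    """
    key = "ANYSCAN_DPDK_AVAILABLE"
    kept = [ln for ln in env_text.splitlines() if not ln.startswith(key + "=")]
    kept.append(f"{key}={'true' if available else 'false'}")
    return "\n".join(kept) + "\n"


def example_gates():
    """Two of the five gates; the binary, ldconfig and hugepage gates are omitted."""
    return [
        # Assumption: a loaded module shows up as a directory under /sys/module.
        ("vfio_pci module loaded", lambda: os.path.isdir("/sys/module/vfio_pci")),
        ("/dev/vfio/vfio present", lambda: os.path.exists("/dev/vfio/vfio")),
    ]
```

Returning the name of the first failed gate is a design choice of this sketch; it makes the install log say why DPDK was marked unavailable.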

Adapter

  • vulnscanner-zmap-adapter.py::SUPPORTED_IO_ENGINES gains "dpdk".
  • _IO_ENGINE_AVAILABILITY_KEYS maps "dpdk" → ANYSCAN_DPDK_AVAILABLE so the same fall-back-with-warning path the AF_XDP / PF_RING ZC plumbing already exercises picks up dpdk for free.
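
The fall-back-with-warning path might look roughly like the sketch below. `SUPPORTED_IO_ENGINES`, `_IO_ENGINE_AVAILABILITY_KEYS`, `ANYSCAN_SCANNER_IO_ENGINE`, `ANYSCAN_DPDK_AVAILABLE`, and the af_packet fallback are taken from this PR; the `resolve_io_engine` function, the `af_xdp` engine name, and the two non-DPDK availability key names are assumptions and may not match the adapter.

```python
import sys

DEFAULT_IO_ENGINE = "af_packet"  # fallback named by the existing adapter test
SUPPORTED_IO_ENGINES = ("af_packet", "af_xdp", "pfring_zc", "dpdk")
# Only the "dpdk" entry is confirmed by the PR; the other key names are guesses.
_IO_ENGINE_AVAILABILITY_KEYS = {
    "af_xdp": "ANYSCAN_AF_XDP_AVAILABLE",
    "pfring_zc": "ANYSCAN_PFRING_ZC_AVAILABLE",
    "dpdk": "ANYSCAN_DPDK_AVAILABLE",
}


def _stderr_warn(message):
    print(message, file=sys.stderr)


def resolve_io_engine(env, warn=_stderr_warn):
    """Pick the engine to pass as --io-engine, falling back with a warning."""
    requested = env.get("ANYSCAN_SCANNER_IO_ENGINE", "").strip().lower()
    if not requested:
        return DEFAULT_IO_ENGINE
    if requested not in SUPPORTED_IO_ENGINES:
        warn(f"unknown io engine {requested!r}; using {DEFAULT_IO_ENGINE}")
        return DEFAULT_IO_ENGINE
    key = _IO_ENGINE_AVAILABILITY_KEYS.get(requested)
    # A missing availability variable is treated the same as "false".
    if key is not None and env.get(key, "").strip().lower() != "true":
        warn(f"{requested} requested but {key} is not true; using {DEFAULT_IO_ENGINE}")
        return DEFAULT_IO_ENGINE
    return requested
```

Because each engine looks up only its own key, a true value for another engine's availability variable does not enable dpdk, which is the cross-engine isolation the new tests cover.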

Host setup script (NEW)

  • tools/setup-dpdk.sh (~370 LOC) — bind / unbind / status subcommands.
    • Idempotent: re-running on an already-bound system is a no-op.
    • Reversible: unbind returns the NICs to ena and frees hugepages.
    • Hard-coded refusal rules (cannot be bypassed):
      • Refuses to bind eth0 (agentd control-plane interface).
      • Refuses to bind the only NIC (would leave the host without kernel networking).
    • Reserves 1 GiB hugepages first, falls back to 2 MiB. Disables transparent hugepages on bind (DPDK + THP fragments the static hugepage pool).
    • Resolves iface names → BDFs via /sys/class/net/<iface>/device.
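
The two refusal rules and the sysfs lookup can be illustrated with a short Python sketch. It is not the script's code: `plan_bind` and `iface_to_bdf` are hypothetical names, and the only facts taken from the PR are the eth0 rule, the only-NIC rule, and the `/sys/class/net/<iface>/device` path.

```python
import os

CONTROL_PLANE_IFACE = "eth0"  # agentd control-plane interface per the PR


def iface_to_bdf(iface, sysfs_root="/sys/class/net"):
    """Resolve an interface name to its PCI BDF by following the device symlink."""
    link = os.path.join(sysfs_root, iface, "device")
    if not os.path.exists(link):
        return None
    return os.path.basename(os.path.realpath(link))


def plan_bind(requested, kernel_nics):
    """Apply the refusal rules; return (ifaces_to_bind, refusal_messages)."""
    to_bind, refusals = [], []
    for iface in requested:
        if iface == CONTROL_PLANE_IFACE:
            refusals.append(f"skipping {iface}: control-plane interface")
        elif iface not in to_bind:
            to_bind.append(iface)
    remaining = [nic for nic in kernel_nics if nic not in to_bind]
    if to_bind and not remaining:
        refusals.append("refusing: bind would leave no kernel-managed NIC")
        to_bind = []
    return to_bind, refusals
```

For example, `plan_bind(["eth0", "eth1"], ["eth0", "eth1", "eth2"])` keeps only `eth1` and records one refusal for `eth0`.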

Documentation

  • runtime.worker.env.template — full DPDK section documenting ANYSCAN_USE_DPDK (build-time), ANYSCAN_DPDK_AVAILABLE (install probe), ANYSCAN_DPDK_PCI_BDFS (CSV of BDFs/ifaces), ANYSCAN_DPDK_HUGEPAGES_GB (default 4).

Tests (NEW)

  • tools/test-install-external-deps-dpdk.sh (~270 LOC) — mirrors test-install-external-deps-afxdp.sh. Four cases × multiple assertions, hermetic (stubs make/git/ldd/readelf).
  • test_vulnscanner_adapter_io_engine.py — 7 new DPDK assertions covering with-runtime-available, without-runtime-fallback-with-warning, missing-availability-var, uppercase-normalization, and cross-engine availability isolation. Updated test_invalid_value_falls_back_to_af_packet_with_warning to use `fake_engine` instead of `dpdk` (dpdk is now valid).

Verification

| Test | Result |
| --- | --- |
| `tools/test-install-external-deps-afxdp.sh` | 11/11 pass (regression OK) |
| `tools/test-install-external-deps-pfring-zc.sh` | 10/10 pass (regression OK) |
| `tools/test-install-external-deps-dpdk.sh` | 10/10 pass |
| `python3 -m unittest discover` (full repo) | 116/116 pass |
| `bash -n` on every modified shell script | clean |
| `tools/setup-dpdk.sh status` | runs clean |

Out of scope

These follow as separate work per the plan's §8 rollout:

  • Phase 2 systemd unit edit adding CAP_SYS_RAWIO/CAP_IPC_LOCK/CAP_NET_ADMIN to anyscan-worker.service. Documented in the env template; until that lands operators must add caps manually before flipping the runtime knob.
  • Live c6in.metal bench — separate worker (plan §5.3). The plan projects 50–100M pps; per memory anyscan_aws_pps_allowance AWS may enforce a quota that hits before the engine-side ceiling.
  • AMI rebuild + bundle-default flip to dpdk for c6in.metal tier specifically (plan §8 step 7).
  • mlx5 / non-AWS hardware support — separate research worker.

Test plan

  • All 22 io_engine + dpdk dispatch smoke tests pass on default + USE_AF_XDP + USE_DPDK + USE_AF_XDP+USE_DPDK builds (engine PR #4).
  • tools/test-install-external-deps-dpdk.sh: 10/10 cases (default, fresh-build, force-rebuild, cached-skip).
  • python3 -m unittest discover: 116 tests pass — 32 in test_vulnscanner_adapter_io_engine, 7 of those DPDK-specific.
  • Regression: test-install-external-deps-afxdp.sh (11/11) and -pfring-zc.sh (10/10) still pass.
  • tools/setup-dpdk.sh status runs cleanly with no NICs bound.
  • Live setup-dpdk.sh bind / unbind against a c6in.metal-class host — separate worker, see plan §5.5.
  • Live c6in.metal bench — separate worker (plan §5.3).

Refs: `plans/2026-04-28-portscan-dpdk-impl-v1.md` (§3.10 wire-up, §3.11 NIC-binding decision, §4.3 kernel feature checks, §5.7 unit test shape)

…undle + deploy + adapter

Phase 2 wire-up for the DPDK io_engine landing in
AnyVM-Tech/anyscan-engine-c PR #4. Mirrors PR #71's AF_XDP wire-up shape
across the install / bundle / deploy / adapter / install-time-probe
chain so the engine repo's USE_DPDK=1 build flag actually reaches every
producer of a worker bundle, and so the runtime --io-engine=dpdk knob
plumbed through ANYSCAN_SCANNER_IO_ENGINE has DPDK code to dispatch to.

Why DPDK now: AWS ENA on kernel ≤6.12.74 forces AF_XDP into drv+copy
mode, capping c6in.metal at ~22M pps aggregate (memory:
anyscan_afxdp_ena_constraint, also PR #65 issuecomment-4338158487 —
6.19.11 STILL does not have ena_xdp_zc). DPDK bypasses the kernel ENA
driver entirely via vfio-pci and removes the syscall-kick +
lower-half-channels-only ZC constraint.

What lands here:
  - install-external-deps.sh: ANYSCAN_USE_DPDK env knob;
    binary_has_dpdk_linkage probe (librte_eal.so via ldd → readelf -d);
    install_dpdk_build_deps (libdpdk-dev + dpdk apt-get, fail-open);
    cache short-circuit invalidation when cached binary lacks DPDK
    linkage; vulnscanner_make_args extension; post-build assertion.
  - package-worker-bundle.sh: same env knob, linkage probe,
    rebuild_scanner_with_dpdk helper, bundle_engine_make_args, README.txt
    use_dpdk field. Composes with USE_AF_XDP=1 USE_PFRING_ZC=1 — the
    earliest matching rebuild block produces a binary linked against
    every requested engine in a single make invocation.
  - deploy.sh: same env knob, linkage probe, make_args extension,
    pre-DPDK cached-binary drop, post-build assertion.
  - install-worker-bundle.sh: binary_has_dpdk_linkage,
    probe_dpdk_runtime_available (5 gates: scanner USE_DPDK-built,
    librte_eal.so loadable, vfio_pci kernel module, hugepages reserved
    in /sys/kernel/mm/hugepages/*, /dev/vfio/vfio present),
    apply_dpdk_availability writing ANYSCAN_DPDK_AVAILABLE.
  - vulnscanner-zmap-adapter.py: SUPPORTED_IO_ENGINES gains "dpdk";
    _IO_ENGINE_AVAILABILITY_KEYS maps "dpdk" → ANYSCAN_DPDK_AVAILABLE
    so the same fall-back-with-warning path the AF_XDP / PF_RING ZC
    plumbing already exercises picks up dpdk for free.
  - runtime.worker.env.template: full DPDK section documenting
    ANYSCAN_USE_DPDK (build-time), ANYSCAN_DPDK_AVAILABLE (install
    probe), ANYSCAN_DPDK_PCI_BDFS (BDF / iface CSV), and
    ANYSCAN_DPDK_HUGEPAGES_GB (default 4).
  - tools/setup-dpdk.sh (NEW, ~370 LOC): bind / unbind / status
    subcommands. Reserves hugepages (1 GiB pages preferred, falls back
    to 2 MiB), modprobe vfio-pci, dpdk-devbind.py --bind=vfio-pci.
    Idempotent (re-runs are no-ops). Reversible (`unbind` returns the
    NICs to ena and frees hugepages). Refuses to bind eth0 (agentd
    control-plane interface) and refuses to bind the only NIC. THP
    gets switched to "never" on bind (DPDK + THP fragments the static
    hugepage pool).
  - tools/test-install-external-deps-dpdk.sh (NEW, ~270 LOC): mirrors
    test-install-external-deps-afxdp.sh. Four cases × multiple
    assertions: default unset → no USE_DPDK=1 in make argv; opt-in +
    missing scanner → USE_DPDK=1; opt-in + cached non-DPDK binary →
    make clean + USE_DPDK=1; opt-in + cached DPDK-linked binary → no
    rebuild. Stubs make/git/ldd/readelf so it runs hermetically.
  - test_vulnscanner_adapter_io_engine.py: 7 new DPDK assertions
    covering the dpdk-with-runtime-available,
    dpdk-without-runtime-fall-back-with-warning,
    missing-availability-var, uppercase normalization, and
    cross-engine availability isolation cases.
    Updated test_invalid_value_falls_back_to_af_packet_with_warning
    to use "fake_engine" instead of "dpdk" — dpdk is now valid.
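
The four install-external-deps cases listed above reduce to a small decision table. The Python sketch below restates it under assumptions: `plan_scanner_build` and `BuildPlan` are hypothetical names, and the behavior for a cached binary with the knob unset (no rebuild) is inferred rather than stated in the commit message.

```python
from typing import NamedTuple


class BuildPlan(NamedTuple):
    rebuild: bool      # whether make is invoked at all
    make_clean: bool   # whether a stale cached binary is dropped first
    make_args: tuple   # extra arguments appended to the make argv


def plan_scanner_build(use_dpdk, binary_exists, has_dpdk_linkage):
    """Decide how to build the scanner for the given cache state."""
    args = ("USE_DPDK=1",) if use_dpdk else ()
    if not binary_exists:
        # Nothing cached: build, with USE_DPDK=1 only when opted in.
        return BuildPlan(True, False, args)
    if use_dpdk and not has_dpdk_linkage:
        # Cached binary predates DPDK: invalidate the cache and rebuild.
        return BuildPlan(True, True, args)
    # Cached binary already satisfies the request: skip the build.
    return BuildPlan(False, False, ())
```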

Verification (on Debian bookworm with libdpdk-dev 24.11 installed):
  - tools/test-install-external-deps-afxdp.sh: 11/11 (regression OK).
  - tools/test-install-external-deps-pfring-zc.sh: 10/10 (regression OK).
  - tools/test-install-external-deps-dpdk.sh: 10/10.
  - python3 -m unittest discover: 116/116 (32 in
    test_vulnscanner_adapter_io_engine, of which 7 are DPDK-specific).
  - All bash scripts parse cleanly via `bash -n`.
  - tools/setup-dpdk.sh status runs cleanly (no NICs bound, expected).

Engine PR for io_engine_dpdk: AnyVM-Tech/anyscan-engine-c#4

Out of scope (separate workers per the plan):
  - Phase 2 systemd unit edit adding CAP_SYS_RAWIO/CAP_IPC_LOCK/
    CAP_NET_ADMIN to anyscan-worker.service. Documented in the env
    template. Until that lands operators must add caps manually before
    flipping the runtime knob.
  - Live c6in.metal bench (plan §5.3).
  - AMI rebuild.
  - mlx5 / non-AWS hardware support.

Refs: plans/2026-04-28-portscan-dpdk-impl-v1.md (§3.10 wire-up, §3.11
NIC-binding decision, §4.3 kernel feature checks, §5.7 unit test shape).
      anygpt-50

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@skullcrushercmd skullcrushercmd merged commit 4faa236 into main Apr 28, 2026
@skullcrushercmd skullcrushercmd deleted the perf/dpdk-impl-wireup branch April 28, 2026 19:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 73dbf98418

Comment thread install-worker-bundle.sh
Comment on lines +503 to +504
if [ ! -e /dev/vfio/vfio ]; then
# The vfio control char device is created by the vfio-pci module

P1: Gate DPDK readiness on a bound VFIO device, not /dev/vfio/vfio

probe_dpdk_runtime_available treats /dev/vfio/vfio as proof that a NIC is bound for DPDK, but that node can exist as soon as VFIO is loaded even when no PCI device is attached to vfio-pci. In that state this probe can write ANYSCAN_DPDK_AVAILABLE=true, so the adapter forwards --io-engine=dpdk and the scanner can still fail at runtime due to zero usable DPDK ports. The check should verify at least one bound device (for example via /sys/bus/pci/drivers/vfio-pci/* or a /dev/vfio/<group> node) rather than only the control device.
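
One way the suggested check could look, sketched in Python rather than the script's shell. It assumes that devices bound to vfio-pci appear as BDF-named entries under `/sys/bus/pci/drivers/vfio-pci/`, alongside control files such as `bind` and `unbind`; `vfio_bound_devices` is a hypothetical helper, not something in the PR.

```python
import os
import re

# Matches a full PCI address such as 0000:00:06.0 (domain:bus:device.function).
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def vfio_bound_devices(driver_dir="/sys/bus/pci/drivers/vfio-pci", listdir=os.listdir):
    """Return the sorted BDFs currently bound to vfio-pci, or [] if none."""
    try:
        entries = listdir(driver_dir)
    except OSError:
        # Driver directory absent: module not loaded, so nothing is bound.
        return []
    return sorted(entry for entry in entries if BDF_RE.match(entry))
```

A probe gate built on this would pass only when the returned list is non-empty, instead of testing for `/dev/vfio/vfio` alone.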

Comment thread tools/setup-dpdk.sh
Comment on lines +203 to +207
printf '[!] %s: 1 GiB hugepages reservation fell short (got %s, wanted %s); falling back to 2 MiB.\n' \
"$SCRIPT_NAME" "$current" "$target_gb" >&2
fi
fi
if [ -d "$hp2m_dir" ]; then

P2: Subtract partial 1GiB reservations before 2MiB fallback

When 1GiB hugepage reservation partially succeeds, the fallback path immediately requests the full target_gb * 512 2MiB pages without accounting for already-reserved 1GiB pages. On fragmented hosts this can over-reserve memory beyond the configured target (e.g., partial 1GiB success plus full 2MiB fallback), which can unnecessarily starve system memory during bind. The fallback should either clear partial 1GiB pages first or request only the remaining capacity.
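
The "request only the remaining capacity" option is simple arithmetic, shown here as a Python sketch. The 512 pages-per-GiB factor comes from the comment above; `fallback_2m_pages` is a hypothetical helper and does not model the alternative of clearing the partial 1 GiB pages first.

```python
PAGES_2M_PER_GIB = 512  # 1 GiB / 2 MiB


def fallback_2m_pages(target_gb, reserved_1g_pages):
    """Number of 2 MiB pages still needed after a partial 1 GiB reservation."""
    remaining_gb = max(target_gb - reserved_1g_pages, 0)
    return remaining_gb * PAGES_2M_PER_GIB
```

With a 4 GiB target and one 1 GiB page already reserved, this requests 1536 pages of 2 MiB rather than the full 2048.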

Comment thread tools/setup-dpdk.sh
Comment on lines +127 to +131
if [ "$entry" = "eth0" ]; then
printf '[!] %s: skipping eth0 (agentd control-plane interface, never bound to vfio-pci).\n' "$SCRIPT_NAME" >&2
continue
fi
# Looks like a BDF (e.g. 0000:00:06.0)?

P1: Reject control-plane NIC even when provided as PCI BDF

The eth0 safety rule only triggers for the literal token eth0; if the same interface is provided as a PCI BDF, it bypasses this guard and can still be bound to vfio-pci. In multi-NIC environments where another IPv4 NIC exists, count_remaining_kernel_nics may still pass, allowing the control-plane interface to be detached despite the script’s stated hard safety rule. Resolve each candidate BDF back to interface name(s) and enforce the control-plane rejection there as well.
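
A sketch of the reverse lookup this comment asks for, in Python. It assumes that a kernel-bound NIC exposes its interface names under `/sys/bus/pci/devices/<bdf>/net/`; `bdf_to_ifaces` and `is_control_plane_entry` are hypothetical names and are not part of setup-dpdk.sh.

```python
import os
import re

CONTROL_PLANE_IFACE = "eth0"
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def bdf_to_ifaces(bdf, pci_root="/sys/bus/pci/devices", listdir=os.listdir):
    """Return interface names the kernel currently exposes for a PCI device."""
    try:
        return sorted(listdir(os.path.join(pci_root, bdf, "net")))
    except OSError:
        # No net/ directory: not a NIC, or not bound to a kernel driver.
        return []


def is_control_plane_entry(entry, pci_root="/sys/bus/pci/devices", listdir=os.listdir):
    """True if entry names eth0 directly or is a BDF that resolves to eth0."""
    if entry == CONTROL_PLANE_IFACE:
        return True
    if BDF_RE.match(entry):
        return CONTROL_PLANE_IFACE in bdf_to_ifaces(entry, pci_root, listdir)
    return False
```

The lookup has to run before binding, since the `net/` directory disappears once the device is detached from its kernel driver.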
