[draft] docs(plans): DPDK userspace-networking integration plan (Phase 1) #72
Phase 1 design document for adding a DPDK io_engine to the bundled C scanner (AnyVM-Tech/anyscan-engine-c). Mirrors PR #65's AF_XDP plan structure across §1-§10.

Why now: PR #65's AF_XDP work landed, but the c6in.metal bench revealed that ENA on kernel <=6.12.74 forces drv+copy (not drv+zerocopy), capping the 8-NIC ceiling at ~22 M pps, short of the 30-50 M pps projection. DPDK via vfio-pci bypasses the ENA kernel driver entirely, projecting a realistic 50-100 M pps on c6in.metal. This supersedes PR #63's deferral recommendation (which was conditioned on AF_XDP clearing the throughput target; it did not).

Plan scope:
- engine repo: ~1,100 LOC (send-dpdk.c, recv-dpdk.c, dpdk-eal.c, dpdk-defs.h, vtable slot in engine.c, USE_DPDK Makefile block)
- AnyScan-side wire-up: ~765 LOC (mirrors PR #71's ANYSCAN_USE_AF_XDP pattern across install-external-deps.sh / package-worker-bundle.sh / deploy.sh / runtime.worker.env.template / adapter.py, plus a new tools/setup-dpdk.sh for hugepages and vfio-pci bind/unbind)
- NIC-binding decision: dedicated-DPDK-NIC pattern. eth0 stays on the kernel for the agentd heartbeat; ENIs eth1..eth7 (c6in.metal) go to vfio-pci. Single-NIC instances are DPDK-ineligible by design.
- Effort: 12-15 days implementation + canary, ~3-4 weeks total.

Phase 2 implementation is gated on user/orchestrator approval after this plan PR merges. No engine C code, no runtime config, no submodule bumps in this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d0ddceef64
1. Pre-`parse_arguments`: scan argv for `--io-engine=dpdk`. If found, locate `--`, split argv at that position. Stash the EAL slice for later.
2. `parse_arguments` runs against the truncated scanner argv (the existing `getopt_long` loop is unmodified).
3. After `parse_arguments`, if `config.io_engine == IO_ENGINE_DPDK`, call `dpdk_eal_bringup(config, eal_argv, eal_argc)` which:
   - Calls `rte_eal_init(eal_argc, eal_argv)`.
Preserve argv[0] when passing EAL args to rte_eal_init
This sequencing drops the program-name element from the EAL argv: after splitting on --, the EAL slice starts at -l (per the example), but rte_eal_init(argc, argv) expects an argv shaped like a normal main() argument vector with argv[0] present. If implemented as written, EAL option parsing can skip/mis-handle the first flag and lead to wrong core/device selection or init failure in DPDK mode.
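One way to address this is sketched below: split argv at the first `--` and rebuild the EAL vector with the original `argv[0]` in slot 0, so that `rte_eal_init()` receives a `main()`-shaped argument vector. The type `argv_split_t` and the function `split_eal_argv` are hypothetical names introduced for illustration; they do not come from the plan, and the sketch deliberately avoids any DPDK dependency.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type (not from the plan): result of splitting at "--". */
typedef struct {
    int    scanner_argc; /* argv[0] up to, but excluding, "--" */
    int    eal_argc;     /* argv[0] plus every element after "--" */
    char **eal_argv;     /* heap-allocated, NULL-terminated; caller frees */
} argv_split_t;

/*
 * Split argv at the first "--". The EAL vector is rebuilt with the original
 * argv[0] in slot 0, which is the shape rte_eal_init() expects.
 * Returns 0 on success, -1 if there is no "--" or allocation fails.
 */
int split_eal_argv(int argc, char **argv, argv_split_t *out)
{
    assert(argc >= 1 && argv != NULL && out != NULL);

    int sep = -1;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0) {
            sep = i;
            break;
        }
    }
    if (sep < 0)
        return -1;

    int n_eal = argc - sep - 1; /* number of flags after "--" */

    /* One extra slot for argv[0], one for the terminating NULL. */
    char **eal = calloc((size_t)n_eal + 2, sizeof(*eal));
    if (eal == NULL)
        return -1;

    eal[0] = argv[0]; /* preserve the program name */
    for (int i = 0; i < n_eal; i++)
        eal[i + 1] = argv[sep + 1 + i];
    /* eal[n_eal + 1] is already NULL because calloc zero-fills. */

    out->scanner_argc = sep;
    out->eal_argc     = n_eal + 1;
    out->eal_argv     = eal;
    return 0;
}
```

With this shape, the scanner's existing option parsing would see only the first `scanner_argc` elements, and the bring-up step could pass `eal_argc` and `eal_argv` to `rte_eal_init()` unchanged.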
```
apt-get install -y --no-install-recommends \
    libdpdk23 dpdk
```
Use installable Ubuntu DPDK runtime package names
The runtime install command uses libdpdk23 (and later libdpdk21), but Ubuntu’s DPDK runtime is packaged as dpdk plus split librte-* runtime libraries (with libdpdk-dev for build-time). Following this plan as-is will make apt-get install fail on the worker path, which then prevents the DPDK availability probe from succeeding and blocks the feature rollout.
Phase 1 — Design + plan only. No engine C code changes.
Phase 2 implementation is gated on explicit user/orchestrator approval after this plan PR merges. Drafted as part of anygpt-47.
Why
PR #65 landed an AF_XDP integration plan; PR #71 wired the build flag through. The subsequent c6in.metal bench (recorded in memory `anyscan_afxdp_ena_constraint`) showed that AWS ENA on kernel ≤6.12.74 forces `drv+copy` (not `drv+zerocopy`), capping the eight-NIC c6in.metal at ~22 M pps aggregate. That is 2.66× the AF_PACKET baseline but only 44-73% of the AF_XDP plan's 30-50 M pps projection.
PR #63's earlier DPDK scope memo deferred this work on the premise that AF_XDP would clear the throughput target without owning a scanner fork. That premise is invalidated.
DPDK via `vfio-pci` bypasses the ENA kernel driver entirely — no `XDP_ZEROCOPY` cooperation, no `sendto(MSG_DONTWAIT)` wakeup kicks, no kernel-channel constraint. Realistic projection on c6in.metal: 50-100 M pps.
The fork half is also already done — `AnyVM-Tech/anyscan-engine-c` exists (PR #65 / #71 work landed there) and already has a working `io_engine_vtable_t` dispatch with three slots (`af_packet`, `pfring_zc`, `af_xdp`). DPDK slots into the same shape.
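To illustrate what "slots into the same shape" could mean, here is a minimal sketch of a name-keyed dispatch table with a fourth entry for DPDK. The field layout is an assumption: the real `io_engine_vtable_t` members are not shown in this PR, so the sketch uses a separately named type (`io_engine_vtable_sketch_t`) with hypothetical `open`, `send_batch`, and `close` callbacks and stub implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; the engine repo's actual vtable fields may differ. */
typedef struct {
    const char *name;
    int  (*open)(void);        /* bring the engine up; 0 on success */
    int  (*send_batch)(int n); /* returns the number of packets accepted */
    void (*close)(void);
} io_engine_vtable_sketch_t;

/* Stubs standing in for the per-engine implementations. */
static int  stub_open(void)  { return 0; }
static int  stub_send(int n) { return n; }
static void stub_close(void) { }

static const io_engine_vtable_sketch_t engines[] = {
    { "af_packet", stub_open, stub_send, stub_close },
    { "pfring_zc", stub_open, stub_send, stub_close },
    { "af_xdp",    stub_open, stub_send, stub_close },
    { "dpdk",      stub_open, stub_send, stub_close }, /* the new slot */
};

/* Look an engine up by name; returns NULL if no slot matches. */
const io_engine_vtable_sketch_t *io_engine_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(engines) / sizeof(engines[0]); i++) {
        if (strcmp(engines[i].name, name) == 0)
            return &engines[i];
    }
    return NULL;
}
```

The point of the table shape is that callers dispatch through the returned pointer and never branch on the engine type, so adding DPDK would touch the table and the new source files rather than the call sites.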
What this PR adds
A single new file: `plans/2026-04-28-portscan-dpdk-impl-v1.md` (690 lines).
The plan is comprehensive and mergeable as a reference doc — the user has it in-tree without committing to implementation.
Sections (mirrors PR #65's AF_XDP plan structure):
NIC-binding decision (the question the brief asked)
Recommendation: dedicated-DPDK-NIC.
§3.11 has the full reasoning.
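As a rough sketch of the dedicated-DPDK-NIC rule described in the PR summary (eth0 stays on the kernel, the remaining ENIs go to `vfio-pci`, and single-NIC instances are ineligible), the assignment could be expressed as follows. The names `nic_role_t` and `plan_nic_roles` are hypothetical and only illustrate the policy; the actual bind/unbind mechanics belong to the planned `tools/setup-dpdk.sh`.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NIC_ROLE_KERNEL, NIC_ROLE_DPDK } nic_role_t;

/*
 * Assign a role to each NIC index (0 corresponds to eth0).
 * Index 0 always stays on the kernel driver; every other NIC is marked
 * for DPDK. Returns the number of NICs assigned to DPDK, which is 0 for
 * single-NIC instances (DPDK-ineligible under this policy).
 */
int plan_nic_roles(int nic_count, nic_role_t *roles)
{
    assert(nic_count >= 0 && (roles != NULL || nic_count == 0));

    for (int i = 0; i < nic_count; i++)
        roles[i] = NIC_ROLE_KERNEL;

    if (nic_count < 2)
        return 0; /* nothing left over once eth0 is reserved */

    for (int i = 1; i < nic_count; i++)
        roles[i] = NIC_ROLE_DPDK;

    return nic_count - 1;
}
```

For an eight-NIC c6in.metal this marks seven interfaces for DPDK and keeps one on the kernel, which matches the eth0 / eth1..eth7 split in the summary.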
LOC estimate
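The combined estimate (~1,100 LOC in the engine repo plus ~765 LOC of AnyScan-side wire-up, ~1,865 LOC total) is ~3.2× the scope of PR #65's AF_XDP plan (580 LOC). The brief's framing ("highest theoretical ceiling but the largest engineering scope") matches.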
Coordination
Out of scope (explicit)
Reviewer ask
Please verify:
Disposition
Draft. Do not merge until review approval.
Close + delete branch if a different path (e.g. host-tier change instead of DPDK) is chosen.
Promote to non-draft once review feedback is addressed.
🤖 Generated with Claude Code