[draft] docs(plans): DPDK userspace-networking integration plan (Phase 1)#72

Merged: skullcrushercmd merged 1 commit into main from perf/portscan-dpdk-impl-plan on Apr 28, 2026

@skullcrushercmd
Contributor

Phase 1 — Design + plan only. No engine C code changes.

Phase 2 implementation is gated on explicit user/orchestrator approval after this plan PR merges. Drafted as part of anygpt-47.

Why

PR #65 landed an AF_XDP integration plan; PR #71 wired the build flag through. The subsequent c6in.metal bench (recorded in memory `anyscan_afxdp_ena_constraint`) showed AWS ENA on kernel ≤6.12.74 forces `drv+copy` (not `drv+zerocopy`), capping the eight-NIC c6in.metal at ~22 M pps aggregate. That is 2.66× the AF_PACKET baseline but only 44-73% of the AF_XDP plan's 30-50 M pps projection.

PR #63's earlier DPDK scope memo deferred this work on the premise that AF_XDP would clear the throughput target without owning a scanner fork. That premise is invalidated.

DPDK via `vfio-pci` bypasses the ENA kernel driver entirely — no `XDP_ZEROCOPY` cooperation, no `sendto(MSG_DONTWAIT)` wakeup kicks, no kernel-channel constraint. Realistic projection on c6in.metal: 50-100 M pps.

The fork half is also already done — `AnyVM-Tech/anyscan-engine-c` exists (PR #65 / #71 work landed there) and already has a working `io_engine_vtable_t` dispatch with three slots (`af_packet`, `pfring_zc`, `af_xdp`). DPDK slots into the same shape.
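To illustrate what "slots into the same shape" could mean, here is a minimal sketch of a name-keyed dispatch table with a fourth `dpdk` entry. The struct fields, the stub functions, and `io_engine_lookup` are assumptions made for illustration; the real `io_engine_vtable_t` layout is not shown in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shape: the real io_engine_vtable_t fields are not shown in this PR. */
typedef struct {
    const char *name;                                     /* value of --io-engine= */
    int (*init)(void);                                    /* engine bring-up */
    int (*send_burst)(const void *frames, size_t count);  /* returns frames sent */
} io_engine_vtable_sketch_t;

/* Stubs standing in for the per-engine implementations. */
static int stub_init(void) { return 0; }
static int stub_send(const void *frames, size_t count)
{
    (void)frames;
    return (int)count;
}

static const io_engine_vtable_sketch_t engines[] = {
    { "af_packet", stub_init, stub_send },
    { "pfring_zc", stub_init, stub_send },
    { "af_xdp",    stub_init, stub_send },
    { "dpdk",      stub_init, stub_send },  /* the new fourth slot */
};

/* Return the table entry whose name matches, or NULL if none does. */
static const io_engine_vtable_sketch_t *io_engine_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof engines / sizeof engines[0]; i++) {
        if (strcmp(engines[i].name, name) == 0)
            return &engines[i];
    }
    return NULL;
}
```

Under this shape, adding an engine is one table row plus its implementation files, which is consistent with the LOC split the plan gives for the vtable delta.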

What this PR adds

A single new file: `plans/2026-04-28-portscan-dpdk-impl-v1.md` (690 lines).

The plan is comprehensive and mergeable as a reference doc — the user has it in-tree without committing to implementation.

Sections (mirrors PR #65's AF_XDP plan structure):

NIC-binding decision (the question the brief asked)

Recommendation: dedicated-DPDK-NIC.

  • agentd needs kernel networking on eth0 for control-plane heartbeat / remote-update / journal shipping. Binding eth0 to `vfio-pci` would cut the worker off from the orchestrator.
  • c6in.metal already has 8 ENIs. eth1..eth7 are scan-only; binding seven NICs to `vfio-pci` while leaving eth0 on the kernel is a clean partition. No new NICs need to be added.
  • Single-NIC tiers are explicitly not eligible for DPDK in v1. Skip rule lives in `tools/setup-dpdk.sh` (refuses to bind eth0 or the only NIC) AND in `install-worker-bundle.sh::probe_dpdk_runtime_available` (sets `ANYSCAN_DPDK_AVAILABLE=false` when NIC count < 2). Degrades cleanly to AF_XDP/AF_PACKET on those tiers — no operator-visible failure.
  • Bind-at-runtime would force a complex "swap eth1 from kernel to DPDK then back" dance for every scan, with the kernel routing table changing under live agentd. Not worth the complexity.

§3.11 has the full reasoning.
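The two skip rules above reduce to a small predicate. The sketch below expresses them in C purely to pin down the logic; the plan places the actual checks in shell (`tools/setup-dpdk.sh` and `probe_dpdk_runtime_available`), and the function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Control-plane interface that stays on the kernel, per the PR text. */
#define CONTROL_PLANE_IF "eth0"

/* Host-level rule: DPDK is only considered when at least two NICs exist. */
static bool dpdk_host_eligible(int nic_count)
{
    return nic_count >= 2;
}

/* Interface-level rule: never bind eth0, and never bind the only NIC. */
static bool dpdk_may_bind(const char *ifname, int nic_count)
{
    assert(ifname != NULL);
    if (!dpdk_host_eligible(nic_count))
        return false;
    return strcmp(ifname, CONTROL_PLANE_IF) != 0;
}
```

On an eight-NIC c6in.metal this admits eth1..eth7 and rejects eth0; on a single-NIC tier every interface is rejected, which matches the clean fallback to AF_XDP/AF_PACKET described above.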

LOC estimate

| Component | Est. LOC |
|---|---|
| Engine repo (`anyscan-engine-c`): `send-dpdk.c` + `recv-dpdk.c` + `dpdk-eal.c` + `dpdk-defs.h` + vtable / Makefile / CLI deltas | ~1,100 |
| AnyScan repo: `tools/setup-dpdk.sh` + 5 modified scripts + adapter Python + 1 new bash test | ~765 |
| Grand total Phase 2 | ~1,865 |

~3.2× the scope of PR #65's AF_XDP plan (580 LOC). The brief's framing ("highest theoretical ceiling but the largest engineering scope") matches.

Coordination

  • ✅ Stayed out of `anyscan_rate_controller.py` (anygpt-33 owns it). §6 risk register flags one item that needs anygpt-33 coordination during Phase 2 (AIMD ceiling parameter — the DPDK ceiling is higher than AF_XDP's, so this risk is more acute).
  • ✅ Did not touch `/etc/anyscan/runtime.env` or anything ops-owned.
  • ✅ Did not touch the AnyGPT submodule pointer.
  • ✅ Did not modify the engine repo (anyscan-engine-c) in this PR.

Out of scope (explicit)

  • Writing the DPDK C code (Phase 2).
  • Bumping the AnyGPT submodule pointer.
  • Editing prod systemd units or `runtime.env`.
  • Modifying `anyscan-engine-c` source.

Reviewer ask

Please verify:

  1. The plan is comprehensive enough that a Phase 2 worker can execute task-by-task without needing additional context.
  2. The NIC-binding decision (§3.11) is acceptable — if not, the open question to re-litigate is whether DPDK becomes viable on smaller-tier instances via runtime bind/unbind.
  3. The ~3-4 week effort envelope is the right sizing call for the 50-100 M pps payoff vs. AF_XDP's confirmed ~22 M pps.
  4. §3.4's EAL argv passthrough convention (`-- -l 0-7 --socket-mem ...`) is acceptable, vs alternatives like a single `--dpdk-eal-args="-l 0-7 --socket-mem ..."` flag.
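For item 4, one practical difference between the two conventions is that a single `--dpdk-eal-args="..."` flag arrives as one string and has to be tokenized before it can be handed to EAL, whereas the `--` passthrough reuses the shell's own word splitting. A rough sketch of that tokenizing step follows; `tokenize_eal_args` is a hypothetical helper, and it deliberately has no quoting support, which is one of the costs of the single-flag option.

```c
#include <assert.h>
#include <string.h>

/*
 * Split a whitespace-separated flag value into an argv-style vector.
 * out[0] is set to the program name, tokens follow, and the vector is
 * NULL-terminated. `value` is modified in place. Returns the resulting
 * argc, or -1 if `out` (capacity max_out, including the NULL slot) is
 * too small. Quotes and escapes are NOT interpreted.
 */
static int tokenize_eal_args(const char *prog, char *value,
                             char **out, int max_out)
{
    assert(prog != NULL && value != NULL && out != NULL && max_out >= 2);
    int n = 0;
    out[n++] = (char *)prog;

    char *p = value;
    while (*p != '\0') {
        p += strspn(p, " \t");          /* skip leading separators */
        if (*p == '\0')
            break;
        if (n >= max_out - 1)           /* keep one slot for NULL */
            return -1;
        out[n++] = p;
        p += strcspn(p, " \t");         /* advance to end of token */
        if (*p != '\0')
            *p++ = '\0';                /* terminate the token */
    }
    out[n] = NULL;
    return n;
}
```

With the `--` convention none of this code is needed, but the scanner must then locate the separator itself before its normal option parsing runs.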

Disposition

Draft. Do not merge until review approval.

Close + delete branch if a different path (e.g. host-tier change instead of DPDK) is chosen.

Promote to non-draft once review feedback is addressed.

🤖 Generated with Claude Code

Phase 1 design document for adding a DPDK io_engine to the bundled C
scanner (AnyVM-Tech/anyscan-engine-c). Mirrors PR #65's AF_XDP plan
structure across §1-§10.

Why now: PR #65's AF_XDP work landed but the c6in.metal bench revealed
ENA on kernel <=6.12.74 forces drv+copy (not drv+zerocopy), capping the
8-NIC ceiling at ~22 M pps — short of the 30-50 M pps projection. DPDK
via vfio-pci bypasses the ENA kernel driver entirely, projecting
50-100 M pps realistic on c6in.metal.

This supersedes PR #63's deferral recommendation (which was conditioned
on AF_XDP clearing the throughput target — it did not).

Plan scope:
- engine repo: ~1,100 LOC (send-dpdk.c, recv-dpdk.c, dpdk-eal.c,
  dpdk-defs.h, vtable slot in engine.c, USE_DPDK Makefile block)
- AnyScan-side wire-up: ~765 LOC (mirrors PR #71's ANYSCAN_USE_AF_XDP
  pattern across install-external-deps.sh / package-worker-bundle.sh /
  deploy.sh / runtime.worker.env.template / adapter.py + new
  tools/setup-dpdk.sh for hugepages and vfio-pci bind/unbind)
- NIC-binding decision: dedicated-DPDK-NIC pattern. eth0 stays on
  kernel for agentd heartbeat; ENIs eth1..eth7 (c6in.metal) go to
  vfio-pci. Single-NIC instances are DPDK-ineligible by design.
- Effort: 12-15 days implementation + canary, ~3-4 weeks total.

Phase 2 implementation is gated on user/orchestrator approval after
this plan PR merges. No engine C code, no runtime config, no submodule
bumps in this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@skullcrushercmd skullcrushercmd marked this pull request as ready for review April 28, 2026 18:54
@skullcrushercmd skullcrushercmd merged commit 83bd58a into main Apr 28, 2026
@skullcrushercmd skullcrushercmd deleted the perf/portscan-dpdk-impl-plan branch April 28, 2026 18:54

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0ddceef64


Comment on lines +233 to +236
1. Pre-`parse_arguments`: scan argv for `--io-engine=dpdk`. If found, locate `--`, split argv at that position. Stash the EAL slice for later.
2. `parse_arguments` runs against the truncated scanner argv (the existing `getopt_long` loop is unmodified).
3. After `parse_arguments`, if `config.io_engine == IO_ENGINE_DPDK`, call `dpdk_eal_bringup(config, eal_argv, eal_argc)` which:
- Calls `rte_eal_init(eal_argc, eal_argv)`.

P1: Preserve `argv[0]` when passing EAL args to `rte_eal_init`

This sequencing drops the program-name element from the EAL argv: after splitting on `--`, the EAL slice starts at `-l` (per the example), but `rte_eal_init(argc, argv)` expects an argv shaped like a normal `main()` argument vector with `argv[0]` present. If implemented as written, EAL option parsing can skip/mis-handle the first flag and lead to wrong core/device selection or init failure in DPDK mode.
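One way to address this is to build a fresh vector for EAL that starts with the program name and then appends everything after `--`. The sketch below shows only that construction and does not call into DPDK; `struct eal_args` and `split_eal_args` are hypothetical names, not taken from the plan. The resulting `argc`/`argv` pair is what would be handed to `rte_eal_init(int argc, char **argv)`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the EAL slice; not taken from the plan. */
struct eal_args {
    int    argc;
    char **argv;   /* argv[0] = program name, argv[argc] = NULL */
};

/*
 * Split argv at the first "--". On success, *scanner_argc is the number
 * of arguments before "--" (what the scanner's own parser should see),
 * and `out` holds a newly allocated vector: the program name first, then
 * every argument after "--". Returns 0 on success, -1 if allocation fails.
 * The caller frees out->argv; the strings themselves are borrowed.
 */
static int split_eal_args(int argc, char **argv,
                          int *scanner_argc, struct eal_args *out)
{
    assert(argc >= 1 && argv != NULL && scanner_argc != NULL && out != NULL);

    int sep = argc;                       /* index of "--", or argc if absent */
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0) {
            sep = i;
            break;
        }
    }
    *scanner_argc = sep;

    int n_eal = (sep < argc) ? argc - sep - 1 : 0;
    /* +1 for argv[0], +1 for the trailing NULL (calloc zero-fills it). */
    out->argv = calloc((size_t)n_eal + 2, sizeof(char *));
    if (out->argv == NULL)
        return -1;

    out->argv[0] = argv[0];               /* preserve the program name */
    for (int i = 0; i < n_eal; i++)
        out->argv[i + 1] = argv[sep + 1 + i];
    out->argc = n_eal + 1;
    return 0;
}
```

For `scanner --io-engine=dpdk -- -l 0-7`, this yields a scanner argc of 2 and an EAL vector of `{"scanner", "-l", "0-7", NULL}`, so the first EAL flag is no longer sitting in the `argv[0]` position.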


Comment on lines +500 to +502
```
apt-get install -y --no-install-recommends \
  libdpdk23 dpdk
```

P2: Use installable Ubuntu DPDK runtime package names

The runtime install command uses `libdpdk23` (and later `libdpdk21`), but Ubuntu’s DPDK runtime is packaged as `dpdk` plus split `librte-*` runtime libraries (with `libdpdk-dev` for build-time). Following this plan as-is will make `apt-get install` fail on the worker path, which then prevents the DPDK availability probe from succeeding and blocks the feature rollout.
