Add UDP GSO/GRO support (Linux) and --no-gsro switch #1925
gegles wants to merge 4 commits into esnet:master
Conversation
|
FYI @bmah888 or @jefposkanzer, any way someone can review and hopefully merge this? |
|
Is anybody able to review this? Who are the active/official maintainers for this project? |
|
@marcosfsch, FYI (as I saw you're basing your work on this branch), I added a fix for the loss calculation... |
break;
#endif
#if defined(HAVE_UDP_SEGMENT) || defined(HAVE_UDP_GRO)
case OPT_NO_GSRO:
I don't have experience with using GSO/GRO, but since there are two separate options for setsockopt(), it may be better to also allow setting these options separately. This can be done by adding an optional_argument:
--no-gsro [<GSO>][/<GRO>], where GSO/GRO are boolean (0/1, T/F, E/N (enable/disable), etc.). With no argument, neither GSO nor GRO would be used, as it is implemented now (given the name of the option, this probably means the values are set to true by default). For an example of how to define and parse an option with optional arguments, see --cntl-ka (OPT_CNTL_KA).
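For illustration, a minimal sketch of how a value of the form [<GSO>][/<GRO>] could be parsed. The helpers parse_flag and parse_gsro_arg are hypothetical names, not part of iperf3 or this PR, and the accepted letters (1/T/E and 0/F/N) are only one reading of the suggestion above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a one-character token to a boolean.
 * Accepts 1/T/E as true and 0/F/N as false; returns -1 otherwise. */
int parse_flag(const char *s, size_t n)
{
    if (n != 1)
        return -1;
    switch (s[0]) {
    case '1': case 'T': case 't': case 'E': case 'e': return 1;
    case '0': case 'F': case 'f': case 'N': case 'n': return 0;
    default: return -1;
    }
}

/* Hypothetical parser for "[<GSO>][/<GRO>]".
 * arg may be NULL (option given without a value). A missing half leaves
 * the corresponding output untouched, so the caller sets the defaults.
 * Returns 0 on success, -1 on malformed input (outputs are not modified). */
int parse_gsro_arg(const char *arg, int *gso, int *gro)
{
    assert(gso != NULL && gro != NULL);
    if (arg == NULL || *arg == '\0')
        return 0;

    int new_gso = *gso, new_gro = *gro;
    const char *slash = strchr(arg, '/');
    size_t gso_len = slash ? (size_t)(slash - arg) : strlen(arg);

    if (gso_len > 0) {
        int v = parse_flag(arg, gso_len);
        if (v < 0)
            return -1;
        new_gso = v;
    }
    if (slash != NULL && slash[1] != '\0') {
        int v = parse_flag(slash + 1, strlen(slash + 1));
        if (v < 0)
            return -1;
        new_gro = v;
    }
    *gso = new_gso;
    *gro = new_gro;
    return 0;
}
```

With this shape, the caller would initialize gso and gro to whatever the no-argument default is and pass optarg straight through; how the parsed booleans map onto "enable" versus "disable" is left to the option's final semantics.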
@davidBar-On I hear what you’re saying, and I’m open to changing the option’s behavior. For example, I originally had two distinct options to explicitly enable or disable each feature (--no-gso and --no-gro). But then I started thinking: if the kernel supports them and they’re implemented correctly, why wouldn’t we want both enabled by default? The packets on the wire are identical, and the benefits—whether higher throughput or lower CPU usage—are significant.
That said, if the consensus is to provide more knobs and finer-grained control, I’m fine with that too.
src/iperf_api.c
test->settings->gso_bf_size = (test->settings->gso_bf_size / test->settings->gso_dg_size) * test->settings->gso_dg_size;
} else {
/* If gso_dg_size is 0 (unlimited bandwidth), use default UDP datagram size */
test->settings->gso_dg_size = 1472; /* Standard UDP payload size for Ethernet MTU */
Why not use DEFAULT_UDP_BLKSIZE instead of 1472? If it is important that 1472 be used, I suggest adding #define DEFAULT_GSO_DG_BLKSIZE 1472.
Thanks for the catch. I addressed it in 569ab0b.
I removed the hard-coded 1472 and switched the GSO fallback to use the existing DEFAULT_UDP_BLKSIZE.
This avoids the magic number and keeps a conservative default that’s safer across common IPv4/IPv6 paths when MSS isn’t available. Also verified that when users specify -l/--length, that value drives both the UDP block size and gso_dg_size; otherwise gso_dg_size follows the computed blksize, only falling back to DEFAULT_UDP_BLKSIZE in the unlimited (0) case.
test->settings->socket_bufsize = j_p->valueint;
if ((j_p = iperf_cJSON_GetObjectItemType(j, "len", cJSON_Number)) != NULL)
test->settings->blksize = j_p->valueint;
#ifdef HAVE_UDP_SEGMENT
Since get_parameters() is run by the Server and --no-gsro is a Client only parameter, then there should be no related logic here. Instead, all the related values should be sent in send_parameters() (by the Client), and get_parameters() will just get them and set the related variables.
Thanks for pointing this out. Addressed in e63ff17.
I moved the GSO/GRO policy decisions out of get_parameters() and into the client side:
- The client now includes GSO/GRO intent and sizes in send_parameters(). The server’s get_parameters() just reads those values and doesn’t recompute from blksize.
- Each endpoint independently attempts to enable the feature on its own sockets. If the local kernel rejects it (e.g., setsockopt(UDP_SEGMENT/UDP_GRO) fails), we log and disable that feature locally. This correctly handles the asymmetric case (sender uses GSO, receiver uses GRO) and kernel capability differences.
- For compatibility with older clients that don’t send GSO fields, the server derives gso_dg_size from blksize as a fallback to preserve prior behavior. If the client sends explicit values (including --no-gsro sending gso=0/gro=0), the server honors them as-is.
This keeps --no-gsro a client-only knob, avoids server-side policy decisions, and still respects local kernel capabilities at runtime. Build and unit tests pass locally.
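The "try locally, log and disable on failure" behavior described above could look roughly like the sketch below. The function try_enable_udp_opt is a hypothetical name, not the PR's code; the fallback #defines carry the Linux values for SOL_UDP, UDP_SEGMENT and UDP_GRO in case the libc headers do not expose them.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>

/* Fallbacks for older libc headers (values from the Linux UAPI). */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103
#endif
#ifndef UDP_GRO
#define UDP_GRO 104
#endif

/* Hypothetical helper: try to turn on one UDP offload option on fd.
 * optname is UDP_SEGMENT (value = segment size) or UDP_GRO (value = 1).
 * If the feature is already disabled, nothing is attempted. If the kernel
 * rejects the option, log it and clear the local flag.
 * Returns 1 if the option is active on this socket, 0 otherwise. */
int try_enable_udp_opt(int fd, int optname, int value, int *enabled)
{
    assert(enabled != NULL);
    if (!*enabled)
        return 0;

    if (setsockopt(fd, SOL_UDP, optname, &value, sizeof(value)) < 0) {
        fprintf(stderr, "UDP option %d not available (%s); disabling locally\n",
                optname, strerror(errno));
        *enabled = 0;
        return 0;
    }
    return 1;
}
```

Because the flag is flipped only on the endpoint where setsockopt() failed, a sender with GSO and a receiver without GRO (or the reverse) can still complete a test.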
Yes — I reviewed the prior related PRs. Most are quite old/stale, not rebased onto current master, and miss key pieces we’ve addressed here:
This PR is up to date, rebased, and integrates:
If there’s a specific older PR with a feature we should carry forward, I’m happy to cross‑check and fold it in. |
|
@davidBar-On Any further comment? or could this be approved/merged? ;-) |
@gegles, I am not from the iperf3 maintenance team, so I don't know if or when this PR can be merged into the mainline. As I wrote, my own opinion is that backward compatibility must be kept, even if ideally your approach is better. However, this is only my private opinion and I cannot speak for the iperf3 team. |
|
Hi! Thank you for the pull request. We've looked at it briefly, but we might take a little more time to evaluate this change, especially since it's so large and it's enabled by default. We like the performance increase, but we're concerned about the maintainability over time, so we'd like to take a closer look at it. |
My pleasure! Yeah, totally understand. Please feel free to change/tweak anything or let me know how I can help. Ultimately, I see no reasons why it shouldn't be on by default wherever available, but I am also fine to reverse the logic if we want to take a more careful approach first... LMK. |
|
@swlars, just FYI, I rebased on latest main (no changes, just a minor conflict resolution). Any idea when you'll be able to review/merge this? LMK. thx! |
|
@swlars, just FYI, I rebased on latest main (no changes, just a minor conflict resolution). Any idea when you'll be able to review/merge this? LMK. thx! |
|
@swlars, just FYI, I rebased on latest main. Any idea when you'll be able to review/merge this? LMK. thx! |
This change adds first-class support for Linux UDP Generic Segmentation Offload (GSO) and Generic Receive Offload (GRO) in iperf3. At configure time, the build detects availability of the UDP_SEGMENT and UDP_GRO socket options via <linux/udp.h> and enables code paths accordingly. On capable systems, these features are now enabled by default for UDP tests. A new command-line flag, --no-gsro, allows users to disable GSO and GRO even when supported by the kernel. Help text is included in usage_longstr.

Additional changes:
- Updated iperf_settings to track GSO/GRO state and buffer/segment sizes.
- Added a warning if the configured UDP block size exceeds the TCP MSS.
- Ensured behavior is unchanged on systems without GSO/GRO support.

GSO can reduce CPU overhead on send by offloading UDP segmentation to the kernel/NIC. GRO can reduce per-packet processing cost on receive by coalescing incoming UDP segments. Together they can improve throughput and efficiency in high-rate UDP tests on modern Linux systems.

# Conflicts:
# src/iperf_api.c
# src/iperf_api.h
Parse coalesced GRO payloads using the negotiated blksize stride and account loss/jitter per datagram. Avoids inflated “loss” when kernel GRO hints are unreliable. No change to GSO send behavior.
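As a sketch of the stride-based parsing this commit describes: a buffer returned by one receive call is walked in blksize steps, and each step is accounted as one datagram. The names gro_stats and gro_account are hypothetical, and the sequence-number layout (a big-endian 32-bit counter at byte offset 8, after two 32-bit timestamp fields) is an assumption made for the example rather than a statement about the PR's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed payload layout for this sketch: 32-bit sec, 32-bit usec,
 * then a 32-bit big-endian packet counter. */
#define SKETCH_SEQ_OFFSET 8

/* Hypothetical per-stream accounting state. */
struct gro_stats {
    uint64_t datagrams;  /* datagrams seen */
    uint64_t gaps;       /* sequence numbers skipped (candidate losses) */
    uint32_t last_seq;   /* highest sequence number seen so far */
};

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Walk a coalesced buffer in blksize strides; the last datagram may be short. */
void gro_account(const uint8_t *buf, size_t len, size_t blksize,
                 struct gro_stats *st)
{
    assert(buf != NULL && st != NULL && blksize > 0);

    for (size_t off = 0; off < len; off += blksize) {
        size_t n = (len - off < blksize) ? len - off : blksize;
        if (n < SKETCH_SEQ_OFFSET + 4)
            break;  /* too short to carry a header; stop parsing */

        uint32_t seq = read_be32(buf + off + SKETCH_SEQ_OFFSET);
        if ((uint64_t)seq > (uint64_t)st->last_seq + 1)
            st->gaps += (uint64_t)seq - st->last_seq - 1;
        if (seq > st->last_seq)
            st->last_seq = seq;
        st->datagrams++;
    }
}
```

Counting per datagram rather than per receive call is what keeps a single coalesced read of many segments from being reported as one packet followed by a large gap.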
Use the existing DEFAULT_UDP_BLKSIZE for GSO datagram-size fallback instead of the literal 1472.

Rationale: avoids a magic number and keeps a conservative, widely safe default across IPv4/IPv6 when the control socket MSS is unavailable. In normal operation UDP blksize is derived from the control TCP MSS; this constant fallback is only used when MSS cannot be determined or when the computed gso_dg_size ends up 0 (unlimited case).

Behavior notes:
- If the user sets -l/--length, that value drives both UDP block size and gso_dg_size.
- Otherwise, gso_dg_size tracks the chosen blksize; it falls back to DEFAULT_UDP_BLKSIZE only when the computed value is 0.

Files:
- src/iperf_api.c: update two fallback sites
- src/iperf_client_api.c: update fallback site
…pts and applies locally

Client remains the source of truth for UDP GSO/GRO policy (including --no-gsro). During parameter exchange, the client now sends GSO/GRO flags and sizes in JSON, and the server simply consumes those values without recomputing.

Kernel capability gating stays local and authoritative: each endpoint attempts to enable GSO/GRO on its own sockets via setsockopt, and if the kernel rejects it, we log and flip the local flag off. This handles the case where only one side supports the feature (GSO for the sender, GRO for the receiver).

Backward compatibility: if talking to an older client that doesn’t send GSO fields, the server derives gso_dg_size from blksize and adjusts gso_bf_size, falling back to DEFAULT_UDP_BLKSIZE if zero. This preserves previous behavior without overriding explicit client intent.

Behavior details:
- --no-gsro on the client sends gso=0 and gro=0, so the server won’t try to enable them.
- If -l/--length is provided, blksize (and therefore gso_dg_size when enabled) follows that value; otherwise default logic applies.

Files: src/iperf_api.c (send_parameters adds gso/gro fields; get_parameters reads them and removes server-side recompute unless needed for compatibility).
|
@swlars, just FYI, I rebased on latest main. Any idea when you'll be able to review/merge this? LMK. thx! |
|
Hi! Thanks for the pull request, we're going back and forth on whether to include this feature because we're concerned about the amount of support it might need in the future. We're not sure how we would test and maintain it, and while there's significant community support, our main focus is supporting ESnet and they currently have no need for it. Is there a specific use case you're considering? |
I see the concern around defaults, but I think the growing importance of UDP throughput makes a strong case for enabling this by default. With protocols like QUIC/HTTP-3 becoming common, UDP is no longer a niche or advanced use case. Having iperf exercise UDP paths in a way that better reflects real-world traffic helps ensure results are meaningful without requiring users to know which additional flags to set. Since the behavior only applies when supported by the system and remains configurable, enabling it by default seems like a reasonable step forward while still leaving room to adjust if real-world feedback suggests refinements are needed. |
|
I'm still not clear on your use case. I think you are suggesting that GRO/GSO is a useful addition to iperf3 because it can mimic IP fragmentation and also because QUIC and HTTP3 use UDP and might use something similar to GRO and GSO with large data payloads? We're still not sure about adding this feature because it is not in line with our current priorities, but if we were to take it, it would need to be (1) off by default, (2) we would prefer some kind of explicit test, and (3) the code duplication would need to be reduced. |
I think there’s a bit of a misunderstanding here about the scope and motivation. This is not a niche use case, and it’s not specific to QUIC or HTTP/3. Anyone with a reasonably high-speed link (≈2 Gbps and above — more and more common with 10/40/100G NICs) quickly runs into a wall when trying to measure maximum UDP throughput. That wall is usually CPU and syscall overhead, not a networking or NIC limitation. The cost of issuing enormous numbers of This problem exists regardless of protocol. QUIC/HTTPv3 just happens to expose it clearly because it is UDP-based, but the issue applies to any high-rate UDP workload. This is exactly what UDP GSO/GRO was designed to address:
This is not mimicking IP fragmentation. The packets emitted on the wire are bit-for-bit identical to what would be sent without GSO/GRO. The only difference is that aggregation happens inside the kernel rather than in userspace. As a result, this is 100% compatible with peers that do not support GSO/GRO. From the code perspective, I’m completely open to changes:
That said, I do expect that as understanding of this grows, enabling it by default will eventually make more sense — because without it, iperf cannot realistically measure max UDP throughput on modern high-speed systems due to CPU constraints rather than network ones. |
|
Hi @gegles, I work with @swlars on iperf3. We've been talking about this PR quite a bit.

One of the challenges we have with our involvement with iperf3 is that we're only two people and we both have a limited amount of time and attention to put into iperf3. So we prioritize our time into the interests of our primary consumers, which are roughly ESnet and perfSONAR, then R&E networking, and then the Internet community at large. For the most part we're focused on fixing bugs rather than adding new features, particularly features that we (with again our limited time) will need to support and maintain in the future. Basically it's taking us a little while to understand and get comfortable with the code, particularly since iperf3 is already an overly-complicated program with lots of interactions between different features.

I'm curious about how GSO/GRO is typically used by applications. I appreciate the fact that the PR code results in packets on the wire that are identical to those produced without GSO, so that we have compatibility between hosts enabling and disabling the feature. I see that the way you do this is through assembling a bunch of iperf3 payloads into a single large buffer and letting GSO send parts of the buffer (prepended with IP and UDP headers). Is it more typical to 1) send a bunch of small messages concatenated in a buffer or to 2) try to send a single large message as a UDP payload?

About specific changes to code (some of those were in @swlars's earlier feedback):
|
|
@bmah888 @swlars Thanks again for your review and the detailed feedback!

Just to add some background here for anyone reading who may not be familiar with the terms: GSO (Generic Segmentation Offload) and GRO (Generic Receive Offload) are Linux kernel features that help reduce CPU overhead on high-rate UDP traffic. With GSO, userspace hands large “super-packets” to the kernel that the NIC (or kernel) efficiently segments into MTU-sized packets, and with GRO the kernel coalesces incoming packets before they go up the stack. Together they can significantly improve throughput and lower CPU utilization on capable hardware/OS configurations. There’s a concise explanation in the kernel docs that covers this:

I understand the concern raised about finer-grained control of these options (e.g., having separate switches or optional arguments for GSO vs GRO). I’m absolutely open to modifying the CLI/API so that people can enable/disable them independently if that’s the consensus direction. I agree that making the knobs clearer and consistent with other iperf options (like how

At the same time, the intent behind the current implementation was to enable both GSO and GRO by default where supported (because the wire format doesn’t change and the performance/CPU benefits justify it in many high-rate UDP scenarios) but still allow an explicit opt-out (

To help scope the next step, could you clarify whether, assuming these interface/design changes are made, you’d be comfortable with merging this? I’d like to avoid a situation where I put more effort into adjustments only to have the feature indefinitely blocked without a clear maintainership decision. I’m happy to iterate on direction; I’m just trying to get some alignment so contributions are effective.

Thanks again for engaging here. Looking forward to your thoughts! |
License
This code is distributed under the terms of the BSD license, as per the LICENSE file in the iperf3 source tree.
Summary
Add UDP GSO/GRO support (Linux) and --no-gsro switch
Description
Addresses #1831
This change adds first-class support for Linux UDP GSO (Generic Segmentation Offload) and GRO (Generic Receive Offload) in iperf3. At configure time we detect the availability of the socket options and wire them into the UDP data path; at runtime the feature is enabled by default when the kernel supports it, with an opt-out CLI switch.
Highlights
- Configure-time detection of the <linux/udp.h> constants UDP_SEGMENT and UDP_GRO, which defines HAVE_UDP_SEGMENT/HAVE_UDP_GRO accordingly.
- New settings fields (struct iperf_settings) to track GSO/GRO enablement and buffer/segment sizes (with sane maxima).
- New CLI flag --no-gsro to disable both GSO and GRO when they would otherwise be on by default; help text is included in usage_longstr.
Why it matters
On modern NICs/kernels, GSO lets iperf3 hand larger UDP payloads to the kernel, which segments them efficiently, reducing CPU overhead on send; GRO coalesces received UDP segments, reducing per-packet processing on receive. Together they can materially increase achievable throughput and lower CPU utilization during high-rate UDP tests on capable Linux systems.
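To make the send side concrete, here is a minimal sketch of handing one large buffer to the kernel with a per-call segment size, using the Linux UDP_SEGMENT control message. The function name send_gso is hypothetical and this is not the PR's implementation; it assumes a connected UDP socket and a buffer that holds several back-to-back payloads of seg_size bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/udp.h>

/* Fallbacks for older libc headers (values from the Linux UAPI). */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103
#endif

/* Hypothetical helper: send len bytes from buf in one system call and ask
 * the kernel to split them into datagrams of seg_size bytes each (the last
 * one may be shorter). Returns the sendmsg() result: bytes sent or -1. */
ssize_t send_gso(int fd, const void *buf, size_t len, uint16_t seg_size)
{
    assert(buf != NULL && seg_size > 0);

    /* Union keeps the control buffer suitably aligned for struct cmsghdr. */
    union {
        char buf[CMSG_SPACE(sizeof(uint16_t))];
        struct cmsghdr align;
    } control;
    memset(&control, 0, sizeof(control));

    struct iovec iov;
    iov.iov_base = (void *)buf;
    iov.iov_len = len;

    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    /* Attach the segment size as ancillary data for this call only. */
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_UDP;
    cm->cmsg_type = UDP_SEGMENT;
    cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
    memcpy(CMSG_DATA(cm), &seg_size, sizeof(seg_size));

    return sendmsg(fd, &msg, 0);
}
```

The CPU saving comes from replacing one system call per datagram with one per buffer; the datagrams that reach the wire are the same either way. Setting the segment size once with setsockopt(UDP_SEGMENT) instead of per call is an equally valid variant.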
User-visible changes
- --no-gsro (disable default GSO/GRO enablement).
Platform notes
UDP_SEGMENT/UDP_GRO). Other platforms are unaffected.
Backwards compatibility
--no-gsro.
Testing done
HAVE_UDP_SEGMENT/HAVE_UDP_GRO as expected).
--no-gsro comparison to confirm feature gating.
On a 100Gbps NIC link between 2 test machines I am able to send and receive with no loss at around 24Gbps:
If I try to send at max speed, I can send at around 48Gbps, but still only receive at 24Gbps:
Documentation & follow-ups
--no-gsro to src/iperf3.1 options list.