Conversation
af2e7c8 to
40f4a64
Compare
|
@rjarry @christophefontaine should we create a github issue for tracking this problem? |
Yes, so we don't forget it.
Even if it needs some polishing, I don't think we can do much better. It will need an ugly wart somewhere, either in the rx or tx paths. |
|
Would it be possible to have a virtio-rx node? (or maybe some node between port_rx and iface_input) |
In // select the appropriate node process callback
if (ctx->iface->mode == GR_IFACE_MODE_BOND) {
if (port->rx_offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP)
node->process = rx_bond_offload_process;
else
node->process = rx_bond_process;
} else {
if (port->rx_offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP)
node->process = rx_offload_process;
else
node->process = rx_process;
} |
|
Yes this is a good approach. Zero cost for other drivers. |
40f4a64 to
94ba660
Compare
|
I revamped the code, and fixed other combinations. |
94ba660 to
5fe7867
Compare
5fe7867 to
5ab23ee
Compare
|
Rebased after unrelated failure in evpn smoke test. |
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughAdds a Critical bug: on 📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can generate a title for your PR based on the changes with custom instructions.Set the |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/infra/control/graph.c (1)
163-176:⚠️ Potential issue | 🔴 CriticalVirtio process selection doesn't verify VLAN strip capability is negotiated.
The
virtio_offloadsflag is set based on driver name and TCP checksum capability only, but the selection logic at lines 163–176 uses this flag without verifying that VLAN strip offload was actually negotiated. If a virtio device has TCP checksum capability but lacks VLAN strip capability,virtio_offloadsis true butport->rx_offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIPis false, causing the virtio path to be selected for a device that cannot provide hardware VLAN stripping. The virtio process functions rely on theRTE_MBUF_F_RX_VLAN_STRIPPEDflag and setvlan_id = 0when the flag is absent, resulting in broken VLAN handling.Either check negotiated offloads in the selection condition:
if (port->virtio_offloads && (port->rx_offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP))Or require VLAN strip capability when setting
virtio_offloadsin port.c:605–606.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/graph.c` around lines 163 - 176, The virtio path is being selected based solely on port->virtio_offloads which doesn't guarantee VLAN strip was negotiated; update the selection to require negotiated VLAN strip by changing the conditions that assign node->process for virtio to check both port->virtio_offloads and (port->rx_offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP) (i.e., use if (port->virtio_offloads && (port->rx_offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP)) for both the bonded and non-bonded branches), or alternatively tighten the code that sets port->virtio_offloads so it is only true when VLAN strip is also negotiated (modify the port.c logic around virtio_offloads assignment); ensure the final node->process selections still choose rx_virtio_process/rx_bond_virtio_process only when VLAN strip capability is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/datapath/port_rx.c`:
- Around line 154-155: The bounds check and write are unsafe: replace the
single-segment check using rte_pktmbuf_data_len(m) and the direct use of
rte_pktmbuf_mtod_offset() with a check that ensures two bytes are available (use
rte_pktmbuf_pkt_len(m) >= csum_offset + sizeof(uint16_t)) and either restrict to
single-segment mbufs (assert rte_pktmbuf_is_contiguous or check
rte_pktmbuf_data_len(m) >= csum_offset + 2) or implement a helper that writes a
uint16_t across segments; update the write site that currently uses
rte_pktmbuf_mtod_offset(m, uint16_t *, csum_offset) = csum to call that helper
(or perform a safe memcpy into the proper segment(s)) so no out-of-bounds write
occurs and multi-segment mbufs are handled or explicitly documented as
unsupported.
---
Outside diff comments:
In `@modules/infra/control/graph.c`:
- Around line 163-176: The virtio path is being selected based solely on
port->virtio_offloads which doesn't guarantee VLAN strip was negotiated; update
the selection to require negotiated VLAN strip by changing the conditions that
assign node->process for virtio to check both port->virtio_offloads and
(port->rx_offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP) (i.e., use if
(port->virtio_offloads && (port->rx_offloads & RTE_ETH_RX_OFFLOAD_VLAN_STRIP))
for both the bonded and non-bonded branches), or alternatively tighten the code
that sets port->virtio_offloads so it is only true when VLAN strip is also
negotiated (modify the port.c logic around virtio_offloads assignment); ensure
the final node->process selections still choose
rx_virtio_process/rx_bond_virtio_process only when VLAN strip capability is
present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5d652098-22ad-4ab6-b7e4-e6f9cf02263e
📒 Files selected for processing (5)
modules/infra/control/gr_port.hmodules/infra/control/graph.cmodules/infra/control/port.cmodules/infra/datapath/gr_rxtx.hmodules/infra/datapath/port_rx.c
| uint32_t ptype, hdr_len; | ||
| uint16_t csum = 0, csum_offset; |
There was a problem hiding this comment.
I know it does not matter for checksum computation, but I am always confused when we store network order stuff into a uint16_t variable. Just for the sake of clarity could we use rte_be16t csum?
Also, can, you reorder definitions to have a reverse xmas tree ?
There was a problem hiding this comment.
I would expect "make format" to handle formatting by itself..
There was a problem hiding this comment.
It does not reorder lines since this may break compilation. There is no "reverse xmas tree" option unfortunately :(
There was a problem hiding this comment.
On the rte_be16_t topic, the DPDK API takes a uint16_t as input.
rte_be16_t is exactly aligned, so it would probably not break anything.
I just see little point.
There was a problem hiding this comment.
I would prefer if we stick to rte_be16_t to be consistent with TCP/UDP header field definitions.
https://github.com/DPDK/dpdk/blob/v25.11/lib/net/rte_tcp.h#L32
https://github.com/DPDK/dpdk/blob/v25.11/lib/net/rte_udp.h#L28
|
|
||
| port->port_id = port_id; | ||
| if (rte_eth_dev_info_get(port_id, &info) < 0) | ||
| return errno_set(EINVAL); |
There was a problem hiding this comment.
Can this call even fail at this point? If it does, at least preserve the returned error code:
if ((ret = rte_eth_dev_info_get(port_id, &info)) < 0)
return errno_set(-ret);There was a problem hiding this comment.
No, it can't, but this helper is marked with warn_unused_result.
There was a problem hiding this comment.
ok, then let's preserve the return value for clarity.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
modules/infra/datapath/port_rx.c (1)
154-155:⚠️ Potential issue | 🔴 CriticalOff-by-one error: bounds check insufficient for 2-byte write.
Writing a
uint16_tatcsum_offsetrequires valid memory for 2 bytes (csum_offsetandcsum_offset + 1), but the check only ensures 1 byte is available.🐛 Proposed fix
- if (rte_pktmbuf_data_len(m) >= csum_offset + 1) + if (rte_pktmbuf_data_len(m) >= csum_offset + sizeof(uint16_t)) *rte_pktmbuf_mtod_offset(m, uint16_t *, csum_offset) = csum;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/datapath/port_rx.c` around lines 154 - 155, The bounds check before writing a uint16_t is off-by-one: update the condition that currently uses "rte_pktmbuf_data_len(m) >= csum_offset + 1" to ensure two bytes are available (e.g., compare against csum_offset + sizeof(uint16_t) or + 2) so the write performed via rte_pktmbuf_mtod_offset(m, uint16_t *, csum_offset) = csum does not overflow; adjust the check in the same block in port_rx.c where the write occurs to validate data length >= csum_offset + sizeof(uint16_t) (or equivalent) before performing the 2-byte store.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@modules/infra/datapath/port_rx.c`:
- Around line 154-155: The bounds check before writing a uint16_t is off-by-one:
update the condition that currently uses "rte_pktmbuf_data_len(m) >= csum_offset
+ 1" to ensure two bytes are available (e.g., compare against csum_offset +
sizeof(uint16_t) or + 2) so the write performed via rte_pktmbuf_mtod_offset(m,
uint16_t *, csum_offset) = csum does not overflow; adjust the check in the same
block in port_rx.c where the write occurs to validate data length >= csum_offset
+ sizeof(uint16_t) (or equivalent) before performing the 2-byte store.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9f7972e0-5d4a-4626-bc48-40ec80a5a51d
📒 Files selected for processing (5)
modules/infra/control/gr_port.hmodules/infra/control/graph.cmodules/infra/control/port.cmodules/infra/datapath/gr_rxtx.hmodules/infra/datapath/port_rx.c
25d1509 to
bf36db3
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
modules/infra/control/port.c (1)
603-604:⚠️ Potential issue | 🔴 CriticalCapture the
rte_eth_dev_info_get()error code before returning.At Line 603, the call result is not stored, but Line 604 returns
-ret. On failure this can return a stale/incorrect value (including success), masking init errors.🔧 Suggested fix
- if (rte_eth_dev_info_get(port_id, &info) < 0) - return errno_set(-ret); + if ((ret = rte_eth_dev_info_get(port_id, &info)) < 0) + return errno_set(-ret);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/port.c` around lines 603 - 604, Store and use the actual return value from rte_eth_dev_info_get instead of returning a stale -ret; assign the function result to a local int (e.g., ret = rte_eth_dev_info_get(port_id, &info)), check if ret < 0 and then return errno_set(-ret). Update the code around the call to rte_eth_dev_info_get so the returned error code is the one propagated (referencing rte_eth_dev_info_get, errno_set, port_id, and info).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/datapath/port_rx.c`:
- Around line 156-160: The code currently sets m->ol_flags to
RTE_MBUF_F_RX_L4_CKSUM_GOOD unconditionally even when the checksum write inside
the if (rte_pktmbuf_data_len(m) >= csum_offset + 2) fails; modify the logic so
that the checksum flags (m->ol_flags &= ~RTE_MBUF_F_RX_L4_CKSUM_MASK;
m->ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_GOOD) are only applied when the checksum
was actually written (i.e., inside the successful branch where
rte_pktmbuf_mtod_offset(m, rte_be16_t *, csum_offset) = csum is executed); in
the else branch clear any RX_L4_CKSUM_GOOD bit or set an appropriate bad/unknown
state instead so packets that skipped the write are not marked GOOD.
---
Duplicate comments:
In `@modules/infra/control/port.c`:
- Around line 603-604: Store and use the actual return value from
rte_eth_dev_info_get instead of returning a stale -ret; assign the function
result to a local int (e.g., ret = rte_eth_dev_info_get(port_id, &info)), check
if ret < 0 and then return errno_set(-ret). Update the code around the call to
rte_eth_dev_info_get so the returned error code is the one propagated
(referencing rte_eth_dev_info_get, errno_set, port_id, and info).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 97c53c91-7ded-499f-8bcb-c7b3ed79325d
📒 Files selected for processing (5)
modules/infra/control/gr_port.hmodules/infra/control/graph.cmodules/infra/control/port.cmodules/infra/datapath/gr_rxtx.hmodules/infra/datapath/port_rx.c
🚧 Files skipped from review as they are similar to previous changes (1)
- modules/infra/control/graph.c
In case L4 checksums have been negotiated at the qemu level, L4 packets may be received from virtio ports with a L4_CKSUM_NONE status, which means that the checksum is incorrect in the data, yet this packet should be treated as valid (described in the DPDK API). The problem is that grout does not implement L4 checksum handling and just lets this packet go through the stack, leaving the checksum as invalid. Add dedicated process helpers for net/virtio ports and switch to them if csum were enabled for this virtio port. Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Robin Jarry <rjarry@redhat.com>
bf36db3 to
bd412c0
Compare
In case L4 checksums offload have been negotiated at the qemu level, L4 packets may be received from virtio ports with a L4_NONE status, which means that the checksum is incorrect in the data, yet this packet should be treated as valid (described in the DPDK API).
The problem is that grout does not implement L4 checksum handling and just lets this packet go through the stack, leaving the checksum as invalid.
This patch is just a hack.
A good workaround for now is to disable csum as the virtio level via some libvirt xml, like:
...Fix L4 checksum for virtio port with offload capability
Technical impact: add a virtio-specific RX path that computes and marks L4 checksums for TCP/UDP packets when the virtio driver negotiated L4 checksum offload but packets arrive with L4 status set to NONE.
Changes
modules/infra/control/gr_port.h
modules/infra/control/port.c
modules/infra/control/graph.c
This confines the driver-specific selection to graph initialization.
modules/infra/datapath/gr_rxtx.h
modules/infra/datapath/port_rx.c
Behavioral result