ip,ip6: allow cross-family routes#553
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/ip/control/nexthop.c`:
- Around line 41-48: The nh4_resolve_cb currently unconditionally reads dst from
an IPv4 header (rte_ipv4_hdr) which is unsafe when called for IPv6 packets via
the nexthop-AF dispatch; update nh4_resolve_cb to determine the packet address
family first (inspect mbuf/packet metadata or ether type) and extract the
correct destination address into dst (use IPv4 dst from struct rte_ipv4_hdr for
AF_INET and the equivalent IPv6 destination for AF_INET6) before any
GR_NH_F_LINK link-route lookup/insert; similarly apply the same AF-aware dst
derivation to the related resolver code referenced around the other dispatch
(lines ~244-247) so link-route keys use the true packet destination rather than
raw bytes from the wrong header.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 83b3292d-da08-4089-b585-cd77404d84b4
📒 Files selected for processing (31)
modules/infra/control/gr_nh_control.hmodules/infra/control/l3_nexthop.cmodules/infra/datapath/gr_l3.hmodules/ip/control/gr_ip4_control.hmodules/ip/control/nexthop.cmodules/ip/datapath/gr_ip4_datapath.hmodules/ip/datapath/icmp_output.cmodules/ip/datapath/ip_error.cmodules/ip/datapath/ip_hold.cmodules/ip/datapath/ip_input.cmodules/ip/datapath/ip_loadbalance.cmodules/ip/datapath/ip_local.cmodules/ip/datapath/ip_output.cmodules/ip6/control/gr_ip6_control.hmodules/ip6/control/nexthop.cmodules/ip6/control/router_advert.cmodules/ip6/datapath/gr_ip6_datapath.hmodules/ip6/datapath/icmp6_output.cmodules/ip6/datapath/ip6_error.cmodules/ip6/datapath/ip6_hold.cmodules/ip6/datapath/ip6_input.cmodules/ip6/datapath/ip6_loadbalance.cmodules/ip6/datapath/ip6_local.cmodules/ip6/datapath/ip6_output.cmodules/ipip/datapath_out.cmodules/l2/datapath/vxlan_output.cmodules/policy/datapath/dnat44_dynamic.cmodules/policy/datapath/dnat44_static.cmodules/srv6/datapath/srv6_local.cmodules/srv6/datapath/srv6_output.csmoke/ip_forward_ip6nh_test.sh
💤 Files with no reviewable changes (2)
- modules/ip/control/gr_ip4_control.h
- modules/ip6/control/gr_ip6_control.h
6cceeac to
9cc1cb2
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modules/infra/control/l3_nexthop.c (1)
20-27:⚠️ Potential issue | 🟠 MajorMissing validation for
resolveandresubmitcallbacks.The registration function validates
cleanup_routesandsolicitbut not the newresolveandresubmitcallbacks. Callers inip_hold.c(line 60) andnexthop.c(line 130) dereference these without NULL checks after only verifyingops != NULL. If a future module omits these callbacks, it will crash with a NULL function pointer dereference.🛡️ Proposed fix
void nexthop_af_ops_register(addr_family_t af, const struct nexthop_af_ops *ops) { if (!gr_af_valid(af)) ABORT("invalid af value %hhu", af); - if (ops == NULL || ops->cleanup_routes == NULL || ops->solicit == NULL) + if (ops == NULL || ops->cleanup_routes == NULL || ops->solicit == NULL + || ops->resolve == NULL || ops->resubmit == NULL) ABORT("invalid af ops"); if (af_ops[af] != NULL) ABORT("duplicate af ops %s", gr_af_name(af)); af_ops[af] = ops; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modules/infra/control/l3_nexthop.c` around lines 20 - 27, The registration function nexthop_af_ops_register currently checks ops, cleanup_routes, and solicit but not the new callbacks resolve and resubmit; update the validation in nexthop_af_ops_register to also verify ops->resolve and ops->resubmit are non-NULL (and include them in the ABORT error message), so that af_ops[af] is only set when all required callbacks (cleanup_routes, solicit, resolve, resubmit) are present and callers in ip_hold.c and nexthop.c won't dereference NULL function pointers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@modules/infra/control/l3_nexthop.c`:
- Around line 20-27: The registration function nexthop_af_ops_register currently
checks ops, cleanup_routes, and solicit but not the new callbacks resolve and
resubmit; update the validation in nexthop_af_ops_register to also verify
ops->resolve and ops->resubmit are non-NULL (and include them in the ABORT error
message), so that af_ops[af] is only set when all required callbacks
(cleanup_routes, solicit, resolve, resubmit) are present and callers in
ip_hold.c and nexthop.c won't dereference NULL function pointers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 83c94c59-67ee-46a9-8964-b481521e9d38
📒 Files selected for processing (9)
modules/infra/control/gr_nh_control.hmodules/infra/control/l3_nexthop.cmodules/ip/control/gr_ip4_control.hmodules/ip/control/nexthop.cmodules/ip/datapath/ip_hold.cmodules/ip6/control/gr_ip6_control.hmodules/ip6/control/nexthop.cmodules/ip6/datapath/ip6_hold.csmoke/ip_forward_ip6nh_test.sh
💤 Files with no reviewable changes (2)
- modules/ip6/control/gr_ip6_control.h
- modules/ip/control/gr_ip4_control.h
🚧 Files skipped from review as they are similar to previous changes (1)
- modules/ip/datapath/ip_hold.c
9cc1cb2 to
ca6eaba
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/control/gr_nh_control.h`:
- Around line 134-147: The AF-op registration currently only checks solicit and
cleanup_routes, leaving resolve and resubmit unvalidated; update the
implementation of nexthop_af_ops_register to also require that the provided
struct nexthop_af_ops has non-NULL resolve and resubmit members before
registering (i.e., check ops->resolve and ops->resubmit alongside ops->solicit
and ops->cleanup_routes), and return an error (or refuse registration) if either
is NULL so future callers cannot register incomplete nexthop_af_ops instances
that would crash on use.
In `@modules/infra/control/loopback.c`:
- Around line 154-165: The tun PI header is parsed but never removed from the
mbuf, causing the L3 stack to see PI bytes as the IP header; after you read pi
(from data) and determine proto (cases on pi->proto / RTE_ETHER_TYPE_IPV4/IPv6)
call rte_pktmbuf_adj(mbuf, sizeof(*pi)) to strip the PI from the mbuf before
handing it to L3, check the return (NULL indicates failure) and on failure log
an error and goto err, and keep the existing packet_type assignments
(mbuf->packet_type = RTE_PTYPE_L3_IPV4 / RTE_PTYPE_L3_IPV6) for the stripped
packet.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8b8a928f-3ceb-434c-96c8-f92d576cee93
📒 Files selected for processing (32)
modules/infra/control/gr_nh_control.hmodules/infra/control/l3_nexthop.cmodules/infra/control/loopback.cmodules/infra/datapath/gr_l3.hmodules/ip/control/gr_ip4_control.hmodules/ip/control/nexthop.cmodules/ip/datapath/gr_ip4_datapath.hmodules/ip/datapath/icmp_output.cmodules/ip/datapath/ip_error.cmodules/ip/datapath/ip_hold.cmodules/ip/datapath/ip_input.cmodules/ip/datapath/ip_loadbalance.cmodules/ip/datapath/ip_local.cmodules/ip/datapath/ip_output.cmodules/ip6/control/gr_ip6_control.hmodules/ip6/control/nexthop.cmodules/ip6/control/router_advert.cmodules/ip6/datapath/gr_ip6_datapath.hmodules/ip6/datapath/icmp6_output.cmodules/ip6/datapath/ip6_error.cmodules/ip6/datapath/ip6_hold.cmodules/ip6/datapath/ip6_input.cmodules/ip6/datapath/ip6_loadbalance.cmodules/ip6/datapath/ip6_local.cmodules/ip6/datapath/ip6_output.cmodules/ipip/datapath_out.cmodules/l2/datapath/vxlan_output.cmodules/policy/datapath/dnat44_dynamic.cmodules/policy/datapath/dnat44_static.cmodules/srv6/datapath/srv6_local.cmodules/srv6/datapath/srv6_output.csmoke/ip_forward_ip6nh_test.sh
💤 Files with no reviewable changes (2)
- modules/ip6/control/gr_ip6_control.h
- modules/ip/control/gr_ip4_control.h
✅ Files skipped from review due to trivial changes (1)
- modules/policy/datapath/dnat44_static.c
🚧 Files skipped from review as they are similar to previous changes (5)
- modules/infra/control/l3_nexthop.c
- modules/srv6/datapath/srv6_local.c
- modules/srv6/datapath/srv6_output.c
- modules/ip/datapath/ip_input.c
- modules/ip/datapath/ip_local.c
ca6eaba to
409982f
Compare
409982f to
3690207
Compare
christophefontaine
left a comment
There was a problem hiding this comment.
small nit, and api renaming, otherwise lgtm.
| mbuf->packet_type |= RTE_PTYPE_L3_IPV4; | ||
| mbuf->packet_type &= ~RTE_PTYPE_L3_IPV6; |
There was a problem hiding this comment.
nit: Do we ever check beyond L3 ?
In which case, we could do an assignation instead of manipulating the bit mask ?
mbuf->packet_type = RTE_PTYPE_L3_IPV4;
WDYT ?
There was a problem hiding this comment.
We don't read anything else at the moment. We could overwrite the whole bit mask. Don't you foresee that we'll use packet_type for something else in the future?
There was a problem hiding this comment.
On the ingress, we may leverage the hardware to offload the tunnel decap, if that is technically possible, and we don't rely on this for vxlan tunnels.
I don't see any additional usecase – as of now – on the egress path.
This was just a nit, nothing critical.
| } | ||
|
|
||
| mbuf->packet_type |= RTE_PTYPE_L3_IPV6; | ||
| mbuf->packet_type &= ~RTE_PTYPE_L3_IPV4; |
| af_ops[af] = ops; | ||
| } | ||
|
|
||
| const struct nexthop_af_ops *nexthop_af_ops_get_nh(const struct nexthop *nh) { |
There was a problem hiding this comment.
nexthop_af_ops_get_nh --> nexthop_af_ops_from_nh ?
| return af_ops[l3->af]; | ||
| } | ||
|
|
||
| const struct nexthop_af_ops *nexthop_af_ops_get_mbuf(const struct rte_mbuf *m) { |
There was a problem hiding this comment.
nexthop_af_ops_get_mbuf --> nexthop_af_ops_from_mbuf ?
3690207 to
ac5035e
Compare
The packet_type is determined by reading data[0] and checking if it looks like an IPv6 version nibble. However, data points to the start of the buffer which includes the struct tun_pi header (4 bytes). The first byte is tun_pi.flags which is always 0, so the check always falls through to RTE_PTYPE_L3_IPV4 even for IPv6 packets. Since ip6_output sets packet_type with |= RTE_PTYPE_L3_IPV6, loopback originated IPv6 packets end up with both RTE_PTYPE_L3_IPV4 and RTE_PTYPE_L3_IPV6 flags set, which is incorrect. Use tun_pi.proto instead of guessing the version from the first byte. Use readv() to read the tun_pi header into a stack variable and keep it out of the mbuf data. In the datapath, use mbuf->packet_type (already set by the control plane) instead of parsing tun_pi from the mbuf buffer. This fix is required for the next commit where ip_hold/ip6_hold will dispatch resubmit callbacks based on packet_type. Fixes: 67d0bc3 ("loopback: fix packet type of packet received from linux") Fixes: cdfe435 ("loopback: rebuild ip/ip6 header") Signed-off-by: Robin Jarry <rjarry@redhat.com> Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
Packets can change address family during processing. For example, srv6_output encapsulates an IPv4 packet inside an IPv6 header and forwards it to ip6_output. At that point the mbuf still carries RTE_PTYPE_L3_IPV4 from ip_output, and ip6_output adds RTE_PTYPE_L3_IPV6 with |=, leaving both flags set. Completely reset the packet_type field in ip*_output. We only use it for this specific purpose and never rely on drivers accurately initializing it in the Rx path. In the next commits, we will add the possibility of cross-family routes (e.g. IPv4 routes with an IPv6 nexthop). This means an IPv4 packet may be held in queue waiting for the resolution of an IPv6 nexthop. And it requires the IPv4 and IPv6 nexthop functions to dispatch packets based on their packet type. Fixes: 09d4d5e ("ip: enforce mbuf packet_type") Fixes: 16b3065 ("ip6: enforce mbuf packet_type") Signed-off-by: Robin Jarry <rjarry@redhat.com> Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
Both ip_output_mbuf_data and ip6_output_mbuf_data have the exact same layout: a pointer to the resolved nexthop. Replace them with a single l3_mbuf_data type defined in a new gr_l3.h header shared across all modules. This is a prerequisite to allow cross address family nexthops where an IPv4 packet can reference an IPv6 nexthop and vice versa. Signed-off-by: Robin Jarry <rjarry@redhat.com> Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
When a packet reaches ip_hold or ip6_hold, the resolve callback is hardcoded to the same address family as the packet. With cross-family nexthops, an IPv4 packet may be waiting on an IPv6 neighbor (or the reverse). Add resolve and resubmit callbacks to nexthop_af_ops and use them in the hold nodes and probe input handlers. Provide two helpers to look up the right af_ops: nexthop_af_ops_from_nh() (from a nexthop L3 info) and nexthop_af_ops_from_mbuf() (from the packet type). The resolve callback is dispatched based on the nexthop address family so that neighbor resolution uses the correct protocol (ARP or NDP). The resubmit callback is dispatched based on the packet address family so that resolved packets re-enter the correct output node. Signed-off-by: Robin Jarry <rjarry@redhat.com> Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
Add a test that configures IPv4 routes with IPv6 link-local nexthops. Two namespaces exchange IPv4 traffic where MAC resolution goes through NDP on the peer link-local addresses. The test verifies both that ping succeeds and that the nexthops contain the correct MAC. Signed-off-by: Robin Jarry <rjarry@redhat.com> Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
FRR variant of ip_forward_ip6nh_test that verifies IPv4 forwarding with IPv6 link-local nexthops when routes are pushed through the FRR dplane plugin. The set_ip_route helper now strips the optional interface name from the nexthop argument before using it in the jq match expression. Since set_ip_route is also used without an explicit interface, we only match on the address part to keep it working in both cases. Signed-off-by: Robin Jarry <rjarry@redhat.com> Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
ac5035e to
9b7d8e1
Compare
Allow defining IPv4 routes with IPv6 L3 nexthops that are resolved using NDP, and vice versa. This is useful when peers only have link-local IPv6 addresses but need to forward IPv4 traffic.
The per-family ip_output_mbuf_data and ip6_output_mbuf_data types are unified into a single l3_mbuf_data since they have the same layout. The resolve and resubmit callbacks in the hold path are made indirect through nexthop_af_ops so that neighbor resolution follows the nexthop's address family while packet resubmission follows the packet's address family.
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests