Skip to content

ip,ip6: allow cross-family routes#553

Merged
christophefontaine merged 6 commits intoDPDK:mainfrom
rjarry:cross-family-routes
Mar 16, 2026
Merged

ip,ip6: allow cross-family routes#553
christophefontaine merged 6 commits intoDPDK:mainfrom
rjarry:cross-family-routes

Conversation

@rjarry
Copy link
Copy Markdown
Collaborator

@rjarry rjarry commented Mar 13, 2026

Allow defining IPv4 routes with IPv6 L3 nexthops that are resolved using NDP, and vice versa. This is useful when peers only have link-local IPv6 addresses but need to forward IPv4 traffic.

The per-family ip_output_mbuf_data and ip6_output_mbuf_data types are unified into a single l3_mbuf_data since they have the same layout. The resolve and resubmit callbacks in the hold path are made indirect through nexthop_af_ops so that neighbor resolution follows the nexthop's address family while packet resubmission follows the packet's address family.

Summary by CodeRabbit

  • New Features

    • Automatic resubmission of held packets when nexthops become reachable via per-address-family resolve/resubmit hooks.
  • Refactor

    • Consolidated per-packet next-hop tracking into a unified L3 mbuf data area and unified nexthop operations across address families.
  • Bug Fixes

    • More robust tunnel/loopback packet protocol handling and packet-type detection.
  • Tests

    • Added smoke tests validating IPv6 nexthop forwarding and FRR cross-family nexthop behavior.

@coderabbitai

This comment was marked as resolved.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modules/ip/control/nexthop.c`:
- Around line 41-48: The nh4_resolve_cb currently unconditionally reads dst from
an IPv4 header (rte_ipv4_hdr) which is unsafe when called for IPv6 packets via
the nexthop-AF dispatch; update nh4_resolve_cb to determine the packet address
family first (inspect mbuf/packet metadata or ether type) and extract the
correct destination address into dst (use IPv4 dst from struct rte_ipv4_hdr for
AF_INET and the equivalent IPv6 destination for AF_INET6) before any
GR_NH_F_LINK link-route lookup/insert; similarly apply the same AF-aware dst
derivation to the related resolver code referenced around the other dispatch
(lines ~244-247) so link-route keys use the true packet destination rather than
raw bytes from the wrong header.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 83b3292d-da08-4089-b585-cd77404d84b4

📥 Commits

Reviewing files that changed from the base of the PR and between 4b2eb2c and 6cceeac.

📒 Files selected for processing (31)
  • modules/infra/control/gr_nh_control.h
  • modules/infra/control/l3_nexthop.c
  • modules/infra/datapath/gr_l3.h
  • modules/ip/control/gr_ip4_control.h
  • modules/ip/control/nexthop.c
  • modules/ip/datapath/gr_ip4_datapath.h
  • modules/ip/datapath/icmp_output.c
  • modules/ip/datapath/ip_error.c
  • modules/ip/datapath/ip_hold.c
  • modules/ip/datapath/ip_input.c
  • modules/ip/datapath/ip_loadbalance.c
  • modules/ip/datapath/ip_local.c
  • modules/ip/datapath/ip_output.c
  • modules/ip6/control/gr_ip6_control.h
  • modules/ip6/control/nexthop.c
  • modules/ip6/control/router_advert.c
  • modules/ip6/datapath/gr_ip6_datapath.h
  • modules/ip6/datapath/icmp6_output.c
  • modules/ip6/datapath/ip6_error.c
  • modules/ip6/datapath/ip6_hold.c
  • modules/ip6/datapath/ip6_input.c
  • modules/ip6/datapath/ip6_loadbalance.c
  • modules/ip6/datapath/ip6_local.c
  • modules/ip6/datapath/ip6_output.c
  • modules/ipip/datapath_out.c
  • modules/l2/datapath/vxlan_output.c
  • modules/policy/datapath/dnat44_dynamic.c
  • modules/policy/datapath/dnat44_static.c
  • modules/srv6/datapath/srv6_local.c
  • modules/srv6/datapath/srv6_output.c
  • smoke/ip_forward_ip6nh_test.sh
💤 Files with no reviewable changes (2)
  • modules/ip/control/gr_ip4_control.h
  • modules/ip6/control/gr_ip6_control.h

Comment thread modules/ip/control/nexthop.c
Comment thread modules/ip6/control/nexthop.c
@rjarry rjarry force-pushed the cross-family-routes branch from 6cceeac to 9cc1cb2 Compare March 13, 2026 15:33
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modules/infra/control/l3_nexthop.c (1)

20-27: ⚠️ Potential issue | 🟠 Major

Missing validation for resolve and resubmit callbacks.

The registration function validates cleanup_routes and solicit but not the new resolve and resubmit callbacks. Callers in ip_hold.c (line 60) and nexthop.c (line 130) dereference these without NULL checks after only verifying ops != NULL. If a future module omits these callbacks, it will crash with a NULL function pointer dereference.

🛡️ Proposed fix
 void nexthop_af_ops_register(addr_family_t af, const struct nexthop_af_ops *ops) {
 	if (!gr_af_valid(af))
 		ABORT("invalid af value %hhu", af);
-	if (ops == NULL || ops->cleanup_routes == NULL || ops->solicit == NULL)
+	if (ops == NULL || ops->cleanup_routes == NULL || ops->solicit == NULL
+	    || ops->resolve == NULL || ops->resubmit == NULL)
 		ABORT("invalid af ops");
 	if (af_ops[af] != NULL)
 		ABORT("duplicate af ops %s", gr_af_name(af));
 	af_ops[af] = ops;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/infra/control/l3_nexthop.c` around lines 20 - 27, The registration
function nexthop_af_ops_register currently checks ops, cleanup_routes, and
solicit but not the new callbacks resolve and resubmit; update the validation in
nexthop_af_ops_register to also verify ops->resolve and ops->resubmit are
non-NULL (and include them in the ABORT error message), so that af_ops[af] is
only set when all required callbacks (cleanup_routes, solicit, resolve,
resubmit) are present and callers in ip_hold.c and nexthop.c won't dereference
NULL function pointers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@modules/infra/control/l3_nexthop.c`:
- Around line 20-27: The registration function nexthop_af_ops_register currently
checks ops, cleanup_routes, and solicit but not the new callbacks resolve and
resubmit; update the validation in nexthop_af_ops_register to also verify
ops->resolve and ops->resubmit are non-NULL (and include them in the ABORT error
message), so that af_ops[af] is only set when all required callbacks
(cleanup_routes, solicit, resolve, resubmit) are present and callers in
ip_hold.c and nexthop.c won't dereference NULL function pointers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 83c94c59-67ee-46a9-8964-b481521e9d38

📥 Commits

Reviewing files that changed from the base of the PR and between 6cceeac and 9cc1cb2.

📒 Files selected for processing (9)
  • modules/infra/control/gr_nh_control.h
  • modules/infra/control/l3_nexthop.c
  • modules/ip/control/gr_ip4_control.h
  • modules/ip/control/nexthop.c
  • modules/ip/datapath/ip_hold.c
  • modules/ip6/control/gr_ip6_control.h
  • modules/ip6/control/nexthop.c
  • modules/ip6/datapath/ip6_hold.c
  • smoke/ip_forward_ip6nh_test.sh
💤 Files with no reviewable changes (2)
  • modules/ip6/control/gr_ip6_control.h
  • modules/ip/control/gr_ip4_control.h
🚧 Files skipped from review as they are similar to previous changes (1)
  • modules/ip/datapath/ip_hold.c

@rjarry rjarry force-pushed the cross-family-routes branch from 9cc1cb2 to ca6eaba Compare March 13, 2026 21:06
@rjarry rjarry changed the title ip,ip6: allow cross-family nexthops ip,ip6: allow cross-family routes Mar 13, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modules/infra/control/gr_nh_control.h`:
- Around line 134-147: The AF-op registration currently only checks solicit and
cleanup_routes, leaving resolve and resubmit unvalidated; update the
implementation of nexthop_af_ops_register to also require that the provided
struct nexthop_af_ops has non-NULL resolve and resubmit members before
registering (i.e., check ops->resolve and ops->resubmit alongside ops->solicit
and ops->cleanup_routes), and return an error (or refuse registration) if either
is NULL so future callers cannot register incomplete nexthop_af_ops instances
that would crash on use.

In `@modules/infra/control/loopback.c`:
- Around line 154-165: The tun PI header is parsed but never removed from the
mbuf, causing the L3 stack to see PI bytes as the IP header; after you read pi
(from data) and determine proto (cases on pi->proto / RTE_ETHER_TYPE_IPV4/IPv6)
call rte_pktmbuf_adj(mbuf, sizeof(*pi)) to strip the PI from the mbuf before
handing it to L3, check the return (NULL indicates failure) and on failure log
an error and goto err, and keep the existing packet_type assignments
(mbuf->packet_type = RTE_PTYPE_L3_IPV4 / RTE_PTYPE_L3_IPV6) for the stripped
packet.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8b8a928f-3ceb-434c-96c8-f92d576cee93

📥 Commits

Reviewing files that changed from the base of the PR and between 9cc1cb2 and ca6eaba.

📒 Files selected for processing (32)
  • modules/infra/control/gr_nh_control.h
  • modules/infra/control/l3_nexthop.c
  • modules/infra/control/loopback.c
  • modules/infra/datapath/gr_l3.h
  • modules/ip/control/gr_ip4_control.h
  • modules/ip/control/nexthop.c
  • modules/ip/datapath/gr_ip4_datapath.h
  • modules/ip/datapath/icmp_output.c
  • modules/ip/datapath/ip_error.c
  • modules/ip/datapath/ip_hold.c
  • modules/ip/datapath/ip_input.c
  • modules/ip/datapath/ip_loadbalance.c
  • modules/ip/datapath/ip_local.c
  • modules/ip/datapath/ip_output.c
  • modules/ip6/control/gr_ip6_control.h
  • modules/ip6/control/nexthop.c
  • modules/ip6/control/router_advert.c
  • modules/ip6/datapath/gr_ip6_datapath.h
  • modules/ip6/datapath/icmp6_output.c
  • modules/ip6/datapath/ip6_error.c
  • modules/ip6/datapath/ip6_hold.c
  • modules/ip6/datapath/ip6_input.c
  • modules/ip6/datapath/ip6_loadbalance.c
  • modules/ip6/datapath/ip6_local.c
  • modules/ip6/datapath/ip6_output.c
  • modules/ipip/datapath_out.c
  • modules/l2/datapath/vxlan_output.c
  • modules/policy/datapath/dnat44_dynamic.c
  • modules/policy/datapath/dnat44_static.c
  • modules/srv6/datapath/srv6_local.c
  • modules/srv6/datapath/srv6_output.c
  • smoke/ip_forward_ip6nh_test.sh
💤 Files with no reviewable changes (2)
  • modules/ip6/control/gr_ip6_control.h
  • modules/ip/control/gr_ip4_control.h
✅ Files skipped from review due to trivial changes (1)
  • modules/policy/datapath/dnat44_static.c
🚧 Files skipped from review as they are similar to previous changes (5)
  • modules/infra/control/l3_nexthop.c
  • modules/srv6/datapath/srv6_local.c
  • modules/srv6/datapath/srv6_output.c
  • modules/ip/datapath/ip_input.c
  • modules/ip/datapath/ip_local.c

Comment thread modules/infra/control/gr_nh_control.h Outdated
Comment thread modules/infra/control/loopback.c Outdated
Comment thread smoke/ip_forward_ip6nh_test.sh
@rjarry rjarry force-pushed the cross-family-routes branch from ca6eaba to 409982f Compare March 16, 2026 08:50
Comment thread modules/infra/control/loopback.c Outdated
@rjarry rjarry force-pushed the cross-family-routes branch from 409982f to 3690207 Compare March 16, 2026 10:30
Copy link
Copy Markdown
Collaborator

@christophefontaine christophefontaine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit, and api renaming, otherwise lgtm.

Comment thread modules/ip/datapath/ip_output.c Outdated
Comment on lines +80 to +81
mbuf->packet_type |= RTE_PTYPE_L3_IPV4;
mbuf->packet_type &= ~RTE_PTYPE_L3_IPV6;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we ever check beyond L3 ?
In which case, we could do an assignation instead of manipulating the bit mask ?

mbuf->packet_type = RTE_PTYPE_L3_IPV4;

WDYT ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't read anything else at the moment. We could overwrite the whole bit mask. Don't you foresee that we'll use packet_type for something else in the future?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the ingress, we may leverage the hardware to offload the tunnel decap, if that is technically possible, and we don't rely on this for vxlan tunnels.
I don't see any additional usecase – as of now – on the egress path.

This was just a nit, nothing critical.

Comment thread modules/ip6/datapath/ip6_output.c Outdated
}

mbuf->packet_type |= RTE_PTYPE_L3_IPV6;
mbuf->packet_type &= ~RTE_PTYPE_L3_IPV4;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment thread modules/infra/control/l3_nexthop.c Outdated
af_ops[af] = ops;
}

const struct nexthop_af_ops *nexthop_af_ops_get_nh(const struct nexthop *nh) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nexthop_af_ops_get_nh --> nexthop_af_ops_from_nh ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, better

Comment thread modules/infra/control/l3_nexthop.c Outdated
return af_ops[l3->af];
}

const struct nexthop_af_ops *nexthop_af_ops_get_mbuf(const struct rte_mbuf *m) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nexthop_af_ops_get_mbuf --> nexthop_af_ops_from_mbuf ?

@grout-bot grout-bot force-pushed the cross-family-routes branch from 3690207 to ac5035e Compare March 16, 2026 12:46
rjarry added 6 commits March 16, 2026 14:34
The packet_type is determined by reading data[0] and checking if it
looks like an IPv6 version nibble. However, data points to the start
of the buffer which includes the struct tun_pi header (4 bytes). The
first byte is tun_pi.flags which is always 0, so the check always
falls through to RTE_PTYPE_L3_IPV4 even for IPv6 packets.

Since ip6_output sets packet_type with |= RTE_PTYPE_L3_IPV6, loopback
originated IPv6 packets end up with both RTE_PTYPE_L3_IPV4 and
RTE_PTYPE_L3_IPV6 flags set, which is incorrect.

Use tun_pi.proto instead of guessing the version from the first byte.
Use readv() to read the tun_pi header into a stack variable and keep it
out of the mbuf data. In the datapath, use mbuf->packet_type (already
set by the control plane) instead of parsing tun_pi from the mbuf
buffer.

This fix is required for the next commit where ip_hold/ip6_hold will
dispatch resubmit callbacks based on packet_type.

Fixes: 67d0bc3 ("loopback: fix packet type of packet received from linux")
Fixes: cdfe435 ("loopback: rebuild ip/ip6 header")
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
Packets can change address family during processing. For example,
srv6_output encapsulates an IPv4 packet inside an IPv6 header and
forwards it to ip6_output. At that point the mbuf still carries
RTE_PTYPE_L3_IPV4 from ip_output, and ip6_output adds RTE_PTYPE_L3_IPV6
with |=, leaving both flags set.

Completely reset the packet_type field in ip*_output. We only use it for
this specific purpose and never rely on drivers accurately initializing
it in the Rx path.

In the next commits, we will add the possibility of cross-family routes
(e.g. IPv4 routes with an IPv6 nexthop).

This means an IPv4 packet may be held in queue waiting for the
resolution of an IPv6 nexthop. And it requires the IPv4 and IPv6 nexthop
functions to dispatch packets based on their packet type.

Fixes: 09d4d5e ("ip: enforce mbuf packet_type")
Fixes: 16b3065 ("ip6: enforce mbuf packet_type")
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
Both ip_output_mbuf_data and ip6_output_mbuf_data have the exact same
layout: a pointer to the resolved nexthop. Replace them with a single
l3_mbuf_data type defined in a new gr_l3.h header shared across all
modules.

This is a prerequisite to allow cross address family nexthops where
an IPv4 packet can reference an IPv6 nexthop and vice versa.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
When a packet reaches ip_hold or ip6_hold, the resolve callback is
hardcoded to the same address family as the packet. With cross-family
nexthops, an IPv4 packet may be waiting on an IPv6 neighbor (or the
reverse).

Add resolve and resubmit callbacks to nexthop_af_ops and use them in the
hold nodes and probe input handlers. Provide two helpers to look up the
right af_ops: nexthop_af_ops_from_nh() (from a nexthop L3 info) and
nexthop_af_ops_from_mbuf() (from the packet type). The resolve callback
is dispatched based on the nexthop address family so that neighbor
resolution uses the correct protocol (ARP or NDP). The resubmit callback
is dispatched based on the packet address family so that resolved
packets re-enter the correct output node.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
Add a test that configures IPv4 routes with IPv6 link-local nexthops.
Two namespaces exchange IPv4 traffic where MAC resolution goes through
NDP on the peer link-local addresses. The test verifies both that ping
succeeds and that the nexthops contain the correct MAC.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
FRR variant of ip_forward_ip6nh_test that verifies IPv4 forwarding with
IPv6 link-local nexthops when routes are pushed through the FRR dplane
plugin. The set_ip_route helper now strips the optional interface name
from the nexthop argument before using it in the jq match expression.
Since set_ip_route is also used without an explicit interface, we only
match on the address part to keep it working in both cases.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Reviewed-by: Christophe Fontaine <cfontain@redhat.com>
@rjarry rjarry force-pushed the cross-family-routes branch from ac5035e to 9b7d8e1 Compare March 16, 2026 14:03
@christophefontaine christophefontaine merged commit 307cad9 into DPDK:main Mar 16, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants