
MRIB: Multicast RIB implementation#576

Merged
zeeshanlakhani merged 8 commits into multicast-e2e from zl/mrib on Mar 23, 2026


@zeeshanlakhani

@zeeshanlakhani zeeshanlakhani commented Dec 8, 2025

Implements the Multicast Routing Information Base (MRIB) for multicast support. The MRIB follows a two-table architecture (mrib_in -> mrib_loc) with RPF verification against the unicast RIB when sources are provided.
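A minimal sketch of the two-table flow, under assumed names (`MulticastGroup`, `MribRoute`, and `Mrib::recompute` are hypothetical, not the types in `rdb/src/mrib`): every configured route is stored in `mrib_in`, and `mrib_loc` is recomputed as the usable subset. Routes that carry a source must pass an RPF check (passed in as a closure here); any-source (*,G) routes have nothing to verify and are promoted directly.

```rust
use std::collections::BTreeMap;
use std::net::IpAddr;

/// Hypothetical validated group address: construction rejects unicast.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct MulticastGroup(IpAddr);

impl MulticastGroup {
    pub fn new(addr: IpAddr) -> Result<Self, String> {
        if addr.is_multicast() {
            Ok(Self(addr))
        } else {
            Err(format!("{addr} is not a multicast address"))
        }
    }
}

/// (S,G) key; a `None` source is an any-source (*,G) route.
pub type MribKey = (Option<IpAddr>, MulticastGroup);

/// Hypothetical route payload: where traffic for the group is replicated.
#[derive(Clone, Debug, PartialEq)]
pub struct MribRoute {
    pub targets: Vec<IpAddr>,
}

#[derive(Default)]
pub struct Mrib {
    /// Every configured route, whether or not it is currently usable.
    pub mrib_in: BTreeMap<MribKey, MribRoute>,
    /// The subset of `mrib_in` selected for forwarding.
    pub mrib_loc: BTreeMap<MribKey, MribRoute>,
}

impl Mrib {
    /// Rebuild `mrib_loc` from `mrib_in`: (S,G) routes are promoted only if
    /// `rpf_ok(source)` returns true; (*,G) routes have no source to check.
    pub fn recompute(&mut self, rpf_ok: impl Fn(IpAddr) -> bool) {
        self.mrib_loc = self
            .mrib_in
            .iter()
            .filter(|((src, _), _)| src.map_or(true, |s| rpf_ok(s)))
            .map(|(k, v)| (*k, v.clone()))
            .collect();
    }
}
```

Presumably, one benefit of keeping `mrib_in` intact is that a route failing RPF now can be promoted later, when a unicast RIB change makes its source reachable, without being reconfigured.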

This PR includes:

  • rdb/src/db.rs: modifications to accommodate the MRIB implementation and persistence
  • rdb/src/mrib/mod.rs: the core MRIB implementation with route storage and change notifications
  • rdb/src/mrib/rpf.rs: RPF verification using poptrie for O(1) LPM lookups, with rate-limited rebuilds triggered on unicast RIB changes
  • rdb/src/types.rs: Validated multicast address types with input validation
  • mg-api/src/lib.rs: API v4 (VERSION_MULTICAST_SUPPORT) with new endpoints
  • mgd/src/mrib_admin.rs: HTTP handlers bridging API to MRIB
  • mgadm/src/mrib.rs: CLI for MRIB inspection and configuration
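The RPF check in `rpf.rs` is described as a poptrie-backed longest-prefix match with rate-limited rebuilds. The sketch below swaps the poptrie for a plain linear scan over prefixes sorted longest-first (so it is O(n), not O(1)) and is IPv4-only; `Route4`, `RpfTable`, and `maybe_rebuild` are hypothetical names. It also assumes one common definition of RPF: the unicast route back to the source must point at the expected upstream neighbor.

```rust
use std::net::Ipv4Addr;
use std::time::{Duration, Instant};

/// Hypothetical unicast RIB entry: prefix/len -> nexthop.
#[derive(Clone, Copy, Debug)]
pub struct Route4 {
    pub prefix: Ipv4Addr,
    pub len: u8, // assumed to be 0..=32
    pub nexthop: Ipv4Addr,
}

fn mask(len: u8) -> u32 {
    if len == 0 { 0 } else { u32::MAX << (32 - len) }
}

/// Stand-in for the poptrie: a snapshot sorted longest prefix first,
/// so the first matching entry is the longest-prefix match.
pub struct RpfTable {
    routes: Vec<Route4>,
    built_at: Instant,
}

impl RpfTable {
    pub fn build(mut routes: Vec<Route4>) -> Self {
        routes.sort_by(|a, b| b.len.cmp(&a.len));
        Self { routes, built_at: Instant::now() }
    }

    /// Longest-prefix match on the source; returns the RPF nexthop.
    pub fn lookup(&self, src: Ipv4Addr) -> Option<Ipv4Addr> {
        let s = u32::from(src);
        self.routes
            .iter()
            .find(|r| (s & mask(r.len)) == (u32::from(r.prefix) & mask(r.len)))
            .map(|r| r.nexthop)
    }

    /// Assumed RPF rule: the route back to `src` goes via `upstream`.
    pub fn rpf_ok(&self, src: Ipv4Addr, upstream: Ipv4Addr) -> bool {
        self.lookup(src) == Some(upstream)
    }

    /// Rebuild from the unicast RIB only if `min_interval` has passed since
    /// the last build; returns whether a rebuild happened.
    pub fn maybe_rebuild(&mut self, routes: &[Route4], min_interval: Duration) -> bool {
        if self.built_at.elapsed() < min_interval {
            return false;
        }
        *self = Self::build(routes.to_vec());
        true
    }
}
```

Skipping a rebuild inside the window, as `maybe_rebuild` does, can leave the snapshot stale until the next RIB change; a real implementation would likely also schedule a trailing rebuild.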

Note that Omicron is the source of truth for multicast overlay/underlay groups and addresses.

@zeeshanlakhani zeeshanlakhani changed the base branch from main to multicast-e2e February 19, 2026 13:54
taspelund and others added 6 commits March 3, 2026 11:27
* rdb: fix silent path loss in Path Ord impl

The Path Ord implementation had a fallback chain that compared
fields like nexthop, shutdown, rib_priority, then delegated to
BgpPathProperties::Ord (which compared origin_as, id, as_path, stale).
When two distinct BGP paths matched on all those fields, Ord returned
Equal and BTreeSet silently dropped one — affecting rib_in, bestpath
selection, and rib_loc.

For example, two unnumbered BGP sessions to the same router on
different interfaces share the same link-local nexthop, router ID,
AS, and AS path — all fields the old Ord compared. A router with
sessions on eth0 and eth1 both receiving 10.0.0.0/24 from AS 65000
would silently lose one of those paths.

This replaces the fallback chain with a match on path type that cleanly
separates identity from attributes:

  - BGP path identity: PeerId only
  - Static path identity: (nexthop, nexthop_interface, vlan_id)
  - Cross-type: BGP sorts after static (Some > None)

All other fields (med, local_pref, origin_as, as_path, shutdown,
rib_priority, etc.) are attributes carried by the path, not identity.
They get updated via BTreeSet::replace() without affecting set
membership.

Remove Ord/PartialOrd from BgpPathProperties since nothing compares
it directly — Path::Ord extracts the peer field itself.

Fixes: #649
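A trimmed-down sketch of identity-only ordering, assuming a simplified `Path` struct and a `String` stand-in for `PeerId` (both hypothetical): `Ord` compares only identity fields, so two BGP paths from different peers that share a nexthop and every other attribute stay distinct in a `BTreeSet`, while attribute updates go through `replace()`.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;
use std::net::IpAddr;

/// Hypothetical stand-in for the real PeerId type.
pub type PeerId = String;

/// Trimmed-down, hypothetical path: identity fields plus a few attributes.
#[derive(Clone, Debug)]
pub struct Path {
    pub nexthop: IpAddr,
    pub nexthop_interface: Option<String>,
    pub vlan_id: Option<u16>,
    /// `Some` for BGP paths, `None` for static paths.
    pub bgp_peer: Option<PeerId>,
    // Attributes: carried by the path, never compared by Ord.
    pub local_pref: Option<u32>,
    pub shutdown: bool,
}

impl Ord for Path {
    fn cmp(&self, other: &Self) -> Ordering {
        match (&self.bgp_peer, &other.bgp_peer) {
            // BGP vs BGP: identity is the peer alone.
            (Some(a), Some(b)) => a.cmp(b),
            // Static vs static: identity is (nexthop, interface, vlan).
            (None, None) => (self.nexthop, &self.nexthop_interface, self.vlan_id)
                .cmp(&(other.nexthop, &other.nexthop_interface, other.vlan_id)),
            // Cross-type: BGP (Some) sorts after static (None).
            (Some(_), None) => Ordering::Greater,
            (None, Some(_)) => Ordering::Less,
        }
    }
}

impl PartialOrd for Path {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Path {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Path {}
```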

* rdb: fix nexthop shutdown path replacement

In the set_nexthop_shutdown() codepath we were calling BTreeSet.insert()
when we really needed to be calling BTreeSet.replace(). The crux of the
issue is that .insert() is a no-op if the set already has an element
whose Ord impl returns Ordering::Equal, whereas .replace() will
overwrite the element. The code was semantically set up for a replacement
but was calling for an insertion, and this cleans it up.

Fixes: #651
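The difference is easy to see on a `BTreeSet` whose `Ord` ignores an attribute. In this sketch `StaticPath` and `set_shutdown` are hypothetical stand-ins for the real path type and `set_nexthop_shutdown()`: `insert` returns `false` and leaves the old element in place, while `replace` swaps in the updated one.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

/// Hypothetical path whose identity is `nexthop`; `shutdown` is an attribute.
#[derive(Clone, Debug)]
pub struct StaticPath {
    pub nexthop: u32,
    pub shutdown: bool,
}

impl Ord for StaticPath {
    fn cmp(&self, other: &Self) -> Ordering {
        self.nexthop.cmp(&other.nexthop)
    }
}

impl PartialOrd for StaticPath {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for StaticPath {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for StaticPath {}

/// Set the shutdown attribute on every path with this nexthop.
/// `insert` would silently keep the old element (it compares Equal);
/// `replace` overwrites it with the updated copy.
pub fn set_shutdown(paths: &mut BTreeSet<StaticPath>, nexthop: u32, shutdown: bool) {
    let matching: Vec<StaticPath> = paths
        .iter()
        .filter(|p| p.nexthop == nexthop)
        .cloned()
        .collect();
    for mut p in matching {
        p.shutdown = shutdown;
        paths.replace(p);
    }
}
```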

* cargo fmt

It was observed in a customer deployment that several (16) recv loop
threads were pinning the CPU due to a busy loop that was constantly
handling EAGAIN.

Code analysis showed that non_blocking was set to true only for inbound
connections, as the Listener<BgpConnectionTcp> was configured to be
nonblocking and TCP sockets returned by accept() inherit this setting.
The recv loop thread for each BgpConnectionTcp is structured in a way
that depends on SO_RCVTIMEO being respected in order to rate limit the
busy loop. However, when nonblocking is true SO_RCVTIMEO is ignored and
EAGAIN/EWOULDBLOCK is returned immediately -- short-circuiting the
timeout that rate limits the busy loop.

This explicitly sets nonblocking to false for the new sockets returned
by accept(), which ensures that the timeout works to rate limit the busy
loop as expected.

Fixes: #657
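A sketch of the fix using only the Rust standard library, with a hypothetical helper `accept_blocking`: the listener stays nonblocking, but each accepted stream is switched back to blocking mode before `set_read_timeout` (which sets SO_RCVTIMEO) is applied. Whether O_NONBLOCK is inherited across `accept()` is platform-specific (BSD-style stacks inherit it, Linux does not), so clearing it explicitly gives the same behavior everywhere.

```rust
use std::io::{self, Read};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Accept from a nonblocking listener, then force the accepted socket back
/// to blocking mode so the read timeout actually rate-limits the recv loop.
pub fn accept_blocking(listener: &TcpListener, rcv_timeout: Duration) -> io::Result<TcpStream> {
    loop {
        match listener.accept() {
            Ok((stream, _peer)) => {
                // May have been inherited from the listener; clear it.
                stream.set_nonblocking(false)?;
                // Only honored in blocking mode; a nonblocking read would
                // return EAGAIN immediately instead of waiting.
                stream.set_read_timeout(Some(rcv_timeout))?;
                return Ok(stream);
            }
            // No pending connection on the nonblocking listener yet.
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                thread::sleep(Duration::from_millis(10));
            }
            Err(e) => return Err(e),
        }
    }
}
```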
* Set TTL/Hop Limit to 255 on BFD control packets

RFC 5881 describes single-hop BFD for IPv4 and IPv6, and in it there's a
requirement to set the TTL/Hop Limit to 255 for all control packets. We
weren't updating the value on the UdpSocket we use to send our control
packets, allowing the OS to pick its own defaults.

This was discovered when doing manual validation of dual-stack BFD for
static routes with FRR as our peer. IPv4 sessions came all the way up,
but IPv6 sessions never did. Debug logs on the FRR side showed the
following errors, which were the dead giveaway of the issue:
```
2026-03-04 19:25:53 [DEBG] bfdd: [YA0Q5-C0BPV] control-packet: invalid TTL: 60 expected 255 [mhop:no peer:fd00:101::6 local:fd00:101::5 port:4]
2026-03-04 19:25:53 [DEBG] bfdd: [YA0Q5-C0BPV] control-packet: invalid TTL: 60 expected 255 [mhop:no peer:fd00:101::2 local:fd00:101::1 port:2]
2026-03-04 19:25:54 [DEBG] bfdd: [YA0Q5-C0BPV] control-packet: invalid TTL: 60 expected 255 [mhop:no peer:fd00:101::a local:fd00:101::9 port:3]
```
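A minimal standard-library sketch of pinning the TTL on a BFD control socket; `bfd_socket_v4` and `ttl_acceptable` are hypothetical helpers. `UdpSocket::set_ttl` sets IP_TTL for IPv4 only. The IPv6 hop limit (IPV6_UNICAST_HOPS) is not exposed by std's `UdpSocket` on stable Rust, so that half is only noted in a comment; a crate such as socket2 is one way to set it. The receive-side check mirrors the FRR behavior in the logs above: packets whose TTL is not 255 are rejected.

```rust
use std::io;
use std::net::UdpSocket;

/// TTL/Hop Limit required on single-hop BFD control packets (RFC 5881).
pub const BFD_TTL: u32 = 255;

/// Hypothetical helper: bind an IPv4 BFD control socket and pin its TTL
/// once, at creation time, instead of relying on the OS default.
pub fn bfd_socket_v4(bind_addr: &str) -> io::Result<UdpSocket> {
    let sock = UdpSocket::bind(bind_addr)?;
    sock.set_ttl(BFD_TTL)?;
    // For an IPv6 socket the equivalent is the IPV6_UNICAST_HOPS option,
    // which std's UdpSocket does not expose on stable Rust.
    Ok(sock)
}

/// Hypothetical receive-side check: reject control packets whose received
/// TTL/Hop Limit is not 255, as the FRR peer does.
pub fn ttl_acceptable(received_ttl: u8) -> bool {
    u32::from(received_ttl) == BFD_TTL
}
```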

* Read my lips. No. New. UdpSockets.

Stop allocating a new UdpSocket for every BFD packet we transmit. That's
horribly inefficient and unnecessary. Restructures egress() to use
nested loops with different break conditions based on the type of error:
socket errors trigger a new socket creation while channel errors still
break out of the egress function.

Fixes: #655
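The nested-loop structure can be sketched roughly as follows. `egress` here has a hypothetical, simplified signature (packets arrive on an `mpsc` channel and a caller-supplied closure builds the socket); the point is only the control flow: a socket error breaks the inner loop so the outer loop builds a new socket, while a closed channel returns from the function.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::sync::mpsc::Receiver;
use std::thread;
use std::time::Duration;

/// Hypothetical egress loop: reuse one socket until it fails.
/// Returns the number of packets sent once the channel closes.
pub fn egress<F>(rx: Receiver<(SocketAddr, Vec<u8>)>, mut make_socket: F) -> usize
where
    F: FnMut() -> io::Result<UdpSocket>,
{
    let mut sent = 0;
    // Outer loop: create (or re-create) the socket.
    loop {
        let sock = match make_socket() {
            Ok(s) => s,
            Err(_) => {
                // Back off briefly instead of spinning on socket creation.
                thread::sleep(Duration::from_millis(100));
                continue;
            }
        };
        // Inner loop: send every queued packet on the same socket.
        loop {
            let (dst, pkt) = match rx.recv() {
                Ok(msg) => msg,
                // Channel error (all senders dropped): leave egress entirely.
                Err(_) => return sent,
            };
            if sock.send_to(&pkt, dst).is_err() {
                // Socket error: this packet is dropped and the outer loop
                // builds a fresh socket for the next one.
                break;
            }
            sent += 1;
        }
    }
}
```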

* mgd: add TTL/Hop Limit unit tests for BFD sockets

* Use BfdEndpoint type alias everywhere

* mgd: use slog::o instead of custom log macro

Reduce the number of times we supply default args to the logger by creating
a child logger with the k/v attributes attached.
@taspelund
Contributor

We'll also have to make sure that API updates use the RFD 619/634 style before we can merge into main (#611 + #652)

@zeeshanlakhani
Author

> We'll also have to make sure that API updates use the RFD 619/634 style before we can merge into main (#611 + #652)

I've updated this to be ready to go after #665 goes in, using the new, better approach.

Migrate to a versions crate as laid out in RFD 619.

I also took the opportunity to clean up some of the operation IDs a bunch -- in general we prefer that the latest versions of endpoints do not have versioned identifiers, and older versions do (but restoring the old operation ID is typically not necessary since they are blessed and immutable).
@zeeshanlakhani
Author

@taspelund I overrode the last commit due to the API local/blessed checks.

…ni type, sled transactions, merge main (665)

This is set up to come after the #665 API modifications, but
uses the up-to-date setup and dropshot API manager.
@zeeshanlakhani zeeshanlakhani merged commit 85f0f85 into multicast-e2e Mar 23, 2026
15 checks passed
zeeshanlakhani added a commit that referenced this pull request Mar 23, 2026
Includes: two-table MRIB (mrib_in -> mrib_loc) with RPF verification, API v8
(MULTICAST_SUPPORT) following RFD 619 pattern atop #665.

PR pointed at @main: #576

This PR pins OPTE to the #924 branch (zl/filter-mcast-srcs).
@zeeshanlakhani zeeshanlakhani deleted the zl/mrib branch March 23, 2026 20:44
@zeeshanlakhani zeeshanlakhani restored the zl/mrib branch March 25, 2026 15:17
