[multicast] M2P forwarding, OPTE port subscription, and sled-agent propagation by zeeshanlakhani · Pull Request #10070 · oxidecomputer/omicron

zeeshanlakhani · 2026-03-17T03:49:28Z

Complete the multicast data path by adding per-sled M2P (multicast-to- physical) mapping, forwarding entry management, and OPTE port subscription for multicast group members.

Sled-agent:

Add multicast_subscribe / multicast_unsubscribe endpoints (API v29) that configure M2P, forwarding, and OPTE port subscription for a VMM
OPTE port_manager gains set/clear operations for M2P and forwarding
Port subscription cleanup on PortTicket release

Nexus:

New sled.rs (MulticastSledClient) encapsulating all sled-agent multicast interactions: subscribe/unsubscribe, M2P/forwarding propagation and teardown
Groups RPW propagates M2P and forwarding entries to all member sleds after DPD configuration, with convergent retry on failure
Members RPW uses MemberReconcileCtx to thread shared reconciliation state. This handles subscribe on join, unsubscribe on leave, and re-subscribe on migration
Dataplane client updated for bifurcated replication groups

Tests:

Integration tests for M2P/forwarding/subscribe lifecycle
Instance migration multicast re-convergence

zeeshanlakhani · 2026-03-17T05:43:39Z

Note: Both helios/deploy check-opte-ver/check-opte-ver in CI will fail for now. After R19 ships and OPTE PR #924 merges, we can switch from branch = "zl/filter-mcast-srcs" to rev = "<merged-sha>" and bump tools/opte_version + deploy.sh target to 0.40.

common/src/api/internal/shared/mod.rs

…opagation Complete the multicast data path by adding per-sled M2P (multicast-to- physical) mapping, forwarding entry management, and OPTE port subscription for multicast group members. Sled-agent: - Add `multicast_subscribe` / `multicast_unsubscribe` endpoints (API v29) that configure M2P, forwarding, and OPTE port subscription for a VMM - OPTE port_manager gains set/clear operations for M2P and forwarding - Port subscription cleanup on PortTicket release Nexus: - New `sled.rs` (MulticastSledClient) encapsulating all sled-agent multicast interactions: subscribe/unsubscribe, M2P/forwarding propagation and teardown - Groups RPW propagates M2P and forwarding entries to all member sleds after DPD configuration, with convergent retry on failure - Members RPW uses MemberReconcileCtx to thread shared reconciliation state. This handles subscribe on join, unsubscribe on leave, and re-subscribe on migration - Dataplane client updated for bifurcated replication groups Tests: - Integration tests for M2P/forwarding/subscribe lifecycle - Instance migration multicast re-convergence

Move multicast networking types (Mcast2PhysMapping, McastForwardingEntry, etc.) from omicron-common into a new network-types crate with versioned structure. This breaks the dependency cycle between illumos-utils and sled-agent-types-versions (issue #10139). Structure: - network-types/versions/: versioned type definitions (initial v1) - network-types/: re-exports latest versions - Consumers use network_types::multicast::* for current types

Ereports are uniquely identified by the tuple of their reporter's restart ID and the ereport's ENA, as described in [RFD 520][1]. ENAs are ordered within the restart of the reporter, but reporter IDs are randomly-generated UUIDs and do not have a temporal ordering. As I described in [this comment][2] on issue #10073, the approach I'd like to take for loading new ereports to use as inputs to fault management analysis relies on having an ordering of not only the ENA component of the ereport ID but also the ordering of reporter restarts (i.e. "is this restart newer than or older than the one I currently believe to be the latest?"). In order to do this, we need to track the sequence of restart IDs per reporter location. This is done on a location-by-location basis because, well, the laws of physics prevent two sleds/switches/PSCs from occupying the same physical location in the rack at the same time, making their history of reporter restarts inherently orderable in a way that other things, like serial numbers, may be less orderable. This PR adds a table for constructing that ordering and modifies the ereport insert query to actually use it. There's some additional tidiness and refactoring I'd like to do here later (and an OMDB command for printing the ordering table would be nice), but for now, I'd like to be able to actually use this. [1]: https://rfd.shared.oxide.computer/rfd/0520 [2]: #10073 (comment)

As discussed in [this comment][1], it turns out that the approach to sitrep input loading which required the ereporter restart table that I added in #10132 will actually just straight up not work. Unfortunately, we didn't figure that out until after the PR adding that table had merged. We've now found a different way to do this that doesn't require the ordering table. Additionally, the ordering table orders restarts by the order in which ereports were _added to the database_, not the order in which the restarts were actually first observed, which is of debatable usefulness at best. Therefore, we should just get rid of it. Unfortunately, because there was a database migration, we cannot simply push a commit that reverts eb0bb82 and removes all evidence of my shame. Instead, we must have a new migration that just kinda undoes the migration I wrote yesterday, in case anyone has installed a build of the control plane containing eb0bb82 some time in approximately the last two hours. So, undoing the change requires somewhat more ceremony than it would if the database schema has not been touched. This PR does that. [1]: #10073 (comment)

I have another PR that will follow this one to make the set default logic for subnet pools work like IP pools (demote and replace existing default instead of erroring if there's already a default), and as part of that I started doing some test cleanup. I moved that into its own separate PR to make the real PR easier to review. * Replace inline `NexusRequest::object_get`/`object_put`/`object_delete`/`expect_failure`/etc. with the shared `object_get`, `object_put`, `object_delete`, `objects_list_page_authz`, and `*_error helpers` from resource_helpers. * Add `assert_silos_for_pool` and `assert_pools_for_silo` helpers to reduce repeated assertion boilerplate. * Remove stale TODO comments referencing #9453 (not in the tests but just a comment change)

- Remove the footgunny method for fetching "the" default IP Pool for a silo, which is hard to use correctly without specifying an IP version. Also fix call sites. - Add method for returning all the default IP Pools for a Silo, of which there are at most 2. - In the VPC creation saga, ensure that we create an internet gateway pool for all the IP Pools currently linked to the silo as a default. - Add a data migration that renames any existing "default" gateway IP Pools as "default-v4" or "default-v6", as we're now specifying the version. - Add regression test for #10117. - This fixes #10117.

zeeshanlakhani · 2026-03-26T06:51:32Z

~~@jgallagher I started down the better path for network types in relation to #10139, where we could expand on the initial version in a separate PR.~~ For the moment, I'm going to move the types back to omicron_common until #10158 finalizes.

FelixMcFelix

I'm going to be posting comments in chunks given the apparent size of the PR, so please bear with me there if I miss something that's covered by a later file. Apart from that I'm seeing various changes from main interspersed here, so I don't know if something is up with the branch targeting.

FelixMcFelix · 2026-03-25T18:41:07Z

illumos-utils/src/opte/illumos.rs

+
+    #[error("Failed to set multicast-to-physical mapping: {0}")]
+    SetMcastM2p(String),
+
+    #[error("Failed to clear multicast-to-physical mapping: {0}")]
+    ClearMcastM2p(String),
+
+    #[error("Failed to set multicast forwarding: {0}")]
+    SetMcastFwd(String),
+
+    #[error("Failed to clear multicast forwarding: {0}")]
+    ClearMcastFwd(String),
+
+    #[error("Failed to list multicast forwarding: {0}")]
+    ListMcastFwd(String),


I think this is not quite the right way to think about the error type here. What's fundamentally going wrong in most of these these cases is that MulticastUnderlay::new failed, so it's the same class of parameter validation failure (and so one error variant). ListMcastFwd is obviously the exception, but I've written on that below. The other thing is that I think String is probably the wrong error type in OPTE itself for a validation failure.

illumos-utils/src/opte/non_illumos.rs

FelixMcFelix · 2026-03-25T19:02:57Z

illumos-utils/src/opte/port_manager.rs

+    /// Install CIDR allow rules for multicast traffic on a port.
+    fn install_mcast_cidr_allows(
+        &self,
+        hdl: &Handle,
+        port_name: &str,
+    ) -> Result<(), Error> {
+        for (cidr, dir) in mcast_cidr_allow_set() {
+            debug!(
+                self.inner.log,
+                "installing multicast CIDR allow rule";
+                "port" => port_name,
+                "cidr" => %cidr,
+                "direction" => ?dir,
+            );
+            hdl.allow_cidr(port_name, cidr, dir)?;
+        }
+        Ok(())
+    }
+
+    /// Remove CIDR allow rules for multicast traffic from a port.
+    fn remove_mcast_cidr_allows(
+        &self,
+        hdl: &Handle,
+        port_name: &str,
+    ) -> Result<(), Error> {
+        for (cidr, dir) in mcast_cidr_allow_set() {
+            debug!(
+                self.inner.log,
+                "removing multicast CIDR allow rule";
+                "port" => port_name,
+                "cidr" => %cidr,
+                "direction" => ?dir,
+            );
+            hdl.remove_cidr(port_name, cidr, dir)?;
+        }


The add/remove_cidr ioctls are intended for transit IPs, not this. But also this logic shouldn't be necessary? We've already set up the gateway layer in OPTE to handle this according to your description here: https://github.com/oxidecomputer/opte/blob/04c3d5d37d7b919cbf01019d2a17b93ff2df2eb8/lib/oxide-vpc/src/engine/gateway/mod.rs#L43-L56.

I think if we were converting them to individual/aggregated subnet exceptions, we would leave that to OPTE based on the group subscription info it has.

Ah yeah, you're right. I had this as a placeholder, but dealt with it in the gateway code.

FelixMcFelix · 2026-03-25T19:03:38Z

illumos-utils/src/opte/non_illumos.rs

        _: Direction,
    ) -> Result<NoResp, OpteError> {
-        unimplemented!("Not yet used in tests")
+        Ok(NO_RESPONSE)


This should still be marked as unimplemented, I think?

Not necessary with removal based on previous comment.

FelixMcFelix · 2026-03-26T11:58:39Z

dev-tools/releng/src/main.rs

+    // Parse tools/opte_version_override for OPTE_COMMIT and
+    // OPTE_PKG_FALLBACK_VERSION. When OPTE_COMMIT is set, the OPTE package
+    // isn't published to the helios pkg repo yet, so we install the fallback
+    // version via pkg and overlay the xde binary and opteadm from buildomat.
+    let opte_override = parse_opte_version_override(
+        &WORKSPACE_DIR.join("tools/opte_version_override"),
+    )
+    .await?;
+    let effective_opte_version = match &opte_override {
+        Some(ov) => {
+            info!(
+                logger,
+                "OPTE override active: commit={}, fallback pkg version={}",
+                ov.commit,
+                ov.fallback_version,
+            );
+            &ov.fallback_version
+        }
+        None => opte_version.trim(),
+    };
+


I think this is useful, but I think ideally we'd want to pursue this via an override p5p from a given commit, which OPTE's CI does generate. I know Ry looked into this a while back with illumos/image-builder#1, but I think that the extra_packages machinery when building image_cmd might be able to help us?

If we're trying to greater enshrine this so we can run tests more reliably, we should also modify .github/workflows/check-opte-ver.yml (or add a new stage) such that overall CI fails if any of these override flags are set. I don't want us to be able to merge any version workarounds if possible.

FelixMcFelix · 2026-03-26T13:03:41Z

illumos-utils/src/opte/port_manager.rs

+    mcast_subscriptions:
+        Mutex<HashMap<(Uuid, NetworkInterfaceKind), McastPortState>>,


I think it would be easier to keep this state, and that of eip_gateways, in ports. Stuffing it into Port/PortInner might be tricky, since that type is shared with the Zones for a few reasons, so I don't know if something like this will suffice:

struct MutablePortState { fixed_data: Port, eip_gateways: HashSet<Uuid>, mcast: McastPortState, }

I also, one day, want to move EIP bindings down from the instance manager to here to make those pathways less circuitous.

Sorry for kicking off the bad pattern there (mainly born from expediency around that immutable Port type), but I think we should nix it while we can. This should also drastically simplify the need to reason about lock order.

(Your VnicUuid comment also aplies to ports and eip_gateways, that would be nice but it's not necessary here.)

FelixMcFelix · 2026-03-26T13:06:27Z

illumos-utils/src/opte/port_manager.rs

+/// The set of (CIDR, direction) pairs for multicast allow rules.
+/// Covers [`IPV4_MULTICAST_RANGE`] (224.0.0.0/4) and
+/// [`IPV6_MULTICAST_RANGE`] (ff00::/8) in both directions.
+fn mcast_cidr_allow_set() -> [(IpCidr, Direction); 4] {
+    let v4 = IPV4_MULTICAST_RANGE;
+    let v6 = IPV6_MULTICAST_RANGE;
+    let ipv4 = IpCidr::Ip4(Ipv4Cidr::new(
+        oxide_vpc::api::Ipv4Addr::from(v4.addr()),
+        oxide_vpc::api::Ipv4PrefixLen::new(v4.width()).unwrap(),
+    ));
+    let ipv6 = IpCidr::Ip6(Ipv6Cidr::new(
+        oxide_vpc::api::Ipv6Addr::from(v6.addr()),
+        oxide_vpc::api::Ipv6PrefixLen::new(v6.width()).unwrap(),
+    ));
+    [
+        (ipv4, Direction::In),
+        (ipv4, Direction::Out),
+        (ipv6, Direction::In),
+        (ipv6, Direction::Out),
+    ]
+}


This shouldn't be needed assuming the comments in OPTE are correct.

FelixMcFelix · 2026-03-26T13:16:32Z

illumos-utils/src/opte/port_manager.rs

+            hdl.mcast_unsubscribe(&McastUnsubscribeReq {
+                port_name: port_name.clone(),
+                group: (*group_ip).into(),
+            })?;


Maybe comment here that this is effectively infallible (we've verified the IPs are multicast at least implicitly, the operation is idempotent, the port exists).

FelixMcFelix · 2026-03-26T13:17:52Z

illumos-utils/src/opte/port_manager.rs

+                hdl.mcast_subscribe(&McastSubscribeReq {
+                    port_name: port_name.clone(),
+                    group: (*group_ip).into(),
+                    filter: filter.clone(),
+                })?;


Ditto to above, mainly since we don't want to give the impression that this is something that can fail partially.

FelixMcFelix · 2026-03-26T13:22:49Z

illumos-utils/src/opte/port_manager.rs

+                            oxide_vpc::api::Replication::Reserved => {
+                                return Err(Error::ListMcastFwd(
+                                    "OPTE returned Reserved replication \
+                                     mode, which has no defined semantics"
+                                        .to_string(),
+                                ));
+                            }


If this is ever set this way, it would have to be from an operator manually playing with opteadm I assume. Should we be bailing here if that's the case? We should either be listing the system state unconditionally or flagging an error log at worst, such that consumers like converge_forwarding don't get wedged trying to reconcile system state.

zeeshanlakhani requested a review from FelixMcFelix March 17, 2026 03:49

zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch 4 times, most recently from 3efe7fa to 98e9742 Compare March 17, 2026 05:42

zeeshanlakhani marked this pull request as ready for review March 17, 2026 09:02

zeeshanlakhani self-assigned this Mar 23, 2026

zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch from 98e9742 to b510a9f Compare March 24, 2026 02:40

jgallagher reviewed Mar 24, 2026

View reviewed changes

common/src/api/internal/shared/mod.rs Outdated Show resolved Hide resolved

jgallagher mentioned this pull request Mar 24, 2026

Add support for ICMPv6 protocol filter in firewall rules #10101

Merged

bnaecker mentioned this pull request Mar 24, 2026

Want versioned crate for Omicron networking types #10139

Open

zeeshanlakhani and others added 7 commits March 24, 2026 18:03

merge m2p-forwarding into multicast-e2e

261c499

zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch from b510a9f to 1402d4d Compare March 26, 2026 06:42

zeeshanlakhani requested a review from jgallagher March 26, 2026 06:50

zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch from 1402d4d to 102d7fb Compare March 26, 2026 09:23

cleanup + merge + opte fallback

043af28

zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch from 102d7fb to 043af28 Compare March 26, 2026 11:51

FelixMcFelix reviewed Mar 26, 2026

View reviewed changes

jgallagher mentioned this pull request Mar 26, 2026

Invert dependency between sled-agent-types and illumos-utils #10158

Merged

		mcast_subscriptions:
		Mutex<HashMap<(Uuid, NetworkInterfaceKind), McastPortState>>,

Conversation

zeeshanlakhani commented Mar 17, 2026

Uh oh!

zeeshanlakhani commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

zeeshanlakhani commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FelixMcFelix left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

zeeshanlakhani commented Mar 17, 2026 •

edited

Loading

zeeshanlakhani commented Mar 26, 2026 •

edited

Loading