Skip to content

scrimlet-reconcilers: flesh out dpd service NAT reconciler#10442

Open
jgallagher wants to merge 3 commits into
john/feature-sled-agent-scrimlet-reconcilersfrom
john/scrimlet-reconcilers-dpd-nat
Open

scrimlet-reconcilers: flesh out dpd service NAT reconciler#10442
jgallagher wants to merge 3 commits into
john/feature-sled-agent-scrimlet-reconcilersfrom
john/scrimlet-reconcilers-dpd-nat

Conversation

@jgallagher
Copy link
Copy Markdown
Contributor

This is almost all new code (and about 60/40 "real code" / "tests"). Today NAT entry reconciliation is inverted from what this PR implements: dpd fetches NAT entries from Nexus and does reconciliation internally. In this PR, sled-agent does the reconciliation. This requires fetching all NAT entries from dpd (to decide what to remove, if anything). My biggest concerns here are:

  1. Is it correct?
  2. Can we ship this, or do we need to block on dpd: Want a better API for bulk NAT entry operations dendrite#255 or something like it?

This is targeted at the feature / integration branch that's building up the complete sled-agent-scrimlet-reconcilers crate.

@jgallagher jgallagher added the networking Related to the networking. label May 13, 2026
Copy link
Copy Markdown
Contributor

@internet-diglett internet-diglett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +203 to +212
/// * We'd like to eventually replace this with fewer calls, if we add different
/// APIs to `dpd` for applying NAT settings in bulk.
/// <https://github.com/oxidecomputer/dendrite/issues/255>
/// * In practice we expect the number of calls here to be small. On startup we
/// expect ~10 `to_create` calls (one for each service, which is typically 2
/// boundary NTP, 3 Nexus, and 1-5 external DNS), and for every reconciliation
/// attempt after that we expect 0-1 (either no changes, or a single new
/// service has been added or removed; it's possible we'll see multiple, but
/// unlikely given we re-reconcile on every networking config change).
/// * We always want to report the status of every step described by `plan`, and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably good for the time being. Once we figure out how we want to resolve #8748, we will likely have the bulk NAT API endpoints implemented in dpd

@internet-diglett
Copy link
Copy Markdown
Contributor

internet-diglett commented May 13, 2026

Although this is different to how we sync service zone nat entries today, the overall pattern of reconciling a diff is similar to how we synchronize other dpd state, so after accounting for the fact that scrimlet sled agents will be doing this for their own service zone nat entries it looks like a sound approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

networking Related to the networking.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants