[fm] mark ereports included in a sitrep as "seen" by hawkw · Pull Request #10156 · oxidecomputer/omicron

hawkw · 2026-03-25T22:48:30Z

In order to implement the FM analysis preparation phase (#10073), we must have a mechanism to load ereports from the database that are considered "new" (i.e., they were not included as inputs to prior sitreps). In the design described in this comment, the execution phase eventually marks the database records for those ereports as seen, so that they are not considered "new" in subsequent preparation phases. This branch implements that marking.

Since marking is part of the execution phase, it is not guaranteed to have occurred by the time a new preparation phase begins, so the marking of seen ereports is eventually consistent. This is acceptable in the design proposed in #10073, because the marking is only an optimization: when we actually assemble the inputs to analysis, we will additionally filter out any ereports that are part of the parent sitrep, since they have already been included. Marking exists so that those ereports need not be held in memory forever, and as long as it eventually happens, it is fine for it to lag behind the loading of ereports in the preparation phase.
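As a rough illustration of the selection rule described above, here is a sketch that treats "new" ereports as rows whose `marked_seen_in` is still null, minus any that the parent sitrep already included (since marking may lag behind). `EreportId`, `EreportRow`, and `new_ereports` are hypothetical in-memory stand-ins with UUIDs modeled as `u128`; they are not omicron's actual types or queries.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an ereport ID (reporter restart ID plus ENA).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct EreportId {
    restart_id: u128,
    ena: u64,
}

/// Hypothetical in-memory view of one ereport row.
#[derive(Clone, Debug)]
struct EreportRow {
    id: EreportId,
    /// `None` until some execution phase has marked this row as seen.
    marked_seen_in: Option<u128>,
}

/// Select inputs for the next analysis: rows not yet marked seen, excluding
/// ereports the parent sitrep already included (their marking may be pending).
fn new_ereports(rows: &[EreportRow], parent_sitrep_ereports: &BTreeSet<EreportId>) -> Vec<EreportId> {
    rows.iter()
        .filter(|row| row.marked_seen_in.is_none())
        .map(|row| row.id)
        .filter(|id| !parent_sitrep_ereports.contains(id))
        .collect()
}
```

The second filter is what makes the lagging marking safe: an ereport the parent sitrep included but that has not been marked yet is still excluded.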

hawkw · 2026-03-26T19:31:26Z

nexus/src/app/background/tasks/fm_rendezvous.rs

        // TODO(eliza): as we start doing other things (i.e. requesting support
        // bundles, updating problems), consider spawning these in their own tasks...
        let alerts = self.create_requested_alerts(&sitrep, opctx).await;
+        let marking = self.mark_ereports_seen(&sitrep, opctx).await;


Now that we are actually doing multiple things (and will be doing even more once @mergeconflict's work on support bundle requests lands), we should probably spawn these as separate tasks. However, I think I'm going to do that in a follow-up PR...

hawkw · 2026-03-26T19:44:43Z

nexus/src/app/background/tasks/fm_rendezvous.rs

this is where a bunch of the important parts of the change live, so of course, GitHub has decided to collapse it by default 🙃

smklein · 2026-03-26T23:03:29Z

schema/crdb/dbinit.sql

+     * if this is `true`, the ereport has *definitely* been seen by at least
+     * one committed sitrep at some point in time. if it is `false`, the
+     * ereport may or may not have been included in a sitrep, and you will
+     * have to actually check the sitrep to find out.


nit: Comments say "true"/"false", but this is a UUID - should say nullable.

It would also be nice to document whether this is supposed to be the sitrep that was "executing" or the sitrep that "first marked the ereport as seen". I expect they'll often, but not always, be the same.

Good catch! And, yeah, it is the sitrep that was executing when this row was marked seen, which may not actually be the first sitrep it appeared in. That should be written down here.
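Per the discussion above, the column is a nullable sitrep UUID rather than a boolean, and a non-null value names the sitrep that was executing when the row was marked. A minimal sketch of how a reader might interpret it, using a hypothetical `Seenness` enum and `u128` in place of `SitrepUuid`:

```rust
/// Hypothetical interpretation of a nullable `marked_seen_in` value.
#[derive(Debug, PartialEq, Eq)]
enum Seenness {
    /// Definitely included in at least one committed sitrep. The ID is the
    /// sitrep that was *executing* when the row was marked, which is not
    /// necessarily the first sitrep that included the ereport.
    MarkedBy(u128),
    /// May or may not have been included; the sitrep itself must be checked.
    Unknown,
}

fn seenness(marked_seen_in: Option<u128>) -> Seenness {
    match marked_seen_in {
        Some(sitrep_id) => Seenness::MarkedBy(sitrep_id),
        None => Seenness::Unknown,
    }
}
```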

smklein · 2026-03-26T23:16:45Z

nexus/db-queries/src/db/datastore/ereport.rs

+        opctx: &OpContext,
+        sitrep_id: SitrepUuid,
+        ereport_ids: impl IntoIterator<Item = EreportId>,
+    ) -> Result<usize, Error> {


Should there be an opctx check on the ability to modify things here?

(looking at parity with the other pub fn to list them...)

smklein · 2026-03-26T23:17:09Z

nexus/db-queries/src/db/datastore/ereport.rs

+    ///
+    /// If any ereports have already been marked as seen with another sitrep ID,
+    /// they are unmodified. Otherwise, this query sets the `marked_seen_in`
+    /// column to the provided sitrep ID.


Maybe a note about the return value meaning?
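One reading of the documented behavior, and of what the `usize` return value could mean, is sketched below against a hypothetical in-memory table: rows already marked by any sitrep are left alone, and the return value counts only rows newly marked, so retrying the same call returns 0. The `EreportTable` alias and this `mark_ereports_seen` function are illustrative assumptions, not the datastore method itself.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory model of the ereport table:
/// (restart ID, ENA) -> `marked_seen_in`.
type EreportTable = BTreeMap<(u128, u64), Option<u128>>;

/// Mark each listed ereport as seen in `sitrep_id`, leaving rows that are
/// already marked (by any sitrep) untouched. IDs with no row are ignored,
/// as an SQL `UPDATE` would ignore them. Returns the number of rows changed.
fn mark_ereports_seen(
    table: &mut EreportTable,
    sitrep_id: u128,
    ereport_ids: impl IntoIterator<Item = (u128, u64)>,
) -> usize {
    let mut updated = 0;
    for id in ereport_ids {
        if let Some(marked) = table.get_mut(&id) {
            if marked.is_none() {
                *marked = Some(sitrep_id);
                updated += 1;
            }
        }
    }
    updated
}
```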

smklein · 2026-03-26T23:17:39Z

nexus/db-queries/src/db/datastore/ereport.rs

+        //
+        // This bit is kinda screwy: unfortunately, Postgres serialization
+        // does not support bind parameters which are arrays of tuples, so
+        // we must bind two separate arrays. Trust me on this one.


aight IF YOU SAY SO
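For readers wondering what "bind two separate arrays" looks like, here is a sketch assuming each ereport is keyed by a `(restart_id, ena)` pair: the pairs are split into two parallel arrays with `unzip`, and the query re-pairs them with a multi-argument `unnest`. The SQL text, table name, and column names are hypothetical and may not match the query in this PR; conversion to the database's UUID and INT8 types is omitted.

```rust
/// Hypothetical query shape: instead of one array-of-tuples parameter, bind
/// two equal-length arrays ($2, $3) and zip them back together with `unnest`.
const MARK_SEEN_SQL: &str = "UPDATE ereport \
     SET marked_seen_in = $1 \
     WHERE marked_seen_in IS NULL \
       AND (restart_id, ena) IN (SELECT * FROM unnest($2::UUID[], $3::INT8[]))";

/// Split (restart_id, ena) pairs into the two parallel bind arrays.
/// Index i of each output vector refers to the same ereport.
fn split_ids(ids: impl IntoIterator<Item = (u128, u64)>) -> (Vec<u128>, Vec<u64>) {
    ids.into_iter().unzip()
}
```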

smklein · 2026-03-26T23:19:59Z

nexus/db-queries/src/db/datastore/ereport.rs

+            enas.push(DbEna::from(ena));
+        }
+
+        // Raw SQL is necessary here as `diesel`'s `.eq_any` doesn't work with


this is a little paranoid of me, but: do we have a test for this checking that it doesn't fall apart when the ereport_ids iterator is empty?
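One way to make the empty-iterator case trivially safe (in addition to testing it) is to short-circuit before the query is ever built. This is a hypothetical sketch: `mark_seen_checked` and the `run_query` closure stand in for the real datastore call, and `String` stands in for its error type.

```rust
/// Hypothetical wrapper: returns `Ok(0)` without touching the database when
/// there is nothing to mark, so an empty iterator never reaches the query.
fn mark_seen_checked<F>(
    ids: impl IntoIterator<Item = (u128, u64)>,
    run_query: F,
) -> Result<usize, String>
where
    F: FnOnce(Vec<u128>, Vec<u64>) -> Result<usize, String>,
{
    let (restart_ids, enas): (Vec<u128>, Vec<u64>) = ids.into_iter().unzip();
    if restart_ids.is_empty() {
        return Ok(0);
    }
    run_query(restart_ids, enas)
}
```

A test along these lines would pass a closure that panics, proving the query is skipped for empty input.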

smklein · 2026-03-26T23:21:21Z

nexus/db-queries/src/db/datastore/fm.rs

            .await
            .expect("failed to insert second sitrep");

+        // Delete the original sitrep


This comment looks like it was added here, but it doesn't look like we're doing deletion here?

smklein · 2026-03-26T23:26:31Z

dev-tools/omdb/src/bin/omdb/nexus.rs

+        "    {NOT_ALREADY_MARKED:<WIDTH$}{ereports_not_marked_in_sitrep:>NUM_WIDTH$}"
+    );
+    println!("    {MARKED_SEEN:<WIDTH$}{ereports_marked_seen:>NUM_WIDTH$}");
+    let already_marked = ereports_not_marked_in_sitrep - ereports_marked_seen;


We're sure this won't underflow?
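If the two counts can ever disagree (for example, through a bug or a concurrent update), plain `usize` subtraction panics in debug builds and wraps in release builds. A sketch using `checked_sub`, with hypothetical helper names and the variable names from the omdb snippet:

```rust
/// Returns `None` instead of panicking or wrapping if more ereports were
/// marked than there were candidates.
fn already_marked(ereports_not_marked_in_sitrep: usize, ereports_marked_seen: usize) -> Option<usize> {
    ereports_not_marked_in_sitrep.checked_sub(ereports_marked_seen)
}

/// Render the value for status output, surfacing an inconsistency visibly.
fn already_marked_display(ereports_not_marked_in_sitrep: usize, ereports_marked_seen: usize) -> String {
    match already_marked(ereports_not_marked_in_sitrep, ereports_marked_seen) {
        Some(n) => n.to_string(),
        None => format!(
            "<underflow: {ereports_marked_seen} marked > {ereports_not_marked_in_sitrep} candidates>"
        ),
    }
}
```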

smklein · 2026-03-26T23:39:32Z

nexus/db-queries/src/db/datastore/fm.rs

            self.fm_sitrep_metadata_read_on_conn(id, &conn).await?.into();

-        Ok(Sitrep { metadata, cases })
+        Ok(Sitrep { metadata, cases, ereports_by_id: ereports })


Is this ereports_by_id derivable from cases?

If no: How does it differ?

If yes: would it be a problem if they differ? Should we be checking sitreps for internal consistency?

(We have a blueprint checker called "blippy" for this reason; it might be time to make a "slippy".)
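A "slippy"-style check could start as small as the sketch below, which compares the ereports referenced by cases against the sitrep's `ereports_by_id` map and reports discrepancies in both directions. The types are simplified hypothetical stand-ins, and whether an ereport that no case references should count as an error depends on the answer to the question above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical ereport ID: (restart ID, ENA).
type EreportId = (u128, u64);

struct Case {
    ereports: BTreeSet<EreportId>,
}

struct Sitrep {
    cases: Vec<Case>,
    /// Values are placeholders for full ereport records.
    ereports_by_id: BTreeMap<EreportId, String>,
}

#[derive(Debug, PartialEq, Eq)]
enum Finding {
    /// A case references an ereport the sitrep does not carry.
    MissingFromSitrep(EreportId),
    /// The sitrep carries an ereport that no case references.
    NotReferencedByAnyCase(EreportId),
}

fn check_sitrep(sitrep: &Sitrep) -> Vec<Finding> {
    let referenced: BTreeSet<EreportId> = sitrep
        .cases
        .iter()
        .flat_map(|case| case.ereports.iter().copied())
        .collect();
    let mut findings = Vec::new();
    for id in &referenced {
        if !sitrep.ereports_by_id.contains_key(id) {
            findings.push(Finding::MissingFromSitrep(*id));
        }
    }
    for id in sitrep.ereports_by_id.keys() {
        if !referenced.contains(id) {
            findings.push(Finding::NotReferencedByAnyCase(*id));
        }
    }
    findings
}
```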

smklein · 2026-03-26T23:59:34Z

nexus/fm/src/builder.rs

    pub inventory: &'a inventory::Collection,
    pub parent_sitrep: Option<&'a fm::Sitrep>,
    pub sitrep_id: SitrepUuid,
    pub cases: case::AllCases,


I want to make sure I have a solid understanding of how this builder is intended to be used, especially with cases being a pub field right now.

Here's my main concern: conceptually, I'm observing three different phases of sitrep construction + usage:

Preparation (loading inputs from the database)

Planning (building the next sitrep)

Execution (enacting the sitrep)

Within the context of "Marking ereports as seen", I can see:

Planning couples the set of "seen" ereports tightly with cases

Execution tries to mark the ereports as "seen" within the database

But I'm not seeing the Preparation phase here, which would load inputs from the database, and use them in the construction of the next builder.

It's possible that this doesn't exist yet because we aren't yet using the SitrepBuilder, but I wanted to flag this early: we'll need a way to verify that ereports which should be marked as seen in the database actually have been marked! It's possible that the fm_rendezvous task gets stalled for some reason, so we cannot assume that the ereports have been marked "seen" when the next sitrep is created.

I bring this up because, with cases being public, it seems wrong to mark the case closed, or to remove an ereport from a case, until it has been marked seen. Otherwise, we risk the following race condition:

ereport E is processed by sitrep S1

sitrep S1 is the target, but doesn't get enough time to execute fully (or fails to rendezvous)

a new sitrep S2 is the new target, drops the reference to ereport E

ereport E no longer exists in the sitrep -- so we no longer try to mark ereport E as "seen" -- and it gets loaded and evaluated as new
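The reviewer's suggestion can be sketched as a planning-phase guard: an ereport from the parent sitrep may only be dropped once the database confirms it has been marked seen; otherwise it is carried forward so that a later execution phase still marks it. In the race above, E would stay in S2. `carry_forward`, its inputs, and the `(u128, u64)` ID alias are hypothetical, and this is one possible mitigation rather than the PR's design.

```rust
use std::collections::BTreeSet;

/// Hypothetical ereport ID: (restart ID, ENA).
type EreportId = (u128, u64);

/// Returns the parent sitrep's ereports that the next sitrep must keep:
/// everything it does not want to drop, plus anything it wants to drop
/// that the database has not yet confirmed as marked seen.
fn carry_forward(
    parent_ereports: &BTreeSet<EreportId>,
    wants_to_drop: &BTreeSet<EreportId>,
    confirmed_marked: &BTreeSet<EreportId>,
) -> BTreeSet<EreportId> {
    parent_ereports
        .iter()
        .copied()
        .filter(|id| !wants_to_drop.contains(id) || !confirmed_marked.contains(id))
        .collect()
}
```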

[fm] start ereport seenness marking

855c172

hawkw mentioned this pull request Mar 25, 2026

FM: design for diagnosis inputs/analysis preparation phase #10073

Open

hawkw added 14 commits March 25, 2026 16:01

reticulating

300f4dc

reticulating

cccf92f

reticulating

d2a4e69

continue reticulating

ae985e1

omdeeby

f878d0b

stop yelling

3ddad1f

blarg

348dc93

add migration

c99d78e

docs, explain weird query

e0f8f20

what if you could actually query for unseen ereports

f9fe165

bg task tests

aa8b3d7

idempotency

0532c4b

nicer status

f1ec454

Merge branch 'main' into eliza/ereports-seen

cc5bdbd

hawkw changed the title ~~[fm] start ereport seenness marking~~ [fm] mark ereports included in a sitrep as "seen" Mar 26, 2026

rustfmt

09c0b6a

hawkw marked this pull request as ready for review March 26, 2026 19:28

hawkw requested a review from smklein March 26, 2026 19:28

hawkw self-assigned this Mar 26, 2026

hawkw added the fault-management Everything related to the fault-management initiative (RFD480 and others) label Mar 26, 2026

hawkw added this to the 20 milestone Mar 26, 2026

hawkw requested a review from mergeconflict March 26, 2026 19:29

hawkw commented Mar 26, 2026


remember to do the migration thingy

ff093f6

hawkw commented Mar 26, 2026


forgot to expectorate

398037d

smklein reviewed Mar 26, 2026


hawkw wants to merge 18 commits into main from eliza/ereports-seen
