[fm] mark ereports included in a sitrep as "seen"#10156
Conversation
| // TODO(eliza): as we start doing other things (i.e. requesting support | ||
| // bundles, updating problems), consider spawning these in their own tasks... | ||
| let alerts = self.create_requested_alerts(&sitrep, opctx).await; | ||
| let marking = self.mark_ereports_seen(&sitrep, opctx).await; |
There was a problem hiding this comment.
Now that we are actually doing multiple things (and will be doing even more multiple things when @mergeconflict's work on support bundle requests land) we should probably actually spawn these. However, I think I'm going to do that in a follow-up PR...
There was a problem hiding this comment.
this is where a bunch of the important parts of the change live, so of course, GitHub has decided to collapse it by default 🙃
| * if this is `true`, the ereport has *definitely* been seen by at least | ||
| * one committed sitrep at some point in time. if it is `false`, the | ||
| * ereport may or may not have been included in a sitrep, and you will | ||
| * have to actually check the sitrep to find out. |
There was a problem hiding this comment.
nit: Comments say "true"/"false", but this is a UUID - should say nullable.
Also would be nice to identify if this is supposed to be the sitrep "executing" or the sitrep which "first marked the ereport as seen". I expect they'll often, but not always, be the same
There was a problem hiding this comment.
Good catch! And, yeah, it is the sitrep that was executing when this row was marked seen, which may not actually be the first sitrep it appeared in. That should be written down here.
| opctx: &OpContext, | ||
| sitrep_id: SitrepUuid, | ||
| ereport_ids: impl IntoIterator<Item = EreportId>, | ||
| ) -> Result<usize, Error> { |
There was a problem hiding this comment.
Should the be an opctx check on the ability to modify things here?
(looking at parity with the other pub fn to list them...)
| /// | ||
| /// If any ereports have already been marked as seen with another sitrep ID, | ||
| /// they are unmodified. Otherwise, this query sets the `marked_seen_in` | ||
| /// column to the provided sitrep ID. |
There was a problem hiding this comment.
Maybe a note about the return value meaning?
| // | ||
| // This bit is kindas screwy: unfortunately, Postgres serialization | ||
| // does not support bind parameters which are arrays of tuples, so | ||
| // we must bind two separate arrays. Trust me on this one. |
| enas.push(DbEna::from(ena)); | ||
| } | ||
|
|
||
| // Raw SQL is necessary here as `diesel`'s `.eq_any` doesn't work with |
There was a problem hiding this comment.
this is a little paranoid of me, but: do we have a test for this checking that it doesn't fall apart when the ererport_ids iterator is empty?
| .await | ||
| .expect("failed to insert second sitrep"); | ||
|
|
||
| // Delete the original sitrep |
There was a problem hiding this comment.
This comment looks like it was added here, but it doesn't look like we're doing deletion here?
| " {NOT_ALREADY_MARKED:<WIDTH$}{ereports_not_marked_in_sitrep:>NUM_WIDTH$}" | ||
| ); | ||
| println!(" {MARKED_SEEN:<WIDTH$}{ereports_marked_seen:>NUM_WIDTH$}"); | ||
| let already_marked = ereports_not_marked_in_sitrep - ereports_marked_seen; |
There was a problem hiding this comment.
We're sure this won't underflow?
| self.fm_sitrep_metadata_read_on_conn(id, &conn).await?.into(); | ||
|
|
||
| Ok(Sitrep { metadata, cases }) | ||
| Ok(Sitrep { metadata, cases, ereports_by_id: ereports }) |
There was a problem hiding this comment.
Is this ereports_by_id derivable from cases?
- If no: How does it differ?
- If yes: would it be a problem if they differ? Should we be checking sitreps for internal consistency?
(We have a blueprint checker called "blippy" for this reason, might be time to make a "slippy")
| pub inventory: &'a inventory::Collection, | ||
| pub parent_sitrep: Option<&'a fm::Sitrep>, | ||
| pub sitrep_id: SitrepUuid, | ||
| pub cases: case::AllCases, |
There was a problem hiding this comment.
I want to make sure I have a solid understanding about how this builder is intended to be used, especially with cases being a pub field right now.
Here's my main concern: conceptually, I'm observing three different phases of sitrep construction + usage:
- Preparation (loading inputs from the database)
- Planning (building the next sitrep)
- Execution (enacting the sitrep)
Within the context of "Marking ereports as seen", I can see:
- Planning couples the set of "seen" ereports tightly with cases
- Execution tries to mark the erports as "seen" within the database
But I'm not seeing the Preparation phase here, which would load inputs from the database, and use them in the construction of the next builder.
It's possible that "this doesn't exist yet, because we aren't yet using the SitrepBuilder, but I wanted to flag this early: We'll need a way to verify that ereports which should be marked as seen in the database actually have been seen! It's possible that the fm_rendezvous task gets stalled for some reason - we cannot assume that the ereports have been marked "seen" when the next sitrep is created.
Anyway, I bring this up because: with cases being public, it seems wrong to mark the case closed, or to remove an ereport from a case, until it has been marked seen. Otherwise, we could risk the following race condition:
- ereport E is processed by sitrep S1
- sitrep S1 is the target, but doesn't get enough time to execute fully (or fails to rendezvous)
- a new sitrep S2 is the new target, drops the reference to ereport E
- ereport E no longer exists in the sitrep -- so we no longer try to mark ereport E as "seen" -- and it gets loaded and evaluated as new
In order to implement the FM analysis preparation phase (#10073), we must have a mechanism to load ereports from the database that are considered "new" (i.e. they were not included as inputs to prior sitreps). The design for detecting which ereports to load described in this comment involves having the execution phase eventually mark the database records for those ereports as having been seen so that they are not considered "new" in subsequent preparation phases, which is what this branch implements.
Since this is part of the execution phase, it is not guaranteed to have occurred by the time a new preparation phase has begun. This means that the marking of seen ereports is eventually consistent. This is acceptable in the design proposed in #10073, as the marking is really intended as an optimization. When we actually assemble the inputs to analysis, we will additionally filter out any ereports which are part of the parent sitrep, since they have already been included. The marking is intended as an optimization to allow those ereports to not be held in memory forever, and as long as it will eventually happen, it is okay for it to lag behind the loading of ereports in the preparation phase.