
mvp vm attestation #1091

Merged
jordanhendricks merged 38 commits into master from jhendricks/rfd-605 on Apr 6, 2026
Conversation

@jordanhendricks (Contributor) commented Mar 27, 2026

closes #1067

TODO:

Testing notes

Happy path

I tested using an image @flihp built. (I've put the image for posterity in the lab at /staff/jordan/propolis-1091).

I made the instance with that image and a RO boot disk. For example:

#!/usr/bin/env bash

set -euo pipefail

export OXIDE_HOST="https://recovery.sys.dublin.eng.oxide.computer/"
export PROFILE="dublin"

# instance name, referenced as $NAME below (placeholder; pick any name)
NAME="attest-demo"

# image UUID
IMAGE="3fb74969-8981-437f-b4b5-2dc82d3ca191"

cat <<EOF > "./request.json"
{
  "name": "$NAME",
  "description": "demo instance",
  "hostname": "$NAME",
  "memory": 2147483648,
  "ncpus": 2,
  "start": false,
  "boot_disk": {
    "type": "create",
    "description": "a disk to run instance with name $NAME",
    "disk_backend": {
      "type": "distributed",
      "disk_source": {
        "type": "image",
        "image_id": "$IMAGE",
        "read_only": true
      }
    },
    "name": "disk-for-$NAME",
    "size": 2147483648
  }
}
EOF

$HOME/src/oxide.rs/target/debug/oxide instance create \
    --profile "$PROFILE" \
    --project "jordan" \
    --json-body "./request.json"

Inside the VM, edit vm-instance-cfg.json to contain the instance UUID and the sha256sum of the image (for the demo image, that's 06ff4481c775ffce878e722927985bffaa1fc7de5d3c2e231bea2adecd22615f).

Then in the VM, for a racklette (username root, password password), run:

$ appraiser --root-cert platform-id-staging.pem --reference-measurements sp-v1.0.64-1.0.64_corim.cbor --reference-measurements rot-1-v1.0.38-1.0.38_corim.cbor --vm-instance-cfg vm-instance-cfg.json vm-instance-rot vsock 605 -vvv

If everything works, you should see:

[2026-04-06T21:04:08Z INFO  appraiser] metadata from Oxide VM Instance RoT appraised
[2026-04-06T21:04:08Z INFO  appraiser] appraised attestation from VmInstanceRot over vsock

No boot disk

Steps to test: create an instance, stop it (or don't auto-start it), then remove the disk's designation as the boot disk. Send a challenge from inside the guest.

Result: attestation server used just the instance UUID for qualifying data

21:24:25.538Z INFO propolis-server (vm_state_driver): vm conf is ready = VmInstanceConf { uuid: 1f1ec2e3-c5cf-4eaf-8a19-aa25ec1f6895, boot_digest: None }
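The fallback above can be sketched as follows. `VmInstanceConf` mirrors the fields in the log line, but the `qualifying_data` helper, the simplified field types, and the byte layout are invented for illustration; this is not the propolis implementation.

```rust
// Sketch (not the propolis code) of the fallback described above: when
// there is no read-only boot disk, the qualifying data reduces to the
// instance UUID alone.
#[derive(Debug)]
struct VmInstanceConf {
    uuid: [u8; 16],               // stand-in for uuid::Uuid
    boot_digest: Option<[u8; 32]>, // sha256 of the boot disk, if any
}

// Hypothetical helper: serialize whatever identity data is present.
fn qualifying_data(conf: &VmInstanceConf) -> Vec<u8> {
    let mut data = conf.uuid.to_vec();
    if let Some(digest) = &conf.boot_digest {
        data.extend_from_slice(digest);
    }
    data
}

fn main() {
    let conf = VmInstanceConf { uuid: [0u8; 16], boot_digest: None };
    // With boot_digest = None, only the 16 UUID bytes remain.
    assert_eq!(qualifying_data(&conf).len(), 16);
}
```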

Cargo.toml Outdated
# Attestation
#dice-verifier = { git = "https://github.com/oxidecomputer/dice-util", branch = "jhendricks/update-sled-agent-types-versions", features = ["sled-agent"] }
dice-verifier = { git = "https://github.com/oxidecomputer/dice-util", features = ["sled-agent"] }
vm-attest = { git = "https://github.com/oxidecomputer/vm-attest", rev = "a7c2a341866e359a3126aaaa67823ec5097000cd", default-features = false }
Member:

most of the Cargo.lock weirdness comes from dice-verifier -> sled-agent-client -> omicron-common (some previous rev), and that's where the later API dependency stuff we saw in Omicron comes up when building the tuf. sled-agent-client re-exports items out of propolis-client, which means we end up in a situation where propolis-server depends on a different rev of propolis-client and everything's Weird.

i'm not totally sure what we want or need to do about this, particularly because we're definitely not using the propolis-client-related parts of sled-agent! we're just using one small part of the API for the RoT calls. but sled-agent and propolis are (i think?) updated in the same deployment unit so the cyclic dependency is fine.

@jordanhendricks jordanhendricks marked this pull request as ready for review April 2, 2026 00:08
@jordanhendricks (Contributor Author):

I want to add some comments in the attestation module but from a code-structure perspective @iximeow and I are happy with this. Ready for review!

@jordanhendricks jordanhendricks requested a review from hawkw April 2, 2026 00:41
@jordanhendricks jordanhendricks self-assigned this Apr 2, 2026
@hawkw (Member) left a comment:

Some of the Tokio stuff felt a bit awkward here --- I'd be happy to open a PR against this branch changing some of the things I mentioned, if that's easier for you?

Member:

not super important but this string could be better probably

Contributor Author:

done in 014950e

Some(backend.clone_volume())
} else {
// Disk must be read-only to be used for attestation.
slog::info!(self.log, "boot disk is not read-only");
Member:

maybe this should explicitly state that this means it will not be attested?

Contributor Author:

took a crack at this in 014950e

Comment on lines +42 to +118
#[derive(Debug)]
enum AttestationInitState {
Preparing {
vm_conf_send: oneshot::Sender<VmInstanceConf>,
},
/// A transient state while we're getting the initializer ready, having
/// taken `Preparing` and its `vm_conf_send`, but before we've got a
/// `JoinHandle` to track as running.
Initializing,
Running {
init_task: JoinHandle<()>,
},
}

/// This struct manages providing the requisite data for a corresponding
/// `AttestationSock` to become fully functional.
pub struct AttestationSockInit {
log: slog::Logger,
vm_conf_send: oneshot::Sender<VmInstanceConf>,
uuid: uuid::Uuid,
volume_ref: Option<crucible::Volume>,
}

impl AttestationSockInit {
/// Do any remaining work of collecting VM RoT measurements in support
/// of this VM's attestation server.
pub async fn run(self) {
let AttestationSockInit { log, vm_conf_send, uuid, volume_ref } = self;

let mut vm_conf = vm_attest::VmInstanceConf { uuid, boot_digest: None };

if let Some(volume) = volume_ref {
// TODO(jph): make propolis issue, link to #1078 and add a log line
// TODO: load-bearing sleep: we have a Crucible volume, but we can
// be here and chomping at the bit to get a digest calculation
// started well before the volume has been activated; in
// `propolis-server` we need to wait for at least a subsequent
// instance start. Similar to the scrub task for Crucible disks,
// delay some number of seconds in the hopes that activation is done
// promptly.
//
// This should be replaced by awaiting for some kind of actual
// "activated" signal.
tokio::time::sleep(std::time::Duration::from_secs(10)).await;

let boot_digest =
match crate::attestation::boot_digest::boot_disk_digest(
volume, &log,
)
.await
{
Ok(digest) => digest,
Err(e) => {
// a panic here is unfortunate, but helps us debug for
// now; if the digest calculation fails it may be some
// retryable issue that a guest OS would survive. but
// panicking here means we've stopped Propolis at the
// actual error, rather than noticing the
// `vm_conf_sender` having dropped elsewhere.
panic!("failed to compute boot disk digest: {e:?}");
}
};

vm_conf.boot_digest = Some(boot_digest);
} else {
slog::warn!(log, "not computing boot disk digest");
}

let send_res = vm_conf_send.send(vm_conf);
if let Err(_) = send_res {
slog::error!(
log,
"attestation server is not listening for its config?"
);
}
}
}
Member:

Soo, it feels a bit funny to me that this thing is a task we spawn that, when it completes, sends a message over a oneshot channel and then exits, and then we have a JoinHandle<()> for that task. It kinda feels like this could just be a JoinHandle<VmInstanceConf> and make a bunch of this at least a bit simpler?

I'd be happy to throw together a patch that does that refactoring if it's too annoying.
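The suggested refactor might look roughly like the sketch below, shown with std::thread instead of tokio so it stands alone; the struct is a pared-down stand-in, not the propolis types.

```rust
use std::thread;

// Stand-in for the real VmInstanceConf; field types simplified.
#[derive(Debug, PartialEq)]
struct VmInstanceConf {
    uuid: u128,
    boot_digest: Option<[u8; 32]>,
}

// Instead of spawning a task that sends its result over a oneshot
// channel and yields a JoinHandle<()>, spawn one whose JoinHandle
// yields the config directly.
fn spawn_init(uuid: u128) -> thread::JoinHandle<VmInstanceConf> {
    thread::spawn(move || {
        // ...compute the boot disk digest here...
        VmInstanceConf { uuid, boot_digest: None }
    })
}

fn main() {
    let conf = spawn_init(42).join().expect("init task panicked");
    assert_eq!(conf.uuid, 42);
}
```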

Contributor Author:

That's fair. The JoinHandle was from a previous iteration of how we would structure things that looked more like the way we presently handle the VNC server. I'll take a look at how hard this is to remove.

Member:

Since this and also the change in this module that I suggested in #1091 (comment) are kinda just refactoring/tidying things up, I would be fine with leaving a lot of this as-is and then merge some refactoring later --- I'd be happy to open a follow-up PR after this has merged, if that makes life easier for you?

let mut buffer =
Buffer::new(this_block_count as usize, block_size as usize);

// TODO(jph): We don't want to panic in the case of a failed read. How
Contributor Author:

I still need to do this and test on dublin.

@hawkw (Member) left a comment:

The Crucible retry stuff seems pretty much correct, I commented on some minor nitpicks. I think it's fine to defer some of the async refactoring to a subsequent PR, as there isn't anything wrong there, I just think we could maybe make the code a bit simpler. Beyond that, I think that pending whatever testing you need to do, I have no major concerns.
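For context, a retry loop with exponential backoff of the kind under review might look like this; the function name, error type, and delays are invented for illustration and are not the Crucible or propolis code.

```rust
use std::thread;
use std::time::Duration;

// Illustrative retry-with-backoff around a fallible read; names invented.
fn read_with_retries<F>(mut read: F, n_retries: u32) -> Result<Vec<u8>, String>
where
    F: FnMut() -> Result<Vec<u8>, String>,
{
    let mut delay = Duration::from_millis(10);
    for attempt in 0..n_retries {
        match read() {
            Ok(buf) => return Ok(buf),
            Err(e) => {
                eprintln!("read failed (attempt {attempt}): {e}");
                thread::sleep(delay);
                delay *= 2; // exponential backoff
            }
        }
    }
    Err(format!("failed to read boot disk in {n_retries} tries"))
}

fn main() {
    // A read that succeeds on the third attempt.
    let mut calls = 0;
    let res = read_with_retries(
        || {
            calls += 1;
            if calls < 3 { Err("transient".into()) } else { Ok(vec![0u8; 4]) }
        },
        5,
    );
    assert_eq!(res.unwrap().len(), 4);
}
```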

Comment on lines +89 to +95
slog::error!(
log,
"read failed: {e:?}.
offset={offset},
this_block_cout={this_block_count},
block_size={block_size},
end_block={end_block}"
Member:

super weird formatting here, can we do something about that? also perhaps these ought to be structured fields...

Member:

and also, perhaps this ought to include the retry count?

Contributor Author:

both done in c096720

jordanhendricks and others added 2 commits April 5, 2026 15:06
Co-authored-by: Eliza Weisman <eliza@elizas.website>
@hawkw (Member) left a comment:

New comments and backoff look good to me, I commented on a couple very tiny nits

Comment on lines +80 to +82
"failed to read boot disk in {n_retries} tries \
aborting hash of boot digest"
);
Member:

more weird and unpleasant line wrapping (sorry):

Suggested change
"failed to read boot disk in {n_retries} tries \
aborting hash of boot digest"
);
"failed to read boot disk in {n_retries} tries \
aborting hash of boot digest"
);

Comment on lines +115 to +116
"hash of volume {:?} took {:?} ms",
vol_uuid,
Member:

turbo nit, sorry: mayhaps vol_uuid ought to be a structured field here as well? maybe we could build a child log context that includes it so that it's also added to the errors we log above?


if let Err(e) = res {
error!(log,
"read failed: {e:?}";
Member:

i'm not totally sure what the type of e is here, but are we sure that fmt::Debug is the best way to log it? not a huge deal...

//! If there is no boot disk, or the boot disk is not read-only, only the
//! instance ID is used as identifying data.
//!
//! If there is a read-only disk, the attestation server will fail challenge
Member:

nit: maybe this should say

Suggested change
//! If there is a read-only disk, the attestation server will fail challenge
//! If there is a read-only boot disk, the attestation server will fail challenge

since currently, this sentence in isolation suggests we would hash any read-only disk...but, this is not a huge deal as i would kind of expect the reader could infer the correct thing from surrounding context...

//! there.)
//!
//! * Guest software submits a 32-byte nonce to a known attestation port.
//! * This port is backed by a vsock device in propolis.
Member:

turbo nit: maybe:

Suggested change
//! * This port is backed by a vsock device in propolis.
//! - This port is backed by a vsock device in propolis.

//! sent to the attestation server once all of the VM identity data is done
//! (so, in practice, when the boot disk is hashed).
//! * Until the VM conf is ready, the attestation server fails challenges.
//! Once the VM conf is ready, these challenges are passed through to the
Member:

nit: this feels like maybe it should be its own bullet point?

Suggested change
//! Once the VM conf is ready, these challenges are passed through to the
//! * Once the VM conf is ready, these challenges are passed through to the

@jordanhendricks jordanhendricks merged commit fe47987 into master Apr 6, 2026
12 checks passed
@jordanhendricks jordanhendricks deleted the jhendricks/rfd-605 branch April 6, 2026 21:26


Development

Successfully merging this pull request may close these issues.

mvp vm attestation support in propolis-server (rfd 605)

4 participants