Skip to content

Commit f6f2724

Browse files
committed
nvme: deadlock enabling doorbell buffer while doing I/O
9a3a93a introduced a bug in the previously-okay SubQueue::pop; before, acc_mem.access() would go through MemAccessor, acquire a reference on the underlying MemCtx (an Arc refcount bump), drop the lock, and continue. immediately after, SubQueue::pop would lock the SubQueue's state and actually perform the pop. this ordering is backwards from set_db_buf, which locks the SubQueue's state *then* conditionally performs acc_mem.access() if the buffer is set up as part of an import. after 9a3a93a, SubQueue::pop uses access_borrowed() to avoid contention on the refcount for MemCtx, the cost of retaining the lock on the SubQueue's MemAccessor node while taking the lock on the same queue's SubQueueState. at this point a concurrent SubQueue::pop and SubQueue::set_db_buf could deadlock; one holds the queue state lock, the other holds the accessor node lock, and both will try to take the other. resolve this tension by picking a consistent ordering for the queue state and acc_mem locks: SubQueueState first, then acc_mem.{access,_borrow}(). this was already the ordering in CompQueue::push and both set_db_buf, so pop() gets the trivial change.
1 parent 29e9cf7 commit f6f2724

1 file changed

Lines changed: 5 additions & 1 deletion

File tree

lib/propolis/src/hw/nvme/queue.rs

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -649,12 +649,16 @@ impl SubQueue {
649649
pub fn pop(
650650
self: &Arc<SubQueue>,
651651
) -> Option<(GuestData<SubmissionQueueEntry>, Permit, u16)> {
652+
// Lock the SubQueueState early to order consistently with MemAccessor;
653+
// in all cases we acquire the queue state lock before working with
654+
// memory.
655+
let mut state = self.state.lock();
656+
652657
let Some(mem) = self.state.acc_mem.access_borrow() else { return None };
653658
let mem = mem.view();
654659

655660
// Attempt to reserve an entry on the Completion Queue
656661
let permit = self.cq.reserve_entry(&self, &mem)?;
657-
let mut state = self.state.lock();
658662

659663
// Check for last-minute updates to the tail via any configured doorbell
660664
// page, prior to attempting the pop itself.

0 commit comments

Comments
 (0)