cranelift: Fix safepoint stack slot reuse for loop-invariant values#13480
cranelift: Fix safepoint stack slot reuse for loop-invariant values#13480vouillon wants to merge 2 commits into
Conversation
The safepoint spiller allocates each needs-stack-map value's stack slot on demand during a reverse walk of the function's blocks: the slot is created the first time the walk sees the value as an instruction operand or in a safepoint's live set. This is required for loop-invariant values. Since they are defined outside the loop and not passed as block parameters, the reverse walk of the block's instructions might not otherwise be aware of their liveness at the block's exit. Without this, a block-local definition can free its slot onto the free list. That slot is then incorrectly reclaimed by the loop-invariant value later in the reverse walk, leading to shared storage and data corruption. Loop-carried values (passed as block parameters) do not have this issue because their next-iteration values are passed as arguments to the back-edge branch, making them visible to the walk at the block's exit.
The safepoint spiller walks instructions in reverse and allocates each needs-stack-map value's stack slot lazily, at its first mention as an instruction operand or safepoint live-set entry. This is required for loop-invariant values. Since they are defined outside the loop and not passed as block parameters, the reverse walk of the block's instructions might not otherwise be aware of their liveness at the block's exit. Without this, a block-local definition can free its slot onto the free list. That slot is then incorrectly reclaimed by the loop-invariant value later in the reverse walk, leading to shared storage and data corruption. Loop-carried values (passed as block parameters) do not have this issue because their next-iteration values are passed as arguments to the back-edge branch, making them visible to the walk at the block's exit. Before processing each block, reserve a slot for every value alive at both block entry and exit that is also live across at least one safepoint. The slot is then in place before any def in the block can free a slot for reuse.
| // This is required for loop-invariant values. Since they are | ||
| // defined outside the loop and not passed as block parameters, | ||
| // the reverse walk of the block's instructions might not otherwise | ||
| // be aware of their liveness at the block's exit. |
There was a problem hiding this comment.
This seems like it's papering over a deeper issue where our liveness computation is not sound -- I don't think that an ad-hoc conservative "reserve for all values" just for loops is the right answer. cc @fitzgen as you're more familiar with the design intent here?
There was a problem hiding this comment.
I have checked the change on a 273 KiB GC-heavy Wasm binary. This adds 13 slots (from 2154 slots), and 10 stack_addr instructions (from 3047), spread across 10 of 871 functions.
Note that we do not reserve for all values but only for the through-block values the instruction walk can't see. I don't see how to improve on this without significant changes. The current algorithm is a greedy linear scan, which is already not optimal, so I think a simple, obviously-sound reservation beats a more precise but heavier scheme.
There was a problem hiding this comment.
This seems like it's papering over a deeper issue where our liveness computation is not sound
Yeah, we should be seeing these values as live across the whole loop.
The current algorithm is a greedy linear scan
This is just the rewrite phase, which happens after we do a full fixpoint loop for the liveness analysis itself, which is what Chris is referring to in his comment. The intention is that the initial fixpoint loop inside the liveness analysis should recognize that these values are live across the whole loop.
I don't have anything else to say here yet except that the claimed behavior (I haven't verified anything yet) is surprising, perplexing, and a bit scary. Will dig in some more later today/tomorrow when I get time.
There was a problem hiding this comment.
I think the liveness analysis is sound, and the fix relies on the fact that loop-invariants are live across the whole loop.
The issue is in the rewrite phase: it reconstructs each value's live range from what it encounters during the backward walk (operand uses and safepoint live-sets), not from live_in/live_out. So it has no signal that a value is live across a back edge unless the value is mentioned there. Loop-carried values are fine, because their next-iteration value is passed as a branch argument on the back edge. But a loop-invariant value isn't passed as an argument, so if there is no safepoint between the back edge and the next slot-freeing def, and the backward walk has not yet seen any use of the value, nothing prevents the two values from sharing the same slot.
This fixes a bug in the safepoint spiller, reported in #13461 (comment), where two overlapping needs-stack-map values could end up sharing the same stack slot, leading to memory and GC tracking corruption.
The Problem
The safepoint spiller walks instructions in reverse (post-order) and allocates stack slots lazily when a value is first encountered as an operand or a safepoint live-set entry.
Since loop invariants are defined outside the loop and not passed as block parameters, they are not referenced by the loop's back-edge branch instructions. This means that during the reverse walk of the block, the spiller is initially unaware of the loop-invariant value.
If a block-local variable defined via a non-safepoint instruction is processed first in the reverse walk, its slot is freed onto the free list. When the spiller continues backward and eventually hits the loop-invariant's use at a safepoint, the lazy allocation pops that recently-freed slot from the list. This leaves the two overlapping values sharing the same storage. When the block-local value is written, it clobbers the loop-invariant value, leading to stale data on the next loop iteration.
The Fix
To address this, the spiller now performs a pre-reservation pass before processing the instructions of each block. It identifies all values that propagate through the block (present in both
live_inandlive_out) and are live across at least one safepoint, and immediately allocates stack slots for them. This ensures their slots are already occupied and protected before any instruction definitions inside the block are processed and freed for reuse.Verification
This fix has been verified by adding the regression test
cond_loop_with_body_local_pre_reserves_loop_invariantincranelift/frontend/src/frontend/safepoints.rsthat reproduces the slot-sharing issue and verifies the fix.