Skip to content

fix(opt): decouple slot-stack from inst_id in wasm_to_ir (#121)#122

Open
avrabe wants to merge 3 commits into
mainfrom
fix/issue-121-wasm-to-ir-slot-stack
Open

fix(opt): decouple slot-stack from inst_id in wasm_to_ir (#121)#122
avrabe wants to merge 3 commits into
mainfrom
fix/issue-121-wasm-to-ir-slot-stack

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented May 18, 2026

Summary

Closes #121 — the root architectural fix that PR #117's five rounds of continue patches were skating around. The wasm_to_ir lowering now tracks producer vregs via an explicit `slot_stack: Vec` parallel to `inst_id`, instead of overloading `inst_id` as both the IR id and the wasm-stack-slot index.

This is silicon-priority: the bug Gale reported in the field (wasm modules with Drop/LocalSet/Store mid-stream) is fixed here. PR #117's temporary demotion of `wasm_ops_lower_or_error` to `gating: false` is unaffected (the demote commit was on #117's branch, never reached main), but #117 may now rebase and revert its workflow change once this lands.

What was broken

`OptimizerBridge::wasm_to_ir` used a single `inst_id` counter as both the unique IR-instruction id AND a vreg-slot index. Binary/unary handlers used `inst_id.saturating_sub(N)` to look up operands, assuming a 1:1 correspondence with wasm value stack positions. That assumption broke whenever a wasm op consumed a slot without producing a vreg (Drop, LocalSet, GlobalSet, Stores, control-flow ops...). The result was either:

  • A loud panic at `get_arm_reg` (the defensive guard at line 1670 — what the fuzz harness kept hitting).
  • A silent miscompilation reading whatever stale vreg was bound to the consumed slot (the Gale class — much worse on hardware).

The non-optimized path (`select_with_stack`) was unaffected because it uses a real value stack.

What this PR does

Drive-by fix

`i64_operand_count` was missing the i64 div/rem variants (I64DivS/U, I64RemS/U). The old `inst_id.saturating_sub(4)` math happened to fortuitously work for the existing `i64_div.wast` test due to saturating-sub at slot boundaries; the slot_stack refactor unmasked this as a real `pop()` on an empty stack. Added the four ops to `i64_operand_count` to resolve cleanly.

Tests

`crates/synth-synthesis/tests/regression_issue_121_slot_stack.rs` — 12 new tests, all passing:

Panic-free shapes (the fuzz-found inputs):

  • `drop_between_producer_and_consumer` — the PR fix(lowering): return Err on stack underflow instead of panic — fuzz #113 #117 round-6 input.
  • `local_set_between_producer_and_consumer`, `global_set_between_producer_and_consumer`, `i32_store_between_producer_and_consumer`.
  • `block_loop_end_between_producer_and_consumer`, `br_if_between_producer_and_consumer`, `local_tee_then_consumer`.
  • `double_drop_then_const`, `mixed_i32_i64_with_drop`, `i64_drop_between_i64_consts`.

Semantic correctness (proving the silent-miscompilation path is fixed, not just the panic path):

  • `drop_preserves_correct_value_for_consumer` — `[const(7), const(11), drop, popcnt]` must operate on 7, not 11. Asserts the Popcnt instruction's src points at const(7)'s slot.
  • `local_set_preserves_correct_value_for_consumer` — same shape with LocalSet instead of Drop.

Full workspace test (excluding synth-verify — z3 network issue): 1041 passing, 0 regressions. The 4 existing AAPCS/i64 regression tests from PR #100/#101/#103/#104 plus the #93 memset/i64-shift tests all continue to pass.

Test plan

  • CI green across Test / Clippy / Format / Z3 / Kani / Bazel.
  • Gating fuzz harness `wasm_ops_lower_or_error` passes on the bd4ae7f/120c187/round-6 corpus seeds (it'll run automatically).
  • No regression in existing AAPCS / i64 / i32 selector tests.
  • Once this lands, PR fix(lowering): return Err on stack underflow instead of panic — fuzz #113 #117 can be rebased; its temporary demotion of `wasm_ops_lower_or_error` to non-gating is no longer needed.

Refs

🤖 Generated with Claude Code

`OptimizerBridge::wasm_to_ir` overloaded `inst_id` as both the unique IR
instruction id AND a vreg-slot index, with back-references like
`inst_id.saturating_sub(2)` assuming a one-to-one correspondence with
the wasm value stack. That assumption broke whenever any wasm op
consumed a stack slot without producing one — Drop, LocalSet, GlobalSet,
the i32/i64 store family, BrIf, and the structural Block/Loop/End
markers. The next binary or unary op's back-reference would then index
a stale or never-mapped vreg, and `get_arm_reg` would either trip the
PR #101 defensive panic or (pre-PR-101) silently fall back to R0 — the
silent-miscompilation class first surfaced in issue #93.

Gale (the real-hardware test rig) caught WASM modules in the field
that tripped this on production silicon; the cargo-fuzz
`wasm_ops_lower_or_error` harness on PR #117 surfaced the same class
six different ways (Nop/Unreachable/Return were closed there; Drop,
LocalSet, Store, Block/Loop/End remained until this PR).

Fix: introduce `slot_stack: Vec<u32>` in `wasm_to_ir` that mirrors the
wasm value stack. Each producer pushes its dest vreg onto slot_stack;
each consumer pops to discover its source vreg. `inst_id` reverts to
its original meaning — a monotonically increasing unique IR id — and
is no longer used for slot lookup.

i64 values occupy two consecutive entries on slot_stack (lo first,
then hi), matching the (dest_lo, dest_hi) two-vreg-pair layout already
used by i64 opcodes. I64ExtendI32U/S aliases dest_lo to the consumed
i32 src vreg by IR convention (preserved); I32WrapI64 aliases dest to
src_lo (preserved). Drop becomes an explicit `slot_stack.pop(); continue`
no-IR-emit arm; Nop/Unreachable/Return emit Opcode::Nop with no
slot_stack effect.

Drive-by: `i64_operand_count` was missing I64DivS/I64DivU/I64RemS/
I64RemU (so `analyze_i64_local_gets` failed to mark their i64
operands), which was masked by the same inst_id-slot scrambling.
Added them; the existing i64-div WAST tests now exercise the i64
LocalGet path instead of fortuitously-correct i32 Loads.

The catch-all `_ => Opcode::Nop` is preserved as a bug-finder: unknown
ops do not touch slot_stack, so subsequent consumers fail loudly via
`slot_stack.pop().expect(...)` instead of silently mis-binding vregs.

Regression coverage: new
`crates/synth-synthesis/tests/regression_issue_121_slot_stack.rs`
exercises Drop/LocalSet/GlobalSet/Store/BrIf/Block-Loop-End/LocalTee
between producer-and-consumer plus i32 and i64 variants. Two
semantic-correctness probes confirm that Popcnt reads the surviving
stack value (not the dropped one) — proving the fix addresses silent
miscompilation, not just the panic.

Test delta: +12 tests, 0 regressions. The 4 fuzz-related regression
tests from #100/#101/#103/#104 plus the #93 memset/i64-shift tests
all continue to pass.

Refs: issue #121, PR #117 (fuzz-harness reproductions), issue #93
(silent-drop class), PR #101 (defensive panic), PR #100 (fuzz harness).
@codecov
Copy link
Copy Markdown

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 69.70588% with 206 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/synth-synthesis/src/optimizer_bridge.rs 62.91% 201 Missing ⚠️
crates/synth-core/src/wasm_stack_check.rs 96.35% 5 Missing ⚠️

📢 Thoughts on this report? Let us know!

The slot_stack refactor in f0becb6 closes the silent-miscompilation
class that Gale hit on silicon. But the .expect() panics inside the new
slot_stack.pop() sites turn malformed-input cases (which the fuzz
harness can construct) into a different panic class — same defensive
"crash the compiler not the firmware" intent, but the harness contract
is "lower or Err, never panic".

This commit ports #117's pre-flight wasm_stack_check into the #122
branch so malformed inputs return Err *before* reaching the slot_stack
pops. The two layers together give the full guarantee:

  Pre-flight (synth_core::wasm_stack_check::check_no_underflow)
    ↓ rejects malformed wasm before lowering — typed Err
  wasm_to_ir slot_stack model
    ↓ correct semantics for valid wasm — fixes Gale's silicon class
  ir_to_arm
    ↓ unmodified

Includes:
- `crates/synth-core/src/wasm_stack_check.rs` (365 lines, 16 unit
  tests) — copy from PR #117's branch.
- Module declaration in `crates/synth-core/src/lib.rs`.
- Pre-flight call at the top of `OptimizerBridge::optimize_full`
  (`optimizer_bridge.rs:1827`).
- Pre-flight call at the top of `InstructionSelector::select_with_stack`
  (`instruction_selector.rs:3559`).
- `.github/workflows/fuzz-smoke.yml` — ported from #117 with the
  seed_corpus replay step. `wasm_ops_lower_or_error` re-promoted to
  `gating: true` now that the slot_stack model + pre-flight together
  close the panic class.
- `fuzz/seed_corpus/wasm_ops_lower_or_error/seed-pr122-i32sub-empty-stack`
- `fuzz/seed_corpus/wasm_to_ir_roundtrip_op_coverage/seed-pr122-local-tee-empty-stack`
  — the two crash artifacts the harness found on the first #122 CI
  run; now seeded for deterministic replay.

When PR #117 lands first on main, the rebase will deduplicate
wasm_stack_check.rs cleanly (identical files).

Local verification:
- `cargo test -p synth-core wasm_stack` — 16/16 pass.
- `cargo test -p synth-synthesis --test regression_issue_121_slot_stack`
  — 12/12 pass.
- `cargo test --workspace --exclude synth-verify` — 0 regressions.

Refs: issues #121 (Gale silicon), #117 (pre-flight origin).
@avrabe
Copy link
Copy Markdown
Contributor Author

avrabe commented May 19, 2026

Pushed 3e3e3b1 — wires #117's pre-flight wasm_stack_check into the slot_stack path. The two layers together close the panic class:

  • Pre-flight rejects malformed inputs (e.g. [I32Sub], [LocalTee(0)]) with typed Err before lowering.
  • slot_stack model handles valid wasm correctly, fixing Gale's silicon miscompilation class.

Also: wasm_ops_lower_or_error re-promoted to gating: true (the temporary demote in #117 was on its own branch — main was never affected; #122 lands with gating back on).

Seeded the two crash artifacts from the first #122 CI run ([I32Sub, I32Const(0)], [LocalTee(0)]) into fuzz/seed_corpus/ for deterministic replay.

When #117 merges first, the rebase will deduplicate wasm_stack_check.rs cleanly (identical content). Both PRs are now self-contained.

Local: 16/16 stack-check + 12/12 regression + 0 workspace regressions.

The slot_stack refactor in f0becb6 introduced 50 `.expect(...)` sites
on `slot_stack.pop()` with the comment "wasm validator + pre-flight
check guarantee stack depth". The pre-flight check (ported in 3e3e3b1)
tracks wasm-value depth, but the slot_stack model uses *slot* depth
where i64 = 2 slots. They diverge for type-mismatched malformed
wasm: `[I32Const, I64Clz]` looks fine at the wasm-value layer (depth
1, I64Clz pops 1 pushes 1) but at the slot layer I64Clz wants 2 slots
and only has 1 → pop on empty → `.expect` panic.

This shape is rejected by the wasm validator in production, but the
fuzz harness constructs it directly bypassing wasmparser. To honor the
harness contract ("lower or Err, never panic"), wasm_to_ir now returns
`Result<(Vec<Instruction>, Cfg)>` and each `slot_stack.pop()` becomes
`.ok_or_else(|| Error::validation(...))?` instead of `.expect(...)`.

48 mechanical site replacements via sed:
  `.expect("wasm validator + pre-flight check guarantee stack depth")`
  →
  `.ok_or_else(|| synth_core::Error::validation(
      "wasm stack underflow in wasm_to_ir (slot_stack pop on empty)"
  ))?`

The function signature changed:
  `fn wasm_to_ir(...) -> (Vec<Instruction>, Cfg)`
  →
  `fn wasm_to_ir(...) -> Result<(Vec<Instruction>, Cfg)>`

Both callers in `optimize_full` and `optimize` propagate via `?`.

Tests:
  * Three new cases in `regression_issue_121_slot_stack.rs`:
    `type_mismatch_i32_then_i64_clz_does_not_panic`,
    `i64_unary_on_empty_stack_does_not_panic`,
    `i64_binary_on_partial_stack_does_not_panic`.
    15 total, all pass.
  * Workspace tests (excluding synth-verify): 0 regressions.

Seed corpus:
  `fuzz/seed_corpus/wasm_ops_lower_or_error/seed-pr122-i64clz-on-i32`
  (16 bytes) — the third #122 CI crash artifact.
@avrabe
Copy link
Copy Markdown
Contributor Author

avrabe commented May 19, 2026

Pushed e03dcef — third fix layer. The slot_stack model + pre-flight catch most cases, but type-mismatched malformed wasm (e.g. [I32Const, I64Clz] — I64Clz expects an i64 = 2 slots but only an i32 = 1 slot is on the stack) slips past both: pre-flight tracks wasm-value depth, slot_stack tracks slot count, they diverge for i64 ops.

Fix: wasm_to_ir now returns Result<(Vec<Instruction>, Cfg)>. All 48 slot_stack.pop().expect(...) sites converted to .ok_or_else(|| Error::validation(...))? so underflow on malformed input returns Err instead of panicking. Both callers propagate via ?.

This is the proper structural answer to the "lower or Err, never panic" contract — no more whack-a-mole on individual handler arms. Production callers go through wasmparser which catches the type mismatch upstream; the new Result path is the safety net for direct callers like the fuzz harness.

Tests: 3 new cases (type_mismatch_i32_then_i64_clz, i64_unary_on_empty_stack, i64_binary_on_partial_stack). 15 regression total, all pass. Workspace 0 regressions. Third crash seed committed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

wasm_to_ir: inst_id overloaded as both IR-id and vreg-slot — decouple via explicit slot_stack

1 participant