Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
- Reorder invariant nodes in simple scenarios in stackifier
- Jitdump when moving nodes in stackifier
- When regallocwasm creates a new store node, lower it
- Apply regallocwasm fix from andy
- Checkpoint
- Checkpoint
- Add comment
- Speculatively implement the dstOnStack optimization (code that hits it doesn't compile yet)
Force-pushed from 867f769 to 4c8682d
Pull request overview
This PR advances the WASM RyuJIT backend’s block-store support by adding cpobj codegen, addressing a stackifier NIY case, and ensuring newly generated stores are lowered post-rewrite to keep the pipeline consistent.
Changes:
- Implement `CodeGen::genCodeForCpObj` for `GT_STORE_BLK` cpobj unrolling on WASM.
- Extend WASM regalloc to track multi-use operands for `GT_STORE_BLK` and to re-lower stores created by `RewriteLocalStackStore`.
- Improve lowering/stackifier behavior: mark cpobj operands as multiply-used and relax one stackifier NYI by moving invariant nodes.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/jit/regallocwasm.h | Adds CollectReferencesForBlockStore declaration for block store multi-use tracking. |
| src/coreclr/jit/regallocwasm.cpp | Implements block-store reference collection and re-lowers stores created during local-store rewrite. |
| src/coreclr/jit/lowerwasm.cpp | Marks cpobj operands as MultiplyUsed and adjusts stackifier handling for invariant nodes. |
| src/coreclr/jit/gentree.cpp | Improves WASM dump text for native block-store opcodes (memory.copy vs memory.fill). |
| src/coreclr/jit/compiler.h | Adds WasmRegAlloc friend access for invoking lowering post-rewrite. |
| src/coreclr/jit/codegenwasm.cpp | Implements cpobj unrolled copying sequence (load/store vs helper call) and adds pointer-sized instruction aliases. |
Comments suppressed due to low confidence (5)
src/coreclr/jit/codegenwasm.cpp:2368
- The call to `genEmitHelperCall(CORINFO_HELP_ASSIGN_BYREF, ...)` is missing the required SP argument. `CodeGen::genEmitHelperCall` explicitly notes that for WASM helper calls, the stack-pointer argument must be first on the value stack (below any other args). Not pushing SP here would mismatch the helper signature and can explain the reported crash when calling the write-barrier helper.
Suggestion: push `GetStackPointerReg()` (via `local.get`) as the first argument before pushing the destination/source byrefs, for each helper call site (or refactor to a small helper that emits the correct argument sequence).
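The argument-order point can be illustrated with a minimal model (this is not the real JIT emitter API; `WasmStack`, `pushLocal`, and `emitAssignByrefCall` are hypothetical names): a wasm `call` pops its arguments with the first parameter deepest on the value stack, so for a helper with signature `(sp, dst, src)` the push order must be `sp`, then `dst`, then `src`.

```cpp
// Hypothetical sketch of wasm call argument ordering; not JIT code.
#include <cassert>
#include <string>
#include <vector>

struct WasmStack
{
    std::vector<std::string> values; // bottom ... top

    void pushLocal(const std::string& name) { values.push_back(name); }

    // A wasm call pops its arguments with the FIRST parameter deepest on
    // the stack; args[0] of the result is the first parameter received.
    std::vector<std::string> call(size_t argCount)
    {
        std::vector<std::string> args(values.end() - argCount, values.end());
        values.resize(values.size() - argCount);
        return args;
    }
};

std::vector<std::string> emitAssignByrefCall()
{
    WasmStack stack;
    stack.pushLocal("sp");  // must be pushed first, i.e. sit below the byrefs
    stack.pushLocal("dst");
    stack.pushLocal("src");
    return stack.call(3);
}
```

With this ordering the helper receives `(sp, dst, src)`; omitting the `sp` push would shift every argument down by one slot.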
src/coreclr/jit/codegenwasm.cpp:2324
`genCodeForCpObj` calls `genConsumeRegs(cpObjNode)`, which only updates liveness for the `GT_STORE_BLK` node itself and does not consume/update liveness for its operands. This differs from patterns like `genCodeForStoreInd`, and can lead to incorrect liveness (and thus wrong code) for `Addr()`/`Data()`.
Suggestion: consume the destination address and source address/value explicitly (e.g., using `genConsumeAddress(dstAddr)` and `genConsumeRegs(...)` in the correct execution order, or `genConsumeOperands` if appropriate) before emitting the copy sequence, and then do the usual life update for the store node.
src/coreclr/jit/codegenwasm.cpp:2373
`gcPtrCount` is initialized to the total GC pointer slot count, but it is only decremented in the write-barrier path. When `dstOnStack` is true, GC pointer slots are copied via the non-WB load/store path and `gcPtrCount` will never reach 0, causing the final `assert(gcPtrCount == 0)` to fire in debug builds.
Suggestion: either decrement `gcPtrCount` whenever `layout->IsGCPtr(i)` (regardless of `dstOnStack`), or move the `gcPtrCount` accounting + assert under the `!dstOnStack` branch (similar to other target implementations).
```cpp
instruction ins;
switch (PackOperAndType(op, treeNode->gtOp1->TypeGet()))
{
    case PackOperAndType(GT_EQ, TYP_FLOAT):
        ins = INS_f32_eq;
        break;
    case PackOperAndType(GT_EQ, TYP_DOUBLE):
        ins = INS_f64_eq;
        break;
    case PackOperAndType(GT_NE, TYP_FLOAT):
        ins = INS_f32_ne;
        break;
    case PackOperAndType(GT_NE, TYP_DOUBLE):
        ins = INS_f64_ne;
        break;
    case PackOperAndType(GT_LT, TYP_FLOAT):
        ins = INS_f32_lt;
        break;
    case PackOperAndType(GT_LT, TYP_DOUBLE):
        ins = INS_f64_lt;
        break;
    case PackOperAndType(GT_LE, TYP_FLOAT):
        ins = INS_f32_le;
        break;
    case PackOperAndType(GT_LE, TYP_DOUBLE):
        ins = INS_f64_le;
        break;
    case PackOperAndType(GT_GE, TYP_FLOAT):
        ins = INS_f32_ge;
        break;
    case PackOperAndType(GT_GE, TYP_DOUBLE):
        ins = INS_f64_ge;
        break;
```
src/coreclr/jit/codegenwasm.cpp:2364
- The offset computation for the write-barrier helper path hard-codes `INS_i32_const`/`INS_i32_add`, but the address locals may be `i64` when `TARGET_64BIT` (and you already abstracted other pointer-sized ops via `INS_I_*`). This will produce invalid wasm or incorrect values on 64-bit.
Suggestion: use the pointer-sized const/add instructions (`INS_I_const`/`INS_I_add`) and the appropriate `emitAttr` for the constant (or select i32 vs i64 based on `TARGET_64BIT`) so the address arithmetic matches the address type.
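The selection logic amounts to picking the opcode family by pointer size. A hedged sketch (`TARGET_64BIT` is modeled as a bool parameter and `pointerSizedConstAdd` is our name; the strings are the wasm text-format mnemonics the `INS_I_*` aliases would resolve to):

```cpp
// Illustrative model of pointer-size-dependent opcode selection.
#include <cassert>
#include <string>
#include <utility>

// Returns the (const, add) mnemonics matching the address type.
std::pair<std::string, std::string> pointerSizedConstAdd(bool target64Bit)
{
    if (target64Bit)
    {
        return {"i64.const", "i64.add"}; // address locals are i64
    }
    return {"i32.const", "i32.add"}; // address locals are i32
}
```

Wasm traps type mismatches at validation time, so pushing an `i32.const` offset and adding it to an `i64` address local would be rejected as an invalid module rather than miscomputing silently.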
src/coreclr/jit/codegenwasm.cpp:2310
`noway_assert(source->IsLocal())`/`noway_assert(dstAddr->IsLocal())` will hard-fail compilation in non-DEBUG builds if either operand is not a local. Nothing in `LowerBlockStore` guarantees `Addr()` is a local, and other targets handle arbitrary address expressions here.
Suggestion: remove these `noway_assert`s and rely on `GetMultiUseOperandReg` (with MultiplyUsed marking where needed) to support non-local address expressions; if some shapes are truly unsupported for now, prefer an explicit NYI/`IMPL_LIMITATION` gate instead of a `noway_assert` in release builds.
```cpp
{
    // Try to use bulk copy helper
    if (TryLowerBlockStoreAsGcBulkCopyCall(blkNode))
    {
        return;
    }
```
`LowerBlockStore` marks the block-store source `src` as Contained unconditionally, but in the doCpObj path it also marks `src` itself as MultiplyUsed when `src` is not `GT_IND`. A contained node won't be emitted as a standalone value, so marking it MultiplyUsed doesn't ensure a `local.tee` happens, and it can leave codegen (e.g., `GetMultiUseOperandReg`) reading from an uninitialized temp local.
Suggestion: ensure the MultiplyUsed flag is applied to the actual address/value node that will be emitted (e.g., the `GT_IND` address child), or uncontain/reshape non-`GT_IND` sources (such as local struct sources) into an address form that codegen can evaluate once and tee into a temp.
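The `local.tee` behavior being relied on can be modeled in a few lines (this `MiniWasm` struct is a toy of ours, not anything in the JIT): `tee` writes the top of the stack into a local while leaving the value on the stack, so later uses read the temp instead of re-evaluating the address expression. If the flagged node is contained and never emitted, no `tee` runs and the temp stays uninitialized.

```cpp
// Toy model of wasm local.tee semantics; not JIT code.
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct MiniWasm
{
    std::vector<int> stack;
    std::map<std::string, int> locals;

    void push(int v) { stack.push_back(v); }

    // local.tee: copy the top of stack into the local, keep it on the stack.
    void tee(const std::string& local) { locals[local] = stack.back(); }

    int getLocal(const std::string& local) { return locals[local]; }
};
```

Usage: push the address value once, `tee` it into a temp, consume the stack copy for the first use, and serve every later use via `local.get` on the temp.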
```cpp
emit->emitIns_I(INS_local_get, attrDstAddr, WasmRegToIndex(dstReg));
emit->emitIns_I(INS_I_const, attrDstAddr, offset);
emit->emitIns(INS_I_add);
emit->emitIns_I(INS_local_get, attrSrcAddr, WasmRegToIndex(srcReg));
emit->emitIns_I(INS_I_const, attrSrcAddr, offset);
emit->emitIns(INS_I_add);
// Call the byref assign helper. On other targets this updates the dst/src regs but here it won't,
// so we have to do the local.get+i32.const+i32.add dance every time.
genEmitHelperCall(CORINFO_HELP_ASSIGN_BYREF, 0, EA_PTRSIZE);
gcPtrCount--;
```
In `genCodeForCpObj`, the `CORINFO_HELP_ASSIGN_BYREF` helper call is emitted without pushing the implicit SP argument required by the WASM managed calling convention (see the contract documented in `genEmitHelperCall`). This will shift the helper arguments and can easily explain the current runtime crash. Push the stack pointer local first (below dst/src) before calling `genEmitHelperCall` so the helper receives (sp, dst, src) in the expected order.
```cpp
/*
else
{
    // To resolve this scenario we have two options:
    // 1. We try moving the whole tree rooted at `node`.
    //    To avoid quadratic behavior, we first stackify it and collect all the side effects
    //    from it. Then we check for interference of those side effects with nodes between
    //    'node' and 'prev'.
    // 2. Failing that, we insert a temporary ('ReplaceWithLclVar') for 'node'.
    //    To avoid explosion of temporaries, we maintain a busy/free set of them.
    // For now, for simplicity we are implementing #2 only.

    LIR::Use nodeUse;
    // FIXME-WASM: TryGetUse is inefficient here, replace it with something more optimal
    if (!m_lower->BlockRange().TryGetUse(node, &nodeUse))
    {
        JITDUMP("node==[%06u] prev==[%06u]\n", Compiler::dspTreeID(node), Compiler::dspTreeID(prev));
        NYI_WASM("Could not get a LIR::Use for the node to be moved by the stackifier");
    }

    unsigned lclNum = nodeUse.ReplaceWithLclVar(m_lower->m_compiler);
    GenTree* newNode = nodeUse.Def();
    JITDUMP("Stackifier replaced node [%06u] with lcl var %u\n", Compiler::dspTreeID(node), lclNum);
    m_lower->BlockRange().Remove(newNode);
    m_lower->BlockRange().InsertAfter(prev, newNode);
    JITDUMP("Stackifier moved new node [%06u] after [%06u]\n", Compiler::dspTreeID(newNode),
            Compiler::dspTreeID(prev));
    break;
}
*/
```
This large block of commented-out code (lines 548-577) should either be removed or converted to a proper TODO comment with a tracking issue. Leaving this much implementation detail in comments makes the code harder to maintain and review. If this is experimental code for future work, consider moving it to a design document or tracking issue instead.
```cpp
{
    // Copy the pointer-sized non-gc-pointer slots one at a time (and GC pointer slots if the destination is stack)
    // using regular I-sized load/store pairs.
    if (dstOnStack || !layout->IsGCPtr(i))
```
The condition dstOnStack at line 2491 is unreachable because of the assertion assert(!dstOnStack) at line 2454. The assertion ensures that genCodeForCpObj is only called when the destination is on the heap (dstOnStack is false). Either remove the dstOnStack || part of this condition since it will always be false, or reconsider whether the assertion at line 2454 is correct. Looking at the lowering code (lowerwasm.cpp:249-250), destinations on the stack should use BlkOpKindNativeOpcode (memory.copy) instead of BlkOpKindCpObjUnroll, so the assertion appears correct.
```cpp
// Copy the pointer-sized non-gc-pointer slots one at a time (and GC pointer slots if the destination is stack)
// using regular I-sized load/store pairs.
```
The comment mentions "and GC pointer slots if the destination is stack" but this scenario is unreachable due to the assertion at line 2454 which ensures dstOnStack is always false in this function. Update the comment to accurately reflect the actual behavior: only non-GC-pointer slots are copied using regular load/store pairs.
Suggested change:
```cpp
// Copy the pointer-sized non-GC-pointer slots one at a time using regular I-sized load/store pairs.
// GC pointer slots are handled via the byref assign helper in the else-branch below.
```
```cpp
        offset += TARGET_POINTER_SIZE;
    }

    assert(dstOnStack || (gcPtrCount == 0));
```
The dstOnStack || part of this assertion is always false due to the assertion at line 2454 (assert(!dstOnStack)). Simplify this to assert(gcPtrCount == 0) since we know dstOnStack is always false here.
Suggested change:
```cpp
assert(gcPtrCount == 0);
```
This is sufficient to compile the following to valid WASM (it crashes when attempting to call the write-barrier helper):
And the following just plain works, since copies to the stack don't need a write barrier:
The following doesn't work yet due to containable memops (will be addressed separately):