Skip to content

JIT: Inlining via splitting#124870

Draft
jakobbotsch wants to merge 16 commits intodotnet:mainfrom
jakobbotsch:inlining-via-splitting
Draft

JIT: Inlining via splitting#124870
jakobbotsch wants to merge 16 commits intodotnet:mainfrom
jakobbotsch:inlining-via-splitting

Conversation

@jakobbotsch
Copy link
Member

Reviving my prototype as I have some runtime async work that this would simplify...

Copilot AI review requested due to automatic review settings February 25, 2026 19:59
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 25, 2026
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR revives/implements an alternative inlining pipeline in the CoreCLR JIT based on splitting trees/blocks around inline candidates (instead of using GT_RET_EXPR placeholders), with follow-on updates across importer, inliner, and various optimizations that previously depended on GT_RET_EXPR.

Changes:

  • Remove GT_RET_EXPR from the IR and update related utilities (node lists, cloning, side-effect checks, class-handle queries, etc.).
  • Rework inlining to inline directly from GT_CALL sites using statement/tree splitting plus new setup/teardown statement list plumbing.
  • Update guarded devirtualization/fat calli transformations, struct return/retbuf handling, and instrumentation to work without GT_RET_EXPR.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/coreclr/jit/simd.cpp Stop treating GT_RET_EXPR as a call-like SIMD stack value; normalize only GT_CALL.
src/coreclr/jit/lclvars.cpp Add a null-layout guard to struct promotion eligibility checks.
src/coreclr/jit/inline.h Adjust inline policy/debug hooks, InlineResult ctor signature, introduce InlineIRResult, add StatementListBuilder, track retbuf arg info.
src/coreclr/jit/inline.cpp Wire InlineResult construction to accept an InlineContext* and update policy notes accordingly.
src/coreclr/jit/indirectcalltransformer.cpp Refactor call discovery and operand spilling; handle STORE_LCL_VAR(call) statement shapes.
src/coreclr/jit/importervectorization.cpp Remove GT_RET_EXPR-based span/call handling paths.
src/coreclr/jit/importercalls.cpp Rework importer handling for inline/GDV candidates without GT_RET_EXPR; adjust some intrinsic patterns and candidate bookkeeping.
src/coreclr/jit/importer.cpp Remove various GT_RET_EXPR-specific normalization/spill/return handling paths; adapt retbuf and inline return plumbing.
src/coreclr/jit/gtstructs.h Remove RetExpr node struct mapping.
src/coreclr/jit/gtlist.h Remove GTNODE(RET_EXPR, ...) from the node list.
src/coreclr/jit/gentree.h Remove GenTreeRetExpr definition.
src/coreclr/jit/gentree.cpp Remove GT_RET_EXPR handling across layout queries, cloning constraints, leaf display, side-effect logic; extend gtSplitTree signature/behavior.
src/coreclr/jit/fgstmt.cpp Add helpers to splice whole statement lists into blocks (before/at end).
src/coreclr/jit/fgprofile.cpp Instrumentation temporarily links inlinee “return” IR using InlineIRResult instead of GT_RET_EXPR.
src/coreclr/jit/fgopt.cpp Simplify async-call scanning by removing GT_RET_EXPR recursion.
src/coreclr/jit/fginline.cpp Major rewrite: new walker that inlines from GT_CALL via splitting + statement/block insertion; adds setup/teardown list building.
src/coreclr/jit/fgbasic.cpp Add fgSplitBlockBeforeStatement helper.
src/coreclr/jit/debuginfo.cpp Disable (comment out) debug-info validation checks.
src/coreclr/jit/compiler.hpp Remove GT_RET_EXPR from operand visitation cases.
src/coreclr/jit/compiler.h Update signatures (gtSplitTree, inline helpers), add friendship and statement-list splicing APIs.
Comments suppressed due to low confidence (2)

src/coreclr/jit/inline.cpp:636

  • The comment for InlineResult::InlineResult still describes a "stmt" parameter, but the constructor now takes an InlineContext* instead. Update the parameter documentation to reflect the new signature to avoid confusion for future call sites.
// Arguments:
//   compiler      - the compiler instance examining a call for inlining
//   call          - the call in question
//   stmt          - statement containing the call (if known)
//   description   - string describing the context of the decision

src/coreclr/jit/importer.cpp:405

  • In impAppendStmt, the inner if (call->ShouldHaveRetBufArg()) is redundant because it's nested under if (call->TypeIs(TYP_VOID) && call->ShouldHaveRetBufArg()). As written, the else branch is unreachable; consider simplifying to a single path (or, if the two branches were meant to distinguish different shapes, adjust the condition so both are reachable as intended).
            if (call->TypeIs(TYP_VOID) && call->ShouldHaveRetBufArg())
            {
                GenTree* retBuf;
                if (call->ShouldHaveRetBufArg())
                {
                    assert(call->gtArgs.HasRetBuffer());
                    retBuf = call->gtArgs.GetRetBufferArg()->GetNode();
                }
                else
                {
                    assert(!call->gtArgs.HasThisPointer());
                    retBuf = call->gtArgs.GetArgByIndex(0)->GetNode();
                }

Copilot AI review requested due to automatic review settings February 25, 2026 21:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (3)

src/coreclr/jit/inline.cpp:1314

  • Line 1314 contains a commented-out DebugInfo validation call that appears to be intentionally disabled. This should either be removed if no longer needed, or uncommented if the validation is important. The comment doesn't explain why it's disabled, which makes it unclear whether this is temporary debugging code or intentionally disabled for a reason.
    // DebugInfo(parentContext, context->m_Location).Validate();

src/coreclr/jit/indirectcalltransformer.cpp:997

  • On line 996, lvSingleDef is unconditionally set to false when doesReturnValue is true, but this is done after potentially demoting the call to non-inline in the if block above (lines 968-978). If the call was demoted and the statement was already added, we're still modifying lvSingleDef. While this may be correct (since multiple defs will exist after GDV transformation), the logic would be clearer if this was moved inside the else block starting at line 980 where we know the call remains an inline candidate.
            if (doesReturnValue)
            {
                compiler->lvaGetDesc(stmt->GetRootNode()->AsLclVarCommon())->lvSingleDef = false;
            }

src/coreclr/jit/importer.cpp:789

  • On line 785, the condition checks !srcCall->IsInlineCandidate() before marking the local as DNER. However, this means that if a call IS an inline candidate, the local won't be marked DNER even though it's being used as a hidden buffer struct arg. If the inline later fails, the local may still be in an enregisterable state which could cause issues. The old code didn't have this condition. Consider whether this check is correct or if it should be removed.
            if (destAddr->OperIs(GT_LCL_ADDR) && !srcCall->IsInlineCandidate())
            {
                lvaSetVarDoNotEnregister(destAddr->AsLclVarCommon()->GetLclNum()
                                             DEBUGARG(DoNotEnregisterReason::HiddenBufferStructArg));
            }

Comment on lines +594 to +599
// TODO: move to InlineInfo
struct InlineIRResult
{
GenTree* substExpr;
BasicBlock* substBB;
};
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment "TODO: move to InlineInfo" on line 594 suggests this is incomplete work. The InlineIRResult struct should likely be moved into InlineInfo rather than being a separate struct that's embedded in InlineCandidateInfo. This would make the structure more logical since the result is specific to each inline attempt.

Copilot uses AI. Check for mistakes.
{
uint64_t cns1 = static_cast<uint64_t>(op1->AsIntConCommon()->LngValue());
result = gtNewLconNode(BitOperations::RotateLeft(cns1, cns2));
result = gtNewLconNode((int64_t)BitOperations::RotateLeft(cns1, cns2));
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast to (int64_t) on line 6221 changes the semantics from the original code which returned uint64_t directly. BitOperations::RotateLeft returns uint64_t but it's being explicitly cast to int64_t before being passed to gtNewLconNode. While this may work due to bit patterns, it's semantically incorrect and could cause issues with sign extension or interpretation. The original code relied on implicit conversion.

Suggested change
result = gtNewLconNode((int64_t)BitOperations::RotateLeft(cns1, cns2));
result = gtNewLconNode(BitOperations::RotateLeft(cns1, cns2));

Copilot uses AI. Check for mistakes.
{
uint64_t cns1 = static_cast<uint64_t>(op1->AsIntConCommon()->LngValue());
result = gtNewLconNode(BitOperations::RotateRight(cns1, cns2));
result = gtNewLconNode((int32_t)BitOperations::RotateRight(cns1, cns2));
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast to (int32_t) on line 6270 is incorrect. BitOperations::RotateRight returns uint64_t for the long case, but it's being cast to int32_t (truncating the upper 32 bits) before being passed to gtNewLconNode which expects an int64_t. This will silently lose data. It should be (int64_t) to match the signature and preserve all bits.

Suggested change
result = gtNewLconNode((int32_t)BitOperations::RotateRight(cns1, cns2));
result = gtNewLconNode(static_cast<int64_t>(BitOperations::RotateRight(cns1, cns2)));

Copilot uses AI. Check for mistakes.
Comment on lines +350 to +355
GenTree* substExpr = call->gtInlineCandidateInfo->result.substExpr;
auto getComplexity = [](GenTree* tree) {
return 1;
};
if ((substExpr != nullptr) && substExpr->IsValue() &&
m_compiler->gtComplexityExceeds(substExpr, 16, getComplexity))
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lambda getComplexity on line 351-353 always returns 1 regardless of the tree parameter. This makes the complexity check on line 355 effectively just count nodes up to 16. This seems like incomplete or placeholder code that should either be removed (to use the default complexity metric) or properly implemented to assess actual tree complexity.

Suggested change
GenTree* substExpr = call->gtInlineCandidateInfo->result.substExpr;
auto getComplexity = [](GenTree* tree) {
return 1;
};
if ((substExpr != nullptr) && substExpr->IsValue() &&
m_compiler->gtComplexityExceeds(substExpr, 16, getComplexity))
GenTree* substExpr = call->gtInlineCandidateInfo->result.substExpr;
if ((substExpr != nullptr) && substExpr->IsValue() &&
m_compiler->gtComplexityExceeds(substExpr, 16))

Copilot uses AI. Check for mistakes.
Comment on lines +667 to +670
auto getComplexity = [](GenTree* tree) {
return 1;
};
if (m_compiler->gtComplexityExceeds(m_statement->GetRootNode(), 16, getComplexity))
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lambda getComplexity on lines 667-669 (same as lines 351-353) always returns 1 regardless of the tree parameter. This makes the complexity check effectively just count nodes up to 16. This appears to be placeholder code that should either be removed to use default complexity metrics or properly implemented.

Suggested change
auto getComplexity = [](GenTree* tree) {
return 1;
};
if (m_compiler->gtComplexityExceeds(m_statement->GetRootNode(), 16, getComplexity))
if (m_compiler->gtComplexityExceeds(m_statement->GetRootNode(), 16))

Copilot uses AI. Check for mistakes.
@jakobbotsch
Copy link
Member Author

jakobbotsch commented Feb 25, 2026

One source of regressions is that for stfld we create IR of the shape

[000011] nACXG---R--                           STORE_BLK struct<System.TimeSpan, 8> (copy)
[000010] ---X-------                         ├──▌  FIELD_ADDR byref  <unknown class>:<unknown field>
[000006] -----------                           └──▌  LCL_VAR   ref    V00 this         
[000009] I-C-G------                         └──▌  CALL      struct System.TimeSpan:FromSeconds(long):System.TimeSpan (exactContextHandle=0x00007FF94C1F7CE1)
[000008] ----------- arg0                       └──▌  CNS_INT   long   10

Note GTF_EXCEPT on [000010]. That's clearly wrong when [000009] is the data that came from the IL stack and should be evaluated before [000010].

We make up for it in morph since we usually fold the NRE side effect of the address computation into the store itself, and when we do that we end up evaluating the call first. But when the call is an inline candidate we now end up spilling [000010] when we split, whereas before it would be a RET_EXPR and the rest of the inlinee statements would end up before the entire store.

Some kind of fix is needed here (as a separate PR). We could set GTF_REVERSE_OPS on the store, but that is not something we typically do during import.

(this issue is similar to #77650)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants