Reproduce/issue 485#838
Conversation
Codex Review该评论由 review 机器人自动更新。
SummaryPR #838 存在一个确定的构建阻断问题;另外,issue #485 的 workaround 只覆盖了 Findings
workaround 只在
这个 PR 改动了所有 |
There was a problem hiding this comment.
Code Review
This pull request introduces the PTOUnrollSIMTFor pass to unroll small constant-trip-count scf.for loops inside pto.simt_entry functions, addressing a bug (Issue #485) where structured control flow can lead to predicated END instructions in the AICore backend. Feedback on the implementation highlights a critical issue in PTOUnrollSIMTForPass.cpp where using applyPatternsAndFoldGreedily alongside loopUnrollByFactor can corrupt the greedy driver's internal state due to direct IR mutations. It is recommended to replace the pattern-based rewrite with a post-order walk-based unroller directly in runOnOperation.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| namespace { | ||
|
|
||
| struct UnrollSIMTForPattern : public OpRewritePattern<scf::ForOp> { | ||
| UnrollSIMTForPattern(MLIRContext *ctx, int64_t maxTripCount) | ||
| : OpRewritePattern(ctx), maxTripCount(maxTripCount) {} | ||
|
|
||
| LogicalResult matchAndRewrite(scf::ForOp forOp, | ||
| PatternRewriter &rewriter) const override { | ||
| // Only apply inside SIMT entry functions. | ||
| auto func = forOp->getParentOfType<func::FuncOp>(); | ||
| if (!func || !isSIMTEntry(func)) | ||
| return failure(); | ||
|
|
||
| // Check explicit annotation first. | ||
| if (hasUnrollFullAttr(forOp)) { | ||
| return unrollFull(forOp, rewriter, /*isAnnotated=*/true); | ||
| } | ||
|
|
||
| // Auto-detect: constant bounds and small trip count. | ||
| auto tripCount = computeConstantTripCount(forOp); | ||
| if (!tripCount || *tripCount > maxTripCount) | ||
| return failure(); | ||
|
|
||
| return unrollFull(forOp, rewriter, /*isAnnotated=*/false); | ||
| } | ||
|
|
||
| private: | ||
| LogicalResult unrollFull(scf::ForOp forOp, PatternRewriter &rewriter, | ||
| bool isAnnotated) const { | ||
| auto tripCount = computeConstantTripCount(forOp); | ||
| if (!tripCount || *tripCount <= 0) | ||
| return failure(); | ||
|
|
||
| LLVM_DEBUG(llvm::dbgs() | ||
| << "PTOUnrollSIMTFor: unrolling scf.for " | ||
| << (isAnnotated ? "(annotated)" : "(auto)") << " tripCount=" | ||
| << *tripCount << " at " << forOp.getLoc() << "\n"); | ||
|
|
||
| // loopUnrollByFactor returns failure if the loop carries iteration | ||
| // arguments that have uses outside the loop (live-out values). In that | ||
| // case we cannot unroll. | ||
| if (failed(loopUnrollByFactor(forOp, static_cast<uint64_t>(*tripCount)))) | ||
| return failure(); | ||
|
|
||
| return success(); | ||
| } | ||
|
|
||
| int64_t maxTripCount; | ||
| }; | ||
|
|
||
| } // namespace | ||
|
|
||
| // --------------------------------------------------------------------------- | ||
| // Pass definition | ||
| // --------------------------------------------------------------------------- | ||
|
|
||
| namespace { | ||
|
|
||
| struct PTOUnrollSIMTFor : public pto::impl::PTOUnrollSIMTForBase<PTOUnrollSIMTFor> { | ||
| using pto::impl::PTOUnrollSIMTForBase< | ||
| PTOUnrollSIMTFor>::PTOUnrollSIMTForBase; | ||
|
|
||
| void runOnOperation() override { | ||
| func::FuncOp func = getOperation(); | ||
| if (!isSIMTEntry(func)) | ||
| return; | ||
|
|
||
| MLIRContext *ctx = &getContext(); | ||
| RewritePatternSet patterns(ctx); | ||
| patterns.add<UnrollSIMTForPattern>(ctx, maxTripCount); | ||
|
|
||
| GreedyRewriteConfig config; | ||
| config.maxIterations = 10; // loops may nest | ||
| config.strictMode = GreedyRewriteStrictness::ExistingOps; | ||
|
|
||
| if (failed(applyPatternsAndFoldGreedily(func, std::move(patterns), config))) | ||
| signalPassFailure(); | ||
| } | ||
| }; | ||
|
|
||
| } // namespace |
There was a problem hiding this comment.
Dangerous IR Mutation in Greedy Pattern Rewrite Driver
Using applyPatternsAndFoldGreedily with an OpRewritePattern that calls loopUnrollByFactor is highly dangerous and can lead to compiler crashes or undefined behavior in MLIR.
Why this is an issue:
- Greedy Driver Contracts: The greedy pattern rewrite driver relies on the
PatternRewriterto track all IR modifications (such as operation creation, erasure, and replacement) to keep its internal worklist and dead-node tracking structures consistent. - Direct Mutation:
loopUnrollByFactordirectly mutates the IR (erasing thescf.forloop and its blocks) without using or notifying aPatternRewriter. - State Corruption: When an
OpRewritePatternperforms direct IR mutations outside the rewriter, the greedy driver's internal state becomes corrupted, leading to use-after-free bugs or assertion failures (especially in debug builds).
Solution:
Instead of a pattern-based approach with applyPatternsAndFoldGreedily, implement a simple post-order walk-based unroller directly in runOnOperation. Since walk defaults to post-order (innermost first), we can safely collect and unroll nested loops without invalidating outer loop pointers or corrupting any driver state.
namespace {
struct PTOUnrollSIMTFor : public pto::impl::PTOUnrollSIMTForBase<PTOUnrollSIMTFor> {
using pto::impl::PTOUnrollSIMTForBase<
PTOUnrollSIMTFor>::PTOUnrollSIMTForBase;
void runOnOperation() override {
func::FuncOp func = getOperation();
if (!isSIMTEntry(func))
return;
// Collect all loops in post-order (innermost first) to safely unroll nested loops.
SmallVector<scf::ForOp> loops;
func.walk([&](scf::ForOp forOp) {
loops.push_back(forOp);
});
for (scf::ForOp forOp : loops) {
// Check explicit annotation first.
if (hasUnrollFullAttr(forOp)) {
(void)unrollFull(forOp, /*isAnnotated=*/true);
continue;
}
// Auto-detect: constant bounds and small trip count.
auto tripCount = computeConstantTripCount(forOp);
if (tripCount && *tripCount <= maxTripCount) {
(void)unrollFull(forOp, /*isAnnotated=*/false);
}
}
}
private:
LogicalResult unrollFull(scf::ForOp forOp, bool isAnnotated) const {
auto tripCount = computeConstantTripCount(forOp);
if (!tripCount || *tripCount <= 0)
return failure();
LLVM_DEBUG(llvm::dbgs()
<< "PTOUnrollSIMTFor: unrolling scf.for "
<< (isAnnotated ? "(annotated)" : "(auto)") << " tripCount="
<< *tripCount << " at " << forOp.getLoc() << "\\n");
if (failed(loopUnrollByFactor(forOp, static_cast<uint64_t>(*tripCount))))
return failure();
return success();
}
};
} // namespaceIssue: mouliangyu#485 When a pto.simt_entry function contains scf.for + scf.if with constant conditions (e.g. for i in 0..16: if i < 10: store), the AICore backend may emit a predicated END instruction, violating SIMTVF constraints. This pass unrolls small constant-trip-count scf.for loops inside pto.simt_entry functions, then relies on SCCP + canonicalize to constant-fold the induction-variable-dependent scf.if branches, producing straight-line code without divergent control flow. Two strategies: 1. Explicit: scf.for {pto.unroll = "full"} — always unrolls 2. Automatic: constant trip count <= max-trip-count (default 64) Pass runs in prepareVPTOForEmission, before SCF->CF lowering. Co-Authored-By: Claude <noreply@anthropic.com>
2e0bbe2 to
89a33e7
Compare
Zhendong404
left a comment
There was a problem hiding this comment.
如果已知这个循环次数是常量,用for static_range应该就可以在DSL层展平控制流。
而且我感觉这个PR不能真正解决问题,只是这个场景被规避了,如果真的有动态for循环+if怎么办
if (runtime condition):
// do something
else:
for i in static_range(n):
// do other things如果是runtime才能确定if分支执行路径,我理解else中的循环无法展开,就需要PTOAS在编译时展开。 |
|
|
||
| .simt | ||
| // .weak simt_vf_0_simt_entry | ||
| .p2align 3 |
04d8a0c to
89a33e7
Compare
新增
PTOUnrollSIMTForpass,规避 :SIMTVF kernel 中scf.for+scf.if常量分支导致 AICore 后端生成带谓词的END @!P0。根因
SCF→CF→LLVM lowering 后
ret自然在cond_br可达的 block 中,IR 本身正确。Bisheng AICore 后端未正确处理SimtEntrycalling convention,给END加上了谓词。方案
将
pto.simt_entry函数内常量scf.for完全展开,由下游 SCCP + canonicalize 自动消除scf.if分支,生成无分支直线代码。{pto.unroll = "full"}:无条件展开--max-trip-count调整)涉及文件
lib/PTO/Transforms/PTOUnrollSIMTForPass.cpp— pass 实现(新增)include/PTO/Transforms/Passes.td— 定义include/PTO/Transforms/Passes.h— 声明lib/PTO/Transforms/CMakeLists.txt— 注册tools/ptoas/ptoas.cpp— 接入prepareVPTOForEmission