clear residual tile op for llvm export #829
Code Review
This pull request introduces helper functions to clean up residual tile metadata operations before emitting LLVM IR in both VPTOCANN900LLVMEmitter.cpp and VPTOLLVMEmitter.cpp. The review feedback recommends adding defensive null-value checks in peelSingleResultCast to prevent potential crashes. Additionally, it suggests optimizing eraseDeadTileMetadataOps by replacing the multiple full-module walks with a worklist-based dead code elimination (DCE) algorithm to improve compilation performance.
static Value peelSingleResultCast(Value value) {
  while (auto castOp = value.getDefiningOp<UnrealizedConversionCastOp>()) {
    if (castOp->getNumOperands() != 1 || castOp->getNumResults() != 1)
      break;
    value = castOp.getOperand(0);
  }
  return value;
}
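If the value passed in is null (for example, an uninitialized Value), calling value.getDefiningOp() directly may cause undefined behavior or a crash. Consider adding a null check at the start of the function as a defensive measure.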
static Value peelSingleResultCast(Value value) {
  if (!value)
    return {};
  while (auto castOp = value.getDefiningOp<UnrealizedConversionCastOp>()) {
    if (castOp->getNumOperands() != 1 || castOp->getNumResults() != 1)
      break;
    value = castOp.getOperand(0);
  }
  return value;
}

static void eraseDeadTileMetadataOps(ModuleOp module) {
  bool changed = true;
  while (changed) {
    changed = false;
    SmallVector<Operation *> opsToErase;
    module.walk([&](pto::SetValidShapeOp op) {
      opsToErase.push_back(op);
    });
    for (Operation *op : llvm::reverse(opsToErase)) {
      op->erase();
      changed = true;
    }

    opsToErase.clear();
    module.walk([&](pto::GetValidShapeOp op) {
      if (!op->use_empty())
        return;
      opsToErase.push_back(op);
    });
    for (Operation *op : llvm::reverse(opsToErase)) {
      op->erase();
      changed = true;
    }

    opsToErase.clear();
    module.walk([&](pto::TReshapeOp op) {
      if (!op->use_empty())
        return;
      opsToErase.push_back(op);
    });
    for (Operation *op : llvm::reverse(opsToErase)) {
      op->erase();
      changed = true;
    }

    opsToErase.clear();
    module.walk([&](pto::AllocTileOp op) {
      if (!op->use_empty())
        return;
      opsToErase.push_back(op);
    });
    for (Operation *op : llvm::reverse(opsToErase)) {
      op->erase();
      changed = true;
    }

    opsToErase.clear();
    module.walk([&](UnrealizedConversionCastOp op) {
      if (!op->use_empty())
        return;
      opsToErase.push_back(op);
    });
    for (Operation *op : llvm::reverse(opsToErase)) {
      op->erase();
      changed = true;
    }
  }
}
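The current eraseDeadTileMetadataOps implementation calls module.walk several times inside the while (changed) loop, traversing the whole module each time. These repeated full-module walks add substantial overhead when the module is large (time complexity: …).
Consider switching to a **worklist**-based dead code elimination (DCE) algorithm. Walk the module once at the start to collect every op that is initially dead; then, each time an op is erased, add to the worklist any producer op that has become dead because it lost its last user. This reduces the time complexity to …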
static void eraseDeadTileMetadataOps(ModuleOp module) {
  SmallVector<Operation *> worklist;
  module.walk([&](Operation *op) {
    if (isa<pto::SetValidShapeOp>(op)) {
      worklist.push_back(op);
    } else if (isa<pto::GetValidShapeOp, pto::TReshapeOp, pto::AllocTileOp, UnrealizedConversionCastOp>(op)) {
      if (op->use_empty()) {
        worklist.push_back(op);
      }
    }
  });
  while (!worklist.empty()) {
    Operation *op = worklist.pop_back_val();
    SmallVector<Value> operands(op->getOperands());
    op->erase();
    for (Value operand : operands) {
      if (Operation *defOp = operand.getDefiningOp()) {
        if (isa<pto::GetValidShapeOp, pto::TReshapeOp, pto::AllocTileOp, UnrealizedConversionCastOp>(defOp)) {
          if (defOp->use_empty()) {
            worklist.push_back(defOp);
          }
        }
      }
    }
  }
}
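The worklist idea in the review comment can be tried out on a toy model that does not depend on MLIR. The sketch below is only an illustration: ToyOp, addOp, and eraseDeadOps are hypothetical names, ops are plain structs with a use counter, and nothing here is the PTO or MLIR API. It also deduplicates worklist entries, on the assumption that an erased op may use the same producer through two operands; in that situation a producer could otherwise be queued twice. Whether the real ops can hit that case is not stated in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-in for an IR operation (hypothetical, not mlir::Operation).
struct ToyOp {
  bool alwaysDead;                    // models an op erased unconditionally
  bool erasable;                      // models "is a tile metadata op kind"
  std::vector<std::size_t> operands;  // indices of the producer ops it uses
  int numUses;                        // number of operand slots that use this op
  bool erased;
};

// Appends an op, bumps the use count of each producer, returns the new index.
std::size_t addOp(std::vector<ToyOp> &ops, bool alwaysDead, bool erasable,
                  std::vector<std::size_t> operands) {
  for (std::size_t producer : operands)
    ++ops[producer].numUses;
  ToyOp op;
  op.alwaysDead = alwaysDead;
  op.erasable = erasable;
  op.operands = std::move(operands);
  op.numUses = 0;
  op.erased = false;
  ops.push_back(op);
  return ops.size() - 1;
}

// Worklist DCE: one initial scan, then only producers of erased ops are revisited.
// Returns the number of ops marked erased.
std::size_t eraseDeadOps(std::vector<ToyOp> &ops) {
  std::vector<std::size_t> worklist;
  std::unordered_set<std::size_t> queued;  // guards against queuing an op twice
  auto enqueue = [&](std::size_t id) {
    if (queued.insert(id).second)
      worklist.push_back(id);
  };

  for (std::size_t id = 0; id < ops.size(); ++id)
    if (ops[id].alwaysDead || (ops[id].erasable && ops[id].numUses == 0))
      enqueue(id);

  std::size_t erasedCount = 0;
  while (!worklist.empty()) {
    std::size_t id = worklist.back();
    worklist.pop_back();
    ops[id].erased = true;
    ++erasedCount;
    // Drop all uses first, mirroring an erase that releases every operand at once.
    for (std::size_t producer : ops[id].operands)
      --ops[producer].numUses;
    // Then queue each producer that just lost its last user.
    for (std::size_t producer : ops[id].operands) {
      const ToyOp &def = ops[producer];
      if (def.erasable && !def.erased && def.numUses == 0)
        enqueue(producer);
    }
  }
  return erasedCount;
}
```

In this model, an unconditionally dead op that uses one cast through two operands erases the cast and then the cast's producer, each exactly once, while an erasable op that still has a live user is left alone.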
Fixes #827
Codex Review
This comment is updated automatically by the review bot.
Summary
Review failed at stage
Findings
No structured findings were generated because the review process failed early.
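/run a5
Received
The page refreshes automatically, so you can see the current stage, the queue status, and the most recent results directly.
A5 board test failed
Failed cases
A5 board test failure details: PR #829
xors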
test_tmov_row_major_1x16_control_a5
test_tmov_col_major_16x1_align_a5
test_dynamic_valid_shape
test_barrier_sync
test_auto_sync_tail_hint
rmsnorm_incore_0
rar_optimization_test
nested_loop_confliect
matmul
decode_projection_incore_0
compensation_test
add_double_dynamic
rems
rem
rope_kv_cache
rmsnorm
qwen3_decode_incore_7
qwen3_decode_incore_6
qwen3_decode_incore_5
qwen3_decode_incore_4
qwen3_decode_incore_2
qwen3_decode_incore_1
qwen3_decode_incore_12
qwen3_decode_incore_11
qwen3_decode_incore_10
post_rmsnorm
vector_example_dag_kernel_mul
vector_example_dag_kernel_add_scalar
vector_example_dag_kernel_add
paged_attention_example_kernel_softmax_prepare
paged_attention_example_kernel_qk_matmul
paged_attention_example_kernel_pv_matmul
paged_attention_example_kernel_online_update
paged_attention_example_kernel_init_inplace
orchestration_example_kernel_mul
orchestration_example_kernel_add_scalar
orchestration_example_kernel_add
prelu
plan_memory_reuse_sequential
plan_memory_peak_exact_capacity
plan_memory_peak_8_overlapping
plan_memory_no_reuse_overlap
plan_memory_nested_loops
plan_memory_loop_no_reuse_outer_live
plan_memory_loop_in_if
plan_memory_if_yield
plan_memory_if_in_loop
plan_memory_fragmentation_two_holes
plan_memory_fragmentation_hole_fit
plan_memory_for_iter_args_yield
plan_memory_bind_tile_alias_liveness
partition_view_verify_valid
partition_view_verify_rank_mismatch_valid
partition5d_dynamic_a5
partition5d_a5
tensor_view_layout_dn
sparse_attn_test_incore_7
decode_swa_test_incore_40
decode_hca_test_incore_54
decode_csa_test_incore_81
attention_swa_test_incore_40
attention_hca_test_incore_54
attention_csa_test_refresh_incore_81
cmps
cmp
addptr_dynamic
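Problem summary
After the VPTO lowering pipeline finishes, residual tile metadata ops left in the module block LLVM export. These ops fall into two categories:
- Dead ops whose semantics are already complete but which were never cleaned up: pto.set_valid_shape (which never has users), and any pto.get_valid_shape, pto.treshape, pto.alloc_tile, or UnrealizedConversionCastOp that has no users.
- Foldable cast chains: the pto.get_valid_shape → UnrealizedConversionCastOp → LLVM i64 cast chain, whose equivalent LLVM i64 value already exists in the valid row/col attributes of the original pto.alloc_tile but is not reused.
Changes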
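In runPipeline(), a call to cleanupResidualTileMetadataForLLVMExport() is inserted before the LLVM emit. The function cleans up in two steps:
Step 1: Rewire get_valid_shape directly to the LLVM index value
cleanupResidualTileMetadataForLLVMExport() visits every pto.get_valid_shape op:
1. Use peelSingleResultCast() to look through UnrealizedConversionCastOp and find its source (expected to be pto.alloc_tile).
2. Take the original valid row/col value from the alloc_tile.
3. Use findLLVMIndexValue() to find the LLVM i64 value that corresponds to that value, either by looking through the cast chain or by matching an i64-typed cast result among its users.
4. Replace the i64 result of the UnrealizedConversionCastOp downstream of get_valid_shape with the LLVM i64 value that was found, which eliminates the intermediate cast chain.
Helper functions:
- peelSingleResultCast(Value): repeatedly looks through single-operand, single-result UnrealizedConversionCastOp and returns the innermost source value that is not produced by such a cast.
- findLLVMIndexValue(Value): searches the value and its user cast chain for an i64-typed LLVM value.
Step 2: Iteratively erase all dead tile metadata ops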
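eraseDeadTileMetadataOps() uses fixed-point iteration to erase the following ops:
- pto.set_valid_shape
- pto.get_valid_shape
- pto.treshape
- pto.alloc_tile
- UnrealizedConversionCastOp
A while-changed loop ensures cascading deletion (for example, after a get_valid_shape is erased, the cast feeding it is left with no users and gets erased in turn). In each round, the ops that qualify are collected and erased in the order listed above.
Files involved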
- lib/PTO/Transforms/VPTOCANN900LLVMEmitter.cpp: adds four static functions (peelSingleResultCast, findLLVMIndexValue, eraseDeadTileMetadataOps, cleanupResidualTileMetadataForLLVMExport) and calls cleanupResidualTileMetadataForLLVMExport before the LLVM emit in runPipeline.
- lib/PTO/Transforms/VPTOLLVMEmitter.cpp
Notes
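- This is a temporary fix: the current cleanup logic covers only the known residual op types. If other tile metadata ops or more complex cast topologies are introduced later, the cleanup scope may need to be extended.
- The cleanup relies on the get_valid_shape → alloc_tile def-use chain being unbroken (that is, lowering inserts no additional isolating ops). If a future change in lowering strategy breaks this chain, the peelSingleResultCast logic will need to be adjusted.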
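The dependence on an unbroken chain in the last note can be illustrated with a small model that is independent of MLIR. This is a sketch under assumptions: ToyValue, canRewire, and the op name hypothetical.isolate are invented for illustration, and the description does not say what the real cleanup does when the peeled source is not a pto.alloc_tile. Here the rewire is simply skipped in that case, and a null input is returned unchanged, in the spirit of the null check suggested in the review.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for an SSA value (hypothetical, not mlir::Value).
struct ToyValue {
  std::string producer;        // name of the op kind that defines this value
  const ToyValue *castSource;  // non-null only for a 1-operand, 1-result cast
};

// Looks through single-operand, single-result casts; tolerates a null input.
const ToyValue *peelSingleResultCast(const ToyValue *value) {
  while (value && value->castSource)
    value = value->castSource;
  return value;
}

// Assumed policy: rewire only when the peeled source is a pto.alloc_tile.
bool canRewire(const ToyValue *tileOperand) {
  const ToyValue *source = peelSingleResultCast(tileOperand);
  return source && source->producer == "pto.alloc_tile";
}
```

With a chain of two casts over an alloc_tile the peel reaches the alloc_tile and the rewire can proceed. If some other op sits between the cast and the alloc_tile, the peel stops at that op and, under the assumed policy, the get_valid_shape is left untouched.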