Skip to content

Add shape inference.#841

Open
TelGome wants to merge 1 commit into
hw-native-sys:mainfrom
TelGome:shape_inference
Open

Add shape inference.#841
TelGome wants to merge 1 commit into
hw-native-sys:mainfrom
TelGome:shape_inference

Conversation

@TelGome

@TelGome TelGome commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

Two related improvements to the tile fusion pipeline that enable elementwise + reduce op chains (e.g. tadd → tmul → trowsum) to be fused into a single fusion_region with computed dynamic shapes, instead of generating separate elementwise and reduce loops.

1. Shape inference via structural-equivalence canonicalization

Replaces the previous SSA-value-only canonicalizeValue with a StructuralSignatureMap that deduplicates computable shape expressions (e.g. mins(vrow, c32) computed independently by different stages) into the same solver dimension. Two arith ops with the same name, attributes, and structurally equivalent operands are now treated as one representative, enabling stages that use separately materialized but structurally identical dynamic shapes to share loop bounds in the fused loop nest.

Key changes:

  • FusionAnalysis.cpp: Introduce StructuralSignatureMap with structural-equivalence key (op name, canonical operand list, attribute dictionary). The solver merges structurally equivalent shape expressions into a single dimension, while preserving differences (e.g. mins(vrow0, c32) vs mins(vrow1, c32) where vrow0 and vrow1 are different block arguments remain separate).
  • PTOFusionPlan.cpp: Mark reduce ops (trowsum, trowmax, trowmin, etc.) as Elementwise for fusion grouping — they share the row iteration domain with their producers, and the solver correctly binds their output domain (e.g. 32×1 for row-reduce).
  • PTOOpScheduling.cpp: Handle reduce output domains in scheduling.

2. Epilogue ops support in PTOLowLevelLoopFusion

Previously, analyzeStage rejected any ops appearing after the child loop within a loop level. Reduce ops like trowmax/trowsum produce a row-result store (a vsts writing the 32×1 output tile) after the inner col loop — these were classified as "unsupported" and prevented fusion.

This change introduces epilogueOps to LoopLevelInfo, collecting ops that appear after the child loop. These are:

  • Validated by isSupportedPreludeOp (no regions, pure or alias-analyzable memory ops)
  • Checked for alias conflicts with prior stages via canMovePreludeAcrossPriorStages
  • Cloned after all inner-level body ops in buildFusedLoopNestAtLevel, ensuring they appear after the fused inner-loop computation for their stage

Result: tadd → trowsum chains now produce a single fusion_region containing a row loop with iter_args (reduce accumulator), an inner col loop with both elementwise and reduce computations, and an epilogue vsts for the reduce row-result store.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the tile fusion pipeline to support operations with computed dynamic shapes by introducing a ShapeConstraintSolver and structural-equivalence-based shape canonicalization. It also adds support for row and column reduction operations, handles epilogue operations in low-level loop fusion, and updates scheduling dependencies for tile allocations. Feedback on these changes highlights two main issues: first, arith.divi is not a valid MLIR Arith operation and should be replaced with arith.divsi and arith.divui; second, there is a safety gap in the alias analysis where prior stages' epilogue operations are reordered across later stages' operations without being checked for dependencies.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +66 to +73
if (opName.starts_with("arith.")) {
return opName == "arith.minsi" || opName == "arith.maxsi" ||
opName == "arith.minui" || opName == "arith.maxui" ||
opName == "arith.cmpi" || opName == "arith.select" ||
opName == "arith.addi" || opName == "arith.subi" ||
opName == "arith.muli" || opName == "arith.divi" ||
opName == "arith.index_cast";
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In the MLIR Arith dialect, there is no arith.divi operation. The integer division operations are arith.divsi (signed division) and arith.divui (unsigned division). Please update the check to include arith.divsi and arith.divui instead of arith.divi to ensure integer division operations can participate in shape canonicalization.

Suggested change
if (opName.starts_with("arith.")) {
return opName == "arith.minsi" || opName == "arith.maxsi" ||
opName == "arith.minui" || opName == "arith.maxui" ||
opName == "arith.cmpi" || opName == "arith.select" ||
opName == "arith.addi" || opName == "arith.subi" ||
opName == "arith.muli" || opName == "arith.divi" ||
opName == "arith.index_cast";
}
if (opName.starts_with("arith.")) {
return opName == "arith.minsi" || opName == "arith.maxsi" ||
opName == "arith.minui" || opName == "arith.maxui" ||
opName == "arith.cmpi" || opName == "arith.select" ||
opName == "arith.addi" || opName == "arith.subi" ||
opName == "arith.muli" || opName == "arith.divsi" ||
opName == "arith.divui" || opName == "arith.index_cast";
}

Comment on lines +289 to +294
// Epilogue ops are cloned after prior-stage leaf/epilogue ops in
// the fused loop, so they must not alias prior-stage memory ops.
for (Operation *op : level.epilogueOps) {
if (!canMovePreludeAcrossPriorStages(op, priorStages, debugOS))
return false;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is an asymmetry and potential safety gap in the alias analysis for epilogue operations:

  1. Redundancy: Checking canMovePreludeAcrossPriorStages(op, priorStages) for the current stage's epilogue operations (level.epilogueOps) is redundant because they are cloned after all prior stages' leaf and epilogue operations in the fused loop. They do not move across them.
  2. Omission: Conversely, prior stages' epilogue operations (priorStage.epilogueOps) are reordered to execute after the current stage's leaf operations (level.leafOps / level.preludeOps). This means prior stages' epilogue operations are moved across later stages' operations, but we never check if they alias or have dependencies. While the fusion plan's domain-split checks mitigate this for direct dataflow, we should still perform a proper safety check to ensure prior stages' epilogue operations do not alias later stages' leaf/prelude operations.

@reedhecre

reedhecre commented Jun 18, 2026

Copy link
Copy Markdown

Codex Review

该评论由 review 机器人自动更新。

  • PR: Add shape inference. #841 Add shape inference.
  • Author: TelGome
  • Base/Head: main / shape_inference
  • Head SHA: edaf7bbd458b
  • Trigger: PR 有新提交
  • Generated At: 2026-06-18T12:12:54Z
  • Previous Head SHA: 338ba6d67395
  • Status: completed

Summary

检查到 2 个问题:新的 reduce-epilogue loop fusion 有副作用重排风险,且 tcolsum 的 shape inference 把 scratch tmp 错误并入了迭代域。

Findings

  1. P1 Reduce epilogue 的低层 loop fusion 缺少对先前 epilogue 副作用的合法性检查 lib/PTO/Transforms/TileFusion/PTOLowLevelLoopFusion.cpp:255

这次改动允许 stage 在 child loop 后带 epilogueOps,并在构造 fused loop 时先克隆所有 stage 的 leaf,再克隆各 stage 的 epilogue。这样后续 stage 的 leaf/epilogue 都会跨过前一个 stage 的 epilogue 改变执行顺序。但 canMovePreludeAcrossPriorStages 仍然只检查 priorStage.leafOps,没有把先前 stage 的 epilogueOps 纳入 alias 检查。只要前一 stage 的规约写回和后一 stage 的内存操作命中同一 root,这里就会把原本有顺序依赖的两个 stage 错误地 fuse,直接产生 wrong-code。

  1. P2 列规约 shape inference 把 `tcolsum` 的 tmp scratch 错当成迭代域输入 lib/PTO/Transforms/TileFusion/FusionAnalysis.cpp:438

ReduceCol 分支先对 semantics.tileInputsmergeAllShapes,这会把 src 和可选的 tmp 一起并入同一个 shape class。对 pto.tcolsum 来说,当前 verifier 只要求 tmp 是 ND-style vec tile 且元素类型匹配,并没有要求 tmpvalid_shapesrc/dst 相同。因此一个当前合法的 tcolsum(scratch tmp 形状不同)会被新的 shape solver 判成 inconsistent/unproven,拿不到 fusion metadata。这是本 PR 新引入的 contract mismatch。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants