feat(codegen): Add runtime stride-based tensor offset computation and… #197
lyfne123 merged 2 commits into hw-native-sys:main from
Conversation
Summary of Changes

Hello @YunjiQin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the CCE code generation framework by introducing robust support for dynamic tensor operations. It enables the system to handle variable tensor shapes and memory layouts more effectively at runtime through stride-based offset computations and the ability to query tensor dimensions. This change improves the flexibility and adaptability of the generated code for various tensor processing scenarios.

Highlights
Changelog
Code Review
This pull request introduces a significant feature for runtime stride-based tensor offset computation, which is crucial for handling dynamic tensor shapes in the CCE codegen. The changes are well-organized across the codegen engine, backend operator implementations, and context management. The implementation of tensor.dim and the updates to block.load/store to use the new dynamic offset calculation are solid. My review has identified a few minor areas for improvement, including removing unused variables/parameters, correcting documentation to match implementation, and addressing a potential signed/unsigned comparison issue. All comments align with existing guidelines or are not covered by specific rules, and thus no modifications or removals were necessary.
```cpp
static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
                                            ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {
```
The tensor_type parameter is unused within this function. It's good practice to remove unused parameters to keep the function signature clean and improve code clarity.
```diff
-static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
-                                            ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {
+static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
+                                            ir::MakeTuplePtr offsets) {
```
```cpp
static std::string MakeTensorDimCodegenCCE(const ir::CallPtr& op, codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::CCECodegen&>(codegen_base);
  std::string target_var = codegen.GetCurrentResultTarget();
  std::string input_var = codegen.GetExprAsCode(op->args_[0]);
```
```cpp
 * Returns the Tensor struct pointer name that should be used for accessing
 * buffer address and stride information. If no mapping exists, returns the
 * input tensor_var_name itself (for compatibility).
```
The documentation for GetTensorStruct is inconsistent with its implementation. The documentation states that it returns the input tensor_var_name if no mapping exists, but the implementation in code_context.cpp throws an error using CHECK. The documentation should be updated to match the implementation's fail-fast behavior, which is safer.
```diff
- * Returns the Tensor struct pointer name that should be used for accessing
- * buffer address and stride information. If no mapping exists, returns the
- * input tensor_var_name itself (for compatibility).
+ * Returns the Tensor struct pointer name that should be used for accessing
+ * buffer address and stride information. Throws an error if no mapping exists.
```
📝 Walkthrough

This PR introduces tensor struct pointer mapping infrastructure to the CCE codegen system, enabling runtime-based tensor dimension and stride access through struct pointers. Changes include new registration and retrieval APIs in CodeContext and CCECodegen, refactored stride calculation to use runtime values instead of compile-time computation, implementation of tensor.dim backend operations, and simplified orchestration task submission calls.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Backend as Backend Ops<br/>(backend_910b_cce_ops)
    participant Codegen as CCECodegen
    participant Context as CodeContext
    participant Global as Global Tensor

    Backend->>Codegen: GenerateGlobalTensorTypeDeclaration<br/>(var_name, tensor_type, base_ptr, tensor_struct_ptr)
    activate Codegen
    Codegen->>Context: RegisterTensorStruct(var_name, struct_ptr)
    activate Context
    Context->>Context: Store mapping in<br/>tensor_to_struct_pointer_
    deactivate Context
    Codegen->>Global: Generate GlobalTensor with<br/>stride initialization from struct
    deactivate Codegen

    Backend->>Codegen: ComputeStrideBasedOffset<br/>(codegen, tensor_var, offsets)
    activate Codegen
    Codegen->>Context: GetTensorStruct(tensor_var)
    activate Context
    Context-->>Codegen: Return struct_ptr
    deactivate Context
    Codegen->>Codegen: Compute offset using<br/>base_offset + strides
    Codegen-->>Backend: Return computed offset
    deactivate Codegen

    Backend->>Codegen: RegisterOutputTensorStruct<br/>(output_var, tensor_var)
    activate Codegen
    Codegen->>Context: RegisterTensorStruct(output_var,<br/>GetTensorStruct(tensor_var))
    activate Context
    Context->>Context: Propagate struct mapping
    deactivate Context
    deactivate Codegen
```
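The register/propagate/lookup flow in the diagram can be modeled with a plain dictionary. The sketch below is a Python toy model of CodeContext's mapping, not the C++ implementation; only the method names mirror the real API, everything else is illustrative. It covers the re-registration warning and the fail-fast lookup discussed in the review comments below.

```python
class CodeContextModel:
    """Toy model of CodeContext's tensor -> Tensor-struct-pointer mapping."""

    def __init__(self):
        self.tensor_to_struct_pointer = {}

    def register_tensor_struct(self, tensor_var_name, struct_ptr_name):
        # Mirrors the CHECKs on empty names and the re-registration warning.
        if not tensor_var_name or not struct_ptr_name:
            raise ValueError("names must be non-empty")
        if tensor_var_name in self.tensor_to_struct_pointer:
            print(f"warn: Tensor struct for {tensor_var_name} re-registered")
        self.tensor_to_struct_pointer[tensor_var_name] = struct_ptr_name

    def get_tensor_struct(self, tensor_var_name):
        # Fail-fast lookup: a missing mapping raises (like the CHECK).
        return self.tensor_to_struct_pointer[tensor_var_name]


ctx = CodeContextModel()
ctx.register_tensor_struct("x_gm", "x_struct")
# Propagation step from the diagram: the output inherits the input's struct.
ctx.register_tensor_struct("out_gm", ctx.get_tensor_struct("x_gm"))
assert ctx.get_tensor_struct("out_gm") == "x_struct"
```

The names `x_gm`, `x_struct`, and `out_gm` are made up for the example; the point is that propagation is just re-registering the looked-up value under the output's key.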
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/codegen/cce/cce_codegen.cpp (1)
820-892: ⚠️ Potential issue | 🟡 Minor

Add invariant check to prevent invalid constructor generation when tensor_struct_ptr is provided without base_pointer.

The function allows both base_pointer and tensor_struct_ptr as independent optional parameters, but the code logic depends on them being correlated. If tensor_struct_ptr is present without base_pointer, line 849 appends ", {}, {" to an empty constructor argument list, generating invalid C++: GlobalTensorType var(, {}, {...}). While current call sites (line 199: both provided; line 330: neither provided) avoid this, the function signature permits the problematic pattern.

Suggested invariant check

```diff
 if (tensor_struct_ptr.has_value()) {
+  CHECK(base_pointer.has_value())
+      << "tensor_struct_ptr requires base_pointer (or derive it from the Tensor struct)";
   global_instance << ", {}, {";
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/codegen/cce/cce_codegen.cpp` around lines 820 - 892, This function that takes const std::optional<std::string>& base_pointer and const std::optional<std::string>& tensor_struct_ptr must enforce the invariant that tensor_struct_ptr implies base_pointer; add an INTERNAL_CHECK (similar to the existing checks for var_name and tensor_type_) near the top of the function (after the two existing INTERNAL_CHECK lines) that fails if tensor_struct_ptr.has_value() && !base_pointer.has_value() with a clear message like "Internal error: tensor_struct_ptr provided without base_pointer" so we never emit an invalid constructor like GlobalTensorType var(, {}, {...}).
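The hazard this check guards against is easy to model: an argument-list builder that accepts the two options independently will emit `(, {}, {...})` when only the struct pointer is set. Below is a minimal Python sketch of the invariant, not the codegen itself; the `->strides` field name is a hypothetical stand-in for whatever the Tensor struct exposes.

```python
def build_ctor_args(base_pointer=None, tensor_struct_ptr=None):
    """Toy model of assembling the GlobalTensor constructor argument list."""
    # The suggested invariant: a struct pointer only makes sense with a base pointer.
    if tensor_struct_ptr is not None and base_pointer is None:
        raise AssertionError("tensor_struct_ptr requires base_pointer")
    parts = []
    if base_pointer is not None:
        parts.append(base_pointer)
    if tensor_struct_ptr is not None:
        # Without the check above this branch would emit "(, {}, {...})"
        # because parts would still be empty.
        parts += ["{}", "{" + tensor_struct_ptr + "->strides}"]
    return "(" + ", ".join(parts) + ")"


assert build_ctor_args("ptr", "ts") == "(ptr, {}, {ts->strides})"
assert build_ctor_args() == "()"
```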
🧹 Nitpick comments (2)
tests/st/codegen/test_add_mul_orch_cce_codegen.py (1)
139-146: Silence unused `params` argument (Ruff ARG002).

The signature is part of the PTOTestCase contract, but you can mark it intentionally unused.
♻️ Suggested tweak
```diff
-    def compute_expected(self, tensors, params=None):
+    def compute_expected(self, tensors, _params=None):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/st/codegen/test_add_mul_orch_cce_codegen.py` around lines 139-146, The compute_expected method signature declares an unused parameter params which triggers Ruff ARG002; to silence this, rename the parameter to _params (or prefix it with an underscore) in the compute_expected method definition (def compute_expected(self, tensors, _params=None)) so it remains part of the PTOTestCase contract but is clearly intentional and unused; ensure no other references to params exist in compute_expected and adjust any internal docstring/comments if needed.

src/backend/910B_CCE/backend_910b_cce_ops.cpp (1)
43-67: Validate offset arity and use tensor_type to avoid mismatched strides.

Right now `tensor_type` is unused and a mismatched offsets tuple would silently generate invalid code. Consider validating rank and using the parameter.

♻️ Suggested guard
```diff
 static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
                                             ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {
   // Get Tensor struct pointer for stride access
   std::string tensor_struct = codegen.GetTensorStruct(tensor_var_name);
+  CHECK(offsets->elements_.size() == tensor_type->shape_.size())
+      << "Offsets tuple size must match tensor rank for stride-based access";
+
   // Build offset computation: offset[0] * stride[0] + offset[1] * stride[1] + ...
   std::ostringstream offset_computation;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/backend/910B_CCE/backend_910b_cce_ops.cpp` around lines 43 - 67, The ComputeStrideBasedOffset function currently ignores tensor_type and doesn't check that offsets (ir::MakeTuplePtr offsets) match the tensor rank, which can produce invalid code; modify ComputeStrideBasedOffset to use tensor_type (e.g., tensor_type->shape() or tensor_type->rank()) to validate that offsets->elements_.size() equals the tensor rank and throw or log an error if mismatched, and ensure you only iterate up to the validated rank when building the offset_computation (referencing tensor_struct from codegen.GetTensorStruct, codegen.GetExprAsCode, and the offsets vector).
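The guarded computation boils down to a dot product of the offsets tuple with the runtime strides, rejected when the arity does not match the tensor rank. A short Python sketch of that arithmetic (a model of what the emitted expression computes, not the generated CCE code):

```python
def stride_based_offset(offsets, strides):
    """offset = offsets[0]*strides[0] + offsets[1]*strides[1] + ..."""
    if len(offsets) != len(strides):
        raise ValueError("offsets tuple size must match tensor rank")
    return sum(o * s for o, s in zip(offsets, strides))


# A 2x3x4 row-major tensor has strides (12, 4, 1); element (1, 2, 3)
# therefore lives at linear offset 1*12 + 2*4 + 3*1 = 23.
assert stride_based_offset((1, 2, 3), (12, 4, 1)) == 23
```

With runtime strides the same formula also covers non-contiguous layouts; only the stride values change, not the expression shape.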
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- include/pypto/codegen/cce/cce_codegen.h
- include/pypto/codegen/cce/code_context.h
- include/pypto/codegen/cce/type_converter.h
- src/backend/910B_CCE/backend_910b_cce_ops.cpp
- src/codegen/cce/cce_codegen.cpp
- src/codegen/cce/code_context.cpp
- src/codegen/cce/type_converter.cpp
- src/codegen/orchestration/orchestration_codegen.cpp
- tests/st/codegen/test_add_mul_orch_cce_codegen.py
- tests/ut/codegen/test_type_converter.py
💤 Files with no reviewable changes (1)
- include/pypto/codegen/cce/type_converter.h
```cpp
static std::string MakeTensorDimCodegenCCE(const ir::CallPtr& op, codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::CCECodegen&>(codegen_base);
  std::string target_var = codegen.GetCurrentResultTarget();
  std::string input_var = codegen.GetExprAsCode(op->args_[0]);
  int axis = codegen.GetConstIntValue(op->args_[1]);

  auto input_tensor = ir::As<ir::TensorType>(op->args_[0]->GetType());
  CHECK(input_tensor) << "tensor.dim need TensorType for first arg, but got "
                      << op->args_[0]->GetType()->TypeName();
  int ndims = input_tensor->shape_.size();
  int pad_dims = 5 - ndims;  // pto-isa pad shape to 5 dims

  // get axis in GlobalTensor 5 dims
  if (axis < 0) {
    axis += ndims;
  }
  int gt_dim = pad_dims + axis;

  // get GlobalTensor of input_tensor
  auto input_tensor_var = ir::As<ir::Var>(op->args_[0]);
  CHECK(input_tensor_var) << "tensor.dim need var with TensorType for first arg";
  std::string input_tensor_var_name = codegen.GetVarName(input_tensor_var);

  codegen.Emit("int " + target_var + " = " + input_tensor_var_name + ".GetShape(GlobalTensorDim::DIM_" +
               std::to_string(gt_dim) + ");");
  return "";
}

REGISTER_BACKEND_OP(Backend910B_CCE, "tensor.dim")
    .set_pipe(ir::PipeType::S)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeTensorDimCodegenCCE(op, codegen);
    });
```
Add axis bounds checks for tensor.dim.
Out-of-range axis values will produce invalid GlobalTensorDim indices (or negative ones) and generate broken code. Guard the range before computing gt_dim.
🛡️ Suggested bounds validation

```diff
 int ndims = input_tensor->shape_.size();
 int pad_dims = 5 - ndims;  // pto-isa pad shape to 5 dims
+CHECK(ndims > 0 && ndims <= 5) << "tensor.dim supports rank in [1, 5], got " << ndims;
+CHECK(axis >= -ndims && axis < ndims) << "tensor.dim axis out of range: " << axis;
+
 // get axis in GlobalTensor 5 dims
 if (axis < 0) {
   axis += ndims;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/backend/910B_CCE/backend_910b_cce_ops.cpp` around lines 858 - 890,
MakeTensorDimCodegenCCE currently assumes the normalized axis is valid and can
produce invalid GlobalTensorDim indices; after you compute ndims and normalize
axis (the variable axis in MakeTensorDimCodegenCCE), add a bounds check ensuring
axis >= 0 && axis < ndims and fail fast if not (e.g., using CHECK or a clear
error message that includes axis and ndims) before computing gt_dim and emitting
the GetShape call; reference the symbols axis, ndims, gt_dim, and
input_tensor_var_name so the check is placed between the normalization block and
the line that computes gt_dim/Emit.
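The normalization plus padding logic, with the suggested bounds checks folded in, can be sketched as follows. This is a Python model of the index arithmetic only; the 5-dim padding constant follows the pto-isa convention quoted in the snippet above.

```python
def global_tensor_dim(ndims, axis, padded_rank=5):
    """Map a (possibly negative) axis to the padded GlobalTensor dim index."""
    if not 0 < ndims <= padded_rank:
        raise ValueError(f"rank must be in [1, {padded_rank}], got {ndims}")
    if not -ndims <= axis < ndims:
        raise ValueError(f"axis out of range: {axis}")
    if axis < 0:
        axis += ndims                    # normalize negative axis
    return (padded_rank - ndims) + axis  # pad_dims + axis


assert global_tensor_dim(3, 0) == 2    # rank 3 pads by 2, so axis 0 -> DIM_2
assert global_tensor_dim(3, -1) == 4   # last axis always lands on DIM_4
```

Without the two range checks, e.g. `axis = 3` for a rank-3 tensor would silently produce `DIM_5`, which is exactly the broken-codegen case the review flags.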
```cpp
// If the yielded value is a TensorType (GlobalTensor), inherit both pointer and Tensor struct mappings
if (std::dynamic_pointer_cast<const ir::TensorType>(return_var->GetType())) {
  std::string yielded_ptr = context_.GetPointer(yielded_value);
  context_.RegisterPointer(return_var_name, yielded_ptr);

  std::string yielded_struct = context_.GetTensorStruct(yielded_value);
  context_.RegisterTensorStruct(return_var_name, yielded_struct);
}
```
Guard GetTensorStruct to avoid hard CHECK failures.
CodeContext::GetTensorStruct CHECKs when no mapping exists. If any tensor reaches these paths without a registered struct pointer (e.g., static‑stride tensors or uninitialized temporaries), codegen will abort. Either enforce the invariant globally or guard these lookups.
🔧 Suggested guard (apply to all three sites)

```diff
-std::string yielded_struct = context_.GetTensorStruct(yielded_value);
-context_.RegisterTensorStruct(return_var_name, yielded_struct);
+if (context_.HasTensorStruct(yielded_value)) {
+  std::string yielded_struct = context_.GetTensorStruct(yielded_value);
+  context_.RegisterTensorStruct(return_var_name, yielded_struct);
+}
```

Also applies to: 395-403, 446-455
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/codegen/cce/cce_codegen.cpp` around lines 365 - 372, The code is calling
CodeContext::GetTensorStruct without guarding for missing mappings which
triggers CHECK failures; update the three sites (the blocks around the
GetTensorStruct calls at the yield handling and the other two mention spots) to
first verify a mapping exists (e.g., via a new or existing has/lookup method or
by checking the pointer returned by context_.GetPointer/other indicator) before
calling context_.GetTensorStruct, and only call
context_.RegisterTensorStruct(return_var_name, yielded_struct) when a valid
yielded_struct is present; similarly ensure GetPointer is used safely and
RegisterPointer/ RegisterTensorStruct are only invoked when their inputs are
valid to avoid hard CHECKs on missing tensor structs.
```cpp
void CodeContext::RegisterTensorStruct(const std::string& tensor_var_name,
                                       const std::string& struct_ptr_name) {
  CHECK(!tensor_var_name.empty()) << "Cannot register Tensor struct with empty tensor var name";
  CHECK(!struct_ptr_name.empty()) << "Cannot register Tensor struct with empty pointer name";

  auto it = tensor_to_struct_pointer_.find(tensor_var_name);
  if (it != tensor_to_struct_pointer_.end()) {
    LOG_WARN << "Tensor struct for tensor " << tensor_var_name << " re-registered with: " << struct_ptr_name
             << " vs " << it->second;
  }
  tensor_to_struct_pointer_[tensor_var_name] = struct_ptr_name;
}

std::string CodeContext::GetTensorStruct(const std::string& tensor_var_name) const {
  auto it = tensor_to_struct_pointer_.find(tensor_var_name);
  CHECK(it != tensor_to_struct_pointer_.end())
      << "Tensor struct for tensor " << tensor_var_name << " not found";
  return it->second;
}
```
Align GetTensorStruct behavior with its documented fallback.
The header says a missing mapping should fall back to the input name, but this implementation hard CHECKs and aborts. Either update the docs or implement the fallback; the diff below matches the documented behavior.
♻️ Suggested alignment

```diff
 std::string CodeContext::GetTensorStruct(const std::string& tensor_var_name) const {
   auto it = tensor_to_struct_pointer_.find(tensor_var_name);
-  CHECK(it != tensor_to_struct_pointer_.end())
-      << "Tensor struct for tensor " << tensor_var_name << " not found";
-  return it->second;
+  if (it == tensor_to_struct_pointer_.end()) {
+    return tensor_var_name;
+  }
+  return it->second;
 }
```

🤖 Prompt for AI Agents
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/codegen/cce/code_context.cpp` around lines 71 - 89, GetTensorStruct
currently CHECKs and aborts when tensor_to_struct_pointer_ has no entry, but the
header/doc says it should fall back to returning the input tensor name; change
GetTensorStruct to return tensor_var_name when
tensor_to_struct_pointer_.find(tensor_var_name) == end() instead of CHECKing,
keeping the existing behavior of returning it->second when present; no changes
needed to RegisterTensorStruct other than ensuring it still inserts into
tensor_to_struct_pointer_.
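The two candidate behaviors differ only in the miss path. A small Python sketch contrasting the documented fallback with the implemented fail-fast lookup (the `fallback` flag exists only to show both branches side by side; the real function would pick one):

```python
def get_tensor_struct(mapping, tensor_var_name, fallback=True):
    """Lookup with the two candidate miss behaviors side by side."""
    if tensor_var_name in mapping:
        return mapping[tensor_var_name]
    if fallback:
        # Documented behavior: fall back to the tensor's own name.
        return tensor_var_name
    # Implemented behavior: fail fast, like the CHECK.
    raise KeyError(f"Tensor struct for tensor {tensor_var_name} not found")


m = {"x_gm": "x_struct"}
assert get_tensor_struct(m, "x_gm") == "x_struct"
assert get_tensor_struct(m, "y_gm") == "y_gm"  # fallback path
```

The trade-off the review circles around: the fallback silently accepts unregistered tensors (compatible with static-stride tensors), while fail-fast surfaces missing registrations immediately but requires guards like the `HasTensorStruct` suggestion above.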
… codegen for tensor.dim

Implements dynamic stride-based offset computation for CCE codegen, replacing compile-time row-major stride calculation with runtime stride access from Tensor struct.

Key changes:
- Add Tensor struct pointer tracking in CodeContext
- Update block.load, block.store, block.l0c_store to use runtime strides
- Add codegen function for tensor.dim op
- Change GenerateStrideType to emit dynamic strides (-1)
- Fix orchestration codegen for pto2_rt_submit_task function call
- Update tests to reflect dynamic stride behavior
- Update tests/st/codegen/test_add_mul_orch_cce_codegen.py to use st framework for onboard test
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary