
feat(codegen): Add runtime stride-based tensor offset computation and… #197

Merged
lyfne123 merged 2 commits into hw-native-sys:main from YunjiQin:codegen_v2
Feb 25, 2026

Conversation

@YunjiQin YunjiQin commented Feb 13, 2026

Summary

  • Add Tensor struct pointer tracking in CodeContext
  • Add runtime stride-based tensor offset computation for block.load, block.store, block.l0c_store codegen
  • Implement codegen for tensor.dim operations
  • Support dynamic strides (-1) in GenerateStrideType
  • Fix orchestration codegen: remove the kernel name parameter from the generated pto2_rt_submit_task function call
  • Update tests to reflect dynamic stride behavior
  • Update tests/st/codegen/test_add_mul_orch_cce_codegen.py to use st framework for onboard test

@gemini-code-assist

Summary of Changes

Hello @YunjiQin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the CCE code generation framework by introducing robust support for dynamic tensor operations. It enables the system to handle variable tensor shapes and memory layouts more effectively at runtime through stride-based offset computations and the ability to query tensor dimensions. This change improves the flexibility and adaptability of the generated code for various tensor processing scenarios.

Highlights

  • Dynamic Stride-Based Tensor Offset Computation: Implemented runtime stride-based offset calculation for block.load, block.store, and block.l0c_store operations, enabling more flexible tensor memory access.
  • Tensor Struct Pointer Tracking: Introduced a mechanism within CodeContext to track and manage Tensor struct pointers, which encapsulate buffer address, shape, and stride information.
  • Tensor Dimension Query Codegen: Added code generation support for the tensor.dim operation, allowing runtime querying of tensor dimensions.
  • Dynamic Stride Generation: Modified GenerateStrideType to produce dynamic stride placeholders (-1) for runtime resolution, rather than fixed compile-time values.
Changelog
  • include/pypto/codegen/cce/cce_codegen.h
    • Added declarations for GetTensorStruct and RegisterOutputTensorStruct methods.
    • Modified GenerateGlobalTensorTypeDeclaration to accept an optional tensor_struct_ptr.
  • include/pypto/codegen/cce/code_context.h
    • Declared RegisterTensorStruct and GetTensorStruct methods.
    • Added tensor_to_struct_pointer_ map to track tensor struct pointers.
  • include/pypto/codegen/cce/type_converter.h
    • Removed the private CalculateRowMajorStrides helper method.
    • Updated the GenerateStrideType signature.
  • src/backend/910B_CCE/backend_910b_cce_ops.cpp
    • Introduced a static helper function ComputeStrideBasedOffset for dynamic offset calculation.
    • Refactored MakeBlockLoadCodegenCCE, MakeBlockStoreCodegenCCE, and MakeBlockL0CStoreCodegenCCE to use the new stride-based offset computation and Tensor struct access.
    • Added RegisterOutputTensorStruct calls in MakeBlockStoreCodegenCCE and MakeBlockL0CStoreCodegenCCE.
    • Implemented MakeTensorDimCodegenCCE and registered it for the tensor.dim operation.
  • src/codegen/cce/cce_codegen.cpp
    • Updated GeneratePrologue to pass the tensor struct pointer during global tensor declaration.
    • Modified VisitStmt_ for IfStmt and ForStmt to ensure tensor struct pointer mappings are inherited.
    • Implemented GetTensorStruct and RegisterOutputTensorStruct methods.
    • Updated GenerateGlobalTensorTypeDeclaration to initialize GlobalTensor with dynamic strides from the tensor_struct_ptr.
  • src/codegen/cce/code_context.cpp
    • Implemented RegisterTensorStruct and GetTensorStruct methods.
    • Cleared the tensor_to_struct_pointer_ map in the Clear method.
  • src/codegen/cce/type_converter.cpp
    • Removed the implementation of CalculateRowMajorStrides.
    • Modified GenerateStrideType to generate -1 for dynamic strides instead of computed values.
  • tests/ut/codegen/test_type_converter.py
    • Updated test_generate_stride to assert expected dynamic stride patterns (-1) for various tensor shapes.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant feature for runtime stride-based tensor offset computation, which is crucial for handling dynamic tensor shapes in the CCE codegen. The changes are well-organized across the codegen engine, backend operator implementations, and context management. The implementation of tensor.dim and the updates to block.load/store to use the new dynamic offset calculation are solid. My review has identified a few minor areas for improvement, including removing unused variables/parameters, correcting documentation to match implementation, and addressing a potential signed/unsigned comparison issue. All comments align with existing guidelines or are not covered by specific rules, and thus no modifications or removals were necessary.

Comment on lines +45 to +46
static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {

medium

The tensor_type parameter is unused within this function. It's good practice to remove unused parameters to keep the function signature clean and improve code clarity.

Suggested change
static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {
static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
ir::MakeTuplePtr offsets) {

static std::string MakeTensorDimCodegenCCE(const ir::CallPtr& op, codegen::CodegenBase& codegen_base) {
auto& codegen = dynamic_cast<codegen::CCECodegen&>(codegen_base);
std::string target_var = codegen.GetCurrentResultTarget();
std::string input_var = codegen.GetExprAsCode(op->args_[0]);

medium

The variable input_var is initialized but never used. It should be removed to avoid confusion and clean up the code.

Comment thread: src/codegen/cce/cce_codegen.cpp (Outdated)
Comment on lines +101 to +103
* Returns the Tensor struct pointer name that should be used for accessing
* buffer address and stride information. If no mapping exists, returns the
* input tensor_var_name itself (for compatibility).

medium

The documentation for GetTensorStruct is inconsistent with its implementation. The documentation states that it returns the input tensor_var_name if no mapping exists, but the implementation in code_context.cpp throws an error using CHECK. The documentation should be updated to match the implementation's fail-fast behavior, which is safer.

Suggested change
* Returns the Tensor struct pointer name that should be used for accessing
* buffer address and stride information. If no mapping exists, returns the
* input tensor_var_name itself (for compatibility).
* Returns the Tensor struct pointer name that should be used for accessing
* buffer address and stride information. Throws an error if no mapping exists.

@coderabbitai

coderabbitai Bot commented Feb 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This PR introduces tensor struct pointer mapping infrastructure to the CCE codegen system, enabling runtime-based tensor dimension and stride access through struct pointers. Changes include new registration and retrieval APIs in CodeContext and CCECodegen, refactored stride calculation to use runtime values instead of compile-time computation, implementation of tensor.dim backend operations, and simplified orchestration task submission calls.

Changes

Cohort / File(s) Summary
Tensor Struct Pointer Mapping
include/pypto/codegen/cce/code_context.h, src/codegen/cce/code_context.cpp, include/pypto/codegen/cce/cce_codegen.h, src/codegen/cce/cce_codegen.cpp
Added public APIs for registering and retrieving tensor struct pointers; extended GenerateGlobalTensorTypeDeclaration to accept optional tensor_struct_ptr parameter for initialization; updated tensor yield and initialization paths to propagate mappings across control-flow boundaries.
Runtime Stride Handling
include/pypto/codegen/cce/type_converter.h, src/codegen/cce/type_converter.cpp
Removed CalculateRowMajorStrides helper; refactored GenerateStrideType to emit runtime-determined stride values (-1 placeholders) instead of compile-time row-major calculations; updated dimensionality validation to use shape size directly.
Backend Tensor Operations
src/backend/910B_CCE/backend_910b_cce_ops.cpp
Introduced ComputeStrideBasedOffset helper to centralize multi-dimensional tensor offset calculation; replaced ad-hoc stride/offset logic in block operations (load, store, l0c_store) with standardized computation; added MakeTensorDimCodegenCCE and registered tensor.dim backend operation for 910B CCE; updated output tensor struct registrations and variable aliasing.
Orchestration Simplification
src/codegen/orchestration/orchestration_codegen.cpp
Removed callee_name argument from pto2_rt_submit_task call, reducing parameter count while preserving runtime, function ID, worker specifier, and task parameters.
Test Refactoring
tests/st/codegen/test_add_mul_orch_cce_codegen.py, tests/ut/codegen/test_type_converter.py
Restructured orchestration test to use harness-based PTOTestCase pattern with define_tensors, get_program, and compute_expected methods; updated stride generation test expectations to reflect dynamic (-1) stride values instead of concrete row-major computation.

Sequence Diagram(s)

sequenceDiagram
    participant Backend as Backend Ops<br/>(backend_910b_cce_ops)
    participant Codegen as CCECodegen
    participant Context as CodeContext
    participant Global as Global Tensor

    Backend->>Codegen: GenerateGlobalTensorTypeDeclaration<br/>(var_name, tensor_type, base_ptr, tensor_struct_ptr)
    activate Codegen
    Codegen->>Context: RegisterTensorStruct(var_name, struct_ptr)
    activate Context
    Context->>Context: Store mapping in<br/>tensor_to_struct_pointer_
    deactivate Context
    Codegen->>Global: Generate GlobalTensor with<br/>stride initialization from struct
    deactivate Codegen

    Backend->>Codegen: ComputeStrideBasedOffset<br/>(codegen, tensor_var, offsets)
    activate Codegen
    Codegen->>Context: GetTensorStruct(tensor_var)
    activate Context
    Context-->>Codegen: Return struct_ptr
    deactivate Context
    Codegen->>Codegen: Compute offset using<br/>base_offset + strides
    Codegen-->>Backend: Return computed offset
    deactivate Codegen

    Backend->>Codegen: RegisterOutputTensorStruct<br/>(output_var, tensor_var)
    activate Codegen
    Codegen->>Context: RegisterTensorStruct(output_var,<br/>GetTensorStruct(tensor_var))
    activate Context
    Context->>Context: Propagate struct mapping
    deactivate Context
    deactivate Codegen

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • Hzfengsy

Poem

🐰 Hop through strides with pointers bright,
Struct mappings dance in tensor light,
Runtime dims and offsets flow,
Where orchestration whispers low,
The rabbit cheers—refactoring's done! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage (⚠️ Warning): Docstring coverage is 58.33%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
Title check (✅ Passed): The title accurately captures the main change: adding runtime stride-based tensor offset computation. It reflects the primary objective of the PR to support dynamic strides and tensor struct pointer handling.
Description check (✅ Passed): The description is directly related to the changeset, providing a clear bullet-point summary of all major changes, including tensor struct tracking, stride-based offset computation, tensor.dim codegen, dynamic strides, the orchestration fix, and test updates.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/codegen/cce/cce_codegen.cpp (1)

820-892: ⚠️ Potential issue | 🟡 Minor

Add invariant check to prevent invalid constructor generation when tensor_struct_ptr is provided without base_pointer.

The function allows both base_pointer and tensor_struct_ptr as independent optional parameters, but the code logic depends on them being correlated. If tensor_struct_ptr is present without base_pointer, line 849 appends ", {}, {" to an empty constructor argument list, generating invalid C++: GlobalTensorType var(, {}, {...}). While current call sites (line 199: both provided; line 330: neither provided) avoid this, the function signature permits the problematic pattern.

Suggested invariant check
   if (tensor_struct_ptr.has_value()) {
+    CHECK(base_pointer.has_value())
+        << "tensor_struct_ptr requires base_pointer (or derive it from the Tensor struct)";
     global_instance << ", {}, {";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/codegen/cce/cce_codegen.cpp` around lines 820 - 892, This function that
takes const std::optional<std::string>& base_pointer and const
std::optional<std::string>& tensor_struct_ptr must enforce the invariant that
tensor_struct_ptr implies base_pointer; add an INTERNAL_CHECK (similar to the
existing checks for var_name and tensor_type_) near the top of the function
(after the two existing INTERNAL_CHECK lines) that fails if
tensor_struct_ptr.has_value() && !base_pointer.has_value() with a clear message
like "Internal error: tensor_struct_ptr provided without base_pointer" so we
never emit an invalid constructor like GlobalTensorType var(, {}, {...}).
🧹 Nitpick comments (2)
tests/st/codegen/test_add_mul_orch_cce_codegen.py (1)

139-146: Silence unused params argument (Ruff ARG002).

The signature is part of the PTOTestCase contract, but you can mark it intentionally unused.

♻️ Suggested tweak
-    def compute_expected(self, tensors, params=None):
+    def compute_expected(self, tensors, _params=None):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/st/codegen/test_add_mul_orch_cce_codegen.py` around lines 139 - 146,
The compute_expected method signature declares an unused parameter params which
triggers Ruff ARG002; to silence this, rename the parameter to _params (or
prefix it with an underscore) in the compute_expected method definition (def
compute_expected(self, tensors, _params=None)) so it remains part of the
PTOTestCase contract but is clearly intentional and unused; ensure no other
references to params exist in compute_expected and adjust any internal
docstring/comments if needed.
src/backend/910B_CCE/backend_910b_cce_ops.cpp (1)

43-67: Validate offset arity and use tensor_type to avoid mismatched strides.

Right now tensor_type is unused and a mismatched offsets tuple would silently generate invalid code. Consider validating rank and using the parameter.

♻️ Suggested guard
 static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
                                             ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {
   // Get Tensor struct pointer for stride access
   std::string tensor_struct = codegen.GetTensorStruct(tensor_var_name);

+  CHECK(offsets->elements_.size() == tensor_type->shape_.size())
+      << "Offsets tuple size must match tensor rank for stride-based access";
+
   // Build offset computation: offset[0] * stride[0] + offset[1] * stride[1] + ...
   std::ostringstream offset_computation;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_CCE/backend_910b_cce_ops.cpp` around lines 43 - 67, The
ComputeStrideBasedOffset function currently ignores tensor_type and doesn't
check that offsets (ir::MakeTuplePtr offsets) match the tensor rank, which can
produce invalid code; modify ComputeStrideBasedOffset to use tensor_type (e.g.,
tensor_type->shape() or tensor_type->rank()) to validate that
offsets->elements_.size() equals the tensor rank and throw or log an error if
mismatched, and ensure you only iterate up to the validated rank when building
the offset_computation (referencing tensor_struct from codegen.GetTensorStruct,
codegen.GetExprAsCode, and the offsets vector).

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5aff958 and e0e9203.

📒 Files selected for processing (10)
  • include/pypto/codegen/cce/cce_codegen.h
  • include/pypto/codegen/cce/code_context.h
  • include/pypto/codegen/cce/type_converter.h
  • src/backend/910B_CCE/backend_910b_cce_ops.cpp
  • src/codegen/cce/cce_codegen.cpp
  • src/codegen/cce/code_context.cpp
  • src/codegen/cce/type_converter.cpp
  • src/codegen/orchestration/orchestration_codegen.cpp
  • tests/st/codegen/test_add_mul_orch_cce_codegen.py
  • tests/ut/codegen/test_type_converter.py
💤 Files with no reviewable changes (1)
  • include/pypto/codegen/cce/type_converter.h

Comment on lines +858 to +890
static std::string MakeTensorDimCodegenCCE(const ir::CallPtr& op, codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::CCECodegen&>(codegen_base);
  std::string target_var = codegen.GetCurrentResultTarget();
  std::string input_var = codegen.GetExprAsCode(op->args_[0]);
  int axis = codegen.GetConstIntValue(op->args_[1]);

  auto input_tensor = ir::As<ir::TensorType>(op->args_[0]->GetType());
  CHECK(input_tensor) << "tensor.dim need TensorType for first arg, but got "
                      << op->args_[0]->GetType()->TypeName();
  int ndims = input_tensor->shape_.size();
  int pad_dims = 5 - ndims;  // pto-isa pad shape to 5 dims

  // get axis in GlobalTensor 5 dims
  if (axis < 0) {
    axis += ndims;
  }
  int gt_dim = pad_dims + axis;

  // get GlobalTensor of input_tensor
  auto input_tensor_var = ir::As<ir::Var>(op->args_[0]);
  CHECK(input_tensor_var) << "tensor.dim need var with TensorType for first arg";
  std::string input_tensor_var_name = codegen.GetVarName(input_tensor_var);

  codegen.Emit("int " + target_var + " = " + input_tensor_var_name + ".GetShape(GlobalTensorDim::DIM_" +
               std::to_string(gt_dim) + ");");
  return "";
}

REGISTER_BACKEND_OP(Backend910B_CCE, "tensor.dim")
    .set_pipe(ir::PipeType::S)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeTensorDimCodegenCCE(op, codegen);
    });

⚠️ Potential issue | 🟠 Major

Add axis bounds checks for tensor.dim.

Out-of-range axis values will produce invalid GlobalTensorDim indices (or negative ones) and generate broken code. Guard the range before computing gt_dim.

🛡️ Suggested bounds validation
   int ndims = input_tensor->shape_.size();
   int pad_dims = 5 - ndims;  // pto-isa pad shape to 5 dims

+  CHECK(ndims > 0 && ndims <= 5) << "tensor.dim supports rank in [1, 5], got " << ndims;
+  CHECK(axis >= -ndims && axis < ndims) << "tensor.dim axis out of range: " << axis;
+
   // get axis in GlobalTensor 5 dims
   if (axis < 0) {
     axis += ndims;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_CCE/backend_910b_cce_ops.cpp` around lines 858 - 890,
MakeTensorDimCodegenCCE currently assumes the normalized axis is valid and can
produce invalid GlobalTensorDim indices; after you compute ndims and normalize
axis (the variable axis in MakeTensorDimCodegenCCE), add a bounds check ensuring
axis >= 0 && axis < ndims and fail fast if not (e.g., using CHECK or a clear
error message that includes axis and ndims) before computing gt_dim and emitting
the GetShape call; reference the symbols axis, ndims, gt_dim, and
input_tensor_var_name so the check is placed between the normalization block and
the line that computes gt_dim/Emit.

Comment on lines +365 to 372
        // If the yielded value is a TensorType (GlobalTensor), inherit both pointer and Tensor struct mappings
        if (std::dynamic_pointer_cast<const ir::TensorType>(return_var->GetType())) {
          std::string yielded_ptr = context_.GetPointer(yielded_value);
          context_.RegisterPointer(return_var_name, yielded_ptr);

          std::string yielded_struct = context_.GetTensorStruct(yielded_value);
          context_.RegisterTensorStruct(return_var_name, yielded_struct);
        }

⚠️ Potential issue | 🟠 Major

Guard GetTensorStruct to avoid hard CHECK failures.

CodeContext::GetTensorStruct CHECKs when no mapping exists. If any tensor reaches these paths without a registered struct pointer (e.g., static‑stride tensors or uninitialized temporaries), codegen will abort. Either enforce the invariant globally or guard these lookups.

🔧 Suggested guard (apply to all three sites)
-        std::string yielded_struct = context_.GetTensorStruct(yielded_value);
-        context_.RegisterTensorStruct(return_var_name, yielded_struct);
+        if (context_.HasTensorStruct(yielded_value)) {
+          std::string yielded_struct = context_.GetTensorStruct(yielded_value);
+          context_.RegisterTensorStruct(return_var_name, yielded_struct);
+        }

Also applies to: 395-403, 446-455

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/codegen/cce/cce_codegen.cpp` around lines 365 - 372, The code is calling
CodeContext::GetTensorStruct without guarding for missing mappings which
triggers CHECK failures; update the three sites (the blocks around the
GetTensorStruct calls at the yield handling and the other two mention spots) to
first verify a mapping exists (e.g., via a new or existing has/lookup method or
by checking the pointer returned by context_.GetPointer/other indicator) before
calling context_.GetTensorStruct, and only call
context_.RegisterTensorStruct(return_var_name, yielded_struct) when a valid
yielded_struct is present; similarly ensure GetPointer is used safely and
RegisterPointer/ RegisterTensorStruct are only invoked when their inputs are
valid to avoid hard CHECKs on missing tensor structs.

Comment on lines +71 to +89
void CodeContext::RegisterTensorStruct(const std::string& tensor_var_name,
                                       const std::string& struct_ptr_name) {
  CHECK(!tensor_var_name.empty()) << "Cannot register Tensor struct with empty tensor var name";
  CHECK(!struct_ptr_name.empty()) << "Cannot register Tensor struct with empty pointer name";

  auto it = tensor_to_struct_pointer_.find(tensor_var_name);
  if (it != tensor_to_struct_pointer_.end()) {
    LOG_WARN << "Tensor struct for tensor " << tensor_var_name << " re-registered with: " << struct_ptr_name
             << " vs " << it->second;
  }
  tensor_to_struct_pointer_[tensor_var_name] = struct_ptr_name;
}

std::string CodeContext::GetTensorStruct(const std::string& tensor_var_name) const {
  auto it = tensor_to_struct_pointer_.find(tensor_var_name);
  CHECK(it != tensor_to_struct_pointer_.end())
      << "Tensor struct for tensor " << tensor_var_name << " not found";
  return it->second;
}

⚠️ Potential issue | 🟡 Minor

Align GetTensorStruct behavior with its documented fallback.

The header says a missing mapping should fall back to the input name, but this implementation hard CHECKs and aborts. Either update the docs or implement the fallback; the diff below matches the documented behavior.

♻️ Suggested alignment
 std::string CodeContext::GetTensorStruct(const std::string& tensor_var_name) const {
   auto it = tensor_to_struct_pointer_.find(tensor_var_name);
-  CHECK(it != tensor_to_struct_pointer_.end())
-      << "Tensor struct for tensor " << tensor_var_name << " not found";
-  return it->second;
+  if (it == tensor_to_struct_pointer_.end()) {
+    return tensor_var_name;
+  }
+  return it->second;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/codegen/cce/code_context.cpp` around lines 71 - 89, GetTensorStruct
currently CHECKs and aborts when tensor_to_struct_pointer_ has no entry, but the
header/doc says it should fall back to returning the input tensor name; change
GetTensorStruct to return tensor_var_name when
tensor_to_struct_pointer_.find(tensor_var_name) == end() instead of CHECKing,
keeping the existing behavior of returning it->second when present; no changes
needed to RegisterTensorStruct other than ensuring it still inserts into
tensor_to_struct_pointer_.

… codegen for tensor.dim

Implements dynamic stride-based offset computation for CCE codegen,
replacing compile-time row-major stride calculation with runtime
stride access from Tensor struct.

Key changes:
- Add Tensor struct pointer tracking in CodeContext
- Update block.load, block.store, block.l0c_store to use runtime strides
- Add codegen function for tensor.dim op
- Change GenerateStrideType to emit dynamic strides (-1)
- Fix orchestration codegen for pto2_rt_submit_task function call
- Update tests to reflect dynamic stride behavior
- Update tests/st/codegen/test_add_mul_orch_cce_codegen.py to use st framework for onboard test
@YunjiQin YunjiQin changed the title [WIP] feat(codegen): Add runtime stride-based tensor offset computation and… feat(codegen): Add runtime stride-based tensor offset computation and… Feb 24, 2026
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
