
feat(codegen): Add runtime stride-based tensor offset computation and… #197

Merged
lyfne123 merged 2 commits into hw-native-sys:main from YunjiQin:codegen_v2
Feb 25, 2026

Conversation

@YunjiQin YunjiQin commented Feb 13, 2026

Summary

  • Add Tensor struct pointer tracking in CodeContext
  • Add runtime stride-based tensor offset computation for block.load, block.store, block.l0c_store codegen
  • Implement codegen for tensor.dim operations
  • Support dynamic strides (-1) in GenerateStrideType
  • Fix orchestration codegen: remove the kernel name parameter from the generated pto2_rt_submit_task function call
  • Update tests to reflect dynamic stride behavior
  • Update tests/st/codegen/test_add_mul_orch_cce_codegen.py to use st framework for onboard test

@gemini-code-assist

Summary of Changes

Hello @YunjiQin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the CCE code generation framework by introducing robust support for dynamic tensor operations. It enables the system to handle variable tensor shapes and memory layouts more effectively at runtime through stride-based offset computations and the ability to query tensor dimensions. This change improves the flexibility and adaptability of the generated code for various tensor processing scenarios.

Highlights

  • Dynamic Stride-Based Tensor Offset Computation: Implemented runtime stride-based offset calculation for block.load, block.store, and block.l0c_store operations, enabling more flexible tensor memory access.
  • Tensor Struct Pointer Tracking: Introduced a mechanism within CodeContext to track and manage Tensor struct pointers, which encapsulate buffer address, shape, and stride information.
  • Tensor Dimension Query Codegen: Added code generation support for the tensor.dim operation, allowing runtime querying of tensor dimensions.
  • Dynamic Stride Generation: Modified GenerateStrideType to produce dynamic stride placeholders (-1) for runtime resolution, rather than fixed compile-time values.
Changelog
  • include/pypto/codegen/cce/cce_codegen.h
    • Added declarations for GetTensorStruct and RegisterOutputTensorStruct methods.
    • Modified GenerateGlobalTensorTypeDeclaration to accept an optional tensor_struct_ptr.
  • include/pypto/codegen/cce/code_context.h
    • Declared RegisterTensorStruct and GetTensorStruct methods.
    • Added tensor_to_struct_pointer_ map to track tensor struct pointers.
  • include/pypto/codegen/cce/type_converter.h
    • Removed the private CalculateRowMajorStrides helper method.
    • Updated the GenerateStrideType signature.
  • src/backend/910B_CCE/backend_910b_cce_ops.cpp
    • Introduced a static helper function ComputeStrideBasedOffset for dynamic offset calculation.
    • Refactored MakeBlockLoadCodegenCCE, MakeBlockStoreCodegenCCE, and MakeBlockL0CStoreCodegenCCE to use the new stride-based offset computation and Tensor struct access.
    • Added RegisterOutputTensorStruct calls in MakeBlockStoreCodegenCCE and MakeBlockL0CStoreCodegenCCE.
    • Implemented MakeTensorDimCodegenCCE and registered it for the tensor.dim operation.
  • src/codegen/cce/cce_codegen.cpp
    • Updated GeneratePrologue to pass the tensor struct pointer during global tensor declaration.
    • Modified VisitStmt_ for IfStmt and ForStmt to ensure tensor struct pointer mappings are inherited.
    • Implemented GetTensorStruct and RegisterOutputTensorStruct methods.
    • Updated GenerateGlobalTensorTypeDeclaration to initialize GlobalTensor with dynamic strides from the tensor_struct_ptr.
  • src/codegen/cce/code_context.cpp
    • Implemented RegisterTensorStruct and GetTensorStruct methods.
    • Cleared the tensor_to_struct_pointer_ map in the Clear method.
  • src/codegen/cce/type_converter.cpp
    • Removed the implementation of CalculateRowMajorStrides.
    • Modified GenerateStrideType to generate -1 for dynamic strides instead of computed values.
  • tests/ut/codegen/test_type_converter.py
    • Updated test_generate_stride to assert expected dynamic stride patterns (-1) for various tensor shapes.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant feature for runtime stride-based tensor offset computation, which is crucial for handling dynamic tensor shapes in the CCE codegen. The changes are well-organized across the codegen engine, backend operator implementations, and context management. The implementation of tensor.dim and the updates to block.load/store to use the new dynamic offset calculation are solid. My review has identified a few minor areas for improvement, including removing unused variables/parameters, correcting documentation to match implementation, and addressing a potential signed/unsigned comparison issue. All comments align with existing guidelines or are not covered by specific rules, and thus no modifications or removals were necessary.

Comment on lines +45 to +46
static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {

medium

The tensor_type parameter is unused within this function. It's good practice to remove unused parameters to keep the function signature clean and improve code clarity.

Suggested change
static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {
static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
ir::MakeTuplePtr offsets) {

static std::string MakeTensorDimCodegenCCE(const ir::CallPtr& op, codegen::CodegenBase& codegen_base) {
auto& codegen = dynamic_cast<codegen::CCECodegen&>(codegen_base);
std::string target_var = codegen.GetCurrentResultTarget();
std::string input_var = codegen.GetExprAsCode(op->args_[0]);

medium

The variable input_var is initialized but never used. It should be removed to avoid confusion and clean up the code.

Comment thread: src/codegen/cce/cce_codegen.cpp (Outdated)
Comment on lines +101 to +103
* Returns the Tensor struct pointer name that should be used for accessing
* buffer address and stride information. If no mapping exists, returns the
* input tensor_var_name itself (for compatibility).

medium

The documentation for GetTensorStruct is inconsistent with its implementation. The documentation states that it returns the input tensor_var_name if no mapping exists, but the implementation in code_context.cpp throws an error using CHECK. The documentation should be updated to match the implementation's fail-fast behavior, which is safer.

Suggested change
* Returns the Tensor struct pointer name that should be used for accessing
* buffer address and stride information. If no mapping exists, returns the
* input tensor_var_name itself (for compatibility).
* Returns the Tensor struct pointer name that should be used for accessing
* buffer address and stride information. Throws an error if no mapping exists.

@coderabbitai

coderabbitai Bot commented Feb 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This PR introduces tensor struct pointer mapping infrastructure to the CCE codegen system, enabling runtime-based tensor dimension and stride access through struct pointers. Changes include new registration and retrieval APIs in CodeContext and CCECodegen, refactored stride calculation to use runtime values instead of compile-time computation, implementation of tensor.dim backend operations, and simplified orchestration task submission calls.

Changes

Cohort / File(s) Summary
Tensor Struct Pointer Mapping
include/pypto/codegen/cce/code_context.h, src/codegen/cce/code_context.cpp, include/pypto/codegen/cce/cce_codegen.h, src/codegen/cce/cce_codegen.cpp
Added public APIs for registering and retrieving tensor struct pointers; extended GenerateGlobalTensorTypeDeclaration to accept optional tensor_struct_ptr parameter for initialization; updated tensor yield and initialization paths to propagate mappings across control-flow boundaries.
Runtime Stride Handling
include/pypto/codegen/cce/type_converter.h, src/codegen/cce/type_converter.cpp
Removed CalculateRowMajorStrides helper; refactored GenerateStrideType to emit runtime-determined stride values (-1 placeholders) instead of compile-time row-major calculations; updated dimensionality validation to use shape size directly.
Backend Tensor Operations
src/backend/910B_CCE/backend_910b_cce_ops.cpp
Introduced ComputeStrideBasedOffset helper to centralize multi-dimensional tensor offset calculation; replaced ad-hoc stride/offset logic in block operations (load, store, l0c_store) with standardized computation; added MakeTensorDimCodegenCCE and registered tensor.dim backend operation for 910B CCE; updated output tensor struct registrations and variable aliasing.
Orchestration Simplification
src/codegen/orchestration/orchestration_codegen.cpp
Removed callee_name argument from pto2_rt_submit_task call, reducing parameter count while preserving runtime, function ID, worker specifier, and task parameters.
Test Refactoring
tests/st/codegen/test_add_mul_orch_cce_codegen.py, tests/ut/codegen/test_type_converter.py
Restructured orchestration test to use harness-based PTOTestCase pattern with define_tensors, get_program, and compute_expected methods; updated stride generation test expectations to reflect dynamic (-1) stride values instead of concrete row-major computation.

Sequence Diagram(s)

sequenceDiagram
    participant Backend as Backend Ops<br/>(backend_910b_cce_ops)
    participant Codegen as CCECodegen
    participant Context as CodeContext
    participant Global as Global Tensor

    Backend->>Codegen: GenerateGlobalTensorTypeDeclaration<br/>(var_name, tensor_type, base_ptr, tensor_struct_ptr)
    activate Codegen
    Codegen->>Context: RegisterTensorStruct(var_name, struct_ptr)
    activate Context
    Context->>Context: Store mapping in<br/>tensor_to_struct_pointer_
    deactivate Context
    Codegen->>Global: Generate GlobalTensor with<br/>stride initialization from struct
    deactivate Codegen

    Backend->>Codegen: ComputeStrideBasedOffset<br/>(codegen, tensor_var, offsets)
    activate Codegen
    Codegen->>Context: GetTensorStruct(tensor_var)
    activate Context
    Context-->>Codegen: Return struct_ptr
    deactivate Context
    Codegen->>Codegen: Compute offset using<br/>base_offset + strides
    Codegen-->>Backend: Return computed offset
    deactivate Codegen

    Backend->>Codegen: RegisterOutputTensorStruct<br/>(output_var, tensor_var)
    activate Codegen
    Codegen->>Context: RegisterTensorStruct(output_var,<br/>GetTensorStruct(tensor_var))
    activate Context
    Context->>Context: Propagate struct mapping
    deactivate Context
    deactivate Codegen

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • Hzfengsy

Poem

🐰 Hop through strides with pointers bright,
Struct mappings dance in tensor light,
Runtime dims and offsets flow,
Where orchestration whispers low,
The rabbit cheers—refactoring's done! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage (⚠️ Warning): Docstring coverage is 58.33%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
Title check (✅ Passed): The title accurately captures the main change: adding runtime stride-based tensor offset computation. It reflects the primary objective of the PR to support dynamic strides and tensor struct pointer handling.
Description check (✅ Passed): The description is directly related to the changeset, providing a clear bullet-point summary of all major changes, including tensor struct tracking, stride-based offset computation, tensor.dim codegen, dynamic strides, the orchestration fix, and test updates.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/codegen/cce/cce_codegen.cpp (1)

820-892: ⚠️ Potential issue | 🟡 Minor

Add invariant check to prevent invalid constructor generation when tensor_struct_ptr is provided without base_pointer.

The function allows both base_pointer and tensor_struct_ptr as independent optional parameters, but the code logic depends on them being correlated. If tensor_struct_ptr is present without base_pointer, line 849 appends ", {}, {" to an empty constructor argument list, generating invalid C++: GlobalTensorType var(, {}, {...}). While current call sites (line 199: both provided; line 330: neither provided) avoid this, the function signature permits the problematic pattern.

Suggested invariant check
   if (tensor_struct_ptr.has_value()) {
+    CHECK(base_pointer.has_value())
+        << "tensor_struct_ptr requires base_pointer (or derive it from the Tensor struct)";
     global_instance << ", {}, {";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/codegen/cce/cce_codegen.cpp` around lines 820 - 892, This function that
takes const std::optional<std::string>& base_pointer and const
std::optional<std::string>& tensor_struct_ptr must enforce the invariant that
tensor_struct_ptr implies base_pointer; add an INTERNAL_CHECK (similar to the
existing checks for var_name and tensor_type_) near the top of the function
(after the two existing INTERNAL_CHECK lines) that fails if
tensor_struct_ptr.has_value() && !base_pointer.has_value() with a clear message
like "Internal error: tensor_struct_ptr provided without base_pointer" so we
never emit an invalid constructor like GlobalTensorType var(, {}, {...}).
🧹 Nitpick comments (2)
tests/st/codegen/test_add_mul_orch_cce_codegen.py (1)

139-146: Silence unused params argument (Ruff ARG002).

The signature is part of the PTOTestCase contract, but you can mark it intentionally unused.

♻️ Suggested tweak
-    def compute_expected(self, tensors, params=None):
+    def compute_expected(self, tensors, _params=None):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/st/codegen/test_add_mul_orch_cce_codegen.py` around lines 139 - 146,
The compute_expected method signature declares an unused parameter params which
triggers Ruff ARG002; to silence this, rename the parameter to _params (or
prefix it with an underscore) in the compute_expected method definition (def
compute_expected(self, tensors, _params=None)) so it remains part of the
PTOTestCase contract but is clearly intentional and unused; ensure no other
references to params exist in compute_expected and adjust any internal
docstring/comments if needed.
src/backend/910B_CCE/backend_910b_cce_ops.cpp (1)

43-67: Validate offset arity and use tensor_type to avoid mismatched strides.

Right now tensor_type is unused and a mismatched offsets tuple would silently generate invalid code. Consider validating rank and using the parameter.

♻️ Suggested guard
 static std::string ComputeStrideBasedOffset(codegen::CCECodegen& codegen, const std::string& tensor_var_name,
                                             ir::MakeTuplePtr offsets, const ir::TensorTypePtr& tensor_type) {
   // Get Tensor struct pointer for stride access
   std::string tensor_struct = codegen.GetTensorStruct(tensor_var_name);

+  CHECK(offsets->elements_.size() == tensor_type->shape_.size())
+      << "Offsets tuple size must match tensor rank for stride-based access";
+
   // Build offset computation: offset[0] * stride[0] + offset[1] * stride[1] + ...
   std::ostringstream offset_computation;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_CCE/backend_910b_cce_ops.cpp` around lines 43 - 67, The
ComputeStrideBasedOffset function currently ignores tensor_type and doesn't
check that offsets (ir::MakeTuplePtr offsets) match the tensor rank, which can
produce invalid code; modify ComputeStrideBasedOffset to use tensor_type (e.g.,
tensor_type->shape() or tensor_type->rank()) to validate that
offsets->elements_.size() equals the tensor rank and throw or log an error if
mismatched, and ensure you only iterate up to the validated rank when building
the offset_computation (referencing tensor_struct from codegen.GetTensorStruct,
codegen.GetExprAsCode, and the offsets vector).

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5aff958 and e0e9203.

📒 Files selected for processing (10)
  • include/pypto/codegen/cce/cce_codegen.h
  • include/pypto/codegen/cce/code_context.h
  • include/pypto/codegen/cce/type_converter.h
  • src/backend/910B_CCE/backend_910b_cce_ops.cpp
  • src/codegen/cce/cce_codegen.cpp
  • src/codegen/cce/code_context.cpp
  • src/codegen/cce/type_converter.cpp
  • src/codegen/orchestration/orchestration_codegen.cpp
  • tests/st/codegen/test_add_mul_orch_cce_codegen.py
  • tests/ut/codegen/test_type_converter.py
💤 Files with no reviewable changes (1)
  • include/pypto/codegen/cce/type_converter.h

Comment on lines +858 to +890
static std::string MakeTensorDimCodegenCCE(const ir::CallPtr& op, codegen::CodegenBase& codegen_base) {
  auto& codegen = dynamic_cast<codegen::CCECodegen&>(codegen_base);
  std::string target_var = codegen.GetCurrentResultTarget();
  std::string input_var = codegen.GetExprAsCode(op->args_[0]);
  int axis = codegen.GetConstIntValue(op->args_[1]);

  auto input_tensor = ir::As<ir::TensorType>(op->args_[0]->GetType());
  CHECK(input_tensor) << "tensor.dim need TensorType for first arg, but got "
                      << op->args_[0]->GetType()->TypeName();
  int ndims = input_tensor->shape_.size();
  int pad_dims = 5 - ndims;  // pto-isa pad shape to 5 dims

  // get axis in GlobalTensor 5 dims
  if (axis < 0) {
    axis += ndims;
  }
  int gt_dim = pad_dims + axis;

  // get GlobalTensor of input_tensor
  auto input_tensor_var = ir::As<ir::Var>(op->args_[0]);
  CHECK(input_tensor_var) << "tensor.dim need var with TensorType for first arg";
  std::string input_tensor_var_name = codegen.GetVarName(input_tensor_var);

  codegen.Emit("int " + target_var + " = " + input_tensor_var_name + ".GetShape(GlobalTensorDim::DIM_" +
               std::to_string(gt_dim) + ");");
  return "";
}

REGISTER_BACKEND_OP(Backend910B_CCE, "tensor.dim")
    .set_pipe(ir::PipeType::S)
    .f_codegen([](const ir::CallPtr& op, codegen::CodegenBase& codegen) {
      return MakeTensorDimCodegenCCE(op, codegen);
    });

⚠️ Potential issue | 🟠 Major

Add axis bounds checks for tensor.dim.

Out-of-range axis values will produce invalid GlobalTensorDim indices (or negative ones) and generate broken code. Guard the range before computing gt_dim.

🛡️ Suggested bounds validation
   int ndims = input_tensor->shape_.size();
   int pad_dims = 5 - ndims;  // pto-isa pad shape to 5 dims

+  CHECK(ndims > 0 && ndims <= 5) << "tensor.dim supports rank in [1, 5], got " << ndims;
+  CHECK(axis >= -ndims && axis < ndims) << "tensor.dim axis out of range: " << axis;
+
   // get axis in GlobalTensor 5 dims
   if (axis < 0) {
     axis += ndims;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/backend/910B_CCE/backend_910b_cce_ops.cpp` around lines 858 - 890,
MakeTensorDimCodegenCCE currently assumes the normalized axis is valid and can
produce invalid GlobalTensorDim indices; after you compute ndims and normalize
axis (the variable axis in MakeTensorDimCodegenCCE), add a bounds check ensuring
axis >= 0 && axis < ndims and fail fast if not (e.g., using CHECK or a clear
error message that includes axis and ndims) before computing gt_dim and emitting
the GetShape call; reference the symbols axis, ndims, gt_dim, and
input_tensor_var_name so the check is placed between the normalization block and
the line that computes gt_dim/Emit.

Comment on lines +365 to 372
        // If the yielded value is a TensorType (GlobalTensor), inherit both pointer and Tensor struct mappings
        if (std::dynamic_pointer_cast<const ir::TensorType>(return_var->GetType())) {
          std::string yielded_ptr = context_.GetPointer(yielded_value);
          context_.RegisterPointer(return_var_name, yielded_ptr);

          std::string yielded_struct = context_.GetTensorStruct(yielded_value);
          context_.RegisterTensorStruct(return_var_name, yielded_struct);
        }

⚠️ Potential issue | 🟠 Major

Guard GetTensorStruct to avoid hard CHECK failures.

CodeContext::GetTensorStruct CHECKs when no mapping exists. If any tensor reaches these paths without a registered struct pointer (e.g., static‑stride tensors or uninitialized temporaries), codegen will abort. Either enforce the invariant globally or guard these lookups.

🔧 Suggested guard (apply to all three sites)
-        std::string yielded_struct = context_.GetTensorStruct(yielded_value);
-        context_.RegisterTensorStruct(return_var_name, yielded_struct);
+        if (context_.HasTensorStruct(yielded_value)) {
+          std::string yielded_struct = context_.GetTensorStruct(yielded_value);
+          context_.RegisterTensorStruct(return_var_name, yielded_struct);
+        }

Also applies to: 395-403, 446-455

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/codegen/cce/cce_codegen.cpp` around lines 365 - 372, The code is calling
CodeContext::GetTensorStruct without guarding for missing mappings which
triggers CHECK failures; update the three sites (the blocks around the
GetTensorStruct calls at the yield handling and the other two mention spots) to
first verify a mapping exists (e.g., via a new or existing has/lookup method or
by checking the pointer returned by context_.GetPointer/other indicator) before
calling context_.GetTensorStruct, and only call
context_.RegisterTensorStruct(return_var_name, yielded_struct) when a valid
yielded_struct is present; similarly ensure GetPointer is used safely and
RegisterPointer/ RegisterTensorStruct are only invoked when their inputs are
valid to avoid hard CHECKs on missing tensor structs.

Comment on lines +71 to +89
void CodeContext::RegisterTensorStruct(const std::string& tensor_var_name,
                                       const std::string& struct_ptr_name) {
  CHECK(!tensor_var_name.empty()) << "Cannot register Tensor struct with empty tensor var name";
  CHECK(!struct_ptr_name.empty()) << "Cannot register Tensor struct with empty pointer name";

  auto it = tensor_to_struct_pointer_.find(tensor_var_name);
  if (it != tensor_to_struct_pointer_.end()) {
    LOG_WARN << "Tensor struct for tensor " << tensor_var_name << " re-registered with: " << struct_ptr_name
             << " vs " << it->second;
  }
  tensor_to_struct_pointer_[tensor_var_name] = struct_ptr_name;
}

std::string CodeContext::GetTensorStruct(const std::string& tensor_var_name) const {
  auto it = tensor_to_struct_pointer_.find(tensor_var_name);
  CHECK(it != tensor_to_struct_pointer_.end())
      << "Tensor struct for tensor " << tensor_var_name << " not found";
  return it->second;
}

⚠️ Potential issue | 🟡 Minor

Align GetTensorStruct behavior with its documented fallback.

The header says a missing mapping should fall back to the input name, but this implementation hard CHECKs and aborts. Either update the docs or implement the fallback; the diff below matches the documented behavior.

♻️ Suggested alignment
 std::string CodeContext::GetTensorStruct(const std::string& tensor_var_name) const {
   auto it = tensor_to_struct_pointer_.find(tensor_var_name);
-  CHECK(it != tensor_to_struct_pointer_.end())
-      << "Tensor struct for tensor " << tensor_var_name << " not found";
-  return it->second;
+  if (it == tensor_to_struct_pointer_.end()) {
+    return tensor_var_name;
+  }
+  return it->second;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/codegen/cce/code_context.cpp` around lines 71 - 89, GetTensorStruct
currently CHECKs and aborts when tensor_to_struct_pointer_ has no entry, but the
header/doc says it should fall back to returning the input tensor name; change
GetTensorStruct to return tensor_var_name when
tensor_to_struct_pointer_.find(tensor_var_name) == end() instead of CHECKing,
keeping the existing behavior of returning it->second when present; no changes
needed to RegisterTensorStruct other than ensuring it still inserts into
tensor_to_struct_pointer_.

… codegen for tensor.dim

Implements dynamic stride-based offset computation for CCE codegen,
replacing compile-time row-major stride calculation with runtime
stride access from Tensor struct.

Key changes:
- Add Tensor struct pointer tracking in CodeContext
- Update block.load, block.store, block.l0c_store to use runtime strides
- Add codegen function for tensor.dim op
- Change GenerateStrideType to emit dynamic strides (-1)
- Fix orchestration codegen for pto2_rt_submit_task function call
- Update tests to reflect dynamic stride behavior
- Update tests/st/codegen/test_add_mul_orch_cce_codegen.py to use st framework for onboard test
@YunjiQin YunjiQin changed the title [WIP] feat(codegen): Add runtime stride-based tensor offset computation and… feat(codegen): Add runtime stride-based tensor offset computation and… Feb 24, 2026
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
