Refactor: share dump arg selection state by zmnobug · Pull Request #1069 · hw-native-sys/simpler

zmnobug · 2026-06-16T10:58:44Z

Summary

Centralize PTO2 profiling macro defaults in profiling_config.h
Move selective dump mask and scalar metadata bookkeeping into DumpArgSelection
Keep runtime Arg dump state gated by PTO2_PROFILING

Testing

conda run -n zm_pypto pre-commit run --files src/common/task_interface/profiling_config.h src/common/platform/include/aicpu/dump_arg_selection.h src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h src/a2a3/runtime/host_build_graph/runtime/pto_runtime2_types.h src/a5/runtime/host_build_graph/runtime/pto_runtime2_types.h src/a5/runtime/host_build_graph/runtime/tensor_info.h src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.h src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.h
cmake -B tests/ut/cpp/build -S tests/ut/cpp
cmake --build tests/ut/cpp/build --target a2a3_rt_objs a5_rt_objs
cmake -B /tmp/simpler-cpput-prof0 -S tests/ut/cpp -DCMAKE_CXX_FLAGS=-DPTO2_PROFILING=0
cmake --build /tmp/simpler-cpput-prof0 --target a2a3_rt_objs a5_rt_objs

Closes #1043

coderabbitai · 2026-06-16T10:58:57Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 580142c4-e702-447c-babe-776fa677bffd

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

Introduces profiling_config.h as a single canonical source for four PTO2_* profiling macros (with #error dependency enforcement) and DumpArgSelection as a new struct encapsulating dump-argument bitmask and scalar-slot metadata. All eight runtime headers across a5 and a2a3 drop their inline macro fallbacks in favor of #include "profiling_config.h", and both pto_types.h variants replace scattered inline profiling fields with a DumpArgSelection dump_arg_selection_ member.

Changes

Profiling Config Centralization and DumpArgSelection Extraction

Layer / File(s)	Summary
New shared headers: profiling_config and DumpArgSelection `src/common/task_interface/profiling_config.h`, `src/common/platform/include/aicpu/dump_arg_selection.h`	`profiling_config.h` defines `PTO2_PROFILING` (default `1`), `PTO2_ORCH_PROFILING`, `PTO2_SCHED_PROFILING`, `PTO2_TENSORMAP_PROFILING` (defaults `0`) with `#error` dependency checks. `dump_arg_selection.h` defines `DumpArgSelection` with a 64-bit selection mask, a 64-bit ambiguity mask, per-scalar-slot source pointers and dtypes, and public methods for marking, clearing, pointer-based scalar resolution, metadata recording, and dtype copying.
Runtime headers migrated to profiling_config.h `src/a2a3/runtime/host_build_graph/runtime/pto_runtime2_types.h`, `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h`, `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.h`, `src/a5/runtime/host_build_graph/runtime/pto_runtime2_types.h`, `src/a5/runtime/host_build_graph/runtime/tensor_info.h`, `src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h`, `src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.h`	Each file replaces its inline `#ifndef PTO2_PROFILING /` #define`/`#endif`` fallback block (and where present, the full four-macro hierarchy with `#error` checks) with a single `#include "profiling_config.h"`.
Arg refactored onto DumpArgSelection (a2a3 and a5 pto_types.h) `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h`, `src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h`	Includes `aicpu/dump_arg_selection.h`; removes private fields `dump_arg_mask_`, `dump_arg_index_ambiguous_mask_`, `scalar_source_ptrs_[]`, `scalar_dtypes_[]` and adds `DumpArgSelection dump_arg_selection_`. All profiling code paths — `clear()`, `tensor_dump_arg_mask()`, `tensor_dump_arg_index_ambiguous_mask()`, `add_scalars`, `add_scalars_i32`, `add_scalar_one`, `copy_scalars_from`, `scalar_dtypes()`, `mark_all_dump_args`, `mark_dump_arg(Tensor)`, `mark_dump_arg(TensorCreateInfo)`, `mark_dump_arg(scalar)` — are rewired to call `dump_arg_selection_` methods.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

hw-native-sys/simpler#953: Directly consumes Arg::tensor_dump_arg_mask() to enable selective dump mode, which this PR refactors to read from DumpArgSelection.
hw-native-sys/simpler#1028: Extends Arg::dump() marking for tensor vs. scalar lvalues and ambiguity handling — the exact logic this PR moves into DumpArgSelection.

Poem

🐇 Hopping through headers, I swept up the mess,
One profiling_config.h to rule all the rest!
DumpArgSelection born, tidy and trim,
No more mask fields scattered on a whim.
Eight headers rejoice, their clutter now gone —
The rabbit refactors, and hops gently on. 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.30% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Refactor: share dump arg selection state' clearly and concisely describes the main change—the extraction and sharing of dump argument selection functionality.
Description check	✅ Passed	The description provides a clear summary of the three main objectives: centralizing profiling macros, moving dump selection state, and maintaining runtime gating, plus testing instructions.
Linked Issues check	✅ Passed	The PR fully addresses both problems identified in `#1043`: centralizes PTO2_PROFILING macros in profiling_config.h and extracts dump/scalar-selection machinery into DumpArgSelection.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with the `#1043` objectives: profiling macro centralization and dump arg selection extraction, with no unrelated modifications.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request refactors profiling and tensor dump configurations across multiple runtime directories by introducing a centralized profiling_config.h header and a new DumpArgSelection helper class to manage argument selection and masking. The code review feedback highlights several critical safety concerns in the new DumpArgSelection class, specifically pointing out potential undefined behavior from out-of-bounds bitwise shifts and several instances of missing bounds checks on indices and counts that could lead to out-of-bounds memory reads or writes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-16T11:00:33Z

+    uint64_t dump_arg_index_ambiguous_mask() const { return dump_arg_index_ambiguous_mask_; }
+    const uint8_t *scalar_dtypes() const { return scalar_dtypes_; }
+
+    void mark_index(int32_t index) { dump_arg_mask_ |= (uint64_t{1} << index); }


Shifting a 64-bit integer by an out-of-bounds index (negative or >= 64) is undefined behavior in C++. To prevent this, validate that index is within the safe range [0, 63] before performing the bitwise shift.

void mark_index(int32_t index) { if (index >= 0 && index < 64) { dump_arg_mask_ |= (uint64_t{1} << index); } }

Addressed in current head 9166ebb: mark_index now validates the argument with is_valid_arg_index(index) before doing the 64-bit shift.

gemini-code-assist · 2026-06-16T11:00:33Z

+    bool mark_scalar_by_ptr(uintptr_t ptr, int32_t scalar_count, int32_t tensor_offset) {
+        int32_t first_match = -1;


Validate that scalar_count is non-negative and does not exceed CORE_MAX_SCALAR_ARGS to prevent out-of-bounds reads from scalar_source_ptrs_.

bool mark_scalar_by_ptr(uintptr_t ptr, int32_t scalar_count, int32_t tensor_offset) { if (scalar_count < 0 || scalar_count > CORE_MAX_SCALAR_ARGS) { return false; } int32_t first_match = -1;

References

When using a signed integer for a count or size parameter in functions that perform bounds checks and memory operations, always validate that the count is non-negative to prevent bypassing bounds checks and causing out-of-bounds writes or signed-to-unsigned conversion overflows.

Addressed in current head 9166ebb: mark_scalar_by_ptr now validates scalar_count, tensor_offset, the arg span, and the computed arg_index before scanning or marking.

gemini-code-assist · 2026-06-16T11:00:33Z

+    void record_scalar_source(int32_t slot, uintptr_t ptr, uint8_t dtype) {
+        scalar_source_ptrs_[slot] = ptr;
+        scalar_dtypes_[slot] = dtype;
+    }


Validate that slot is within the bounds of CORE_MAX_SCALAR_ARGS to prevent out-of-bounds writes to scalar_source_ptrs_ and scalar_dtypes_.

void record_scalar_source(int32_t slot, uintptr_t ptr, uint8_t dtype) { if (slot >= 0 && slot < CORE_MAX_SCALAR_ARGS) { scalar_source_ptrs_[slot] = ptr; scalar_dtypes_[slot] = dtype; } }

Addressed in current head 9166ebb: record_scalar_source now returns unless slot is a valid single-slot scalar range.

gemini-code-assist · 2026-06-16T11:00:33Z

+    void copy_scalar_dtypes_from(const DumpArgSelection &src, int32_t dst_offset, int32_t src_offset, int32_t count) {
+        memcpy(&scalar_dtypes_[dst_offset], &src.scalar_dtypes_[src_offset], count * sizeof(uint8_t));
+        clear_scalar_sources(dst_offset, count);
+    }


Validate that count, dst_offset, and src_offset are non-negative and within the bounds of CORE_MAX_SCALAR_ARGS to prevent out-of-bounds memory operations during memcpy and clear_scalar_sources.

Suggested change

void copy_scalar_dtypes_from(const DumpArgSelection &src, int32_t dst_offset, int32_t src_offset, int32_t count) {

memcpy(&scalar_dtypes_[dst_offset], &src.scalar_dtypes_[src_offset], count * sizeof(uint8_t));

clear_scalar_sources(dst_offset, count);

}

void copy_scalar_dtypes_from(const DumpArgSelection &src, int32_t dst_offset, int32_t src_offset, int32_t count) {

if (count < 0 || dst_offset < 0 || src_offset < 0 ||

dst_offset + count > CORE_MAX_SCALAR_ARGS ||

src_offset + count > CORE_MAX_SCALAR_ARGS) {

return;

}

memcpy(&scalar_dtypes_[dst_offset], &src.scalar_dtypes_[src_offset], count * sizeof(uint8_t));

clear_scalar_sources(dst_offset, count);

}

References

When using a signed integer for a count or size parameter in functions that perform bounds checks and memory operations, always validate that the count is non-negative to prevent bypassing bounds checks and causing out-of-bounds writes or signed-to-unsigned conversion overflows.

Addressed in current head 9166ebb: copy_scalar_dtypes_from now validates both destination and source scalar ranges before memcpy and source clearing.

gemini-code-assist · 2026-06-16T11:00:34Z

+    }
+
+private:
+    void mark_index_ambiguous(int32_t index) { dump_arg_index_ambiguous_mask_ |= (uint64_t{1} << index); }


Shifting a 64-bit integer by an out-of-bounds index (negative or >= 64) is undefined behavior in C++. Validate that index is within the safe range [0, 63] before performing the bitwise shift.

void mark_index_ambiguous(int32_t index) { if (index >= 0 && index < 64) { dump_arg_index_ambiguous_mask_ |= (uint64_t{1} << index); } }

Addressed in current head 9166ebb: mark_index_ambiguous now validates the index before setting the ambiguity mask bit.

gemini-code-assist · 2026-06-16T11:00:34Z

+    void clear_scalar_sources(int32_t start, int32_t count) {
+        for (int32_t i = 0; i < count; i++) {
+            scalar_source_ptrs_[start + i] = 0;
+        }
+    }


Validate that start and count are non-negative and within the bounds of CORE_MAX_SCALAR_ARGS to prevent out-of-bounds writes to scalar_source_ptrs_.

void clear_scalar_sources(int32_t start, int32_t count) { if (count < 0 || start < 0 || start + count > CORE_MAX_SCALAR_ARGS) { return; } for (int32_t i = 0; i < count; i++) { scalar_source_ptrs_[start + i] = 0; } }

References

When using a signed integer for a count or size parameter in functions that perform bounds checks and memory operations, always validate that the count is non-negative to prevent bypassing bounds checks and causing out-of-bounds writes or signed-to-unsigned conversion overflows.

Addressed in current head 9166ebb: clear_scalar_sources(start, count) now validates the scalar range before writing scalar_source_ptrs_.

gemini-code-assist · 2026-06-16T11:00:34Z

+    void clear_scalar_dtypes(int32_t start, int32_t count) {
+        memset(&scalar_dtypes_[start], 0, count * sizeof(uint8_t));
+    }


Validate that start and count are non-negative and within the bounds of CORE_MAX_SCALAR_ARGS to prevent out-of-bounds writes to scalar_dtypes_ during memset.

Suggested change

void clear_scalar_dtypes(int32_t start, int32_t count) {

memset(&scalar_dtypes_[start], 0, count * sizeof(uint8_t));

}

void clear_scalar_dtypes(int32_t start, int32_t count) {

if (count < 0 || start < 0 || start + count > CORE_MAX_SCALAR_ARGS) {

return;

}

memset(&scalar_dtypes_[start], 0, count * sizeof(uint8_t));

}

References

When using a signed integer for a count or size parameter in functions that perform bounds checks and memory operations, always validate that the count is non-negative to prevent bypassing bounds checks and causing out-of-bounds writes or signed-to-unsigned conversion overflows.

Addressed in current head 9166ebb: clear_scalar_dtypes(start, count) now validates the scalar range before memset.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h (1)
392-403: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject negative src_offset in both copy_scalars_from() mirrors.

Both implementations only check count < 0 and src_offset + count > src.scalar_count_; a negative src_offset can pass and then index before src.scalars_ and src.dump_arg_selection_ metadata.

src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h#L392-L403: add src_offset < 0 to the source-range validation before the memcpy and copy_scalar_dtypes_from() call.

src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h#L390-L401: apply the same lower-bound check in the mirror.
Proposed fix pattern
 void copy_scalars_from(const Arg &src, int src_offset, int count) {
-    if (count < 0 || src_offset + count > src.scalar_count_) {
+    if (src_offset < 0 || count < 0 || src_offset + count > src.scalar_count_) {
         set_error("Source scalar range out of bounds in copy_scalars_from");
         return;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h` around lines
392 - 403, Both implementations of the copy_scalars_from() method have an
incomplete bounds check that allows negative src_offset values to pass through
and cause out-of-bounds memory access. In
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h (lines 392-403,
anchor), add a check for src_offset < 0 in the initial validation condition
alongside the existing count < 0 and src_offset + count > src.scalar_count_
checks. Apply the identical lower-bound check (src_offset < 0) to the mirror
implementation in src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
(lines 390-401, sibling) to ensure consistent validation across both copies of
the method.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/common/platform/include/aicpu/dump_arg_selection.h`:
- Around line 34-95: Add bounds checking to all index and range-based mutators
in the DumpArgSelection struct to prevent undefined behavior. In mark_index and
mark_index_ambiguous, validate that index is within [0, 63] before performing
the bit-shift operation. In mark_all, validate that tensor_count and
scalar_count are non-negative and that their sum does not exceed valid argument
limits. In mark_scalar_by_ptr, validate that scalar_count is non-negative,
tensor_offset is valid, and the computed arg_index is within [0, 63]. In
record_scalar_source, validate that slot is within [0, CORE_MAX_SCALAR_ARGS). In
clear_scalar_sources and clear_scalar_dtypes (both overloads), validate that
start is non-negative, count is non-negative, and start + count does not exceed
CORE_MAX_SCALAR_ARGS. In copy_scalar_dtypes_from, validate that dst_offset,
src_offset, and count are all non-negative and that both source and destination
ranges fit within the scalar arrays without overflow.

---

Outside diff comments:
In `@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h`:
- Around line 392-403: Both implementations of the copy_scalars_from() method
have an incomplete bounds check that allows negative src_offset values to pass
through and cause out-of-bounds memory access. In
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h (lines 392-403,
anchor), add a check for src_offset < 0 in the initial validation condition
alongside the existing count < 0 and src_offset + count > src.scalar_count_
checks. Apply the identical lower-bound check (src_offset < 0) to the mirror
implementation in src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
(lines 390-401, sibling) to ensure consistent validation across both copies of
the method.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0380cc61-e1fe-4642-a282-8ed187bf9bf9

📥 Commits

Reviewing files that changed from the base of the PR and between 19b2c0b and 55f4751.

📒 Files selected for processing (11)

src/a2a3/runtime/host_build_graph/runtime/pto_runtime2_types.h
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.h
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
src/a5/runtime/host_build_graph/runtime/pto_runtime2_types.h
src/a5/runtime/host_build_graph/runtime/tensor_info.h
src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.h
src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
src/common/platform/include/aicpu/dump_arg_selection.h
src/common/task_interface/profiling_config.h

- Centralize PTO2 profiling macro defaults in profiling_config.h - Move tensor/scalar dump selection metadata into DumpArgSelection - Reuse the shared helper from a2a3 and a5 Arg implementations - Update profiling and tensor dump docs for the new macro source

gemini-code-assist Bot reviewed Jun 16, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread src/common/platform/include/aicpu/dump_arg_selection.h Outdated

zmnobug force-pushed the feature/selective-scalar-dump branch from e902a8d to 9166ebb Compare June 16, 2026 13:01

zmnobug force-pushed the feature/selective-scalar-dump branch from 9166ebb to 1266722 Compare June 17, 2026 07:32

		bool mark_scalar_by_ptr(uintptr_t ptr, int32_t scalar_count, int32_t tensor_offset) {
		int32_t first_match = -1;

Conversation

zmnobug commented Jun 16, 2026

Summary

Testing

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

zmnobug Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

zmnobug Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

zmnobug Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

zmnobug Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

zmnobug Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

zmnobug Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

zmnobug Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading