Rename dump tensor surface to dump args#1072
Conversation
📝 WalkthroughWalkthroughRenames the per-task diagnostic dump feature from "tensor dump" to "args dump" across the entire stack: the shared AICPU C API header and implementation, host collector output paths and manifest schema, ChangesArgs Dump Rename
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request renames the "tensor dump" feature to "args dump" across the codebase, updating documentation, CLI flags, output directories, artifact names, and internal APIs. The review feedback identifies a potential race condition and state leakage issue on AICPU due to the lazy allocation of global tables, suggesting eager allocation and cleanup during initialization and deinitialization to ensure thread safety and prevent memory leaks.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| extern "C" void set_dump_args_enabled(bool enable) { | ||
| g_enable_dump_args = enable; | ||
| g_dump_args_level = DumpTensorLevel::OFF; | ||
| clear_dump_args_tables(); | ||
| } |
There was a problem hiding this comment.
There is a potential race condition and state leakage issue with the global tables g_dump_mask_table and g_dump_scalar_dtype_table on AICPU. Since task submission can be concurrent, lazy allocation in ensure_dump_args_mask_table and ensure_dump_args_scalar_dtype_table can lead to race conditions (double allocation/leaks or memory corruption).
Additionally, per the general rules, on platforms like AICPU where global variables persist across launches, we should explicitly initialize or reset these variables at the start of each execution to prevent state leakage.
We can solve both issues by eagerly allocating the tables during single-threaded initialization in set_dump_args_enabled when enable is true, and freeing them when enable is false.
extern "C" void set_dump_args_enabled(bool enable) {
g_enable_dump_args = enable;
g_dump_args_level = DumpTensorLevel::OFF;
if (enable) {
ensure_dump_args_mask_table();
ensure_dump_args_scalar_dtype_table();
} else {
if (g_dump_mask_table != nullptr) {
free(g_dump_mask_table);
g_dump_mask_table = nullptr;
}
if (g_dump_scalar_dtype_table != nullptr) {
free(g_dump_scalar_dtype_table);
g_dump_scalar_dtype_table = nullptr;
}
}
clear_dump_args_tables();
}References
- On platforms where static or global variables persist across launches on the same loaded binary (such as AICPU), always explicitly reset or initialize these variables at the start of each execution/launch to prevent state leakage and undefined behavior between runs.
1c48e2a to
cd63d98
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
.github/workflows/ci.yml (1)
1-14:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestrict default
GITHUB_TOKENpermissions for this workflow.Line 1 currently has no top-level
permissionsblock, so jobs inherit default token scopes. That leaves this CI pipeline broader than needed for PR validation.Suggested hardening
name: CI + +permissions: + contents: read + pull-requests: read🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml around lines 1 - 14, The CI workflow in .github/workflows/ci.yml lacks a top-level permissions block, which causes jobs to inherit default GITHUB_TOKEN scopes with broader access than needed. Add a top-level permissions block immediately after the name field (before the `on` section) to explicitly restrict token permissions to only what is required for pull request validation, such as read-only access to repository contents and write access only to pull request checks if needed.Source: Linters/SAST tools
src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp (1)
527-545:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winAdd a hard upper-bound check for
num_dump_threadsbefore indexing fixed arrays.Line 544 iterates to
num_dump_threadsbut writes into static arrays sized byPLATFORM_MAX_AICPU_THREADS. Without validation, oversized thread counts can corrupt memory.Proposed patch
void dump_args_init(int num_dump_threads) { + if (num_dump_threads <= 0 || num_dump_threads > PLATFORM_MAX_AICPU_THREADS) { + LOG_ERROR( + "invalid num_dump_threads=%d (expected 1..%d)", num_dump_threads, PLATFORM_MAX_AICPU_THREADS + ); + return; + } + void *dump_base = reinterpret_cast<void *>(get_platform_dump_base()); if (dump_base == nullptr) { LOG_ERROR("platform dump base is NULL, cannot initialize args dump"); return; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp` around lines 527 - 545, The function dump_args_init uses the num_dump_threads parameter to iterate and index into fixed-size arrays in the for loop (starting at line 544 with `for (int t = 0; t < num_dump_threads; t++)`), but the parameter is never validated against the maximum allowed threads (PLATFORM_MAX_AICPU_THREADS). Add a bounds check immediately after the existing dump_base null check to verify that num_dump_threads does not exceed PLATFORM_MAX_AICPU_THREADS, and if it does, log an error message and return early to prevent out-of-bounds array access.src/common/platform/include/aicpu/tensor_dump_aicpu.h (1)
123-130:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRename residual warning text to “args dump” for consistency.
Line 126, Line 256, and Line 271 still log
"tensor dump skipped...", which leaves user-visible output inconsistent with the renameddump_argssurface and makes log filtering harder.Proposed patch
- "Thread %d: tensor dump skipped for task 0x%" PRIx64 + "Thread %d: args dump skipped for task 0x%" PRIx64 ": active callable tensor args (%d) do not match payload tensor args (%d). " "Task-level dump assumes payload tensor args are concatenated by active subtask order.", @@ - "Thread %d: tensor dump skipped for task 0x%" PRIx64 + "Thread %d: args dump skipped for task 0x%" PRIx64 ": task args (%d) smaller than callable signature (%d)", @@ - "Thread %d: tensor dump skipped for task 0x%" PRIx64 + "Thread %d: args dump skipped for task 0x%" PRIx64 ": callable tensor args (%d) do not match registered tensor info (%d)",Also applies to: 255-258, 270-273
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/common/platform/include/aicpu/tensor_dump_aicpu.h` around lines 123 - 130, The LOG_WARN message in the tensor dump validation block contains inconsistent terminology that references "tensor dump skipped" but should use "args dump" terminology to align with the renamed dump_args surface. Update the warning text at the current location (around the try_log_dump_args_layout_mismatch() call) and at the two other similar logging locations (around lines 255-258 and 270-273) to replace "tensor dump skipped" with "args dump skipped" to ensure consistent user-visible output and improve log filtering.src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp (1)
878-901:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReset
l2_swimlane_level_before the enablement branch to prevent cross-run state leak.
l2_swimlane_level_should be set toL2SwimlaneLevel::DISABLEDunconditionally beforeif (is_l2_swimlane_enabled()). Keeping reset only in theelsepath can leave stale state across launches in edge initialization flows.Suggested fix
`#if` PTO2_PROFILING + l2_swimlane_level_ = L2SwimlaneLevel::DISABLED; // l2_swimlane_aicpu_init promotes g_l2_swimlane_level from the shared-memory // header — must be called BEFORE the orchestrator thread caches the level // via rt->orchestrator.l2_swimlane_level = get_l2_swimlane_level() in @@ if (is_l2_swimlane_enabled()) { l2_swimlane_aicpu_init(runtime->worker_count); l2_swimlane_level_ = get_l2_swimlane_level(); @@ - } else { - l2_swimlane_level_ = L2SwimlaneLevel::DISABLED; - } + } `#endif`Based on learnings: “In
SchedulerContext::init()inside the scheduler cold-path implementation, ensurel2_swimlane_level_cannot retain a stale value across runs; reset toDISABLEDunconditionally before conditional logic.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp` around lines 878 - 901, The `l2_swimlane_level_` member variable is currently only reset to `L2SwimlaneLevel::DISABLED` in the else branch, which can leave stale state from previous initialization runs. Add an unconditional reset of `l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;` before the `if (is_l2_swimlane_enabled())` check in the scheduler cold-path initialization logic. This ensures the variable always starts in a clean state regardless of which branch is taken, preventing cross-run state leakage in edge initialization flows.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/dfx/tensor-dump.md`:
- Around line 280-283: The relative links in docs/dfx/tensor-dump.md at lines
280, 282, and 410 are using single-level parent directory traversal (../) but
need to traverse two levels to correctly resolve to the tests directory at the
repository root. Change all instances of `../tests/` to `../../tests/` in the
markdown link references to fix the broken paths that currently would resolve to
the non-existent docs/tests/ location instead of the actual tests/ directory.
In `@docs/testing.md`:
- Line 108: The `--dump-args` option documentation in the table is incorrectly
showing `false` as the default value, but this option is level-based with
possible values 0, 1, 2, or 3, where the actual default is 0. Update the default
column for the `--dump-args` row to show `0` instead of `false` to accurately
document that this is a numeric level option, not a boolean flag.
In `@simpler_setup/scene_test.py`:
- Around line 1696-1698: The current code at the dump_args check appends only
the bare flag `--dump-args` to the common list without preserving its value.
When a user specifies a dump level like `--dump-args 2` or `--dump-args 3`, this
value is lost because subprocesses only receive the flag and argparse defaults
to const=1. Modify the condition around args.dump_args to check if the value is
greater than its default/const value and append both the flag and the actual
value to the common list to preserve the user's specified dump level in child
process invocations.
- Around line 1419-1421: The help text for the --dump-args argument is
incomplete and omits the level 3 option (full_json_only) from its description.
Update the help string in the add_argument call for --dump-args to include all
supported dump levels: add documentation for level 3 (full_json_only) to the
existing descriptions of levels 0 (off), 1 (partial), and 2 (full) so that the
help text accurately reflects all available options.
---
Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 1-14: The CI workflow in .github/workflows/ci.yml lacks a
top-level permissions block, which causes jobs to inherit default GITHUB_TOKEN
scopes with broader access than needed. Add a top-level permissions block
immediately after the name field (before the `on` section) to explicitly
restrict token permissions to only what is required for pull request validation,
such as read-only access to repository contents and write access only to pull
request checks if needed.
In
`@src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp`:
- Around line 878-901: The `l2_swimlane_level_` member variable is currently
only reset to `L2SwimlaneLevel::DISABLED` in the else branch, which can leave
stale state from previous initialization runs. Add an unconditional reset of
`l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;` before the `if
(is_l2_swimlane_enabled())` check in the scheduler cold-path initialization
logic. This ensures the variable always starts in a clean state regardless of
which branch is taken, preventing cross-run state leakage in edge initialization
flows.
In `@src/common/platform/include/aicpu/tensor_dump_aicpu.h`:
- Around line 123-130: The LOG_WARN message in the tensor dump validation block
contains inconsistent terminology that references "tensor dump skipped" but
should use "args dump" terminology to align with the renamed dump_args surface.
Update the warning text at the current location (around the
try_log_dump_args_layout_mismatch() call) and at the two other similar logging
locations (around lines 255-258 and 270-273) to replace "tensor dump skipped"
with "args dump skipped" to ensure consistent user-visible output and improve
log filtering.
In `@src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp`:
- Around line 527-545: The function dump_args_init uses the num_dump_threads
parameter to iterate and index into fixed-size arrays in the for loop (starting
at line 544 with `for (int t = 0; t < num_dump_threads; t++)`), but the
parameter is never validated against the maximum allowed threads
(PLATFORM_MAX_AICPU_THREADS). Add a bounds check immediately after the existing
dump_base null check to verify that num_dump_threads does not exceed
PLATFORM_MAX_AICPU_THREADS, and if it does, log an error message and return
early to prevent out-of-bounds array access.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5779ec94-992b-4fb9-97de-71161490d1e8
📒 Files selected for processing (31)
.github/workflows/ci.ymlconftest.pydocs/dfx/tensor-dump.mddocs/profiling-framework.mddocs/testing.mdsimpler_setup/scene_test.pysimpler_setup/tools/README.mdsimpler_setup/tools/dump_viewer.pysrc/a2a3/platform/onboard/aicpu/kernel.cppsrc/a2a3/platform/sim/host/device_runner.cppsrc/a2a3/platform/sim/host/device_runner.hsrc/a2a3/runtime/host_build_graph/aicpu/aicpu_executor.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.hsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_completion.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cppsrc/a5/platform/onboard/aicpu/kernel.cppsrc/a5/platform/sim/host/device_runner.cppsrc/a5/platform/sim/host/device_runner.hsrc/a5/runtime/host_build_graph/aicpu/aicpu_executor.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.hsrc/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_completion.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cppsrc/common/platform/include/aicpu/tensor_dump_aicpu.hsrc/common/platform/include/host/tensor_dump_collector.hsrc/common/platform/shared/aicpu/tensor_dump_aicpu.cppsrc/common/platform/shared/host/tensor_dump_collector.cpptests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.py
| [`tests/st/a5/host_build_graph/dump_tensor`](../tests/st/a5/host_build_graph/dump_tensor/) | ||
| (and the `a2a3` mirror at | ||
| `tests/st/a2a3/host_build_graph/dump_tensor_example`). | ||
| `tests/st/a2a3/host_build_graph/dump_tensor`). | ||
|
|
There was a problem hiding this comment.
Fix broken relative links to test paths in docs.
From docs/dfx/tensor-dump.md, Line 280/282/410 should step up two levels to reach tests/. Current ../tests/... links likely resolve to docs/tests/....
Suggested fix
-[`tests/st/a5/host_build_graph/dump_tensor`](../tests/st/a5/host_build_graph/dump_tensor/)
+[`tests/st/a5/host_build_graph/dump_tensor`](../../tests/st/a5/host_build_graph/dump_tensor/)
...
-`tests/st/a2a3/host_build_graph/dump_tensor`).
+`tests/st/a2a3/host_build_graph/dump_tensor`).
...
-[`dump_tensor_orch.cpp`](../tests/st/a5/host_build_graph/dump_tensor/kernels/orchestration/dump_tensor_orch.cpp)
+[`dump_tensor_orch.cpp`](../../tests/st/a5/host_build_graph/dump_tensor/kernels/orchestration/dump_tensor_orch.cpp)Also applies to: 410-411
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/dfx/tensor-dump.md` around lines 280 - 283, The relative links in
docs/dfx/tensor-dump.md at lines 280, 282, and 410 are using single-level parent
directory traversal (../) but need to traverse two levels to correctly resolve
to the tests directory at the repository root. Change all instances of
`../tests/` to `../../tests/` in the markdown link references to fix the broken
paths that currently would resolve to the non-existent docs/tests/ location
instead of the actual tests/ directory.
| | `--skip-golden` | | false | Skip golden comparison (for benchmarking) | | ||
| | `--enable-l2-swimlane [PERF_LEVEL]` | | `0` | Enable L2 swimlane collection on first round only. The flag takes an integer perf_level 0–4 (bare = 4); see [docs/dfx/l2-swimlane-profiling.md](dfx/l2-swimlane-profiling.md#31-enable-l2-swimlane) for the level table. Each test case gets its own `outputs/<case>_<ts>/` directory under which `l2_swimlane_records.json` lands; parallel runs never collide. | | ||
| | `--dump-tensor` | | false | Dump tensors plus scalar args into unified runtime artifacts | | ||
| | `--dump-args` | | false | Dump tensors plus scalar args into unified runtime artifacts | |
There was a problem hiding this comment.
Document the real default for --dump-args as 0, not false.
Line 108 currently states a boolean default, but this option is level-based (0/1/2/3) with default 0.
Suggested fix
-| `--dump-args` | | false | Dump tensors plus scalar args into unified runtime artifacts |
+| `--dump-args` | | `0` | Dump tensors plus scalar args into unified runtime artifacts (bare flag = `1`; supports `0/1/2/3`) |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | `--dump-args` | | false | Dump tensors plus scalar args into unified runtime artifacts | | |
| | `--dump-args` | | `0` | Dump tensors plus scalar args into unified runtime artifacts (bare flag = `1`; supports `0/1/2/3`) | |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/testing.md` at line 108, The `--dump-args` option documentation in the
table is incorrectly showing `false` as the default value, but this option is
level-based with possible values 0, 1, 2, or 3, where the actual default is 0.
Update the default column for the `--dump-args` row to show `0` instead of
`false` to accurately document that this is a numeric level option, not a
boolean flag.
| help="Dump per-task args at runtime. Level: 0=off, 1=partial (only " | ||
| "tasks marked via Arg::dump(...), default when given without a value), 2=full (all tasks).", | ||
| ) |
There was a problem hiding this comment.
Align standalone --dump-args help text with supported levels.
Line 1419 help omits level 3 (full_json_only), which makes standalone UX inconsistent with the pytest surface and docs.
Suggested fix
- help="Dump per-task args at runtime. Level: 0=off, 1=partial (only "
- "tasks marked via Arg::dump(...), default when given without a value), 2=full (all tasks).",
+ help="Dump per-task args at runtime. Level: 0=off, 1=partial (only "
+ "tasks marked via Arg::dump(...), default when given without a value), "
+ "2=full (all tasks), 3=full_json_only (all tasks, JSON metadata only, no .bin payload).",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| help="Dump per-task args at runtime. Level: 0=off, 1=partial (only " | |
| "tasks marked via Arg::dump(...), default when given without a value), 2=full (all tasks).", | |
| ) | |
| help="Dump per-task args at runtime. Level: 0=off, 1=partial (only " | |
| "tasks marked via Arg::dump(...), default when given without a value), " | |
| "2=full (all tasks), 3=full_json_only (all tasks, JSON metadata only, no .bin payload).", | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@simpler_setup/scene_test.py` around lines 1419 - 1421, The help text for the
--dump-args argument is incomplete and omits the level 3 option (full_json_only)
from its description. Update the help string in the add_argument call for
--dump-args to include all supported dump levels: add documentation for level 3
(full_json_only) to the existing descriptions of levels 0 (off), 1 (partial),
and 2 (full) so that the help text accurately reflects all available options.
| if args.dump_args: | ||
| common.append("--dump-args") | ||
| if args.enable_dep_gen: |
There was a problem hiding this comment.
Preserve the selected dump level when forwarding --dump-args to child runs.
Line 1697 currently passes a bare flag, so argparse resolves it to const=1 in subprocesses. A parent invocation like --dump-args 2/3 is silently downgraded to level 1.
Suggested fix
- if args.dump_args:
- common.append("--dump-args")
+ if args.dump_args:
+ common += ["--dump-args", str(args.dump_args)]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@simpler_setup/scene_test.py` around lines 1696 - 1698, The current code at
the dump_args check appends only the bare flag `--dump-args` to the common list
without preserving its value. When a user specifies a dump level like
`--dump-args 2` or `--dump-args 3`, this value is lost because subprocesses only
receive the flag and argparse defaults to const=1. Modify the condition around
args.dump_args to check if the value is greater than its default/const value and
append both the flag and the actual value to the common list to preserve the
user's specified dump level in child process invocations.
cd63d98 to
20891d2
Compare
Align the focused PR792/PR966 dump flow on the new dump-args naming across CLI, runtime artifacts, viewer, tests, CI, and user-facing docs while keeping the underlying compatibility fields intact.
20891d2 to
5155e6a
Compare
Summary
dump tensortodump argsacross the CLI, runtime artifacts, viewer, and manifest schemadlsym, and onboard entrypoints with the newdump_argssymbol surface while keeping the underlying compatibility fields intactargs_dumpcontractTest plan
python -m py_compile conftest.py simpler_setup/scene_test.py simpler_setup/tools/dump_viewer.py tests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.pypytest tests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.py --platform a2a3sim --device 0-15 -p no:xdist --pto-session-timeout 600 --pto-isa-commit ddafa8da9c760ecd13fe9fe2833d6ee55fb20bd8 --dump-args --clone-protocol https🤖 Generated with Claude Code