Skip to content

Rename dump tensor surface to dump args#1072

Open
vegetabledoww wants to merge 1 commit into
hw-native-sys:mainfrom
vegetabledoww:feature/rename-dump-args-clean
Open

Rename dump tensor surface to dump args#1072
vegetabledoww wants to merge 1 commit into
hw-native-sys:mainfrom
vegetabledoww:feature/rename-dump-args-clean

Conversation

@vegetabledoww

@vegetabledoww vegetabledoww commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

Test plan

  • python -m py_compile conftest.py simpler_setup/scene_test.py simpler_setup/tools/dump_viewer.py tests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.py
  • pytest tests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.py --platform a2a3sim --device 0-15 -p no:xdist --pto-session-timeout 600 --pto-isa-commit ddafa8da9c760ecd13fe9fe2833d6ee55fb20bd8 --dump-args --clone-protocol https

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Renames the per-task diagnostic dump feature from "tensor dump" to "args dump" across the entire stack: the shared AICPU C API header and implementation, host collector output paths and manifest schema, Arg type internals in both a2a3 and a5 runtimes, device-runner dlsym wiring, scheduler and executor call sites, Python tooling (dump_viewer.py, scene_test.py), CI workflow flags, and all documentation.

Changes

Args Dump Rename

Layer / File(s) Summary
Shared AICPU dump API rename (header + implementation)
src/common/platform/include/aicpu/tensor_dump_aicpu.h, src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp
Renames 14 exported C-linkage functions (set_dump_tensor_enabledset_dump_args_enabled, dump_tensor_initdump_args_init, dump_tensor_recorddump_arg_record, dump_tensor_flushdump_args_flush, etc.) and all internal globals, table allocators, predicates, and arena helpers from tensor to args naming.
Host collector output path and manifest schema rename
src/common/platform/include/host/tensor_dump_collector.h, src/common/platform/shared/host/tensor_dump_collector.cpp
Changes run directory to args_dump/, binary payload file to args.bin, manifest filename to args_dump.json, and all JSON counter keys from *_tensors to *_args. Header changes are comment-only; all logic changes are in the .cpp.
Arg::dump_arg_mask rename in pto_types.h (a2a3 and a5)
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h, src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
Renames private mask field, mark_all_tensor_dump_arg/mark_tensor_dump_arg helpers, and the public tensor_dump_arg_mask() accessor to dump_arg_mask_, mark_all_dump_args/mark_dump_arg, and dump_arg_mask() in both platform copies.
Platform device-runner and AICPU kernel wiring (a2a3 and a5)
src/a2a3/platform/sim/host/device_runner.h, src/a2a3/platform/sim/host/device_runner.cpp, src/a2a3/platform/onboard/aicpu/kernel.cpp, src/a5/platform/sim/host/device_runner.h, src/a5/platform/sim/host/device_runner.cpp, src/a5/platform/onboard/aicpu/kernel.cpp
Replaces the dlsym'd set_dump_tensor_enabled_func_ function pointer with set_dump_args_enabled_func_ in both headers, and updates ensure_binaries_loaded, run() null-check/invocation, unload_executor_binaries, and on-device kernel setup for both a2a3 and a5.
AICPU executor and orchestrator dump call sites (a2a3 and a5)
src/a2a3/runtime/host_build_graph/aicpu/aicpu_executor.cpp, src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp, src/a5/runtime/host_build_graph/aicpu/aicpu_executor.cpp, src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
Switches dump_tensor_init/dump_tensors_for_task/dump_tensor_flush and orchestrator set_dump_tensor_task_*/args.tensor_dump_arg_mask() calls to their args-dump equivalents at all init, dispatch, completion, and flush sites for both platforms.
Scheduler dispatch, completion, cold-path dump call sites (a2a3 and a5)
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp, src/a2a3/.../scheduler_completion.cpp, src/a2a3/.../scheduler_cold_path.cpp, src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp, src/a5/.../scheduler_completion.cpp, src/a5/.../scheduler_cold_path.cpp
Updates all scheduler dump gate predicates and call sites from is_dump_tensor_enabled()/dump_tensors_for_task/dump_tensor_init/dump_tensor_flush to is_dump_args_enabled()/dump_args_for_task/dump_args_init/dump_args_flush across BEFORE_DISPATCH, AFTER_COMPLETION, init, flush, and timeout paths for both platforms.
Python dump viewer and scene test framework rename
conftest.py, simpler_setup/scene_test.py, simpler_setup/tools/dump_viewer.py, simpler_setup/tools/README.md
Renames --dump-tensor to --dump-args in conftest.py and all scene_test.py method signatures (enable_dump_tensorenable_dump_args). Converts dump_viewer.py from tensor to args naming (read_arg_data, arg_filename, write_arg, export_arg, list_args), auto-detects args_dump/, loads args_dump.json, and falls back to manifest["tensors"] for backward compatibility.
Tests, CI, and documentation
tests/st/a2a3/.../test_tensor_dump.py, .github/workflows/ci.yml, docs/dfx/tensor-dump.md, docs/testing.md, docs/profiling-framework.md
Smoke test now validates args_dump/args_dump.json + args.bin and asserts tensor_dump.json is absent. CI pytest steps for both sim and onboard a2a3 jobs updated to --dump-args. All documentation updated to --dump-args, args_dump/, and the new manifest schema.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • hw-native-sys/simpler#953: Modifies the same Arg::dump()/tensor_dump_arg_mask() selective dump inference and orchestrator gating that this PR renames to dump_arg_mask() and args-dump APIs.
  • hw-native-sys/simpler#966: Touches the dump viewer's scalar kind/value presentation in dump_viewer.py, the same file this PR converts from tensor to args naming.
  • hw-native-sys/simpler#1028: Modifies Arg::dump(...) selective scalar/lvalue capture machinery and the unified dump_arg_mask_ that this PR introduces under the new naming.

Poem

🐇 A rename hops through files galore,
--dump-tensor knocks no more,
--dump-args leaps from CLI to .h,
args_dump/ blooms where tensors once lay,
The rabbit tidies names with cheer —
consistent APIs, crystal clear! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.32% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'Rename dump tensor surface to dump args' clearly and specifically describes the main change: renaming the public surface terminology from 'dump tensor' to 'dump args'.
Description check ✅ Passed The PR description is well-related to the changeset, providing a clear summary of the rename scope, test plan, and reference to the closed issue #1075.
Linked Issues check ✅ Passed The code changes comprehensively address all objectives from issue #1075: CLI flags (#--dump-tensor to --dump-args), output artifacts (tensor_dump/ to args_dump/), tooling (dump_viewer updated), manifests (tensor_dump.json to args_dump.json), tests, CI workflows, and documentation are all renamed as required.
Out of Scope Changes check ✅ Passed All changes are directly related to the rename objective. Updates to runtime symbols, internal helper functions, private members, CI workflows, documentation, and tests all serve the stated purpose of renaming dump-tensor to dump-args across user-facing surfaces.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch feature/rename-dump-args-clean

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request renames the "tensor dump" feature to "args dump" across the codebase, updating documentation, CLI flags, output directories, artifact names, and internal APIs. The review feedback identifies a potential race condition and state leakage issue on AICPU due to the lazy allocation of global tables, suggesting eager allocation and cleanup during initialization and deinitialization to ensure thread safety and prevent memory leaks.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +188 to 192
extern "C" void set_dump_args_enabled(bool enable) {
g_enable_dump_args = enable;
g_dump_args_level = DumpTensorLevel::OFF;
clear_dump_args_tables();
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a potential race condition and state leakage issue with the global tables g_dump_mask_table and g_dump_scalar_dtype_table on AICPU. Since task submission can be concurrent, lazy allocation in ensure_dump_args_mask_table and ensure_dump_args_scalar_dtype_table can lead to race conditions (double allocation/leaks or memory corruption).

Additionally, per the general rules, on platforms like AICPU where global variables persist across launches, we should explicitly initialize or reset these variables at the start of each execution to prevent state leakage.

We can solve both issues by eagerly allocating the tables during single-threaded initialization in set_dump_args_enabled when enable is true, and freeing them when enable is false.

extern "C" void set_dump_args_enabled(bool enable) {
    g_enable_dump_args = enable;
    g_dump_args_level = DumpTensorLevel::OFF;
    if (enable) {
        ensure_dump_args_mask_table();
        ensure_dump_args_scalar_dtype_table();
    } else {
        if (g_dump_mask_table != nullptr) {
            free(g_dump_mask_table);
            g_dump_mask_table = nullptr;
        }
        if (g_dump_scalar_dtype_table != nullptr) {
            free(g_dump_scalar_dtype_table);
            g_dump_scalar_dtype_table = nullptr;
        }
    }
    clear_dump_args_tables();
}
References
  1. On platforms where static or global variables persist across launches on the same loaded binary (such as AICPU), always explicitly reset or initialize these variables at the start of each execution/launch to prevent state leakage and undefined behavior between runs.

@vegetabledoww vegetabledoww force-pushed the feature/rename-dump-args-clean branch from 1c48e2a to cd63d98 Compare June 17, 2026 03:09

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
.github/workflows/ci.yml (1)

1-14: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restrict default GITHUB_TOKEN permissions for this workflow.

Line 1 currently has no top-level permissions block, so jobs inherit default token scopes. That leaves this CI pipeline broader than needed for PR validation.

Suggested hardening
 name: CI
+
+permissions:
+  contents: read
+  pull-requests: read
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/ci.yml around lines 1 - 14, The CI workflow in
.github/workflows/ci.yml lacks a top-level permissions block, which causes jobs
to inherit default GITHUB_TOKEN scopes with broader access than needed. Add a
top-level permissions block immediately after the name field (before the `on`
section) to explicitly restrict token permissions to only what is required for
pull request validation, such as read-only access to repository contents and
write access only to pull request checks if needed.

Source: Linters/SAST tools

src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp (1)

527-545: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add a hard upper-bound check for num_dump_threads before indexing fixed arrays.

Line 544 iterates to num_dump_threads but writes into static arrays sized by PLATFORM_MAX_AICPU_THREADS. Without validation, oversized thread counts can corrupt memory.

Proposed patch
 void dump_args_init(int num_dump_threads) {
+    if (num_dump_threads <= 0 || num_dump_threads > PLATFORM_MAX_AICPU_THREADS) {
+        LOG_ERROR(
+            "invalid num_dump_threads=%d (expected 1..%d)", num_dump_threads, PLATFORM_MAX_AICPU_THREADS
+        );
+        return;
+    }
+
     void *dump_base = reinterpret_cast<void *>(get_platform_dump_base());
     if (dump_base == nullptr) {
         LOG_ERROR("platform dump base is NULL, cannot initialize args dump");
         return;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp` around lines 527 -
545, The function dump_args_init uses the num_dump_threads parameter to iterate
and index into fixed-size arrays in the for loop (starting at line 544 with `for
(int t = 0; t < num_dump_threads; t++)`), but the parameter is never validated
against the maximum allowed threads (PLATFORM_MAX_AICPU_THREADS). Add a bounds
check immediately after the existing dump_base null check to verify that
num_dump_threads does not exceed PLATFORM_MAX_AICPU_THREADS, and if it does, log
an error message and return early to prevent out-of-bounds array access.
src/common/platform/include/aicpu/tensor_dump_aicpu.h (1)

123-130: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Rename residual warning text to “args dump” for consistency.

Line 126, Line 256, and Line 271 still log "tensor dump skipped...", which leaves user-visible output inconsistent with the renamed dump_args surface and makes log filtering harder.

Proposed patch
-                "Thread %d: tensor dump skipped for task 0x%" PRIx64
+                "Thread %d: args dump skipped for task 0x%" PRIx64
                 ": active callable tensor args (%d) do not match payload tensor args (%d). "
                 "Task-level dump assumes payload tensor args are concatenated by active subtask order.",
@@
-                "Thread %d: tensor dump skipped for task 0x%" PRIx64
+                "Thread %d: args dump skipped for task 0x%" PRIx64
                 ": task args (%d) smaller than callable signature (%d)",
@@
-                "Thread %d: tensor dump skipped for task 0x%" PRIx64
+                "Thread %d: args dump skipped for task 0x%" PRIx64
                 ": callable tensor args (%d) do not match registered tensor info (%d)",

Also applies to: 255-258, 270-273

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/platform/include/aicpu/tensor_dump_aicpu.h` around lines 123 -
130, The LOG_WARN message in the tensor dump validation block contains
inconsistent terminology that references "tensor dump skipped" but should use
"args dump" terminology to align with the renamed dump_args surface. Update the
warning text at the current location (around the
try_log_dump_args_layout_mismatch() call) and at the two other similar logging
locations (around lines 255-258 and 270-273) to replace "tensor dump skipped"
with "args dump skipped" to ensure consistent user-visible output and improve
log filtering.
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp (1)

878-901: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset l2_swimlane_level_ before the enablement branch to prevent cross-run state leak.

l2_swimlane_level_ should be set to L2SwimlaneLevel::DISABLED unconditionally before if (is_l2_swimlane_enabled()). Keeping reset only in the else path can leave stale state across launches in edge initialization flows.

Suggested fix
 `#if` PTO2_PROFILING
+    l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;
     // l2_swimlane_aicpu_init promotes g_l2_swimlane_level from the shared-memory
     // header — must be called BEFORE the orchestrator thread caches the level
     // via rt->orchestrator.l2_swimlane_level = get_l2_swimlane_level() in
@@
     if (is_l2_swimlane_enabled()) {
         l2_swimlane_aicpu_init(runtime->worker_count);
         l2_swimlane_level_ = get_l2_swimlane_level();
@@
-    } else {
-        l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;
-    }
+    }
 `#endif`

Based on learnings: “In SchedulerContext::init() inside the scheduler cold-path implementation, ensure l2_swimlane_level_ cannot retain a stale value across runs; reset to DISABLED unconditionally before conditional logic.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp`
around lines 878 - 901, The `l2_swimlane_level_` member variable is currently
only reset to `L2SwimlaneLevel::DISABLED` in the else branch, which can leave
stale state from previous initialization runs. Add an unconditional reset of
`l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;` before the `if
(is_l2_swimlane_enabled())` check in the scheduler cold-path initialization
logic. This ensures the variable always starts in a clean state regardless of
which branch is taken, preventing cross-run state leakage in edge initialization
flows.

Source: Learnings

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/dfx/tensor-dump.md`:
- Around line 280-283: The relative links in docs/dfx/tensor-dump.md at lines
280, 282, and 410 are using single-level parent directory traversal (../) but
need to traverse two levels to correctly resolve to the tests directory at the
repository root. Change all instances of `../tests/` to `../../tests/` in the
markdown link references to fix the broken paths that currently would resolve to
the non-existent docs/tests/ location instead of the actual tests/ directory.

In `@docs/testing.md`:
- Line 108: The `--dump-args` option documentation in the table is incorrectly
showing `false` as the default value, but this option is level-based with
possible values 0, 1, 2, or 3, where the actual default is 0. Update the default
column for the `--dump-args` row to show `0` instead of `false` to accurately
document that this is a numeric level option, not a boolean flag.

In `@simpler_setup/scene_test.py`:
- Around line 1696-1698: The current code at the dump_args check appends only
the bare flag `--dump-args` to the common list without preserving its value.
When a user specifies a dump level like `--dump-args 2` or `--dump-args 3`, this
value is lost because subprocesses only receive the flag and argparse defaults
to const=1. Modify the condition around args.dump_args to check if the value is
greater than its default/const value and append both the flag and the actual
value to the common list to preserve the user's specified dump level in child
process invocations.
- Around line 1419-1421: The help text for the --dump-args argument is
incomplete and omits the level 3 option (full_json_only) from its description.
Update the help string in the add_argument call for --dump-args to include all
supported dump levels: add documentation for level 3 (full_json_only) to the
existing descriptions of levels 0 (off), 1 (partial), and 2 (full) so that the
help text accurately reflects all available options.

---

Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 1-14: The CI workflow in .github/workflows/ci.yml lacks a
top-level permissions block, which causes jobs to inherit default GITHUB_TOKEN
scopes with broader access than needed. Add a top-level permissions block
immediately after the name field (before the `on` section) to explicitly
restrict token permissions to only what is required for pull request validation,
such as read-only access to repository contents and write access only to pull
request checks if needed.

In
`@src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp`:
- Around line 878-901: The `l2_swimlane_level_` member variable is currently
only reset to `L2SwimlaneLevel::DISABLED` in the else branch, which can leave
stale state from previous initialization runs. Add an unconditional reset of
`l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;` before the `if
(is_l2_swimlane_enabled())` check in the scheduler cold-path initialization
logic. This ensures the variable always starts in a clean state regardless of
which branch is taken, preventing cross-run state leakage in edge initialization
flows.

In `@src/common/platform/include/aicpu/tensor_dump_aicpu.h`:
- Around line 123-130: The LOG_WARN message in the tensor dump validation block
contains inconsistent terminology that references "tensor dump skipped" but
should use "args dump" terminology to align with the renamed dump_args surface.
Update the warning text at the current location (around the
try_log_dump_args_layout_mismatch() call) and at the two other similar logging
locations (around lines 255-258 and 270-273) to replace "tensor dump skipped"
with "args dump skipped" to ensure consistent user-visible output and improve
log filtering.

In `@src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp`:
- Around line 527-545: The function dump_args_init uses the num_dump_threads
parameter to iterate and index into fixed-size arrays in the for loop (starting
at line 544 with `for (int t = 0; t < num_dump_threads; t++)`), but the
parameter is never validated against the maximum allowed threads
(PLATFORM_MAX_AICPU_THREADS). Add a bounds check immediately after the existing
dump_base null check to verify that num_dump_threads does not exceed
PLATFORM_MAX_AICPU_THREADS, and if it does, log an error message and return
early to prevent out-of-bounds array access.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5779ec94-992b-4fb9-97de-71161490d1e8

📥 Commits

Reviewing files that changed from the base of the PR and between 2b3ee17 and cd63d98.

📒 Files selected for processing (31)
  • .github/workflows/ci.yml
  • conftest.py
  • docs/dfx/tensor-dump.md
  • docs/profiling-framework.md
  • docs/testing.md
  • simpler_setup/scene_test.py
  • simpler_setup/tools/README.md
  • simpler_setup/tools/dump_viewer.py
  • src/a2a3/platform/onboard/aicpu/kernel.cpp
  • src/a2a3/platform/sim/host/device_runner.cpp
  • src/a2a3/platform/sim/host/device_runner.h
  • src/a2a3/runtime/host_build_graph/aicpu/aicpu_executor.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_completion.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
  • src/a5/platform/onboard/aicpu/kernel.cpp
  • src/a5/platform/sim/host/device_runner.cpp
  • src/a5/platform/sim/host/device_runner.h
  • src/a5/runtime/host_build_graph/aicpu/aicpu_executor.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_completion.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
  • src/common/platform/include/aicpu/tensor_dump_aicpu.h
  • src/common/platform/include/host/tensor_dump_collector.h
  • src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp
  • src/common/platform/shared/host/tensor_dump_collector.cpp
  • tests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.py

Comment thread docs/dfx/tensor-dump.md
Comment on lines +280 to 283
[`tests/st/a5/host_build_graph/dump_tensor`](../tests/st/a5/host_build_graph/dump_tensor/)
(and the `a2a3` mirror at
`tests/st/a2a3/host_build_graph/dump_tensor_example`).
`tests/st/a2a3/host_build_graph/dump_tensor`).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix broken relative links to test paths in docs.

From docs/dfx/tensor-dump.md, Line 280/282/410 should step up two levels to reach tests/. Current ../tests/... links likely resolve to docs/tests/....

Suggested fix
-[`tests/st/a5/host_build_graph/dump_tensor`](../tests/st/a5/host_build_graph/dump_tensor/)
+[`tests/st/a5/host_build_graph/dump_tensor`](../../tests/st/a5/host_build_graph/dump_tensor/)
 ...
-`tests/st/a2a3/host_build_graph/dump_tensor`).
+`tests/st/a2a3/host_build_graph/dump_tensor`).
...
-[`dump_tensor_orch.cpp`](../tests/st/a5/host_build_graph/dump_tensor/kernels/orchestration/dump_tensor_orch.cpp)
+[`dump_tensor_orch.cpp`](../../tests/st/a5/host_build_graph/dump_tensor/kernels/orchestration/dump_tensor_orch.cpp)

Also applies to: 410-411

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/dfx/tensor-dump.md` around lines 280 - 283, The relative links in
docs/dfx/tensor-dump.md at lines 280, 282, and 410 are using single-level parent
directory traversal (../) but need to traverse two levels to correctly resolve
to the tests directory at the repository root. Change all instances of
`../tests/` to `../../tests/` in the markdown link references to fix the broken
paths that currently would resolve to the non-existent docs/tests/ location
instead of the actual tests/ directory.

Comment thread docs/testing.md
| `--skip-golden` | | false | Skip golden comparison (for benchmarking) |
| `--enable-l2-swimlane [PERF_LEVEL]` | | `0` | Enable L2 swimlane collection on first round only. The flag takes an integer perf_level 0–4 (bare = 4); see [docs/dfx/l2-swimlane-profiling.md](dfx/l2-swimlane-profiling.md#31-enable-l2-swimlane) for the level table. Each test case gets its own `outputs/<case>_<ts>/` directory under which `l2_swimlane_records.json` lands; parallel runs never collide. |
| `--dump-tensor` | | false | Dump tensors plus scalar args into unified runtime artifacts |
| `--dump-args` | | false | Dump tensors plus scalar args into unified runtime artifacts |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the real default for --dump-args as 0, not false.

Line 108 currently states a boolean default, but this option is level-based (0/1/2/3) with default 0.

Suggested fix
-| `--dump-args` | | false | Dump tensors plus scalar args into unified runtime artifacts |
+| `--dump-args` | | `0` | Dump tensors plus scalar args into unified runtime artifacts (bare flag = `1`; supports `0/1/2/3`) |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| `--dump-args` | | false | Dump tensors plus scalar args into unified runtime artifacts |
| `--dump-args` | | `0` | Dump tensors plus scalar args into unified runtime artifacts (bare flag = `1`; supports `0/1/2/3`) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/testing.md` at line 108, The `--dump-args` option documentation in the
table is incorrectly showing `false` as the default value, but this option is
level-based with possible values 0, 1, 2, or 3, where the actual default is 0.
Update the default column for the `--dump-args` row to show `0` instead of
`false` to accurately document that this is a numeric level option, not a
boolean flag.

Comment on lines +1419 to 1421
help="Dump per-task args at runtime. Level: 0=off, 1=partial (only "
"tasks marked via Arg::dump(...), default when given without a value), 2=full (all tasks).",
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align standalone --dump-args help text with supported levels.

Line 1419 help omits level 3 (full_json_only), which makes standalone UX inconsistent with the pytest surface and docs.

Suggested fix
-            help="Dump per-task args at runtime. Level: 0=off, 1=partial (only "
-            "tasks marked via Arg::dump(...), default when given without a value), 2=full (all tasks).",
+            help="Dump per-task args at runtime. Level: 0=off, 1=partial (only "
+            "tasks marked via Arg::dump(...), default when given without a value), "
+            "2=full (all tasks), 3=full_json_only (all tasks, JSON metadata only, no .bin payload).",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
help="Dump per-task args at runtime. Level: 0=off, 1=partial (only "
"tasks marked via Arg::dump(...), default when given without a value), 2=full (all tasks).",
)
help="Dump per-task args at runtime. Level: 0=off, 1=partial (only "
"tasks marked via Arg::dump(...), default when given without a value), "
"2=full (all tasks), 3=full_json_only (all tasks, JSON metadata only, no .bin payload).",
)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@simpler_setup/scene_test.py` around lines 1419 - 1421, The help text for the
--dump-args argument is incomplete and omits the level 3 option (full_json_only)
from its description. Update the help string in the add_argument call for
--dump-args to include all supported dump levels: add documentation for level 3
(full_json_only) to the existing descriptions of levels 0 (off), 1 (partial),
and 2 (full) so that the help text accurately reflects all available options.

Comment on lines +1696 to 1698
if args.dump_args:
common.append("--dump-args")
if args.enable_dep_gen:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the selected dump level when forwarding --dump-args to child runs.

Line 1697 currently passes a bare flag, so argparse resolves it to const=1 in subprocesses. A parent invocation like --dump-args 2/3 is silently downgraded to level 1.

Suggested fix
-    if args.dump_args:
-        common.append("--dump-args")
+    if args.dump_args:
+        common += ["--dump-args", str(args.dump_args)]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@simpler_setup/scene_test.py` around lines 1696 - 1698, The current code at
the dump_args check appends only the bare flag `--dump-args` to the common list
without preserving its value. When a user specifies a dump level like
`--dump-args 2` or `--dump-args 3`, this value is lost because subprocesses only
receive the flag and argparse defaults to const=1. Modify the condition around
args.dump_args to check if the value is greater than its default/const value and
append both the flag and the actual value to the common list to preserve the
user's specified dump level in child process invocations.

@vegetabledoww vegetabledoww force-pushed the feature/rename-dump-args-clean branch from cd63d98 to 20891d2 Compare June 17, 2026 08:10
Align the focused PR792/PR966 dump flow on the new dump-args naming across CLI,
runtime artifacts, viewer, tests, CI, and user-facing docs while keeping the
underlying compatibility fields intact.
@vegetabledoww vegetabledoww force-pushed the feature/rename-dump-args-clean branch from 20891d2 to 5155e6a Compare June 17, 2026 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Code Health] Rename dump-tensor public surface to dump-args across CLI, artifacts, tests, CI, and docs

1 participant