Skip to content

Fix: unify build/run pto-isa version in CI#1078

Merged
ChaoWao merged 1 commit into
hw-native-sys:mainfrom
ChaoZheng109:fix/issue-1077-unified-pto-isa-build
Jun 17, 2026
Merged

Fix: unify build/run pto-isa version in CI#1078
ChaoWao merged 1 commit into
hw-native-sys:mainfrom
ChaoZheng109:fix/issue-1077-unified-pto-isa-build

Conversation

@ChaoZheng109

Copy link
Copy Markdown
Collaborator

Summary

Onboard a2a3 host_runtime.so embeds the pto-isa SDMA headers
(comm_hccl.cpp's #include "pto/npu/comm/.../sdma_workspace_manager.hpp"),
but it was built during pip install against an unpinned pto-isa (HEAD,
or a stale clone left on the self-hosted runner), while scene tests pin
pto-isa via --pto-isa-commit. So in one CI run host_runtime.so and the
test-time kernels could be built against two different pto-isa revisions.
Because pto-isa HEAD has since moved sdma_workspace_manager.hpp out of the
pto/npu/comm/... path, an unpinned fresh clone would now also fail to
compile the a2a3 host outright.

This makes both ends use the single ci.yml env.PTO_ISA_COMMIT — bump
that one line and build + run update together.

  • build_runtimes.py: add --pto-isa-commit, pass it to
    ensure_pto_isa_root(commit=...) for the a2a3 host build.
  • CMakeLists.txt: add SIMPLER_PTO_ISA_COMMIT cache var (mirrors
    SIMPLER_PTO_CLONE_PROTOCOL), threaded into the build_runtimes target
    only when set.
  • .github/workflows/ci.yml: onboard pip install passes the commit via
    --config-settings=cmake.define.SIMPLER_PTO_ISA_COMMIT="${PTO_ISA_COMMIT}";
    the requires_hardware pytest steps also pin --pto-isa-commit so
    conftest doesn't reset build/pto-isa back to HEAD before the a2a3 C++ UT
    host rebuild.
  • docs/developer-guide.md: document the pto-isa-commit rebuild trigger.

Note: the pinned commit value is unchanged (ddafa8da…); this only makes
the build honor the same value the tests already use.

Scope / non-goals

This is the build-hygiene problem found while investigating #1067 — it is
not that issue's root cause (which is a device-side CANN op runtime
failure of aclnnShmemSdmaStarsQueryGetWorkspaceSize, independent of the
pto-isa header revision). The @pytest.mark.skip on
test_sdma_async_completion_demo therefore stays in place.

Testing

  • pre-commit hooks pass (markdownlint, ruff, pyright, yaml, headers)
  • CMake conditional-arg expansion verified (empty cache var → flag
    omitted, 0 stray args; set → --pto-isa-commit <sha>)
  • python -m py_compile build_runtimes.py, YAML parse
  • Full effect is only observable on the self-hosted onboard runners
    (a2a3 / a5) — verify the onboard CI jobs build host_runtime against the
    pinned commit.

Fixes #1077
Related: #1067

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the ability to pin the pto-isa clone to a specific commit when building the onboard a2a3 host runtime, ensuring consistency between the host runtime and test-time kernels. The feedback recommends improving the CMake configuration by explicitly checking for an empty string to avoid unexpected boolean evaluations with commit SHAs, and adding strict validation in Python to fail fast if the commit string contains unexpected whitespace.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread CMakeLists.txt Outdated
Comment thread simpler_setup/build_runtimes.py
@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 331b6939-bf66-4fa5-9f1a-b39bfbee5ec4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The PR fixes a CI build/test pto-isa version mismatch by introducing a --pto-isa-commit parameter in build_runtimes.py, a SIMPLER_PTO_ISA_COMMIT CMake cache variable, and CI workflow updates that pass PTO_ISA_COMMIT through pip install and pytest invocations for a2a3 and a5 hardware unit test and onboard scene test jobs.

Changes

pto-isa commit pinning across build and test

Layer / File(s) Summary
build_runtimes.py: --pto-isa-commit parameter and ensure_pto_isa_root forwarding
simpler_setup/build_runtimes.py
Adds pto_isa_commit: Optional[str] = None to build_all(), forwards it to ensure_pto_isa_root(commit=pto_isa_commit, ...) for the a2a3 path, adds a --pto-isa-commit CLI argument, and threads the parsed value into build_all().
CMakeLists.txt: SIMPLER_PTO_ISA_COMMIT cache variable and build_runtimes forwarding
CMakeLists.txt
Introduces SIMPLER_PTO_ISA_COMMIT cache variable, conditionally constructs a --pto-isa-commit <sha> argument when non-empty, and appends it to the build_runtimes custom target command.
CI workflow and developer docs: propagate PTO_ISA_COMMIT through pip and pytest
.github/workflows/ci.yml, docs/developer-guide.md
Documents PTO_ISA_COMMIT as single source of truth; updates pip install steps in ut-a2a3, st-onboard-a2a3, ut-a5, and st-onboard-a5 jobs to pass SIMPLER_PTO_ISA_COMMIT via --config-settings=cmake.define; adds --pto-isa-commit to pytest and task-submit invocations; adds a developer guide rebuild note.

Sequence Diagram(s)

sequenceDiagram
  participant CI as CI env (PTO_ISA_COMMIT)
  participant pip as pip install
  participant cmake as CMakeLists.txt
  participant brt as build_runtimes.py
  participant so as host_runtime.so
  participant test as pytest / task-submit

  CI->>pip: --config-settings cmake.define.SIMPLER_PTO_ISA_COMMIT=sha
  pip->>cmake: SIMPLER_PTO_ISA_COMMIT=sha
  cmake->>brt: --pto-isa-commit sha
  brt->>so: ensure_pto_isa_root(commit=sha, clone_protocol=...)
  Note over so: compiled against pinned pto-isa sha
  CI->>test: --pto-isa-commit sha
  Note over test: kernels checked out to same sha
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • hw-native-sys/simpler#879: Also modifies .github/workflows/ci.yml to pass --pto-isa-commit ${{ env.PTO_ISA_COMMIT }} to onboard scene test commands, directly overlapping with the CI pinning changes in this PR.

Poem

🐰 A rabbit once built with a wandering HEAD,
Its runtime and tests on two branches were led.
Now PTO_ISA_COMMIT rules over them all—
One pin for the build, one pin for the call.
No mismatch in headers, no mystery include,
The bunny hops forward in tidy compile mood! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix: unify build/run pto-isa version in CI' directly and concisely describes the main change: unifying the pto-isa version used during both build and run phases in CI.
Description check ✅ Passed The description comprehensively explains the problem (unpinned pto-isa during build vs pinned during tests), the motivation (recent pto-isa HEAD changes), and the solution across all four modified files.
Linked Issues check ✅ Passed All coding requirements from issue #1077 are met: build_runtimes.py adds --pto-isa-commit support, CMakeLists.txt adds SIMPLER_PTO_ISA_COMMIT cache variable, ci.yml passes the pinned commit via --config-settings and pytest, and docs are updated.
Out of Scope Changes check ✅ Passed All changes are directly scoped to unifying pto-isa versions across build and run phases as specified in issue #1077; documentation updates are appropriate supporting changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ci.yml:
- Around line 12-17: The onboard a2a3 DFX smoke pytest commands are not passing
the `--pto-isa-commit` flag when running pytest, which causes conftest.py to
call ensure_pto_isa_root(update_if_exists=True) and potentially reset
build/pto-isa to HEAD, breaking the single-source-of-truth pin at
PTO_ISA_COMMIT. Locate the onboard a2a3 DFX smoke pytest step(s) in the workflow
and add the `--pto-isa-commit` flag with the PTO_ISA_COMMIT environment variable
value to all pytest invocations in those steps to ensure the pinned revision is
consistently used.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2ea6dc79-54fa-480b-95de-57d8b68aae44

📥 Commits

Reviewing files that changed from the base of the PR and between c4b0aac and eec10d6.

📒 Files selected for processing (4)
  • .github/workflows/ci.yml
  • CMakeLists.txt
  • docs/developer-guide.md
  • simpler_setup/build_runtimes.py

Comment thread .github/workflows/ci.yml
@ChaoZheng109 ChaoZheng109 force-pushed the fix/issue-1077-unified-pto-isa-build branch 2 times, most recently from 12c6483 to d3415b5 Compare June 17, 2026 08:42
@ChaoZheng109

Copy link
Copy Markdown
Collaborator Author

Addressed the bot review feedback (pushed as a squashed update):

  • CodeRabbit (Major) — DFX smokes bypass the pin: ✅ Fixed. The a2a3 DFX smoke steps (dep_gen, l2_swimlane, pmu, tensor_dump) ran pytest without --pto-isa-commit, so conftest's ensure_pto_isa_root(update_if_exists=True) could reset build/pto-isa to HEAD and reintroduce host-runtime/kernel drift within the same job. Pinned all of them — both the onboard (ssh) and the sim mirror (https) — to env.PTO_ISA_COMMIT.
  • Gemini (medium) — CMake if(var) truthiness: ✅ Fixed. Changed to if(NOT SIMPLER_PTO_ISA_COMMIT STREQUAL "") and quoted the expansion, so a numeric/boolean-like SHA can't be misread.
  • Gemini (medium) — strip/validate whitespace in build_runtimes.py: ⏭️ Skipped. The premise (silent .strip() normalization) doesn't apply — the code doesn't strip anything. The value comes from the single-source env.PTO_ISA_COMMIT and a malformed commit already fails fast at git checkout; adding speculative input validation here is gold-plating.

Note: the earlier ut / ut-a2a3 red was the pre-existing test_scope_stats_collector duplicate-target collision on main (c4b0aac2), unrelated to this PR — being fixed in #1080.

Onboard a2a3 host_runtime embeds the pto-isa SDMA headers (comm_hccl.cpp),
but it was built during `pip install` against an unpinned pto-isa (HEAD or a
stale runner clone), while scene tests pin pto-isa via `--pto-isa-commit`. So
host_runtime.so and the test-time kernels could use two different pto-isa
revisions in one CI run. Since pto-isa HEAD has moved
sdma_workspace_manager.hpp, an unpinned fresh clone would also fail to
compile the a2a3 host.

Drive both build and run from the single ci.yml env.PTO_ISA_COMMIT:

- build_runtimes.py: add --pto-isa-commit, pass to
  ensure_pto_isa_root(commit=...) for the a2a3 host build
- CMakeLists.txt: add SIMPLER_PTO_ISA_COMMIT cache var, thread it into the
  build_runtimes target when set (guarded with `NOT ... STREQUAL ""` and a
  quoted expansion so a numeric/boolean-like SHA can't be misread by if())
- ci.yml: onboard `pip install` passes the commit via --config-settings;
  the requires_hardware pytest steps and all a2a3 DFX smoke steps (dep_gen,
  l2_swimlane, pmu, tensor_dump — onboard and sim) also pin
  --pto-isa-commit so conftest does not reset build/pto-isa to HEAD and
  reintroduce drift within the same job
- docs/developer-guide.md: document the pto-isa-commit rebuild trigger

Fixes hw-native-sys#1077
@ChaoZheng109 ChaoZheng109 force-pushed the fix/issue-1077-unified-pto-isa-build branch from d3415b5 to 7456671 Compare June 17, 2026 08:45
@ChaoWao ChaoWao merged commit 966897c into hw-native-sys:main Jun 17, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Code Health] CI: onboard a2a3 host_runtime built against unpinned pto-isa diverges from test-time --pto-isa-commit

2 participants