Fix: unify build/run pto-isa version in CI#1078
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the ability to pin the pto-isa clone to a specific commit when building the onboard a2a3 host runtime, ensuring consistency between the host runtime and test-time kernels. The feedback recommends improving the CMake configuration by explicitly checking for an empty string to avoid unexpected boolean evaluations with commit SHAs, and adding strict validation in Python to fail fast if the commit string contains unexpected whitespace.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe PR fixes a CI build/test pto-isa version mismatch by introducing a Changespto-isa commit pinning across build and test
Sequence Diagram(s)sequenceDiagram
participant CI as CI env (PTO_ISA_COMMIT)
participant pip as pip install
participant cmake as CMakeLists.txt
participant brt as build_runtimes.py
participant so as host_runtime.so
participant test as pytest / task-submit
CI->>pip: --config-settings cmake.define.SIMPLER_PTO_ISA_COMMIT=sha
pip->>cmake: SIMPLER_PTO_ISA_COMMIT=sha
cmake->>brt: --pto-isa-commit sha
brt->>so: ensure_pto_isa_root(commit=sha, clone_protocol=...)
Note over so: compiled against pinned pto-isa sha
CI->>test: --pto-isa-commit sha
Note over test: kernels checked out to same sha
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Around line 12-17: The onboard a2a3 DFX smoke pytest commands are not passing
the `--pto-isa-commit` flag when running pytest, which causes conftest.py to
call ensure_pto_isa_root(update_if_exists=True) and potentially reset
build/pto-isa to HEAD, breaking the single-source-of-truth pin at
PTO_ISA_COMMIT. Locate the onboard a2a3 DFX smoke pytest step(s) in the workflow
and add the `--pto-isa-commit` flag with the PTO_ISA_COMMIT environment variable
value to all pytest invocations in those steps to ensure the pinned revision is
consistently used.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2ea6dc79-54fa-480b-95de-57d8b68aae44
📒 Files selected for processing (4)
.github/workflows/ci.ymlCMakeLists.txtdocs/developer-guide.mdsimpler_setup/build_runtimes.py
12c6483 to
d3415b5
Compare
|
Addressed the bot review feedback (pushed as a squashed update):
Note: the earlier |
Onboard a2a3 host_runtime embeds the pto-isa SDMA headers (comm_hccl.cpp), but it was built during `pip install` against an unpinned pto-isa (HEAD or a stale runner clone), while scene tests pin pto-isa via `--pto-isa-commit`. So host_runtime.so and the test-time kernels could use two different pto-isa revisions in one CI run. Since pto-isa HEAD has moved sdma_workspace_manager.hpp, an unpinned fresh clone would also fail to compile the a2a3 host. Drive both build and run from the single ci.yml env.PTO_ISA_COMMIT: - build_runtimes.py: add --pto-isa-commit, pass to ensure_pto_isa_root(commit=...) for the a2a3 host build - CMakeLists.txt: add SIMPLER_PTO_ISA_COMMIT cache var, thread it into the build_runtimes target when set (guarded with `NOT ... STREQUAL ""` and a quoted expansion so a numeric/boolean-like SHA can't be misread by if()) - ci.yml: onboard `pip install` passes the commit via --config-settings; the requires_hardware pytest steps and all a2a3 DFX smoke steps (dep_gen, l2_swimlane, pmu, tensor_dump — onboard and sim) also pin --pto-isa-commit so conftest does not reset build/pto-isa to HEAD and reintroduce drift within the same job - docs/developer-guide.md: document the pto-isa-commit rebuild trigger Fixes hw-native-sys#1077
d3415b5 to
7456671
Compare
Summary
Onboard a2a3
host_runtime.soembeds the pto-isa SDMA headers(
comm_hccl.cpp's#include "pto/npu/comm/.../sdma_workspace_manager.hpp"),but it was built during
pip installagainst an unpinned pto-isa (HEAD,or a stale clone left on the self-hosted runner), while scene tests pin
pto-isa via
--pto-isa-commit. So in one CI runhost_runtime.soand thetest-time kernels could be built against two different pto-isa revisions.
Because pto-isa HEAD has since moved
sdma_workspace_manager.hppout of thepto/npu/comm/...path, an unpinned fresh clone would now also fail tocompile the a2a3 host outright.
This makes both ends use the single
ci.ymlenv.PTO_ISA_COMMIT— bumpthat one line and build + run update together.
build_runtimes.py: add--pto-isa-commit, pass it toensure_pto_isa_root(commit=...)for the a2a3 host build.CMakeLists.txt: addSIMPLER_PTO_ISA_COMMITcache var (mirrorsSIMPLER_PTO_CLONE_PROTOCOL), threaded into thebuild_runtimestargetonly when set.
.github/workflows/ci.yml: onboardpip installpasses the commit via--config-settings=cmake.define.SIMPLER_PTO_ISA_COMMIT="${PTO_ISA_COMMIT}";the
requires_hardwarepytest steps also pin--pto-isa-commitsoconftestdoesn't resetbuild/pto-isaback to HEAD before the a2a3 C++ UThost rebuild.
docs/developer-guide.md: document the pto-isa-commit rebuild trigger.Note: the pinned commit value is unchanged (
ddafa8da…); this only makesthe build honor the same value the tests already use.
Scope / non-goals
This is the build-hygiene problem found while investigating #1067 — it is
not that issue's root cause (which is a device-side CANN op runtime
failure of
aclnnShmemSdmaStarsQueryGetWorkspaceSize, independent of thepto-isa header revision). The
@pytest.mark.skipontest_sdma_async_completion_demotherefore stays in place.Testing
pre-commithooks pass (markdownlint, ruff, pyright, yaml, headers)omitted, 0 stray args; set →
--pto-isa-commit <sha>)python -m py_compile build_runtimes.py, YAML parse(a2a3 / a5) — verify the onboard CI jobs build host_runtime against the
pinned commit.
Fixes #1077
Related: #1067