chore: experiment with Bazel Remote Execution (BRE) on Namespace #10579
basvandijk wants to merge 68 commits
Pull request overview
This PR introduces an experimental GitHub Actions path to run bazel test using Namespace Bazel Remote Execution (BRE), including automation to build/mirror/optimize a worker image and keep the workflow pinned to an immutable digest.
Changes:
- Adds a new experimental workflow to run `bazel test` on Namespace runners with remote execution enabled.
- Extends the container autobuild workflow with jobs to mirror `ic-build` into `nscr.io`, optimize it for BRE, and automatically update the pinned worker-image digest used by the new workflow.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| .github/workflows/container-autobuild.yml | Adds jobs to create an optimized Namespace BRE worker image and auto-update the pinned digest reference in workflows. |
| .github/workflows/bre-namespace-test.yml | New opt-in workflow to run bazel test using Namespace remote execution with a pinned worker image. |
Add a new 'bre-namespace-test.yml' workflow that runs 'bazel test' on Namespace runners using Bazel Remote Execution (BRE). Actions execute on Namespace workers booted from a custom worker image (a mirror of ic-build). Extend 'container-autobuild.yml' to mirror the freshly built ic-build image into nscr.io, optimize it for BRE, and pin the resulting digest. These jobs are decoupled from the production image-reference update so an early-access BRE failure can never block it.
Addresses Copilot review: 'nsc bazel execution setup' writes short-lived credentials into the bazelrc, so 'cat'-ing it could leak auth material into the Actions logs.
…update job
- Use the same fully-qualified nscr.io ref for the upload destination and the digest lookup, so the inspected tag is guaranteed to exist.
- Guard bre-worker-image and bre-namespace-test against fork PRs via head.repo.full_name == github.repository (matching ci-kickoff.yml), since both run on privileged Namespace runners with pre-authenticated nsc.
- Drop update-image-references from update-worker-reference's needs for true decoupling; the existing 'git pull --rebase' absorbs any concurrent push.
Force-pushed from 27f3f10 to 95ee442.
…ac895bdd550cac7bacb9dad553bae ic-build: sha256:f4c6c7e0e16da470cba7ebceb0145f588d5fd4859c04acfa607bee475ecfa914 ic-dev: sha256:2f98d344d708a1ae70938d5e777a1f141f7f2a9545687653f407a405eb1a27ea
Run URL: https://github.com/dfinity/ic/actions/runs/28442091939 New container images with tag: |
update-image-references now depends on bre-worker-image (default needs semantics) and pins the worker digest unconditionally in the same commit, dropping the cancelled()-guard and the separate update-worker-reference job. Waiting on bre-worker-image before committing/pushing avoids the concurrency cancellation seen earlier.
nsc base-image upload prepends $NSC_CONTAINER_REGISTRY (nscr.io/<tenant>) to a relative name. Passing the fully-qualified nscr.io/dfinity ref caused a double-prefixed push (nscr.io/<tenant>/nscr.io/dfinity/...) and a 401 on the digest lookup. Derive the registry from $NSC_CONTAINER_REGISTRY, upload a relative name, pin the resulting full ref, and match any tenant in the pin sed. Also fixes a stale comment referencing the removed update-worker-reference job.
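The double-prefix failure and the tenant-agnostic pin described above can be modelled in a few lines. This is a minimal Python sketch, not the workflow's actual shell: `upload_destination` only models the prepending behaviour the commit message describes, `example-tenant` is a made-up tenant, and the `nscr.io/<tenant>/ic-build-worker@sha256:<digest>` shape of the pinned ref is an assumption.

```python
import re


def upload_destination(registry: str, name: str) -> str:
    """Model of the behaviour described above: the registry is always prepended."""
    return f"{registry.rstrip('/')}/{name}"


# Assumed shape of the pinned ref; `[^/\s]+` accepts any tenant segment.
PIN_RE = re.compile(r"nscr\.io/[^/\s]+/ic-build-worker@sha256:[0-9a-f]{64}")


def repin(workflow_text: str, new_ref: str) -> str:
    """Replace every pinned worker ref, whatever tenant it names, with `new_ref`."""
    # A function replacement avoids backslash/group expansion in `new_ref`.
    return PIN_RE.sub(lambda _match: new_ref, workflow_text)


REGISTRY = "nscr.io/example-tenant"  # hypothetical $NSC_CONTAINER_REGISTRY value

# Passing a fully-qualified ref yields the double prefix from the commit message.
double_prefixed = upload_destination(REGISTRY, "nscr.io/dfinity/ic-build-worker")

# Passing a relative name yields the ref that is later pinned.
full_ref = upload_destination(REGISTRY, "ic-build-worker")
```

Here `double_prefixed` comes out as `nscr.io/example-tenant/nscr.io/dfinity/ic-build-worker`, while `full_ref` is the single-prefixed `nscr.io/example-tenant/ic-build-worker`.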
…ions rules_rust's `_symlink_sysroot_tree` iterated `target.files` instead of its `target_files` argument, so the linker target's runfiles (rust-lld's bundled `gcc-ld/*` self-contained linker wrappers, e.g. `gcc-ld/ld.lld`) were never symlinked into the generated sysroot and thus never became declared inputs of the Rustc actions. rustc defaults to lld on x86_64-unknown-linux-gnu (it links via `-fuse-ld=lld -B<sysroot>/lib/rustlib/<target>/bin/gcc-ld`). Local (non-sandboxed) builds still found gcc-ld on disk, but Bazel Remote Execution (Namespace BRE) ships only an action's declared inputs, so every Rustc link action (starting with the bootstrap process_wrapper) failed with: "the self-contained linker was requested, but it wasn't found in the target's sysroot, or in rustc's sysroot".
…e-bazel-remote-execution
…e-bazel-remote-execution
The previous version of this patch switched _symlink_sysroot_tree to iterate only the linker target's runfiles (rust-lld's gcc-ld/* wrappers), which dropped the rust-lld binary at lib/rustlib/<target>/bin/rust-lld that rustc invokes directly to link wasm32 canisters, causing 'linker `rust-lld` not found' under remote execution. Symlink the union of the linker target's files (the rust-lld binary, used directly for wasm32) and its runfiles (the gcc-ld/* wrappers, used by rustc's default lld on x86_64-unknown-linux-gnu) so both link under remote execution.
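Why the union matters can be shown with a toy model. The sketch below is plain Python, not Starlark and not the rules_rust API: `LinkerTarget` and `sysroot_entries` are hypothetical names, and the paths simply follow the layout quoted in the commit messages.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LinkerTarget:
    """Hypothetical stand-in for a Bazel target: direct files plus runfiles."""
    files: Tuple[str, ...]
    runfiles: Tuple[str, ...] = ()


def sysroot_entries(target: LinkerTarget) -> List[str]:
    """Ordered, de-duplicated union of the target's files and its runfiles."""
    return list(dict.fromkeys(target.files + target.runfiles))


TRIPLE = "x86_64-unknown-linux-gnu"
RUST_LLD = f"lib/rustlib/{TRIPLE}/bin/rust-lld"
GCC_LD = f"lib/rustlib/{TRIPLE}/bin/gcc-ld/ld.lld"

rust_lld_target = LinkerTarget(files=(RUST_LLD,), runfiles=(GCC_LD,))

# Files only: the gcc-ld wrapper is missing (the original x86_64 failure).
files_only = list(rust_lld_target.files)
# Runfiles only: the rust-lld binary is missing (the wasm32 regression).
runfiles_only = list(rust_lld_target.runfiles)
# Union: both link paths are declared inputs.
both = sysroot_entries(rust_lld_target)
```

Only `both` contains the `rust-lld` binary and the `gcc-ld/ld.lld` wrapper together, which is what remote execution needs since it ships only declared inputs.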
…e-bazel-remote-execution
For each pocket-ic test tagged 'requires-network', add a '-darwin' copy that is only compatible with arm64-darwin and keeps the 'requires-network' tag (needed to work around the sporadic 'Failed to bind PocketIC server to address 127.0.0.1:0' on Apple Silicon). Drop 'requires-network' from the originals and make them incompatible with arm64-darwin, so the darwin variants run there instead while the originals keep running on x86_64-linux and arm64-linux.
rules_rust infers the crate root from a src whose basename matches the target name; the '-darwin' rename breaks this for the multi-source 'tests-darwin' and 'unix-darwin' targets. Set crate_root explicitly for them (the single-source and icp_features variants don't need it).
The artifact_bundle rule produces a tree artifact whose children are absolute
symlinks into the execroot (via 'ln -s "$(realpath ...)"'). Those symlinks only
resolve on the machine that produced them. Under Bazel Remote Execution with
build-without-the-bytes, the bundle is produced remotely and validated locally,
where the symlink targets aren't materialized, so Bazel fails with 'dangling
symbolic link' (e.g. for the testonly types-test.gz, which no local action pulls
down).
Set execution_requirements = {"no-remote": ""} on the action so every
artifact_bundle target builds locally and is not served from the remote cache,
without needing per-target tags.
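The failure mode is easy to reproduce outside Bazel. The following Python sketch is an illustration only: the directory names are made up, and deleting the source file stands in for "the symlink target was never materialized on this machine".

```python
import os
import tempfile


def make_bundle(execroot: str, bundle: str, name: str) -> str:
    """Mimic `ln -s "$(realpath ...)"`: an absolute symlink into the execroot."""
    os.makedirs(bundle, exist_ok=True)
    link = os.path.join(bundle, name)
    os.symlink(os.path.realpath(os.path.join(execroot, name)), link)
    return link


def is_dangling(path: str) -> bool:
    """True if `path` is a symlink whose target does not exist."""
    return os.path.islink(path) and not os.path.exists(path)


def demo():
    with tempfile.TemporaryDirectory() as root:
        execroot = os.path.join(root, "execroot")
        os.makedirs(execroot)
        source = os.path.join(execroot, "types-test.gz")
        with open(source, "wb") as handle:
            handle.write(b"payload")

        link = make_bundle(execroot, os.path.join(root, "bundle"), "types-test.gz")
        on_producer = is_dangling(link)   # target exists where it was built

        os.remove(source)                 # stand-in for an unmaterialized target
        on_consumer = is_dangling(link)   # same link, target now absent
        return on_producer, on_consumer
```

`demo()` reports that the link resolves on the "producer" and dangles once the target is absent, which is the situation the local validation step runs into.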
Add the ic-test-utilities-privileges crate exposing run_as_nobody_if_root, which runs a test closure in a forked child that drops to the nobody user when the process is root, and in-process otherwise. Root holds CAP_DAC_OVERRIDE and therefore bypasses filesystem permission bits, so tests asserting PermissionDenied instead observe the operation succeeding and fail when run as root (e.g. on Bazel remote-execution workers). The helper forks, redirects TMPDIR/TEST_TMPDIR to a nobody-owned base, drops supplementary groups, gid, and uid to nobody, and runs the whole test body there; the child's outcome (including any panic message) is mirrored to the parent so #[should_panic] and its expected message keep working. Use it in the affected permission tests in rs/sys, the crypto service provider, and rs/state_manager.
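The fork-and-drop pattern can be sketched as follows. This is a Python stand-in for illustration, not the Rust crate's implementation: it omits the TMPDIR/TEST_TMPDIR redirection, reports any child failure as an `AssertionError` carrying the child's message, and assumes the `nobody` gid is 65534 like the uid.

```python
import os

NOBODY_ID = 65534  # Linux `nobody` uid; using it as the gid too is an assumption


def run_as_nobody_if_root(body):
    """Run `body` in-process unless we are root; then run it in an unprivileged child."""
    if os.geteuid() != 0:
        body()
        return

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        status = 0
        try:
            os.close(read_fd)
            os.setgroups([])      # drop supplementary groups first
            os.setgid(NOBODY_ID)  # then the gid
            os.setuid(NOBODY_ID)  # and finally the uid (cannot be undone)
            body()
        except BaseException as exc:
            status = 1
            os.write(write_fd, f"{type(exc).__name__}: {exc}".encode())
        finally:
            os._exit(status)      # never return into the parent's test harness

    # Parent: mirror the child's outcome, including its failure message.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        message = pipe.read().decode()
    _, wait_status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
        raise AssertionError(message or "child failed without a message")
```

The order of the drops matters: once the uid is no longer root, the process can no longer change its groups or gid, so `setuid` comes last.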
Force-pushed from a84bfd7 to 24982cf.
nix 0.24.3 configures `setgroups` out on Apple targets, so the `imp` module failed to compile on darwin. The helper's semantics (the Linux `nobody` uid 65534 and CAP_DAC_OVERRIDE) are Linux-specific anyway, so gate `imp` on `target_os = "linux"` and fall through to the in-process no-op on the other platforms.
The tests test_ensure_file_exists_and_is_writeable_fails_if_non_writeable and test_save_proposal_id_to_file_fails_if_write_fails assert PermissionDenied on a read-only file. Root holds CAP_DAC_OVERRIDE and bypasses filesystem permission bits, so under Bazel remote-execution (running as root) the operations succeed and the assertions fail. Wrap both test bodies in ic_test_utilities_privileges::run_as_nobody_if_root so they drop to the nobody user when the process is root, mirroring the treatment applied in 380255a.
This reverts commit 24982cf.
… variants" This reverts commit 20bc0f4.
…in variants" This reverts commit 6c83df8.
This test spawns a loopback HTTP server in the test process and has the PocketIC server make a canister HTTP outcall to it, which the remote executor cannot reach under Bazel Remote Execution. The bazel-test-bre job now sets POCKET_IC_SKIP_HTTP_LIVE_MODE_TEST via --test_env, and the test returns early when it is set. Other tests ignore the env var, so only this test is skipped and only on that job.
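The early return amounts to an environment check at the top of the test. A minimal Python sketch follows, assuming that "set" means the variable is present regardless of its value; the function name and return values are illustrative, and the real test is not written in Python.

```python
import os

SKIP_ENV = "POCKET_IC_SKIP_HTTP_LIVE_MODE_TEST"


def http_live_mode_test(environ=None):
    """Return early when the skip variable is present; otherwise run the test body."""
    environ = os.environ if environ is None else environ
    if SKIP_ENV in environ:
        return "skipped"
    # ... the loopback server and the canister HTTP outcall would go here ...
    return "ran"
```

Because only this test reads the variable, passing it through `--test_env` on one job leaves every other test and every other job unaffected.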
Overview
Experiment with Namespace Bazel Remote Execution (BRE): run `bazel test` on Namespace runners, with actions executed on remote Namespace workers booted from a custom worker image (a mirror of `ic-build`).
Changes
Extended:
`.github/workflows/container-autobuild.yml`
- `bre-worker-image` job (on the Namespace runner where `nsc` is pre-authenticated): mirrors the freshly built `ic-build` image, by digest, into the Namespace tenant registry (`$NSC_CONTAINER_REGISTRY`, i.e. `nscr.io/<tenant>/ic-build-worker`) via `nsc base-image upload`, resolves the pushed digest, and optimizes it for BRE via `nsc base-image optimize`. Same fork guard as the `bre-namespace-test.yml` job. `ic-build-image`: if it fails, that's a bug to fix.
- `update-image-references` job now also pins the worker image. It `needs` `bre-worker-image`, so its commit/push (which re-triggers the workflow and would otherwise cancel the in-flight optimize via `cancel-in-progress`) only happens once the optimize completes. It rewrites the pinned ref in `bre-namespace-test.yml` in the same commit as the `ic-build`/`ic-dev` ref and `TAG` updates. The pin `sed` is tenant-agnostic, so it self-corrects if the Namespace tenant ever changes.
New workflow:
`.github/workflows/bre-namespace-test.yml`
- Runs `bazel test` on a Namespace runner (`namespace-profile-amd64-linux-32x64`) using BRE. Targets default to `//...` and are overridable via a `workflow_dispatch` input.
- `nsc bazel execution setup` writes a bazelrc with the remote executor, cache and credentials; it is deliberately not printed, since it contains short-lived credentials.
- The worker image is selected via `--remote_default_exec_properties=container-image=...`.
- Explicit bazelrc configuration (`--noworkspace_rc` + explicit `.bazelrc.build` + the Namespace RBE bazelrc), mirroring the existing `bazel-test-arm64` job.
- Filters tests with `--test_tag_filters` and runs with `--keep_going`.
- Triggers: `workflow_dispatch`, pushes to `dev-gh-*`, or non-fork PRs labeled `CI_BRE`. Restricted to `dfinity/ic`; fork PRs are excluded because the job runs on a privileged Namespace runner with pre-authenticated `nsc`.
Notes / follow-ups
- `ic-build` rebuild (bump `ci/container/TAG`).
- Because `bre-worker-image` is required, a Namespace/BRE outage would block the production `ic-build`/`ic-dev` reference bump. This is a conscious trade-off (both must succeed).
- Not all of `//...` works under BRE (ic-os local-strategy targets, privileged/system tests); the broad `--test_tag_filters` exclusions and `--keep_going` reduce noise while iterating.
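The trigger conditions listed for `bre-namespace-test.yml` combine into a single predicate. The Python sketch below models it against GitHub's webhook payload shape (`pull_request.head.repo.full_name`, `pull_request.labels[].name`, `ref`); the function name is hypothetical, and the real workflow expresses this in YAML `on:`/`if:` clauses rather than code.

```python
from fnmatch import fnmatchcase


def should_run_bre(event_name: str, repository: str, payload: dict) -> bool:
    """Model of the job guard: dfinity/ic only, and never for fork PRs."""
    if repository != "dfinity/ic":
        return False
    if event_name == "workflow_dispatch":
        return True
    if event_name == "push":
        # fnmatch's `*` also matches '/', unlike GitHub's branch filter;
        # that difference is ignored in this sketch.
        return fnmatchcase(payload.get("ref", ""), "refs/heads/dev-gh-*")
    if event_name == "pull_request":
        pr = payload["pull_request"]
        same_repo = pr["head"]["repo"]["full_name"] == repository
        labels = {label["name"] for label in pr.get("labels", [])}
        return same_repo and "CI_BRE" in labels
    return False
```

The same-repository comparison is what keeps fork PRs off the privileged runner: a fork's head repository has a different `full_name`, so the predicate is false even when the `CI_BRE` label is present.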