chore: experiment with Bazel Remote Execution (BRE) on Namespace#10579

Draft
basvandijk wants to merge 68 commits into master from basvandijk/namespace-bazel-remote-execution

basvandijk (Collaborator) commented Jun 26, 2026

Overview

Experiment with Namespace Bazel Remote Execution (BRE): run bazel test on Namespace runners, with actions executed on remote Namespace workers booted from a custom worker image (a mirror of ic-build).

Changes

Extended: .github/workflows/container-autobuild.yml

  • New bre-worker-image job, running on the Namespace runner where nsc is pre-authenticated. It mirrors the freshly built ic-build image, by digest, into the Namespace tenant registry ($NSC_CONTAINER_REGISTRY, i.e. nscr.io/<tenant>/ic-build-worker) via nsc base-image upload, resolves the pushed digest, and optimizes the image for BRE via nsc base-image optimize. It uses the same fork guard as above.
  • It is treated as a required job, like ic-build-image: if it fails, that's a bug to fix.
  • The existing update-image-references job now also pins the worker image. It needs bre-worker-image, so that its commit/push only happens once the optimize completes; the push re-triggers the workflow and would otherwise cancel the in-flight optimize via cancel-in-progress. It rewrites the pinned ref in bre-namespace-test.yml in the same commit as the ic-build/ic-dev ref and TAG updates. The pin sed is tenant-agnostic, so it self-corrects if the Namespace tenant ever changes.
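The description calls the pin sed tenant-agnostic but does not show it. As a rough sketch of the idea (not the workflow's actual sed), the Rust function below rewrites any `nscr.io/<tenant>/ic-build-worker@sha256:<64 hex>` reference to a new ref, whatever the current tenant is. The function names, and the assumption that a tenant is a single path segment of ASCII letters, digits and `-`, are hypothetical.

```rust
/// Hypothetical helper: replace every pinned worker-image ref of the form
/// `nscr.io/<tenant>/ic-build-worker@sha256:<64 hex chars>` with `new_ref`,
/// regardless of which tenant is currently pinned.
fn repin_worker_image(text: &str, new_ref: &str) -> String {
    const PREFIX: &str = "nscr.io/";
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find(PREFIX) {
        // Copy everything before the candidate reference unchanged.
        out.push_str(&rest[..start]);
        let candidate = &rest[start..];
        if let Some(len) = pinned_ref_len(candidate) {
            out.push_str(new_ref);
            rest = &candidate[len..];
        } else {
            // Some other nscr.io image: keep the prefix and keep scanning.
            out.push_str(PREFIX);
            rest = &candidate[PREFIX.len()..];
        }
    }
    out.push_str(rest);
    out
}

/// Byte length of a pinned worker ref at the start of `s`, if there is one.
fn pinned_ref_len(s: &str) -> Option<usize> {
    const PREFIX: &str = "nscr.io/";
    const SUFFIX: &str = "/ic-build-worker@sha256:";
    let after_prefix = s.strip_prefix(PREFIX)?;
    // Assumed tenant shape: one segment of ASCII alphanumerics or '-'.
    let tenant_len = after_prefix
        .bytes()
        .take_while(|b| b.is_ascii_alphanumeric() || *b == b'-')
        .count();
    if tenant_len == 0 {
        return None;
    }
    let digest = after_prefix[tenant_len..].strip_prefix(SUFFIX)?;
    // A sha256 digest is exactly 64 hex characters.
    let hex_len = digest.bytes().take_while(|b| b.is_ascii_hexdigit()).count();
    (hex_len == 64).then(|| PREFIX.len() + tenant_len + SUFFIX.len() + 64)
}
```

Matching on the repository name and digest shape rather than on a literal tenant is what lets such a rewrite self-correct when the tenant changes, while unrelated `nscr.io` references are left untouched.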

New workflow: .github/workflows/bre-namespace-test.yml

  • Runs bazel test on a Namespace runner (namespace-profile-amd64-linux-32x64) using BRE. Targets default to //... and are overridable via a workflow_dispatch input.
  • Provisions the RBE cluster with nsc bazel execution setup, which writes a bazelrc containing the remote executor, cache and credentials. The bazelrc is deliberately not printed, since it contains short-lived credentials.
  • Routes actions to the custom worker image via --remote_default_exec_properties=container-image=....
  • Bypasses the DFINITY-internal Bazel cache/RE config (--noworkspace_rc + explicit .bazelrc.build + the Namespace RBE bazelrc), mirroring the existing bazel-test-arm64 job.
  • Excludes long/nightly/fuzz/large-system tests via --test_tag_filters and runs with --keep_going.
  • Opt-in while experimental: workflow_dispatch, pushes to dev-gh-*, or non-fork PRs labeled CI_BRE. Restricted to dfinity/ic; fork PRs are excluded because the job runs on a privileged Namespace runner with pre-authenticated nsc.
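The exclusions rely on Bazel's documented `--test_tag_filters` semantics: a `-`-prefixed tag excludes every test carrying it, and if any non-prefixed tags are listed, a test must carry at least one of them. A minimal Rust model of that rule follows; the tag names used with it are hypothetical, since the workflow's exact filter list isn't shown here.

```rust
/// Model of Bazel's `--test_tag_filters` selection rule: a test is selected
/// if it has none of the excluded (`-`-prefixed) tags and, when any included
/// tags are given, at least one of them.
fn selected_by_tag_filters(test_tags: &[&str], filters: &str) -> bool {
    let mut included: Vec<&str> = Vec::new();
    let mut excluded: Vec<&str> = Vec::new();
    for f in filters.split(',').map(str::trim).filter(|f| !f.is_empty()) {
        match f.strip_prefix('-') {
            Some(tag) => excluded.push(tag),
            None => included.push(f),
        }
    }
    let has = |tag: &str| test_tags.iter().any(|t| *t == tag);
    let no_excluded = excluded.iter().all(|&tag| !has(tag));
    // With only exclusions (as in this workflow), every other test is kept.
    let any_included = included.is_empty() || included.iter().any(|&tag| has(tag));
    no_excluded && any_included
}
```

For example, with a purely negative filter such as `-long_test,-fuzz_test` (hypothetical names), untagged tests still run and only tests carrying one of those tags are skipped.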

Notes / follow-ups

  • The worker-image ref in the test workflow starts as a placeholder digest; it is populated on the next ic-build rebuild (bump ci/container/TAG).
  • Because bre-worker-image is required, a Namespace/BRE outage would block the production ic-build/ic-dev reference bump. This is a conscious trade-off: both must succeed.
  • Expect rough edges running //... under BRE (ic-os local-strategy targets, privileged/system tests); the broad --test_tag_filters exclusions and --keep_going reduce noise while iterating.

@basvandijk basvandijk requested a review from Copilot June 26, 2026 12:28
@basvandijk basvandijk changed the title Experiment with Bazel Remote Execution (BRE) on Namespace chore: experiment with Bazel Remote Execution (BRE) on Namespace Jun 26, 2026
@github-actions github-actions Bot added the chore label Jun 26, 2026

Copilot AI left a comment

Pull request overview

This PR introduces an experimental GitHub Actions path to run bazel test using Namespace Bazel Remote Execution (BRE), including automation to build/mirror/optimize a worker image and keep the workflow pinned to an immutable digest.

Changes:

  • Adds a new experimental workflow to run bazel test on Namespace runners with remote execution enabled.
  • Extends the container autobuild workflow with jobs to mirror ic-build into nscr.io, optimize it for BRE, and automatically update the pinned worker-image digest used by the new workflow.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • .github/workflows/container-autobuild.yml: Adds jobs to create an optimized Namespace BRE worker image and auto-update the pinned digest reference in workflows.
  • .github/workflows/bre-namespace-test.yml: New opt-in workflow to run bazel test using Namespace remote execution with a pinned worker image.

Comment thread .github/workflows/bre-namespace-test.yml Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Comment thread .github/workflows/container-autobuild.yml Outdated
Comment thread .github/workflows/container-autobuild.yml Outdated
Comment thread .github/workflows/container-autobuild.yml Outdated
Comment thread .github/workflows/bre-namespace-test.yml Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Add a new 'bre-namespace-test.yml' workflow that runs 'bazel test' on Namespace runners using Bazel Remote Execution (BRE). Actions execute on Namespace workers booted from a custom worker image (a mirror of ic-build).

Extend 'container-autobuild.yml' to mirror the freshly built ic-build image into nscr.io, optimize it for BRE, and pin the resulting digest. These jobs are decoupled from the production image-reference update so an early-access BRE failure can never block it.

Addresses Copilot review: 'nsc bazel execution setup' writes short-lived credentials into the bazelrc, so 'cat'-ing it could leak auth material into the Actions logs.
…update job

- Use the same fully-qualified nscr.io ref for the upload destination and the digest lookup, so the inspected tag is guaranteed to exist.

- Guard bre-worker-image and bre-namespace-test against fork PRs via head.repo.full_name == github.repository (matching ci-kickoff.yml), since both run on privileged Namespace runners with pre-authenticated nsc.

- Drop update-image-references from update-worker-reference's needs for true decoupling; the existing 'git pull --rebase' absorbs any concurrent push.
@basvandijk basvandijk force-pushed the basvandijk/namespace-bazel-remote-execution branch from 27f3f10 to 95ee442 Compare June 28, 2026 12:15
…ac895bdd550cac7bacb9dad553bae

ic-build: sha256:f4c6c7e0e16da470cba7ebceb0145f588d5fd4859c04acfa607bee475ecfa914

ic-dev:   sha256:2f98d344d708a1ae70938d5e777a1f141f7f2a9545687653f407a405eb1a27ea

github-actions Bot commented Jun 28, 2026

Run URL: https://github.com/dfinity/ic/actions/runs/28442091939

New container images with tag: 22378bb2ad2621b518f4000afdb1ebbe793826b789c9ea988d61e863e46d4d95
ic-build: sha256:e9f95a42acbb5dd96f36d53037129842e16f2ec628ea38f09c9d2404cba2fdff
ic-dev: sha256:cb0b750d7254a4fa280b2f0d0a62ab05649fa9b8c24eafb59bb2dc040fd8dac2
ic-build-worker: nscr.io/c9ptjuknd7oc6/ic-build-worker@sha256:b3209ba49237175d9f4339daa4b0828f1eee0cf9bd8ccd193af7be4a9663d919

update-image-references now depends on bre-worker-image (default needs semantics) and pins the worker digest unconditionally in the same commit, dropping the cancelled()-guard and the separate update-worker-reference job. Waiting on bre-worker-image before committing/pushing avoids the concurrency cancellation seen earlier.

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Comment thread .github/workflows/bre-namespace-test.yml Outdated
Comment thread .github/workflows/bre-namespace-test.yml Outdated
Comment thread .github/workflows/container-autobuild.yml
nsc base-image upload prepends $NSC_CONTAINER_REGISTRY (nscr.io/<tenant>) to a relative name. Passing the fully-qualified nscr.io/dfinity ref caused a double-prefixed push (nscr.io/<tenant>/nscr.io/dfinity/...) and a 401 on the digest lookup. Derive the registry from $NSC_CONTAINER_REGISTRY, upload a relative name, pin the resulting full ref, and match any tenant in the pin sed. Also fixes a stale comment referencing the removed update-worker-reference job.
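A minimal model of the naming issue, assuming (as the commit states) that `nsc base-image upload` prepends `$NSC_CONTAINER_REGISTRY` to the destination it is given. The helper names are hypothetical; the registry value `nscr.io/c9ptjuknd7oc6` used with them is taken from the bot comment's `ic-build-worker` ref.

```rust
/// Hypothetical model: the upload destination is treated as relative to the
/// tenant registry, which is prepended to it.
fn upload_destination(registry: &str, name: &str) -> String {
    format!("{}/{}", registry.trim_end_matches('/'), name)
}

/// The fix: upload a relative name, then pin the full ref derived from the
/// same registry, so the pinned ref and the pushed image always agree.
fn pinned_worker_ref(registry: &str, name: &str, digest: &str) -> String {
    format!("{}@{}", upload_destination(registry, name), digest)
}
```

Passing an already-qualified name reproduces the double prefix (`nscr.io/<tenant>/nscr.io/dfinity/...`), whereas a relative `ic-build-worker` yields the expected `nscr.io/<tenant>/ic-build-worker`.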
…ions

rules_rust's `_symlink_sysroot_tree` iterated `target.files` instead of its `target_files` argument, so the linker target's runfiles (rust-lld's bundled `gcc-ld/*` self-contained linker wrappers, e.g. `gcc-ld/ld.lld`) were never symlinked into the generated sysroot and thus never became declared inputs of the Rustc actions.

rustc defaults to lld on x86_64-unknown-linux-gnu (it links via `-fuse-ld=lld -B<sysroot>/lib/rustlib/<target>/bin/gcc-ld`). Local (non-sandboxed) builds still found gcc-ld on disk, but Bazel Remote Execution (Namespace BRE) ships only an action's declared inputs, so every Rustc link action (starting with the bootstrap process_wrapper) failed with: "the self-contained linker was requested, but it wasn't found in the target's sysroot, or in rustc's sysroot".
Commit 7cde29f added --experimental_inmemory_dotd_files to counteract --noexperimental_inmemory_dotd_files in bazel/conf/.bazelrc.build. Commit 91ae530 stopped setting --noexperimental_inmemory_dotd_files, so the flag (and its comment) are no longer needed.
The previous version of this patch switched _symlink_sysroot_tree to iterate
only the linker target's runfiles (rust-lld's gcc-ld/* wrappers), which dropped
the rust-lld binary at lib/rustlib/<target>/bin/rust-lld that rustc invokes
directly to link wasm32 canisters, causing 'linker `rust-lld` not found' under
remote execution.

Symlink the union of the linker target's files (the rust-lld binary, used
directly for wasm32) and its runfiles (the gcc-ld/* wrappers, used by rustc's
default lld on x86_64-unknown-linux-gnu) so both link under remote execution.
For each pocket-ic test tagged 'requires-network', add a '-darwin' copy that is only compatible with arm64-darwin and keeps the 'requires-network' tag (needed to work around the sporadic 'Failed to bind PocketIC server to address 127.0.0.1:0' on Apple Silicon).

Drop 'requires-network' from the originals and make them incompatible with arm64-darwin, so the darwin variants run there instead while the originals keep running on x86_64-linux and arm64-linux.
rules_rust infers the crate root from a src whose basename matches the target name; the '-darwin' rename breaks this for the multi-source 'tests-darwin' and 'unix-darwin' targets. Set crate_root explicitly for them (the single-source and icp_features variants don't need it).
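The target split can be modeled as a small data transformation. This is a hypothetical Rust model of what the Bazel macro change does (the real change is in Starlark); the struct and field names are invented, and `only_on`/`not_on` stand in for Bazel's platform compatibility constraints.

```rust
/// Simplified, hypothetical stand-in for a Bazel test target.
#[derive(Debug, Clone, PartialEq)]
struct TestTarget {
    name: String,
    tags: Vec<String>,
    /// If set, the target is only compatible with this platform.
    only_on: Option<String>,
    /// Platforms the target is explicitly incompatible with.
    not_on: Vec<String>,
}

const DARWIN: &str = "arm64-darwin";
const REQUIRES_NETWORK: &str = "requires-network";

/// Splits a `requires-network` test into an original that drops the tag and
/// skips arm64-darwin, plus a `-darwin` copy that keeps the tag and runs only
/// on arm64-darwin. Other tests are returned unchanged.
fn split_for_darwin(t: &TestTarget) -> Vec<TestTarget> {
    if !t.tags.iter().any(|tag| tag == REQUIRES_NETWORK) {
        return vec![t.clone()];
    }
    let mut original = t.clone();
    original.tags.retain(|tag| tag != REQUIRES_NETWORK);
    original.not_on.push(DARWIN.to_string());

    let darwin = TestTarget {
        name: format!("{}-darwin", t.name),
        tags: t.tags.clone(),
        only_on: Some(DARWIN.to_string()),
        not_on: Vec::new(),
    };
    vec![original, darwin]
}
```

The renamed `-darwin` copies are also why crate-root inference by target name stops working for the multi-source targets.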
The artifact_bundle rule produces a tree artifact whose children are absolute
symlinks into the execroot (via 'ln -s "$(realpath ...)"'). Those symlinks only
resolve on the machine that produced them. Under Bazel Remote Execution with
build-without-the-bytes, the bundle is produced remotely and validated locally,
where the symlink targets aren't materialized, so Bazel fails with 'dangling
symbolic link' (e.g. for the testonly types-test.gz, which no local action pulls
down).

Set execution_requirements = {"no-remote": ""} on the action so every
artifact_bundle target builds locally and is not served from the remote cache,
without needing per-target tags.
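A small Unix-only Rust demonstration of the failure mode, using temporary directories instead of real execroots: an absolute link created on the "producer" side keeps pointing at the producer's path after only the tree (the links, not their targets) is copied elsewhere, so it dangles once that path is gone. The directory and function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};

/// Creates `<execroot>/out/types-test.gz` and a bundle directory whose entry
/// is an absolute symlink to it, as `ln -s "$(realpath ...)"` would produce.
fn make_bundle(execroot: &Path) -> io::Result<PathBuf> {
    let out = execroot.join("out");
    fs::create_dir_all(&out)?;
    let target = out.join("types-test.gz");
    fs::write(&target, b"payload")?;
    let bundle = execroot.join("bundle");
    fs::create_dir_all(&bundle)?;
    symlink(fs::canonicalize(&target)?, bundle.join("types-test.gz"))?;
    Ok(bundle)
}

/// Copies only the links (not their targets) into `<other_root>/bundle`,
/// roughly what another machine sees of a remotely produced tree artifact.
fn ship_tree(bundle: &Path, other_root: &Path) -> io::Result<PathBuf> {
    let dest = other_root.join("bundle");
    fs::create_dir_all(&dest)?;
    for entry in fs::read_dir(bundle)? {
        let entry = entry?;
        let link_target = fs::read_link(entry.path())?;
        symlink(link_target, dest.join(entry.file_name()))?;
    }
    Ok(dest)
}

/// True if `link` resolves to an existing file (`fs::metadata` follows symlinks).
fn resolves(link: &Path) -> bool {
    fs::metadata(link).is_ok()
}
```

Once the producer's execroot is gone, `fs::metadata` fails on the shipped link while `fs::symlink_metadata` still finds the link itself, which matches the "dangling symbolic link" error described above.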
Add the ic-test-utilities-privileges crate exposing run_as_nobody_if_root,
which runs a test closure in a forked child that drops to the nobody user when
the process is root, and in-process otherwise.

Root holds CAP_DAC_OVERRIDE and therefore bypasses filesystem permission bits,
so tests asserting PermissionDenied instead observe the operation succeeding and
fail when run as root (e.g. on Bazel remote-execution workers). The helper forks,
redirects TMPDIR/TEST_TMPDIR to a nobody-owned base, drops supplementary groups,
gid, and uid to nobody, and runs the whole test body there; the child's outcome
(including any panic message) is mirrored to the parent so #[should_panic] and
its expected message keep working.

Use it in the affected permission tests in rs/sys, the crypto service provider,
and rs/state_manager.
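The outcome-mirroring step can be illustrated in-process with `std::panic::catch_unwind`. This sketch deliberately omits the fork and the privilege drop (those need `libc` or `nix`), and `Outcome`, `capture` and `mirror` are hypothetical names rather than the crate's API; it only shows how a captured panic message can be re-raised so that `#[should_panic(expected = ...)]` still matches.

```rust
use std::panic::{self, UnwindSafe};

/// Outcome of a test body, as a child process might report it to its parent.
#[derive(Debug, PartialEq)]
enum Outcome {
    Passed,
    Panicked(String),
}

/// Runs `body` and records whether it panicked, keeping the panic message.
fn capture<F: FnOnce() + UnwindSafe>(body: F) -> Outcome {
    match panic::catch_unwind(body) {
        Ok(()) => Outcome::Passed,
        Err(payload) => {
            // `panic!("literal")` carries a &str; formatted panics carry a String.
            let msg = if let Some(s) = payload.downcast_ref::<&str>() {
                s.to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "non-string panic payload".to_string()
            };
            Outcome::Panicked(msg)
        }
    }
}

/// Re-raises a captured panic in the caller with the same message.
fn mirror(outcome: Outcome) {
    if let Outcome::Panicked(msg) = outcome {
        panic!("{}", msg);
    }
}
```

In the real helper the body runs in the child, and only the equivalent of `Outcome` crosses back to the parent, which then behaves as the test body would have.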
@basvandijk basvandijk force-pushed the basvandijk/namespace-bazel-remote-execution branch from a84bfd7 to 24982cf Compare July 1, 2026 11:36
nix 0.24.3 configures `setgroups` out on Apple targets, so the `imp`
module failed to compile on darwin. The helper's semantics (the Linux
`nobody` uid 65534 and CAP_DAC_OVERRIDE) are Linux-specific anyway, so
gate `imp` on `target_os = "linux"` and fall through to the in-process
no-op on the other platforms.
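A minimal sketch of that `cfg` layout, assuming a module named `imp` as in the commit. The Linux body is only a placeholder (the real one forks and drops to `nobody`), and `DROPS_PRIVILEGES` is a hypothetical marker added to make the difference between the two variants observable.

```rust
/// Linux: the real helper forks and drops to `nobody` when running as root.
#[cfg(target_os = "linux")]
mod imp {
    pub const DROPS_PRIVILEGES: bool = true;

    pub fn run_as_nobody_if_root<F: FnOnce()>(body: F) {
        // Placeholder for the fork + setgroups/setgid/setuid path; here the
        // body simply runs in-process.
        body()
    }
}

/// Other platforms (e.g. darwin, where nix 0.24.3 has no `setgroups`):
/// an in-process no-op wrapper.
#[cfg(not(target_os = "linux"))]
mod imp {
    pub const DROPS_PRIVILEGES: bool = false;

    pub fn run_as_nobody_if_root<F: FnOnce()>(body: F) {
        body()
    }
}

pub use imp::run_as_nobody_if_root;
```

Gating the whole module, rather than individual calls, keeps every Linux-only `nix` import out of the darwin build.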
The tests test_ensure_file_exists_and_is_writeable_fails_if_non_writeable and
test_save_proposal_id_to_file_fails_if_write_fails assert PermissionDenied on a
read-only file. Root holds CAP_DAC_OVERRIDE and bypasses filesystem permission
bits, so under Bazel remote-execution (running as root) the operations succeed
and the assertions fail.

Wrap both test bodies in ic_test_utilities_privileges::run_as_nobody_if_root so
they drop to the nobody user when the process is root, mirroring the treatment
applied in 380255a.
This test spawns a loopback HTTP server in the test process and has the PocketIC server make a canister HTTP outcall to it, which the remote executor cannot reach under Bazel Remote Execution. The bazel-test-bre job now sets POCKET_IC_SKIP_HTTP_LIVE_MODE_TEST via --test_env, and the test returns early when it is set. Other tests ignore the env var, so only this test is skipped and only on that job.
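A sketch of the early-return guard. The environment variable name comes from the commit; the function names are hypothetical, and the value the job passes (for example `--test_env=POCKET_IC_SKIP_HTTP_LIVE_MODE_TEST=1`) is an assumption, since this guard only checks that the variable is present.

```rust
/// True when the BRE job has asked to skip the live-mode HTTP outcall test.
fn should_skip_http_live_mode_test() -> bool {
    std::env::var_os("POCKET_IC_SKIP_HTTP_LIVE_MODE_TEST").is_some()
}

/// Hypothetical test body: returns early under BRE, otherwise runs normally.
fn http_live_mode_test_body() -> &'static str {
    if should_skip_http_live_mode_test() {
        eprintln!("skipping: POCKET_IC_SKIP_HTTP_LIVE_MODE_TEST is set");
        return "skipped";
    }
    // ... start the loopback HTTP server and exercise the canister outcall here ...
    "ran"
}
```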

Labels

chore, CI_BRE (Trigger the bazel-test-bre job to run bazel test via Remote Execution @ Namespace)
