fix(scripts): preserve direct-root nemoclaw-start under CAP_DAC_OVERRIDE drop by laitingsheng · Pull Request #4528 · NVIDIA/NemoClaw

laitingsheng · 2026-05-29T12:57:42Z

Summary

Restore the direct-root startup path of scripts/nemoclaw-start.sh under runtimes that drop CAP_DAC_OVERRIDE. Two distinct failures shared the same trigger and the same script, so they are bundled here.

Related Issue

Fixes #4498
Fixes #4512

Changes

ensure_mutable_openclaw_config_hash now routes the sha256sum write through STEP_DOWN_PREFIX_SANDBOX when running as root. The .config-hash is 660 sandbox:sandbox in mutable-default mode, so without CAP_DAC_OVERRIDE the previous in-process redirection aborted with EACCES. Letting the file's owner perform the write keeps the same on-disk shape while staying inside the dropped cap set.
Add run_step_down_as_sandbox, which serialises the named functions plus the trailing invocation into a helper script under /tmp (sticky-bit, unlink-protected, unguessable mktemp suffix; the directory is no longer configurable from production env) and runs bash <file> through the step-down prefix. The step-down shell reads the literal source bytes from disk, so the heredoc-bearing function bodies no longer have to survive the bash -c "$(declare -f ...) ..." argv round-trip that aborted with syntax error near unexpected token 'fi'. The helper documents a trusted-literal-only contract on the invocation snippet.
Replace the two bash -c "$(declare -f ...) ..." call sites with run_step_down_as_sandbox invocations. The auth-profile entry point is now wrapped in setup_auth_profile_as_sandbox, which exports HOME=/sandbox before dispatching — setpriv preserves the parent shell's environment, and without this write_auth_profile's ~/.openclaw/... expansion would target /root.
Extend test/nemoclaw-start.test.ts with:
- run_step_down_as_sandbox happy-path cleanup, body-failure cleanup with exit-code propagation, and a regression that exercises two adjacent <<'TAG' heredoc-bearing functions (mirrors the exact shape that triggered the earlier syntax error).
- setup_auth_profile_as_sandbox observed-behaviour assertion that HOME=/sandbox reaches the dispatched function even when the parent env carries HOME=/root.
- ensure_mutable_openclaw_config_hash root-mode step-down (uid=0 routes through the prefix), non-root pass-through, stale-hash overwrite, and an EACCES-precondition test that pre-creates a read-only .config-hash, asserts the direct redirection genuinely fails with "permission denied", then runs the production function under a step-down stub that mirrors the owner-uid recovery.
- A direct-root composition harness that chains ensure_mutable_openclaw_config_hash → ensure_gateway_token → export_gateway_token → write_runtime_shell_env → lock_rc_files → setup_auth_profile_as_sandbox under a uid=0 stub and a step-down prefix mirroring the CAP_DAC_OVERRIDE-dropped effective ownership. Asserts the six round-4 review clauses: fresh sha256 in .config-hash at mode 660, OPENCLAW_GATEWAY_TOKEN exported in /tmp/nemoclaw-proxy-env.sh, no Missing gateway auth token, no syntax error near unexpected token 'fi', .bashrc/.profile locked to 0444, continuation reached.

E2E / Nightly was dispatched against this branch with jobs=shields-config-e2e,runtime-overrides-e2e,device-auth-health-e2e; all three selective jobs (plus the surrounding suite) passed.

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Summary by CodeRabbit

New Features
- Added a sandbox "step-down" mechanism to run setup tasks as the non-root sandbox user and a wrapper to create/harden the auth profile with HOME=/sandbox.
Bug Fixes
- Startup now refreshes the mutable configuration hash via the sandbox step-down when running as root; failures abort startup instead of continuing.
Tests
- Added comprehensive tests covering step-down dispatch/cleanup, heredocs, auth-profile HOME enforcement, config-hash refresh paths, template seeding, and end-to-end root-entrypoint scenarios.

…IDE drop Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai · 2026-05-29T12:57:55Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 84418774-614d-4a50-94e2-31229864b499

📥 Commits

Reviewing files that changed from the base of the PR and between a773602 and 19e7997.

📒 Files selected for processing (2)

scripts/nemoclaw-start.sh
test/nemoclaw-start.test.ts

🚧 Files skipped from review as they are similar to previous changes (2)

test/nemoclaw-start.test.ts
scripts/nemoclaw-start.sh

📝 Walkthrough

Walkthrough

Adds run_step_down_as_sandbox(), uses it to run auth profile setup and template seeding as the sandbox user, routes .config-hash refresh through the sandbox when running as root, and expands tests to cover helper dispatch, cleanup on failure, heredoc safety, HOME override, and root/non-root hash refresh behavior.

Changes

Sandbox step-down helper and permission fix

Layer / File(s)	Summary
Step-down helper and auth-step-down replacement `scripts/nemoclaw-start.sh`	Adds `run_step_down_as_sandbox()` that writes a temporary script containing declared functions plus an invocation, executes it via `STEP_DOWN_PREFIX_SANDBOX`, and unlinks the temp file. Adds `setup_auth_profile_as_sandbox()` and replaces prior inline `bash -c "$(declare -f ...)"` usage; updates seeding to call through the helper.
Mutable config hash refresh with step-down `scripts/nemoclaw-start.sh`	`ensure_mutable_openclaw_config_hash()` now recomputes the mutable-default `.config-hash`; when executed as root it performs `sha256sum` and `chmod` via the sandbox step-down (inner `sh -c`) and returns non-zero on failure so startup aborts.
Test coverage for step-down and hash behaviors `test/nemoclaw-start.test.ts`	Updates seeding test to embed `run_step_down_as_sandbox`, adds stubs for `run_step_down_as_sandbox`/`setup_auth_profile_as_sandbox` in test scaffolding, and adds suites verifying: temp-script dispatch and cleanup on success, cleanup and inner-exit capture on failure, heredoc-containing function-body safety, `HOME=/sandbox` enforcement for auth setup, root/non-root `.config-hash` refresh scenarios (including read-only regression), and an end-to-end root-entrypoint composition test.

Sequence Diagram(s)

sequenceDiagram
  participant Entrypoint
  participant run_step_down_as_sandbox
  participant TempScript
  participant STEP_DOWN_PREFIX_SANDBOX
  participant SandboxProcess
  Entrypoint->>run_step_down_as_sandbox: invoke helper with functions + snippet
  run_step_down_as_sandbox->>TempScript: write declare -f bodies + invocation
  run_step_down_as_sandbox->>STEP_DOWN_PREFIX_SANDBOX: execute TempScript path
  STEP_DOWN_PREFIX_SANDBOX->>SandboxProcess: run TempScript as sandbox user
  SandboxProcess-->>run_step_down_as_sandbox: return exit status
  run_step_down_as_sandbox->>TempScript: unlink temp file

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Sandbox, Integration: OpenClaw

Suggested reviewers

ericksoa
prekshivyas
cv

Poem

🐰 I stitched a script in moonlit sand,

declared my functions, tidy and planned,
stepped down softly where permissions teach,
wrote the hash and swept up the beach,
temp-file gone — the sandbox hums, launch stands.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main purpose of the PR—fixing the direct-root startup path of nemoclaw-start.sh when CAP_DAC_OVERRIDE is dropped, which is the primary change across both the script and test files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/4498-4512-direct-root-startup

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-29T12:59:52Z

E2E Advisor Recommendation

Required E2E: cloud-e2e, shields-config-e2e, runtime-overrides-e2e
Optional E2E: openclaw-inference-switch-e2e, sandbox-survival-e2e, openclaw-onboard-security-posture-e2e

Dispatch hint: cloud-e2e,shields-config-e2e,runtime-overrides-e2e

Auto-dispatched E2E: cloud-e2e, shields-config-e2e, runtime-overrides-e2e via nightly-e2e.yaml at 19e7997e0cdb025c2df5b042dac04839e886695f — nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

cloud-e2e (high): Runs the primary OpenClaw install/onboard/start path against a real sandbox and cloud inference. This is the highest-signal merge-blocking check for changes to scripts/nemoclaw-start.sh because it verifies the root entrypoint boots, token/runtime environment setup works, the gateway is reachable, and an assistant flow can run.
shields-config-e2e (medium): Directly exercises mutable-default config permissions, .config-hash behavior, shields-up locked config integrity, tamper detection, and shields-down restoration. The changed hash-refresh logic and config-owner handling can regress this security boundary.
runtime-overrides-e2e (medium): Builds the OpenClaw sandbox image and runs short-lived containers through the entrypoint while validating config hash correctness after startup/runtime config mutations. The changed ensure_mutable_openclaw_config_hash path is central to this behavior.

Optional E2E

openclaw-inference-switch-e2e (high): Useful adjacent coverage because inference switch rewrites openclaw.json and verifies .config-hash in a live OpenClaw sandbox after route changes. It is relevant to the config hash change but overlaps with required config/startup coverage.
sandbox-survival-e2e (high): Useful confidence for restart/survival behavior because entrypoint changes can surface after gateway stop/start and sandbox recovery. Not strictly required if cloud-e2e and shields/runtime override coverage pass.
openclaw-onboard-security-posture-e2e (high): Validates non-root host security posture, rc-file/runtime guard assertions, and startup logs. This is adjacent to the HOME/auth-profile and rc-locking changes but can be optional if the targeted shields and full OpenClaw flow are run.

New E2E recommendations

OpenClaw root entrypoint container smoke (high): Existing hermes-root-entrypoint-smoke-e2e covers Hermes' separate entrypoint, but this PR changes scripts/nemoclaw-start.sh, the OpenClaw entrypoint. The added unit tests simulate the root entrypoint chain, but there does not appear to be a small Docker-only OpenClaw root-entrypoint smoke that starts the production OpenClaw image as root and asserts fresh .config-hash, token export, clean startup log, rc-file locking, and gateway health without a full onboard.
- Suggested test: Add an OpenClaw root-entrypoint smoke E2E analogous to test/e2e/test-hermes-root-entrypoint-smoke.sh, using the production Dockerfile image and /usr/local/bin/nemoclaw-start.

Dispatch hint

Workflow: E2E / Nightly
jobs input: cloud-e2e,shields-config-e2e,runtime-overrides-e2e

github-actions · 2026-05-29T12:59:53Z

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: wsl-repo-cloud-openclaw, gpu-repo-local-ollama-openclaw

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

ubuntu-repo-cloud-openclaw: scripts/nemoclaw-start.sh is the sandbox entrypoint used by repo-current Docker onboarding. The changes touch startup-time mutable config hash refresh, sandbox step-down execution, auth-profile setup, and gateway-token/runtime-env ordering; the baseline Ubuntu OpenClaw scenario is the smallest routed scenario that starts the sandbox/gateway and exercises this surface.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

wsl-repo-cloud-openclaw: Optional platform coverage for the same repo-current OpenClaw startup path under WSL. Useful because the changed shell entrypoint and privilege/ownership handling can be sensitive to host/container environment, but it requires the special Windows/WSL runner.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw
gpu-repo-local-ollama-openclaw: Optional adjacent coverage for the same sandbox startup path with local Ollama/GPU provider wiring. This is not the primary target and uses a special GPU runner.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Relevant changed files

scripts/nemoclaw-start.sh

github-actions · 2026-05-29T13:02:50Z

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 1 still applies, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: ensure_mutable_openclaw_config_hash root-mode step-down: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: scripts/nemoclaw-start.sh now builds a run_prefix from STEP_DOWN_PREFIX_SANDBOX when id -u is 0 and runs sha256sum through that prefix; tests simulate the permission failure and recovery but do not exercise the real container boundary.
Direct-root capability-drop startup is still covered by a simulated harness, not the real runtime boundary (test/nemoclaw-start.test.ts:4116): The added composition harness now chains the relevant production helpers and asserts the linked issue clauses for config hash refresh, OPENCLAW_GATEWAY_TOKEN export, no missing-token output, no heredoc-fi syntax error, read-only RC files, and continuation. However, this is sandbox lifecycle and privilege-boundary code, and the strongest test still stubs uid=0, STEP_DOWN_PREFIX_SANDBOX, lock_rc_files, auth-profile functions, and the CAP_DAC_OVERRIDE-dropped permission behavior instead of exercising the actual container entrypoint, setpriv/gosu boundary, file ownership, gateway startup, or openclaw tui smoke described by the issues. This leaves both the acceptance evidence and security-testing confidence partial for the real direct-root runtime path.
- Recommendation: Add or identify a targeted runtime/integration validation for the changed entrypoint path under the real direct-root CAP_DAC_OVERRIDE-dropped environment. Keep it scoped to this PR: startup should refresh or preserve /sandbox/.openclaw/.config-hash, write /tmp/nemoclaw-proxy-env.sh with OPENCLAW_GATEWAY_TOKEN, avoid the prior heredoc syntax and missing-token outputs, preserve static RC files as read-only, and reach the running/continuation path.
- Evidence: Issue [Ubuntu 24.04][Sandbox] direct root sandbox startup fails refreshing mutable OpenClaw config hash #4498 expects the direct root entrypoint path to start, refresh or preserve /sandbox/.openclaw/.config-hash, export OPENCLAW_GATEWAY_TOKEN through /tmp/nemoclaw-proxy-env.sh, and avoid "Missing gateway auth token". Issue [Ubuntu 24.04][Security] CAP_DAC_OVERRIDE-dropped startup path fails with nemoclaw-start syntax error #4512 expects the CAP_DAC_OVERRIDE-dropped direct startup path to complete without shell syntax errors, keep static RC files read-only, prepare token/proxy env, and keep the container running. The added direct-root composition test in test/nemoclaw-start.test.ts stubs the runtime boundary while scripts/nemoclaw-start.sh changes the real entrypoint path.

🌱 Nice ideas

None.

Since last review details

Current findings:

Source-of-truth review needed: ensure_mutable_openclaw_config_hash root-mode step-down: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: scripts/nemoclaw-start.sh now builds a run_prefix from STEP_DOWN_PREFIX_SANDBOX when id -u is 0 and runs sha256sum through that prefix; tests simulate the permission failure and recovery but do not exercise the real container boundary.
Direct-root capability-drop startup is still covered by a simulated harness, not the real runtime boundary (test/nemoclaw-start.test.ts:4116): The added composition harness now chains the relevant production helpers and asserts the linked issue clauses for config hash refresh, OPENCLAW_GATEWAY_TOKEN export, no missing-token output, no heredoc-fi syntax error, read-only RC files, and continuation. However, this is sandbox lifecycle and privilege-boundary code, and the strongest test still stubs uid=0, STEP_DOWN_PREFIX_SANDBOX, lock_rc_files, auth-profile functions, and the CAP_DAC_OVERRIDE-dropped permission behavior instead of exercising the actual container entrypoint, setpriv/gosu boundary, file ownership, gateway startup, or openclaw tui smoke described by the issues. This leaves both the acceptance evidence and security-testing confidence partial for the real direct-root runtime path.
- Recommendation: Add or identify a targeted runtime/integration validation for the changed entrypoint path under the real direct-root CAP_DAC_OVERRIDE-dropped environment. Keep it scoped to this PR: startup should refresh or preserve /sandbox/.openclaw/.config-hash, write /tmp/nemoclaw-proxy-env.sh with OPENCLAW_GATEWAY_TOKEN, avoid the prior heredoc syntax and missing-token outputs, preserve static RC files as read-only, and reach the running/continuation path.
- Evidence: Issue [Ubuntu 24.04][Sandbox] direct root sandbox startup fails refreshing mutable OpenClaw config hash #4498 expects the direct root entrypoint path to start, refresh or preserve /sandbox/.openclaw/.config-hash, export OPENCLAW_GATEWAY_TOKEN through /tmp/nemoclaw-proxy-env.sh, and avoid "Missing gateway auth token". Issue [Ubuntu 24.04][Security] CAP_DAC_OVERRIDE-dropped startup path fails with nemoclaw-start syntax error #4512 expects the CAP_DAC_OVERRIDE-dropped direct startup path to complete without shell syntax errors, keep static RC files read-only, prepare token/proxy env, and keep the container running. The added direct-root composition test in test/nemoclaw-start.test.ts stubs the runtime boundary while scripts/nemoclaw-start.sh changes the real entrypoint path.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

…t coverage Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-05-29T13:49:00Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26640502810
Target ref: f465e27888e79d2c26c7dea71ef12589ef7255fc
Workflow ref: main
Requested jobs: runtime-overrides-e2e,sandbox-survival-e2e,openclaw-inference-switch-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
openclaw-inference-switch-e2e	✅ success
runtime-overrides-e2e	✅ success
sandbox-survival-e2e	✅ success

github-actions · 2026-05-29T13:51:28Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26640810179
Target ref: fix/4498-4512-direct-root-startup
Requested jobs: shields-config-e2e,runtime-overrides-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
runtime-overrides-e2e	✅ success
shields-config-e2e	✅ success

…cation snippet contract Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai

🧹 Nitpick comments (1)

test/nemoclaw-start.test.ts (1)

4001-4006: 💤 Low value

Minor: Redundant HASH_PATH environment variable.

HASH_PATH is already assigned inline in the script at line 3993, so passing it again via env at line 4002 is redundant. The inline assignment takes precedence within the bash script. Consider removing the env override for clarity.

      const result = spawnSync("bash", [scriptPath], {
        encoding: "utf-8",
-       env: { ...process.env, HASH_PATH: hashPath },
+       env: process.env,
        timeout: 5000,
      });

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/nemoclaw-start.test.ts` around lines 4001 - 4006, Remove the redundant
environment override of HASH_PATH when spawning the script: in the test where
spawnSync("bash", [scriptPath], { encoding: "utf-8", env: { ...process.env,
HASH_PATH: hashPath }, timeout: 5000 }) is used, delete the HASH_PATH entry from
the env object (or remove the env override entirely) because the script already
sets HASH_PATH inline (see scriptPath content around line 3993); keep spawnSync,
scriptPath, hashPath and timeout unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/nemoclaw-start.test.ts`:
- Around line 4001-4006: Remove the redundant environment override of HASH_PATH
when spawning the script: in the test where spawnSync("bash", [scriptPath], {
encoding: "utf-8", env: { ...process.env, HASH_PATH: hashPath }, timeout: 5000
}) is used, delete the HASH_PATH entry from the env object (or remove the env
override entirely) because the script already sets HASH_PATH inline (see
scriptPath content around line 3993); keep spawnSync, scriptPath, hashPath and
timeout unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 53897a42-f883-4077-8213-654b41fb7112

📥 Commits

Reviewing files that changed from the base of the PR and between f465e27 and ab1d604.

📒 Files selected for processing (2)

scripts/nemoclaw-start.sh
test/nemoclaw-start.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

scripts/nemoclaw-start.sh

github-actions · 2026-05-29T14:28:50Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26642732678
Target ref: ab1d604b6cab3d70dddf5e6a8b5e6e7264844acc
Workflow ref: main
Requested jobs: runtime-overrides-e2e,shields-config-e2e,openclaw-onboard-security-posture-e2e,credential-sanitization-e2e
Summary: 4 passed, 0 failed, 0 skipped

Job	Result
credential-sanitization-e2e	✅ success
openclaw-onboard-security-posture-e2e	✅ success
runtime-overrides-e2e	✅ success
shields-config-e2e	✅ success

…ce-shape test Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-05-29T14:56:22Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26644221816
Target ref: a773602d6d5676b595adb3d342f1232479ce15db
Workflow ref: main
Requested jobs: runtime-overrides-e2e,device-auth-health-e2e,shields-config-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
device-auth-health-e2e	✅ success
runtime-overrides-e2e	✅ success
shields-config-e2e	✅ success

jyaunches

Thanks for the careful split — the helper-level coverage and the heredoc regression test are solid. Holding on merge until the PR Review Advisor "Needs attention" finding is closed.

Unresolved advisor finding

Direct-root CAP_DAC_OVERRIDE-dropped startup acceptance is not validated end-to-end.

The new tests exercise each helper in isolation:

run_step_down_as_sandbox happy / failure / heredoc-pair
setup_auth_profile_as_sandbox HOME=/sandbox propagation
ensure_mutable_openclaw_config_hash root-mode step-down + EACCES precondition

…but nothing chains them into the uid=0 + dropped-cap startup path that #4498 / #4512 actually hit. The advisor's ask, scoped to this PR, is one assertion that the direct-root entrypoint:

Passes the mutable hash refresh (/sandbox/.openclaw/.config-hash written 660 sandbox:sandbox).
Writes /tmp/nemoclaw-proxy-env.sh with OPENCLAW_GATEWAY_TOKEN exported.
Does not emit "Missing gateway auth token".
Does not emit "syntax error near unexpected token `fi'".
Keeps /sandbox/.bashrc and /sandbox/.profile read-only.
Reaches the expected running/continuation path.

A grep across test/nemoclaw-start.test.ts finds no "Missing gateway auth token" assertion and no combined direct-root + dropped-cap harness, so the regression the linked issues describe could re-land without these unit tests catching it.

What would unblock

Either of:

(preferred) A single integration-shaped test in test/nemoclaw-start.test.ts that drives the direct-root entrypoint with CAP_DAC_OVERRIDE dropped (or a stub mirroring it) and asserts the six clauses above.
(alternative) A written justification on this PR for why the per-helper unit tests are sufficient — specifically why the helpers' composition cannot regress independently — plus a pointer to the existing E2E job that exercises the dropped-cap uid=0 path. The selective runtime-overrides-e2e / shields-config-e2e / device-auth-health-e2e runs passing here is encouraging but does not, on its face, demonstrate it covers the dropped-cap direct-root path.

Also

CodeRabbit nit at test/nemoclaw-start.test.ts:4001-4006: HASH_PATH is redundant in the env block since it's already inline at line 3993. Low value, fold in if convenient.

CI is otherwise clean (selective E2E green, CodeQL, ShellCheck, unit-vitest all pass). Once the advisor finding is closed I'm happy to re-review quickly.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…root-startup

github-actions · 2026-05-30T04:46:16Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26674696016
Target ref: d9023fe467adaaec75602c03c5efcaa9560c1bd8
Workflow ref: main
Requested jobs: runtime-overrides-e2e,openclaw-onboard-security-posture-e2e,shields-config-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
openclaw-onboard-security-posture-e2e	✅ success
runtime-overrides-e2e	✅ success
shields-config-e2e	✅ success

…composition harness Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-05-30T05:25:11Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26675498624
Target ref: 19e7997e0cdb025c2df5b042dac04839e886695f
Workflow ref: main
Requested jobs: cloud-e2e,shields-config-e2e,runtime-overrides-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
cloud-e2e	✅ success
runtime-overrides-e2e	✅ success
shields-config-e2e	✅ success

fix(scripts): preserve direct-root nemoclaw-start under CAP_DAC_OVERR…

ce325d1

…IDE drop Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added the fix label May 29, 2026

fix(scripts): tighten step-down temp path and broaden #4498/#4512 tes…

f465e27

…t coverage Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

fix(scripts): force HOME=/sandbox on root step-down and lock the invo…

ab1d604

…cation snippet contract Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

refactor(scripts): extract setup_auth_profile_as_sandbox to drop sour…

a773602

…ce-shape test Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added the v0.0.55 Release target label May 29, 2026

jyaunches requested changes May 29, 2026

View reviewed changes

jyaunches added R1 v0.0.56 Release target and removed v0.0.55 Release target R1 labels May 29, 2026

laitingsheng added 2 commits May 30, 2026 04:34

test(scripts): add direct-root entrypoint composition harness

e1283f1

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Merge remote-tracking branch 'origin/main' into fix/4498-4512-direct-…

d9023fe

…root-startup

test(scripts): exercise prepare_gateway_token_for_current_command in …

19e7997

…composition harness Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng requested a review from jyaunches May 30, 2026 05:38

Conversation

laitingsheng commented May 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

github-actions Bot commented May 29, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

github-actions Bot commented May 29, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 29, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

github-actions Bot commented May 29, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

jyaunches left a comment

Choose a reason for hiding this comment

Unresolved advisor finding

What would unblock

Also

Uh oh!

github-actions Bot commented May 30, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

github-actions Bot commented May 30, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

laitingsheng commented May 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading