fix(scripts): preserve direct-root nemoclaw-start under CAP_DAC_OVERRIDE drop#4528
fix(scripts): preserve direct-root nemoclaw-start under CAP_DAC_OVERRIDE drop#4528laitingsheng wants to merge 7 commits into
Conversation
…IDE drop Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds run_step_down_as_sandbox(), uses it to run auth profile setup and template seeding as the sandbox user, routes .config-hash refresh through the sandbox when running as root, and expands tests to cover helper dispatch, cleanup on failure, heredoc safety, HOME override, and root/non-root hash refresh behavior. ChangesSandbox step-down helper and permission fix
Sequence Diagram(s)sequenceDiagram
participant Entrypoint
participant run_step_down_as_sandbox
participant TempScript
participant STEP_DOWN_PREFIX_SANDBOX
participant SandboxProcess
Entrypoint->>run_step_down_as_sandbox: invoke helper with functions + snippet
run_step_down_as_sandbox->>TempScript: write declare -f bodies + invocation
run_step_down_as_sandbox->>STEP_DOWN_PREFIX_SANDBOX: execute TempScript path
STEP_DOWN_PREFIX_SANDBOX->>SandboxProcess: run TempScript as sandbox user
SandboxProcess-->>run_step_down_as_sandbox: return exit status
run_step_down_as_sandbox->>TempScript: unlink temp file
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 2 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
Selective E2E Results — ✅ All requested jobs passedRun: 26640502810
|
Selective E2E Results — ✅ All requested jobs passedRun: 26640810179
|
…cation snippet contract Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/nemoclaw-start.test.ts (1)
4001-4006: 💤 Low valueMinor: Redundant
HASH_PATHenvironment variable.
HASH_PATHis already assigned inline in the script at line 3993, so passing it again viaenvat line 4002 is redundant. The inline assignment takes precedence within the bash script. Consider removing the env override for clarity.const result = spawnSync("bash", [scriptPath], { encoding: "utf-8", - env: { ...process.env, HASH_PATH: hashPath }, + env: process.env, timeout: 5000, });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/nemoclaw-start.test.ts` around lines 4001 - 4006, Remove the redundant environment override of HASH_PATH when spawning the script: in the test where spawnSync("bash", [scriptPath], { encoding: "utf-8", env: { ...process.env, HASH_PATH: hashPath }, timeout: 5000 }) is used, delete the HASH_PATH entry from the env object (or remove the env override entirely) because the script already sets HASH_PATH inline (see scriptPath content around line 3993); keep spawnSync, scriptPath, hashPath and timeout unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/nemoclaw-start.test.ts`:
- Around line 4001-4006: Remove the redundant environment override of HASH_PATH
when spawning the script: in the test where spawnSync("bash", [scriptPath], {
encoding: "utf-8", env: { ...process.env, HASH_PATH: hashPath }, timeout: 5000
}) is used, delete the HASH_PATH entry from the env object (or remove the env
override entirely) because the script already sets HASH_PATH inline (see
scriptPath content around line 3993); keep spawnSync, scriptPath, hashPath and
timeout unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 53897a42-f883-4077-8213-654b41fb7112
📒 Files selected for processing (2)
scripts/nemoclaw-start.shtest/nemoclaw-start.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/nemoclaw-start.sh
Selective E2E Results — ✅ All requested jobs passedRun: 26642732678
|
…ce-shape test Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26644221816
|
jyaunches
left a comment
There was a problem hiding this comment.
Thanks for the careful split — the helper-level coverage and the heredoc regression test are solid. Holding on merge until the PR Review Advisor "Needs attention" finding is closed.
Unresolved advisor finding
Direct-root CAP_DAC_OVERRIDE-dropped startup acceptance is not validated end-to-end.
The new tests exercise each helper in isolation:
run_step_down_as_sandboxhappy / failure / heredoc-pairsetup_auth_profile_as_sandboxHOME=/sandboxpropagationensure_mutable_openclaw_config_hashroot-mode step-down + EACCES precondition
…but nothing chains them into the uid=0 + dropped-cap startup path that #4498 / #4512 actually hit. The advisor's ask, scoped to this PR, is one assertion that the direct-root entrypoint:
- Passes the mutable hash refresh (
/sandbox/.openclaw/.config-hashwritten660 sandbox:sandbox). - Writes
/tmp/nemoclaw-proxy-env.shwithOPENCLAW_GATEWAY_TOKENexported. - Does not emit
"Missing gateway auth token". - Does not emit
"syntax error near unexpected token `fi'". - Keeps
/sandbox/.bashrcand/sandbox/.profileread-only. - Reaches the expected running/continuation path.
A grep across test/nemoclaw-start.test.ts finds no "Missing gateway auth token" assertion and no combined direct-root + dropped-cap harness, so the regression the linked issues describe could re-land without these unit tests catching it.
What would unblock
Either of:
- (preferred) A single integration-shaped test in
test/nemoclaw-start.test.tsthat drives the direct-root entrypoint withCAP_DAC_OVERRIDEdropped (or a stub mirroring it) and asserts the six clauses above. - (alternative) A written justification on this PR for why the per-helper unit tests are sufficient — specifically why the helpers' composition cannot regress independently — plus a pointer to the existing E2E job that exercises the dropped-cap uid=0 path. The selective
runtime-overrides-e2e/shields-config-e2e/device-auth-health-e2eruns passing here is encouraging but does not, on its face, demonstrate it covers the dropped-cap direct-root path.
Also
- CodeRabbit nit at
test/nemoclaw-start.test.ts:4001-4006:HASH_PATHis redundant in theenvblock since it's already inline at line 3993. Low value, fold in if convenient.
CI is otherwise clean (selective E2E green, CodeQL, ShellCheck, unit-vitest all pass). Once the advisor finding is closed I'm happy to re-review quickly.
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26674696016
|
…composition harness Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26675498624
|
Summary
Restore the direct-root startup path of
scripts/nemoclaw-start.shunder runtimes that dropCAP_DAC_OVERRIDE. Two distinct failures shared the same trigger and the same script, so they are bundled here.Related Issue
Fixes #4498
Fixes #4512
Changes
ensure_mutable_openclaw_config_hashnow routes thesha256sumwrite throughSTEP_DOWN_PREFIX_SANDBOXwhen running as root. The.config-hashis660 sandbox:sandboxin mutable-default mode, so withoutCAP_DAC_OVERRIDEthe previous in-process redirection aborted withEACCES. Letting the file's owner perform the write keeps the same on-disk shape while staying inside the dropped cap set.run_step_down_as_sandbox, which serialises the named functions plus the trailing invocation into a helper script under/tmp(sticky-bit, unlink-protected, unguessablemktempsuffix; the directory is no longer configurable from production env) and runsbash <file>through the step-down prefix. The step-down shell reads the literal source bytes from disk, so the heredoc-bearing function bodies no longer have to survive thebash -c "$(declare -f ...) ..."argv round-trip that aborted withsyntax error near unexpected token 'fi'. The helper documents a trusted-literal-only contract on the invocation snippet.bash -c "$(declare -f ...) ..."call sites withrun_step_down_as_sandboxinvocations. The auth-profile entry point is now wrapped insetup_auth_profile_as_sandbox, which exportsHOME=/sandboxbefore dispatching —setprivpreserves the parent shell's environment, and without thiswrite_auth_profile's~/.openclaw/...expansion would target/root.test/nemoclaw-start.test.tswith:run_step_down_as_sandboxhappy-path cleanup, body-failure cleanup with exit-code propagation, and a regression that exercises two adjacent<<'TAG'heredoc-bearing functions (mirrors the exact shape that triggered the earlier syntax error).setup_auth_profile_as_sandboxobserved-behaviour assertion thatHOME=/sandboxreaches the dispatched function even when the parent env carriesHOME=/root.ensure_mutable_openclaw_config_hashroot-mode step-down (uid=0 routes through the prefix), non-root pass-through, stale-hash overwrite, and an EACCES-precondition test that pre-creates a read-only.config-hash, asserts the direct redirection genuinely fails with "permission denied", then runs the production function under a step-down stub that mirrors the owner-uid recovery.ensure_mutable_openclaw_config_hash→ensure_gateway_token→export_gateway_token→write_runtime_shell_env→lock_rc_files→setup_auth_profile_as_sandboxunder a uid=0 stub and a step-down prefix mirroring theCAP_DAC_OVERRIDE-dropped effective ownership. Asserts the six round-4 review clauses: freshsha256in.config-hashat mode660,OPENCLAW_GATEWAY_TOKENexported in/tmp/nemoclaw-proxy-env.sh, noMissing gateway auth token, nosyntax error near unexpected token 'fi',.bashrc/.profilelocked to0444, continuation reached.E2E / Nightlywas dispatched against this branch withjobs=shields-config-e2e,runtime-overrides-e2e,device-auth-health-e2e; all three selective jobs (plus the surrounding suite) passed.Signed-off-by: Tinson Lai tinsonl@nvidia.com
Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Summary by CodeRabbit
New Features
Bug Fixes
Tests