[spark-compete] fix(security): keep revoke-all best-effort when a module manifest is broken#751
Open
banse wants to merge 1 commit into
Conversation
…ule manifest is broken
resolve_installed_modules_best_effort() only caught `except Exception`, but
load_module now raises SystemExit (a BaseException) on a missing/corrupt module
manifest. SystemExit escaped the guard, so a single broken installed module
aborted `spark security revoke-all` mid-flow -- after deleting secrets, before
disabling custom MCP / pausing missions / writing the support bundle -- defeating
the command's deliberate per-step fault tolerance. Catch SystemExit too so the
helper degrades to {} as intended. Adds a regression test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
packet
Schema:
spark-compete-hotfix-v1· Event:spark-compete-first-event· Submission: public_repo_prteam
The Dudes — His Dudeness, El Duderino, Duder. LLM device holder: His Dudeness (github
banse). GitHub accounts:banse.pr_author
banserepo
vibeforge1111/spark-cliactual_behavior
resolve_installed_modules_best_effort()exists to never fail because of a broken installed module — it wrapsresolve_installed_modules()intry/except Exception: return {}. A recent R24 commit (d2c6bc2, "Adopt R24 CLI provider and module reliability fixes") changedload_moduletoraise SystemExit(...)on a missing or corrupt module manifest.SystemExitis aBaseException, not anException, so it slips straight through theexcept Exceptionguard. As a result, a single installed module with a missing/corruptspark.tomlaborts callers that are supposed to degrade gracefully. The worst-affected caller isspark security revoke-all: it callsrotate_revoke_all_env_keys()anddisable_revoke_all_custom_mcp()directly, both of which callresolve_installed_modules_best_effort(). So the emergency credential-revocation command crashes with a rawSystemExitafter deleting stored secrets but before disabling custom MCP, pausing missions, or writing the support bundle — defeating the command's deliberate per-step fault tolerance (every other step returns{ok: false, error}instead of crashing).expected_behavior
resolve_installed_modules_best_effort()must honor its contract and return{}on any module-resolution failure, including theSystemExitthatload_modulenow raises, so a single broken module manifest cannot abortspark security revoke-all(or any other best-effort caller).repro_steps
spark.toml(e.g.installed.json={"ghost": {"path": "/path/with/no/spark.toml"}}).resolve_installed_modules_best_effort()(directly, or viaspark security revoke-all).SystemExit('Module manifest not found: …/spark.toml')instead of returning{}; inrevoke-allthis aborts the command mid-flow.python -m pytest tests/test_cli.py -k resolve_installed_modules_best_effort_survives_broken_manifest— fails before the fix (helper leaksSystemExit), passes after.before_after_proof
BEFORE (clean master
eda756e): the new test fails — theassertRaises(SystemExit)on the rawresolve_installed_modules()passes, thenresolve_installed_modules_best_effort()also raisesSystemExit('Module manifest not found: …/ghost-module/spark.toml')instead of returning{}(issubclass(SystemExit, Exception) == False). AFTER (this PR): the helper returns{}; the test passes. Full-suite baseline comparison on the same machine: clean master = 33 failed / 623 passed; with-fix = 33 failed / 624 passed (identical pre-existing env failures — macOS/tmpsymlink + TCC sandbox — and exactly +1 new passing test, 0 new failures).tests_or_smoke
python -m pytest tests/test_cli.py -k resolve_installed_modules_best_effort_survives_broken_manifest(fails pre-fix, passes post-fix). Fulltests/test_cli.py: +1 passing test, no new failures vs clean master (33 == 33 pre-existing env failures).duplicate_notes
No open PR/issue addresses the best-effort/
BaseExceptioninteraction. Re-checked open spark-cli PRs by ROOT CAUSE: #277/#592/#658 makeload_moduleitself report missing/malformed manifests — #277 in fact proposed theSystemExitthatd2c6bc2adopted; my finding is the inverse, that the adoptedSystemExitdefeats the pre-existingexcept Exceptionbest-effort guard and abortsrevoke-all. #745 only adds logging to silent handlers (different root cause). Root cause was introduced byd2c6bc2(confirmed ancestor of canonical mastereda756e), with no follow-up fix.risk_notes
Low risk, security-positive. One-line change: broadens the
exceptin theresolve_installed_modules_best_efforthelper fromExceptionto(Exception, SystemExit), restoring the helper's documented "degrade to{}" contract. It strictly adds fault tolerance — it cannot newly suppress a build or swallow a result the helper previously returned (the success path is unchanged).load_module's ownSystemExitbehavior on direct callers (e.g.spark start) is untouched. No installer/CI/dependency/secret surface modified.review_claim
Impact: medium (breaks fault tolerance of a security-emergency command). Evidence: failing_test, passing_test, redacted_terminal_excerpt. Requested review state:
pr_review. Scope: theresolve_installed_modules_best_efforthelper only;load_module's raising behavior and other callers are intentionally unchanged.