Fix sandbox cleanup leak paths#1381
Draft
rasdani wants to merge 1 commit into
Draft
Conversation
| self._client_thread_ids.discard(thread_id) | ||
| return thread_id, True | ||
|
|
||
| def _close_thread_clients(self, wait: bool) -> None: |
There was a problem hiding this comment.
🟡 Medium utils/threaded_sandbox_client.py:133
_close_thread_clients submits one task per pending_thread_id, but ThreadPoolExecutor assigns tasks to arbitrary available workers—not necessarily the threads that own the clients. If workers T1-T3 have clients but T4-T6 pick up the close tasks, the barrier completes, those threads find no clients and return False, leaving the actual clients unclosed. Consider tracking which worker thread owns each client and targeting close operations to those specific threads.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file verifiers/utils/threaded_sandbox_client.py around line 133:
`_close_thread_clients` submits one task per `pending_thread_id`, but `ThreadPoolExecutor` assigns tasks to arbitrary available workers—not necessarily the threads that own the clients. If workers T1-T3 have clients but T4-T6 pick up the close tasks, the barrier completes, those threads find no clients and return `False`, leaving the actual clients unclosed. Consider tracking which worker thread owns each client and targeting close operations to those specific threads.
Evidence trail:
verifiers/utils/threaded_sandbox_client.py lines 107-170 (REVIEWED_COMMIT): `_close_thread_client` checks thread-local `sandbox_client` on whatever thread runs it; `_close_thread_clients` submits N tasks but ThreadPoolExecutor picks arbitrary idle threads. verifiers/utils/thread_utils.py lines 1-10 (REVIEWED_COMMIT): `THREAD_LOCAL_STORAGE = threading.local()` is module-level, so each thread has its own view. Python docs: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor — no mechanism to target specific worker threads.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes four sandbox cleanup weak spots with focused fake-backed repro tests, without requiring real remote sandboxes or model inference.
ThreadedAsyncSandboxClient.teardown()now closes each worker thread's thread-localAsyncSandboxClientwithaclose()before executor shutdown, so keepalive sockets/fds are not left to interpreter teardown.SandboxMixin.create_sandbox()now handles cancellation during an in-flightcreate()by waiting for the shielded create result and deleting any sandbox that was created before re-raising cancellation, instead of scheduling an untracked orphan delete task.Environment.cleanup()andRubric.cleanup()now attempt all cleanup handlers and then re-raise the first cleanup failure, so later cleanup handlers such as sandbox destruction are not skipped.SandboxMixin.run_background_job()now classifiesprime_sandboxes.CommandTimeoutErroras a sandbox timeout: it setsstate["sandbox_timeout"] = Trueand raisesvf.SandboxErrorconsistently withSandboxTimeoutError.Tests / Repros Added
tests/test_threaded_sandbox_client.py::test_teardown_closes_per_thread_async_clientsaclose()during teardown.tests/test_sandbox_mixin.py::test_create_sandbox_cancellation_deletes_sandbox_created_after_cancelcreate()is pending, returns a fake sandbox after cancellation, blocks fake deletion, and asserts cancellation does not finish until deletion is drained.tests/test_decorator_ranks.py::TestCleanupPriorityOrdering::test_cleanup_failure_does_not_skip_later_handlertests/test_decorator_ranks.py::TestRubricCleanupTeardown::test_rubric_cleanup_failure_does_not_skip_later_handlertests/test_sandbox_mixin.py::test_run_background_job_command_timeoutsandbox_timeoutand raisevf.SandboxError.Validation
Local pre-push hooks also passed: ruff check, ruff format, Semgrep v1 policy, Sync AGENTS.md, and ty (ci parity).
Note
Fix sandbox cleanup leak paths by continuing after handler failures and closing per-thread clients on teardown
Environment.cleanupandRubric.cleanupnow catch exceptions from individual handlers, log them, continue running remaining handlers, and re-raise only the first exception after all handlers complete.ThreadedAsyncSandboxClienttracks which worker threads have created per-threadAsyncSandboxClientinstances and closes them duringteardown, preventing connection leaks on shutdown.SandboxMixin.run_background_jobnow catchesCommandTimeoutError, setsstate['sandbox_timeout'] = True, and raisesvf.SandboxErrorinstead of propagating the raw timeout error.📊 Macroscope summarized 246bf91. 4 files reviewed, 1 issue evaluated, 0 issues filtered, 1 comment posted
🗂️ Filtered Issues