fix: abort output listener task during VM cleanup to prevent hangs #264

Closed
claude-claude[bot] wants to merge 1 commit into fix-output-timeout from claude/fix-21744606892

claude-claude bot commented Feb 6, 2026

CI Fix

Fixes CI #21743522464

Problem

The test test_snapshot_clone_stress_100_rootless was failing after the removal of the 5-minute read timeout in the output listener (commit 2b0f27c).

Removing the timeout fixed large image imports (10+ minutes) but introduced a new issue. When VMs are killed abruptly during stress tests (spawning and killing 100 VMs rapidly), the vsock connection may not close cleanly, so read_line() never sees EOF and blocks indefinitely. The output listener task then hangs forever, which prevents proper cleanup and causes resource exhaustion.

Solution

This PR ensures the output listener task is explicitly aborted during VM cleanup:

  1. Store the task handle: Changed _output_handle to output_handle (removed underscore prefix) so we can reference it later
  2. Add cleanup parameter: Added output_listener_handle parameter to cleanup_vm() function
  3. Abort on cleanup: Call handle.abort() when cleaning up VM resources

This maintains the benefit of supporting long-running operations (no artificial timeout) while ensuring tasks are properly terminated when VMs are killed, preventing hangs in stress test scenarios.

Testing

The fix allows:

  • Long-running image imports to complete without timeout
  • Stress tests with 100 concurrent VMs to properly clean up all tasks
  • Abrupt VM termination to forcefully terminate stuck readers

Generated by Claude | Fix Run

The removal of the 5-minute read timeout in run_output_listener allowed
long-running image imports to complete, but introduced a hang in stress
tests where 100 VMs are spawned/killed rapidly.

When VMs are killed abruptly, the vsock connection may not cleanly close,
causing read_line() to block indefinitely. This prevents the task from
terminating naturally.

Fix by:
1. Storing the output listener task handle (removing _ prefix)
2. Adding output_listener_handle parameter to cleanup_vm()
3. Explicitly aborting the task when cleanup_vm() is called

This ensures the listener is forcefully terminated during cleanup while
still allowing indefinite waits for legitimate long-running operations.

ejc3 commented Feb 6, 2026

Cherry-picked into PR #263 (fix-output-timeout)

ejc3 closed this Feb 6, 2026