tests: wait for Running in cloud-hypervisor running-fork path #142

Closed
sjmiller609 wants to merge 36 commits into main from codex/max-speed-initializing-concurrency

@sjmiller609 (Collaborator) commented Mar 10, 2026

Summary

  • fix TestForkCloudHypervisorFromRunningNetwork flake caused by Initializing/Running race
  • wait for source instance to reach Running before running-state fork assertions
  • accept transient Initializing return from running fork, then wait to Running
  • wait for restored source to return to Running before network assertions

Validation

  • remote (deft-kernel-dev):
    • go test -count=8 -run '^TestForkCloudHypervisorFromRunningNetwork$' -tags containers_image_openpgp -timeout=45m ./lib/instances

Note

Low Risk
Low risk because changes are confined to test timing/expectations, reducing flakiness without altering production logic.

Overview
Improves TestForkCloudHypervisorFromRunningNetwork reliability by waiting for the source instance to reach StateRunning before asserting network reachability and fork behavior.

Adjusts the running-fork assertions to accept a transient StateInitializing return from ForkInstance, then polls until the fork reaches StateRunning, and similarly waits for the restored source to return to StateRunning before performing post-fork network checks.

Written by Cursor Bugbot for commit 32299fc.

…alizing-concurrency

# Conflicts:
#	lib/instances/qemu_test.go
#	lib/network/allocate.go
#	lib/oapi/oapi.go
cursor[bot]
cursor bot approved these changes Mar 10, 2026

Cursor Bugbot has reviewed your changes and found 1 potential issue.

require.NoError(t, err)
require.Contains(t, []State{StateInitializing, StateRunning}, forked.State)
forked, err = waitForInstanceState(ctx, manager, forked.Id, StateRunning, 20*time.Second)
require.NoError(t, err)

Forked instance cleanup registered after potential timeout failure

Low Severity

The forked instance's t.Cleanup is registered on line 410, after waitForInstanceState (20s timeout) and two require assertions. If any of those fail, the already-created forked instance leaks. This is inconsistent with the source instance pattern introduced in this same diff, where sourceID is captured and cleanup registered on lines 383–384 before calling waitForInstanceState. The fork's ID is available right after require.NoError on line 404 and the cleanup could be registered there, matching the source pattern.

@sjmiller609 (Collaborator, Author) commented

duplicate PR, rebasing from main
