Skip to content

chore: de-flake testing#130

Merged
sjmiller609 merged 9 commits intomainfrom
codex/stabilize-tap-cleanup
Mar 7, 2026
Merged

chore: de-flake testing#130
sjmiller609 merged 9 commits intomainfrom
codex/stabilize-tap-cleanup

Conversation

@sjmiller609
Copy link
Collaborator

@sjmiller609 sjmiller609 commented Mar 7, 2026

Note

Medium Risk
Medium risk because it changes instance API serialization for the network field and alters when orphaned TAP devices are cleaned up, which can affect runtime networking behavior. CI changes are low risk but may fail builds if the sudo PATH/toolchain differs on runners.

Overview
CI: Adds a Linux workflow step to verify mkfs.erofs, mkfs.ext4, and iptables are available under sudo PATH before tests run, aiming to catch runner misconfiguration early.

Runtime: Updates instanceToOAPI to populate oapi.Instance.Network via JSON marshal/unmarshal (avoiding tight compile-time coupling to generated OpenAPI structs), and makes CleanupOrphanedTAPs skip cleanup when the running instance list is empty (not just nil) to avoid deleting TAPs from concurrent processes/tests.

Written by Cursor Bugbot for commit f3b95a1. This will update automatically on new commits. Configure here.

@sjmiller609 sjmiller609 changed the title chore: investigate if tests are flaky chore: de-flake testing Mar 7, 2026
@sjmiller609 sjmiller609 marked this pull request as ready for review March 7, 2026 18:32
@sjmiller609
Copy link
Collaborator Author

taking liberty to merge

@sjmiller609 sjmiller609 merged commit b7a4031 into main Mar 7, 2026
7 checks passed
@sjmiller609 sjmiller609 deleted the codex/stabilize-tap-cleanup branch March 7, 2026 18:35
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

// Skip cleanup when we don't have an authoritative running set.
// This avoids deleting TAPs created by other concurrent hypeman processes/tests.
if len(runningInstanceIDs) == 0 {
log.DebugContext(ctx, "skipping TAP cleanup (empty instance list)")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty slice now skips TAP cleanup breaking production intent

High Severity

Changing the guard from runningInstanceIDs == nil to len(runningInstanceIDs) == 0 breaks the production caller in cmd/api/main.go, which explicitly initializes preserveTAPs to []string{} (not nil) when there are no running VMs, with a comment stating "Initialize to empty slice (not nil) so cleanup runs even with no running VMs." The caller relies on the nil vs empty-slice distinction to differentiate "we don't know what's running, skip cleanup" from "nothing is running, clean up all orphaned TAPs." Now both cases skip cleanup, causing orphaned TAP devices to accumulate indefinitely.

Fix in Cursor Fix in Web

@cursor cursor bot requested review from hiroTamada and rgarcia March 7, 2026 18:36
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Risk assessment: Medium (review required).

Evidence from the code diff:

  • cmd/api/api/instances.go: instance network mapping changed from direct struct assignment to map[string]any + json.Marshal/json.Unmarshal into oapiInst.Network, introducing behavioral/API-shape risk and a silent failure path if unmarshal fails.
  • lib/network/bridge_linux.go: TAP cleanup semantics changed from nil-only guard to len(...) == 0, which changes cleanup behavior when the running-instance list is empty.
  • .github/workflows/test.yml: CI/toolchain verification updates (lower runtime risk, but still part of overall change set).

Decision:

  • Not approving in this run because risk is Medium and production codepaths are modified.
  • Reviewer requests placed for @hiroTamada and @rgarcia.

Open in Web View Automation 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant