
feat: default-disabled guest memory reclaim #129

Draft
sjmiller609 wants to merge 9 commits into main from codex/guestmemory-reclaim

sjmiller609 (Collaborator) commented Mar 7, 2026

Summary

  • add a new lib/guestmemory package with normalized policy defaults, deterministic kernel-arg merge, and runtime behavior docs
  • wire internal hypervisor.memory config into providers/instances without changing public API
  • add hypervisor-agnostic VMConfig.GuestMemory toggles and map them in Cloud Hypervisor, QEMU, Firecracker, and VZ backends
  • add manual-only guest-memory integration tests for Cloud Hypervisor, QEMU, Firecracker, and VZ plus dedicated make targets
  • update manual integration assertions to validate a low idle host memory footprint for a 4GB guest, measured with Linux PSS (/proc/<pid>/smaps_rollup) and with macOS RSS for VZ
  • add concise CLI A/B experiment docs and clarify the preferred host-memory metric
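The summary mentions a deterministic kernel-arg merge but does not spell out its rules. Below is a minimal Go sketch of one way such a merge could work. It assumes a "last writer wins per key" rule. Arguments are split on whitespace. A policy argument replaces a base argument with the same key in place. New keys are appended in the order given. mergeKernelArgs and kernelArgKey are hypothetical names, not the lib/guestmemory API.

This rule would be wrong for parameters the kernel allows to repeat, such as multiple console= entries. It also does not handle quoted values that contain spaces.

```go
package main

import "strings"

// kernelArgKey (hypothetical) returns the deduplication key for a kernel
// argument: the text before the first '=' for "key=value" arguments, or
// the whole token for bare flags such as "quiet".
func kernelArgKey(arg string) string {
	if i := strings.IndexByte(arg, '='); i >= 0 {
		return arg[:i]
	}
	return arg
}

// mergeKernelArgs (hypothetical) merges policy-derived args into a base
// command line. An arg in extra replaces a base arg with the same key at
// the base arg's original position; new keys are appended in input order.
// The output depends only on the inputs, so it is deterministic.
// Assumption: args never contain quoted spaces (strings.Fields splits them).
func mergeKernelArgs(base, extra string) string {
	var order []string              // keys in first-seen order
	values := make(map[string]string) // key -> latest full argument
	add := func(arg string) {
		key := kernelArgKey(arg)
		if _, seen := values[key]; !seen {
			order = append(order, key)
		}
		values[key] = arg
	}
	for _, arg := range strings.Fields(base) {
		add(arg)
	}
	for _, arg := range strings.Fields(extra) {
		add(arg)
	}
	out := make([]string, 0, len(order))
	for _, key := range order {
		out = append(out, values[key])
	}
	return strings.Join(out, " ")
}
```

For example, mergeKernelArgs("root=/dev/vda ro loglevel=7", "loglevel=3 quiet") yields "root=/dev/vda ro loglevel=3 quiet". Keeping first-seen positions instead of sorting keeps the base command line recognizable. Identical inputs still always produce identical output.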

Investigation Notes

  • RSS significantly over-reports host memory usage for these VMMs in this scenario.
  • PSS better reflects actual host memory pressure for the Linux hypervisors. It showed low idle memory for Cloud Hypervisor and Firecracker, while QEMU has a larger fixed per-process overhead.

Testing

  • go test ./lib/guestmemory
  • go test ./lib/instances -run '^TestGuestMemoryPolicyVZ$'
  • remote uncached:
    • sudo env ... HYPEMAN_RUN_GUESTMEMORY_TESTS=1 go test -count=1 -run '^TestGuestMemoryPolicyCloudHypervisor$' ./lib/instances
    • sudo env ... HYPEMAN_RUN_GUESTMEMORY_TESTS=1 go test -count=1 -run '^TestGuestMemoryPolicyQEMU$' ./lib/instances
    • sudo env ... HYPEMAN_RUN_GUESTMEMORY_TESTS=1 go test -count=1 -run '^TestGuestMemoryPolicyFirecracker$' ./lib/instances
  • remote target:
    • make test-guestmemory-linux


cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

cursor bot requested review from hiroTamada and rgarcia on March 7, 2026 06:17

cursor bot left a comment


Automated Risk Assessment

Risk level: Medium-High

Why this is not low risk

  1. The PR changes shared VM startup behavior across all hypervisors (Cloud Hypervisor, QEMU, Firecracker, VZ) by adding guest-memory policy wiring into instances + providers + backend config translation.
  2. It introduces new default-on runtime behavior (hypervisor.memory defaults to enabled, with reclaim on), which affects kernel args and balloon/reclaim device configuration for most instance launches.
  3. It modifies infrastructure-sensitive startup paths (notably QEMU retry/fallback startup sequencing and VZ balloon attach behavior), which increases regression risk and operational blast radius.

Decision

  • Code review is required.
  • Self-approval is not applied for this risk level.
  • Requested reviewers: @rgarcia, @hiroTamada.


sjmiller609 marked this pull request as draft on March 7, 2026 13:57
sjmiller609 changed the title from "feat: cross-hypervisor guest memory reclaim policy and manual integration tests" to "feat: default-disabled guest memory reclaim" on Mar 7, 2026