cloudhypervisor: avoid sparse serial logs after rotation#211
Closed
alityb wants to merge 1 commit intokernel:mainfrom
Closed
cloudhypervisor: avoid sparse serial logs after rotation#211alityb wants to merge 1 commit intokernel:mainfrom
alityb wants to merge 1 commit intokernel:mainfrom
Conversation
Cloud Hypervisor's serial.mode=File opens the log path without O_APPEND, so after copytruncate rotation the next write lands at the stale fd offset and creates a sparse NUL hole. This is the same class of bug fixed for QEMU in kernel#209. Switch serial config to Socket mode so Cloud Hypervisor binds a Unix listener instead of opening the file directly. Hypeman connects to this socket after CreateVM (or Restore) and copies serial output into app.log through an O_APPEND file descriptor it owns. Post-truncate writes correctly resume at byte 0. Also updates fork/snapshot config rewriting to upgrade existing File-mode serial entries to Socket mode, ensuring forked instances get the fix. Old snapshots with File-mode serial restore gracefully (no logger is started; serial output goes to the existing file path as before).
|
Firetiger deploy monitoring skipped This PR didn't match the auto-monitor filter configured on your GitHub connection:
Reason: PR modifies Cloud Hypervisor hypervisor integration in packages/api/lib/hypervisor, not the kernel API endpoints (packages/api/cmd/api/) or Temporal workflows (packages/api/lib/temporal) specified in the filter. To monitor this PR anyway, reply with |
Collaborator
|
Thanks but already got this going over here #210 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Cloud Hypervisor's
serial.mode=Fileopens the log path withoutO_APPEND, so after copytruncate rotation the next write lands at the stale fd offset and creates a sparse NUL hole. This is the same class of bug fixed for QEMU in #209.Socketmode so CH binds a Unix listener instead of opening the file directlyCreateVM(orRestore) and copies serial output intoapp.logthrough anO_APPENDfd it ownsDesign
Cloud Hypervisor's serial socket mode (
ConsoleOutputMode::Socket) creates aUnixListener. Hypeman dials this socket after the VM is configured (before boot) and runs a backgroundio.Copyinto the append-mode log file. When the VM shuts down or is killed, the socket closes and the copy goroutine exits naturally.The serial socket is placed at
{instance_dir}/serial.sock(not insidelogs/) to keep the Unix socket path well within the 108-bytesun_pathlimit.Backward compatibility
File-mode serial restore gracefully —serialLogPathsFromSnapshotreturns empty strings, no logger is started, and CH uses its existing file pathFile→Socketmode, so forked instances get the fix automaticallyTest plan
go test ./lib/hypervisor/cloudhypervisor/...— config generation, snapshot rewriting, socket logger lifecyclego test -race -count=50 ./lib/hypervisor/cloudhypervisor— no racesgo test ./lib/hypervisor/...— no regressions across all hypervisorsgo test ./lib/instances -run 'TestAppLog|TestParseBootMarkers|TestHydrateBootMarkers|TestLog'rotateLogIfNeeded, verify no sparse NUL hole in post-rotationapp.logNote
Medium Risk
Changes Cloud Hypervisor VM start/restore and snapshot-rewrite behavior by switching serial output from direct file writes to a managed Unix-socket logger; failures or path issues could impact VM boot/restore or log capture. Scope is localized to Cloud Hypervisor but touches lifecycle-critical code paths.
Overview
Switches Cloud Hypervisor serial logging from
FiletoSocketmode to avoid sparse/holedapp.logoutput after log rotation/truncation.Hypeman now dials the CH-provided
serial.sockafterCreateVM/Restore, streams serial output intologs/app.logvia anO_APPENDfile descriptor, removes any stale socket path before boot/restore, and closes the logger on shutdown/boot failure.Forked snapshot
config.jsonrewriting is updated to convert serial config to socket mode (and dropfile), with new helpers and tests covering path derivation, snapshot parsing/back-compat, retry/timeout dialing, and logger lifecycle/idempotent close.Reviewed by Cursor Bugbot for commit 8b24d2c. Bugbot is set up for automated code reviews on this repo. Configure here.