Add Initializing state and optimize speed to init #136
Automated risk assessment for this PR: Medium-High risk.
Why this risk level (from code diff evidence):
- Large behavioral change in production lifecycle logic across `lib/instances/*` (new `Initializing` state and state-derivation path changes).
- Infrastructure/startup-path modifications in `lib/system/init/*` (async kernel-header worker, exec/systemd boot gating, service injection).
- Cross-surface impact (`openapi.yaml`, generated `lib/oapi/oapi.go`, API/runtime behavior) with broad blast radius.
Decision:
- Code review is required.
- Per policy for Medium-High risk PRs, I did not self-approve.
- Requested reviewers: @hiroTamada and @rgarcia.
…alizing-concurrency # Conflicts: # lib/instances/qemu_test.go # lib/network/allocate.go # lib/oapi/oapi.go
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Cursor Bugbot has reviewed your changes and found 5 potential issues.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
hiroTamada
left a comment
Thorough and well-structured PR. The Initializing state model is clean, boot marker detection is robust (throttled, time-bounded, rotation-aware), and the latency optimizations are meaningful: async kernel headers, parallel init stages, and an event-driven agent readiness gate.
Two inline comments flagged:
- Boot marker filtering in `kernel/kernel`: The new sentinels (`HYPEMAN-PROGRAM-START`, `HYPEMAN-AGENT-READY`, `HYPEMAN-HEADERS-*`) will flow through `app.log` → s2 stream → user-facing logs in builder VMs. `shouldEmitBuildLogLine` in `kernel/kernel` needs to filter these out (tracked as a follow-up).
- No metric for time spent in `Initializing`: Could be a useful operational signal for detecting slow boots (optional follow-up).
LGTM.
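The boot marker filtering follow-up flagged above could look roughly like the sketch below. It is not the actual `shouldEmitBuildLogLine` implementation: `isBootMarker` and `filterBuildLog` are hypothetical helper names, and matching the sentinels as line prefixes is an assumption about how they appear in `app.log`.

```go
package main

import (
	"fmt"
	"strings"
)

// bootMarkerPrefixes lists the sentinel names mentioned in the review comment.
// ASSUMPTION: each sentinel appears at the start of its own log line.
var bootMarkerPrefixes = []string{
	"HYPEMAN-PROGRAM-START",
	"HYPEMAN-AGENT-READY",
	"HYPEMAN-HEADERS-", // covers the HYPEMAN-HEADERS-* family
}

// isBootMarker (hypothetical) reports whether a log line starts with one of
// the boot sentinels, ignoring surrounding whitespace.
func isBootMarker(line string) bool {
	trimmed := strings.TrimSpace(line)
	for _, prefix := range bootMarkerPrefixes {
		if strings.HasPrefix(trimmed, prefix) {
			return true
		}
	}
	return false
}

// filterBuildLog (hypothetical) returns only the lines that should reach
// user-facing logs, dropping boot sentinels.
func filterBuildLog(lines []string) []string {
	kept := make([]string, 0, len(lines))
	for _, line := range lines {
		if !isBootMarker(line) {
			kept = append(kept, line)
		}
	}
	return kept
}

func main() {
	lines := []string{
		"HYPEMAN-PROGRAM-START",
		"step 1/3: fetching base image",
		"HYPEMAN-HEADERS-READY",
		"build complete",
	}
	for _, line := range filterBuildLog(lines) {
		fmt.Println(line)
	}
}
```

A prefix match keeps the filter cheap and avoids dropping ordinary lines that merely mention a sentinel name mid-line; whether that trade-off fits the real log framing would need checking against the actual marker format.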
Cursor Bugbot has reviewed your changes and found 5 potential issues.
sjmiller609
left a comment
Submitting stale pending review so follow-up thread replies can be posted.


Summary
Add Initializing state
Speed up Initializing
- … START/READY/FAILED)
- … `configureNetwork` + `mountVolumes`) with staged barriers
- … `network.target` dependency from `hypeman-agent.service`
- … `TestRegistryPushAndCreateInstance` to use a long-running command (`sleep infinity`) under stricter Running semantics

Initializing Performance
Measured on `deft-kernel-dev` with the same 5-run harness (`TestMeasureCreateToRunning5Runs`, minimal workload: `alpine`, `cmd=["sleep","infinity"]`, networking disabled), comparing this branch to `codex/initializing-state-readiness-gate`:
- … `codex/initializing-state-readiness-gate`): 2077ms
- … `codex/max-speed-initializing-concurrency`): 614ms

Run samples:
- [1958, 1965, 2077, 2116, 2158]
- [596, 601, 614, 634, 728]

Note
High Risk
Changes the core VM lifecycle/state model and guest boot orchestration (readiness gating, marker parsing/persistence, async work), which can affect API semantics and operational flows if markers or timing behave unexpectedly across environments.
Overview
Adds a new public instance lifecycle state, `Initializing`, and changes state derivation so `Running` is only reported once the guest emits boot progress sentinels (program start and, unless `skip_guest_agent=true`, guest-agent ready). Instance CRUD/liveness/resource/network behaviors are updated to treat `Initializing` as VMM-active, allow stop/delete during `Initializing`, preserve TAPs for initializing VMs, and propagate the new enum through OpenAPI/SDK generation.

Reworks guest startup to reduce time-to-`Running`: exec mode now uses an event-driven guest-agent readiness FD handshake (no polling) before launching the entrypoint, kernel-headers setup is moved off the critical path (async worker in exec mode, injected oneshot in systemd mode) with status tracking and serial sentinels, and independent init steps (network + volume mounts) run in parallel. Tests and CI are adjusted for the new immediate semantics (Create/Start/Restore may return `Initializing`), including stronger waiting helpers, QEMU availability checks, and more robust apt/QEMU tooling setup.

Written by Cursor Bugbot for commit 192f095. This will update automatically on new commits.
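The state-derivation rule described in the overview can be illustrated with a small sketch. It is not the code from `lib/instances`: the `State` type, the `BootMarkers` struct, the `deriveState` function, and the `Stopped` fallback are all hypothetical names chosen for illustration. Only the rule itself (`Running` requires the program-start sentinel and, unless the guest agent is skipped, the agent-ready sentinel) comes from the description.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the instance lifecycle enum.
type State string

const (
	StateStopped      State = "Stopped" // ASSUMPTION: placeholder for any non-active state
	StateInitializing State = "Initializing"
	StateRunning      State = "Running"
)

// BootMarkers (hypothetical) records which boot sentinels have been observed.
type BootMarkers struct {
	ProgramStarted bool // program-start sentinel seen
	AgentReady     bool // guest-agent-ready sentinel seen
}

// deriveState (hypothetical) applies the rule from the overview: an active VMM
// is Initializing until the required sentinels have been seen, then Running.
func deriveState(vmmActive bool, markers BootMarkers, skipGuestAgent bool) State {
	if !vmmActive {
		return StateStopped
	}
	if !markers.ProgramStarted {
		return StateInitializing
	}
	if !skipGuestAgent && !markers.AgentReady {
		return StateInitializing
	}
	return StateRunning
}

func main() {
	// Program started, agent not yet ready, agent required.
	fmt.Println(deriveState(true, BootMarkers{ProgramStarted: true}, false))
	// Same markers, but the guest agent is skipped.
	fmt.Println(deriveState(true, BootMarkers{ProgramStarted: true}, true))
}
```

Keeping the derivation a pure function of observed markers means callers of Create/Start/Restore can receive `Initializing` immediately and poll or wait for `Running` without the API blocking on guest boot.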