## Summary

A `brev create` invocation whose API response was truncated with `unexpected EOF` left the workspace in a state where the server-side build job was never enqueued. The VM was provisioned and reaches `status=RUNNING`, but `build_status` stays empty forever and the post-cloud-init setup push (`/opt/setup.sh`, `/etc/systemd/system/instance-oneshot.service`, the hostname rewrite to `brev-${BREV_ENV_ID}`, `/etc/brev/metadata.json`) never runs.

More than an hour later the workspace is still in this stuck state. The same command with a different name produces a healthy workspace.
## Reproduction

- brev CLI version: `v0.6.322`
- Org: `vanguard-programming`
- Workspace name: `gr-manager`
- Workspace ID: `hclv1344v`
- Approximate timestamp of the failed create: 2026-05-01 17:43:46 UTC

Command (run from a logged-in dev box):

```
brev create gr-manager --type n2d-standard-8 --min-disk 589 --detached
```
CLI output on the first invocation:

```
[Worker 1] Trying n2d-standard-8 for instance 'gr-manager'...
2026/05/01 17:43:46 WARN RESTY Post "https://brevapi.us-west-2-prod.control-plane.brev.dev/api/organizations/org-31CQRkwV2DY20PnKdmX92lJgtlm/workspaces?cli_version=v0.6.322&local=true&os=linux&utm_source=cli": unexpected EOF, Attempt 1
2026/05/01 17:43:46 ERROR RESTY Post "...": unexpected EOF
[Worker 1] n2d-standard-8 Failed: ... unexpected EOF
Warning: Only created 0/1 instances
could only create 0/1 instances
```
A second invocation a few seconds later returned `duplicate workspace with name gr-manager`, confirming that the workspace row had been created server-side despite the truncated response.
## Resulting state

`brev ls` for the workspace (note the blank BUILD column):

```
NAME        STATUS   BUILD  SHELL      ID         MACHINE         GPU
gr-manager  RUNNING         NOT READY  hclv1344v  n2d-standard-8  -
```

`brev ls --json`:

```json
{
  "name": "gr-manager",
  "status": "RUNNING",
  "build_status": "",
  "shell_status": "NOT READY",
  "health_status": "HEALTHY"
}
```
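For triage, the stuck state is easy to detect mechanically from the `brev ls --json` shape above. A minimal sketch in Python: the field names come from the output shown, but the "stuck" predicate (`RUNNING` with an empty `build_status`) is my characterization, not an official one:

```python
import json

# Sample records in the shape shown by `brev ls --json` above.
workspaces = json.loads("""[
  {"name": "gr-manager", "status": "RUNNING", "build_status": "",
   "shell_status": "NOT READY", "health_status": "HEALTHY"},
  {"name": "gr-manager2", "status": "RUNNING", "build_status": "COMPLETED",
   "shell_status": "READY", "health_status": "HEALTHY"}
]""")

def is_stuck(ws):
    # The broken workspace is RUNNING but its build was never enqueued,
    # so build_status stays empty instead of ever reaching COMPLETED.
    return ws["status"] == "RUNNING" and ws["build_status"] == ""

stuck = [ws["name"] for ws in workspaces if is_stuck(ws)]
print(stuck)  # -> ['gr-manager']
```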
On the VM:

- Cloud-init completed at 17:44:48 UTC: `boot-finished` exists, `cloud-init status` is `done`.
- `/opt/setup.sh`: does not exist.
- `/etc/systemd/system/instance-oneshot.service`: does not exist.
- `/etc/brev/metadata.json`: does not exist.
- `/etc/hostname`: still the GCE default `gr-manager-inst-3d8ij8ob5qwnwtaq3pxwc209nru` (the file's mtime is the base-image bake date; it was never rewritten).
- `BREV_ENV_ID`: unset system-wide.
- Brev's control plane (source `52.13.205.207`) successfully SSHes in as `ubuntu` every ~12-13 seconds; `auth.log` has dozens of `Accepted publickey` entries. None of those sessions ever execute the `tee /opt/setup.sh` / `tee /etc/systemd/system/instance-oneshot.service` / `systemctl start instance-oneshot` chain that healthy workspaces show.
- More than an hour after creation, none of the above has changed.
## Healthy comparison

Three other workspaces (`test`, `test2`, `gr-manager2`; all `n2d-standard-8` in the same `vanguard-programming` org, the last two created with the identical `--type n2d-standard-8 --min-disk 589 --detached` flag combination, the only difference being a clean create with no EOF) reached `BUILD=COMPLETED` + `SHELL=READY` within a few minutes. On those VMs, `auth.log` shows a sequence like:

```
sudo[1832]: ubuntu : COMMAND=/usr/bin/tee -a /opt/setup.sh
sudo[1834]: ubuntu : COMMAND=/usr/bin/chmod +x /opt/setup.sh
sudo[1868]: ubuntu : COMMAND=/usr/bin/tee /etc/systemd/instance-oneshot.env
... (writes the env file with environmentID='<the workspace id>')
sudo[1888]: ubuntu : COMMAND=/usr/bin/tee /etc/systemd/system/instance-oneshot.service
sudo[1923]: ubuntu : COMMAND=/usr/bin/systemctl start instance-oneshot
```

running ~16 seconds after cloud-init finished. That sequence never fires on `gr-manager`.
## Hypothesis

Server-side, the workspace creation flow appears to be:

1. Insert the workspace row.
2. Provision the VM.
3. Enqueue the post-cloud-init setup-push job.

The `unexpected EOF` (presumably a transient network/proxy/upstream issue mid-response) likely interrupted the handler somewhere between step 1 and step 3. Step 1's effects persisted (the workspace row exists, and the VM was later provisioned), but step 3 never happened, and there appears to be no reconciliation path that re-queues setup for an existing workspace whose `build_status` is empty.
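A toy simulation of that hypothesis (all names here are hypothetical; the real handler is not public): if the handler dies between the row insert and the enqueue, the row survives but the setup queue stays empty for that workspace, which matches the observed state exactly.

```python
workspace_rows = []  # stands in for the workspaces table (step 1)
setup_queue = []     # stands in for the setup-push job queue (step 3)

def create_workspace(name, fail_before_enqueue=False):
    # Step 1: insert the workspace row (persists even if we die later).
    workspace_rows.append({"name": name, "build_status": ""})
    # Step 2 (VM provisioning) happens asynchronously and succeeds either way.
    if fail_before_enqueue:
        raise ConnectionError("unexpected EOF")  # handler dies mid-flight
    # Step 3: enqueue the setup-push job.
    setup_queue.append(name)

try:
    create_workspace("gr-manager", fail_before_enqueue=True)
except ConnectionError:
    pass  # the client sees the EOF, but step 1's insert already persisted

create_workspace("gr-manager2")  # clean create, all three steps run

print([r["name"] for r in workspace_rows])  # -> ['gr-manager', 'gr-manager2']
print(setup_queue)                          # -> ['gr-manager2']
```

`gr-manager` exists as a row (hence the `duplicate workspace` error on retry) but was never enqueued (hence the empty `build_status`).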
The CLI side makes no follow-up call after `POST /workspaces` (per `pkg/cmd/gpucreate/gpucreate.go`, `createWorkspace` makes a single store call), so the CLI cannot usefully retry; the broken state is entirely server-side.
## Suggested mitigations

- Make the create POST handler atomic: if any step fails after the row insert, roll back the row (or mark `build_status=FAILED` so a subsequent `brev reset --hard` rebuilds rather than no-ops).
- Add a reconciler for workspaces with `status=RUNNING && build_status=""` for more than N minutes: re-enqueue setup, or transition them to `FAILED`.
- CLI-side: when `brev create` returns an error but a workspace with that name later turns out to exist, surface a clear "workspace was partially created — run `brev reset --hard` or `brev delete`" message.
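The reconciler from the second bullet could look roughly like this. A sketch only: the stuck predicate and the "N minutes" threshold come from the bullet above, while the retry cap, field names, and callbacks are my assumptions about a plausible shape.

```python
import time

STUCK_AFTER_SECONDS = 15 * 60  # the "N minutes" threshold; value is arbitrary here
MAX_REENQUEUES = 3             # give up and mark FAILED after this many attempts

def reconcile(workspaces, now, enqueue_setup, mark_failed):
    """Re-enqueue setup for workspaces that have sat RUNNING with an empty
    build_status past the threshold; mark them FAILED once retries run out."""
    for ws in workspaces:
        stuck = (ws["status"] == "RUNNING"
                 and ws["build_status"] == ""
                 and now - ws["created_at"] > STUCK_AFTER_SECONDS)
        if not stuck:
            continue
        if ws.get("reenqueue_attempts", 0) < MAX_REENQUEUES:
            ws["reenqueue_attempts"] = ws.get("reenqueue_attempts", 0) + 1
            enqueue_setup(ws)
        else:
            mark_failed(ws)

# Example run against an in-memory stand-in for the workspace table:
now = time.time()
table = [
    {"name": "gr-manager", "status": "RUNNING", "build_status": "",
     "created_at": now - 3600},   # stuck for an hour, like the reported workspace
    {"name": "gr-manager2", "status": "RUNNING", "build_status": "COMPLETED",
     "created_at": now - 3600},   # healthy: non-empty build_status
]
queued, failed = [], []
reconcile(table, now, queued.append, failed.append)
print([ws["name"] for ws in queued])  # -> ['gr-manager']
```

Run periodically, this would have re-queued `gr-manager`'s setup within N minutes instead of leaving it stuck indefinitely.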
Happy to provide more on-VM logs, `journalctl` output, or an SSH session to `gr-manager` (workspace `hclv1344v`) for diagnostics — the instance is intentionally being kept in this broken state until this is investigated.