What
brev create <name> --type <T> sometimes exits with non-zero return code and a stderr body like:
WARN RESTY Post "https://brevapi.us-west-2-prod.control-plane.brev.dev/api/organizations/<org-id>/workspaces?cli_version=v0.6.323&local=true&os=linux&utm_source=cli": unexpected EOF, Attempt 1
ERROR RESTY Post "https://brevapi.us-west-2-prod.control-plane.brev.dev/api/organizations/<org-id>/workspaces?cli_version=v0.6.323&local=true&os=linux&utm_source=cli": unexpected EOF
[Worker 1] m8i-flex.2xlarge Failed: ... Post "...": unexpected EOF
Warning: Only created 0/1 instances
could only create 0/1 instances
…even though the workspace was actually created — it shows up in brev ls --json immediately after, with status=DEPLOYING then status=RUNNING after a few minutes, and is fully usable. The non-zero exit is therefore a false negative: the API call appears to have completed server-side but the client got an EOF before reading the success response.
When it happens
Observed multiple times across our automation today, on m8i-flex.2xlarge (AWS) and previously on n2d-standard-8 (GCP). Intermittent — probably 1-in-5 or so during peak hours. Repro is unfortunately just "run brev create enough times".
Why it matters for automation
Downstream tooling that treats brev create's exit code as truth concludes "create failed" and either:
- Tries to clean up by calling
brev delete <name> (which then succeeds, terminating the perfectly-good workspace).
- Marks an internal record (DB row, etc.) as
destroyed/failed, leaving the orphan VM in Brev that no automation will reconcile.
- Bails the whole pipeline before whatever was supposed to run on the new workspace.
Our workaround (in our deploy script and in the manager's BrevDriver):
async def _workspace_was_created(self, instance_name: str) -> bool:
try:
return (await self.lookup_env_id(instance_name)) is not None
except BrevError:
return False
# After any non-zero `brev create` exit, call _workspace_was_created.
# If the workspace is in `brev ls --json`, treat the create as a success
# (log a warning) and proceed. If it's not, surface the original error.
What would fix this on the CLI side
Pick whichever fits the architecture best:
- Server-side write idempotency + client retry. If
POST /workspaces is keyed by a client-supplied request ID, the CLI can retry on EOF without risking a duplicate workspace. Today's behaviour suggests the write is succeeding before the response is fully returned, so a retry would either re-fetch the result (if the API is idempotent) or surface the actual created object.
- CLI fallback on EOF: when the response is truncated mid-payload, the CLI internally calls
GET /workspaces?name=<name> to check whether the workspace was actually created, and treats that as the source of truth before failing.
- At minimum, a clearer signal in the error message that the EOF might mean "successfully created but response truncated" — so callers know to verify via
brev ls rather than assume failure and clean up.
Workaround for anyone hitting this today
Verify via brev ls --json after every non-zero brev create exit before treating it as a hard failure.
What
brev create <name> --type <T>sometimes exits with non-zero return code and a stderr body like:…even though the workspace was actually created — it shows up in
brev ls --jsonimmediately after, withstatus=DEPLOYINGthenstatus=RUNNINGafter a few minutes, and is fully usable. The non-zero exit is therefore a false negative: the API call appears to have completed server-side but the client got an EOF before reading the success response.When it happens
Observed multiple times across our automation today, on
m8i-flex.2xlarge(AWS) and previously onn2d-standard-8(GCP). Intermittent — probably 1-in-5 or so during peak hours. Repro is unfortunately just "runbrev createenough times".Why it matters for automation
Downstream tooling that treats
brev create's exit code as truth concludes "create failed" and either:brev delete <name>(which then succeeds, terminating the perfectly-good workspace).destroyed/failed, leaving the orphan VM in Brev that no automation will reconcile.Our workaround (in our deploy script and in the manager's BrevDriver):
What would fix this on the CLI side
Pick whichever fits the architecture best:
POST /workspacesis keyed by a client-supplied request ID, the CLI can retry on EOF without risking a duplicate workspace. Today's behaviour suggests the write is succeeding before the response is fully returned, so a retry would either re-fetch the result (if the API is idempotent) or surface the actual created object.GET /workspaces?name=<name>to check whether the workspace was actually created, and treats that as the source of truth before failing.brev lsrather than assume failure and clean up.Workaround for anyone hitting this today
Verify via
brev ls --jsonafter every non-zerobrev createexit before treating it as a hard failure.