brev create -s @file silently drops the startup script (no error, no log) on m8i-flex.2xlarge AND n2d-standard-8 #383

@robobryce

What

brev create <name> -s @<file> (and -s "<inline>") is silently a no-op today — the workspace is created without the supplied startup script anywhere on it. cloud-init never runs the script; no per-instance / per-boot / per-once slot in the cloud-config receives it. There is no warning, no rejection, no log line — the workspace just comes up like the -s flag was never passed.

Verified today (CLI v0.6.323) on both m8i-flex.2xlarge (AWS) and n2d-standard-8 (GCP).

Repro on AWS

$ cat > /tmp/marker-stub.sh <<'EOM'
#!/usr/bin/env bash
set -eu
echo "MARKER ran at $(date -u +%FT%TZ)" | tee /tmp/brev-s-test-marker.txt
logger -t brev-s-test "MARKER startup script ran"
EOM
$ chmod +x /tmp/marker-stub.sh

$ # Three back-to-back creates: -s @file, -s "<inline>", and no -s.
$ brev create gr-sstest  --type m8i-flex.2xlarge -s @/tmp/marker-stub.sh
$ brev create gr-sstest2 --type m8i-flex.2xlarge -s "$(cat /tmp/marker-stub.sh)"
$ brev create gr-sstest3 --type m8i-flex.2xlarge

$ # On each VM after build completes:
$ ssh gr-sstest 'ls /tmp/brev-s-test-marker.txt'
ls: cannot access '/tmp/brev-s-test-marker.txt': No such file or directory
$ ssh gr-sstest 'sudo journalctl -t brev-s-test --no-pager'
-- No entries --

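The marker file never appears and nothing lands in journald. The result is the same on gr-sstest2 (inline -s) and on gr-sstest3 (the control created without -s), so both -s forms are indistinguishable from passing no script at all.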

What's actually written to the VM

brev writes a multipart MIME envelope to /var/lib/cloud/instance/user-data.txt regardless of whether -s is passed. The MIME envelope is identical (byte-for-byte) across all three cases — same boundary string ===============7279599212584821875== and three empty base64 slots:

$ ssh gr-sstest 'sudo cat /var/lib/cloud/instance/user-data.txt'
Content-Type: multipart/mixed; boundary="===============7279599212584821875=="
MIME-Version: 1.0

--===============7279599212584821875==
Content-Type: text/x-shellscript-per-boot; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="always.sh"



--===============7279599212584821875==
Content-Type: text/x-shellscript-per-instance; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="instance.sh"



--===============7279599212584821875==
Content-Type: text/x-shellscript-per-once; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="once.sh"



--===============7279599212584821875==--

All three base64 slots are blank: Brev wrote the envelope template but never inserted the content from -s. cloud-init then unpacks the blank slots into 0-byte script files such as /var/lib/cloud/scripts/per-instance/instance.sh. Executing one fails with "Exec format error. Missing #! in script?", so no user-supplied script ever runs.
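
To make the "blank slots" check repeatable, here is a minimal sketch of a helper that lists each MIME part of a user-data file with the size of its payload. The function name list_userdata_parts is hypothetical (it is not part of brev or cloud-init), and the parsing assumes the simple layout shown above: a quoted boundary in the top-level header and a filename in each part's Content-Disposition.

```shell
# Hypothetical helper (not part of brev or cloud-init).
# Prints "<filename> <payload-chars>" for every part of a multipart
# user-data file. A payload size of 0 means the slot is empty.
list_userdata_parts() {
  awk '
    function emit() { if (in_part) printf "%s %d\n", name, size }

    # Take the boundary string from the top-level Content-Type header.
    boundary == "" && match($0, /boundary="[^"]+"/) {
      boundary = substr($0, RSTART + 10, RLENGTH - 11)
      next
    }

    # A boundary line closes the previous part; the "--" suffixed form
    # also ends the envelope.
    boundary != "" && ($0 == ("--" boundary) || $0 == ("--" boundary "--")) {
      emit()
      in_part = ($0 == ("--" boundary))
      in_body = 0
      name = "(unnamed)"
      size = 0
      next
    }

    # Part headers run until the first blank line.
    in_part && !in_body {
      if ($0 == "") { in_body = 1; next }
      if (match($0, /filename="[^"]+"/))
        name = substr($0, RSTART + 10, RLENGTH - 11)
      next
    }

    # Everything after the headers is payload; ignore whitespace.
    in_part && in_body { gsub(/[ \t\r]/, ""); size += length($0) }

    END { emit() }
  ' "$1"
}
```

On the VM it could be run against a copy of the file, for example after `sudo cat /var/lib/cloud/instance/user-data.txt > /tmp/user-data.txt`. For the envelope shown above it prints always.sh 0, instance.sh 0 and once.sh 0, one per line.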

Same on GCP

On n2d-standard-8, the failure mode is slightly different: cloud-init reports "Skipping user-data validation. No user-data found." and writes [600] 0 bytes to the user-data file. The GCP startup-script metadata attribute returns 404 (only block-project-ssh-keys, serial-port-enable, and ssh-keys are present). The observable result is the same: the -s payload never lands.
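
The 404 can be checked from inside the GCP VM by querying the metadata server. The sketch below assumes the standard GCE metadata endpoint and the required Metadata-Flavor header. The function name and the CURL_BIN override are hypothetical, added only so the helper can be exercised without a metadata server.

```shell
# Hypothetical helper, meant to run on the GCP VM itself.
# Prints the HTTP status the metadata server returns for one
# instance attribute (404 means the attribute is not set).
# CURL_BIN is an illustrative override for testing; it defaults to curl.
gce_attribute_http_status() {
  attr="$1"
  url="http://metadata.google.internal/computeMetadata/v1/instance/attributes/${attr}"
  "${CURL_BIN:-curl}" -s -o /dev/null -w '%{http_code}\n' \
    -H 'Metadata-Flavor: Google' "$url"
}
```

Usage on the VM: `gce_attribute_http_status startup-script`, then the same call with ssh-keys as a control that should exist.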

Why this matters

Anyone using brev create -s @file for VM bootstrap is silently getting workspaces with no bootstrap script, even though that is the documented contract of the flag. The help text says: "Startup script to run on instance (string or @filepath)". The CLI's help is misleading; the flag is non-functional.

In our case (gateroom, a fleet manager that drives brev create to spin up worker VMs), every worker we spawned sat idle with its heartbeat never firing, because the bootstrap script that fires it never reached the VM. We worked around it by not passing -s and instead SSH-pushing the script onto the VM after brev create returns, but that is a non-trivial rework of our spawn flow. (See https://github.com/brycelelbach-private/gateroom/pull/230 for the workaround.)
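
For anyone needing the same stopgap, the SSH-push approach can be sketched roughly as below. This is an illustration, not gateroom's actual code: the function name and the SSH_BIN, PUSH_ATTEMPTS and PUSH_DELAY knobs are hypothetical. It retries only while ssh itself fails (exit status 255), since the VM may not accept connections right after brev create returns, and otherwise passes the script's own exit status through.

```shell
# Hypothetical sketch of the SSH-push workaround (not gateroom's code).
# Usage: push_startup_script <host> <local-script>
# SSH_BIN, PUSH_ATTEMPTS and PUSH_DELAY are illustrative overrides.
push_startup_script() {
  host="$1"
  script="$2"
  attempts="${PUSH_ATTEMPTS:-30}"
  delay="${PUSH_DELAY:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    rc=0
    # Stream the local script to a remote bash over stdin.
    "${SSH_BIN:-ssh}" "$host" 'bash -s' < "$script" || rc=$?
    # ssh exits 255 on connection errors; anything else is the
    # remote script's own exit status, so stop retrying.
    if [ "$rc" -ne 255 ]; then
      return "$rc"
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "push_startup_script: gave up on $host after $attempts attempts" >&2
  return 1
}
```

With the repro above, this would be called as `push_startup_script gr-sstest3 /tmp/marker-stub.sh` once brev create has returned. Unlike a cloud-init script, the pushed script runs as the SSH user and does not rerun on reboot.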

Suggested fix

Either:

  1. Make -s actually work — the CLI knows how to write the multipart MIME envelope (it writes the empty template today); it just needs to insert the user-supplied content into the appropriate slot (probably instance.sh if the script is meant to run once on first boot).
  2. Reject -s loudly with an error if the API can't carry user-data on the requested instance type or provider. Silent acceptance + drop is the worst possible outcome — users believe they're getting a working startup script and only find out the workspace is broken when whatever the script was supposed to do hasn't happened.
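
As an illustration of option 1, the sketch below emits an envelope with the script base64-encoded into the per-instance slot. It is not Brev's implementation: the function name is hypothetical, the headers and default boundary are copied from the envelope shown earlier in this report, and the choice of the instance.sh slot is the same guess made in option 1. The empty per-boot and per-once parts are left out.

```shell
# Hypothetical sketch of a populated user-data envelope (not Brev's code).
# Usage: build_userdata <script-file> [boundary]
# Headers and default boundary mirror the envelope shown in this report.
build_userdata() {
  script="$1"
  boundary="${2:-===============7279599212584821875==}"
  printf 'Content-Type: multipart/mixed; boundary="%s"\n' "$boundary"
  printf 'MIME-Version: 1.0\n\n'
  # Opening boundary for the single per-instance part.
  printf '%s%s\n' '--' "$boundary"
  printf 'Content-Type: text/x-shellscript-per-instance; charset="utf-8"\n'
  printf 'MIME-Version: 1.0\n'
  printf 'Content-Transfer-Encoding: base64\n'
  printf 'Content-Disposition: attachment; filename="instance.sh"\n\n'
  # The payload that is missing today: the base64-encoded script.
  base64 < "$script"
  # Closing boundary ends the envelope.
  printf '\n%s%s--\n' '--' "$boundary"
}
```

For example, `build_userdata /tmp/marker-stub.sh > /tmp/user-data.txt` produces a file whose instance.sh part is no longer empty, which is the difference a fixed CLI would need to make.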

The current behaviour is a textbook silent-drop anti-pattern: a tool that takes a parameter, says nothing about it, and silently does nothing with it.
