
---
title: Changelog
description: Stay up to date with UseZombie product updates, new features, and improvements.
---

UseZombie is in **Early Access Preview** and pre-production. APIs and agent behavior may change between releases without long deprecation windows.

Bring your own key (BYOK) + credit-pool billing

Tenants can now run zombie events against their own LLM provider account ("BYOK") instead of the platform-managed default. Both modes share the same gate, the same metering, and the same credit pool; they differ in drain rate, not eligibility. Every new tenant gets a $10 starter grant. Once the balance is exhausted, the gate rejects the next incoming event; an event already in flight is allowed to finish.

What changed

Provider posture is tenant-scoped, not workspace-scoped. A new core.tenant_providers row pins one of two postures per tenant: platform (we charge from your zombie credits) or byok (your provider, your API key, our flat per-event overhead). The legacy PUT|GET|DELETE /v1/workspaces/{ws}/credentials/llm route has been removed; pre-v2.0 carve-out applies — the URL returns 404, not 410, and there is no compat shim.
Two-debit metering. Each event yields up to two charge rows in core.zombie_execution_telemetry: a receive charge committed at gate-pass and a stage charge committed before execution and updated post-run with token counts. The dashboard groups them by event.
Per-token rates. The public _um/<key>/model-caps.json endpoint now carries input_cents_per_mtok and output_cents_per_mtok per model. The API server populates a process-local cache from core.model_caps at boot — compute_stage_charge reads it on the hot path.
Starter grant on signup. tenant_billing.insert_starter_grant runs in the tenant-create transaction. Existing tenants are unaffected; the grant ships once per tenant, never re-applied.
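The two-debit model can be sketched as follows for the platform posture. This is a minimal illustration, not the server's compute_stage_charge. It assumes the stage charge is simply token counts times the model-caps rates, rounded up to a whole cent. It omits the receive charge and BYOK's flat overhead, and it assumes the balance may dip below zero when the last admitted event costs more than what remained. ModelCaps, stage_charge_cents, gate_allows, and run_events are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelCaps:
    # Field names mirror the per-model keys in model-caps.json.
    input_cents_per_mtok: int
    output_cents_per_mtok: int


def stage_charge_cents(caps: ModelCaps, input_tokens: int, output_tokens: int) -> int:
    """Token cost of one event in whole cents (the round-up rule is an assumption)."""
    # tokens * (cents per million tokens) = cents scaled by 10^6, kept in integers.
    cents_e6 = (input_tokens * caps.input_cents_per_mtok
                + output_tokens * caps.output_cents_per_mtok)
    return -(-cents_e6 // 1_000_000)  # ceiling division


def gate_allows(balance_cents: int) -> bool:
    # The gate is consulted only when an event arrives.
    return balance_cents > 0


def run_events(balance_cents: int, charges: List[int]) -> Tuple[int, int]:
    """Return (events admitted, final balance) for a sequence of per-event charges."""
    admitted = 0
    for charge in charges:
        if not gate_allows(balance_cents):
            break  # the gate trips on the next event after exhaustion
        admitted += 1
        balance_cents -= charge  # an admitted event runs to completion, even past zero
    return admitted, balance_cents
```

With the $10 starter grant (1000 cents) and three events costing 600 cents each, the first two are admitted and the third is rejected at the gate: `run_events(1000, [600, 600, 600])` returns `(2, -200)` under these assumptions.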

API surface

Tenant provider

GET /v1/tenants/me/provider — resolved config (mode, provider, model, context cap, credential ref). The api_key is never returned.
PUT /v1/tenants/me/provider — flip to BYOK by passing { "mode": "byok", "credential_ref": "<vault-name>" }. Optional model override; otherwise the model in the credential body is used. Tenant-admin only (403 otherwise).
DELETE /v1/tenants/me/provider — equivalent to PUT mode=platform. Resets to the platform default and surfaces a low-balance warning if applicable.
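A minimal client-side sketch of the PUT /v1/tenants/me/provider body. The mode and credential_ref fields come from the route description above. The `model` key for the optional override is an assumption, as are the local checks; the server remains the authority, and non-admins still get 403. provider_put_body is a hypothetical helper.

```python
import json
from typing import Optional


def provider_put_body(mode: str, credential_ref: Optional[str] = None,
                      model: Optional[str] = None) -> str:
    """Serialize a PUT /v1/tenants/me/provider body."""
    if mode == "platform":
        # Same effect as DELETE /v1/tenants/me/provider.
        return json.dumps({"mode": "platform"})
    if mode != "byok":
        raise ValueError(f"unknown mode: {mode!r}")
    if not credential_ref:
        raise ValueError("byok requires credential_ref (the name of a workspace vault entry)")
    body = {"mode": "byok", "credential_ref": credential_ref}
    if model is not None:
        body["model"] = model  # assumed key; omit it to use the model in the credential body
    return json.dumps(body)
```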

Tenant billing

GET /v1/tenants/me/billing — plan + balance snapshot (already shipped; unchanged).
GET /v1/tenants/me/billing/charges?limit= — newest-first credit-pool charge rows (one per (event_id, charge_type)). Backs the Settings → Billing Usage tab. Note: the resource is /charges rather than /usage because REST §1 of our style rules requires a plural noun as the final path segment; each row is literally a charge.

Removed

PUT|GET|DELETE /v1/workspaces/{workspace_id}/credentials/llm — never wired to a runtime resolver. Use /v1/tenants/me/provider plus a credential stored in the workspace vault.

CLI (`zombiectl`)

zombiectl tenant provider {get|set|reset} — manage the tenant's active LLM posture. set --credential <name> [--model <override>] requires the credential name explicitly so the link to your vault entry is unmistakable. reset warns if your credit balance falls below 100¢.
zombiectl billing show [--limit N] [--json] — read-only dashboard. Prints the formatted balance plus the last N events (default 10) with receive / stage / total cents columns. Footer points at https://app.usezombie.com/settings/billing. No purchase / topup / configure subcommands in v2.0 — Stripe lands in v2.1.
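The Usage tab and zombiectl billing show both present one line per event, even though the charges endpoint returns one row per (event_id, charge_type). A minimal sketch of that grouping, assuming row keys named event_id, charge_type, and cents (the actual response field names may differ). group_charges is a hypothetical helper.

```python
from typing import Dict, List


def group_charges(rows: List[dict]) -> List[dict]:
    """Collapse charge rows into one receive / stage / total line per event.

    Assumes rows arrive newest-first and carry hypothetical keys
    event_id, charge_type ("receive" or "stage"), and cents.
    """
    events: Dict[str, dict] = {}  # dicts keep insertion order, so newest-first survives
    for row in rows:
        line = events.setdefault(
            row["event_id"], {"event_id": row["event_id"], "receive": 0, "stage": 0})
        line[row["charge_type"]] += row["cents"]  # an unknown charge_type raises KeyError
    for line in events.values():
        line["total"] = line["receive"] + line["stage"]
    return list(events.values())
```

An event with only a receive row (for example, one still waiting on its stage charge) shows up with stage set to 0.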

Dashboard

Settings → LLM Provider (/settings/provider) — mode toggle + BYOK form. Credential dropdown comes from your active workspace vault; if it's empty, the form points you at /credentials first. On save, the page revalidates and shows the resolved config.
Settings → Billing (/settings/billing) — read-only summary dashboard. Headline balance + disabled "Purchase Credits" button (tooltip: "Coming in v2.1"); Usage tab grouped by event; Invoices and Payment Method tabs render empty states for the v2.1 cutover.

Upgrading

CLI: drop any direct calls to the workspace /credentials/llm route. Store your provider credential in the workspace vault (zombiectl credential add ...), then zombiectl tenant provider set --credential <name>. Verify with zombiectl tenant provider get and run a test event.
Dashboard: existing tenants stay on platform-managed by default; nothing breaks. To switch to BYOK, head to Settings → LLM Provider in Mission Control.
Custom integrations consuming the public model-caps endpoint can now read input_cents_per_mtok / output_cents_per_mtok per model. The shape is additive — old fields still present.

Notes

Pricing visibility. Per-model rates are now in the public-but-unguessable model-caps.json response. Anyone who finds the cryptic URL can read platform margins. We accept the trade-off: it preserves the cacheable, unauthenticated property that lets zombiectl tenant provider set resolve at low latency without a tenant token. We'll revisit if a competitor uses the data strategically.
No plan tiers. "Free" is not a tier — it's just "the user hasn't exhausted the $10 starter grant yet." Both platform and BYOK postures run through the same processEvent and compute_*_charge functions; they differ in drain rate, not in eligibility.

URL hygiene: `/steer` becomes `/messages`, `/memory/*` collapses into `/memories`

Two REST endpoints lose their verb-shaped URLs in favor of resource collections, and the in-process router moves to a segment-based matcher under the hood. Pre-v1.0 carve-out applies — old URLs return 404, not 410, and there is no compatibility shim.

Upgrading

Two URL renames. CLI and server should upgrade together.

Steering a zombie. POST /v1/workspaces/{ws}/zombies/{zid}/steer → POST /v1/workspaces/{ws}/zombies/{zid}/messages. Request body shape is unchanged. The CLI subcommand stays zombiectl zombie steer — verb on the CLI, noun on the wire.
Memory tools. Four verb endpoints collapse into one resource:
- POST /v1/memory/store → POST /v1/workspaces/{ws}/zombies/{zid}/memories
- GET /v1/memory/recall?... → GET /v1/workspaces/{ws}/zombies/{zid}/memories?query=...
- GET /v1/memory/list?... → GET /v1/workspaces/{ws}/zombies/{zid}/memories (omit ?query=)
- POST /v1/memory/forget → DELETE /v1/workspaces/{ws}/zombies/{zid}/memories/{memory_key}
DELETE is now idempotent — a missing key returns 204 No Content with an empty body. The previous {"deleted": true|false} response is gone.

zombie_id is a path segment everywhere — drop it from the query string. The memory_store / memory_recall agent-tool names are unchanged; this is the HTTP surface only.
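For integrations that wrapped the old memory endpoints, the renames above can be expressed as a small lookup. A sketch only: memory_request and its op names are hypothetical. Callers should treat a 204 from DELETE as success whether or not the key existed.

```python
from typing import Optional, Tuple
from urllib.parse import quote, urlencode


def memory_request(op: str, ws: str, zid: str, *, key: Optional[str] = None,
                   query: Optional[str] = None) -> Tuple[str, str]:
    """Translate a retired /v1/memory/<op> call into (method, path) on /memories."""
    base = f"/v1/workspaces/{quote(ws, safe='')}/zombies/{quote(zid, safe='')}/memories"
    if op == "store":
        return "POST", base
    if op == "recall":
        return "GET", f"{base}?{urlencode({'query': query})}"  # ?query= switches to search
    if op == "list":
        return "GET", base  # no ?query= means list-most-recent
    if op == "forget":
        return "DELETE", f"{base}/{quote(key, safe='')}"  # 204 even if the key is missing
    raise ValueError(f"unknown memory op: {op!r}")
```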

What's new

Stricter routing. The dispatcher now parses each request path into segments once at the boundary, so // and trailing slashes no longer silently match wrong handlers. Malformed paths return 404 deterministically.
Single source of truth for v1. The version literal lives in exactly one place in the router. Adding a future v2 is a one-line change.
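The segment-first approach can be sketched as below. This is a Python illustration, not the server's router. split_path, ROUTES, route, and the handler names are hypothetical, and matching uses segment count plus literal segments, with `*` as a single-segment placeholder. The property it shows is that an empty segment (from `//` or a trailing slash) is rejected up front instead of being normalized, so malformed paths always fall through to 404.

```python
from typing import List, Optional, Tuple

API_VERSION = "v1"  # the one place the version literal lives in this sketch

# (segment pattern after the version, handler); "*" matches any single non-empty segment.
ROUTES: List[Tuple[Tuple[str, ...], str]] = [
    (("workspaces", "*", "zombies", "*", "messages"), "post_message"),
    (("workspaces", "*", "zombies", "*", "memories"), "memories_collection"),
    (("workspaces", "*", "zombies", "*", "memories", "*"), "memory_item"),
]


def split_path(path: str) -> Optional[List[str]]:
    """Split once at the boundary; None marks a malformed path."""
    if not path.startswith("/"):
        return None
    segments = path[1:].split("/")
    # "//" and a trailing "/" both produce an empty segment: reject, don't normalize.
    if "" in segments:
        return None
    return segments


def route(path: str) -> str:
    """Resolve a path (query string already stripped) to a handler name."""
    segments = split_path(path)
    if segments is None or segments[0] != API_VERSION:
        return "not_found"
    rest = segments[1:]
    for pattern, handler in ROUTES:
        if len(pattern) == len(rest) and all(p in ("*", s) for p, s in zip(pattern, rest)):
            return handler
    return "not_found"
```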

API reference

POST /v1/workspaces/{workspace_id}/zombies/{zombie_id}/messages — same body and response as the retired /steer.
GET /v1/workspaces/{workspace_id}/zombies/{zombie_id}/memories?query=&category=&limit= — list-or-search collection. Presence of ?query= flips behavior from list-most-recent to fuzzy search across key and content.
POST /v1/workspaces/{workspace_id}/zombies/{zombie_id}/memories — store a single entry.
DELETE /v1/workspaces/{workspace_id}/zombies/{zombie_id}/memories/{memory_key} — idempotent 204.

Retired URLs (/v1/workspaces/{workspace_id}/zombies/{zombie_id}/steer, /v1/memory/{store,recall,list,forget}) return 404 with no body.

REST cleanup: `/complete` and `/kill` move to PATCH on the resource. Config hot-reload lands.

Two legacy verb-suffix endpoints retire in favor of PATCH on the underlying resource. Completing an auth session and killing a zombie now each ride a single PATCH per resource: closer to standard REST semantics, easier to discover from OpenAPI, and consistent with how every other workspace and zombie field already behaves.

Alongside the rename, zombie config edits now hot-reload mid-loop. An operator updating core.zombies.config_json from Mission Control sees the new tools, network policy, and context budget take effect on the running worker without restarting the zombie thread. The control-stream signal that already existed for kills now also carries config-revision changes; the worker reparses, swaps the in-memory config, and frees the old allocation between events.

Upgrading

Every direct API or SDK call against the retired URLs needs an update. The CLI commands themselves (zombiectl kill, zombiectl login) are unchanged; they always wrapped these URLs internally. Direct API consumers must migrate:

POST /v1/workspaces/{ws}/zombies/{zombie_id}/kill → PATCH /v1/workspaces/{ws}/zombies/{zombie_id} with body { "status": "killed" }. Same auth, same response shape. Re-killing a killed zombie still returns 404 (idempotent-fail).
POST /v1/auth/sessions/{session_id}/complete → PATCH /v1/auth/sessions/{session_id} with body { "status": "complete", "token": "<user-jwt>" }. The response now mirrors the GET poll shape ({ status, token, request_id }).
POST /v1/workspaces/{ws}/zombies/{zombie_id}/steer is unchanged in this release. The steer rename to POST /events with a polymorphic body is scheduled for the next URL hygiene pass.

Both retired URLs return 404 — no 410 stub. CLI and server may be upgraded independently; the CLI was already issuing the new shapes before this release and nothing else in zombiectl needs to change.

What's new

Config hot-reload mid-loop. Edit a zombie's config in Mission Control (or via PATCH /v1/.../zombies/{id} with config_json) and the running worker observes the new revision between events. Tools list, network allowlist, secrets map, and the three context-budget knobs (tool_window, memory_checkpoint_every, stage_chunk_threshold) all swap on the next event boundary. The old config is freed in the same step — no memory leaks on config swap.
One PATCH for combined updates. PATCH /v1/.../zombies/{id} accepts { config_json, status } together. Setting both in one request issues one SQL update and one control-stream signal per dirty surface, so a config-and-kill in one request stays atomic.
Cleaner OpenAPI surface. The bundled spec at /openapi.json shed three verb-suffix paths and three pending-rename carve-outs. Slack and GitHub OAuth callbacks moved to a separate vendor-immortal classification — they're still pinned, but the API hygiene gate now distinguishes external contracts from internal cleanup debt.

API reference

Updated routes (the substantive shape changes):

PATCH /v1/workspaces/{workspace_id}/zombies/{zombie_id} — body is partial: { config_json?, status? }. Both fields optional; an empty body is a 200 no-op. When status is set it must equal "killed". Response: { zombie_id, status?, config_revision }. The status field is present only when the request set it.
PATCH /v1/auth/sessions/{session_id} — body: { status: "complete", token }. Bearer auth (the depositor proves it can mint a user-jwt). Response mirrors the GET poll shape: { status, token, request_id }.
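A sketch of the partial-body rules for PATCH /v1/workspaces/{workspace_id}/zombies/{zombie_id}: an empty body is a no-op, status may only be "killed", and status appears in the response only when the request sent it. The signal names are borrowed from the zombie:control stream types. handle_patch, ValidationError, and the choice to bump config_revision on any write are assumptions, not the server's code.

```python
from typing import List, Tuple


class ValidationError(ValueError):
    code = "UZ-VAL-001"


def handle_patch(zombie_id: str, body: dict, revision: str, now: str) -> Tuple[dict, List[str]]:
    """Return (response body, control-stream signals) for a partial PATCH."""
    if "status" in body and body["status"] != "killed":
        raise ValidationError('status must be "killed"')
    if not body:
        return {"zombie_id": zombie_id, "config_revision": revision}, []  # 200 no-op
    # One SQL update covers both fields; assume it moves updated_at either way.
    response = {"zombie_id": zombie_id, "config_revision": now}
    signals = []
    if "config_json" in body:
        signals.append("config_changed")
    if "status" in body:
        response["status"] = "killed"  # present only when the request set it
        signals.append("status_changed")
    return response, signals
```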

Retired routes (404 in this release, no 410 stub):

POST /v1/workspaces/{ws}/zombies/{zombie_id}/kill
POST /v1/auth/sessions/{session_id}/complete

No new error codes. The validation message for invalid status values is status must be "killed" (returned with UZ-VAL-001).

CLI

No surface change. zombiectl kill <zombie_id> and zombiectl login issue the new URLs internally — anyone scripting against the CLI sees the same exit codes and JSON shapes as before.

Frontmatter cleanup: runtime config moves under `x-usezombie:`

TRIGGER.md frontmatter no longer carries runtime keys at the top level. tools, credentials, network, budget, and trigger now live under a single x-usezombie: block. Top-level stays minimal — just name: for cross-file identity. SKILL.md gains validated authoring metadata: name, description, and version are required at the top level; tags, author, model, and when_to_use pass through. Install rejects any bundle whose SKILL.md name: does not match TRIGGER.md name:.

Upgrading

Every existing zombie bundle needs both files updated. The migration is mechanical:

TRIGGER.md — add x-usezombie: at the top level and indent your existing trigger:, tools:, credentials:, network:, and budget: blocks under it. Keep name: at the top.
SKILL.md — ensure the frontmatter has name:, description:, and version:. Make name: exactly match the value in TRIGGER.md.
zombiectl install --from <dir> parses both files and reports field-level errors. Re-run until clean.

See Authoring skills for the canonical shape and a working platform-ops-zombie example.

What's new

Disciplined parser. Unknown subkeys under x-usezombie: fail loud (UnknownRuntimeKey) so typos surface at install instead of degrading silently. Top-level keys stay permissive — drop in x-amp: or other vendor blocks without breaking install.
Cross-file identity. name: must match across SKILL.md and TRIGGER.md. One identity per zombie bundle, enforced at install.
Real YAML. The bespoke YAML→JSON converter is replaced with kubkon/zig-yaml 0.2.0. Multi-line strings, escapes, the standard scalar tags, and arbitrary nesting depth all work as you'd expect from any YAML 1.2 tool.
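The install-time checks above can be sketched over frontmatter that has already been parsed into dictionaries (YAML parsing is out of scope here). The allowed runtime keys and required SKILL.md fields come from this entry. validate_bundle and BundleError are hypothetical. Reporting UnknownRuntimeKey as its own code is an assumption, since the entry does not say which UZ-ZMB code wraps it.

```python
from typing import Dict

RUNTIME_KEYS = {"tools", "credentials", "network", "budget", "trigger"}
SKILL_REQUIRED = ("name", "description", "version")


class BundleError(ValueError):
    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def validate_bundle(skill: Dict, trigger: Dict) -> None:
    """Install-time checks on already-parsed SKILL.md and TRIGGER.md frontmatter."""
    for field in SKILL_REQUIRED:
        if field not in skill:
            raise BundleError("UZ-ZMB-008", f"SKILL.md frontmatter is missing {field}:")
    # Only x-usezombie: is strict; other top-level blocks (x-amp:, ...) pass through.
    unknown = sorted(set(trigger.get("x-usezombie", {})) - RUNTIME_KEYS)
    if unknown:
        raise BundleError("UnknownRuntimeKey", f"unknown x-usezombie keys: {', '.join(unknown)}")
    if skill["name"] != trigger.get("name"):
        raise BundleError("UZ-ZMB-011", "SKILL.md name: does not match TRIGGER.md name:")
```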

API reference

Two error codes from POST /v1/workspaces/{ws}/zombies cover the new checks (one extended, one new):

UZ-ZMB-008 — MSG_ZOMBIE_INVALID_CONFIG now also fires when SKILL.md frontmatter is malformed or missing required fields.
UZ-ZMB-011 — MSG_ZOMBIE_NAME_MISMATCH. Returned when SKILL.md name: and TRIGGER.md name: disagree.

Internal SQL paths into core.zombies.config_json move from config_json->'trigger'->... to config_json->'x-usezombie'->'trigger'->.... No external surface change; mentioned for operators reading raw rows.

Approval inbox: pending gates surface in Mission Control, resolve from the browser

Approval gates used to flow only through Slack DMs. Operators looking at Mission Control saw a "healthy" zombie even when it was stalled at a gate. The inbox closes that loop. Every pending gate now surfaces in a workspace-wide /approvals list and on each zombie's detail page, with the proposed action, blast-radius assessment, evidence, and a timeout countdown rendered next to Approve and Deny buttons. Resolutions go through a single channel-agnostic core shared by Slack and Mission Control, so a click in either place makes the other channel's stale button no-op cleanly with the original outcome and resolver attribution. A background sweeper auto-denies any pending gate whose 24-hour timeout has elapsed, attributing the resolution to system:timeout so operators can tell auto-denials apart from manual ones.

What's new

/approvals page. Workspace-wide list of pending gates, sorted oldest-first (oldest is most urgent). Each row shows the zombie name, gate kind badge, proposed-action one-liner, blast-radius callout, age, and timeout countdown, with inline Approve and Deny buttons. The list refreshes every 5 seconds; resolutions remove the row optimistically. Empty workspaces render a clean "No pending approvals" state.
/approvals/{gate_id} detail page. Full proposed-action prose, evidence rendered as expandable JSON, blast-radius callout, key/value context grid (zombie, tool, action, kind, requested-at, auto-deny-at, action id), and a Resolve panel with an optional reason textarea. Once resolved, the page flips to a Resolution panel showing Resolved as <outcome> by <who> at <when>.
Per-zombie Pending approvals section. The zombie detail page gains a "Pending approvals" panel filtered to that zombie, plus a destructive-variant badge in the page header showing N pending approval(s), or 50+ once the count exceeds the page size.
Sidebar nav. New "Approvals" entry between Credentials and Events.
Slack and Mission Control parity. Slack callbacks and Mission Control clicks now share one resolve core. The schema-level append-only trigger plus the WHERE status='pending' precondition give at-most-one-resolution guarantees — the loser sees 409 with the original outcome and resolver attribution, never silent overwrite.
Auto-timeout sweeper. A background thread on the API process scans core.zombie_approval_gates every 60 seconds for pending rows whose timeout_at has passed, and transitions them to timed_out via the same resolve core. Worker treats timed_out as denied for safety on destructive operations. Default timeout is 24 hours.

API reference

GET /v1/workspaces/{ws}/approvals?status=&zombie_id=&gate_kind=&cursor=&limit= — paginated list. Default status=pending, limit=50, max 200. Cursor encodes (requested_at, gate_id) so concurrent inserts don't cause silent skips. Response shape: { items: ApprovalGate[], next_cursor: string|null }. Filterable by zombie_id and gate_kind.
GET /v1/workspaces/{ws}/approvals/{gate_id} — single-row read. 404 when the gate doesn't exist OR belongs to a different workspace (no information leak).
POST /v1/workspaces/{ws}/approvals/{gate_id}:approve — body {reason?: string ≤ 4096}. 200 with {gate_id, action_id, outcome: "approved", resolved_at, resolved_by}. 409 UZ-APPROVAL-006 with the same shape when another channel got there first; the body's outcome and resolved_by reflect the original resolver. 404 on unknown gate id (including cross-workspace).
POST /v1/workspaces/{ws}/approvals/{gate_id}:deny — same shape. outcome is denied on success.
ApprovalGate shape includes the new operator-visible fields (gate_kind, proposed_action, evidence as JSONB, blast_radius, timeout_at, resolved_by) on top of the existing audit fields.
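Keyset pagination over (requested_at, gate_id) can be sketched as below. The cursor is opaque to API clients; the base64-encoded JSON format here is purely an assumption used to make the idea concrete, and page, encode_cursor, and decode_cursor are hypothetical. Each page resumes strictly after the last (requested_at, gate_id) pair it returned rather than at a numeric offset, so rows inserted or resolved elsewhere in the list do not shift the next page's starting point.

```python
import base64
import json
from typing import List, Optional, Tuple


def encode_cursor(requested_at: str, gate_id: str) -> str:
    return base64.urlsafe_b64encode(json.dumps([requested_at, gate_id]).encode()).decode()


def decode_cursor(cursor: str) -> Tuple[str, str]:
    requested_at, gate_id = json.loads(base64.urlsafe_b64decode(cursor))
    return requested_at, gate_id


def page(gates: List[dict], cursor: Optional[str] = None, limit: int = 50) -> dict:
    """Oldest-first page over dicts carrying requested_at and gate_id.

    Assumes requested_at values are RFC 3339 strings in one format and timezone (UTC "Z"),
    so string order equals time order.
    """
    limit = min(limit, 200)
    ordered = sorted(gates, key=lambda g: (g["requested_at"], g["gate_id"]))
    if cursor is not None:
        after = decode_cursor(cursor)
        ordered = [g for g in ordered if (g["requested_at"], g["gate_id"]) > after]
    items = ordered[:limit]
    next_cursor = None
    if len(ordered) > limit:
        last = items[-1]
        next_cursor = encode_cursor(last["requested_at"], last["gate_id"])
    return {"items": items, "next_cursor": next_cursor}
```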

Bug fixes

Slack and Mission Control race no longer overwrites silently. Before this release, the Slack callback wrote the Redis decision key directly without a DB precondition; a Mission Control click that happened to land first could be overwritten by the Slack click moments later. Both paths now go through the same DB UPDATE WHERE status='pending' atomic transition; the loser observes 409 with the original outcome.
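The fix above is a compare-and-set transition. A self-contained sketch with Python's built-in sqlite3 module, assuming a simplified gates table with status and resolved_by columns. The real core.zombie_approval_gates schema, its append-only trigger, and the Redis decision key are not modeled; resolve and the resolver strings are hypothetical. Whichever caller's UPDATE matches the pending row wins; the other sees 409 with the winner's outcome.

```python
import sqlite3
from typing import Optional, Tuple


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE gates (gate_id TEXT PRIMARY KEY, status TEXT NOT NULL, resolved_by TEXT)")
    return db


def resolve(db: sqlite3.Connection, gate_id: str, outcome: str,
            resolver: str) -> Tuple[int, Optional[dict]]:
    """Atomically move a pending gate to `outcome`; return (http_status, body)."""
    cur = db.execute(
        "UPDATE gates SET status = ?, resolved_by = ? WHERE gate_id = ? AND status = 'pending'",
        (outcome, resolver, gate_id),
    )
    won = cur.rowcount == 1  # 0 rows: unknown gate, or another channel resolved it first
    db.commit()
    row = db.execute("SELECT status, resolved_by FROM gates WHERE gate_id = ?", (gate_id,)).fetchone()
    if row is None:
        return 404, None
    body = {"gate_id": gate_id, "outcome": row[0], "resolved_by": row[1]}
    return (200 if won else 409), body
```

The timeout sweeper fits the same path: it would call `resolve(db, gate_id, "timed_out", "system:timeout")` and simply lose (409) if an operator got there first.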

Streaming substrate hot-path cleanup

Internal performance pass on the worker → live-tail pipe, prompted by findings from the Apr 28, 2026 streaming substrate review. JSON encoding for activity frames now reuses a per-event scratch buffer, eliminating the per-frame heap allocation on chunk-heavy responses (the chunk-encode benchmark drops from ~43µs to ~2µs). The executor transport parses each progress frame once instead of twice (~46% faster). Each worker now opens a dedicated Redis client for activity PUBLISH, so per-frame publishes no longer contend with stream commands on the queue client's mutex. The per-zombie events index now leads with (zombie_id, created_at DESC, event_id DESC), which directly covers the dashboard's primary view and keyset cursor pagination; the prior actor-prefixed index forced a sort-and-scan. No user-visible behavior change; operators may notice steadier live-tail latency as the number of concurrent dashboard tabs grows.

Streaming substrate: every event has provenance, operators can steer, and live activity tails the dashboard

Three things converge in this release. First, every event a zombie processes — operator steer, GitHub webhook, scheduled cron, chunked continuation for long responses, gate-resolved continuation — now lands on a single Redis stream with a normalized envelope and an actor field that carries provenance forward. Second, every event start and end is durably persisted in core.zombie_events with the request payload, the response, token count, wall time, and failure label, queryable through a new history endpoint with cursor pagination, actor glob filters, and a humanized since= parameter. Third, the dashboard now ships a live activity panel that streams tool calls, response chunks, and completion frames over Server-Sent Events with a sub-200 ms publish-to-receive budget.

Operators get two new CLI subcommands. zombiectl steer {id} "<message>" POSTs to the new ingress, opens the SSE stream, and prints [claw] chunks as they arrive — Ctrl-C closes the watcher without killing the zombie. zombiectl events {id} paginates the history with --actor=, --since=, --json, and --cursor= filters.

Upgrading

POST /steer body and response shape changed. The endpoint now does a direct XADD on the per-zombie event stream and returns {event_id} so callers can correlate. The previous SET/GETDEL key-poll path is gone; the legacy zombie:{id}:steer Redis key is no longer touched. If you have a script reading the steer key directly to detect inflight steers, switch to either the SSE stream or the events history endpoint.
GET /v1/.../zombies/{id}/activity is removed. Both the per-zombie variant and the workspace-aggregate variant. Replace per-zombie activity reads with GET /v1/workspaces/{ws}/zombies/{id}/events. Replace workspace-aggregate reads with GET /v1/workspaces/{ws}/events?zombie_id={id} for the drill-down or omit zombie_id for the workspace-wide feed. Both responses now carry actor, status, response_text, tokens, and wall_ms instead of the old event_type/detail shape. zombiectl logs automatically uses the new endpoint; if you have direct API consumers, switch the URL and update the row parser.
core.activity_events table is dropped. Pre-v2.0 teardown — no migration. Anything that read this table directly will break; switch to core.zombie_events. The new table's primary key is composite (zombie_id, event_id) to support idempotent replay under XAUTOCLAIM redelivery.
Executor RPC framing version bumped to v2. Worker and executor binaries must upgrade together — they perform a HELLO handshake on connect and abort with executor.rpc_version_mismatch on a mismatch. Roll the executor first, then the worker.

What's new

One ingress, one durable record per event. Every event landing on zombie:{id}:events produces exactly one new row in core.zombie_events (mutable, lifecycle-tracked), one new row in zombie_execution_telemetry (immutable billing audit), and one mutated row in core.zombie_sessions. All three reference the same event_id, so a single join key threads narrative, billing, and session state. Replays are idempotent via ON CONFLICT DO NOTHING on the composite key plus a unique constraint on telemetry.
Continuation actors stay flat with origin tags. When chunking splits a long response or a blocked gate is resolved, the new event re-enters the stream with actor=continuation:<original_actor> — never continuation:continuation:... no matter how deep the chain. A single actor LIKE '%steer:kishore' filter finds the origin and every continuation in one pass. Each continuation's resumes_event_id points at its immediate parent, so a recursive CTE walks the chain back to its origin.
gate_blocked events are visible but unresolvable until the Approval Inbox ships. The row enters terminal state with status='gate_blocked', failure_label populated, and an XACK so the worker doesn't redeliver. Operators can see stranded events via GET /events?actor=.... The admin-resume fallback was deliberately dropped from this release; resolution is owned by the upcoming Approval Inbox.
Dashboard live panel. /zombies/{id} renders the new <LiveEventsPanel /> above the event history table. Native EventSource connects to a same-origin Next Route Handler that mints an API-audience JWT server-side and proxies the upstream stream — the browser never holds the JWT, the backend never sees a cookie. Reconnects with exponential backoff capped at 15 s; rolling buffer of the last 20 frames.
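Flat continuation actors and parent links can be sketched as two small helpers. continuation_actor and origin_event are hypothetical names. origin_event walks an in-memory map of resumes_event_id links the way the recursive CTE mentioned above walks rows, and it assumes chains are acyclic.

```python
from typing import Dict

PREFIX = "continuation:"


def continuation_actor(actor: str) -> str:
    """Actor for a continuation event: the prefix is applied once, however deep the chain."""
    origin = actor[len(PREFIX):] if actor.startswith(PREFIX) else actor
    return PREFIX + origin


def origin_event(event_id: str, resumes: Dict[str, str]) -> str:
    """Follow resumes_event_id (event -> immediate parent) back to the originating event."""
    while event_id in resumes:
        event_id = resumes[event_id]
    return event_id
```

Because the origin actor is always the suffix, a single `LIKE '%steer:kishore'` filter matches both `steer:kishore` and `continuation:steer:kishore`.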

API reference

POST /v1/workspaces/{ws}/zombies/{id}/steer — body {message: string (≤8192 chars)}. 202 with {status: "accepted", event_id: string}. The event_id is the Redis stream entry id; CLIs and dashboards correlate the SSE feed to this id.
GET /v1/workspaces/{ws}/zombies/{id}/events?cursor=&actor=&since=&limit= — paginated history. actor accepts globs (steer:*, webhook:*) and exact matches (webhook:github). since accepts Go-style durations (15s, 30m, 2h, 7d) or RFC 3339 timestamps (2026-04-25T08:00:00Z). Default limit=50, max 200. since and cursor are mutually exclusive — supplying both returns 400.
GET /v1/workspaces/{ws}/events?cursor=&actor=&zombie_id=&since=&limit= — workspace-aggregate history. Same parameter shape as the per-zombie variant; items carry an extra zombie_id so the workspace overview can group by zombie. Replaces the deleted /activity endpoint.
GET /v1/workspaces/{ws}/zombies/{id}/events/stream — SSE live tail. Content-Type: text/event-stream. Frame kinds: event_received, tool_call_started, tool_call_progress (~2s heartbeat for long tool calls), chunk, tool_call_completed, event_complete. Per-connection sequence ids reset to 0 on every new SUBSCRIBE; the server ignores the Last-Event-ID request header. After a disconnect, clients backfill via GET /events?since=<last_seen> then reopen the stream.
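The query rules for the history endpoints can be sketched as below. This is an illustration, not the server's parser. parse_since, actor_matches, and query_status are hypothetical. Only single-unit durations are handled (a compound value such as 1h30m is out of scope here), and glob semantics are assumed to match Python's fnmatch.

```python
import re
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase
from typing import Optional

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_since(value: str, now: Optional[datetime] = None) -> datetime:
    """Resolve since= as a single-unit duration (15s, 30m, 2h, 7d) or an RFC 3339 timestamp."""
    now = now or datetime.now(timezone.utc)
    m = re.fullmatch(r"(\d+)([smhd])", value)
    if m:
        return now - timedelta(**{_UNITS[m.group(2)]: int(m.group(1))})
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    # Anything unparseable raises ValueError, which a server would map to 400.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def actor_matches(actor: str, pattern: str) -> bool:
    # "steer:*" is a glob; "webhook:github" without wildcards is an exact match.
    return fnmatchcase(actor, pattern)


def query_status(params: dict) -> int:
    """since and cursor are mutually exclusive."""
    return 400 if "since" in params and "cursor" in params else 200
```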

CLI

zombiectl steer {id} "<message>" — batch mode. POSTs the message, opens the SSE stream, filters frames on the returned event_id, prints [claw] <chunk> as response chunks arrive, exits 0 on event_complete with status=processed and non-zero on agent_error. Falls back to polling GET /events?since=<event_id> if the SSE drops, with a 60-second deadline. Interactive REPL mode (no message argument) is deferred to a follow-up release; calling steer {id} without a message currently exits 2 with a helpful pointer.
zombiectl events {id} — paginated history print. --actor=steer, --actor=webhook:github, --since=2h, --since=2026-04-25T08:00:00Z, --json (raw records for piping), --cursor=<token> (resume from a previous page). Default 50 events per page; the next-cursor hint prints below the last row when more results exist.
zombiectl logs {id} — repointed at the new events endpoint (the activity stream is gone). Same flag shape; row format now shows actor + response_text summary instead of event_type + detail.

Install actually works now: contract aligned, parser key matches the sample, doctor preflight tightened

Three small bugs were stacking up to make zombiectl install --from <path> impossible to use against a fresh workspace. The CLI was sending one shape; the API expected another. The shipped sample uses tools: in TRIGGER.md; the parser was looking for skills:. And both install and doctor were exempt from the local auth guard, so missing credentials surfaced as a confusing 401 from the server instead of a clean local "log in first" message. All three are fixed in one pass.

Upgrading

Install POST shape changed. POST /v1/workspaces/{ws}/zombies now accepts {trigger_markdown, source_markdown}. The previous {name, config_json, source_markdown} shape is gone. The server is the single parser of TRIGGER.md frontmatter — name and the persisted config_json are derived server-side from the YAML between the --- fences. If you have a script that POSTs directly to this endpoint, switch to sending the raw two markdown files. Pre-v1.0; no compat shim.
TRIGGER.md key renamed skills: → tools:. The shipped sample (samples/platform-ops/TRIGGER.md) already used tools:; the parser now matches. If you have an older zombie spec with a top-level skills: array, rename that key to tools: before installing. The server returns ERR_ZOMBIE_INVALID_CONFIG with a hint when the canonical key is missing.
zombiectl install and zombiectl doctor now require zombiectl login first. Previously they were exempt from the auth guard and produced opaque 401s on missing credentials. Now they fail locally with AUTH_REQUIRED before any HTTP call. Only login itself is exempt.

What's new

Doctor reports the three things that actually matter. The new check set is server_reachable (GET /healthz with a 5s timeout), workspace_selected (local config has a current workspace), and workspace_binding_valid (your token is bound to that workspace, verified by a 200 from the workspace-scoped zombies list). Previous healthz/readyz/credentials/workspace checks are folded in or dropped — credentials is now covered by the auth guard, readyz overlaps healthz.
Doctor --json returns a stable schema. {ok: bool, api_url: string, checks: [{name, ok, detail}]}. Skills and scripts can consume it without grep on prose. Each failed check carries a one-line detail pointing at the next concrete action.
Install response carries the canonical name. POST /zombies now returns {zombie_id, name, status}. The CLI displays the server-derived name instead of guessing from the directory basename — copy/paste names match what the server stored.
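Because the doctor --json schema is stable, a script can gate on it without parsing prose. A minimal sketch that lists failing checks from captured output. failed_checks is a hypothetical helper, and the sample report values (URL and detail strings) are illustrative only.

```python
import json
from typing import List, Tuple


def failed_checks(doctor_json: str) -> List[Tuple[str, str]]:
    """Return (name, detail) for each failing check in `zombiectl doctor --json` output."""
    report = json.loads(doctor_json)
    return [(c["name"], c["detail"]) for c in report["checks"] if not c["ok"]]


sample = json.dumps({
    "ok": False,
    "api_url": "https://api.example.test",
    "checks": [
        {"name": "server_reachable", "ok": True, "detail": "GET /healthz returned 200"},
        {"name": "workspace_selected", "ok": False, "detail": "no current workspace in local config"},
    ],
})

for name, detail in failed_checks(sample):
    print(f"{name}: {detail}")
```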

API reference

POST /v1/workspaces/{workspace_id}/zombies — body {trigger_markdown: string (≤64KB), source_markdown: string (≤64KB)}. 201 with {zombie_id, name, status}. 400 ERR_ZOMBIE_INVALID_CONFIG if the frontmatter parse fails (missing name:, missing tools:, missing --- fences, etc.). 400 ERR_INVALID_REQUEST with MSG_ZOMBIE_TRIGGER_REQUIRED if trigger_markdown is empty or oversized.

CLI

zombiectl install --from <path> — POSTs {trigger_markdown, source_markdown}, the same two fields the CLI was already sending, which the API previously rejected because it expected {name, config_json, source_markdown}. The display name in the "🎉 <name> is live." message comes from the server response, falling back to the directory basename only if the server omits it.
zombiectl doctor — three checks (server_reachable, workspace_selected, workspace_binding_valid). Per-check 5s timeout, exit 0 on all-green and exit 1 on any failure. Now requires authentication; run zombiectl login first.

Worker substrate — install a zombie, see it work in seconds

Zombies installed via POST /v1/workspaces/{ws}/zombies are now claimed by a worker thread within ~1s of the 201 — no worker restart needed. A new POST /v1/workspaces/{ws}/zombies/{id}/kill aborts an in-flight zombie cleanly and propagates the cancel to the executor, replacing the legacy DELETE /…/zombies/{id} shortcut. A new PATCH /v1/workspaces/{ws}/zombies/{id} updates a zombie's config and signals the worker to pick up the new revision on its next loop iteration. SIGTERM on the worker now triggers a graceful drain instead of cutting in-flight events mid-call.

What's new

Atomic install path. The create endpoint now does INSERT into core.zombies + XGROUP CREATE MKSTREAM zombie:{id}:events + XADD zombie:control * type=zombie_created synchronously before returning 201. By the time the API responds, the per-zombie data stream and the control-plane signal both exist; a webhook arriving 1ms after the 201 already finds the consumer group in place.
Fleet-wide control plane. A new Redis stream zombie:control carries lifecycle signals (created / status_changed / config_changed / drain_request). One watcher thread per worker process consumes it via XREADGROUP and dispatches to spawn / cancel / reconfigure handlers, so a zombie installed at 14:00 is no longer invisible to a worker that started at 13:00 until that worker restarts.
Per-zombie cancel flag. Each zombie thread observes a per-zombie atomic flag at the top of every loop iteration. POST /kill flips the flag and the thread exits within ~100ms, regardless of where it was in its event loop.
zombiectl kill <zombie_id> now POSTs to /kill and requires an explicit zombie id (was previously a DELETE that defaulted to "kill all in workspace" when no id was passed — that footgun is gone).
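The control-plane flow can be sketched as a dispatcher plus one cancel flag per zombie. Only the signal types come from this entry. The signal field names (zombie_id, status, config_revision), the Worker class, run_zombie, and the rule that a draining worker stops spawning new zombie threads are assumptions for illustration. threading.Event stands in for the per-zombie atomic flag.

```python
import threading
from typing import Dict, Iterable, List


class Worker:
    """Routes zombie:control signals to spawn / cancel / reconfigure handlers."""

    def __init__(self) -> None:
        self.cancel: Dict[str, threading.Event] = {}  # checked at the top of each loop
        self.config_revision: Dict[str, str] = {}
        self.draining = False
        self.handlers = {
            "zombie_created": self.spawn,
            "status_changed": self.on_status,
            "config_changed": self.reconfigure,
            "drain_request": self.drain,
        }

    def dispatch(self, signal: dict) -> None:
        self.handlers[signal["type"]](signal)

    def spawn(self, signal: dict) -> None:
        if not self.draining:
            self.cancel[signal["zombie_id"]] = threading.Event()

    def on_status(self, signal: dict) -> None:
        flag = self.cancel.get(signal["zombie_id"])
        if flag is not None and signal.get("status") == "killed":
            flag.set()

    def reconfigure(self, signal: dict) -> None:
        # Picked up by the zombie thread on its next loop iteration.
        self.config_revision[signal["zombie_id"]] = signal["config_revision"]

    def drain(self, signal: dict) -> None:
        self.draining = True  # stop spawning; in-flight events finish normally


def run_zombie(cancel: threading.Event, events: Iterable[str]) -> List[str]:
    """Process events until the cancel flag is seen at the top of an iteration."""
    processed = []
    for event in events:
        if cancel.is_set():
            break
        processed.append(event)
    return processed
```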

API reference

POST /v1/workspaces/{workspace_id}/zombies/{zombie_id}/kill — 200 with {zombie_id, status: "killed", queued_at}. 404 if the zombie does not exist or is already killed (idempotent semantics fold into 404).
PATCH /v1/workspaces/{workspace_id}/zombies/{zombie_id} — body {config_json?: string}. 200 with {zombie_id, config_revision} where config_revision is the new updated_at timestamp (strictly monotonic per zombie).
DELETE /v1/workspaces/{workspace_id}/zombies/{zombie_id} — removed. POST /kill replaces it with a clean verb.

CLI

zombiectl kill <zombie_id> — POST to the new /kill endpoint. Argument is now required.

`platform-ops` — flagship zombie for GitHub Actions deploy failures

A new zombie lives at samples/platform-ops/. It wakes on a GitHub Actions workflow_run.conclusion=failure webhook, gathers evidence from the failed workflow's logs, your hosting provider, and your data-plane, then posts an evidenced diagnosis to a Slack channel. The same zombie is also reachable manually via zombiectl steer {id} for a morning health check or any operator-driven investigation. It is read-only against GitHub, Fly, and Upstash; its only write path is the Slack post. Credentials are structured {host, api_token} / {host, bot_token} records in the workspace vault. Raw token bytes are substituted into outbound HTTPS requests at the credential firewall, after the executor sandbox closes around the agent, so they never reach the LLM context, logs, or database.

What's new

samples/platform-ops/ ships with SKILL.md (diagnosis prompt, evidence-gathering flow, budget prose ≤ $8/month, http_request as the primary tool, cron_add gated on the operator asking for recurring polling), TRIGGER.md (trigger.type: webhook for GitHub Actions plus manual steer, the built-in tools actually used, network allowlist for api.github.com / api.fly.io / api.upstash.com / slack.com, $1/day + $8/month budget caps), and a README.md operator walkthrough covering install, wiring the GitHub Actions webhook, chatting, an example diagnosis, and credential hygiene.
Four credential shapes land alongside: github = {host, api_token}, fly = {host, api_token}, upstash = {host, api_token}, slack = {host, bot_token}. Add them via zombiectl credential add <name> --host <host> --api-token <token> (use --bot-token for slack).
Install works via zombiectl install --from samples/platform-ops. The webhook URL printed at install time is the one to paste into your GitHub repo's webhook settings (filter to workflow_run).
Sandbox: bwrap + landlock + cgroups on Linux; the agent runs in a locked-down process with network deny-by-default and only the network.allow hosts reachable.
Every event lands in core.zombie_events with actor=webhook:github (deploy-failure firing) or actor=steer:<operator> (manual investigation), so the timeline reads cleanly regardless of who poked the zombie.
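The network allowlist behavior above (deny by default, only `network.allow` hosts reachable) amounts to a host check on every outbound URL. The following is a minimal sketch that assumes exact, case-insensitive hostname matching over HTTPS with no wildcard or subdomain support. The real matching rules are not documented here, and `isAllowed` is a hypothetical helper.

```typescript
// Hosts from the platform-ops network allowlist in the changelog.
const NETWORK_ALLOW = ["api.github.com", "api.fly.io", "api.upstash.com", "slack.com"];

// Deny-by-default: allow only https URLs whose hostname exactly matches an entry.
// Anything that fails to parse is denied (fail closed).
function isAllowed(rawUrl: string, allow: readonly string[] = NETWORK_ALLOW): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  return allow.some((h) => h.toLowerCase() === host);
}
```

Exact matching is deliberately strict: a lookalike such as `api.github.com.example.net` is rejected because it is a different hostname, not a suffix of an allowed one.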

Mission Control — full lifecycle in the browser; kill switch moves to DELETE

Mission Control reaches its first "I can run my day from here" shape. Sign in to app.usezombie.com and you get an overview page with live status tiles + recent activity, a zombies list with cursor pagination and in-view search, an install form, and a per-zombie detail page that shows the webhook URL, the full config, and a one-click kill switch. Firewall, credentials, and settings pages are in place as placeholders; they'll fill in as the underlying features ship.

Multi-workspace operators get a workspace switcher in the header. Selecting a workspace persists the choice in a cookie and revalidates the current page — no sign-out or token reissue required.

We also cleaned up the kill-switch endpoint while we were in the neighborhood: the zombie routes no longer carry action verbs in their paths. Any caller hitting the legacy kill endpoint must migrate; details under Upgrading.

Credit exhaustion is now operator-visible in Mission Control. When a tenant's balance hits zero, a destructive banner appears above the zombies list and a "Balance exhausted" badge renders on each zombie's detail page — both driven by the is_exhausted / exhausted_at fields on GET /v1/tenants/me/billing.

Upgrading

Kill switch moved from POST to DELETE on a new path. Replace any caller hitting POST /v1/workspaces/{ws}/zombies/{id}/stop with DELETE /v1/workspaces/{ws}/zombies/{id}/current-run. Same behavior, same response shape, same 200 / 409 / 404 semantics. The zombie_stopped activity event is unchanged. The old path now returns 404 — pre-1.0 alpha breakage, no deprecation window. If you have dashboards, runbooks, or scripts pointing at .../stop, update them.
The rename is a REST-hygiene change: "current-run" is a singleton sub-resource of the zombie, and DELETE is the idiomatic verb for "kill the running action." It also unblocks a symmetric GET /current-run in a future release for run-state queries without reintroducing action verbs in paths.

What's new

Overview dashboard at / — status tiles for active / paused / stopped zombies and the tenant credit balance, plus a live "Recent Activity" feed. Renders as a Server Component with independent Suspense boundaries so a slow endpoint doesn't block first paint.
Zombies list at /zombies — cursor pagination with a "Load more" button and in-view search across name, id, and status. Built on GET /v1/workspaces/{ws}/zombies?cursor={ts}:{id}&limit=N (see API reference).
Install form at /zombies/new — validates required fields client-side, migrated onto the design-system Form primitive (react-hook-form + zod). Surfaces a clear toast when a name already exists.
Zombie detail at /zombies/[id] — webhook URL with one-click copy, trigger panel, firewall-rules panel, zombie config (rename / describe / delete-with-confirm), and a React 19 useOptimistic-powered kill switch with 409 auto-recovery.
Workspace switcher in the header — backed by a new GET /v1/tenants/me/workspaces endpoint (see API reference) plus a Server Action that writes the active_workspace_id cookie and revalidates. Works without re-issuing your session.
Placeholder pages at /firewall, /credentials, /settings so the sidebar shows the full shape of what's coming.
Credit-exhaustion banner + per-zombie badge wired to GET /v1/tenants/me/billing — no configuration needed, appears automatically when a tenant runs out.
Auth abstraction: every @clerk/nextjs call now flows through lib/auth/server.ts and lib/auth/client.ts. Switching auth provider in a future release is a two-file edit.
Same-origin /backend proxy: browser-side fetches go through /backend/:path* which the Next config rewrites to API_BACKEND_URL. No more CORS surprises in dev, preview, or prod.
Animated loading states on every async action (install, delete, route navigation, workspace switch).

API reference

New: GET /v1/tenants/me/workspaces — returns every workspace the caller's tenant owns. Backs the workspace switcher.

```json
{
  "items": [
    { "id": "ws_01HW...", "name": "Production", "created_at": 1713700000000 },
    { "id": "ws_01HX...", "name": "Staging",    "created_at": 1713700000001 }
  ],
  "total": 2
}
```

Changed: GET /v1/workspaces/{workspace_id}/zombies now accepts ?cursor={timestamp}:{id}&limit=N (default 20, max 100). Response gains a nullable cursor field holding the key for the next page (null at the end). Unpaginated callers keep working — absent cursor / limit yields the first 20.

```http
GET /v1/workspaces/ws_01HW.../zombies?limit=2
```

```json
{
  "items":  [ { "id": "zom_01...", "name": "alpha", "status": "active" }, ... ],
  "total":  2,
  "cursor": "1713700050000:zom_01..."
}
```
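A client can walk every page by passing each response's `cursor` back until it returns `null`. The following is a minimal sketch that assumes the `{ items, total, cursor }` envelope shown above and a caller-supplied `fetchPage` function. `fetchPage` is hypothetical; in practice it would call `GET /v1/workspaces/{workspace_id}/zombies?cursor=...&limit=N`. `parseCursor` splits the `{timestamp}:{id}` form on the first colon.

```typescript
// Response envelope shared by list endpoints: { items, total, cursor? }.
interface Paginated<T> {
  items: T[];
  total: number;
  cursor?: string | null;
}

// Split a "{timestamp}:{id}" cursor on the first colon.
function parseCursor(cursor: string): { ts: number; id: string } {
  const i = cursor.indexOf(":");
  if (i <= 0) throw new Error(`malformed cursor: ${cursor}`);
  const ts = Number(cursor.slice(0, i));
  if (!Number.isInteger(ts)) throw new Error(`malformed cursor timestamp: ${cursor}`);
  return { ts, id: cursor.slice(i + 1) };
}

// Follow cursors until the server signals the end of the list with null.
async function fetchAll<T>(
  fetchPage: (cursor: string | null, limit: number) => Promise<Paginated<T>>,
  limit = 20, // server default; values above 100 are clamped
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page: Paginated<T> = await fetchPage(cursor, limit);
    all.push(...page.items);
    cursor = page.cursor ?? null;
  } while (cursor !== null);
  return all;
}
```

Treating the cursor as opaque in `fetchAll` keeps clients working if the server later changes the key format. `parseCursor` is only needed for debugging or logging.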

Renamed (breaking):

DELETE /v1/workspaces/{workspace_id}/zombies/{zombie_id}/current-run

Transitions the zombie's status from active or paused to stopped and records a zombie_stopped activity event. Returns 200 {zombie_id, workspace_id, status: "stopped", request_id}. Returns 409 UZ-ZMB-010 if already stopped or killed, 404 UZ-ZMB-009 if the zombie is not in the path workspace. Requires the operator role.
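The status rules for `DELETE .../current-run` reduce to a small transition function. This is a sketch only. The `ZombieStatus` union and the `stopCurrentRun` helper are hypothetical, and the real handler also checks workspace membership (404 UZ-ZMB-009) and the operator role before it reaches this point.

```typescript
type ZombieStatus = "active" | "paused" | "stopped" | "killed";

type StopResult =
  | { http: 200; status: "stopped" }
  | { http: 409; code: "UZ-ZMB-010" };

// active or paused -> stopped (200); already stopped or killed -> 409 UZ-ZMB-010.
function stopCurrentRun(current: ZombieStatus): StopResult {
  switch (current) {
    case "active":
    case "paused":
      return { http: 200, status: "stopped" };
    case "stopped":
    case "killed":
      return { http: 409, code: "UZ-ZMB-010" };
  }
}
```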

CLI

zombiectl --help now surfaces the full zombie-lifecycle commands — install | up | status | kill | logs | credential — alongside the existing login / workspace / specs / doctor commands.
zombiectl list [--workspace-id ID] [--cursor C] [--limit N] [--json] — new subcommand that mirrors the /zombies list in Mission Control (cursor-paginated, honours the same limit ≤ 100 clamp).
zombiectl workspace show [--workspace-id ID] — new subcommand that mirrors the Mission Control /settings page (prints workspace ID, name, and active-workspace status).
Active workspace is now persistent. zombiectl workspace use <workspace_id> writes it to ~/.config/zombiectl/workspaces.json; subsequent commands (zombiectl list, zombiectl status, workspace show, etc.) default to it when --workspace-id is omitted. Mission Control's active_workspace_id cookie and the CLI's config file stay independent — setting one doesn't affect the other.
zombiectl up still prints the 🎉 Woohoo! Your zombie is installed and ready to run. success line (unchanged).
zombiectl kill is unaffected by the kill-switch path rename — it continues to call DELETE /zombies/{id} (full delete, not the current-run kill) as it always did.
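The workspace fallback order above works as a small resolver. An explicit `--workspace-id` takes precedence; otherwise the resolver uses the value written by `zombiectl workspace use`. The sketch assumes a hypothetical `{ "active_workspace_id": "..." }` shape for `~/.config/zombiectl/workspaces.json`, because the real file layout is not documented here.

```typescript
// Hypothetical shape of ~/.config/zombiectl/workspaces.json.
interface WorkspaceConfig {
  active_workspace_id?: string;
}

// Parse the config file text defensively; unknown shapes yield an empty config.
function loadConfig(text: string): WorkspaceConfig {
  const parsed: unknown = JSON.parse(text);
  if (typeof parsed !== "object" || parsed === null) return {};
  const id = (parsed as Record<string, unknown>)["active_workspace_id"];
  return typeof id === "string" ? { active_workspace_id: id } : {};
}

// Explicit flag first, then the persisted active workspace; otherwise fail.
function resolveWorkspaceId(flag: string | undefined, config: WorkspaceConfig): string {
  if (flag !== undefined && flag.length > 0) return flag;
  if (config.active_workspace_id) return config.active_workspace_id;
  throw new Error("no workspace selected: pass --workspace-id or run `zombiectl workspace use <workspace_id>`");
}
```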

Admin-by-env-var is gone; credit exhaustion is now observable

The env-var API_KEY bypass that minted an admin principal with no tenant and no audit identity has been removed. Admin authentication now flows exclusively through Clerk sessions with publicMetadata.role=admin — set once per operator in the Clerk Dashboard, revoked instantly from the same place, and carried into every request JWT. Programmatic admin access uses a tenant-minted zmb_t_… key from POST /v1/api-keys (shipped in v0.26.0). Separately, the tenant billing response now surfaces credit exhaustion, and a new policy knob lets operators decide what the worker does when a tenant hits zero.

Upgrading

Remove API_KEY from your server environment. If your deployment still passes API_KEY, it is now ignored. The server refuses to start without OIDC (OIDC_JWKS_URL, OIDC_ISSUER, OIDC_AUDIENCE) — no fallback.
Promote your admin user in Clerk. Dashboard → Users → select user → Metadata → Public metadata → set {"role": "admin"}. The operator playbook at playbooks/012_usezombie_admin_bootstrap/001_playbook.md walks through dev + prod step-by-step and ends by minting a zmb_t_… key for CI / scripts and stowing it at op://ZMB_CD_<env>/usezombie-admin/api_key.
If you consumed the balance_cents == 0 branch, switch to reading is_exhausted / exhausted_at on GET /v1/tenants/me/billing (see below).

What's new

BALANCE_EXHAUSTED_POLICY={continue|warn|stop} (default warn). stop pre-empts delivery for an exhausted tenant — the zombie never runs, Redis gets an XACK so the event doesn't retry, and a balance_gate_blocked activity event is recorded. warn logs and emits a rate-limited balance_exhausted activity event (1 per workspace per 24h). continue is the old "log and let it run free" behavior, made explicit.
First-exhausting debit stamps balance_exhausted_at atomically and writes a one-shot balance_exhausted_first_debit activity event. Replays do not double-emit.
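The three policies and the warn-mode rate limit can be expressed as a pure decision function. The following details are assumptions for illustration: the `Decision` shape, the in-memory `lastWarnAt` map, and falling back to `warn` for an unrecognized value. The changelog only states that `warn` is the default. The real worker also XACKs the Redis event and writes activity events.

```typescript
type Policy = "continue" | "warn" | "stop";

const WARN_WINDOW_MS = 24 * 60 * 60 * 1000; // 1 balance_exhausted event per workspace per 24h

interface Decision {
  deliver: boolean; // whether the zombie runs
  event: "balance_gate_blocked" | "balance_exhausted" | null; // activity event to record
}

// Unset or unrecognized values fall back to the default (assumed behavior).
function parsePolicy(raw: string | undefined): Policy {
  return raw === "continue" || raw === "stop" || raw === "warn" ? raw : "warn";
}

function decide(
  policy: Policy,
  exhausted: boolean,
  workspaceId: string,
  nowMs: number,
  lastWarnAt: Map<string, number>,
): Decision {
  // Not exhausted, or "continue": run normally (continue only logs).
  if (!exhausted || policy === "continue") return { deliver: true, event: null };
  // "stop": never deliver; the event is acknowledged so it does not retry.
  if (policy === "stop") return { deliver: false, event: "balance_gate_blocked" };
  // "warn": deliver, emitting balance_exhausted at most once per workspace per window.
  const last = lastWarnAt.get(workspaceId);
  if (last === undefined || nowMs - last >= WARN_WINDOW_MS) {
    lastWarnAt.set(workspaceId, nowMs);
    return { deliver: true, event: "balance_exhausted" };
  }
  return { deliver: true, event: null };
}
```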

API reference

GET /v1/tenants/me/billing gains two fields on every response:

is_exhausted — boolean, true once the tenant's balance has hit zero on a worker debit.
exhausted_at — integer (epoch ms) or null. Non-null only once is_exhausted is true.

The OpenAPI schema lists both as required with exhausted_at nullable.

Observability: per-workspace + per-zombie token counter now wired, OTLP histograms now exported

Two observability paths that looked live but weren't are now actually live. The per-workspace Prometheus token counter is now emitted on every successful zombie delivery, and a new zombie_id label lets you slice the same counter by zombie. The OTLP JSON exporter now forwards histogram data points (_bucket, _sum, _count) instead of silently dropping them.

What's new

Prometheus counter zombie_agent_tokens_by_workspace_total now carries both workspace_id and zombie_id labels and reports real data after each completed delivery. Useful for top-N spend dashboards at either granularity.
zombie_workspace_metrics_overflow_total is exposed so operators can detect when the fixed-capacity slot table, which holds 4,096 (workspace_id, zombie_id) pairs, saturates and falls back to an _other aggregation bucket.
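A fixed-capacity label table with an overflow bucket keeps the cardinality of this counter bounded. The following is a minimal sketch under two assumptions. First, once every slot is taken, new (workspace_id, zombie_id) pairs fold into a single `_other` series. Second, the overflow counter increments on every overflowed add. The class and its methods are hypothetical. Capacity is a parameter, and the release uses 4,096.

```typescript
class TokenCounterTable {
  private readonly capacity: number;
  private slots = new Map<string, number>(); // "workspace_id|zombie_id" -> tokens
  other = 0; // tokens aggregated into the _other bucket
  overflowTotal = 0; // mirrors zombie_workspace_metrics_overflow_total (assumed semantics)

  constructor(capacity = 4096) {
    this.capacity = capacity;
  }

  add(workspaceId: string, zombieId: string, tokens: number): void {
    const key = `${workspaceId}|${zombieId}`;
    const current = this.slots.get(key);
    if (current !== undefined) {
      this.slots.set(key, current + tokens); // existing pair keeps its slot
    } else if (this.slots.size < this.capacity) {
      this.slots.set(key, tokens); // claim a free slot
    } else {
      this.other += tokens; // table saturated: fold into _other
      this.overflowTotal += 1;
    }
  }

  get(workspaceId: string, zombieId: string): number | undefined {
    return this.slots.get(`${workspaceId}|${zombieId}`);
  }
}
```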

Bug fixes

Per-workspace token counter was a no-op: the helper existed but no production code path called it. It now fires from the same spot that records zombie_tokens_total, so Grafana queries against the per-workspace family return real values instead of zero.
OTLP JSON exporter silently dropped _bucket / _sum / _count lines, so histograms (zombie_execution_seconds, zombie_agent_duration_seconds, zombie_executor_agent_duration_seconds) never reached an OTLP collector. The exporter now emits OTLP histogram data points: Prometheus's cumulative le-bucket counts are converted into per-bucket counts alongside explicitBounds, and the metric is exported with aggregationTemporality: 2 (CUMULATIVE).
Removed the zombie_gate_repair_loops_by_workspace_total and zombie_gate_repair_loops_total counters (plus their helpers) — gate-repair is a pipeline-era concept with no zombie-era call site, so these counters always read zero and misled operators into expecting data that could never appear.
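Prometheus exposes histogram buckets cumulatively: each `_bucket{le="x"}` counts every observation less than or equal to x. An OTLP histogram data point instead carries `explicitBounds` and one non-cumulative count per bucket, with one more count than bounds for the final overflow bucket. The following is a minimal sketch of that conversion. The type names are hypothetical; only the field names `explicitBounds`, `bucketCounts`, `sum`, and `count` come from the OTLP data model. JSON encoding details, such as 64-bit counts serialized as strings, are omitted.

```typescript
// One Prometheus bucket line: _bucket{le="<bound>"} <cumulative count>.
interface PromBucket {
  le: number; // upper bound; Infinity for le="+Inf"
  cumulativeCount: number;
}

interface OtlpHistogramPoint {
  explicitBounds: number[];
  bucketCounts: number[]; // length = explicitBounds.length + 1
  sum: number;
  count: number;
}

function toOtlpPoint(buckets: PromBucket[], sum: number, count: number): OtlpHistogramPoint {
  // Finite bounds become explicitBounds; +Inf is implied by the extra final bucket.
  const finite = buckets.filter((b) => Number.isFinite(b.le)).sort((a, b) => a.le - b.le);
  const explicitBounds = finite.map((b) => b.le);
  const bucketCounts: number[] = [];
  let prev = 0;
  for (const b of finite) {
    bucketCounts.push(b.cumulativeCount - prev); // observations in (previous bound, b.le]
    prev = b.cumulativeCount;
  }
  bucketCounts.push(count - prev); // (last bound, +Inf]: _count minus the last finite bucket
  return { explicitBounds, bucketCounts, sum, count };
}
```

The temporality is unaffected by this conversion. Each scrape still reports totals since process start, which is why the metric stays `aggregationTemporality: 2` (CUMULATIVE).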

Docs follow-up — rewritten for the v2 MVP

The docs.usezombie.com site has been rewritten end-to-end against the current product. The new quickstart walks a fresh operator from Clerk sign-up to a live zombie firing webhook events in under ten minutes, against the shared $10 tenant balance introduced earlier this month. Stale pre-Clerk vocabulary — redemption flows, legacy "lead-collector"-centric examples — has been cleared from every page outside the historical changelog entries that predate this release.

What's new

New quickstart. Sign up → dashboard → create zombie → copy webhook URL → curl trigger → verify the credit debit in the billing UI. End-to-end in one page.
New CLI reference at /cli/zombiectl — every zombiectl command with copyable examples.
Self-hosting section under /operator — deployment architecture, configuration, security, observability, and operations pages for running the control plane yourself.
Concepts page updated to cover the four nouns (tenant, workspace, zombie, skill) and the tenant-scoped credit model.
Billing pages rewritten around the single-wallet, multi-workspace model.

CLI

Tenant-scoped billing

Billing now lives at the tenant, not the workspace. Every new signup gets exactly one billing.tenant_billing row at plan_tier=free, plan_sku=free_default, and a 1000¢ free-credit balance. Any zombie run in any workspace owned by that tenant debits the same shared balance — creating a second workspace no longer grants additional credits, and plan changes no longer have to fan out across workspace rows. Per-workspace credit state and the workspace-scoped billing lifecycle endpoints are removed.

Removed

POST /v1/workspaces/{workspace_id}/billing/events
POST /v1/workspaces/{workspace_id}/billing/scale
GET /v1/workspaces/{workspace_id}/billing/summary
GET /v1/workspaces/{workspace_id}/zombies/{zombie_id}/billing/summary
POST /v1/workspaces/{workspace_id}/scoring/config

What's new

One tenant, one billing row: billing.tenant_billing holds (plan_tier, plan_sku, balance_cents, grant_source, updated_at) with the tenant id as the primary key.
Worker debits the tenant balance atomically on every completed run via a conditional UPDATE ... WHERE balance_cents >= $cents RETURNING — an exhausted balance returns UZ-BILLING-005 CreditExhausted instead of producing a partial debit.
Schema slots resequenced contiguously to 001..018 to tidy up pre-alpha gaps before the v2.0 baseline.
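The conditional `UPDATE` turns the balance check and the debit into one atomic step: either the full charge fits and a row comes back, or nothing changes. The following sketch simulates that contract in memory. The SQL in the comment paraphrases the changelog's `UPDATE ... WHERE balance_cents >= $cents RETURNING`, but the exact statement and the `tenant_id` column name are assumptions. `debit` is a hypothetical helper.

```typescript
// Roughly equivalent SQL (assumed, not the server's exact statement):
//   UPDATE billing.tenant_billing
//      SET balance_cents = balance_cents - $1, updated_at = $2
//    WHERE tenant_id = $3 AND balance_cents >= $1
//   RETURNING balance_cents;
// Zero rows returned => UZ-BILLING-005 CreditExhausted, and nothing was debited.

type DebitResult =
  | { ok: true; balanceCents: number }
  | { ok: false; code: "UZ-BILLING-005" };

function debit(balances: Map<string, number>, tenantId: string, cents: number): DebitResult {
  const balance = balances.get(tenantId);
  // The WHERE clause: debit only when the full amount is covered, never partially.
  // (A missing tenant is folded into the same failure here for brevity.)
  if (balance === undefined || balance < cents) return { ok: false, code: "UZ-BILLING-005" };
  const next = balance - cents;
  balances.set(tenantId, next);
  return { ok: true, balanceCents: next };
}
```

In Postgres the atomicity comes from the single statement taking a row lock. Concurrent debits therefore serialize, and none of them can observe a stale balance between the check and the write.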

API reference

New: GET /v1/tenants/me/billing — returns the caller's tenant billing snapshot.

```json
{
  "plan_tier": "free",
  "plan_sku": "free_default",
  "balance_cents": 1000,
  "updated_at": 1713700000000
}
```

Auth: Bearer Clerk JWT (operator or admin). Returns 401 UZ-AUTH-001 without a valid token.

Clerk-powered signup

New users can now sign up through Clerk and have their account provisioned automatically. A Clerk user.created webhook delivered to POST /v1/webhooks/clerk atomically creates a tenant, a user record bound to the Clerk OIDC subject, an owner membership, and a default workspace with a Heroku-style name (jolly-harbor-482) and a 0-cent credit state. Replayed webhooks are idempotent — a re-delivered user.created returns the existing workspace with created: false and makes no new writes.

What's new

Clerk signup webhook at POST /v1/webhooks/clerk. Svix signature verified inline against CLERK_WEBHOOK_SECRET; stale timestamps (>5 min drift) rejected.
Heroku-style default workspace names. A 1,024,000-name space (32 adjectives × 32 nouns × 1000 suffixes), with per-tenant uniqueness guaranteed by a partial index.
Internal identity model: a new core.users table (indexed by Clerk OIDC subject) and core.memberships table wire users to tenants with a role. Ready for team accounts in a later release.
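32 adjectives × 32 nouns × 1000 numeric suffixes yields 32 × 32 × 1000 = 1,024,000 distinct names. The following is a minimal generator sketch. The word lists are hypothetical placeholders shortened for brevity, since the real lists are not published here. The suffix range 0 to 999 (unpadded) is also assumed. A retry on collision against an in-memory set stands in for the partial unique index that enforces per-tenant uniqueness.

```typescript
// Placeholder word lists; the real lists have 32 entries each.
const ADJECTIVES = ["jolly", "brisk", "quiet", "amber"];
const NOUNS = ["harbor", "meadow", "canyon", "lantern"];

// Number of distinct names for the given list sizes and suffix count.
function nameSpaceSize(adjectives: number, nouns: number, suffixes: number): number {
  return adjectives * nouns * suffixes;
}

// Pick adjective-noun-suffix; `taken` simulates the per-tenant unique index.
function generateName(
  taken: Set<string>,
  rand: () => number = Math.random,
  maxAttempts = 10,
): string {
  for (let i = 0; i < maxAttempts; i++) {
    const adj = ADJECTIVES[Math.floor(rand() * ADJECTIVES.length)];
    const noun = NOUNS[Math.floor(rand() * NOUNS.length)];
    const suffix = Math.floor(rand() * 1000); // assumed range 0..999
    const name = `${adj}-${noun}-${suffix}`;
    if (!taken.has(name)) {
      taken.add(name);
      return name;
    }
  }
  throw new Error("could not find a free workspace name");
}
```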

API reference

POST /v1/webhooks/clerk — request body is a Clerk user.created event envelope; headers svix-id, svix-timestamp, svix-signature required. Responses: 200 {workspace_id, workspace_name, created}; 400 UZ-REQ-001 (malformed JSON or missing primary email); 401 UZ-WH-010 (invalid signature); 401 UZ-WH-011 (stale timestamp); 413 UZ-REQ-002 (body over 2 MB); 500 UZ-INTERNAL-* (operator misconfig or DB error). Non-user.created event types are 200-ignored so Clerk stops retrying them.
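Svix signs the string `${svix-id}.${svix-timestamp}.${body}` with HMAC-SHA256, using the base64 key that follows the `whsec_` prefix of the signing secret. The `svix-signature` header carries one or more space-separated `v1,<base64>` entries, several of them during key rotation. The following is a minimal Node sketch of that scheme, including the 5-minute timestamp window. `verifySvix` and its return values are hypothetical names that mirror the `bad_sig` / `stale_ts` reasons above; production code can use the official Svix library instead.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TOLERANCE_SECONDS = 5 * 60; // reject timestamps more than 5 minutes off

type VerifyResult = "ok" | "bad_sig" | "stale_ts";

function verifySvix(
  secret: string, // e.g. the value of CLERK_WEBHOOK_SECRET ("whsec_...")
  headers: { id: string; timestamp: string; signature: string },
  rawBody: string, // the exact bytes received; re-serialized JSON will not match
  nowSeconds: number = Math.floor(Date.now() / 1000),
): VerifyResult {
  const ts = Number(headers.timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > TOLERANCE_SECONDS) return "stale_ts";

  const key = Buffer.from(secret.replace(/^whsec_/, ""), "base64");
  const expected = createHmac("sha256", key)
    .update(`${headers.id}.${headers.timestamp}.${rawBody}`)
    .digest();

  // Accept any v1 entry in the space-separated list (supports key rotation).
  for (const entry of headers.signature.split(" ")) {
    const [version, sig] = entry.split(",");
    if (version !== "v1" || !sig) continue;
    const given = Buffer.from(sig, "base64");
    // Constant-time comparison; lengths must match before timingSafeEqual.
    if (given.length === expected.length && timingSafeEqual(given, expected)) return "ok";
  }
  return "bad_sig";
}
```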

Observability

Three new Prometheus counters on /metrics (six time series): zombie_signup_bootstrapped_total, zombie_signup_replayed_total, and zombie_signup_failed_total with a reason label (bad_sig, stale_ts, missing_email, db_error). One new PostHog event, signup_bootstrapped, uses distinct_id = oidc_subject so funnels stitch across retries; the email domain is included, the full email never is. Server log lines (clerk.bad_sig, clerk.stale_ts, clerk.bad_request) flow through the existing OTLP log exporter. The operator Metrics reference has been reconciled against the live exporter.

Unified design system across the dashboard and marketing site

Buttons, cards, dialogs, inputs, and other UI primitives now come from a single @usezombie/design-system package. The dashboard and marketing site share one source of truth — tweak a variant once, both surfaces update.

The new /agents page adds an interactive hero and animated terminal. Landing JS is under 90 kB gzipped, with a size-limit CI gate guarding bundle size. PostHog loads on idle so first paint is no longer blocked.

One credential surface for zombies

Workspace credentials now flow through a single path: zombiectl credential add writes to the workspace vault, and that's what every zombie reads at runtime. No parallel surfaces, no guessing which command owns a given secret.

Zombie lifecycle is the unified product model

The CLI and API now speak one language: zombiectl install → up → status → logs → kill. zombiectl --help is shorter, the API surface is tighter, and the docs, product, and code all describe the same thing. See Zombies.

Docs reshaped around the zombie lifecycle

A new Zombies section walks through installing a template, adding credentials, running, observing, and killing a zombie. Pages describing the legacy v1 pipeline have been retired; the old /specs/* and /runs/* URLs now 404.

New pages: overview, install, running, credentials, webhooks, skills, templates.

Tenant API keys

Tenant admins can now mint named, rotatable API keys via POST /v1/api-keys — scoped to the tenant, revocable, and audited. Raw keys (zmb_t_…) are shown once on creation; only the hash is stored. The legacy API_KEY env var still works as a bootstrap fallback.

Workspace-scoped external agent keys were renamed to agent keys: /v1/workspaces/{ws}/external-agents → /v1/workspaces/{ws}/agent-keys.

Unified webhook authentication — seven first-class providers

Every per-zombie webhook flows through one fail-closed middleware that handles URL-embedded secrets, Bearer tokens, HMAC signatures, and Svix multi-signature rotation with constant-time comparisons.

Seven providers ship first-class: agentmail, Grafana, Slack, GitHub, Linear, Jira, and Clerk (via Svix). Onboarding takes one field in TRIGGER.md; secrets are workspace-vaulted and rotate without a zombie redeploy. See Webhooks.

Operator dashboard foundation

Workspace-wide activity feed, operator kill switch for runaway zombies, and per-zombie billing summary that mirrors the workspace view. Billing numbers now come from real execution telemetry (previously zeroed since v0.10).

Ships with six accessible React primitives — StatusCard, EmptyState, Pagination, DataTable, ConfirmDialog, ActivityFeed — and Tailwind v4 semantic design tokens.

Consistent pagination and full OpenAPI coverage

Every list endpoint returns the same { items, total, cursor? } envelope so SDK generators can emit a single Paginated<T> type. Memory reads moved to GET, and openapi.json now documents every route the server exposes — 26 previously undocumented operations are authored in.

Workspace-scoped REST paths

Identity — workspace, zombie, grant — is now always in the URL path (/v1/workspaces/{ws}/zombies/{id}), and query parameters are reserved for pagination and search. Every handler authorizes workspace membership after authentication; cross-workspace lookups return 404, so the API does not leak the existence of resources you cannot see.

Live zombie steering

Redirect a running zombie mid-execution without killing it. POST /v1/workspaces/{ws}/zombies/{id}/steer injects a message into the zombie's event stream — delivered mid-execution if the zombie is running, queued otherwise (300-second TTL).
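The steer delivery rule (inject immediately if the zombie is running, otherwise queue for up to 300 seconds) can be modeled as a queue whose entries expire when read. The following is a hypothetical in-memory illustration. `SteerQueue`, its methods, and the strict `<` boundary at exactly 300 seconds are assumptions, and the changelog does not say where queued messages are stored.

```typescript
const STEER_TTL_MS = 300 * 1000; // 300-second TTL for queued messages

interface QueuedMessage {
  text: string;
  enqueuedAt: number;
}

class SteerQueue {
  private queue: QueuedMessage[] = [];

  // Deliver immediately when the zombie is running; otherwise queue with a TTL.
  steer(
    text: string,
    running: boolean,
    nowMs: number,
    deliver: (t: string) => void,
  ): "delivered" | "queued" {
    if (running) {
      deliver(text);
      return "delivered";
    }
    this.queue.push({ text, enqueuedAt: nowMs });
    return "queued";
  }

  // When the zombie next runs, hand over messages still within the TTL; drop the rest.
  drain(nowMs: number): string[] {
    const fresh = this.queue.filter((m) => nowMs - m.enqueuedAt < STEER_TTL_MS);
    this.queue = [];
    return fresh.map((m) => m.text);
  }
}
```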

Persistent zombie memory

Zombies remember facts across executions. Memory is row-scoped per zombie and persists in Postgres — a lead-collector zombie doesn't re-research the same lead, a support zombie doesn't re-ask customers their plan. Tools: memory_store, memory_recall, memory_list, memory_forget.

Integration grants + credentialed proxy

Zombies — internal or external (LangGraph, CrewAI) — call external services through UseZombie's credentialed proxy. Credentials never leave the platform: injected server-side, stripped from response echoes, and logged to the activity stream.

A zombie requests a grant, humans approve once via Slack/Discord/dashboard, and the grant is reusable until revoked. Launch providers: Slack, Gmail/AgentMail, Discord, Grafana. New CLI: zombiectl agent create|list|delete, zombiectl grant list|revoke.

Zombie execution telemetry

Every event delivery records token_count, time_to_first_token_ms, wall_seconds, and credit_deducted_cents, queryable per-zombie via GET /v1/workspaces/{ws}/zombies/{id}/telemetry. Each delivery also emits an OpenTelemetry zombie.delivery span that lines up correctly in Grafana Tempo.

Slack plugin

Connect Slack via "Add to Slack" OAuth or zombiectl credential add slack. Bot tokens live in the vault; events and interactions are HMAC-verified with constant-time comparison. Any zombie with a slack_event trigger fires automatically on matching messages.

Zombie observability

Every trigger and delivery shows up in Grafana and PostHog. Prometheus exposes zombies_triggered_total, zombies_completed_total, zombies_failed_total, zombie_tokens_total, and a zombie_execution_seconds histogram; PostHog fires zombie_triggered and zombie_completed with tokens, wall-time, and exit status.

Zombie credit metering

Free-plan zombies deduct from consumed_credit_cents after each successful delivery at 1 cent per agent-second; Scale is unlimited and short-circuits without a DB write. Crash replay is idempotent on event_id, and a DB hiccup never drops or double-charges an event.
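The free-plan rule (1 cent per agent-second, no charge on Scale, idempotent on `event_id`) can be expressed as follows. Two details are assumptions not specified in the changelog: fractional seconds round up to the next whole cent, and a `Set` of already-charged event IDs stands in for the database's idempotency guarantee. `Meter` and `meterDelivery` are hypothetical names.

```typescript
type Plan = "free" | "scale";

interface Meter {
  consumedCreditCents: number;
  chargedEvents: Set<string>; // stand-in for the DB's per-event_id idempotency
}

// Returns the cents charged for this delivery (0 on Scale or for a replayed event).
function meterDelivery(meter: Meter, plan: Plan, eventId: string, agentSeconds: number): number {
  if (plan === "scale") return 0; // unlimited: short-circuit without a write
  if (meter.chargedEvents.has(eventId)) return 0; // crash replay: already charged
  const cents = Math.ceil(agentSeconds); // 1 cent per agent-second, rounded up (assumed)
  meter.consumedCreditCents += cents;
  meter.chargedEvents.add(eventId);
  return cents;
}
```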

Zombie directory format, AI Firewall, error standardization, pipeline v1 removal

Zombie directory format

Zombies are now two-file directories (SKILL.md + TRIGGER.md) instead of a single .md file. SKILL.md follows the ClaHub registry format — the same file you upload to the CLI is publishable to the skill registry. TRIGGER.md carries deployment config: trigger, chain, budget, network policy, credentials. zombiectl install scaffolds both files; zombiectl up sends them raw to the API.

Dynamic skills (no compiled Zig per skill)

Skills are now config-driven. The NullCraw executor reads SKILL.md instructions and uses built-in tools (shell, http, file_read) to call external APIs. Adding a new skill requires only a new directory — no rebuild of the server binary.

AI Firewall — 4-layer outbound inspection

Every outbound request from a Zombie now passes through an AI Firewall before reaching external APIs:

Domain allowlist — only domains declared in TRIGGER.md network.allow can be reached
Endpoint policy — per-endpoint rules in TRIGGER.md firewall: section (e.g., allow GET, deny POST)
Prompt injection detection — scans outbound bodies for instruction override, role hijacking, and jailbreak patterns
Content scanning — inspects response bodies for credential leakage and PII (credit cards, SSNs, API keys)

All firewall decisions are logged as activity events, and the firewall fails closed on errors.

API error format standardized (RFC 7807)

All error responses now use application/problem+json with UZ- prefixed error codes. Every error code maps to a single, stable HTTP status, so callers can branch on the error code without interpreting the HTTP status separately.
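RFC 7807 defines the members `type`, `title`, `status`, `detail`, and `instance` for an `application/problem+json` body and permits extension members. The following is a minimal client-side parser sketch. The extension member that carries the UZ- code is named `error_code` here only as an assumption, so check the API reference for the actual field name.

```typescript
// RFC 7807 problem details plus an assumed extension member for the UZ- code.
interface Problem {
  type?: string;
  title?: string;
  status?: number;
  detail?: string;
  instance?: string;
  error_code?: string; // assumption: field name not confirmed by the changelog
}

function parseProblem(contentType: string, body: string): Problem | null {
  // Compare the media type only, ignoring parameters such as "; charset=utf-8".
  const mediaType = contentType.split(";")[0]?.trim().toLowerCase();
  if (mediaType !== "application/problem+json") return null;
  const parsed: unknown = JSON.parse(body);
  if (typeof parsed !== "object" || parsed === null) return null;
  return parsed as Problem;
}
```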

Pipeline v1 removed

The v1 GitHub PR-solver pipeline has been removed. All /v1/runs/* and /v1/specs endpoints return HTTP 410 Gone with error code ERR_PIPELINE_V1_REMOVED. Use the zombie-native SSE stream and chat-inject API instead (see the v0.5.0 release notes).

Webhook auth — URL-embedded secret

Preferred webhook URL format: POST /v1/webhooks/{zombie_id}/{secret}. Bearer token remains supported as fallback.

Handler context layer (internal)

All HTTP handler boilerplate (arena setup, request ID, Bearer auth) is now handled by a shared hx.zig wrapper. Handlers contain only business logic. No user-visible behavior change.

Lead Zombie — v2 core ships

UseZombie is now a runtime for always-on agents. Two commands get you a running agent:

```shell
zombiectl install lead-collector
zombiectl up
```

What's new

Zombie config format

YAML frontmatter (trigger, skills, credentials, budget) + markdown body (agent instructions). The CLI compiles YAML → JSON before upload; the server only ever sees JSON. Supports voice-transcribed instructions as the instruction body.

Webhook ingestion

Every zombie gets a stable inbound URL: POST /v1/webhooks/{zombie_id}. Routing is by primary key — no source name collisions, no JSONB index. Bearer token auth per zombie. Idempotency via Redis SET NX (24h TTL). Returns 202 Accepted or 200 Duplicate.
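Redis `SET key value NX EX seconds` writes only when the key is absent. The first delivery of an event therefore claims the key, and any retry within the TTL finds it already set. The following is a minimal sketch of the resulting 202/200 decision, with an in-memory map standing in for Redis. The key format `webhook:{zombie_id}:{event_id}` and the source of the event ID are assumptions.

```typescript
const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // 24h

// In-memory stand-in for Redis SET NX with an expiry.
class NxStore {
  private entries = new Map<string, number>(); // key -> expiresAt (ms)

  // Returns true if the key was set (absent or expired), false if it already exists.
  setNx(key: string, nowMs: number, ttlMs: number): boolean {
    const expiresAt = this.entries.get(key);
    if (expiresAt !== undefined && expiresAt > nowMs) return false;
    this.entries.set(key, nowMs + ttlMs);
    return true;
  }
}

function ingest(store: NxStore, zombieId: string, eventId: string, nowMs: number): 202 | 200 {
  const claimed = store.setNx(`webhook:${zombieId}:${eventId}`, nowMs, IDEMPOTENCY_TTL_MS);
  return claimed ? 202 : 200; // 202 Accepted for a new event, 200 Duplicate for a retry
}
```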

Activity stream

Append-only audit log (core.activity_events). Every zombie action — event received, skill invoked, response returned — is timestamped and queryable. zombiectl logs streams the activity log. Cursor-based pagination for replay.

Credential injection

Credentials are resolved from the vault at runtime and injected into the sandbox. No credentials in config files. Add credentials with zombiectl credential add.

Session checkpoint

The zombie's conversation context is checkpointed to Postgres after each event. On crash and restart, the zombie resumes from the last checkpoint — no lost context.

New CLI commands

zombiectl install, zombiectl up, zombiectl status, zombiectl kill, zombiectl logs, zombiectl credential add, zombiectl credential list.

Schema additions

core.zombies — zombie registry with JSONB config
core.zombie_sessions — session checkpoint (context upserted after each event)
core.activity_events — append-only audit log (UPDATE/DELETE blocked by trigger)

Applied automatically by zombied migrate. No changes to existing tables.

API reference updated

16 v1 endpoints were removed from the OpenAPI spec (the agents, harness, and specs endpoints are no longer on the v2 path). POST /v1/webhooks/{zombie_id} was added. A Mintlify sync is required; see API Reference.

Version tooling

make sync-version / make check-version prevent VERSION drift across build.zig.zon and zombiectl/package.json.

Bug fixes

Fixed YAML parser silently dropping array items in CLI config upload
Fixed UTF-8 truncation splitting multi-byte characters in session context
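Cutting a UTF-8 buffer at an arbitrary byte index can land inside a multi-byte sequence. One common fix is to back off from the cut point past any continuation bytes (bit pattern `10xxxxxx`), so the kept prefix ends on a character boundary. The following sketch assumes the session context is capped by byte length; `truncateUtf8` is a hypothetical name.

```typescript
// Truncate a string to at most maxBytes of UTF-8 without splitting a character.
function truncateUtf8(text: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;
  let end = maxBytes;
  // If the first dropped byte is a continuation byte (0b10xxxxxx), the character
  // it belongs to started earlier; move the cut back to that character's lead byte.
  while (end > 0 && (bytes[end]! & 0xc0) === 0x80) end--;
  return new TextDecoder().decode(bytes.subarray(0, end));
}
```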

Steer running agents mid-run

Interrupt a running agent without aborting it. Send a message via zombiectl runs interrupt <run_id> <message> or POST /v1/runs/{id}:interrupt — the agent picks it up at the next gate checkpoint. Two modes: queued (next checkpoint) and instant (IPC delivery).

Live run streaming (CLI)

zombiectl run --spec <file> --watch now streams gate results in real time. Reconnect with Last-Event-ID replays only missed events — no duplicate floods. Ctrl+C works cleanly.

Run replay (CLI)

zombiectl runs replay <run_id> prints a per-gate narrative for completed runs — exit codes, stdout/stderr, wall time, step by step.

Workspace billing breakdown

zombiectl workspace billing --workspace-id <id> shows completed, non-billable, and score-gated runs with optional --period and --json flags. Backed by GET /v1/workspaces/{id}/billing/summary.

Agent run observability

Every run now produces a full trace tree in Grafana Tempo — query {run.id="<id>"} for a waterfall of agent calls and gate checks. Per-workspace Prometheus metrics: token consumption, run outcomes, and gate repair loop distribution.

Resource efficiency scoring

Agent runs are now scored on actual memory and CPU usage. Agents that stay within their resource limits score higher. Score formula updated to v2 with real resource data.

Breaking change

SSE id: field on live events changed from sequential counter to created_at Unix milliseconds. Clients parsing Last-Event-ID as a sequence number must update.
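Now that `id:` carries `created_at` in Unix milliseconds, a reconnecting client's `Last-Event-ID` is a timestamp rather than a sequence number. Replay therefore means "send events created after that instant." The following is a minimal sketch of that filter. The `StreamEvent` shape is hypothetical. The sketch also assumes `created_at` values are unique within a stream; ties within the same millisecond would need a secondary key.

```typescript
interface StreamEvent {
  createdAtMs: number; // emitted as the SSE `id:` field
  data: string;
}

// Parse Last-Event-ID as a millisecond timestamp; absent or invalid => replay everything.
function parseLastEventId(header: string | undefined): number | null {
  if (header === undefined || header.trim() === "") return null;
  const ms = Number(header);
  return Number.isSafeInteger(ms) ? ms : null;
}

// Events the client has not seen yet, serialized in SSE wire format.
function replaySince(events: StreamEvent[], lastEventId: string | undefined): string {
  const since = parseLastEventId(lastEventId);
  return events
    .filter((e) => since === null || e.createdAtMs > since)
    .map((e) => `id: ${e.createdAtMs}\ndata: ${e.data}\n\n`)
    .join("");
}
```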

Live run streaming (API)

The SSE stream endpoint is live: GET /v1/runs/{id}:stream emits gate results in real time as the agent works. CLI support (--watch) is coming in a future release.

Run replay (API)

Replay any finished run step by step via the API: GET /v1/runs/{id}:replay returns a structured gate narrative with exit codes, stdout/stderr, and wall time. CLI support (zombiectl runs replay) is coming in a future release.

Per-run cost control

Set token budgets, wall-time limits, and repair loop caps on each run. Runs that exceed limits are cancelled automatically.

OpenAPI spec

A complete OpenAPI 3.1 specification covering all 43 API endpoints is now published.

`@usezombie/zombiectl` on npm

The CLI is now available as a scoped npm package.
