From bc1fb9b65b9f93b7ae5cd02d83aa0b14485c9cf2 Mon Sep 17 00:00:00 2001
From: Kishore Kumar
Date: Wed, 27 May 2026 11:22:29 +0530
Subject: [PATCH 1/2] docs(changelog): runner fleet execution-plane split

Add a May 27 entry for the control-plane / execution-plane cutover:
execution moved to a host-resident zombie-runner over HTTPS, lease-based
ownership with fencing, mandatory sandbox. User-visible behavior
unchanged; host runners not yet enabled in production. Notes the 30s
lease-renewal limit.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 changelog.mdx | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/changelog.mdx b/changelog.mdx
index 22dc20b..231a2b3 100644
--- a/changelog.mdx
+++ b/changelog.mdx
@@ -18,6 +18,20 @@ export const STAGE_RATE_M65 = "$0.10";
  usezombie is in **stealth-mode testing** and pre-production. APIs and agent behavior may change between releases without long deprecation windows. Email [usezombie@agentmail.to](mailto:usezombie@agentmail.to) if you want a hand calibrating a zombie or to join as a design partner.
+
+ ## Execution moves to a host-resident runner fleet
+
+ Behind an unchanged user surface, an event's agent now runs in a separate `zombie-runner` daemon instead of inside the API server. `zombied` became the control plane — it owns Postgres, Redis, and the Vault, and hands work to runners over an authenticated HTTPS protocol — while the runner is the execution plane that runs each event in a forked, sandboxed child holding no datastore credentials. Steering, webhooks, cron, the live event tail, and history behave exactly as before; what changed is where the work runs.
+
+ ## What's new
+
+ - **Control plane / execution plane split.** A runner leases an event, runs it, and reports the result; `zombied` does the durable writes. Work can run on hosts that never see a database credential.
+ - **Lease-based ownership with fencing.** Each lease carries a deadline and a monotonic fencing token. A runner that dies mid-event has its work reclaimed and re-run by another runner; a late report from the dead runner is rejected, so state is never double-written.
+ - **Sandbox is mandatory.** Every event runs in a Landlock + cgroups + network-namespace sandbox; a sandbox that fails to start fails closed rather than running unprotected.
+
+ Host-resident runners are not enabled in production yet — this release lands the architecture; turning them on follows in a later update. One known limit ships with it: an agent that runs longer than the 30-second lease window is reclaimed and re-run, so long single events wait on a follow-up that adds lease renewal.
+
+
  ## `zombiectl login` — verification-code device flow + non-interactive token auth

From 772d66d5943ff8ea692d9d1abc968806532a66b5 Mon Sep 17 00:00:00 2001
From: Kishore Kumar
Date: Wed, 27 May 2026 16:56:56 +0530
Subject: [PATCH 2/2] docs(changelog): address greptile P2s on runner-fleet entry

Wrap the two forward-looking statements (production enablement,
lease-renewal follow-up) in callouts per the docs style rule, and soften
the execution-plane internal detail (fork mechanics, fencing token,
sandbox kernel primitives) down to operator level. Third-person voice
kept to match the existing "Internal"-tagged entries.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 changelog.mdx | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/changelog.mdx b/changelog.mdx
index 231a2b3..10f7f5a 100644
--- a/changelog.mdx
+++ b/changelog.mdx
@@ -21,15 +21,21 @@ export const STAGE_RATE_M65 = "$0.10";
  ## Execution moves to a host-resident runner fleet
- Behind an unchanged user surface, an event's agent now runs in a separate `zombie-runner` daemon instead of inside the API server. `zombied` became the control plane — it owns Postgres, Redis, and the Vault, and hands work to runners over an authenticated HTTPS protocol — while the runner is the execution plane that runs each event in a forked, sandboxed child holding no datastore credentials. Steering, webhooks, cron, the live event tail, and history behave exactly as before; what changed is where the work runs.
+ Behind an unchanged user surface, an event's agent now runs in a separate `zombie-runner` daemon instead of inside the API server. `zombied` became the control plane — it owns Postgres, Redis, and the Vault, and hands work to runners over an authenticated HTTPS protocol — while the runner is the execution plane that runs each event in an isolated sandbox holding no datastore credentials. Steering, webhooks, cron, the live event tail, and history behave exactly as before; what changed is where the work runs.
  ## What's new
  - **Control plane / execution plane split.** A runner leases an event, runs it, and reports the result; `zombied` does the durable writes. Work can run on hosts that never see a database credential.
- - **Lease-based ownership with fencing.** Each lease carries a deadline and a monotonic fencing token. A runner that dies mid-event has its work reclaimed and re-run by another runner; a late report from the dead runner is rejected, so state is never double-written.
- - **Sandbox is mandatory.** Every event runs in a Landlock + cgroups + network-namespace sandbox; a sandbox that fails to start fails closed rather than running unprotected.
+ - **Lease-based ownership.** Each lease carries a deadline. A runner that dies mid-event has its work reclaimed and re-run by another runner; a late report from the dead runner is rejected, so state is never double-written.
+ - **Sandbox is mandatory.** Every event runs in a sandbox; a sandbox that fails to start fails closed rather than running unprotected.
- Host-resident runners are not enabled in production yet — this release lands the architecture; turning them on follows in a later update. One known limit ships with it: an agent that runs longer than the 30-second lease window is reclaimed and re-run, so long single events wait on a follow-up that adds lease renewal.
+
+ Host-resident runners are not enabled in production yet — this release lands the architecture; turning them on follows in a later update.
+
+
+
+ One known limit: an agent that runs longer than the 30-second lease window is reclaimed and re-run, so long single events wait on a follow-up that adds lease renewal.
+
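
Reviewer note: the lease semantics this entry describes (a deadline per lease, reclaim after a runner dies, rejection of the dead runner's late report) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the `zombied` implementation: `ControlPlane`, `Lease`, `lease()`, `report()`, and the injected `now` clock are hypothetical names, and the fencing token follows the wording of the first patch ("a deadline and a monotonic fencing token"). Only the 30-second window comes from the changelog text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# The changelog states a 30-second lease window; everything else is illustrative.
LEASE_WINDOW_S = 30.0


@dataclass
class Lease:
    runner_id: str
    token: int       # hypothetical fencing token; grows on every (re)lease
    deadline: float  # after this time another runner may reclaim the event


@dataclass
class ControlPlane:
    """Toy stand-in for the control plane's lease table (not the real API)."""
    leases: Dict[str, Lease] = field(default_factory=dict)
    results: Dict[str, str] = field(default_factory=dict)
    next_token: int = 1

    def lease(self, event_id: str, runner_id: str, now: float) -> Optional[Lease]:
        """Grant a lease if the event is unfinished and not currently owned."""
        if event_id in self.results:
            return None  # already completed; nothing to run
        current = self.leases.get(event_id)
        if current is not None and now < current.deadline:
            return None  # a live lease still owns the event
        granted = Lease(runner_id, self.next_token, now + LEASE_WINDOW_S)
        self.next_token += 1
        self.leases[event_id] = granted
        return granted

    def report(self, event_id: str, token: int, result: str) -> bool:
        """Accept a result only from the holder of the newest lease."""
        current = self.leases.get(event_id)
        if current is None or token != current.token:
            return False  # stale token: the lease was reclaimed, so reject
        if event_id in self.results:
            return False  # never write the same event twice
        self.results[event_id] = result
        return True


if __name__ == "__main__":
    cp = ControlPlane()
    a = cp.lease("evt-1", "runner-a", now=0.0)    # runner-a takes the event
    b = cp.lease("evt-1", "runner-b", now=31.0)   # window elapsed: reclaimed
    print(cp.report("evt-1", a.token, "late"))    # False, stale token
    print(cp.report("evt-1", b.token, "done"))    # True, current holder
```

In this model a report is checked against the token only, so a holder that reports slightly past its deadline but before any reclaim is still accepted. Whether the real protocol behaves that way is not stated in the entry.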