docs: reflect v2 pivot — agent hosting, credential firewall #7
Merged
3 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,151 +1,108 @@ | ||
| --- | ||
| title: "How it works" | ||
| description: "The spec-to-PR lifecycle with self-repairing agents." | ||
| description: "The UseZombie agent hosting model — credential firewall, always-on execution, and observability." | ||
| --- | ||
|
|
||
| ## The spec-to-PR lifecycle | ||
| ## The agent hosting model | ||
|
|
||
| UseZombie turns a markdown spec into a validated pull request through a deterministic pipeline: validate, implement, gate, score, ship. | ||
| UseZombie sits between your agent and the outside world. You bring the agent logic. We provide the runtime: a sandboxed process, a credential firewall, wired webhooks, and a kill switch. | ||
|
|
||
| ```mermaid | ||
| flowchart LR | ||
| A[Write spec] --> B[Submit via CLI/API] | ||
| B --> C[Validate spec] | ||
| C --> D[Agent implements] | ||
| D --> E[Gate loop] | ||
| E -->|Pass| F[Score run] | ||
| E -->|Fail| G[Self-repair] | ||
| G --> E | ||
| F --> H[Open PR with scorecard] | ||
| A[Your agent code] --> B[UseZombie sandbox] | ||
| B --> C[Credential firewall] | ||
| C --> D[External APIs / LLMs] | ||
| E[Webhooks: GitHub, Slack, email] --> B | ||
| F[zombiectl / Mission Control] --> B | ||
| ``` | ||
|
|
||
| Your agent never sees raw credentials. It makes requests. The firewall intercepts them, injects the right token, and forwards. Audit logs record every action. | ||
|
|
||
| ## Step by step | ||
|
|
||
| <Steps> | ||
| <Step title="Write a spec"> | ||
| A spec is a markdown file describing what you want built. It can follow any format — structured sections, free-form prose, bullet lists. The agent reads natural language and infers intent from your codebase context. | ||
|
|
||
| You describe **what** to build. The agent figures out **how**. | ||
| <Step title="Connect your agent"> | ||
| Push your agent code to a workspace. UseZombie wraps it in a sandboxed process with resource limits (CPU, memory, wall time). The agent starts running immediately and restarts automatically on crash. | ||
| </Step> | ||
|
|
||
| <Step title="Submit"> | ||
| Submit via `zombiectl run --spec <path>` or the REST API (`POST /v1/runs`). On submission, UseZombie validates that referenced files exist in the workspace, deduplicates against in-flight runs, and enqueues the work. | ||
| <Step title="Store credentials — once"> | ||
| Add API keys, tokens, and secrets to the workspace credential store via `zombiectl skill-secret put` or Mission Control. Credentials are encrypted at rest and never passed into the sandbox. | ||
| </Step> | ||
|
|
||
| <Step title="Agent implements"> | ||
| The NullClaw agent runtime picks up the run and works inside an **isolated git worktree** — a fresh working directory branched from your default branch. The agent receives the spec plus injected codebase context (relevant file contents, module structure) to produce an accurate implementation. | ||
|
|
||
| No changes touch your main branch until you approve the PR. | ||
| </Step> | ||
|
|
||
| <Step title="Gate loop"> | ||
| After implementation, the agent runs your project's standard validation gates in sequence: | ||
|
|
||
| 1. `make lint` — linting and type checks | ||
| 2. `make test` — unit tests | ||
| 3. `make build` — production build | ||
|
|
||
| If any gate fails, the agent reads the error output, diagnoses the issue, and self-repairs. This loop runs up to **3 times** by default. If all repair attempts fail, the run is marked `FAILED` with full error context. | ||
| <Step title="Firewall injects credentials per-request"> | ||
| When your agent makes an outbound request, the firewall intercepts it, matches the target against your credential policy, and injects the token before forwarding. The agent code never contains a key — it just makes requests. | ||
|
|
||
| <Info> | ||
| The repair limit is configurable per agent profile. See [Gate loop](/runs/gate-loop) for details. | ||
| This is the core security guarantee: credential injection happens at the network boundary, outside the sandbox. A compromised agent cannot exfiltrate credentials it never received. | ||
| </Info> | ||
| </Step> | ||
|
|
||
| <Step title="Score"> | ||
| Every completed run receives a **scorecard** with four weighted dimensions: | ||
|
|
||
| | Dimension | Weight | What it measures | | ||
| |-----------|--------|------------------| | ||
| | Completion | 40% | Did the agent implement everything the spec asked for? | | ||
| | Error rate | 30% | How many gate failures occurred before passing? | | ||
| | Latency | 20% | Wall-clock time from enqueue to PR. | | ||
| | Resource efficiency | 10% | Token and compute usage relative to task complexity. | | ||
|
|
||
| Scores map to tiers: | ||
|
|
||
| | Tier | Score range | | ||
| |------|-------------| | ||
| | Bronze | 0 -- 39 | | ||
| | Silver | 40 -- 69 | | ||
| | Gold | 70 -- 89 | | ||
| | Elite | 90 -- 100 | | ||
| <Step title="Webhooks arrive without ngrok"> | ||
| Register webhook sources (GitHub, Slack, email, custom HTTP) on the workspace. UseZombie provides a stable inbound endpoint and routes matching events to your agent process. No tunneling, no port forwarding, no custom servers. | ||
| </Step> | ||
|
|
||
| <Step title="PR"> | ||
| The agent pushes a branch named `zombie/<run_id>/<slug>` and opens a pull request on GitHub. The PR body contains an agent-generated explanation of what was implemented and why. A scorecard comment is posted with the quality metrics. | ||
|
|
||
| From here, it's a normal code review. Approve, request changes, or close. | ||
| <Step title="Observe and control"> | ||
| Every agent action is timestamped in the audit log: what ran, when, what it called, and what it cost. Budget alerts fire before you hit limits. The kill switch stops any agent mid-action from the CLI or dashboard. | ||
| </Step> | ||
| </Steps> | ||
|
|
||
| ## Runtime architecture | ||
| ## Credential firewall architecture | ||
|
|
||
| Under the hood, the CLI, API server, queue, worker, and executor coordinate to move a run from submission to PR. | ||
| ```mermaid | ||
| sequenceDiagram | ||
| participant A as Agent (sandbox) | ||
| participant F as Credential firewall | ||
| participant V as Credential store | ||
| participant E as External API | ||
|
|
||
| A->>F: POST api.openai.com/v1/chat/completions | ||
| F->>V: lookup(workspace_id, target_host) | ||
| V-->>F: Bearer sk-... | ||
| F->>E: POST api.openai.com/v1/chat/completions<br/>Authorization: Bearer sk-... | ||
| E-->>F: 200 OK | ||
| F-->>A: 200 OK | ||
| Note over F: audit_log.append(action, ts, cost_tokens) | ||
| ``` | ||
|
|
||
| The agent makes a plain HTTP request. The firewall resolves the right credential from the store, injects it, and forwards. The agent receives the response. The credential value never crosses the sandbox boundary. | ||
|
|
||
| ## Runtime architecture | ||
|
|
||
| ```mermaid | ||
| sequenceDiagram | ||
| participant CLI as zombiectl | ||
| participant API as zombied API | ||
| participant Q as Redis Streams | ||
| participant W as zombied worker | ||
| participant E as zombied-executor | ||
| participant GH as GitHub | ||
| participant S as Sandbox process | ||
| participant F as Credential firewall | ||
|
|
||
| CLI->>API: POST /v1/runs (spec) | ||
| API->>Q: enqueue run_id | ||
| CLI->>API: POST /v1/agents (agent config) | ||
| API->>Q: enqueue agent_start | ||
| Q->>W: claim work | ||
| W->>E: StartStage (agent config, tools) | ||
| E->>E: NullClaw agent implements | ||
| E->>W: ExecutionResult | ||
| W->>W: Gate loop (lint/test/build) | ||
| W->>GH: push branch + open PR | ||
| W->>GH: post scorecard comment | ||
| W->>S: spawn sandboxed process | ||
| S->>F: outbound requests (no credentials) | ||
| F->>F: inject credentials + log | ||
| W->>CLI: SSE: status, logs, cost | ||
| ``` | ||
|
|
||
| **Component responsibilities:** | ||
|
|
||
| - **zombiectl** — CLI client. Submits specs, checks status, streams logs. | ||
| - **zombied API** — HTTP server. Validates specs, manages runs and workspaces, serves the dashboard. | ||
| - **`zombiectl`** — CLI client. Deploys agents, checks status, manages secrets, streams logs. | ||
| - **`zombied` API** — HTTP server. Manages agent lifecycle, credential store, webhook routing, billing. | ||
| - **Redis Streams** — Work queue. Durable, ordered, with consumer group semantics for worker fleet scaling. | ||
| - **zombied worker** — Claims runs, orchestrates the gate loop, and pushes results to GitHub. Supports drain and rolling deploys. | ||
| - **zombied-executor** — Sidecar process that owns the sandbox lifecycle. Spawns NullClaw agents, manages worktrees, enforces resource limits. | ||
| - **GitHub** — Target forge. Branch push, PR creation, scorecard comments. | ||
|
|
||
| ## Agent relay model | ||
| - **`zombied` worker** — Owns the sandbox lifecycle. Spawns agents, enforces resource limits, handles restarts. | ||
| - **Credential firewall** — Network-layer proxy. Intercepts outbound requests, injects credentials, records audit logs. | ||
|
|
||
| For lightweight, interactive agent sessions (`spec init`, `run --preview`), UseZombie uses a different execution model: the **agent relay**. Instead of queuing work for a sandbox, `zombied` acts as a stateless pass-through between the CLI and the workspace's LLM provider. | ||
|
|
||
| ```mermaid | ||
| sequenceDiagram | ||
| participant CLI as zombiectl | ||
| participant API as zombied API | ||
| participant LLM as LLM Provider | ||
|
|
||
| CLI->>API: POST /v1/agent/stream (mode, tools, messages) | ||
| API->>LLM: Forward with system prompt + API key | ||
| LLM-->>API: tool_use: list_dir(".") | ||
| API-->>CLI: SSE: event: tool_use | ||
| Note over CLI: Executes locally on laptop | ||
| CLI->>API: POST /v1/agent/stream (messages + tool_result) | ||
| API->>LLM: Forward accumulated messages | ||
| LLM-->>API: tool_use: read_file("go.mod") | ||
| API-->>CLI: SSE: event: tool_use | ||
| Note over CLI: Reads file locally | ||
| CLI->>API: POST /v1/agent/stream (messages + tool_result) | ||
| API->>LLM: Forward accumulated messages | ||
| LLM-->>API: text: "# M5_001: Rate Limiting..." | ||
| API-->>CLI: SSE: event: text_delta + done {usage} | ||
| ``` | ||
| ## Spend control | ||
|
|
||
| **Key differences from the pipeline model:** | ||
| Every workspace has configurable limits that prevent runaway costs: | ||
|
|
||
| | | Pipeline (full runs) | Agent relay (spec init, preview) | | ||
| |---|---|---| | ||
| | **Execution** | Sandbox on worker, queued | Direct handler, no queue | | ||
| | **File access** | Agent reads files in sandbox | CLI reads files locally, sends to model on demand | | ||
| | **Duration** | 1-5 minutes | 3-8 seconds | | ||
| | **State** | Durable (DB + Redis) | Stateless (CLI manages conversation) | | ||
| | **Provider** | Configured per workspace | Same, resolved by `zombied` | | ||
| | Control | What it does | | ||
| |---------|-------------| | ||
| | Token budget | Max tokens per agent execution window | | ||
| | Wall time limit | Max wall-clock time before forced stop | | ||
| | Cost ceiling | Max USD spend per billing period | | ||
| | Kill switch | Manual stop from CLI or Mission Control at any time | | ||
|
|
||
| The relay model is inspired by how Claude Code and OpenCode work: the CLI holds tool definitions, the model calls them on demand, and the API layer is a relay. The difference is `zombied` sits between CLI and provider because API keys are server-side secrets managed per-workspace. | ||
| When a limit is hit, the agent receives a graceful shutdown signal. The audit log records the reason. No surprises on the invoice. |
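
The spend-control behavior described in the new "Spend control" section (a limit is hit, the agent is told to shut down, the audit log records the reason) could be sketched roughly as follows. This is a minimal illustration, not UseZombie's implementation: the `Limits`, `Usage`, `stop_reason`, and `enforce` names, the reason strings, and the audit-log entry shape are all assumptions made for the example, as is the order in which the limits are checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    """Hypothetical per-workspace limits, mirroring the controls table."""
    token_budget: int        # max tokens per agent execution window
    wall_time_s: float       # max wall-clock seconds before forced stop
    cost_ceiling_usd: float  # max USD spend per billing period


@dataclass
class Usage:
    """Hypothetical running totals for one agent."""
    tokens: int = 0
    elapsed_s: float = 0.0
    cost_usd: float = 0.0
    kill_requested: bool = False  # manual kill switch (CLI or dashboard)


def stop_reason(usage: Usage, limits: Limits) -> Optional[str]:
    """Return why the agent should stop, or None if it may keep running."""
    if usage.kill_requested:
        return "kill_switch"
    if usage.tokens >= limits.token_budget:
        return "token_budget"
    if usage.elapsed_s >= limits.wall_time_s:
        return "wall_time_limit"
    if usage.cost_usd >= limits.cost_ceiling_usd:
        return "cost_ceiling"
    return None


def enforce(usage: Usage, limits: Limits, audit_log: list) -> Optional[str]:
    """Check limits and, if one is hit, record the reason in the audit log."""
    reason = stop_reason(usage, limits)
    if reason is not None:
        # Assumed entry shape; the real audit log format is not shown in the diff.
        audit_log.append({"event": "shutdown", "reason": reason})
    return reason
```

In this sketch the caller would send the graceful shutdown signal whenever `enforce` returns a non-`None` reason. Checking the kill switch first is a design choice for the example, so that a manual stop is reported as such even when a budget is also exhausted.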