The AI agent that respects your resources.
Single binary. Minimal hardware. Maximum context efficiency.
Every token counts — Zeph makes sure none are wasted.
Most AI agent frameworks dump every tool description, skill, and raw output into the context window — and bill you for it. Zeph takes the opposite approach: automated context engineering. Only relevant data enters the context. The result — lower costs, faster responses, and an agent that runs on hardware you already have.
- Semantic skill selection — embeds skills as vectors, retrieves only top-K relevant per query instead of injecting all
- Smart output filtering — command-aware filters strip 70-99% of noise before context injection; oversized responses offloaded to filesystem
- Resilient context compaction — reactive retry on context overflow, middle-out progressive tool response removal, 9-section structured compaction prompt, LLM-free metadata fallback
- Tool-pair summarization — when visible tool call/response pairs exceed a configurable cutoff, the oldest pair is summarized via LLM and originals hidden from context
- Accurate token counting — tiktoken-based cl100k_base tokenizer with DashMap cache replaces chars/4 heuristic
- Proportional budget allocation — context space distributed by purpose, not arrival order
- Optimized agent loop hot-path — compaction check is O(1) via cached token count;
EnvironmentContextbuilt once at bootstrap and partially refreshed on skill reload; doom-loop hashing done in-place with no intermediate allocations; token counting for tool output pruning reduced to a single call per part
Tip
curl -fsSL https://github.com/bug-ops/zeph/releases/latest/download/install.sh | shOther methods
cargo install zeph # crates.io
cargo install --git https://github.com/bug-ops/zeph # from source
docker pull ghcr.io/bug-ops/zeph:latest # DockerPre-built binaries: GitHub Releases · Docker guide
zeph init # interactive setup wizard
zeph # run the agent
zeph --tui # run with TUI dashboardFull setup guide → · Configuration reference →
| Hybrid inference | Ollama, Claude, OpenAI, Candle (GGUF), any OpenAI-compatible API. Multi-model orchestrator with fallback chains. Response cache with blake3 hashing and TTL |
| Skills-first architecture | YAML+Markdown skill files with semantic matching, self-learning evolution, 4-tier trust model, and compact prompt mode for small-context models |
| Semantic memory | SQLite + Qdrant (or embedded SQLite vector search) with MMR re-ranking, temporal decay scoring, resilient compaction (reactive retry, middle-out tool response removal, 9-section structured prompt, LLM-free fallback), durable compaction with message visibility control, tool-pair summarization (LLM-based, configurable cutoff), credential scrubbing, cross-session recall, vector retrieval, autosave assistant responses, snapshot export/import, configurable SQLite pool, and background response-cache cleanup |
| Multi-channel I/O | CLI, Telegram, Discord, Slack, TUI — all with streaming. Vision and speech-to-text input |
| Protocols | MCP client (stdio + HTTP), A2A agent-to-agent communication, ACP server for IDE integration (stdio + HTTP+SSE + WebSocket, multi-session with LRU eviction, persistence, idle reaper, permission persistence, multi-modal prompts, runtime model switching, MCP server management via ext_method, session export/import), sub-agent orchestration |
| Defense-in-depth | Shell sandbox (blocklist + confirmation patterns for process substitution, here-strings, eval), tool permissions, secret redaction, SSRF protection (HTTPS-only, DNS validation, address pinning, redirect chain re-validation), skill trust quarantine, audit logging. Secrets held in memory as Zeroizing<String> — wiped on drop |
| TUI dashboard | ratatui-based with syntax highlighting, live metrics, file picker, command palette, daemon mode |
| Single binary | ~15 MB, no runtime dependencies, ~50ms startup, ~20 MB idle memory |
Architecture → · Feature flags → · Security model →
Zeph implements the Agent Client Protocol — use it as an AI backend in Zed, Helix, VS Code, or any ACP-compatible editor via stdio, HTTP+SSE, or WebSocket transport.
zeph acp # stdio (editor spawns as subprocess)
zeph acp --http :8080 # HTTP+SSE (shared/remote)
zeph acp --ws :8080 # WebSocketbug-ops.github.io/zeph — installation, configuration, guides, architecture, and API reference.
See CONTRIBUTING.md for development workflow and guidelines.
The workspace enforces unsafe_code = "deny" at the Cargo workspace lint level — no unsafe Rust is permitted in any crate without an explicit override.
Found a vulnerability? Please use GitHub Security Advisories for responsible disclosure.
