Grammar-free code intelligence daemon, CLI, MCP server, and API proxy.
One binary. All local. No server required.
Save 30-50% on API tokens across any agent framework — Pi, Claude Code, Cline, OpenCode.
cargo install reliary-agent
# Auto-detect and configure your agents (Pi, Claude, Cline, OpenCode)
reliary-agent init
# Or start manually
reliary-agent serve & # daemon + proxy on :9090
reliary-agent index ./project # build FTS5 search index

After `init`, your agents have access to the daemon's MCP tools (search, risk, heal).
For proxy-based conversation compression, see API Proxy.
cargo install reliary-agent

Or download a release tarball:
curl -sSfL https://github.com/Reliary/reliary-agent/releases/latest/download/reliary-$(uname -m)-unknown-linux-gnu.tar.gz | tar xz
cd reliary-* && ./install.sh

| Layer | Where | Savings | How |
|---|---|---|---|
| Reasoning compression | Gate.js (Pi) / proxy (all agents) | 30-50% | Strip LLM reasoning fluff ("Let me analyze...") before it reaches your bill |
| Conversation window | Proxy | 15-25% | Collapse verbose tool results older than 8 turns into summary markers |
| Response cache | Proxy | 0-100% | Repeated requests (same model, same messages) return cached results — zero cost on retries |
| Tool schema stripping | Proxy | ~150t/turn | Remove redundant tool descriptions the LLM already knows |
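The reasoning-compression row can be illustrated with a minimal sketch. It assumes compression means dropping whole sentences that open with a filler phrase; the opener list and the function name `strip_filler` are hypothetical and are not taken from gate.js or the proxy.

```python
import re

# Hypothetical filler openers; the real pattern set is not documented here.
FILLER_OPENERS = ("let me ", "okay, ", "i'll start by ", "first, i need to ")


def strip_filler(text: str) -> str:
    """Drop sentences that begin with a known filler opener."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [s for s in sentences if not s.lower().startswith(FILLER_OPENERS)]
    return " ".join(kept)


print(strip_filler("Let me analyze the code. The bug is in parse(). Okay, now I see."))
# prints: The bug is in parse().
```

A sentence-level filter like this keeps the factual statements intact, which is the property that matters when the compressed text is fed back to the model.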
flowchart LR
A[Raw conversation] --> B[Reasoning compression<br/>30-50% saved]
B --> C[Conv window collapse<br/>15-25% saved]
C --> D[Tool schema strip<br/>~150t/turn]
D --> E[Response cache<br/>0-100% on repeats]
E --> F[Billed tokens]
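To see how the stages in the pipeline above might combine, here is a rough arithmetic sketch. It assumes each percentage applies to whatever the previous stage left and that the schema saving is a flat per-turn amount; the table does not state either, and the input numbers are illustrative only.

```python
def estimate_billed_tokens(raw_tokens, fractional_savings, flat_saving_per_turn=0, turns=0):
    """Apply fractional stage savings in order, then subtract a flat per-turn saving."""
    remaining = float(raw_tokens)
    for saving in fractional_savings:
        remaining *= 1.0 - saving
    remaining -= flat_saving_per_turn * turns
    return max(0, round(remaining))


# Illustrative: 10,000 raw tokens, 40% then 20% saved, 150 tokens/turn over 4 turns.
print(estimate_billed_tokens(10_000, [0.40, 0.20], flat_saving_per_turn=150, turns=4))
# prints: 4200
```

The response cache is left out of this estimate because its saving depends entirely on how often a request repeats.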
reliary-agent search "bm25_idf" ./project # FTS5 search
reliary-agent risk ./src/main.rs # Pre-edit risk analysis
reliary-agent compress "Let me think..." # Reasoning compression
reliary-agent dead ./project # Dead code detection

Every tool is also available through MCP and works with Claude Code, Cline, and OpenCode.
When the LLM edits a file, reliary shadow-applies the change, runs tests, and reverts if tests fail. The LLM never sees the failure spiral.
flowchart LR
A[LLM sends edit] --> B{Daemon intercepts}
B --> C[Shadow-apply to temp]
C --> D[Run cargo test]
D --> E{Tests pass?}
E -->|Yes| F[Write to real file]
E -->|No| G[Revert temp file]
G --> H[Return REVERTED to LLM]
F --> I[Return OK to LLM]
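The shadow-apply flow above can be sketched as follows. This is a simplified illustration, not the daemon's implementation: it copies the whole project to a temporary directory, and the function name `shadow_apply` and its parameters are hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def shadow_apply(project: Path, rel_path: str, new_text: str, test_cmd: list[str]) -> str:
    """Apply an edit to a temp copy, run tests there, and only then touch the real file."""
    with tempfile.TemporaryDirectory() as tmp:
        shadow = Path(tmp) / "shadow"
        shutil.copytree(project, shadow)
        (shadow / rel_path).write_text(new_text)
        result = subprocess.run(test_cmd, cwd=shadow, capture_output=True)
        if result.returncode != 0:
            # The temp copy is discarded; the real file was never modified.
            return "REVERTED"
    (project / rel_path).write_text(new_text)
    return "OK"


# Usage with a Rust project (assumes cargo is installed):
# shadow_apply(Path("./project"), "src/main.rs", new_source, ["cargo", "test"])
```

Running the tests against a copy means a failing edit never reaches the working tree, so there is nothing to undo on the real file.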
- Identifier veto: blocks edits that reference hallucinated API names
- Risk gate: warns before editing files with high blast radius
- Bash guard: blocks destructive commands; routes `sed -i` through self-healing
- Muzzle: pauses background scavenger during active LLM sessions
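As an illustration of the identifier veto idea, the sketch below flags call-like names in an edit that do not appear in a set of known identifiers. How reliary-agent actually builds that set and decides to veto is not described here; `unknown_calls` and the regex are assumptions made for the example.

```python
import re

# Matches a name immediately followed by an opening parenthesis.
CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def unknown_calls(edit_text: str, known: set[str]) -> set[str]:
    """Return call-like names in the edit that are absent from the known set."""
    return {name for name in CALL.findall(edit_text) if name not in known}


known = {"parse_config", "load"}
edit = "cfg = parse_config(load(path))\ncfg.validate_all()"
print(unknown_calls(edit, known))
# prints: {'validate_all'}
```

A real check would also need to include language keywords and standard-library names in the known set, otherwise constructs such as `if (` would be flagged.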
| Agent | What `reliary-agent init` does | Savings |
|---|---|---|
| Pi | Installs gate.js (tool-level compression + safety) | 30-50% |
| Claude Code | Injects MCP server config (`reliary-agent mcp`) | 15-25% |
| Cline | Injects MCP server config (`reliary-agent mcp`) | 15-25% |
| OpenCode | Injects MCP server config (`reliary-agent mcp`) | 15-25% |
# Explore
reliary-agent index ./project # Build FTS5 search index
reliary-agent search "query" ./path # Search index
reliary-agent risk ./src/file.rs # Pre-edit risk analysis
reliary-agent dead ./project # Dead code detection
# Edit
reliary-agent fix-dir ./project # Apply stored fix patterns
reliary-agent fix-file file old new # Apply pattern to single file
# Services
reliary-agent serve # Daemon + proxy (:9090)
reliary-agent init # Auto-configure agents
reliary-agent doctor # System health check
reliary-agent status # Project intelligence overview
reliary-agent logs # Tail daemon logs
# Config
reliary-agent config # Show current settings
reliary-agent config mode strict # Set safety level (fast/reactive/strict)

The `serve` command starts an OpenAI-compatible compression proxy on localhost:9090.
Point any agent at it to get conversation compression without installing gate.js:
# Start proxy
reliary-agent serve &
# Point your agent to it (choose the right base URL for your provider)
export DEEPSEEK_BASE_URL=http://localhost:9090/v1 # Pi, Cline, OpenCode
export ANTHROPIC_BASE_URL=http://localhost:9090/ # Claude Code only
pi --model deepseek/deepseek-v4-flash --print "fix bug"

Note: `reliary-agent init` only configures MCP tools. To get proxy compression, set the matching `BASE_URL` environment variable as in the export lines above, or configure it directly in your agent's config file (Pi, Cline, and OpenCode all support `baseUrl` in their provider settings).
flowchart LR
A[Agent Request] --> B{Auth Routing}
B --> C[Compression]
C --> D{Cache Hit?}
D -->|Yes| E[Return cached]
D -->|No| F[Forward to API]
F --> G[Stream back to agent]
G --> H[Cache response]
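The cache branch of the flow above can be sketched as an in-memory lookup keyed on the model and messages. This is a minimal illustration under assumptions: the real proxy's key derivation, storage, and eviction are not documented here, and `cache_key` and `ResponseCache` are hypothetical names.

```python
import hashlib
import json


def cache_key(model: str, messages: list[dict]) -> str:
    """Derive a stable key from the model name and the full message list."""
    payload = json.dumps(
        {"model": model, "messages": messages}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    def __init__(self):
        self._store = {}

    def get_or_call(self, model, messages, upstream):
        """Return (response, was_cached); call upstream only on a miss."""
        key = cache_key(model, messages)
        if key in self._store:
            return self._store[key], True
        response = upstream(model, messages)
        self._store[key] = response
        return response, False
```

Serializing with `sort_keys=True` keeps the key stable when two clients send the same fields in a different order.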
Provider-agnostic routing: The proxy automatically detects the API provider
from the Authorization header. OpenAI, Anthropic, and DeepSeek keys all route
to the correct upstream without manual configuration.
True SSE streaming: The proxy streams chunks back to the client in real-time, preserving the typewriter effect in your agent's UI.
What the proxy compresses:
- Conversation history: old assistant reasoning messages are compressed before being sent to the API (~15-25% fewer billable tokens)
- Response cache: identical requests (same model, same messages) return cached responses — zero API cost on repeat edits
- Tool schemas: redundant description text is stripped from the tools array sent with each request (~150t saved per turn)
- Context filter: tool results older than 8 turns are collapsed into summary markers, preventing unbounded context growth
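The context filter in the last bullet can be sketched like this. The sketch assumes a "turn" is counted per user message and that the summary marker records only the original length; both are illustrative choices, and `collapse_old_tool_results` is a hypothetical name.

```python
def collapse_old_tool_results(messages: list[dict], keep_turns: int = 8) -> list[dict]:
    """Replace tool results that are at least keep_turns user turns old with a short marker."""
    turns_after = 0  # user messages seen so far while walking backwards
    out = []
    for msg in reversed(messages):
        if msg["role"] == "tool" and turns_after >= keep_turns:
            marker = f"[tool result collapsed: {len(msg['content'])} chars]"
            out.append({**msg, "content": marker})
        else:
            out.append(msg)
        if msg["role"] == "user":
            turns_after += 1
    out.reverse()
    return out
```

Because only old tool results are rewritten, the message count and ordering stay the same, which keeps the conversation structure valid for the upstream API.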
# Human (default)
reliary-agent search "merge_sort" ./project
# Agent (compact)
reliary-agent -f compact search "merge_sort" ./project
# → 4.2294 ./src/sort.rs
# CI (JSON)
reliary-agent -f json dead ./project | jq '.[] | select(contains("HIGH"))'

See CONFIG.md for the full documentation.
| Env var | Effect |
|---|---|
| `RELIARY_MODE=fast` | Maximum compression (no safety rails) |
| `RELIARY_MODE=reactive` | Safety escalates on unsafe behavior (default) |
| `RELIARY_MODE=strict` | Full sandbox (bash blocked, edits always healed) |
| `RELIARY_FEATURES=+editMerge,-taskTargets` | Toggle individual features |
| `RELIARY_UPSTREAM_URL=https://api.openai.com/v1` | Set API upstream (default: auth-based routing) |
| `DEEPSEEK_BASE_URL=http://localhost:9090/v1` | Route Pi/Cline/OpenCode through proxy |
| `ANTHROPIC_BASE_URL=http://localhost:9090/` | Route Claude Code through proxy |
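The `+name,-name` syntax of `RELIARY_FEATURES` suggests a simple toggle list. The sketch below shows one plausible reading, where `+` enables and `-` disables a feature on top of the current set; the authoritative semantics are in CONFIG.md, and `apply_feature_toggles` is a hypothetical helper.

```python
def apply_feature_toggles(spec: str, enabled: set[str]) -> set[str]:
    """Return a new feature set after applying a '+a,-b' style toggle string."""
    result = set(enabled)
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if token[0] == "+":
            result.add(token[1:])
        elif token[0] == "-":
            result.discard(token[1:])
        else:
            raise ValueError(f"toggle must start with + or -: {token!r}")
    return result


# 'cache' is a made-up feature name used only for illustration.
print(sorted(apply_feature_toggles("+editMerge,-taskTargets", {"taskTargets", "cache"})))
# prints: ['cache', 'editMerge']
```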
reliary-agent consolidates 9 crates, each ported from a standalone tool, into one binary. Shared tokenizer, shared session state, no IPC overhead.
graph TD
A[CLI] --> D[Daemon Core]
B[MCP Server] --> D
C[API Proxy :9090] --> D
D --> E[(FTS5 Index)]
D --> F[(Chronicle)]
D --> G[(Co-occurrence)]
C --> H[Upstream API<br/>DeepSeek / OpenAI / Anthropic]
I[Gate.js<br/>Pi Agent] --> C
J[Claude Code] --> C
K[Cline] --> C
- search: BM25 + FTS5, Porter stemming, phrase extraction (from stria)
- compress: IR reasoning compression (from gate.js)
- sift: Structural compression, entropy/diversity gates (from sift + maxwell)
- risk: Pre-edit risk scores, blast radius (from quale)
- memory: HDC 10K-bit vectors, Hebbian learning (from cortex-rs)
- fix: Pattern extraction, content matching, signature matching (from cortex-rs)
- dead: Grammar-free dead code via occurrence counting (from carrion)
- agent: Binary — daemon, proxy (axum + tokio), CLI, MCP
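To make "grammar-free dead code via occurrence counting" concrete, here is a minimal sketch of the general idea: an identifier that appears exactly once across all files is defined but never referenced. It is not the carrion implementation; the regex, the `min_len` threshold, and the `ignore` set are assumptions chosen for the example.

```python
import re
from collections import Counter

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def dead_candidates(files: dict[str, str], min_len: int = 4, ignore=frozenset({"main"})) -> list[str]:
    """Return identifiers that occur exactly once across all files."""
    counts = Counter()
    for text in files.values():
        counts.update(IDENT.findall(text))
    return sorted(
        name
        for name, n in counts.items()
        if n == 1 and len(name) >= min_len and name not in ignore
    )


files = {
    "a.rs": "fn used_helper() {}\nfn orphan_helper() {}\n",
    "b.rs": "fn main() { used_helper(); }\n",
}
print(dead_candidates(files))
# prints: ['orphan_helper']
```

No parser is involved, which is why the approach works across languages, at the cost of false positives for entry points and names referenced only through reflection or macros.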
cargo build --release
cargo test --release
reliary-agent serve & # start daemon + proxy

- CONFIG.md: Mode system, feature flags, config cascade
- SECURITY.md: Vulnerability disclosure and security policy
- CONTRIBUTING.md: Build, test, PR workflow
MIT