Give your AI agent a brain for your codebase.
AI coding agents waste most of their tool calls fumbling through your codebase with grep, cat, find, and file reads. rpg-encoder fixes that. It builds a semantic graph of your code with Tree-sitter — not just what calls what, but what every function does — and gives your AI assistant whole-repo understanding via MCP in a single tool call.
```bash
claude mcp add rpg -- npx -y -p rpg-encoder rpg-mcp-server
```

One command. Works with Claude Code, Cursor, opencode, Windsurf, or any MCP-compatible agent. No Rust toolchain, no cloning, no building — npx downloads a pre-built binary for your platform.
Then open any repo and tell your agent:
"Build and lift the RPG for this repo"
Your agent handles everything: indexes entities (seconds), reads each function and adds intent-level features (a few minutes), organizes them into a semantic hierarchy, and commits .rpg/graph.json for your team.
For repos with ~100+ entities, `lifting_status` will tell your agent to delegate the lifting loop to a sub-agent or a cheaper model — feature extraction is pattern-matching, not novel reasoning. If your runtime has no sub-agent mechanism, run `rpg-encoder lift --provider anthropic|openai` from the terminal with an API key — the CLI drives an external LLM directly with no agent involvement. After the CLI finishes, call `reload_rpg` in your session to load the updated graph. The CLI lifts entities with no features; re-lifting stale entities (features present but outdated after code changes) is handled by the in-session MCP flow, not the CLI.
Once lifted, try:
- "What handles authentication?" — finds code even when nothing is named "auth"
- "Show everything that depends on the database connection"
- "Plan a change to add rate limiting to API endpoints"
The server instructions tell your agent to reach for RPG tools FIRST for any
question about code structure or behavior. That reflex matters — grep, cat,
and ad-hoc file reads burn tokens and miss semantic relationships RPG already
knows.
| If you'd otherwise reach for... | Use this instead |
|---|---|
| `grep -r` / `rg` (by intent) | `search_node(query="...")` |
| `grep -r` / `rg` (by name) | `search_node(query="...", mode="snippets")` |
| `cat` / reading a function | `fetch_node(entity_id="file:name")` |
| chained greps for callers/callees | `explore_rpg(entity_id="...", direction="...")` |
| recursive grep for "what depends on X" | `impact_radius(entity_id="...")` |
| `wc -l` / `find` / `tree` | `rpg_info` |
| reading many files for context | `semantic_snapshot` |
| manual search → fetch → explore chains | `context_pack(query="...")` |
| "how do I refactor X safely" | `plan_change(goal="...")` |
Fall back to grep, cat, or file reads only when the query is about literal text
(string search, comments, TODOs, log messages) — not about structure.
- Parse — Tree-sitter extracts entities (functions, classes, methods) and dependency edges (imports, calls, inheritance) from 15 languages.
- Lift — An LLM (your agent, or a cheap API like Haiku) reads each entity and writes verb-object features: "validate JWT tokens", "serialize config to disk".
- Organize — Features cluster into a 3-level semantic hierarchy (Area → Category → Subcategory) that emerges from what the code does, not the file tree.
- Understand — `semantic_snapshot` compresses the whole graph into ~25K tokens. Your LLM reads it once and knows the repo.
Instead of grepping through files, the LLM calls `semantic_snapshot` once and receives:
- Hierarchy — every functional area with aggregate features
- Entities — every function, class, method grouped by area, with its semantic features
- Dependency skeleton — condensed call graph with qualified names
- Hot spots — top 10 most-connected entities (the architectural backbone)
~25K tokens covers ~1000 entities. That's 2-3% of a 1M context window — the LLM starts every session already knowing your repo.
Whenever your working tree changes — committed, staged, or unstaged — the MCP server automatically re-syncs before responding to the next query. A changeset hash over (path, size, mtime) means repeated saves of the same file trigger one sync, and idle queries trigger none. Reverts are detected too: if a previously-dirty file returns to its HEAD state, the graph is restored.
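The changeset check can be pictured as a digest over sorted (path, size, mtime) triples — a minimal sketch of the idea in Python, not the server's actual implementation:

```python
import hashlib
from pathlib import Path

def changeset_hash(root: str) -> str:
    """Digest of (path, size, mtime) for every file under root.

    If the digest matches the last value the server saw, the graph is
    already in sync and the query proceeds with no extra work; if it
    differs, the server re-syncs once and remembers the new digest.
    """
    h = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            st = path.stat()
            h.update(f"{path}|{st.st_size}|{st.st_mtime_ns}\n".encode())
    return h.hexdigest()
```

Because the digest depends only on file metadata, it is cheap enough to recompute before every query; hashing file contents would defeat the point.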
| Mode | Command | Cost | Who pays |
|---|---|---|---|
| Agent lifting | "Build and lift the RPG" | Subscription tokens | Your Claude Code / Cursor subscription |
| Autonomous lifting | `auto_lift(provider="anthropic", api_key_env="ANTHROPIC_API_KEY")` | ~$0.02 per 100 entities | External API key (Haiku, GPT-4o-mini, OpenRouter, Gemini) |
`auto_lift` calls a cheap external LLM directly — your coding subscription never touches the lifting work. Use `api_key_env` to resolve keys from environment variables so they never appear in tool call transcripts.
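The `api_key_env` indirection is easy to illustrate: the tool call carries only the variable *name*, and the secret is read from the process environment at call time (a sketch; the real server's lookup and error handling may differ):

```python
import os

def resolve_api_key(api_key_env: str) -> str:
    """Look up the key by environment-variable name at call time, so the
    secret itself never appears in tool-call arguments or transcripts."""
    key = os.environ.get(api_key_env)
    if not key:
        raise RuntimeError(f"{api_key_env} is not set in the environment")
    return key
```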
Seven Rust crates, one MCP server binary, one CLI binary:
| Crate | Role |
|---|---|
| `rpg-core` | Graph types (`RPGraph`, `Entity`, `HierarchyNode`), storage, LCA algorithm |
| `rpg-parser` | Tree-sitter entity + dependency extraction (15 languages) |
| `rpg-encoder` | Encoding pipeline, lifting utilities, incremental evolution |
| `rpg-nav` | Search, fetch, explore, snapshot, TOON serialization |
| `rpg-lift` | Autonomous LLM lifting (Anthropic, OpenAI, OpenRouter, Gemini) |
| `rpg-cli` | CLI binary (`rpg-encoder`) |
| `rpg-mcp` | MCP server binary (`rpg-mcp-server`) with 27 tools |
Build & Maintain (4 tools)
| Tool | Description |
|---|---|
| `build_rpg` | Index the codebase (run once, instant) |
| `update_rpg` | Incremental update from git changes |
| `reload_rpg` | Reload graph from disk after external changes |
| `rpg_info` | Graph statistics, hierarchy overview, per-area lifting coverage |
Navigate & Search (5 tools)
| Tool | Description |
|---|---|
| `semantic_snapshot` | Whole-repo semantic understanding in one call (~25K tokens for 1000 entities) |
| `search_node` | Search entities by intent or keywords (hybrid embedding + lexical scoring) |
| `fetch_node` | Get entity metadata, source code, dependencies, and hierarchy context |
| `explore_rpg` | Traverse dependency graph (upstream, downstream, or both) |
| `context_pack` | Single-call search + fetch + explore with token budget |
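Under the hood, a token-budgeted pack like `context_pack` amounts to greedy selection under a budget. A minimal sketch — the scoring, token counting, and tuple shape here are stand-ins, not the server's actual logic:

```python
def pack_context(candidates: list, budget: int) -> list:
    """Greedily pack the highest-scoring snippets that fit the budget.

    `candidates` is a list of (score, token_cost, snippet) tuples;
    snippets are taken in descending score order until adding one
    would exceed the token budget.
    """
    picked, used = [], 0
    for score, cost, snippet in sorted(candidates, reverse=True):
        if used + cost <= budget:
            picked.append(snippet)
            used += cost
    return picked
```

Note that greedy packing can skip a large high-scoring snippet and still use the leftover budget for smaller ones further down the ranking.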
Plan & Analyze (7 tools)
| Tool | Description |
|---|---|
| `impact_radius` | BFS reachability analysis — "what depends on X?" |
| `plan_change` | Change planning — find relevant entities, modification order, blast radius |
| `find_paths` | K-shortest dependency paths between two entities |
| `slice_between` | Extract minimal connecting subgraph between entities |
| `analyze_health` | Code health: coupling, instability, god objects, clone detection |
| `detect_cycles` | Find circular dependencies and architectural cycles |
| `reconstruct_plan` | Dependency-safe reconstruction execution plan |
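The reachability analysis behind `impact_radius` is a plain BFS over reversed dependency edges — a self-contained sketch (entity ids and the edge map are invented for illustration, not the server's types):

```python
from collections import deque

def impact_radius(deps: dict, start: str) -> list:
    """Breadth-first walk over *reverse* dependency edges.

    `deps` maps an entity id to the ids it depends on; the answer to
    "what depends on `start`?" is every entity that can reach it, with
    the BFS depth as a rough proximity-of-impact measure.
    """
    rdeps: dict = {}                      # dependency -> its dependents
    for src, targets in deps.items():
        for t in targets:
            rdeps.setdefault(t, []).append(src)
    seen = {start: 0}
    queue = deque([start])
    out = []
    while queue:
        node = queue.popleft()
        for dependent in rdeps.get(node, []):
            if dependent not in seen:
                seen[dependent] = seen[node] + 1
                out.append((dependent, seen[dependent]))
                queue.append(dependent)
    return out
```

For example, with `{"a": ["b"], "b": ["c"], "d": ["c"]}`, asking about `"c"` reaches the direct dependents `b` and `d` at depth 1, then `a` at depth 2.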
Semantic Lifting (11 tools)
| Tool | Description |
|---|---|
| `auto_lift` | One-call autonomous lifting via cheap LLM API (Haiku, GPT-4o-mini, OpenRouter, Gemini) |
| `lifting_status` | Dashboard — coverage, per-area progress, NEXT STEP |
| `get_entities_for_lifting` | Get entity source code for your agent to analyze |
| `submit_lift_results` | Submit the agent's semantic features back to the graph |
| `finalize_lifting` | Aggregate file-level features, rebuild hierarchy metadata |
| `get_files_for_synthesis` | Get file-level entity features for holistic synthesis |
| `submit_file_syntheses` | Submit holistic file-level summaries |
| `build_semantic_hierarchy` | Get domain discovery + hierarchy assignment prompts |
| `submit_hierarchy` | Apply hierarchy assignments to the graph |
| `get_routing_candidates` | Get entities needing semantic routing (drifted or newly lifted) |
| `submit_routing_decisions` | Submit routing decisions (hierarchy path or "keep") |
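The in-session flow these tools support is a simple drive-to-completion loop: fetch un-lifted entities in batches, extract features for each, submit, finalize when nothing is left. A sketch with plain callables standing in for the MCP tools — the names mirror the table, but the signatures are hypothetical:

```python
def lifting_loop(get_entities_for_lifting, analyze, submit_lift_results,
                 finalize_lifting, batch_size: int = 20) -> int:
    """Drive lifting to completion. `analyze` is the agent (or cheap LLM)
    turning source code into verb-object features. Returns entities lifted."""
    lifted = 0
    while True:
        batch = get_entities_for_lifting(batch_size)
        if not batch:                      # nothing left un-lifted
            break
        submit_lift_results({e["id"]: analyze(e["source"]) for e in batch})
        lifted += len(batch)
    finalize_lifting()                     # aggregate + rebuild hierarchy
    return lifted
```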
15 languages via Tree-sitter:
| Language | Entity Extraction | Dependency Resolution |
|---|---|---|
| Python | Functions, classes, methods | imports, calls, inheritance |
| Rust | Functions, structs, traits, impl methods | use, calls, trait impls |
| TypeScript | Functions, classes, methods, interfaces | imports, calls, inheritance |
| JavaScript | Functions, classes, methods | imports, calls, inheritance |
| Go | Functions, structs, methods, interfaces | imports, calls |
| Java | Classes, methods, interfaces | imports, calls, inheritance |
| C / C++ | Functions, classes, methods, structs | includes, calls, inheritance |
| C# | Classes, methods, interfaces | using, calls, inheritance |
| PHP | Functions, classes, methods | use, calls, inheritance |
| Ruby | Classes, methods, modules | require, calls, inheritance |
| Kotlin | Functions, classes, methods | imports, calls, inheritance |
| Swift | Functions, classes, structs, protocols | imports, calls, inheritance |
| Scala | Functions, classes, objects, traits | imports, calls, inheritance |
| Bash | Functions | source, calls |
```bash
# Claude Code
claude mcp add rpg -- npx -y -p rpg-encoder rpg-mcp-server
```

Cursor — add to `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "rpg": {
      "command": "npx",
      "args": ["-y", "-p", "rpg-encoder", "rpg-mcp-server"]
    }
  }
}
```

The server auto-detects the project root from the current working directory — no path argument needed.
CLI
```bash
npm install -g rpg-encoder

# Build a graph
rpg-encoder build

# Query
rpg-encoder search "parse entities from source code"
rpg-encoder fetch "src/parser.rs:extract_entities"
rpg-encoder explore "src/parser.rs:extract_entities" --direction both --depth 2
rpg-encoder info

# Autonomous lifting via API
rpg-encoder lift --provider anthropic --dry-run   # estimate cost
rpg-encoder lift --provider anthropic             # lift with Haiku (~$0.02/100 entities)

# Incremental update
rpg-encoder update

# Pre-commit hook (auto-updates graph on commit)
rpg-encoder hook install
```

Build from source

```bash
git clone https://github.com/userFRM/rpg-encoder.git
cd rpg-encoder && cargo build --release
```

Then point your MCP config at `target/release/rpg-mcp-server`.
- How RPG Compares — honest comparison with GitNexus, Serena, Repomix, and others
- Paper Fidelity — algorithm-by-algorithm comparison with the research paper
- Use Cases — practical examples of what RPG enables
- CHANGELOG — release history
rpg-encoder is built on the theoretical framework from the RPG-Encoder research paper, with original extensions inspired by tools across the code intelligence landscape:
- RPG-Encoder paper (Luo et al., 2026, Microsoft Research) — semantic lifting model, 3-level hierarchy construction, incremental evolution algorithms, formal graph model `G = (V_H ∪ V_L, E_dep ∪ E_feature)`
- GitNexus — precomputed relational intelligence, blast radius analysis, Claude Code hooks. Showed that a code graph tool must be invisible to be essential.
- Serena — symbol-level precision via LSP. Demonstrated that real-time code awareness matters more than batch analysis.
- TOON — Token-Oriented Object Notation for LLM-optimized output.
This is an independent implementation. All code is original work under the MIT license. Not affiliated with or endorsed by Microsoft.




