6 changes: 6 additions & 0 deletions .claude-plugin/plugin.json
@@ -10,7 +10,13 @@
"repository": "https://github.com/denfry/codebase-index",
"license": "MIT",
"keywords": [
"claude-code",
"code-search",
"semantic-code-search",
"codebase-index",
"mcp",
"ai-agents",
"local-first",
"tree-sitter",
"rag",
"sqlite",
27 changes: 27 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,33 @@ All notable changes to this project are documented here. The format is based on

## [Unreleased]

### Added
- **`clean` is now implemented** (it was a documented-but-stubbed `_todo` since M0).
`codebase-index clean` resets the index database (`index.sqlite` + WAL/SHM
sidecars); `codebase-index clean --all` wipes the whole per-project cache
directory. It prompts before deleting (skip with `--yes`), supports `--json`,
and never touches the installed skill. Locked in by `tests/test_clean_cli.py`.
- **`docs/PRODUCT_UPGRADE_PLAN.md`**: positioning, target users, competitor matrix,
differentiators, current weaknesses, a ranked roadmap, and documentation /
benchmark / distribution / technical task lists.
- **`docs/RELEASE_CHECKLIST.md`**: a repeatable release checklist (version sync,
tests, benchmarks, doctor, install/plugin/MCP smoke, changelog) with signed
checksums + SBOM tracked as future hardening.
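
As a rough illustration of the `clean` behavior described above, here is a minimal sketch. The helper name `clean_cache`, its signature, and the idea that the cache directory is passed in explicitly are assumptions made for this example; it is not the project's actual implementation, and it leaves out the confirmation prompt and `--json` output.

```python
from pathlib import Path
import shutil

# Hypothetical helper for illustration only; not the project's real code.
# SQLite keeps its write-ahead log and shared-memory files next to the
# database, named "<db>-wal" and "<db>-shm".
SIDECAR_SUFFIXES = ("", "-wal", "-shm")


def clean_cache(cache_dir: Path, wipe_all: bool = False) -> list[Path]:
    """Remove the index DB and its sidecars, or the whole cache dir if wipe_all."""
    removed: list[Path] = []
    if wipe_all:
        if cache_dir.exists():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
        return removed
    for suffix in SIDECAR_SUFFIXES:
        target = cache_dir / f"index.sqlite{suffix}"
        if target.exists():
            target.unlink()
            removed.append(target)
    return removed
```

Returning the list of removed paths is one way a caller could report a removed count; any other file in the cache directory is left alone unless `wipe_all` is set.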

### Changed
- **README**: added "Who Is It For?" and a "How Is This Different?" section that
answers why-not-grep / Cursor / Aider repo-map / Sourcegraph / Codebase-Memory
MCP on the first screen, plus a proven-today-vs-roadmap table.
- **`docs/COMPARISON.md`**: explicit rows and "choose them when / choose us when"
guidance for Continue, Sourcegraph/Cody/Amp, and Codebase-Memory MCP.
- **`docs/BENCHMARKS.md`**: a status table separating proven / toy / honest
surfaces, an explicit "claims that should NOT be made yet" list, and a
TODO-friendly benchmark task checklist with a no-overclaim procedure.

### Fixed
- `docs/FAQ.md`: removed a dangling/duplicated sentence in "Is it
production-ready?" and documented the real `clean` / `clean --all` behavior.

## [1.3.0] - 2026-06-09

### Added
81 changes: 80 additions & 1 deletion README.md
@@ -17,6 +17,11 @@ references without scanning an entire repository.
[![SQLite](https://img.shields.io/badge/database-SQLite-blue.svg)](docs/DATABASE_SCHEMA.md)
[![Tree-sitter](https://img.shields.io/badge/parsing-Tree--sitter-orange.svg)](docs/ARCHITECTURE.md)

<p align="center">
<img src="assets/demo.png" width="820"
alt="codebase-index ranking a local search for 'where is user authentication implemented?' into scored files with recommended file:line ranges to read">
</p>

## What Is codebase-index?

**codebase-index is a private, offline retrieval layer for AI code search.** It
@@ -27,6 +32,24 @@ an AI coding agent can read instead of opening broad file sets.
Use it when you want Cursor-like codebase awareness in terminal-based AI tools
while keeping source code, snippets, and search metadata on your machine.

> **codebase-index is not an IDE and not a coding agent.** It is the local
> retrieval/index layer that gives terminal and MCP-based AI agents precise
> codebase context. The agent stays your interface; this gives it better aim.

## Who Is It For?

- **Claude Code / Codex CLI / OpenCode users** on medium-to-large repos who want
the agent to read 3 ranked files instead of grepping and scanning 60.
- **Privacy-constrained teams** (proprietary or regulated code) who cannot send
source to a cloud code-intelligence service.
- **MCP power users** who want a stable, queryable code index as a tool, not a
black box baked into one agent's prompt.
- **Tooling authors** who need scriptable retrieval (`--json`, SQLite, MCP) that
other tools can build on.

Not for you if you want a full IDE, org-scale multi-repo search, or a hosted
platform — use Cursor or Sourcegraph for those.

## Start Here

If you are opening this repository for the first time, follow this order:
@@ -145,6 +168,61 @@ Developers get Cursor-like codebase awareness in Claude Code, Codex CLI, and
OpenCode without leaving the terminal or sending code to a remote indexing
service.

## How Is This Different?

Short answers to the questions people actually ask. The full, honest matrix —
including when you should pick the other tool — is in
[docs/COMPARISON.md](docs/COMPARISON.md).

- **Why not just `grep`/`rg`?** Grep returns every match with no ranking, no
symbol awareness, and no idea which files relate. codebase-index ranks results,
knows a definition from a call, expands along the dependency graph, and returns
specific line ranges under a token budget — so the agent reads less and answers
with citations.
- **Why not Cursor?** Cursor is a great AI IDE with strong codebase awareness, but
it is proprietary and IDE-centric. codebase-index is a local, open retrieval
layer for **terminal and MCP** agents, offline by default, with no IDE lock-in.
If you live inside Cursor, keep using Cursor.
- **Why not Aider repo-map?** Aider's repo-map is a good graph-ranked,
token-budgeted context map — but it is optimized to feed Aider's own chat.
codebase-index is a **reusable, queryable index**: CLI/JSON/MCP commands return
ranked `file:line` ranges, symbols, references, and impact that *any*
shell-capable agent can consume, with freshness and security gates.
- **Why not Sourcegraph / Cody / Amp?** They are excellent enterprise-grade,
cross-repo code intelligence platforms. They are also heavier and
account/platform-oriented. codebase-index is single-repo, local, and
lightweight — no server, no account, no code leaving the machine by default.
- **Why not Codebase-Memory MCP?** It is the closest direct alternative — a
broader graph engine with a static binary and wide language/agent coverage. We
do **not** claim to beat it globally. We differentiate on simplicity, a strict
privacy model, token-budgeted retrieval packets, a transparent Python
implementation, the Claude/Codex/OpenCode workflow, and honest benchmarks. If
you need its broader graph and language reach today, choose it.
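
To make "line ranges under a token budget" concrete, here is a minimal sketch of one possible packing strategy: a greedy pass over candidates sorted by score. The `RankedRange` type, its fields, and `pack_under_budget` are hypothetical names chosen for this example, and the greedy strategy is an assumption; the project's real ranking and packing logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankedRange:
    """Hypothetical result record: a scored file:line range."""
    path: str
    start_line: int
    end_line: int
    score: float
    est_tokens: int  # assumed to be precomputed by some token estimator


def pack_under_budget(
    candidates: list[RankedRange], token_budget: int
) -> list[RankedRange]:
    """Greedily keep the highest-scoring ranges whose total cost fits the budget."""
    selected: list[RankedRange] = []
    used = 0
    for item in sorted(candidates, key=lambda r: r.score, reverse=True):
        if used + item.est_tokens <= token_budget:
            selected.append(item)
            used += item.est_tokens
    return selected
```

A range that does not fit is skipped rather than ending the loop, so a smaller, lower-ranked range can still use the remaining budget.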

**What makes it trustworthy?** No telemetry, no network by default, a multi-gate
exclusion pipeline (secrets/binaries/generated/dependencies never indexed),
output-time secret redaction, a `doctor --strict` safety self-check, and a
public benchmark suite wired as a CI regression gate. Claims that aren't proven
in this repo are marked as roadmap, not done.
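
A multi-gate exclusion pipeline can be pictured as an ordered list of predicates, where the first gate that matches keeps a file out of the index. The sketch below is illustrative only: the gate names mirror the categories listed above, but the specific file names, suffixes, and directory names are assumptions invented for this example, not the project's actual rules.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

# A gate returns True when the path must be excluded from indexing.
Gate = Callable[[PurePosixPath], bool]


def is_secret(path: PurePosixPath) -> bool:
    return path.name in {".env", "id_rsa"} or path.suffix in {".pem", ".key"}


def is_binary(path: PurePosixPath) -> bool:
    return path.suffix in {".png", ".jpg", ".zip", ".so", ".exe"}


def is_generated(path: PurePosixPath) -> bool:
    return path.name.endswith(".min.js") or bool({"dist", "build"} & set(path.parts))


def is_dependency(path: PurePosixPath) -> bool:
    return bool({"node_modules", "vendor", ".venv"} & set(path.parts))


# Hypothetical gate order; the first matching gate decides the reason.
GATES: list[tuple[str, Gate]] = [
    ("secret", is_secret),
    ("binary", is_binary),
    ("generated", is_generated),
    ("dependency", is_dependency),
]


def exclusion_reason(path: str) -> Optional[str]:
    """Return the name of the first gate that excludes the path, else None."""
    candidate = PurePosixPath(path)
    for name, gate in GATES:
        if gate(candidate):
            return name
    return None
```

Returning a reason instead of a bare boolean makes it possible to report why a file was skipped.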

### Proven today vs. roadmap

| Capability | Status |
|---|---|
| Hybrid retrieval (path + symbol + FTS5 + graph), token-budgeted packets | ✅ Shipped |
| Tree-sitter symbols for 12 Tier-A languages + Tier-B generic path | ✅ Shipped |
| Import/call/reference/inheritance graph, `refs`/`impact` | ✅ Shipped |
| Optional local embeddings; external embeddings gated 3 ways | ✅ Shipped |
| stdio MCP server; CLI/skill/MCP share one service layer | ✅ Shipped |
| Honest 55k LOC Java benchmark (recall@3 70% vs 40% `rg`, ~13× fewer tokens) | ✅ Shipped |
| 10k/100k/1M LOC public-repo benchmarks | 🚧 Roadmap |
| Framework-aware typed edges (route→handler→service→model) | 🚧 Roadmap |
| PyPI / `uvx` / Homebrew, signed checksums, SBOM | 🚧 Roadmap |
| Verified per-client MCP docs, paged/progressive results | 🚧 Roadmap |

See [docs/PRODUCT_UPGRADE_PLAN.md](docs/PRODUCT_UPGRADE_PLAN.md) for the full
upgrade plan and ranked roadmap.

## How Does codebase-index Work?

`codebase-index` builds a local hybrid index that combines:
@@ -537,7 +615,8 @@ Yes. The CLI is agent-agnostic. Any agent that can run shell commands can use
### How do I reset the index?

```bash
codebase-index clean
codebase-index clean # reset the index DB (keeps the skill)
codebase-index clean --all # wipe the whole .claude/cache/codebase-index/ dir
# Or manually: rm -rf .claude/cache/codebase-index/
codebase-index index
```
Binary file added assets/demo.png
Binary file added assets/social-preview.png
2 changes: 1 addition & 1 deletion docs/ARCHITECTURE.md
@@ -153,7 +153,7 @@ upward to nearest `.git`/`.claude`), and `--quiet`. Search-family commands accep
| `explain` | `"<query>"`, `--token-budget` | 0 | intent-aware bundle |
| `stats` | — | 0 | counts, coverage %, freshness |
| `doctor` | `--strict` | non-zero if unsafe config found | findings list |
| `clean` | `--yes` | removes cache | confirmation |
| `clean` | `--yes`, `--all` | resets index DB (`--all` wipes cache dir) | removed-count |
| `watch` | `--debounce ms` | long-running | event log |

The skill only ever calls the **read-only** family (`search`, `symbol`, `refs`, `impact`,
58 changes: 47 additions & 11 deletions docs/BENCHMARKS.md
@@ -1,6 +1,28 @@
# Benchmarks

`codebase-index` has three benchmark surfaces.
`codebase-index` has three benchmark surfaces. Read them with their status in
mind — the whole point of this page is to keep evidence and aspiration separate.

| Surface | What it is | Status | Use it as |
|---|---|---|---|
| Public suite (`tests/benchmark_public.py`) | Deterministic synthetic multi-language fixture with the full metric framework | **Toy/synthetic** | CI regression gate + metric shape, **not** product-quality evidence |
| Smoke/perf (`test_perf_smoke.py`, `test_benchmark_comparison.py`) | Latency + output-size guards on a tiny fixture | **Toy/smoke** | Regression checks only |
| Honest real-repo (`tests/benchmark_honest.py`) | 55k LOC Java repo, recall@3 vs disciplined `rg` baseline, symmetric token accounting | **Proven (one repo)** | The only headline product-quality number we stand behind today |

### Claims that should NOT be made yet

Do not write, imply, or ship any of these until a run with published logs exists:

- Any 10k / 100k / 1M LOC scale or speed claim (no real run at that size).
- "Beats Cursor / Sourcegraph / Codebase-Memory MCP" — no head-to-head exists.
- Per-language quality claims beyond Java (the honest run is Java-only).
- Generic "Nx faster" / "Nx fewer tokens" without naming the baseline and repo.
- Latency claims — the honest run explicitly does not headline latency
(Python process start dominates; real `rg` is tens of ms).

The defensible headline today is exactly: **on one 55k LOC Java repo, recall@3 was
70% (index) vs 40% (`rg`+window), using ~13× fewer answer tokens.** Everything
else is roadmap.
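
For readers unfamiliar with the metric, the sketch below shows one common way to compute recall@k over a set of questions. It assumes each question has a set of expected files and counts as a hit when at least one of them appears in the top k results; whether `tests/benchmark_honest.py` uses exactly this definition is an assumption, and the function names are hypothetical.

```python
def hit_at_k(ranked_files: list[str], expected: set[str], k: int) -> bool:
    """True if any expected file appears among the first k ranked results."""
    return any(path in expected for path in ranked_files[:k])


def recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 3) -> float:
    """Fraction of questions with at least one expected file in the top k.

    Each run is (ranked_files, expected_files) for one benchmark question.
    """
    if not runs:
        return 0.0
    hits = sum(hit_at_k(ranked, expected, k) for ranked, expected in runs)
    return hits / len(runs)
```

The same function can score both the index and the `rg` baseline, provided both sides produce a ranked file list for the same questions.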

## Public benchmark suite

@@ -67,13 +89,27 @@ objective recall@3 ground truth.
latency and output-size behavior. They are useful regression checks, not product
quality evidence.

## Remaining benchmark work

The public suite now has the metric framework, but the next step is adding
larger public or documented external repositories:

- 10k, 100k, and 1M LOC scale targets
- More real-world Python, TypeScript, Java, Go, Rust, C#, PHP repos
- Agent answer grading with human-reviewed expected answers
- Comparisons against repo-map style context and vanilla agent exploration
- Framework graph tasks: route -> handler -> service -> DB, migrations, config consumers, CI/infra
## Remaining benchmark work (TODO checklist)

The public suite has the metric framework; the next step is real, larger,
documented repositories. Each task must publish raw logs alongside any headline
number (the pattern set by `tests/benchmark_honest_RESULTS.md`).

- [ ] **10k LOC public repo** — Recall@1/3/5, MRR, nDCG, token economy; named repo + commit SHA.
- [ ] **100k LOC public repo** — same metrics, plus full index build time and incremental update latency.
- [ ] **1M LOC target** — feasibility + scale counters (files/symbols/edges/bytes); may be partial.
- [ ] **Multi-language repo** (≥3 Tier-A languages) — per-language recall and answer-correctness breakdown.
- [ ] **vs vanilla agent grep/read** — tokens and recall against an undisciplined agent exploring the same questions.
- [ ] **vs repo-map-style context** — tokens and recall against an Aider-repo-map-style context blob.
- [ ] **Graph task benchmark** — `refs`, `impact`, and route→handler→service paths against hand-labeled ground truth.
- [ ] **Answer grading** — human-reviewed expected answers, not just file-level recall proxies.
- [ ] **Framework graph tasks** — migrations, config consumers, CI/infra wiring once typed edges land.
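
The checklist names MRR and nDCG alongside recall. As a reference point, here is a minimal sketch of both with binary relevance (a file is either expected or not). The function names are hypothetical, the ranked list is assumed to contain no duplicate paths, and the public suite's actual metric code may be defined differently.

```python
import math


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant result, or 0.0 if none is returned."""
    for rank, path in enumerate(ranked, start=1):
        if path in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranked, relevant) pairs, one per question."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the top k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, path in enumerate(ranked[:k], start=1)
        if path in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```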

How to add one without overclaiming:

1. Pick a public repo; record its URL and commit SHA.
2. Derive ground truth independently of the index (e.g. naming convention), so the
index cannot grade its own homework.
3. Use a symmetric token estimator and read window on both sides.
4. Commit the raw run output next to a short `*_RESULTS.md` summary.
5. Only then update README/COMPARISON headline numbers.
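
Step 3 is the one most easily gotten wrong, so here is a minimal sketch of what "symmetric" could mean in practice: one estimator and one read window, applied identically to the index output and to the baseline. The four-characters-per-token heuristic, the function names, and the window shape are all assumptions for illustration; the honest benchmark's actual estimator may differ.

```python
def estimate_tokens(text: str) -> int:
    """Crude tokenizer-free estimate: about 4 characters per token (assumed)."""
    return (len(text) + 3) // 4


def window(lines: list[str], hit_line: int, radius: int) -> list[str]:
    """Return the lines within `radius` of a 1-based hit line, clamped to the file."""
    start = max(0, hit_line - 1 - radius)
    end = min(len(lines), hit_line + radius)
    return lines[start:end]


def read_cost(lines: list[str], hit_lines: list[int], radius: int) -> int:
    """Token cost of reading a window around each hit, using the same estimator."""
    return sum(
        estimate_tokens("\n".join(window(lines, hit, radius))) for hit in hit_lines
    )
```

Calling `read_cost` with the same `radius` for both systems keeps a larger read window on one side from skewing the token comparison.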
99 changes: 97 additions & 2 deletions docs/COMPARISON.md
@@ -14,8 +14,9 @@ platform.
| codebase-index | Local CLI/skill/MCP retrieval for Claude Code, Codex CLI, OpenCode, and MCP clients | Broad framework-aware graph is still a roadmap item |
| Cursor indexing | Integrated AI IDE workflow | Proprietary and tied to Cursor |
| Aider repo-map | Aider chat sessions with compact repository context | Context map, not a reusable local search API |
| Sourcegraph Cody | Enterprise-scale code intelligence across many repos | Cloud/account setup and heavier platform surface |
| Serena / MCP tools | MCP-first local tool integration | Quality and schemas vary by server |
| Sourcegraph / Cody / Amp | Enterprise-scale code intelligence across many repos | Cloud/account setup and heavier platform surface |
| Continue | Open-source coding agent for IDE + CLI | An agent with context features, not a standalone retrieval index |
| Codebase-Memory MCP | Local graph-based code-memory over MCP | Broader/heavier graph engine; different simplicity/privacy tradeoffs |
| Manual grep/read | Exact ad hoc search | No ranking, graph, symbol contract, or token budgeting |

## Criteria
@@ -46,6 +47,100 @@ platform.
| Update model | Manual `index`/`update`, hooks, optional watcher | IDE-managed | Rebuilt as Aider manages context | Platform-managed | Varies | Always live but manual |
| Extensibility | CLI `--json`; MCP schema v1.0; SQLite local DB | Limited external contract | Aider internals/context | Sourcegraph APIs | MCP by design | Shell pipelines |

## When to choose what

Honest, per-tool guidance. None of these are attacks — each tool is good at the
job it was built for. The question is which layer you actually need.

### Manual grep / read

- **Good at:** exact string matching, zero setup, always live, universally
available. For a single known identifier in a small scope, nothing beats `rg`.
- **Where codebase-index differs:** ranking, symbol awareness (definition vs
call), graph expansion to related files, and token-budgeted line ranges instead
of every matching line.
- **Choose grep when:** you know the exact string, the repo is small, or you only
need one match.
- **Choose codebase-index when:** the question is conceptual ("where is auth
implemented?"), the repo is large, or an AI agent will pay for every irrelevant
line it reads.

### Cursor

- **Good at:** an integrated AI IDE with strong, low-friction codebase awareness
for people who work inside Cursor.
- **Where codebase-index differs:** it is a local, open retrieval layer for
**terminal and MCP** agents, offline by default, with no IDE lock-in and a
scriptable CLI/JSON/MCP contract.
- **Choose Cursor when:** you want an AI-native IDE and are comfortable with a
proprietary, IDE-centric workflow.
- **Choose codebase-index when:** your agent is Claude Code, Codex CLI, OpenCode,
or any MCP client in the terminal, and you want code to stay on your machine.

### Aider repo-map

- **Good at:** a compact, graph-ranked, token-budgeted repository map that feeds
Aider's chat context well. It is not "just grep" — it ranks with a graph
algorithm over source and dependencies.
- **Where codebase-index differs:** it is a reusable, queryable index rather than
context injection for one agent. CLI/JSON/MCP commands return ranked `file:line`
ranges, symbols, references, and `impact` that any shell-capable agent can
consume, with freshness checks and security/ignore gates.
- **Choose Aider repo-map when:** Aider is your agent and you want its built-in
context with nothing extra to run.
- **Choose codebase-index when:** you want one index shared across multiple agents
(Claude Code, Codex, OpenCode, MCP) with a stable, scriptable contract.

### Sourcegraph / Cody / Amp

- **Good at:** enterprise-grade, cross-repo code intelligence, search, and code
graph at organization scale, with mature platform features.
- **Where codebase-index differs:** single-repo, local, and lightweight — no
server, no account, no code leaving the machine by default. It is a retrieval
layer for an agent, not a platform.
- **Choose Sourcegraph/Cody/Amp when:** you need org-wide search across many
repositories, team features, and are fine with a hosted/account-based platform.
- **Choose codebase-index when:** you want per-repo retrieval for a terminal/MCP
agent with a strict local-first privacy model and minimal moving parts.

### Continue

- **Good at:** an open-source coding **agent** with IDE and CLI integrations and
built-in context features. It is a full assistant, not just an index.
- **Where codebase-index differs:** it is the **retrieval/index layer itself**,
not an agent. It exposes a CLI/JSON/MCP contract that an agent (including, in
principle, agents like Continue) can query, and it focuses on token-budgeted
packets and a strict privacy model rather than on being the chat surface.
- **Choose Continue when:** you want the agent — an open assistant to drive your
edits.
- **Choose codebase-index when:** you already have an agent and want to give it
precise, local, ranked codebase context.

### Codebase-Memory MCP

This is the closest direct alternative, so the comparison is the most careful.

- **Good at:** a broader graph engine with a static binary, wide language and
agent coverage, and more advanced graph features than codebase-index ships
today.
- **Where codebase-index differs — and we do not claim to beat it globally:**
- **Simplicity and safety:** a small pure-Python surface, a multi-gate exclusion
pipeline, output-time secret redaction, and a `doctor --strict` self-check.
- **Strict privacy model:** no telemetry, no network by default; external
embeddings are opt-in and gated three ways.
- **Token-budgeted retrieval packets:** ranked `file:line` ranges and
`recommended_reads` under an explicit budget, tuned for the Claude/Codex/
OpenCode workflow.
- **Transparency:** readable Python, 80% coverage gate, golden CLI snapshots,
and a public benchmark suite wired as a CI regression gate.
- **Honest benchmarks:** we publish raw logs (see the 55k LOC Java run) and mark
unproven scale/graph claims as roadmap.
- **Choose Codebase-Memory MCP when:** you need its broader graph engine,
static-binary distribution, or wider language/agent reach today.
- **Choose codebase-index when:** you want a simpler, privacy-strict, transparent
retrieval layer tuned for terminal AI agents with token-budgeted output and
benchmarks you can audit.

## Aider repo-map clarification

Aider repo-map should not be described as "just grep" or as lacking ranking.