denfry · Jun 12, 2026
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -10,7 +10,13 @@
   "repository": "https://github.com/denfry/codebase-index",
   "license": "MIT",
   "keywords": [
+    "claude-code",
     "code-search",
+    "semantic-code-search",
+    "codebase-index",
+    "mcp",
+    "ai-agents",
+    "local-first",
     "tree-sitter",
     "rag",
     "sqlite",

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,33 @@ All notable changes to this project are documented here. The format is based on
 
 ## [Unreleased]
 
+### Added
+- **`clean` is now implemented** (it had been a documented-but-stubbed `_todo` since M0).
+  `codebase-index clean` resets the index database (`index.sqlite` + WAL/SHM
+  sidecars); `codebase-index clean --all` wipes the whole per-project cache
+  directory. It prompts before deleting (skip with `--yes`), supports `--json`,
+  and never touches the installed skill. Locked in by `tests/test_clean_cli.py`.
+- **`docs/PRODUCT_UPGRADE_PLAN.md`**: positioning, target users, competitor matrix,
+  differentiators, current weaknesses, a ranked roadmap, and documentation /
+  benchmark / distribution / technical task lists.
+- **`docs/RELEASE_CHECKLIST.md`**: a repeatable release checklist (version sync,
+  tests, benchmarks, doctor, install/plugin/MCP smoke, changelog) with signed
+  checksums + SBOM tracked as future hardening.
+
+### Changed
+- **README**: added "Who Is It For?" and a "How Is This Different?" section that
+  answers why-not-grep / Cursor / Aider repo-map / Sourcegraph / Codebase-Memory
+  MCP on the first screen, plus a proven-today-vs-roadmap table.
+- **`docs/COMPARISON.md`**: explicit rows and "choose them when / choose us when"
+  guidance for Continue, Sourcegraph/Cody/Amp, and Codebase-Memory MCP.
+- **`docs/BENCHMARKS.md`**: a status table separating the toy/synthetic, smoke,
+  and proven real-repo surfaces, an explicit "claims that should NOT be made yet"
+  list, and a TODO-friendly benchmark task checklist with a no-overclaim procedure.
+
+### Fixed
+- `docs/FAQ.md`: removed a dangling/duplicated sentence in "Is it
+  production-ready?" and documented the real `clean` / `clean --all` behavior.
+
 ## [1.3.0] - 2026-06-09
 
 ### Added
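
The `clean` entry above describes two scopes: a reset of `index.sqlite` plus its WAL/SHM sidecars, and a full wipe of the per-project cache directory with `--all`. A minimal sketch of that behavior, assuming the sidecars use SQLite's standard `-wal` / `-shm` suffixes; the function name `clean_index` and its signature are hypothetical and are not taken from the project's source, and the confirmation prompt and `--json` output are left out:

```python
import shutil
from pathlib import Path

# SQLite names its sidecars after the main database file: "<db>-wal" and "<db>-shm".
# The empty suffix covers the database file itself.
INDEX_SUFFIXES = ("", "-wal", "-shm")


def clean_index(cache_dir: Path, wipe_all: bool = False) -> list[Path]:
    """Hypothetical helper: remove the index DB (default) or the whole cache dir.

    Returns the paths that were actually removed, so a caller could report a count.
    """
    removed: list[Path] = []
    if wipe_all:
        # `clean --all` scope: drop the entire per-project cache directory.
        if cache_dir.exists():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
        return removed
    # Default scope: only the index database and its sidecars; other files stay.
    for suffix in INDEX_SUFFIXES:
        target = cache_dir / f"index.sqlite{suffix}"
        if target.exists():
            target.unlink()
            removed.append(target)
    return removed
```

Returning the removed paths rather than a boolean keeps the helper easy to test and matches the "removed-count" style of output mentioned for the command.
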

diff --git a/README.md b/README.md
@@ -17,6 +17,11 @@ references without scanning an entire repository.
 [![SQLite](https://img.shields.io/badge/database-SQLite-blue.svg)](docs/DATABASE_SCHEMA.md)
 [![Tree-sitter](https://img.shields.io/badge/parsing-Tree--sitter-orange.svg)](docs/ARCHITECTURE.md)
 
+<p align="center">
+  <img src="assets/demo.png" width="820"
+       alt="codebase-index ranking a local search for 'where is user authentication implemented?' into scored files with recommended file:line ranges to read">
+</p>
+
 ## What Is codebase-index?
 
 **codebase-index is a private, offline retrieval layer for AI code search.** It
@@ -27,6 +32,24 @@ an AI coding agent can read instead of opening broad file sets.
 Use it when you want Cursor-like codebase awareness in terminal-based AI tools
 while keeping source code, snippets, and search metadata on your machine.
 
+> **codebase-index is not an IDE and not a coding agent.** It is the local
+> retrieval/index layer that gives terminal and MCP-based AI agents precise
+> codebase context. The agent stays your interface; this gives it better aim.
+
+## Who Is It For?
+
+- **Claude Code / Codex CLI / OpenCode users** on medium-to-large repos who want
+  the agent to read 3 ranked files instead of grepping and scanning 60.
+- **Privacy-constrained teams** (proprietary or regulated code) who cannot send
+  source to a cloud code-intelligence service.
+- **MCP power users** who want a stable, queryable code index as a tool, not a
+  black box baked into one agent's prompt.
+- **Tooling authors** who need scriptable retrieval (`--json`, SQLite, MCP) that
+  other tools can build on.
+
+Not for you if you want a full IDE, org-scale multi-repo search, or a hosted
+platform — use Cursor or Sourcegraph for those.
+
 ## Start Here
 
 If you are opening this repository for the first time, follow this order:
@@ -145,6 +168,61 @@ Developers get Cursor-like codebase awareness in Claude Code, Codex CLI, and
 OpenCode without leaving the terminal or sending code to a remote indexing
 service.
 
+## How Is This Different?
+
+Short answers to the questions people actually ask. The full, honest matrix —
+including when you should pick the other tool — is in
+[docs/COMPARISON.md](docs/COMPARISON.md).
+
+- **Why not just `grep`/`rg`?** Grep returns every match with no ranking, no
+  symbol awareness, and no idea which files relate. codebase-index ranks results,
+  knows a definition from a call, expands along the dependency graph, and returns
+  specific line ranges under a token budget — so the agent reads less and answers
+  with citations.
+- **Why not Cursor?** Cursor is a great AI IDE with strong codebase awareness, but
+  it is proprietary and IDE-centric. codebase-index is a local, open retrieval
+  layer for **terminal and MCP** agents, offline by default, with no IDE lock-in.
+  If you live inside Cursor, keep using Cursor.
+- **Why not Aider repo-map?** Aider's repo-map is a good graph-ranked,
+  token-budgeted context map — but it is optimized to feed Aider's own chat.
+  codebase-index is a **reusable, queryable index**: CLI/JSON/MCP commands return
+  ranked `file:line` ranges, symbols, references, and impact that *any*
+  shell-capable agent can consume, with freshness and security gates.
+- **Why not Sourcegraph / Cody / Amp?** They are excellent enterprise-grade,
+  cross-repo code intelligence platforms. They are also heavier and
+  account/platform-oriented. codebase-index is single-repo, local, and
+  lightweight — no server, no account, no code leaving the machine by default.
+- **Why not Codebase-Memory MCP?** It is the closest direct alternative — a
+  broader graph engine with a static binary and wide language/agent coverage. We
+  do **not** claim to beat it globally. We differentiate on simplicity, a strict
+  privacy model, token-budgeted retrieval packets, a transparent Python
+  implementation, the Claude/Codex/OpenCode workflow, and honest benchmarks. If
+  you need its broader graph and language reach today, choose it.
+
+**What makes it trustworthy?** No telemetry, no network by default, a multi-gate
+exclusion pipeline (secrets/binaries/generated/dependencies never indexed),
+output-time secret redaction, a `doctor --strict` safety self-check, and a
+public benchmark suite wired as a CI regression gate. Claims that aren't proven
+in this repo are marked as roadmap, not done.
+
+### Proven today vs. roadmap
+
+| Capability | Status |
+|---|---|
+| Hybrid retrieval (path + symbol + FTS5 + graph), token-budgeted packets | ✅ Shipped |
+| Tree-sitter symbols for 12 Tier-A languages + Tier-B generic path | ✅ Shipped |
+| Import/call/reference/inheritance graph, `refs`/`impact` | ✅ Shipped |
+| Optional local embeddings; external embeddings gated 3 ways | ✅ Shipped |
+| stdio MCP server; CLI/skill/MCP share one service layer | ✅ Shipped |
+| Honest 55k LOC Java benchmark (recall@3 70% vs 40% `rg`, ~13× fewer tokens) | ✅ Shipped |
+| 10k/100k/1M LOC public-repo benchmarks | 🚧 Roadmap |
+| Framework-aware typed edges (route→handler→service→model) | 🚧 Roadmap |
+| PyPI / `uvx` / Homebrew, signed checksums, SBOM | 🚧 Roadmap |
+| Verified per-client MCP docs, paged/progressive results | 🚧 Roadmap |
+
+See [docs/PRODUCT_UPGRADE_PLAN.md](docs/PRODUCT_UPGRADE_PLAN.md) for the full
+upgrade plan and ranked roadmap.
+
 ## How Does codebase-index Work?
 
 `codebase-index` builds a local hybrid index that combines:
@@ -537,7 +615,8 @@ Yes. The CLI is agent-agnostic. Any agent that can run shell commands can use
 ### How do I reset the index?
 
 ```bash
-codebase-index clean
+codebase-index clean          # reset the index DB (keeps the skill)
+codebase-index clean --all    # wipe the whole .claude/cache/codebase-index/ dir
 # Or manually: rm -rf .claude/cache/codebase-index/
 codebase-index index
 ```
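
The README changes above lean on the idea of returning ranked `file:line` ranges "under a token budget". One way to picture that, as a sketch only: greedily take the highest-scoring ranges that still fit. The `RankedRange` type, the `pack_under_budget` function, and the greedy strategy are all assumptions made for illustration; the diff does not show how codebase-index actually selects ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankedRange:
    """Hypothetical shape of one retrieval candidate."""
    path: str
    start: int
    end: int
    score: float   # higher means more relevant
    tokens: int    # estimated cost of reading this range


def pack_under_budget(candidates: list[RankedRange], token_budget: int) -> list[RankedRange]:
    """Greedy packing: walk candidates by descending score, keep those that fit."""
    picked: list[RankedRange] = []
    remaining = token_budget
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if cand.tokens <= remaining:
            picked.append(cand)
            remaining -= cand.tokens
    return picked


if __name__ == "__main__":
    # Toy values, not real output from the tool.
    ranges = [
        RankedRange("auth/service.py", 10, 60, score=0.9, tokens=300),
        RankedRange("auth/models.py", 1, 120, score=0.8, tokens=500),
        RankedRange("auth/routes.py", 5, 30, score=0.5, tokens=150),
    ]
    for r in pack_under_budget(ranges, token_budget=500):
        print(f"{r.path}:{r.start}-{r.end}")
```

Greedy selection is simple and predictable, but it can skip a large high-value range in favor of smaller ones; a real implementation may well trade off differently.
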

diff --git a/assets/demo.png b/assets/demo.png
diff --git a/assets/social-preview.png b/assets/social-preview.png
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
@@ -153,7 +153,7 @@ upward to nearest `.git`/`.claude`), and `--quiet`. Search-family commands accep
 | `explain` | `"<query>"`, `--token-budget` | 0 | intent-aware bundle |
 | `stats` | — | 0 | counts, coverage %, freshness |
 | `doctor` | `--strict` | non-zero if unsafe config found | findings list |
-| `clean` | `--yes` | removes cache | confirmation |
+| `clean` | `--yes`, `--all` | resets index DB (`--all` wipes cache dir) | removed-count |
 | `watch` | `--debounce ms` | long-running | event log |
 
 The skill only ever calls the **read-only** family (`search`, `symbol`, `refs`, `impact`,

diff --git a/docs/BENCHMARKS.md b/docs/BENCHMARKS.md
@@ -1,6 +1,28 @@
 # Benchmarks
 
-`codebase-index` has three benchmark surfaces.
+`codebase-index` has three benchmark surfaces. Read them with their status in
+mind — the whole point of this page is to keep evidence and aspiration separate.
+
+| Surface | What it is | Status | Use it as |
+|---|---|---|---|
+| Public suite (`tests/benchmark_public.py`) | Deterministic synthetic multi-language fixture with the full metric framework | **Toy/synthetic** | CI regression gate + metric shape, **not** product-quality evidence |
+| Smoke/perf (`test_perf_smoke.py`, `test_benchmark_comparison.py`) | Latency + output-size guards on a tiny fixture | **Toy/smoke** | Regression checks only |
+| Honest real-repo (`tests/benchmark_honest.py`) | 55k LOC Java repo, recall@3 vs disciplined `rg` baseline, symmetric token accounting | **Proven (one repo)** | The only headline product-quality number we stand behind today |
+
+### Claims that should NOT be made yet
+
+Do not write, imply, or ship any of these until a run with published logs exists:
+
+- Any 10k / 100k / 1M LOC scale or speed claim (no real run at that size).
+- "Beats Cursor / Sourcegraph / Codebase-Memory MCP" — no head-to-head exists.
+- Per-language quality claims beyond Java (the honest run is Java-only).
+- Generic "Nx faster" / "Nx fewer tokens" without naming the baseline and repo.
+- Latency claims — the honest run explicitly does not headline latency
+  (Python process start dominates; real `rg` is tens of ms).
+
+The defensible headline today is exactly: **on one 55k LOC Java repo, recall@3 was
+70% (index) vs 40% (`rg`+window), using ~13× fewer answer tokens.** Everything
+else is roadmap.
 
 ## Public benchmark suite
 
@@ -67,13 +89,27 @@ objective recall@3 ground truth.
 latency and output-size behavior. They are useful regression checks, not product
 quality evidence.
 
-## Remaining benchmark work
-
-The public suite now has the metric framework, but the next step is adding
-larger public or documented external repositories:
-
-- 10k, 100k, and 1M LOC scale targets
-- More real-world Python, TypeScript, Java, Go, Rust, C#, PHP repos
-- Agent answer grading with human-reviewed expected answers
-- Comparisons against repo-map style context and vanilla agent exploration
-- Framework graph tasks: route -> handler -> service -> DB, migrations, config consumers, CI/infra
+## Remaining benchmark work (TODO checklist)
+
+The public suite has the metric framework; the next step is real, larger,
+documented repositories. Each task must publish raw logs alongside any headline
+number (the pattern set by `tests/benchmark_honest_RESULTS.md`).
+
+- [ ] **10k LOC public repo** — Recall@1/3/5, MRR, nDCG, token economy; named repo + commit SHA.
+- [ ] **100k LOC public repo** — same metrics, plus full index build time and incremental update latency.
+- [ ] **1M LOC target** — feasibility + scale counters (files/symbols/edges/bytes); may be partial.
+- [ ] **Multi-language repo** (≥3 Tier-A languages) — per-language recall and answer-correctness breakdown.
+- [ ] **vs vanilla agent grep/read** — tokens and recall against an undisciplined agent exploring the same questions.
+- [ ] **vs repo-map-style context** — tokens and recall against an Aider-repo-map-style context blob.
+- [ ] **Graph task benchmark** — `refs`, `impact`, and route→handler→service paths against hand-labeled ground truth.
+- [ ] **Answer grading** — human-reviewed expected answers, not just file-level recall proxies.
+- [ ] **Framework graph tasks** — migrations, config consumers, CI/infra wiring once typed edges land.
+
+How to add one without overclaiming:
+
+1. Pick a public repo; record its URL and commit SHA.
+2. Derive ground truth independently of the index (e.g. naming convention), so the
+   index cannot grade its own homework.
+3. Use a symmetric token estimator and read window on both sides.
+4. Commit the raw run output next to a short `*_RESULTS.md` summary.
+5. Only then update README/COMPARISON headline numbers.
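
The checklist above asks each run to report Recall@1/3/5 and MRR against independently derived ground truth. A small sketch of those two metrics, assuming the hit-rate reading of recall@k (a question counts as recalled when at least one expected file appears in the top k); the function names and the `(ranked, relevant)` input shape are hypothetical and may differ from what `tests/benchmark_honest.py` does:

```python
def recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Share of questions with at least one expected file in the top-k results.

    Each run is (ranked_paths, expected_paths) for one benchmark question.
    """
    if not runs:
        raise ValueError("need at least one question")
    hits = sum(1 for ranked, relevant in runs if set(ranked[:k]) & relevant)
    return hits / len(runs)


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1/rank of the first expected file; 0 when none is returned."""
    if not runs:
        raise ValueError("need at least one question")
    total = 0.0
    for ranked, relevant in runs:
        for rank, path in enumerate(ranked, start=1):
            if path in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)
```

Running the same two functions over both the index output and the `rg` baseline output is one way to keep the comparison symmetric, in the spirit of step 3.
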
diff --git a/docs/COMPARISON.md b/docs/COMPARISON.md
@@ -14,8 +14,9 @@ platform.
 | codebase-index | Local CLI/skill/MCP retrieval for Claude Code, Codex CLI, OpenCode, and MCP clients | Broad framework-aware graph is still a roadmap item |
 | Cursor indexing | Integrated AI IDE workflow | Proprietary and tied to Cursor |
 | Aider repo-map | Aider chat sessions with compact repository context | Context map, not a reusable local search API |
-| Sourcegraph Cody | Enterprise-scale code intelligence across many repos | Cloud/account setup and heavier platform surface |
-| Serena / MCP tools | MCP-first local tool integration | Quality and schemas vary by server |
+| Sourcegraph / Cody / Amp | Enterprise-scale code intelligence across many repos | Cloud/account setup and heavier platform surface |
+| Continue | Open-source coding agent for IDE + CLI | An agent with context features, not a standalone retrieval index |
+| Codebase-Memory MCP | Local graph-based code-memory over MCP | Broader/heavier graph engine; different simplicity/privacy tradeoffs |
 | Manual grep/read | Exact ad hoc search | No ranking, graph, symbol contract, or token budgeting |
 
 ## Criteria
@@ -46,6 +47,100 @@ platform.
 | Update model | Manual `index`/`update`, hooks, optional watcher | IDE-managed | Rebuilt as Aider manages context | Platform-managed | Varies | Always live but manual |
 | Extensibility | CLI `--json`; MCP schema v1.0; SQLite local DB | Limited external contract | Aider internals/context | Sourcegraph APIs | MCP by design | Shell pipelines |
 
+## When to choose what
+
+Honest, per-tool guidance. None of these are attacks — each tool is good at the
+job it was built for. The question is which layer you actually need.
+
+### Manual grep / read
+
+- **Good at:** exact string matching, zero setup, always live, universally
+  available. For a single known identifier in a small scope, nothing beats `rg`.
+- **Where codebase-index differs:** ranking, symbol awareness (definition vs
+  call), graph expansion to related files, and token-budgeted line ranges instead
+  of every matching line.
+- **Choose grep when:** you know the exact string, the repo is small, or you only
+  need one match.
+- **Choose codebase-index when:** the question is conceptual ("where is auth
+  implemented?"), the repo is large, or an AI agent will pay for every irrelevant
+  line it reads.
+
+### Cursor
+
+- **Good at:** an integrated AI IDE with strong, low-friction codebase awareness
+  for people who work inside Cursor.
+- **Where codebase-index differs:** it is a local, open retrieval layer for
+  **terminal and MCP** agents, offline by default, with no IDE lock-in and a
+  scriptable CLI/JSON/MCP contract.
+- **Choose Cursor when:** you want an AI-native IDE and are comfortable with a
+  proprietary, IDE-centric workflow.
+- **Choose codebase-index when:** your agent is Claude Code, Codex CLI, OpenCode,
+  or any MCP client in the terminal, and you want code to stay on your machine.
+
+### Aider repo-map
+
+- **Good at:** a compact, graph-ranked, token-budgeted repository map that feeds
+  Aider's chat context well. It is not "just grep" — it ranks with a graph
+  algorithm over source and dependencies.
+- **Where codebase-index differs:** it is a reusable, queryable index rather than
+  context injection for one agent. CLI/JSON/MCP commands return ranked `file:line`
+  ranges, symbols, references, and `impact` that any shell-capable agent can
+  consume, with freshness checks and security/ignore gates.
+- **Choose Aider repo-map when:** Aider is your agent and you want its built-in
+  context with nothing extra to run.
+- **Choose codebase-index when:** you want one index shared across multiple agents
+  (Claude Code, Codex, OpenCode, MCP) with a stable, scriptable contract.
+
+### Sourcegraph / Cody / Amp
+
+- **Good at:** enterprise-grade, cross-repo code intelligence, search, and code
+  graph at organization scale, with mature platform features.
+- **Where codebase-index differs:** single-repo, local, and lightweight — no
+  server, no account, no code leaving the machine by default. It is a retrieval
+  layer for an agent, not a platform.
+- **Choose Sourcegraph/Cody/Amp when:** you need org-wide search across many
+  repositories plus team features, and are fine with a hosted/account-based platform.
+- **Choose codebase-index when:** you want per-repo retrieval for a terminal/MCP
+  agent with a strict local-first privacy model and minimal moving parts.
+
+### Continue
+
+- **Good at:** an open-source coding **agent** with IDE and CLI integrations and
+  built-in context features. It is a full assistant, not just an index.
+- **Where codebase-index differs:** it is the **retrieval/index layer itself**,
+  not an agent. It exposes a CLI/JSON/MCP contract that an agent (including, in
+  principle, agents like Continue) can query, and it focuses on token-budgeted
+  packets and a strict privacy model rather than on being the chat surface.
+- **Choose Continue when:** you want the agent — an open assistant to drive your
+  edits.
+- **Choose codebase-index when:** you already have an agent and want to give it
+  precise, local, ranked codebase context.
+
+### Codebase-Memory MCP
+
+This is the closest direct alternative, so the comparison is the most careful.
+
+- **Good at:** a broader graph engine with a static binary, wide language and
+  agent coverage, and more advanced graph features than codebase-index ships
+  today.
+- **Where codebase-index differs — and we do not claim to beat it globally:**
+  - **Simplicity and safety:** a small pure-Python surface, a multi-gate exclusion
+    pipeline, output-time secret redaction, and a `doctor --strict` self-check.
+  - **Strict privacy model:** no telemetry, no network by default; external
+    embeddings are opt-in and gated three ways.
+  - **Token-budgeted retrieval packets:** ranked `file:line` ranges and
+    `recommended_reads` under an explicit budget, tuned for the Claude/Codex/
+    OpenCode workflow.
+  - **Transparency:** readable Python, an 80% coverage gate, golden CLI snapshots,
+    and a public benchmark suite wired as a CI regression gate.
+  - **Honest benchmarks:** we publish raw logs (see the 55k LOC Java run) and mark
+    unproven scale/graph claims as roadmap.
+- **Choose Codebase-Memory MCP when:** you need its broader graph engine,
+  static-binary distribution, or wider language/agent reach today.
+- **Choose codebase-index when:** you want a simpler, privacy-strict, transparent
+  retrieval layer tuned for terminal AI agents with token-budgeted output and
+  benchmarks you can audit.
+
 ## Aider repo-map clarification
 
 Aider repo-map should not be described as "just grep" or as lacking ranking.