Neverdecel · Jun 17, 2026
diff --git a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml
@@ -43,7 +43,7 @@ jobs:
           enable-cache: true
 
       - name: Install
-        run: uv pip install --system -e ".[dev,server,ui,openai]"
+        run: uv pip install --system -e ".[dev,server,ui,openai,mcp]"
 
       - name: Lint (ruff)
         run: ruff check .

diff --git a/AGENTS.md b/AGENTS.md
@@ -8,7 +8,7 @@
 - `coderag/store/`: `sqlite_store.py` (source of truth + FTS5) and `vector_index.py` (FAISS Flat/IVF cache).
 - `coderag/retrieval/`: Hybrid dense + BM25 search fused with RRF.
 - `coderag/indexer.py`, `coderag/watch.py`: Incremental indexing and the debounced watcher.
-- `coderag/surfaces/`: `cli.py`, `http_api.py` (FastAPI), `webui.py` — thin adapters over the facade.
+- `coderag/surfaces/`: `cli.py`, `http_api.py` (FastAPI), `webui.py`, `mcp_server.py` (MCP, for AI agents) — thin adapters over the facade.
 - `tests/`: pytest suite (offline by default via the `fake` provider; real model behind `-m integration`).
 - `example.env` → copy to `.env`; CI lives in `.github/`.
 

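AGENTS.md describes `coderag/retrieval/` as hybrid dense + BM25 search fused with RRF. To illustrate that fusion step, here is a minimal sketch of standard Reciprocal Rank Fusion over two best-first lists of chunk ids. It assumes equal (1:1) weights and the constant `k=60` commonly used in the RRF literature. `rrf_fuse` is a hypothetical helper, not CodeRAG's API, and the project's actual constant is not shown in this diff.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[int]], k: int = 60) -> List[int]:
    """Fuse best-first ranked lists of chunk ids with Reciprocal Rank Fusion.

    Each list adds 1 / (k + rank) for every id it contains (rank is 1-based), so an
    id ranked well by *both* retrievers beats one ranked first by only one of them.
    Hypothetical helper for illustration only.
    """
    scores: Dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by the smaller id so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense = [7, 3, 9, 1]  # ids from the vector (FAISS) search, best first
bm25 = [3, 5, 7, 2]   # ids from the FTS5/BM25 search, best first
print(rrf_fuse([dense, bm25]))  # → [3, 7, 5, 9, 1, 2]
```

Fusing *ranks* rather than raw scores avoids having to calibrate cosine similarities against BM25 scores, which live on unrelated scales.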
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
@@ -30,7 +30,7 @@ coderag/
 │   ├── sqlite_store.py #   files/chunks/vectors + FTS5 lexical search
 │   └── vector_index.py #   FaissVectorIndex: Flat (exact) / IVF (scale)
 ├── retrieval/          # Hybrid search: dense + BM25, fused with RRF
-└── surfaces/           # cli.py · http_api.py (FastAPI) · webui.py
+└── surfaces/           # cli.py · http_api.py (FastAPI) · webui.py · mcp_server.py (MCP)
 ```
 
 ### Design invariants (don't break these)
@@ -41,9 +41,14 @@ coderag/
 - **`chunks.id` is the FAISS id and is `AUTOINCREMENT`** — ids are never reused, which keeps
   a stale cache from resurrecting deleted content.
 - **Delete-before-add.** A changed file's old chunks are removed from both SQLite and FAISS
-  before new ones are added (`Indexer._index_file`). This is the bug the old `monitor.py` had.
+  before new ones are added (`Indexer._write`). This is the bug the old `monitor.py` had.
 - **The embedding dimension comes from the provider**, never a hard-coded constant. A model
   change is detected via `meta.embed_dim` and triggers a clean rebuild.
+- **Writes serialize; reads don't block.** All indexing/deletion goes through one lock on the
+  `CodeRAG` facade (`_index_lock`), which searches never take, and `FaissVectorIndex` guards
+  its own add/remove/search/rebuild, so the MCP server's background index and live watcher run
+  safely alongside concurrent agent searches. Indexing may parallelize chunking and embedding
+  across `index_workers` threads, but the SQLite/FAISS writes stay single-writer (`Indexer._write`).
 
 ## Quality gate
 

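The last design invariant (parallel chunk+embed, single-writer SQLite/FAISS writes) follows a common producer/consumer shape. A minimal sketch, assuming a thread pool sized by `index_workers`; `chunk_and_embed`, `index_files` and `write` are hypothetical stand-ins for the roles described above, not the real `Indexer` code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple


def chunk_and_embed(path: str) -> Tuple[str, List[str]]:
    """Hypothetical stand-in for the expensive step: split a file into chunks and embed them."""
    return path, [f"{path}#chunk{i}" for i in range(2)]


def index_files(
    paths: List[str], index_workers: int = 4
) -> Tuple[Dict[str, List[str]], Set[int]]:
    store: Dict[str, List[str]] = {}  # stand-in for the SQLite rows + FAISS vectors
    writer_threads: Set[int] = set()  # records which threads performed writes

    def write(path: str, chunks: List[str]) -> None:
        writer_threads.add(threading.get_ident())
        store.pop(path, None)  # delete-before-add: drop the file's old chunks first...
        store[path] = chunks   # ...then add the new ones

    # Workers run chunk_and_embed concurrently; pool.map hands results back to this
    # thread in input order, and only this thread calls write(): a single writer.
    with ThreadPoolExecutor(max_workers=max(1, index_workers)) as pool:
        for path, chunks in pool.map(chunk_and_embed, paths):
            write(path, chunks)
    return store, writer_threads


store, writers = index_files(["a.py", "b.py", "c.py"])
print(sorted(store), len(writers))  # → ['a.py', 'b.py', 'c.py'] 1
```

The slow embedding work fans out across workers, while the storage layer only ever sees one writer; in this sketch writes also land in input order.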
diff --git a/README.md b/README.md
@@ -21,12 +21,40 @@ files that matter — ranked by meaning, not just string match.
 It runs **entirely on your machine with no API key** (a local ONNX embedding model is the
 default), keeps its index **up to date as you edit**, and is built to stay fast on **large
 codebases**. Use it from the **CLI**, embed it as a **Python library**, self-host it as an
-**HTTP service**, or browse with the **web UI**.
+**HTTP service**, browse with the **web UI**, or plug it into an **AI coding agent over MCP**
+so it searches a warm index instead of grepping.
 
 > Built for the cases off-the-shelf IDE assistants don't cover well: a codebase that's too
 > big, too private, or too custom — or a search/RAG capability you want to own and embed in
 > your own tools.
 
+## ⚡ Find the right code in one call — not a grep loop
+
+Coding agents like Claude Code and Codex locate code by *running searches* (grep, glob, read,
+repeat), which costs tokens and round-trips and is limited to literal pattern matching. CodeRAG
+turns the workspace into a **warm, pre-indexed** engine: a single query returns the right
+functions and files ranked by **meaning *and* keyword**, with exact `path:line` citations. The
+embedding model loads once, so each query is one in-process lookup (FAISS + BM25 + fusion), not
+a multi-round shell loop. Over MCP (`coderag mcp`, below), it becomes the agent's search tool.
+
+**Proof from the eval harness** — this repo's 24 natural-language → file queries (90 files /
+553 chunks), local `bge-small`, one warm query each (reproduce with `coderag eval --compare
+--dataset coderag/eval/datasets/coderag_self.jsonl`):
+
+| retrieval | MRR | R@1 | R@5 | Hit@10 |
+| --- | :---: | :---: | :---: | :---: |
+| BM25 — ranked keyword search (already stronger than raw grep) | 0.751 | 0.604 | 0.854 | 1.000 |
+| dense — semantic only | 0.784 | 0.604 | 0.938 | 1.000 |
+| **hybrid — CodeRAG's default** | **0.822** | **0.688** | **1.000** | **1.000** |
+
+Hybrid gets every relevant file into the **top 5 for every query** (R@5 = 1.000) and has the best
+R@1 (0.688), beating the ranked keyword search a grep-based agent leans on by adding semantic
+understanding on top (raw grep is weaker still: unranked literal matches). To measure the *latency* and
+*token-cost* gap against an actual grep loop on your own repo, run
+[`scripts/bench_vs_grep.py`](scripts/bench_vs_grep.py). The fuller story — symbol-level
+localization, the reranker (**+55% R@1** where there's headroom), multi-repo generalization,
+and the honest caveats — is in [`docs/eval.md`](docs/eval.md).
+
 ---
 
 ## ✨ Highlights
@@ -35,10 +63,11 @@ codebases**. Use it from the **CLI**, embed it as a **Python library**, self-hos
 - **Bring your own model platform.** Built for self-hosted and local models first (any OpenAI-compatible server — Ollama, vLLM, LM Studio, LocalAI), with first-class **OpenAI API** and **Anthropic API** support when you want it.
 - **Symbol-aware chunking.** Indexes *functions, classes, and methods* (Python via `ast`; JS/TS/Go/Rust/Java via tree-sitter), not crude fixed-size blocks — so results point at real code units with `file:line` citations.
 - **Hybrid retrieval, with optional reranking.** Dense vector search **+** BM25 keyword search, fused with Reciprocal Rank Fusion — great at both "what does this *mean*" and exact-identifier lookups. Add an optional local **cross-encoder reranker** (two-stage retrieve-then-rerank, `CODERAG_RERANK=1`, no API key) to sharpen the top results.
+- **Drop-in for AI coding agents.** An **MCP server** (`coderag mcp`) lets Claude Code, Codex, and Cursor search a warm, pre-indexed workspace instead of slow grep/glob/read loops: ranked `path:line` results from a single call, with the index kept live as you edit. With `--all-text` it also works on a plain directory of documents, not just code.
 - **Measured, not guessed.** A built-in **evaluation harness** (`coderag eval`) scores retrieval quality — recall@k, MRR, nDCG@k at file *or* symbol level — and can mine a benchmark straight from your git history. Every default (1:1 hybrid, reranker opt-in, adaptive fusion off) is the choice the harness validated, including across an external repo.
 - **Incremental & live.** Content-hashed indexing only re-embeds files that changed; a debounced watcher keeps the index current as you code. No duplicate or stale vectors.
 - **Built to scale.** Exact `Flat` search for small repos, automatic switch to approximate `IVF` past a threshold so it stays fast at 100k+ chunks.
-- **Four surfaces, one engine.** CLI · Python library · HTTP/REST · web UI — all thin wrappers over the same `CodeRAG` object.
+- **Five surfaces, one engine.** CLI · Python library · HTTP/REST · web UI · MCP server — all thin wrappers over the same `CodeRAG` object.
 
 ## 🚀 Quick start
 
@@ -47,6 +76,7 @@ pip install -e .            # core engine (local embeddings included)
 # optional extras:
 pip install -e ".[server]"     # HTTP/REST API
 pip install -e ".[ui]"         # built-in web UI (FastAPI + Jinja + Pygments)
+pip install -e ".[mcp]"        # MCP server for AI coding agents (Claude Code, Codex, Cursor)
 pip install -e ".[openai]"     # OpenAI (or self-hosted OpenAI-compatible) embeddings / answers
 pip install -e ".[anthropic]"  # Anthropic (Claude) LLM answers
 pip install -e ".[all]"        # everything above
@@ -69,7 +99,7 @@ coderag search "where are duplicate vectors removed on file change" --watched-di
 By default the index lives in `./.coderag/`. Set `CODERAG_WATCHED_DIR` / `CODERAG_STORE_DIR`
 (or copy `example.env` to `.env`) to avoid repeating flags.
 
-## 🧑‍💻 The four surfaces
+## 🧑‍💻 The surfaces
 
 ### CLI
 
@@ -79,6 +109,7 @@ coderag search "QUERY" [-k 8]     # hybrid search; add --json or --answer
 coderag watch                     # index, then keep it live as files change
 coderag serve --port 8000         # run the HTTP API  (needs [server])
 coderag ui                        # launch the web UI (needs [ui])
+coderag mcp                       # MCP server for AI agents (needs [mcp]); --all-text indexes any text file
 coderag status                    # index stats (files, chunks, model, index type)
 coderag eval --dataset d.jsonl --compare  # retrieval quality: dense vs BM25 vs hybrid
 ```
@@ -131,6 +162,57 @@ browser**, index status, a one-click **Reindex**, and an optional streamed LLM a
 enhanced — every page works with JavaScript disabled, and there's no CDN/runtime network
 dependency, so it stays local-first.
 
+### MCP — let an AI coding agent search instead of grepping  (`coderag mcp`)
+
+Tools like Claude Code and Codex locate code with iterative `grep`/`glob`/read loops. CodeRAG
+exposes the same workspace as a **Model Context Protocol** server, so an agent gets fast,
+ranked `path:line` results from a single call against a **warm, pre-indexed** workspace — the
+embedding model loads once and every query is then one in-process lookup (FAISS + BM25 +
+fusion), not a multi-round shell search.
+
+```bash
+pip install -e ".[mcp]"
+coderag mcp                 # index the current dir, keep it live, serve over stdio
+coderag mcp --all-text      # index ALL text files (docs/notes/config), not just code
+```
+
+It auto-indexes the working directory on startup (in the **background**, so it's responsive
+immediately) and keeps the index live with the watcher — zero manual steps. Tools exposed:
+**`search_code`** (hybrid search, compact snippets + `path:line`), **`get_file`** (read a
+precise range of an indexed file), **`index_status`** (coverage/freshness), and **`reindex`**.
+
+Wire it into an agent (the server defaults to the directory it's launched in):
+
+```bash
+# Claude Code
+claude mcp add coderag -- coderag mcp
+```
+
+```jsonc
+// Cursor: .cursor/mcp.json  —  or Claude Code: .mcp.json  (at the repo root)
+{ "mcpServers": { "coderag": { "command": "coderag", "args": ["mcp"] } } }
+```
+
+```toml
+# Codex: ~/.codex/config.toml
+[mcp_servers.coderag]
+command = "coderag"
+args = ["mcp"]
+```
+
+> If `coderag` isn't on the launcher's PATH, use an absolute path (or `python -m coderag.surfaces.cli mcp`).
+> To index a directory other than where the client launches, add `"--watched-dir", "/abs/path"` to `args`.
+> Fast by default (local `bge-small`, no reranker); set `CODERAG_RERANK=1` to trade ~30 ms/query for sharper top results.
+
+**Why bother? Measure it.** [`scripts/bench_vs_grep.py`](scripts/bench_vs_grep.py) scores
+indexed search against a raw grep baseline on the same eval dataset — accuracy
+(recall@k / nDCG@k / MRR via the eval harness), latency per query, and approximate context
+tokens (compact chunks vs reading whole files):
+
+```bash
+python scripts/bench_vs_grep.py --watched-dir . --dataset coderag/eval/datasets/coderag_self.jsonl
+```
+
 ## 🐳 Docker (beta)
 
 Prebuilt **multi-arch** images (`linux/amd64` + `linux/arm64`) are published to GHCR on
@@ -228,18 +310,25 @@ Everything is configurable via `CODERAG_*` environment variables or a `.env` fil
 | `CODERAG_ANTHROPIC_MODEL` | `claude-opus-4-8` | Anthropic chat model for answers |
 | `CODERAG_API_KEY` | – | If set, the HTTP API **requires** it (`Authorization: Bearer <key>` or `X-API-Key`). Set whenever the server is reachable beyond localhost. |
 | `CODERAG_CORS_ORIGINS` | – | Comma-separated CORS allowlist for the HTTP API (never `*`). Empty ⇒ no cross-origin browser access. |
+| `CODERAG_WORKERS` | `4` | Worker threads for chunking + embedding during indexing (`1` = serial). |
+| `CODERAG_INDEX_ALL_TEXT` | `false` | Index any UTF-8 text file (docs/config/extensionless), not just code — turns a plain directory into a searchable workspace. Binary files are always skipped. |
+| `CODERAG_MCP_AUTO_INDEX` | `true` | MCP server indexes the watched dir on startup (in the background). |
+| `CODERAG_MCP_WATCH` | `true` | MCP server keeps the index live via the filesystem watcher. |
+| `CODERAG_MCP_SNIPPET_LINES` | `12` | Lines of a chunk returned in a `search_code` snippet by default. |
 
 ## 🧩 Supported languages
 
 Symbol-aware (function/class/method level): **Python, JavaScript, TypeScript/TSX, Go, Rust,
-Java**. Many other languages and docs (C/C++, Ruby, PHP, Markdown, YAML, …) are indexed with
-a line-window fallback, so they remain searchable.
+Java**. Many other languages and docs (C/C++, Ruby, PHP, Markdown, YAML, HTML/CSS, …) are
+indexed with a line-window fallback, so they remain searchable. Set `CODERAG_INDEX_ALL_TEXT=1`
+(or `coderag mcp --all-text`) to index **any** UTF-8 text file — including extensionless ones
+like `Dockerfile`, so a plain directory of documents or notes becomes searchable too.
 
 ## 🛠️ Development
 
 ```bash
 python -m venv venv && source venv/bin/activate
-pip install -e ".[dev,server,openai]"
+pip install -e ".[dev,server,ui,mcp,openai]"
 
 pytest -m "not integration"     # fast, offline (uses a deterministic fake embedder)
 pytest -m integration           # exercises the real local model (downloads once)
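
The README's eval table reports MRR, R@1, R@5 and Hit@10. For readers new to these metrics, here is a minimal sketch of their textbook definitions. It assumes each query has a set of relevant files and the retriever returns a best-first list of files. This illustrates the metrics only and is not the `coderag eval` implementation, whose tie and dedup handling isn't shown in this diff.

```python
from typing import List, Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / (1-based rank of the first relevant result), or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant files that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant file is in the top k, else 0.0."""
    return 1.0 if set(ranked[:k]) & relevant else 0.0


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


# Two toy queries: (ranked results, relevant files).
runs = [
    (["api.py", "cli.py", "indexer.py"], {"api.py"}),
    (["watch.py", "indexer.py", "api.py"], {"indexer.py", "api.py"}),
]
print(mean([reciprocal_rank(r, rel) for r, rel in runs]))  # MRR    → 0.75
print(mean([recall_at_k(r, rel, 1) for r, rel in runs]))   # R@1    → 0.5
print(mean([hit_at_k(r, rel, 10) for r, rel in runs]))     # Hit@10 → 1.0
```

When a query has several relevant files, its R@1 is capped below 1.0, so R@1 is not the same as the share of queries whose top hit is relevant.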

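`search_code` is described as returning compact snippets with `path:line` citations, capped at `CODERAG_MCP_SNIPPET_LINES` (default 12) lines so responses stay token-cheap. A minimal sketch of that kind of truncation, assuming a hit carries a repo-relative path, a 1-based start line and the chunk text. `Hit` and `format_hit` are hypothetical names; the MCP server's actual response schema is not shown in this diff.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    path: str        # repo-relative path, e.g. "coderag/api.py"
    start_line: int  # 1-based line where the chunk starts
    text: str        # full chunk text


def format_hit(hit: Hit, snippet_lines: int = 12) -> str:
    """Render one hit as a `path:start-end` header plus at most `snippet_lines` lines."""
    lines = hit.text.splitlines()
    end_line = hit.start_line + max(len(lines), 1) - 1
    header = f"{hit.path}:{hit.start_line}-{end_line}"
    body = "\n".join(lines[:snippet_lines])
    if len(lines) > snippet_lines:
        # Tell the agent the chunk was cut, so it can fetch the rest (e.g. via get_file).
        body += f"\n... ({len(lines) - snippet_lines} more lines)"
    return f"{header}\n{body}"


chunk = "\n".join(f"line {i}" for i in range(1, 21))  # a 20-line chunk
print(format_hit(Hit("coderag/api.py", 120, chunk), snippet_lines=3))
```

Printing the header with the chunk's full line range, even when the body is cut, gives the agent exactly what it needs to request the remaining lines.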
diff --git a/coderag/api.py b/coderag/api.py
@@ -8,6 +8,7 @@
 from __future__ import annotations
 
 import logging
+import threading
 from pathlib import Path
 from typing import TYPE_CHECKING, List, Optional, Union
 
@@ -38,6 +39,10 @@ def __init__(self, config: Optional[Config] = None) -> None:
         # Set when the store's embedding model/dim changed and the FAISS cache must
         # be rebuilt from scratch (consumed when the vector index is first opened).
         self._rebuild_required: bool = False
+        # Serializes all indexing/deletion so concurrent writers (the CLI, the HTTP
+        # surface, the MCP server's background index, and the live watcher) can't
+        # interleave a file's delete-before-add sequence. Reads (search) are unaffected.
+        self._index_lock = threading.Lock()
 
     # --- lazily constructed collaborators ---
 
@@ -116,7 +121,8 @@ def index(
         force a clean rebuild.
         """
         target = Path(path).expanduser() if path else self.config.watched_dir
-        return self.indexer.index(target, full=full)
+        with self._index_lock:
+            return self.indexer.index(target, full=full)
 
     def search(self, query: str, top_k: Optional[int] = None) -> List[SearchHit]:
         """Hybrid (dense + lexical) search over the indexed codebase."""
@@ -161,10 +167,11 @@ def delete_path(self, path: Union[str, Path]) -> int:
             rel = Path(path).resolve().relative_to(root).as_posix()
         except ValueError:
             return 0
-        removed = self.store.delete_file(rel)
-        if removed:
-            self.vectors.remove(removed)
-            self.vectors.save()
+        with self._index_lock:
+            removed = self.store.delete_file(rel)
+            if removed:
+                self.vectors.remove(removed)
+                self.vectors.save()
         return len(removed)
 
     def status(self) -> dict:

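The new `_index_lock` exists so that two writers (say, the MCP server's background index and the watcher) can't interleave one file's delete-before-add sequence. A small self-contained sketch of that guarantee, using a toy facade; `ToyFacade`, `reindex_file` and `log` are hypothetical. It records every write step from eight concurrent writer threads and checks that each `delete` is immediately followed by the `add` for the same file.

```python
import threading
import time
from typing import List, Tuple


class ToyFacade:
    """Hypothetical stand-in for the CodeRAG facade's write path."""

    def __init__(self) -> None:
        self._index_lock = threading.Lock()
        self.log: List[Tuple[str, str]] = []  # (operation, file) in execution order

    def reindex_file(self, rel: str) -> None:
        # Holding the lock makes delete+add one atomic step relative to other writers.
        with self._index_lock:
            self.log.append(("delete", rel))
            time.sleep(0.001)  # widen the race window; without the lock, writers would interleave
            self.log.append(("add", rel))


facade = ToyFacade()
threads = [
    threading.Thread(target=facade.reindex_file, args=(f"f{i}.py",)) for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Pair up consecutive log entries and check each is (delete f, add f) for the same f.
ok = all(
    op1 == "delete" and op2 == "add" and f1 == f2
    for (op1, f1), (op2, f2) in zip(facade.log[::2], facade.log[1::2])
)
print(len(facade.log), ok)  # → 16 True
```

As in the diff, searches would not take this lock, so they are never queued behind a whole indexing run.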
diff --git a/coderag/chunking/languages.py b/coderag/chunking/languages.py
@@ -49,12 +49,49 @@
     ".json": "json",
     ".cfg": "ini",
     ".ini": "ini",
+    # Common markup/web/config text — searchable in most repos, line-window chunked.
+    ".xml": "xml",
+    ".html": "html",
+    ".htm": "html",
+    ".css": "css",
+    ".scss": "scss",
+    ".less": "less",
+    ".vue": "vue",
+    ".svelte": "svelte",
+    ".properties": "properties",
+    ".gradle": "gradle",
 }
 
+# Well-known text files that have no (or an unconventional) extension. Matched on the
+# lowercased file *name* when the extension lookup misses.
+FILENAME_TO_LANGUAGE = {
+    "dockerfile": "dockerfile",
+    "makefile": "make",
+    "license": "text",
+    "notice": "text",
+    "readme": "text",
+    "codeowners": "text",
+    ".env": "text",
+    ".gitignore": "text",
+    ".dockerignore": "text",
+}
+
+
+def detect_language(path: str | Path, *, all_text: bool = False) -> Optional[str]:
+    """Return the language for ``path``, or ``None`` if it should not be indexed.
 
-def detect_language(path: str | Path) -> Optional[str]:
-    """Return the language for ``path``, or ``None`` if it should not be indexed."""
-    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())
+    With ``all_text=True`` any unrecognized file is treated as plain ``"text"`` so a whole
+    directory (docs, notes, config) becomes searchable, not just code. Binary files are
+    still rejected later by the indexer's NUL-byte sniff, so this stays safe.
+    """
+    p = Path(path)
+    lang = EXTENSION_TO_LANGUAGE.get(p.suffix.lower())
+    if lang:
+        return lang
+    lang = FILENAME_TO_LANGUAGE.get(p.name.lower())
+    if lang:
+        return lang
+    return "text" if all_text else None
 
 
 def extensions_for(languages: Iterable[str]) -> List[str]:

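The new `detect_language` checks the extension first, then the lowercased file name, and only with `all_text=True` falls back to plain `"text"`. Below is a trimmed, self-contained copy of that lookup order, using a few of the mappings from the diff, plus a hypothetical `looks_binary` helper standing in for the indexer's NUL-byte sniff. That helper's real name and sample size aren't in this diff; 8 KiB is an assumption.

```python
from pathlib import Path
from typing import Optional

# Trimmed copies of the tables in the diff (the real ones hold many more entries).
EXTENSION_TO_LANGUAGE = {".json": "json", ".html": "html", ".css": "css"}
FILENAME_TO_LANGUAGE = {"dockerfile": "dockerfile", "makefile": "make", ".gitignore": "text"}


def detect_language(path: str, *, all_text: bool = False) -> Optional[str]:
    p = Path(path)
    # 1) Extension lookup. Path(".gitignore").suffix == "", so dotfiles fall through.
    lang = EXTENSION_TO_LANGUAGE.get(p.suffix.lower())
    if lang:
        return lang
    # 2) Well-known names with no (or an unconventional) extension, case-insensitive.
    lang = FILENAME_TO_LANGUAGE.get(p.name.lower())
    if lang:
        return lang
    # 3) Opt-in catch-all: anything else is indexed as plain text.
    return "text" if all_text else None


def looks_binary(head: bytes) -> bool:
    """Hypothetical NUL-byte sniff: binary if the first 8 KiB contain a NUL byte."""
    return b"\x00" in head[:8192]


print(detect_language("web/Index.HTML"))                 # → html
print(detect_language("docker/Dockerfile"))              # → dockerfile
print(detect_language("notes/todo.log"))                 # → None
print(detect_language("notes/todo.log", all_text=True))  # → text
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00"))        # → True
```

Because `pathlib` reports no suffix for dotfiles, entries such as `.env` and `.gitignore` belong in the filename table, not the extension table.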
diff --git a/coderag/config.py b/coderag/config.py
@@ -117,6 +117,12 @@ class Config:
     # --- What to index ---
     languages: Tuple[str, ...] = DEFAULT_LANGUAGES
     ignore_globs: Tuple[str, ...] = DEFAULT_IGNORE_GLOBS
+    # Index any UTF-8-decodable file as plain text, even with an unknown/absent extension
+    # (Dockerfile, Makefile, LICENSE, .log, ...). Off by default so code repos aren't
+    # polluted; turn on (CODERAG_INDEX_ALL_TEXT / `coderag mcp --all-text`) to make
+    # CodeRAG a general document/file-directory search engine. Binary files are still
+    # skipped (NUL-byte sniff in the indexer).
+    index_all_text: bool = False
     max_file_bytes: int = 1_000_000  # skip files larger than this
     max_chunk_lines: int = 200  # split oversized symbols into windows above this
     window_lines: int = 60  # fallback line-window size
@@ -182,6 +188,18 @@ class Config:
     # wildcard lets any website the user visits exfiltrate them via the browser.
     cors_origins: Tuple[str, ...] = ()
 
+    # --- MCP server surface (optional [mcp] surface) ---
+    # `coderag mcp` runs a persistent, warm process so the embedding model loads once and
+    # every query is fast — the win over an agent's cold, repeated grep/read loop. By
+    # default it indexes the watched dir on startup (in the background, so the server is
+    # responsive immediately) and keeps it live via the filesystem watcher, so an agent
+    # gets fresh results with zero manual steps.
+    mcp_auto_index: bool = True
+    mcp_watch: bool = True
+    # Lines of a chunk returned in a search_code snippet by default (the agent can request
+    # the full text, or fetch a precise range via get_file) — keeps responses token-cheap.
+    mcp_snippet_lines: int = 12
+
     # --- Demo mode (public, untrusted UI) ---
     # When on, the Streamlit UI shows a notice, hides the Reindex button, and limits
     # LLM answers per browser session. The per-session limit is soft (session-state
@@ -252,6 +270,12 @@ def from_env(cls, **overrides: object) -> "Config":
             ),
             api_key=os.getenv("CODERAG_API_KEY"),
             cors_origins=_env_tuple("CODERAG_CORS_ORIGINS", cls.cors_origins),
+            index_all_text=_env_bool("CODERAG_INDEX_ALL_TEXT", cls.index_all_text),
+            mcp_auto_index=_env_bool("CODERAG_MCP_AUTO_INDEX", cls.mcp_auto_index),
+            mcp_watch=_env_bool("CODERAG_MCP_WATCH", cls.mcp_watch),
+            mcp_snippet_lines=_env_int(
+                "CODERAG_MCP_SNIPPET_LINES", cls.mcp_snippet_lines
+            ),
             demo_mode=_env_bool("CODERAG_DEMO_MODE", cls.demo_mode),
             demo_max_answers=_env_int("CODERAG_DEMO_MAX_ANSWERS", cls.demo_max_answers),
             demo_cooldown_seconds=_env_int(