diff --git a/.github/workflows/ci-tests.yml b/.github/workflows/ci-tests.yml
index 97bca1d..ddd65ba 100644
--- a/.github/workflows/ci-tests.yml
+++ b/.github/workflows/ci-tests.yml
@@ -43,7 +43,7 @@ jobs:
           enable-cache: true
 
       - name: Install
-        run: uv pip install --system -e ".[dev,server,ui,openai]"
+        run: uv pip install --system -e ".[dev,server,ui,openai,mcp]"
 
       - name: Lint (ruff)
         run: ruff check .
diff --git a/AGENTS.md b/AGENTS.md
index e5c5fef..bf45334 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -8,7 +8,7 @@
 - `coderag/store/`: `sqlite_store.py` (source of truth + FTS5) and `vector_index.py` (FAISS Flat/IVF cache).
 - `coderag/retrieval/`: Hybrid dense + BM25 search fused with RRF.
 - `coderag/indexer.py`, `coderag/watch.py`: Incremental indexing and the debounced watcher.
-- `coderag/surfaces/`: `cli.py`, `http_api.py` (FastAPI), `webui.py` — thin adapters over the facade.
+- `coderag/surfaces/`: `cli.py`, `http_api.py` (FastAPI), `webui.py`, `mcp_server.py` (MCP, for AI agents) — thin adapters over the facade.
 - `tests/`: pytest suite (offline by default via the `fake` provider; real model behind `-m integration`).
 - `example.env` → copy to `.env`; CI lives in `.github/`.
 
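AGENTS.md describes retrieval as dense + BM25 fused with RRF, and the README calls the default "1:1 hybrid". Below is a minimal sketch of equal-weight Reciprocal Rank Fusion. It assumes the commonly used constant `k = 60`; CodeRAG's actual constant and tie-breaking are not shown in this diff, and `rrf_fuse` is a hypothetical helper, not the project's function.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[int]], k: int = 60) -> List[int]:
    """Fuse ranked lists of chunk ids with Reciprocal Rank Fusion (equal weights).

    Every list adds 1 / (k + rank) to each id it contains (rank is 1-based), so an
    id ranked well by both retrievers tends to outscore one that only a single list likes.
    """
    scores: Dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; break exact ties by id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


dense = [7, 3, 9]  # e.g. ids from the FAISS search, best first
bm25 = [3, 5, 7]   # e.g. ids from the FTS5/BM25 search, best first
print(rrf_fuse([dense, bm25]))  # → [3, 7, 5, 9]
```

Chunk 3 wins even though chunk 7 was the dense retriever's first pick, because it sits near the top of both lists. RRF only needs ranks, so the dense and BM25 scores never have to be put on a common scale.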
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index 99cb751..3be3ecc 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -30,7 +30,7 @@ coderag/
 │   ├── sqlite_store.py #   files/chunks/vectors + FTS5 lexical search
 │   └── vector_index.py #   FaissVectorIndex: Flat (exact) / IVF (scale)
 ├── retrieval/          # Hybrid search: dense + BM25, fused with RRF
-└── surfaces/           # cli.py · http_api.py (FastAPI) · webui.py
+└── surfaces/           # cli.py · http_api.py (FastAPI) · webui.py · mcp_server.py (MCP)
 ```
 
 ### Design invariants (don't break these)
@@ -41,9 +41,14 @@ coderag/
 - **`chunks.id` is the FAISS id and is `AUTOINCREMENT`** — ids are never reused, which keeps
   a stale cache from resurrecting deleted content.
 - **Delete-before-add.** A changed file's old chunks are removed from both SQLite and FAISS
-  before new ones are added (`Indexer._index_file`). This is the bug the old `monitor.py` had.
+  before new ones are added (`Indexer._write`). This is the bug the old `monitor.py` had.
 - **The embedding dimension comes from the provider**, never a hard-coded constant. A model
   change is detected via `meta.embed_dim` and triggers a clean rebuild.
+- **Writes serialize; searches skip the write lock.** All indexing/deletion goes through one
+  lock on the `CodeRAG` facade (`_index_lock`), which searches never take. `FaissVectorIndex`
+  guards its own add/remove/search/rebuild with an `RLock`, so a search waits only for an
+  in-flight FAISS mutation, and the MCP server's background index and live watcher run safely
+  alongside agent searches. Chunk+embed may use `index_workers` threads; writes stay single-writer (`Indexer._write`).
 
 ## Quality gate
 
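The `FaissVectorIndex` lock is an `RLock` rather than a plain `Lock` for a reason visible in the `vector_index.py` diff. `search()` holds the lock and then reads `self.ntotal`, which takes the same lock again on the same thread. `rebuild_from_store()` does the same thing when it calls `self.save()`. Below is a minimal sketch of that pattern. `ToyVectorIndex` is hypothetical and keeps `(id, score)` pairs in a Python list in place of FAISS.

```python
import threading
from typing import List, Tuple


class ToyVectorIndex:
    """Hypothetical stand-in for FaissVectorIndex: (id, score) pairs behind one RLock."""

    def __init__(self) -> None:
        self._items: List[Tuple[int, float]] = []
        # Reentrant on purpose: search() holds the lock and then reads ntotal,
        # which acquires the same lock again on the same thread.
        self._lock = threading.RLock()

    @property
    def ntotal(self) -> int:
        with self._lock:
            return len(self._items)

    def add(self, items: List[Tuple[int, float]]) -> None:
        with self._lock:
            self._items.extend(items)

    def remove(self, ids: List[int]) -> int:
        drop = set(ids)
        with self._lock:
            before = len(self._items)
            self._items = [it for it in self._items if it[0] not in drop]
            return before - len(self._items)

    def search(self, k: int) -> List[int]:
        with self._lock:
            if self.ntotal == 0:  # nested acquire: fine for an RLock
                return []
            k = min(k, self.ntotal)
            ranked = sorted(self._items, key=lambda it: -it[1])[:k]
        return [item_id for item_id, _ in ranked]


idx = ToyVectorIndex()
idx.add([(1, 0.2), (2, 0.9), (3, 0.5)])
print(idx.search(2))  # → [2, 3]

# The same nesting with a plain Lock cannot re-acquire on the owning thread:
plain = threading.Lock()
with plain:
    print(plain.acquire(timeout=0.1))  # → False (a blocking acquire() would deadlock)
```

The `ntotal` read inside `search()` could be reworked to avoid nesting. An `RLock` keeps every public method independently safe without requiring that discipline.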
diff --git a/README.md b/README.md
index 8056174..ec8e815 100644
--- a/README.md
+++ b/README.md
@@ -21,12 +21,40 @@ files that matter — ranked by meaning, not just string match.
 It runs **entirely on your machine with no API key** (a local ONNX embedding model is the
 default), keeps its index **up to date as you edit**, and is built to stay fast on **large
 codebases**. Use it from the **CLI**, embed it as a **Python library**, self-host it as an
-**HTTP service**, or browse with the **web UI**.
+**HTTP service**, browse with the **web UI**, or plug it into an **AI coding agent over MCP**
+so it searches a warm index instead of grepping.
 
 > Built for the cases off-the-shelf IDE assistants don't cover well: a codebase that's too
 > big, too private, or too custom — or a search/RAG capability you want to own and embed in
 > your own tools.
 
+## ⚡ Find the right code in one call — not a grep loop
+
+Coding agents like Claude Code and Codex locate code by *running searches* (grep, glob, read,
+repeat), a loop that burns tokens and round-trips and only finds literal keyword matches. CodeRAG
+turns the workspace into a **warm, pre-indexed** engine: a single query returns the right
+functions and files ranked by **meaning *and* keyword**, with exact `path:line` citations. The
+embedding model loads once, so each query is one in-process lookup (FAISS + BM25 + fusion), not
+a multi-round shell loop — and over MCP (`coderag mcp`, below) it becomes the agent's search tool.
+
+**Proof from the eval harness** — this repo's 24 natural-language → file queries (90 files /
+553 chunks), local `bge-small`, one warm query each (reproduce with `coderag eval --compare
+--dataset coderag/eval/datasets/coderag_self.jsonl`):
+
+| retrieval | MRR | R@1 | R@5 | Hit@10 |
+| --- | :---: | :---: | :---: | :---: |
+| BM25 — ranked keyword search (already stronger than raw grep) | 0.751 | 0.604 | 0.854 | 1.000 |
+| dense — semantic only | 0.784 | 0.604 | 0.938 | 1.000 |
+| **hybrid — CodeRAG's default** | **0.822** | **0.688** | **1.000** | **1.000** |
+
+Hybrid puts a relevant file in the **top-5 for every query** and reaches **R@1 ≈ 0.69**,
+beating BM25's ranked keyword search (itself a step up from raw grep's unranked literal
+matching) by adding semantic understanding on top. To measure the *latency* and
+*token-cost* gap against an actual grep loop on your own repo, run
+[`scripts/bench_vs_grep.py`](scripts/bench_vs_grep.py). The fuller story — symbol-level
+localization, the reranker (**+55% R@1** where there's headroom), multi-repo generalization,
+and the honest caveats — is in [`docs/eval.md`](docs/eval.md).
+
 ---
 
 ## ✨ Highlights
@@ -35,10 +63,11 @@ codebases**. Use it from the **CLI**, embed it as a **Python library**, self-hos
 - **Bring your own model platform.** Built for self-hosted and local models first (any OpenAI-compatible server — Ollama, vLLM, LM Studio, LocalAI), with first-class **OpenAI API** and **Anthropic API** support when you want it.
 - **Symbol-aware chunking.** Indexes *functions, classes, and methods* (Python via `ast`; JS/TS/Go/Rust/Java via tree-sitter), not crude fixed-size blocks — so results point at real code units with `file:line` citations.
 - **Hybrid retrieval, with optional reranking.** Dense vector search **+** BM25 keyword search, fused with Reciprocal Rank Fusion — great at both "what does this *mean*" and exact-identifier lookups. Add an optional local **cross-encoder reranker** (two-stage retrieve-then-rerank, `CODERAG_RERANK=1`, no API key) to sharpen the top results.
+- **Drop-in for AI coding agents.** An **MCP server** (`coderag mcp`) lets Claude Code, Codex, and Cursor search a warm, pre-indexed workspace instead of slow grep/glob/read loops: ranked `path:line` results from a single call, with the index kept live as you edit. With `--all-text` it works on a plain file directory too, not just code.
 - **Measured, not guessed.** A built-in **evaluation harness** (`coderag eval`) scores retrieval quality — recall@k, MRR, nDCG@k at file *or* symbol level — and can mine a benchmark straight from your git history. Every default (1:1 hybrid, reranker opt-in, adaptive fusion off) is the choice the harness validated, including across an external repo.
 - **Incremental & live.** Content-hashed indexing only re-embeds files that changed; a debounced watcher keeps the index current as you code. No duplicate or stale vectors.
 - **Built to scale.** Exact `Flat` search for small repos, automatic switch to approximate `IVF` past a threshold so it stays fast at 100k+ chunks.
-- **Four surfaces, one engine.** CLI · Python library · HTTP/REST · web UI — all thin wrappers over the same `CodeRAG` object.
+- **Five surfaces, one engine.** CLI · Python library · HTTP/REST · web UI · MCP server — all thin wrappers over the same `CodeRAG` object.
 
 ## 🚀 Quick start
 
@@ -47,6 +76,7 @@ pip install -e .            # core engine (local embeddings included)
 # optional extras:
 pip install -e ".[server]"     # HTTP/REST API
 pip install -e ".[ui]"         # built-in web UI (FastAPI + Jinja + Pygments)
+pip install -e ".[mcp]"        # MCP server for AI coding agents (Claude Code, Codex, Cursor)
 pip install -e ".[openai]"     # OpenAI (or self-hosted OpenAI-compatible) embeddings / answers
 pip install -e ".[anthropic]"  # Anthropic (Claude) LLM answers
 pip install -e ".[all]"        # everything above
@@ -69,7 +99,7 @@ coderag search "where are duplicate vectors removed on file change" --watched-di
 By default the index lives in `./.coderag/`. Set `CODERAG_WATCHED_DIR` / `CODERAG_STORE_DIR`
 (or copy `example.env` to `.env`) to avoid repeating flags.
 
-## 🧑‍💻 The four surfaces
+## 🧑‍💻 The surfaces
 
 ### CLI
 
@@ -79,6 +109,7 @@ coderag search "QUERY" [-k 8]     # hybrid search; add --json or --answer
 coderag watch                     # index, then keep it live as files change
 coderag serve --port 8000         # run the HTTP API  (needs [server])
 coderag ui                        # launch the web UI (needs [ui])
+coderag mcp                       # MCP server for AI agents (needs [mcp]); --all-text for any dir
 coderag status                    # index stats (files, chunks, model, index type)
 coderag eval --dataset d.jsonl --compare  # retrieval quality: dense vs BM25 vs hybrid
 ```
@@ -131,6 +162,57 @@ browser**, index status, a one-click **Reindex**, and an optional streamed LLM a
 enhanced — every page works with JavaScript disabled, and there's no CDN/runtime network
 dependency, so it stays local-first.
 
+### MCP — let an AI coding agent search instead of grepping  (`coderag mcp`)
+
+Tools like Claude Code and Codex locate code with iterative `grep`/`glob`/read loops. CodeRAG
+exposes the same workspace as a **Model Context Protocol** server, so an agent gets fast,
+ranked `path:line` results from a single call against a **warm, pre-indexed** workspace — the
+embedding model loads once and every query is then one in-process lookup (FAISS + BM25 +
+fusion), not a multi-round shell search.
+
+```bash
+pip install -e ".[mcp]"
+coderag mcp                 # index the current dir, keep it live, serve over stdio
+coderag mcp --all-text      # index ALL text files (docs/notes/config), not just code
+```
+
+It auto-indexes the working directory on startup (in the **background**, so it's responsive
+immediately) and keeps the index live with the watcher — zero manual steps. Tools exposed:
+**`search_code`** (hybrid search, compact snippets + `path:line`), **`get_file`** (read a
+precise range of an indexed file), **`index_status`** (coverage/freshness), and **`reindex`**.
+
+Wire it into an agent (the server defaults to the directory it's launched in):
+
+```bash
+# Claude Code
+claude mcp add coderag -- coderag mcp
+```
+
+```jsonc
+// Cursor: .cursor/mcp.json  —  or Claude Code: .mcp.json  (at the repo root)
+{ "mcpServers": { "coderag": { "command": "coderag", "args": ["mcp"] } } }
+```
+
+```toml
+# Codex: ~/.codex/config.toml
+[mcp_servers.coderag]
+command = "coderag"
+args = ["mcp"]
+```
+
+> If `coderag` isn't on the launcher's PATH, use an absolute path (or `python -m coderag.surfaces.cli mcp`).
+> To index a directory other than where the client launches, add `"--watched-dir", "/abs/path"` to `args`.
+> Fast by default (local `bge-small`, no reranker); set `CODERAG_RERANK=1` to trade ~30 ms/query for sharper top results.
+
+**Why bother? Measure it.** [`scripts/bench_vs_grep.py`](scripts/bench_vs_grep.py) scores
+indexed search against a raw grep baseline on the same eval dataset — accuracy
+(recall@k / nDCG@k / MRR via the eval harness), latency per query, and approximate context
+tokens (compact chunks vs reading whole files):
+
+```bash
+python scripts/bench_vs_grep.py --watched-dir . --dataset coderag/eval/datasets/coderag_self.jsonl
+```
+
 ## 🐳 Docker (beta)
 
 Prebuilt **multi-arch** images (`linux/amd64` + `linux/arm64`) are published to GHCR on
@@ -228,18 +310,25 @@ Everything is configurable via `CODERAG_*` environment variables or a `.env` fil
 | `CODERAG_ANTHROPIC_MODEL` | `claude-opus-4-8` | Anthropic chat model for answers |
 | `CODERAG_API_KEY` | – | If set, the HTTP API **requires** it (`Authorization: Bearer <key>` or `X-API-Key`). Set whenever the server is reachable beyond localhost. |
 | `CODERAG_CORS_ORIGINS` | – | Comma-separated CORS allowlist for the HTTP API (never `*`). Empty ⇒ no cross-origin browser access. |
+| `CODERAG_WORKERS` | `4` | Worker threads for chunking + embedding during indexing (`1` = serial). |
+| `CODERAG_INDEX_ALL_TEXT` | `false` | Index any UTF-8 text file (docs/config/extensionless), not just code — turns a plain directory into a searchable workspace. Binary files are always skipped. |
+| `CODERAG_MCP_AUTO_INDEX` | `true` | MCP server indexes the watched dir on startup (in the background). |
+| `CODERAG_MCP_WATCH` | `true` | MCP server keeps the index live via the filesystem watcher. |
+| `CODERAG_MCP_SNIPPET_LINES` | `12` | Lines of a chunk returned in a `search_code` snippet by default. |
 
 ## 🧩 Supported languages
 
 Symbol-aware (function/class/method level): **Python, JavaScript, TypeScript/TSX, Go, Rust,
-Java**. Many other languages and docs (C/C++, Ruby, PHP, Markdown, YAML, …) are indexed with
-a line-window fallback, so they remain searchable.
+Java**. Many other languages and docs (C/C++, Ruby, PHP, Markdown, YAML, HTML/CSS, …) are
+indexed with a line-window fallback, so they remain searchable. Set `CODERAG_INDEX_ALL_TEXT=1`
+(or `coderag mcp --all-text`) to index **any** UTF-8 text file — including extensionless ones
+like `Dockerfile` — so a plain document/notes directory becomes searchable too, not just code.
 
 ## 🛠️ Development
 
 ```bash
 python -m venv venv && source venv/bin/activate
-pip install -e ".[dev,server,openai]"
+pip install -e ".[dev,server,ui,mcp,openai]"
 
 pytest -m "not integration"     # fast, offline (uses a deterministic fake embedder)
 pytest -m integration           # exercises the real local model (downloads once)
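The README's eval table reports MRR, R@k and Hit@k. Below is a minimal sketch of the textbook file-level definitions. It assumes each query comes with a set of relevant files. The harness's exact rules live in `docs/eval.md` and may differ, for example in how symbol-level matches are counted. All names here are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, path in enumerate(ranked, start=1):
        if path in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant files that appear in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant file appears in the top k, else 0.0."""
    return 1.0 if set(ranked[:k]) & relevant else 0.0


def evaluate(runs: List[Tuple[Sequence[str], Set[str]]]) -> Dict[str, float]:
    """Average each metric over (ranked results, relevant set) pairs, one per query."""
    n = len(runs)
    return {
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
        "R@1": sum(recall_at_k(r, rel, 1) for r, rel in runs) / n,
        "R@5": sum(recall_at_k(r, rel, 5) for r, rel in runs) / n,
        "Hit@1": sum(hit_at_k(r, rel, 1) for r, rel in runs) / n,
        "Hit@10": sum(hit_at_k(r, rel, 10) for r, rel in runs) / n,
    }


runs = [
    # Query 1: one relevant file, retrieved at rank 2.
    (["api.py", "indexer.py", "cli.py"], {"indexer.py"}),
    # Query 2: two relevant files, only one retrieved (at rank 1).
    (["vector_index.py", "api.py"], {"vector_index.py", "sqlite_store.py"}),
]
print(evaluate(runs))
# → {'MRR': 0.75, 'R@1': 0.25, 'R@5': 0.75, 'Hit@1': 0.5, 'Hit@10': 1.0}
```

Query 2 shows why Hit@1 can exceed R@1. With two relevant files, ranking one of them first earns a full hit but only half the recall.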
diff --git a/coderag/api.py b/coderag/api.py
index a144ff8..497993d 100644
--- a/coderag/api.py
+++ b/coderag/api.py
@@ -8,6 +8,7 @@
 from __future__ import annotations
 
 import logging
+import threading
 from pathlib import Path
 from typing import TYPE_CHECKING, List, Optional, Union
 
@@ -38,6 +39,10 @@ def __init__(self, config: Optional[Config] = None) -> None:
         # Set when the store's embedding model/dim changed and the FAISS cache must
         # be rebuilt from scratch (consumed when the vector index is first opened).
         self._rebuild_required: bool = False
+        # Serializes all indexing/deletion so concurrent writers (the CLI, the HTTP
+        # surface, the MCP server's background index, and the live watcher) can't
+        # interleave a file's delete-before-add sequence. Reads (search) are unaffected.
+        self._index_lock = threading.Lock()
 
     # --- lazily constructed collaborators ---
 
@@ -116,7 +121,8 @@ def index(
         force a clean rebuild.
         """
         target = Path(path).expanduser() if path else self.config.watched_dir
-        return self.indexer.index(target, full=full)
+        with self._index_lock:
+            return self.indexer.index(target, full=full)
 
     def search(self, query: str, top_k: Optional[int] = None) -> List[SearchHit]:
         """Hybrid (dense + lexical) search over the indexed codebase."""
@@ -161,10 +167,11 @@ def delete_path(self, path: Union[str, Path]) -> int:
             rel = Path(path).resolve().relative_to(root).as_posix()
         except ValueError:
             return 0
-        removed = self.store.delete_file(rel)
-        if removed:
-            self.vectors.remove(removed)
-            self.vectors.save()
+        with self._index_lock:
+            removed = self.store.delete_file(rel)
+            if removed:
+                self.vectors.remove(removed)
+                self.vectors.save()
         return len(removed)
 
     def status(self) -> dict:
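The facade lock exists for a specific reason. Without it, two writers indexing the same file can both finish their delete step before either one adds, and the file ends up with two sets of chunks. Below is a minimal sketch under stated assumptions. `ToyFacade` is hypothetical, a dict stands in for SQLite and FAISS, and `time.sleep` only widens the race window.

```python
import itertools
import threading
import time
from typing import Dict, List


class ToyFacade:
    """Hypothetical stand-in for the CodeRAG facade: one chunk table and one write lock."""

    def __init__(self) -> None:
        self.chunks: Dict[int, str] = {}   # chunk id -> file path (the "store")
        self._ids = itertools.count(1)      # AUTOINCREMENT-style: ids are never reused
        self._index_lock = threading.Lock()

    def _write(self, rel: str, n_chunks: int) -> None:
        # Delete-before-add: drop every old chunk of the file, then add the new ones.
        for cid in [c for c, path in self.chunks.items() if path == rel]:
            del self.chunks[cid]
        time.sleep(0.001)  # widen the gap another writer could slip into
        for _ in range(n_chunks):
            self.chunks[next(self._ids)] = rel

    def index(self, rel: str, n_chunks: int) -> None:
        # Every writer (CLI, HTTP, MCP background index, watcher) funnels through here.
        with self._index_lock:
            self._write(rel, n_chunks)

    def chunk_ids(self, rel: str) -> List[int]:
        return sorted(c for c, path in self.chunks.items() if path == rel)


rag = ToyFacade()
writers = [threading.Thread(target=rag.index, args=("app.py", 3)) for _ in range(8)]
for t in writers:
    t.start()
for t in writers:
    t.join()
print(rag.chunk_ids("app.py"))  # → [22, 23, 24]
```

If you remove the `with self._index_lock:` line, `app.py` typically ends up with more than three chunk ids. That is the duplicate-vector bug the delete-before-add invariant is meant to prevent.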
diff --git a/coderag/chunking/languages.py b/coderag/chunking/languages.py
index c2cc5a4..84e0e8f 100644
--- a/coderag/chunking/languages.py
+++ b/coderag/chunking/languages.py
@@ -49,12 +49,49 @@
     ".json": "json",
     ".cfg": "ini",
     ".ini": "ini",
+    # Common markup/web/config text — searchable in most repos, line-window chunked.
+    ".xml": "xml",
+    ".html": "html",
+    ".htm": "html",
+    ".css": "css",
+    ".scss": "scss",
+    ".less": "less",
+    ".vue": "vue",
+    ".svelte": "svelte",
+    ".properties": "properties",
+    ".gradle": "gradle",
 }
 
+# Well-known text files that have no (or an unconventional) extension. Matched on the
+# lowercased file *name* when the extension lookup misses.
+FILENAME_TO_LANGUAGE = {
+    "dockerfile": "dockerfile",
+    "makefile": "make",
+    "license": "text",
+    "notice": "text",
+    "readme": "text",
+    "codeowners": "text",
+    ".env": "text",
+    ".gitignore": "text",
+    ".dockerignore": "text",
+}
+
+
+def detect_language(path: str | Path, *, all_text: bool = False) -> Optional[str]:
+    """Return the language for ``path``, or ``None`` if it should not be indexed.
 
-def detect_language(path: str | Path) -> Optional[str]:
-    """Return the language for ``path``, or ``None`` if it should not be indexed."""
-    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())
+    With ``all_text=True`` any unrecognized file is treated as plain ``"text"`` so a whole
+    directory (docs, notes, config) becomes searchable, not just code. Binary files are
+    still rejected later by the indexer's NUL-byte sniff, so this stays safe.
+    """
+    p = Path(path)
+    lang = EXTENSION_TO_LANGUAGE.get(p.suffix.lower())
+    if lang:
+        return lang
+    lang = FILENAME_TO_LANGUAGE.get(p.name.lower())
+    if lang:
+        return lang
+    return "text" if all_text else None
 
 
 def extensions_for(languages: Iterable[str]) -> List[str]:
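The lookup order depends on a `pathlib` detail. A dotfile such as `.env` and a bare name such as `Dockerfile` both have an *empty* `suffix`. The extension table can therefore never match them, and the whole-name table has to. Below is a standalone copy of the new function's logic that makes the three steps easy to check. Its tables are cut down to a few hypothetical entries; the real ones are larger.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical, trimmed-down tables; the real EXTENSION_TO_LANGUAGE is much larger.
EXTENSION_TO_LANGUAGE = {".py": "python", ".md": "markdown", ".css": "css"}
FILENAME_TO_LANGUAGE = {"dockerfile": "dockerfile", "makefile": "make", ".env": "text"}


def detect_language(path: Union[str, Path], *, all_text: bool = False) -> Optional[str]:
    p = Path(path)
    # 1. Extension lookup. pathlib gives dotfiles and bare names an empty suffix.
    lang = EXTENSION_TO_LANGUAGE.get(p.suffix.lower())
    if lang:
        return lang
    # 2. Whole-name lookup catches Dockerfile, Makefile, .env, ...
    lang = FILENAME_TO_LANGUAGE.get(p.name.lower())
    if lang:
        return lang
    # 3. Opt-in catch-all; binaries are filtered later by the NUL-byte sniff.
    return "text" if all_text else None


print(Path(".env").suffix == "", Path("Dockerfile").suffix == "")  # → True True
print(detect_language("src/app.py"))               # → python
print(detect_language("docker/Dockerfile"))        # → dockerfile
print(detect_language("deploy/.env"))              # → text
print(detect_language("app.log"))                  # → None
print(detect_language("app.log", all_text=True))   # → text
```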
diff --git a/coderag/config.py b/coderag/config.py
index 4c76423..54b9f29 100644
--- a/coderag/config.py
+++ b/coderag/config.py
@@ -117,6 +117,12 @@ class Config:
     # --- What to index ---
     languages: Tuple[str, ...] = DEFAULT_LANGUAGES
     ignore_globs: Tuple[str, ...] = DEFAULT_IGNORE_GLOBS
+    # Index any UTF-8-decodable file as plain text, even with an unknown/absent extension
+    # (Dockerfile, Makefile, LICENSE, .log, ...). Off by default so code repos aren't
+    # polluted; turn on (CODERAG_INDEX_ALL_TEXT / `coderag mcp --all-text`) to make
+    # CodeRAG a general document/file-directory search engine. Binary files are still
+    # skipped (NUL-byte sniff in the indexer).
+    index_all_text: bool = False
     max_file_bytes: int = 1_000_000  # skip files larger than this
     max_chunk_lines: int = 200  # split oversized symbols into windows above this
     window_lines: int = 60  # fallback line-window size
@@ -182,6 +188,18 @@ class Config:
     # wildcard lets any website the user visits exfiltrate them via the browser.
     cors_origins: Tuple[str, ...] = ()
 
+    # --- MCP server surface (optional [mcp] surface) ---
+    # `coderag mcp` runs a persistent, warm process so the embedding model loads once and
+    # every query is fast — the win over an agent's cold, repeated grep/read loop. By
+    # default it indexes the watched dir on startup (in the background, so the server is
+    # responsive immediately) and keeps it live via the filesystem watcher, so an agent
+    # gets fresh results with zero manual steps.
+    mcp_auto_index: bool = True
+    mcp_watch: bool = True
+    # Lines of a chunk returned in a search_code snippet by default (the agent can request
+    # the full text, or fetch a precise range via get_file) — keeps responses token-cheap.
+    mcp_snippet_lines: int = 12
+
     # --- Demo mode (public, untrusted UI) ---
     # When on, the Streamlit UI shows a notice, hides the Reindex button, and limits
     # LLM answers per browser session. The per-session limit is soft (session-state
@@ -252,6 +270,12 @@ def from_env(cls, **overrides: object) -> "Config":
             ),
             api_key=os.getenv("CODERAG_API_KEY"),
             cors_origins=_env_tuple("CODERAG_CORS_ORIGINS", cls.cors_origins),
+            index_all_text=_env_bool("CODERAG_INDEX_ALL_TEXT", cls.index_all_text),
+            mcp_auto_index=_env_bool("CODERAG_MCP_AUTO_INDEX", cls.mcp_auto_index),
+            mcp_watch=_env_bool("CODERAG_MCP_WATCH", cls.mcp_watch),
+            mcp_snippet_lines=_env_int(
+                "CODERAG_MCP_SNIPPET_LINES", cls.mcp_snippet_lines
+            ),
             demo_mode=_env_bool("CODERAG_DEMO_MODE", cls.demo_mode),
             demo_max_answers=_env_int("CODERAG_DEMO_MAX_ANSWERS", cls.demo_max_answers),
             demo_cooldown_seconds=_env_int(
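The new settings follow the existing `from_env` pattern, where each `CODERAG_*` variable falls back to the class default through `cls.<field>`. Below is a minimal sketch of that pattern. `env_bool` and `env_int` are hypothetical stand-ins for the project's `_env_bool` and `_env_int`, whose exact parsing rules are not in this diff. `MiniConfig` assumes `Config` is a dataclass.

```python
import os
from dataclasses import dataclass

_TRUTHY = {"1", "true", "yes", "on"}


def env_bool(name: str, default: bool) -> bool:
    """Unset or blank -> default; otherwise true only for a truthy spelling."""
    raw = os.getenv(name, "").strip().lower()
    return default if not raw else raw in _TRUTHY


def env_int(name: str, default: int) -> int:
    """Unset, blank, or non-numeric -> default."""
    raw = os.getenv(name, "").strip()
    try:
        return int(raw) if raw else default
    except ValueError:
        return default


@dataclass(frozen=True)
class MiniConfig:
    index_all_text: bool = False
    mcp_auto_index: bool = True
    mcp_watch: bool = True
    mcp_snippet_lines: int = 12

    @classmethod
    def from_env(cls) -> "MiniConfig":
        # cls.<field> is the declared default: a dataclass keeps plain defaults as
        # class attributes, so each env var falls back to the value above.
        return cls(
            index_all_text=env_bool("CODERAG_INDEX_ALL_TEXT", cls.index_all_text),
            mcp_auto_index=env_bool("CODERAG_MCP_AUTO_INDEX", cls.mcp_auto_index),
            mcp_watch=env_bool("CODERAG_MCP_WATCH", cls.mcp_watch),
            mcp_snippet_lines=env_int("CODERAG_MCP_SNIPPET_LINES", cls.mcp_snippet_lines),
        )


os.environ.update({"CODERAG_INDEX_ALL_TEXT": "1", "CODERAG_MCP_WATCH": "false"})
print(MiniConfig.from_env())
```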
diff --git a/coderag/indexer.py b/coderag/indexer.py
index e3d4fd3..0970e40 100644
--- a/coderag/indexer.py
+++ b/coderag/indexer.py
@@ -24,7 +24,7 @@
 from coderag.embeddings import EmbeddingProvider
 from coderag.store.sqlite_store import SQLiteStore
 from coderag.store.vector_index import FaissVectorIndex
-from coderag.types import IndexStats
+from coderag.types import Chunk, IndexStats
 
 logger = logging.getLogger(__name__)
 
@@ -84,17 +84,11 @@ def index(
             else:
                 work.append(item)
 
-        # 2. (Re)index changed files: remove old chunks, embed, add new ones.
-        iterator: Iterator[_Work] = iter(work)
-        if progress and work:
-            try:
-                from tqdm import tqdm
-
-                iterator = tqdm(work, desc="Indexing", unit="file")
-            except Exception:  # pragma: no cover
-                pass
-        for item in iterator:
-            added, removed = self._index_file(item)
+        # 2. (Re)index changed files. Chunking + embedding (the CPU/network cost) may run
+        #    in parallel across files (config.index_workers); the SQLite + FAISS writes
+        #    stay on this single thread to preserve the delete-before-add invariant and
+        #    the single-connection store.
+        for added, removed in self._embed_and_write(work, progress=progress):
             stats.chunks_added += added
             stats.chunks_removed += removed
             stats.files_indexed += 1
@@ -131,6 +125,8 @@ def _maybe_work(self, abs_path: Path, rel: str, language: str) -> Optional[_Work
             return None
         if len(data) > self.config.max_file_bytes or not data.strip():
             return None
+        if b"\x00" in data[:8192]:
+            return None  # binary file (NUL byte in the head) — never index as text
         content_hash = hashlib.sha256(data).hexdigest()
         existing = self.store.get_file(rel)
         if existing is not None and existing["content_hash"] == content_hash:
@@ -138,7 +134,66 @@ def _maybe_work(self, abs_path: Path, rel: str, language: str) -> Optional[_Work
         text = data.decode("utf-8", errors="replace")
         return _Work(rel, language, text, content_hash, abs_path.stat().st_mtime)
 
-    def _index_file(self, item: _Work) -> Tuple[int, int]:
+    def _embed_and_write(
+        self, work: List[_Work], *, progress: bool
+    ) -> Iterator[Tuple[int, int]]:
+        """Chunk+embed each file (optionally across worker threads) and apply the writes.
+
+        Embedding is the expensive, parallelizable step and touches no shared mutable
+        state, so it runs in a thread pool when ``index_workers > 1``. The store/FAISS
+        writes are drained here on the single calling thread, so the no-duplicate
+        (delete-before-add) invariant and the single-writer store are preserved.
+        """
+        if not work:
+            return
+        workers = max(1, self.config.index_workers)
+        bar = self._progress_bar(len(work), progress)
+        try:
+            if workers > 1 and len(work) > 1:
+                from concurrent.futures import ThreadPoolExecutor, as_completed
+
+                with ThreadPoolExecutor(max_workers=workers) as pool:
+                    futures = {pool.submit(self._prepare, item): item for item in work}
+                    for fut in as_completed(futures):
+                        chunks, vectors = fut.result()
+                        yield self._write(futures[fut], chunks, vectors)
+                        if bar is not None:
+                            bar.update(1)
+            else:
+                for item in work:
+                    chunks, vectors = self._prepare(item)
+                    yield self._write(item, chunks, vectors)
+                    if bar is not None:
+                        bar.update(1)
+        finally:
+            if bar is not None:
+                bar.close()
+
+    @staticmethod
+    def _progress_bar(total: int, progress: bool):  # type: ignore[no-untyped-def]
+        if not progress:
+            return None
+        try:
+            from tqdm import tqdm
+
+            return tqdm(total=total, desc="Indexing", unit="file")
+        except Exception:  # pragma: no cover
+            return None
+
+    def _prepare(self, item: _Work) -> Tuple[List[Chunk], Optional[np.ndarray]]:
+        """Chunk and embed a file. Pure with respect to the store/FAISS, so it is safe to
+        run in a worker thread; the resulting writes are applied by :meth:`_write`."""
+        chunks = chunk_file(item.text, item.language, self.config)
+        if not chunks:
+            return [], None
+        vectors = self.provider.embed_documents([c.text for c in chunks])
+        return chunks, vectors
+
+    def _write(
+        self, item: _Work, chunks: List[Chunk], vectors: Optional[np.ndarray]
+    ) -> Tuple[int, int]:
+        """Apply a prepared file: remove its old chunks (store + FAISS) before adding the
+        new ones. Must run single-threaded — it is the only writer."""
         removed = 0
         existing = self.store.get_file(item.rel)
         if existing is not None:
@@ -150,11 +205,9 @@ def _index_file(self, item: _Work) -> Tuple[int, int]:
             item.rel, item.language, item.content_hash, item.mtime
         )
 
-        chunks = chunk_file(item.text, item.language, self.config)
-        if not chunks:
+        if not chunks or vectors is None:
             return 0, removed
 
-        vectors = self.provider.embed_documents([c.text for c in chunks])
         new_ids = self.store.add_chunks(
             file_id, chunks, vectors, self.provider.model_id
         )
@@ -164,7 +217,7 @@ def _index_file(self, item: _Work) -> Tuple[int, int]:
     def _walk(self, target: Path, root: Path) -> Iterator[Tuple[Path, str, str]]:
         if target.is_file():
             rel = self._rel(target, root)
-            language = detect_language(target)
+            language = detect_language(target, all_text=self.config.index_all_text)
             if rel and language and not self._ignored(rel):
                 yield target, rel, language
             return
@@ -177,7 +230,7 @@ def _walk(self, target: Path, root: Path) -> Iterator[Tuple[Path, str, str]]:
                 rel = self._rel(abs_path, root)
                 if not rel or self._ignored(rel):
                     continue
-                language = detect_language(name)
+                language = detect_language(name, all_text=self.config.index_all_text)
                 if language:
                     yield abs_path, rel, language
 
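The new indexing pipeline splits the work into two parts: a parallel `_prepare` that never touches the store, and a single-threaded `_write`. Below is a minimal sketch of that shape, using simplified stand-ins. A dict replaces SQLite and FAISS, blank-line splitting replaces `chunk_file`, and a SHA-256 prefix replaces `embed_documents`. `prepare`, `index_files` and `looks_binary` are hypothetical names.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple

Store = Dict[str, List[Tuple[str, int]]]  # file -> [(chunk text, fake vector)]


def looks_binary(data: bytes, head: int = 8192) -> bool:
    """NUL-byte sniff on the first `head` bytes, as in `_maybe_work`."""
    return b"\x00" in data[:head]


def prepare(rel: str, text: str) -> Tuple[str, List[str], List[int]]:
    """Pure work, safe in a worker thread: split into blank-line-separated chunks and
    'embed' each one. A hash stands in for the real embedding model."""
    chunks = [c for c in text.split("\n\n") if c.strip()]
    vectors = [int(hashlib.sha256(c.encode()).hexdigest()[:8], 16) for c in chunks]
    return rel, chunks, vectors


def index_files(files: Dict[str, bytes], store: Store, workers: int = 4) -> int:
    """Prepare files in parallel, but apply every write on the calling thread."""
    work = {
        rel: data.decode("utf-8", errors="replace")
        for rel, data in files.items()
        if data.strip() and not looks_binary(data)
    }
    written = 0
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        futures = [pool.submit(prepare, rel, text) for rel, text in work.items()]
        for fut in as_completed(futures):  # results arrive in completion order
            rel, chunks, vectors = fut.result()
            store.pop(rel, None)  # delete-before-add: old chunks go first
            if chunks:
                store[rel] = list(zip(chunks, vectors))
            written += 1
    return written


store: Store = {"a.py": [("stale chunk", 0)]}
files = {
    "a.py": b"def f():\n    return 1\n\n\ndef g():\n    return 2\n",
    "logo.png": b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR",  # NUL bytes: skipped
    "blank.md": b"   \n",  # whitespace only: skipped
}
print(index_files(files, store), len(store["a.py"]))  # → 1 2
```

Every `fut.result()` is consumed on the caller's thread. As a result, the `store.pop`/assign pair for one file can never interleave with another file's write, whichever worker finishes first. The same property holds in `_embed_and_write`, which yields from `_write` on the calling thread.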
diff --git a/coderag/store/vector_index.py b/coderag/store/vector_index.py
index 5fa919e..5ae86a3 100644
--- a/coderag/store/vector_index.py
+++ b/coderag/store/vector_index.py
@@ -14,6 +14,7 @@
 
 import logging
 import math
+import threading
 from pathlib import Path
 from typing import TYPE_CHECKING, Iterable, Tuple
 
@@ -49,6 +50,11 @@ def __init__(self, index: faiss.Index, kind: str, config: Config, dim: int) -> N
         self.kind = kind
         self.config = config
         self.dim = dim
+        # A FAISS index is not safe for a write (add/remove/rebuild) concurrent with a
+        # read (search). The MCP server is the first surface to run the watcher (which
+        # writes) alongside live agent queries (which read), so serialize index access on
+        # a reentrant lock. Reads are fast, so contention is negligible.
+        self._lock = threading.RLock()
 
     # --- construction / persistence ---
 
@@ -76,14 +82,16 @@ def open(cls, config: Config, dim: int) -> "FaissVectorIndex":
     def save(self) -> None:
         path = self.config.faiss_path
         path.parent.mkdir(parents=True, exist_ok=True)
-        faiss.write_index(self._index, str(path))
-        Path(str(path) + ".kind").write_text(self.kind)
+        with self._lock:
+            faiss.write_index(self._index, str(path))
+            Path(str(path) + ".kind").write_text(self.kind)
 
     # --- properties ---
 
     @property
     def ntotal(self) -> int:
-        return int(self._index.ntotal)
+        with self._lock:
+            return int(self._index.ntotal)
 
     # --- mutations ---
 
@@ -92,22 +100,25 @@ def add(self, ids: np.ndarray, vectors: np.ndarray) -> None:
             return
         vecs = _normalized(vectors)
         id_arr = np.ascontiguousarray(ids, dtype="int64")
-        self._index.add_with_ids(vecs, id_arr)
+        with self._lock:
+            self._index.add_with_ids(vecs, id_arr)
 
     def remove(self, ids: Iterable[int]) -> int:
         ids = list(ids)
         if not ids:
             return 0
         selector = faiss.IDSelectorBatch(np.asarray(ids, dtype="int64"))
-        return int(self._index.remove_ids(selector))
+        with self._lock:
+            return int(self._index.remove_ids(selector))
 
     def search(self, query: np.ndarray, k: int) -> Tuple[np.ndarray, np.ndarray]:
         """Return ``(ids, scores)`` for the top-k, with FAISS ``-1`` padding stripped."""
-        if self.ntotal == 0:
-            return np.empty(0, dtype="int64"), np.empty(0, dtype="float32")
-        q = _normalized(np.asarray(query, dtype="float32").reshape(1, -1))
-        k = min(k, self.ntotal)
-        scores, ids = self._index.search(q, k)
+        with self._lock:
+            if self.ntotal == 0:
+                return np.empty(0, dtype="int64"), np.empty(0, dtype="float32")
+            q = _normalized(np.asarray(query, dtype="float32").reshape(1, -1))
+            k = min(k, self.ntotal)
+            scores, ids = self._index.search(q, k)
         ids_row, scores_row = ids[0], scores[0]
         mask = ids_row != -1
         return ids_row[mask].astype("int64"), scores_row[mask].astype("float32")
@@ -136,44 +147,50 @@ def _build_ivf(self, ids: np.ndarray, vecs: np.ndarray) -> faiss.Index:
         return index
 
     def rebuild_from_store(self, store: "SQLiteStore") -> None:
-        """Discard the current index and rebuild it from the SQLite vectors."""
-        n = store.total_chunks()
-        kind = self._choose_kind(n)
-        if n == 0:
-            self._index = self._empty_flat(self.dim)
-            self.kind = "flat"
-            self.save()
-            return
+        """Discard the current index and rebuild it from the SQLite vectors.
 
-        if kind == "ivf":
-            # IVF needs all training vectors up front.
-            all_ids, all_vecs = [], []
-            for ids, vecs in store.iter_vectors():
-                all_ids.append(ids)
-                all_vecs.append(_normalized(vecs))
-            ids = np.concatenate(all_ids)
-            vecs = np.vstack(all_vecs)
-            try:
-                self._index = self._build_ivf(ids, vecs)
-                self.kind = "ivf"
-            except Exception as exc:
-                # Degenerate corpora (too few or many duplicate vectors) can make IVF
-                # training fail; fall back to exact flat rather than aborting indexing.
-                logger.warning(
-                    "IVF training failed (%s); falling back to flat index.", exc
-                )
+        Holds the index lock for the entire rebuild (IVF training included) so a concurrent
+        search never observes a half-built index. Rebuilds are rare (model change, or the
+        one-time flat->ivf upgrade), so stalling reads while one runs is an acceptable price.
+        """
+        with self._lock:
+            n = store.total_chunks()
+            kind = self._choose_kind(n)
+            if n == 0:
+                self._index = self._empty_flat(self.dim)
+                self.kind = "flat"
+                self.save()
+                return
+
+            if kind == "ivf":
+                # IVF needs all training vectors up front.
+                all_ids, all_vecs = [], []
+                for ids, vecs in store.iter_vectors():
+                    all_ids.append(ids)
+                    all_vecs.append(_normalized(vecs))
+                ids = np.concatenate(all_ids)
+                vecs = np.vstack(all_vecs)
+                try:
+                    self._index = self._build_ivf(ids, vecs)
+                    self.kind = "ivf"
+                except Exception as exc:
+                    # Degenerate corpora (too few or many duplicate vectors) can make IVF
+                    # training fail; fall back to exact flat rather than aborting indexing.
+                    logger.warning(
+                        "IVF training failed (%s); falling back to flat index.", exc
+                    )
+                    index = self._empty_flat(self.dim)
+                    index.add_with_ids(vecs, np.ascontiguousarray(ids))
+                    self._index = index
+                    self.kind = "flat"
+            else:
                 index = self._empty_flat(self.dim)
-                index.add_with_ids(vecs, np.ascontiguousarray(ids))
+                for ids, vecs in store.iter_vectors():
+                    index.add_with_ids(_normalized(vecs), np.ascontiguousarray(ids))
                 self._index = index
                 self.kind = "flat"
-        else:
-            index = self._empty_flat(self.dim)
-            for ids, vecs in store.iter_vectors():
-                index.add_with_ids(_normalized(vecs), np.ascontiguousarray(ids))
-            self._index = index
-            self.kind = "flat"
-            logger.info("Built flat index: %d vectors", n)
-        self.save()
+                logger.info("Built flat index: %d vectors", n)
+            self.save()
 
     def ensure_consistent(self, store: "SQLiteStore") -> None:
         """Rebuild from SQLite if the cached index disagrees with the store.
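`rebuild_from_store` now takes the same lock as `search` and holds it for the whole rebuild, so a reader never sees a partially populated index. The sketch below shows that guarantee in isolation. A plain Python list stands in for the FAISS index, and `LockedIndex` / `observed_sizes` are hypothetical names, not CodeRAG's API.

```python
import threading
from typing import List, Set, Tuple


class LockedIndex:
    """Toy stand-in for a vector index: (id, score) rows guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: List[Tuple[int, float]] = []

    def search(self, k: int) -> List[Tuple[int, float]]:
        # Same lock as rebuild(), so a reader never sees a partially built index.
        with self._lock:
            if not self._rows:
                return []
            k = min(k, len(self._rows))  # mirrors `k = min(k, self.ntotal)`
            return sorted(self._rows, key=lambda row: row[1], reverse=True)[:k]

    def rebuild(self, rows: List[Tuple[int, float]]) -> None:
        # The lock covers the entire rebuild, not just the final assignment.
        with self._lock:
            self._rows.clear()
            for row in rows:
                self._rows.append(row)  # "half built" until this loop finishes


def observed_sizes(rounds: int = 200) -> Set[int]:
    """Rebuild repeatedly while another thread searches; record the sizes it saw."""
    index = LockedIndex()
    small = [(i, float(i)) for i in range(100)]
    large = [(i, float(i)) for i in range(200)]
    seen: Set[int] = set()
    done = threading.Event()

    def reader() -> None:
        while not done.is_set():
            seen.add(len(index.search(10_000)))

    t = threading.Thread(target=reader)
    t.start()
    for i in range(rounds):
        index.rebuild(small if i % 2 == 0 else large)
    done.set()
    t.join()
    return seen


print(observed_sizes())  # always a subset of {0, 100, 200}: only complete states
```

Without the lock in `search`, the reader could also observe intermediate sizes such as 37 or 150.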
diff --git a/coderag/surfaces/cli.py b/coderag/surfaces/cli.py
index aa36730..7c5456b 100644
--- a/coderag/surfaces/cli.py
+++ b/coderag/surfaces/cli.py
@@ -193,6 +193,27 @@ def cmd_serve(args: argparse.Namespace) -> int:
     return 0
 
 
+def cmd_mcp(args: argparse.Namespace) -> int:
+    try:
+        from coderag.surfaces.mcp_server import run_mcp
+    except ImportError:
+        print(
+            "The MCP server needs extra deps. Install with: pip install 'coderag[mcp]'"
+        )
+        return 1
+    cfg = _build_config(args)
+    if args.all_text:
+        cfg = cfg.with_overrides(index_all_text=True)
+    cr = CodeRAG(cfg)
+    run_mcp(
+        cr,
+        transport=args.transport,
+        auto_index=False if args.no_index else None,  # None -> config mcp_auto_index
+        watch=False if args.no_watch else None,  # None -> config mcp_watch
+    )
+    return 0
+
+
 def cmd_ui(args: argparse.Namespace) -> int:
     try:
         from coderag.surfaces.webui import run_ui
@@ -326,6 +347,36 @@ def build_parser() -> argparse.ArgumentParser:
     _add_common(p_serve)
     p_serve.set_defaults(func=cmd_serve)
 
+    p_mcp = sub.add_parser(
+        "mcp",
+        help="Run the MCP server so AI agents (Claude Code/Codex/Cursor) can search "
+        "this workspace instead of grepping.",
+    )
+    p_mcp.add_argument(
+        "--transport",
+        choices=("stdio", "sse", "streamable-http"),
+        default="stdio",
+        help="MCP transport (default stdio — how editors/agents launch servers).",
+    )
+    p_mcp.add_argument(
+        "--no-index",
+        action="store_true",
+        help="Don't index the workspace on startup; use the existing index as-is.",
+    )
+    p_mcp.add_argument(
+        "--no-watch",
+        action="store_true",
+        help="Don't keep the index live with the filesystem watcher.",
+    )
+    p_mcp.add_argument(
+        "--all-text",
+        action="store_true",
+        help="Index any text file, not just code (docs, notes, config), so a plain "
+        "directory of files is searchable too, not only a code repo.",
+    )
+    _add_common(p_mcp)
+    p_mcp.set_defaults(func=cmd_mcp)
+
     p_ui = sub.add_parser("ui", help="Launch the built-in web UI.")
     p_ui.add_argument(
         "--host",
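`run_mcp` documents `auto_index` / `watch` as "None means fall back to `mcp_auto_index` / `mcp_watch` from the config". A `--no-X` switch that maps to `False` when given and to `None` otherwise can turn a feature off without masking a config value. The sketch below mirrors the new `mcp` flags with plain `argparse`. The helpers `override` and `resolve` are hypothetical, and the config defaults are made-up example values.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the `coderag mcp` flags, for illustration only.
    parser = argparse.ArgumentParser(prog="coderag")
    sub = parser.add_subparsers(dest="command", required=True)
    p_mcp = sub.add_parser("mcp")
    p_mcp.add_argument(
        "--transport", choices=("stdio", "sse", "streamable-http"), default="stdio"
    )
    p_mcp.add_argument("--no-index", action="store_true")
    p_mcp.add_argument("--no-watch", action="store_true")
    return parser


def override(disabled: bool) -> Optional[bool]:
    # A --no-X switch can only turn a feature off; when absent it defers (None).
    return False if disabled else None


def resolve(value: Optional[bool], config_default: bool) -> bool:
    # Same rule as run_mcp: None means "use the config value".
    return config_default if value is None else value


args = build_parser().parse_args(["mcp", "--no-watch"])
auto_index = resolve(override(args.no_index), config_default=False)  # e.g. env says false
watch = resolve(override(args.no_watch), config_default=True)
print(args.transport, auto_index, watch)  # stdio False False
```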
diff --git a/coderag/surfaces/mcp_server.py b/coderag/surfaces/mcp_server.py
new file mode 100644
index 0000000..5ae10cd
--- /dev/null
+++ b/coderag/surfaces/mcp_server.py
@@ -0,0 +1,272 @@
+"""MCP server surface (optional ``[mcp]`` extra).
+
+Exposes CodeRAG to AI coding agents (Claude Code, Codex, Cursor, …) over the Model
+Context Protocol so they can query a **warm, pre-indexed workspace** instead of running
+slow, repeated grep/glob/read loops. The server is a persistent process: the embedding
+model loads once at startup and every query is then a single in-process call (FAISS
+vector search + BM25, fused with RRF), so retrieval is cheaper and faster than an
+agent's multi-round shell search. That speed-up is the whole point of this surface.
+
+Design: like the other surfaces (``cli``/``http_api``/``webui``), this is a thin adapter
+over the :class:`coderag.api.CodeRAG` facade. Heavy imports (the ``mcp`` SDK) live inside
+the functions so importing this module stays cheap and the ``[mcp]`` extra is only needed
+to actually run it. The four tools route entirely through existing facade methods.
+
+Note: this module intentionally does NOT use ``from __future__ import annotations`` — the
+MCP SDK introspects the tools' real type hints to generate their input/output schemas.
+"""
+
+import logging
+import threading
+from typing import TYPE_CHECKING, List, Literal, Optional
+
+if TYPE_CHECKING:
+    from mcp.server.fastmcp import FastMCP
+
+    from coderag.api import CodeRAG
+    from coderag.types import SearchHit
+
+logger = logging.getLogger(__name__)
+
+_INSTRUCTIONS = (
+    "CodeRAG indexes this workspace for fast semantic + keyword search. Prefer the "
+    "search_code tool over grep/glob/read loops to find code or text by meaning or by "
+    "identifier — it returns ranked results with exact path:line locations in one call. "
+    "Then use get_file to read a precise range. Call index_status to check freshness."
+)
+
+
+class _State:
+    """Mutable server state shared between the tools and the background threads."""
+
+    def __init__(self) -> None:
+        self.indexing = False  # True while the initial/manual index runs
+        self.stop = threading.Event()  # set on shutdown to stop the watcher thread
+
+
+def _truncate(text: str, max_lines: int) -> "tuple[str, bool]":
+    lines = text.splitlines()
+    if max_lines <= 0 or len(lines) <= max_lines:
+        return text, False
+    return "\n".join(lines[:max_lines]) + "\n…", True
+
+
+def _format_hit(hit: "SearchHit", snippet_lines: int, full_text: bool) -> dict:
+    """Compact, token-cheap projection of a SearchHit for an agent.
+
+    Collapses the line range into a ``path:start-end`` location and truncates the snippet
+    by default — the agent reads the location and calls ``get_file`` only for the chunk it
+    actually wants, which is what makes this cheaper than dumping whole files.
+    """
+    if full_text:
+        snippet, truncated = hit.text, False
+    else:
+        snippet, truncated = _truncate(hit.text, snippet_lines)
+    return {
+        "location": f"{hit.path}:{hit.start_line}-{hit.end_line}",
+        "symbol": hit.symbol,
+        "kind": hit.kind,
+        "language": hit.language,
+        "score": round(hit.score, 4),
+        "similarity": round(hit.similarity, 4),
+        "snippet": snippet,
+        "truncated": truncated,
+    }
+
+
+def _filter_hits(
+    hits: "List[SearchHit]",
+    *,
+    language: Optional[str],
+    path_prefix: Optional[str],
+    kind: Optional[str],
+) -> "List[SearchHit]":
+    """Best-effort post-filter (the searcher itself has no filters)."""
+    out = hits
+    if language:
+        lang = language.lower()
+        out = [h for h in out if h.language.lower() == lang]
+    if kind:
+        want = kind.lower()
+        out = [h for h in out if h.kind.lower() == want]
+    if path_prefix:
+        out = [h for h in out if h.path.startswith(path_prefix)]
+    return out
+
+
+def _status_word(state: _State) -> str:
+    return "in_progress" if state.indexing else "ready"
+
+
+def build_mcp(cr: "CodeRAG", *, state: Optional[_State] = None) -> "FastMCP":
+    """Build the FastMCP server with CodeRAG's tools wired to the facade.
+
+    Pure construction (no indexing, no transport), so tests can drive the tools in-memory.
+    """
+    from mcp.server.fastmcp import FastMCP
+
+    state = state or _State()
+    snippet_lines = cr.config.mcp_snippet_lines
+    mcp = FastMCP("coderag", instructions=_INSTRUCTIONS)
+
+    @mcp.tool()
+    def search_code(
+        query: str,
+        top_k: int = 8,
+        language: Optional[str] = None,
+        path_prefix: Optional[str] = None,
+        kind: Optional[str] = None,
+        full_text: bool = False,
+    ) -> dict:
+        """Search the indexed workspace by meaning AND keyword (hybrid retrieval).
+
+        Use this INSTEAD of grep/glob/read loops to locate code or text: one fast call
+        returns the most relevant chunks with exact ``path:start-end`` locations. Works for
+        conceptual questions ("where is retry/backoff handled?") and exact identifiers
+        alike. Snippets are truncated by default to stay token-cheap — pass
+        ``full_text=true`` for the whole chunk, or call ``get_file`` for a precise range.
+
+        Args:
+            query: Natural-language question, or a code snippet/identifier to find.
+            top_k: Maximum number of results to return (default 8).
+            language: Restrict to one language tag (e.g. "python", "typescript").
+            path_prefix: Restrict to paths starting with this prefix (e.g. "src/").
+            kind: Restrict to a chunk kind ("function", "class", "method", "window").
+            full_text: Return each chunk's full text instead of a truncated snippet.
+        """
+        if language or path_prefix or kind:
+            # The searcher can't filter, so pull a deeper pool and filter post-hoc.
+            pool = max(top_k * 5, cr.config.fetch_k)
+            hits = _filter_hits(
+                cr.search(query, top_k=pool),
+                language=language,
+                path_prefix=path_prefix,
+                kind=kind,
+            )[:top_k]
+        else:
+            hits = cr.search(query, top_k=top_k)
+        return {
+            "query": query,
+            "count": len(hits),
+            "indexing": _status_word(state),
+            "results": [_format_hit(h, snippet_lines, full_text) for h in hits],
+        }
+
+    @mcp.tool()
+    def get_file(
+        path: str,
+        start_line: Optional[int] = None,
+        end_line: Optional[int] = None,
+    ) -> dict:
+        """Return the exact contents of an INDEXED file, optionally a 1-based line range.
+
+        Pair with search_code: take a result's path and line range to read precise context.
+        Only files that are in the index can be read (so this can't fetch arbitrary files
+        like .env). Returns ``{"error": ...}`` if the path isn't indexed or escapes the
+        workspace root, rather than failing the call.
+        """
+        try:
+            content = cr.get_file(path, start_line, end_line)
+        except (ValueError, FileNotFoundError) as exc:
+            return {"error": str(exc), "path": path}
+        return {
+            "path": path,
+            "start_line": start_line,
+            "end_line": end_line,
+            "content": content,
+        }
+
+    @mcp.tool()
+    def index_status() -> dict:
+        """Report index coverage, freshness, and the active retrieval configuration.
+
+        Includes total_files / total_chunks, the embedding model, whether the reranker is
+        enabled, and ``"indexing": "ready" | "in_progress"`` so you can tell whether the
+        initial background index has finished. If results look thin, the index may still be
+        warming up — check here.
+        """
+        status = cr.status()
+        status["indexing"] = _status_word(state)
+        return status
+
+    @mcp.tool()
+    def reindex(path: Optional[str] = None, full: bool = False) -> dict:
+        """Re-index the workspace now (incremental by default).
+
+        Rarely needed — the watcher keeps the index live automatically — but useful right
+        after a large checkout or branch switch. Pass ``full=true`` for a clean rebuild.
+        Returns the index stats, or ``{"error": ...}`` if an index run is already going.
+        """
+        if state.indexing:
+            return {"error": "An index operation is already in progress"}
+        state.indexing = True
+        try:
+            stats = cr.index(path, full=full)
+        finally:
+            state.indexing = False
+        return stats.as_dict()
+
+    return mcp
+
+
+def _warm_up(cr: "CodeRAG") -> None:
+    """Load the engine + embedding model once at startup, not on the first query."""
+    try:
+        cr.status()  # builds provider/store/vectors
+        cr.provider.embed_query("warm up")  # loads the model so the first real query is fast
+    except Exception:  # pragma: no cover - warm-up is best-effort
+        logger.exception("MCP warm-up failed (continuing).")
+
+
+def run_mcp(
+    cr: "CodeRAG",
+    *,
+    transport: Literal["stdio", "sse", "streamable-http"] = "stdio",
+    auto_index: Optional[bool] = None,
+    watch: Optional[bool] = None,
+) -> None:
+    """Run the MCP server: warm up, (background) index, watch, then serve.
+
+    ``transport`` defaults to ``stdio`` — how Claude Code / Codex / Cursor launch servers.
+    ``auto_index`` / ``watch`` default to the config (``mcp_auto_index`` / ``mcp_watch``).
+    """
+    from coderag.watch import watch as watch_loop
+
+    auto_index = cr.config.mcp_auto_index if auto_index is None else auto_index
+    do_watch = cr.config.mcp_watch if watch is None else watch
+
+    state = _State()
+    mcp = build_mcp(cr, state=state)
+
+    _warm_up(cr)
+
+    if auto_index:
+        # Index on a background thread so stdio is responsive immediately; search_code
+        # works against whatever is already indexed while this runs.
+        state.indexing = True
+
+        def _initial_index() -> None:
+            try:
+                cr.index()
+            except Exception:  # pragma: no cover - defensive
+                logger.exception("Initial MCP index failed.")
+            finally:
+                state.indexing = False
+
+        threading.Thread(
+            target=_initial_index, name="coderag-mcp-index", daemon=True
+        ).start()
+
+    if do_watch:
+        threading.Thread(
+            target=watch_loop,
+            args=(cr,),
+            kwargs={"stop_event": state.stop},
+            name="coderag-mcp-watch",
+            daemon=True,
+        ).start()
+
+    try:
+        mcp.run(transport=transport)
+    finally:
+        state.stop.set()
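`search_code` cannot push `language` / `path_prefix` / `kind` filters into the searcher. Instead it over-fetches `max(top_k * 5, fetch_k)` hits and filters them afterwards, and `_format_hit` trims each snippet with `_truncate`. The sketch below shows both steps self-contained. `Hit`, `filtered_search` and `fake_search` are hypothetical stand-ins, not CodeRAG's types. It also shows the best-effort caveat: a match ranked below the over-fetched pool is never seen.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Hit:
    # Hypothetical stand-in carrying only the SearchHit fields used here.
    path: str
    language: str
    text: str


def truncate(text: str, max_lines: int) -> Tuple[str, bool]:
    # Same rule as _truncate: keep the first max_lines lines and mark the cut with "…".
    lines = text.splitlines()
    if max_lines <= 0 or len(lines) <= max_lines:
        return text, False
    return "\n".join(lines[:max_lines]) + "\n…", True


def filtered_search(
    search: Callable[[str, int], List[Hit]],
    query: str,
    *,
    top_k: int,
    fetch_k: int,
    language: Optional[str] = None,
    path_prefix: Optional[str] = None,
) -> List[Hit]:
    # The searcher has no filters, so over-fetch a deeper pool and filter post hoc.
    pool = max(top_k * 5, fetch_k)
    hits = search(query, pool)
    if language:
        hits = [h for h in hits if h.language.lower() == language.lower()]
    if path_prefix:
        hits = [h for h in hits if h.path.startswith(path_prefix)]
    return hits[:top_k]  # can be shorter than top_k: the filter is best effort


# In this toy corpus, 25 Python chunks rank above 5 TypeScript chunks.
corpus = [Hit(f"src/m{i}.py", "python", "a\nb\nc") for i in range(25)]
corpus += [Hit(f"web/m{i}.ts", "typescript", "x") for i in range(5)]


def fake_search(query: str, k: int) -> List[Hit]:
    return corpus[:k]  # pretend the corpus is already ranked best-first


print(len(filtered_search(fake_search, "q", top_k=2, fetch_k=20, language="typescript")))  # 0
print(len(filtered_search(fake_search, "q", top_k=8, fetch_k=20, language="typescript")))  # 5
```

With `top_k=2` the pool is 20 hits, all Python, so the TypeScript filter comes back empty. With `top_k=8` the pool grows to 40 and reaches them.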
diff --git a/coderag/watch.py b/coderag/watch.py
index 931e5fd..ee744ba 100644
--- a/coderag/watch.py
+++ b/coderag/watch.py
@@ -11,7 +11,7 @@
 import threading
 import time
 from pathlib import Path
-from typing import TYPE_CHECKING, Set
+from typing import TYPE_CHECKING, Optional, Set
 
 from watchdog.events import FileSystemEvent, FileSystemEventHandler
 from watchdog.observers import Observer
@@ -25,12 +25,15 @@
 
 
 class _Handler(FileSystemEventHandler):
-    def __init__(self, pending: Set[str], lock: threading.Lock) -> None:
+    def __init__(
+        self, pending: Set[str], lock: threading.Lock, all_text: bool = False
+    ) -> None:
         self._pending = pending
         self._lock = lock
+        self._all_text = all_text
 
     def _note(self, path: str) -> None:
-        if path and detect_language(path):
+        if path and detect_language(path, all_text=self._all_text):
             with self._lock:
                 self._pending.add(path)
 
@@ -52,19 +55,27 @@ def on_moved(self, event: FileSystemEvent) -> None:
             self._note(str(getattr(event, "dest_path", "")))
 
 
-def watch(cr: "CodeRAG", debounce: float = 0.5) -> None:
-    """Block, keeping ``cr``'s index in sync with its watched directory until Ctrl-C."""
+def watch(
+    cr: "CodeRAG",
+    debounce: float = 0.5,
+    stop_event: Optional[threading.Event] = None,
+) -> None:
+    """Keep ``cr``'s index in sync with its watched directory.
+
+    Blocks until Ctrl-C, or until ``stop_event`` is set — which lets the watcher run on a
+    background thread (e.g. inside the MCP server) and be shut down cleanly.
+    """
     root = cr.config.watched_dir
     pending: Set[str] = set()
     lock = threading.Lock()
-    handler = _Handler(pending, lock)
+    handler = _Handler(pending, lock, all_text=cr.config.index_all_text)
     observer = Observer()
     observer.schedule(handler, str(root), recursive=True)
     observer.start()
     logger.info("Watching %s for changes (Ctrl-C to stop)...", root)
 
     try:
-        while True:
+        while stop_event is None or not stop_event.is_set():
             time.sleep(debounce)
             with lock:
                 batch = set(pending)
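The new `stop_event` lets `watch()` run on a daemon thread and exit once the event is set, after its current debounce sleep. The sketch below is a reduced version of that loop. The diff does not show what happens after the batch is copied, so clearing the pending set and passing the batch to a callback are assumptions here. `watch_loop` is a hypothetical name.

```python
import threading
import time
from typing import Callable, List, Optional, Set


def watch_loop(
    pending: Set[str],
    lock: threading.Lock,
    on_batch: Callable[[Set[str]], None],
    debounce: float = 0.05,
    stop_event: Optional[threading.Event] = None,
) -> None:
    # Drain the pending set every `debounce` seconds until stop_event is set
    # (or forever when stop_event is None, the old Ctrl-C-only behaviour).
    while stop_event is None or not stop_event.is_set():
        time.sleep(debounce)
        with lock:
            batch = set(pending)
            pending.clear()  # assumption: the drained paths are removed here
        if batch:
            on_batch(batch)  # assumption: stands in for re-indexing the batch


pending: Set[str] = set()
lock = threading.Lock()
batches: List[Set[str]] = []
stop = threading.Event()

with lock:
    pending.update({"a.py", "b.py"})  # two quick edits coalesce into one batch

t = threading.Thread(
    target=watch_loop,
    args=(pending, lock, batches.append),
    kwargs={"stop_event": stop},
    daemon=True,
)
t.start()
time.sleep(0.2)
stop.set()
t.join(timeout=1.0)  # returns promptly: the loop re-checks the event each tick
```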
diff --git a/example.env b/example.env
index 8f829b1..2accbfa 100644
--- a/example.env
+++ b/example.env
@@ -24,6 +24,26 @@ CODERAG_WATCHED_DIR=/path/to/your/codebase
 # --- Retrieval ---
 # CODERAG_TOP_K=8
 
+# --- Indexing throughput ---
+# Number of worker threads for chunking + embedding during indexing. Values >1
+# parallelize the embed step, a big win for the OpenAI/remote providers. For the local
+# fastembed default, ONNX already uses multiple cores per call, so the main lever there
+# is the batch size below. Set to 1 to force fully serial indexing.
+# CODERAG_WORKERS=4
+# CODERAG_EMBED_BATCH=64
+
+# --- MCP server surface (`coderag mcp`, install: pip install 'coderag[mcp]') ---
+# Lets AI coding agents (Claude Code, Codex, Cursor) query this workspace instead of
+# grepping. By default it indexes the watched dir on startup (in the background) and
+# keeps it live via the watcher.
+# CODERAG_MCP_AUTO_INDEX=true
+# CODERAG_MCP_WATCH=true
+# Lines of a chunk returned in a search_code snippet by default (full text on request).
+# CODERAG_MCP_SNIPPET_LINES=12
+# Index any UTF-8 text file, not just code (docs/notes/config, extensionless files) so a
+# plain file directory becomes searchable. Binary files are always skipped.
+# CODERAG_INDEX_ALL_TEXT=false
+
 # --- Optional: AI model platforms (only needed for `--provider openai` or LLM answers) ---
 # CodeRAG runs fully local by default. Configure one of the platforms below only if you
 # want OpenAI embeddings, an LLM-generated answer (`coderag search ... --answer`), or a
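The `CODERAG_WORKERS` / `CODERAG_EMBED_BATCH` comments describe two levers: how many texts go into each embed call, and how many batches run concurrently on worker threads. The sketch below illustrates that idea under stated assumptions. `embed_all` and `fake_embed` are hypothetical, and the real indexer pipeline may be organized differently. Threads mainly pay off when `embed_batch` waits on I/O, as with a remote API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]],
    workers: int = 4,
    batch_size: int = 64,
) -> List[Vector]:
    # Split into batches of `batch_size` (cf. CODERAG_EMBED_BATCH) and embed them on
    # `workers` threads (cf. CODERAG_WORKERS); workers=1 runs fully serially.
    batches = [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]
    if workers <= 1:
        results = [embed_batch(b) for b in batches]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(embed_batch, batches))  # map() keeps input order
    return [vec for batch in results for vec in batch]


def fake_embed(batch: Sequence[str]) -> List[Vector]:
    return [[float(len(t))] for t in batch]  # stand-in for a real embedding call


vectors = embed_all(["x" * n for n in range(1, 151)], fake_embed, workers=4, batch_size=64)
print(len(vectors))  # 150
```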
diff --git a/pyproject.toml b/pyproject.toml
index fc5268b..ccad06a 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -39,6 +39,11 @@ ui = [
     "jinja2>=3.1.6,<4",
     "pygments>=2.20.0,<3",
 ]
+# MCP server surface: lets AI coding agents (Claude Code, Codex, Cursor) use CodeRAG's
+# index as their retrieval tool instead of slow grep/glob/read loops.
+mcp = [
+    "mcp>=1.9.0,<2",
+]
 openai = [
     "openai>=2.41.1,<3",
 ]
@@ -50,6 +55,7 @@ all = [
     "uvicorn[standard]>=0.49.0,<1",
     "jinja2>=3.1.6,<4",
     "pygments>=2.20.0,<3",
+    "mcp>=1.9.0,<2",
     "openai>=2.41.1,<3",
     "anthropic>=0.109.2,<1",
 ]
diff --git a/scripts/bench_vs_grep.py b/scripts/bench_vs_grep.py
new file mode 100644
index 0000000..20c64c0
--- /dev/null
+++ b/scripts/bench_vs_grep.py
@@ -0,0 +1,276 @@
+#!/usr/bin/env python
+r"""Benchmark CodeRAG's indexed search against a raw grep baseline.
+
+This makes the headline claim measurable: a warm, pre-indexed workspace answers a query
+faster (one in-process call vs many ripgrep invocations) and more accurately on conceptual
+/ natural-language queries than the agentic grep loop that tools like Claude Code and Codex
+fall back to. It reuses the eval harness so the accuracy numbers are directly comparable to
+``coderag eval``.
+
+    python scripts/bench_vs_grep.py \
+        --watched-dir . \
+        --dataset coderag/eval/datasets/coderag_self.jsonl
+
+What it reports, both for CodeRAG (hybrid retrieval) and for grep:
+  * accuracy   — recall@k / nDCG@k / MRR at the file level (via coderag.eval.evaluate)
+  * latency    — mean / p50 / p95 wall-clock per query
+  * context    — approximate tokens needed to surface the top-k context (CodeRAG returns
+                 compact chunks; the grep baseline must read whole matched files)
+
+The grep baseline models the agent's behaviour: extract salient terms from the query, run
+ripgrep for each, rank files by match frequency — i.e. the floor that semantic search is
+meant to beat. As the project's own strategy notes, grep wins on exact identifiers and
+persistent edit tasks; CodeRAG's edge is conceptual queries on larger repos plus BM25 for
+identifiers, with no code leaving the machine.
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+import re
+import subprocess  # nosec B404 — benchmarking against the ripgrep CLI is the whole point
+import time
+from collections import Counter
+from pathlib import Path
+from statistics import mean
+from typing import Callable, List, Sequence
+
+from coderag.api import CodeRAG
+from coderag.config import Config
+from coderag.eval import EvalCase, evaluate, load_dataset
+from coderag.eval.harness import EvalResult, format_table
+from coderag.types import SearchHit
+
+# Tiny stopword set so grep searches for content terms, not glue words.
+_STOP = {
+    "the",
+    "and",
+    "for",
+    "with",
+    "that",
+    "this",
+    "from",
+    "into",
+    "are",
+    "was",
+    "use",
+    "add",
+    "fix",
+    "when",
+    "where",
+    "what",
+    "how",
+    "does",
+    "should",
+    "make",
+    "now",
+}
+_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{2,}")
+
+
+def _query_terms(query: str, limit: int = 8) -> List[str]:
+    """Salient search terms from a natural-language query (what an agent would grep for)."""
+    seen: List[str] = []
+    for tok in _TOKEN.findall(query):
+        low = tok.lower()
+        if low in _STOP or low in {t.lower() for t in seen}:
+            continue
+        seen.append(tok)
+        if len(seen) >= limit:
+            break
+    return seen
+
+
+def make_grep_search(root: Path) -> Callable[[str, int], List[SearchHit]]:
+    """A grep-backed retriever with the harness's ``(query, k) -> hits`` signature."""
+
+    def search(query: str, k: int) -> List[SearchHit]:
+        terms = _query_terms(query)
+        if not terms:
+            return []
+        counts: Counter = Counter()
+        for term in terms:
+            try:
+                proc = subprocess.run(  # nosec B603,B607 — fixed argv, no shell
+                    [
+                        "rg",
+                        "--count-matches",
+                        "--no-messages",
+                        "-i",
+                        "-e",
+                        term,
+                        str(root),
+                    ],
+                    capture_output=True,
+                    text=True,
+                    timeout=30,
+                )
+            except (FileNotFoundError, subprocess.TimeoutExpired):
+                continue
+            for line in proc.stdout.splitlines():
+                path, _, num = line.rpartition(":")
+                if not path:
+                    continue
+                try:
+                    counts[path] += int(num)
+                except ValueError:
+                    # Not a "path:count" line (unexpected rg output); skip it rather
+                    # than fail the whole query. rpartition already copes with ':' in paths.
+                    continue
+        hits: List[SearchHit] = []
+        for abs_path, score in counts.most_common(k):
+            rel = os.path.relpath(abs_path, root)
+            hits.append(
+                SearchHit(
+                    chunk_id=0,
+                    path=Path(rel).as_posix(),
+                    symbol=None,
+                    kind="window",
+                    language="",
+                    start_line=1,
+                    end_line=1,
+                    text="",
+                    score=float(score),
+                    similarity=0.0,
+                )
+            )
+        return hits
+
+    return search
+
+
+def _timed(
+    fn: Callable[[str, int], List[SearchHit]], sink: List[float]
+) -> Callable[[str, int], List[SearchHit]]:
+    def wrapped(query: str, k: int) -> List[SearchHit]:
+        start = time.perf_counter()
+        try:
+            return fn(query, k)
+        finally:
+            sink.append(time.perf_counter() - start)
+
+    return wrapped
+
+
+def _percentile(values: Sequence[float], pct: float) -> float:
+    if not values:
+        return 0.0
+    ordered = sorted(values)
+    idx = min(len(ordered) - 1, int(round((pct / 100.0) * (len(ordered) - 1))))
+    return ordered[idx]
+
+
+def _fmt_ms(values: Sequence[float]) -> str:
+    if not values:
+        return "n/a"
+    return (
+        f"mean {mean(values) * 1000:7.1f}  "
+        f"p50 {_percentile(values, 50) * 1000:7.1f}  "
+        f"p95 {_percentile(values, 95) * 1000:7.1f}  (ms)"
+    )
+
+
+def _file_chars(path: Path) -> int:
+    try:
+        return len(path.read_text(encoding="utf-8", errors="replace"))
+    except OSError:
+        return 0
+
+
+def _context_tokens(
+    cr: CodeRAG,
+    grep_search: Callable[[str, int], List[SearchHit]],
+    cases: Sequence[EvalCase],
+    root: Path,
+    top_k: int,
+) -> tuple[int, int]:
+    """Approximate tokens (~chars/4) to surface top-k context for each retriever.
+
+    CodeRAG returns the matched chunks; the grep baseline must read whole matched files —
+    which is the token-cost argument in favour of indexed retrieval.
+    """
+    cr_tokens = grep_tokens = 0
+    for case in cases:
+        cr_tokens += sum(len(h.text) for h in cr.search(case.query, top_k)) // 4
+        grep_tokens += (
+            sum(_file_chars(root / h.path) for h in grep_search(case.query, top_k)) // 4
+        )
+    return cr_tokens, grep_tokens
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser(description=__doc__)
+    ap.add_argument("--watched-dir", default=".", help="Repo/directory to search.")
+    ap.add_argument(
+        "--dataset",
+        default="coderag/eval/datasets/coderag_self.jsonl",
+        help="JSONL query -> relevant_files dataset (file level).",
+    )
+    ap.add_argument(
+        "--store-dir", default=None, help="Index location (default ./.coderag)."
+    )
+    ap.add_argument("--model", default="BAAI/bge-small-en-v1.5")
+    ap.add_argument("--ks", default="1,5,10")
+    ap.add_argument(
+        "--top-k", type=int, default=10, help="k for latency/token sampling."
+    )
+    ap.add_argument("--no-index", action="store_true", help="Reuse the existing index.")
+    args = ap.parse_args()
+
+    root = Path(args.watched_dir).expanduser().resolve()
+    ks = tuple(int(k) for k in args.ks.split(","))
+    cases = load_dataset(args.dataset)
+    if not cases:
+        raise SystemExit(f"No eval cases in {args.dataset}.")
+
+    cfg = Config.from_env(
+        provider="fastembed",
+        model=args.model,
+        watched_dir=root,
+        store_dir=Path(args.store_dir).expanduser()
+        if args.store_dir
+        else root / ".coderag",
+    )
+    cr = CodeRAG(cfg)
+    if not args.no_index:
+        stats = cr.index()
+        print(f"Indexed {stats.total_files} files / {stats.total_chunks} chunks.\n")
+
+    grep_search = make_grep_search(root)
+
+    cr_times: List[float] = []
+    grep_times: List[float] = []
+    results: List[EvalResult] = [
+        evaluate(
+            _timed(cr.search, cr_times),
+            cases,
+            label="coderag (hybrid)",
+            ks=ks,
+            level="file",
+        ),
+        evaluate(
+            _timed(grep_search, grep_times), cases, label="grep", ks=ks, level="file"
+        ),
+    ]
+
+    cr_tok, grep_tok = _context_tokens(cr, grep_search, cases, root, args.top_k)
+
+    print(f"Accuracy ({len(cases)} cases, file level)\n")
+    print(format_table(results))
+    print("\nLatency per query")
+    print(f"  coderag (1 warm call)   : {_fmt_ms(cr_times)}")
+    print(
+        f"  grep (~{mean(len(_query_terms(c.query)) for c in cases):.1f} rg calls/query) : {_fmt_ms(grep_times)}"
+    )
+    print(f"\nApprox context tokens for top-{args.top_k} (sum over cases)")
+    print(f"  coderag (compact chunks): {cr_tok:>9,}")
+    print(f"  grep (read whole files) : {grep_tok:>9,}")
+    if cr_tok:
+        print(f"  -> grep needs ~{grep_tok / max(cr_tok, 1):.1f}x the context tokens")
+    cr.close()
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
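`make_grep_search` ranks files by summing ripgrep's per-file match counts over the query terms. The sketch below isolates that ranking step and runs offline: it is fed canned `rg --count-matches` output instead of calling `rg`. `rank_files` and the sample paths are hypothetical.

```python
import os
from collections import Counter
from pathlib import Path
from typing import Dict, List, Tuple


def rank_files(rg_outputs: Dict[str, str], root: str, k: int) -> List[Tuple[str, int]]:
    """Rank files by total matches across terms, from `rg --count-matches` stdout."""
    counts: Counter = Counter()
    for stdout in rg_outputs.values():
        for line in stdout.splitlines():
            # rpartition splits on the LAST ':', so a path that itself contains ':'
            # still yields the right count.
            path, _, num = line.rpartition(":")
            if not path or not num.isdigit():
                continue  # not a "path:count" line
            counts[path] += int(num)
    return [
        (Path(os.path.relpath(p, root)).as_posix(), n) for p, n in counts.most_common(k)
    ]


outputs = {
    "retry": "/repo/net/client.py:4\n/repo/docs/a:b.md:1\n",
    "backoff": "/repo/net/client.py:2\n/repo/net/util.py:3\n",
}
print(rank_files(outputs, "/repo", k=2))  # [('net/client.py', 6), ('net/util.py', 3)]
```

Ranking by raw match frequency is the "floor" the docstring describes: it has no notion of meaning, only of how often the query's words appear.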
diff --git a/tests/test_mcp.py b/tests/test_mcp.py
new file mode 100644
index 0000000..daace4a
--- /dev/null
+++ b/tests/test_mcp.py
@@ -0,0 +1,264 @@
+"""Tests for the MCP server surface (all offline, with the fake provider).
+
+Drives the FastMCP tools in-memory via ``call_tool`` (no subprocess), mirroring the
+HTTP-surface tests. Also covers the two things the MCP server newly stresses: parallel
+indexing correctness and search staying safe while the index is being written.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import re
+import threading
+
+import pytest
+
+pytest.importorskip("mcp")  # skip the whole module if the [mcp] extra isn't installed
+
+from coderag.api import CodeRAG  # noqa: E402
+from coderag.config import Config  # noqa: E402
+from coderag.surfaces.mcp_server import _State, _warm_up, build_mcp  # noqa: E402
+from tests.conftest import write  # noqa: E402
+
+DEMO = {
+    "auth.py": (
+        "def authenticate(token):\n"
+        "    '''verify a token with retry/backoff'''\n"
+        "    return token == 'ok'\n"
+    ),
+    "math.ts": "export function add(a: number, b: number) {\n  return a + b;\n}\n",
+}
+
+
+def _make(tmp_path, files, **cfg):
+    """Build an indexed CodeRAG + MCP server over ``files`` with the fake provider."""
+    repo = tmp_path / "repo"
+    store = tmp_path / "store"
+    for name, body in files.items():
+        write(repo / name, body)
+    cr = CodeRAG(Config(provider="fake", watched_dir=repo, store_dir=store, **cfg))
+    cr.index()
+    state = _State()
+    return cr, build_mcp(cr, state=state), state, repo
+
+
+def _call(mcp, name, args):
+    """Invoke a tool and parse its JSON text content into a dict."""
+    res = asyncio.run(mcp.call_tool(name, args))
+    content = res[0] if isinstance(res, tuple) else res
+    return json.loads(content[0].text)
+
+
+# --- tool surface ---
+
+
+def test_tools_are_registered(tmp_path):
+    cr, mcp, _, _ = _make(tmp_path, DEMO)
+    names = {t.name for t in asyncio.run(mcp.list_tools())}
+    assert names == {"search_code", "get_file", "index_status", "reindex"}
+    cr.close()
+
+
+def test_search_code_returns_compact_locations(tmp_path):
+    cr, mcp, _, _ = _make(tmp_path, DEMO)
+    r = _call(mcp, "search_code", {"query": "authenticate token", "top_k": 5})
+    assert r["count"] >= 1
+    assert r["indexing"] == "ready"
+    hit = r["results"][0]
+    # Compact shape: path:start-end location, a snippet, and no heavy full-text field.
+    assert re.match(r".+:\d+-\d+$", hit["location"])
+    assert "snippet" in hit and "text" not in hit
+    assert {"symbol", "kind", "language", "score", "similarity"} <= hit.keys()
+    cr.close()
+
+
+def test_snippet_truncated_unless_full_text(tmp_path):
+    body = (
+        "def big_function():\n"
+        + "".join(f"    step_{i} = {i}\n" for i in range(40))
+        + "    return 'x39 done'\n"
+    )
+    cr, mcp, _, _ = _make(tmp_path, {"big.py": body})
+    q = {"query": "big_function step_39 x39", "top_k": 5}
+    hit = next(
+        h for h in _call(mcp, "search_code", q)["results"] if "big.py" in h["location"]
+    )
+    assert hit["truncated"] is True and "…" in hit["snippet"]
+
+    full = next(
+        h
+        for h in _call(mcp, "search_code", {**q, "full_text": True})["results"]
+        if "big.py" in h["location"]
+    )
+    assert full["truncated"] is False and "step_39" in full["snippet"]
+    cr.close()
+
+
+def test_search_code_filters(tmp_path):
+    cr, mcp, _, _ = _make(tmp_path, DEMO)
+
+    r = _call(
+        mcp, "search_code", {"query": "add", "top_k": 10, "language": "typescript"}
+    )
+    assert r["results"] and all(h["language"] == "typescript" for h in r["results"])
+
+    r = _call(
+        mcp, "search_code", {"query": "function", "top_k": 10, "path_prefix": "math"}
+    )
+    assert all(h["location"].startswith("math.ts") for h in r["results"])
+
+    r = _call(
+        mcp, "search_code", {"query": "anything", "top_k": 10, "language": "rust"}
+    )
+    assert r["results"] == []  # no rust files indexed
+    cr.close()
+
+
+def test_get_file_range_and_structured_errors(tmp_path):
+    cr, mcp, _, _ = _make(tmp_path, DEMO)
+
+    r = _call(mcp, "get_file", {"path": "auth.py", "start_line": 1, "end_line": 1})
+    assert r["content"] == "def authenticate(token):"
+
+    # Errors are returned as content, not raised — so the agent gets a usable message.
+    assert "error" in _call(mcp, "get_file", {"path": "../../etc/passwd"})
+    assert "error" in _call(mcp, "get_file", {"path": "not_indexed.py"})
+    cr.close()
+
+
+def test_index_status_reports_totals_and_flag(tmp_path):
+    cr, mcp, state, _ = _make(tmp_path, DEMO)
+    r = _call(mcp, "index_status", {})
+    assert r["total_files"] == 2
+    assert r["total_chunks"] == cr.vectors.ntotal
+    assert r["indexing"] == "ready"
+
+    state.indexing = True
+    assert _call(mcp, "index_status", {})["indexing"] == "in_progress"
+    cr.close()
+
+
+def test_reindex_picks_up_new_file_and_guards_concurrency(tmp_path):
+    cr, mcp, state, repo = _make(tmp_path, DEMO)
+    write(repo / "extra.py", "def extra():\n    return 1\n")
+    r = _call(mcp, "reindex", {})
+    assert r["total_files"] == 3
+    assert cr.store.total_chunks() == cr.vectors.ntotal
+
+    state.indexing = True  # a run already in progress -> guarded
+    assert "error" in _call(mcp, "reindex", {})
+    cr.close()
+
+
+def test_warm_up_is_safe(tmp_path):
+    cr, _, _, _ = _make(tmp_path, DEMO)
+    _warm_up(cr)  # must not raise (exercised here with the fake provider)
+    cr.close()
+
+
+# --- all-text (general file-directory) indexing ---
+
+
+def test_all_text_indexes_text_and_skips_binary(tmp_path):
+    files = {
+        "notes.log": "deployment runbook: restart the scheduler service\n",
+        "Dockerfile": "FROM python:3.11\nRUN pip install coderag\n",
+        "data.bin": "head\x00\x01\x02tail binary blob\n",  # NUL byte -> binary
+    }
+
+    # Default (code-oriented): unknown .log is skipped; Dockerfile is a known text name.
+    cr, _, _, _ = _make(tmp_path / "a", files)
+    paths = set(cr.store.all_file_paths())
+    assert "notes.log" not in paths
+    assert "Dockerfile" in paths
+    cr.close()
+
+    # all_text: arbitrary text becomes searchable; binary is still rejected.
+    cr, _, _, _ = _make(tmp_path / "b", files, index_all_text=True)
+    paths = set(cr.store.all_file_paths())
+    assert "notes.log" in paths
+    assert "data.bin" not in paths
+    cr.close()
+
+
+# --- parallel indexing correctness & concurrency safety ---
+
+
+def test_parallel_indexing_matches_serial(tmp_path):
+    files = {
+        f"m{i}.py": (
+            f"def f{i}(x):\n    return x + {i}\n\n"
+            f"class C{i}:\n    def m(self):\n        return {i}\n"
+        )
+        for i in range(8)
+    }
+
+    def build(workers, sub):
+        cr = CodeRAG(
+            Config(
+                provider="fake",
+                watched_dir=tmp_path / "repo",
+                store_dir=tmp_path / sub,
+                index_workers=workers,
+            )
+        )
+        # Both builds share watched_dir; rewriting identical files is idempotent.
+        for name, body in files.items():
+            write(tmp_path / "repo" / name, body)
+        stats = cr.index()
+        out = (
+            stats.total_chunks,
+            cr.store.total_chunks(),
+            cr.vectors.ntotal,
+            sorted(cr.store.all_file_paths()),
+        )
+        cr.close()
+        return out
+
+    serial = build(1, "store_serial")
+    parallel = build(4, "store_parallel")
+    assert serial[1] == parallel[1] > 0  # identical chunk count
+    assert serial[1] == serial[2] and parallel[1] == parallel[2]  # store == FAISS
+    assert serial[3] == parallel[3]  # identical file set
+
+
+def test_search_is_safe_during_concurrent_indexing(tmp_path):
+    repo = tmp_path / "repo"
+    for i in range(25):
+        write(repo / f"f{i}.py", "def g():\n    return 'token retry backoff'\n")
+    cr = CodeRAG(
+        Config(provider="fake", watched_dir=repo, store_dir=tmp_path / "store")
+    )
+    cr.index()
+
+    errors: list[Exception] = []
+    stop = threading.Event()
+
+    def hammer_search():
+        try:
+            while not stop.is_set():
+                cr.search("token retry backoff", top_k=5)
+        except Exception as exc:  # pragma: no cover - failure path
+            errors.append(exc)
+
+    t = threading.Thread(target=hammer_search)
+    t.start()
+    try:
+        # Re-index (FAISS add/remove) while searches (FAISS reads) run concurrently.
+        for _ in range(3):
+            for i in range(25, 45):
+                write(repo / f"f{i}.py", "def g():\n    return 'more tokens here'\n")
+            cr.index()
+            for i in range(25, 45):
+                (repo / f"f{i}.py").unlink()
+            cr.index()
+    finally:
+        stop.set()
+        t.join(timeout=5)
+
+    assert not t.is_alive(), "search thread did not stop"
+    assert not errors, errors
+    # Invariant holds after the race: store and FAISS chunk counts agree.
+    assert cr.store.total_chunks() == cr.vectors.ntotal
+    cr.close()