Skip to content

Latest commit

 

History

History
232 lines (176 loc) · 10.9 KB

File metadata and controls

232 lines (176 loc) · 10.9 KB

Configuration & local models

CodeRAG runs fully locally with no API key out of the box. Everything below is optional — reach for it only when you want to change a default, scale up, or turn on the optional LLM answer. Settings come from CODERAG_* environment variables (or a .env file — copy example.env and edit). A few CLI flags (--watched-dir, --store-dir, --provider, --model) override the matching variable for a single command.

The two model roles (read this first)

CodeRAG uses models in two independent places. Keeping them straight removes most of the confusion about "OpenAI or Anthropic":

Role What it does Default Required?
Embedding model Turns code + your query into vectors — this is the search. Local ONNX bge-small via fastembed. Always used. Local, no key.
Answer LLM Optional. Writes a grounded, cited prose answer from the retrieved chunks (coderag search --answer, the UI's answer box). Off. Only if you want generated answers.

Search works with only the embedding model. The answer LLM is a separate, optional layer — and it, too, can be a local model. You never need a cloud account to use CodeRAG.

Recipes — pick the one that matches you

1. Fully local, zero config (default)

Local embeddings, no API key, no answer LLM. Just index and search:

pip install -e .
coderag index --watched-dir /path/to/repo
coderag search "where is retry/backoff handled" --watched-dir /path/to/repo

Nothing to configure. Want a stronger (still local, still keyless) embedding model? See Local embedding models below.

2. Fully local with generated answers (Ollama / LM Studio / vLLM)

Add a --answer that's written by a local LLM. Any OpenAI-compatible server works — you point CodeRAG at it with OPENAI_BASE_URL and name the model with CODERAG_CHAT_MODEL. The answer backend stays openai (it means "the OpenAI protocol", not the OpenAI company); no API key is needed for a local server.

# 1. Run a local model server, e.g. Ollama:
ollama serve
ollama pull llama3.1

# 2. Point CodeRAG at it (in your shell or .env):
export OPENAI_BASE_URL=http://localhost:11434/v1   # Ollama's OpenAI-compatible endpoint
export CODERAG_CHAT_MODEL=llama3.1                  # the model name your server serves

# 3. Search with a locally-generated answer:
coderag search "how is the vector index persisted" --answer

Other local servers expose the same OpenAI-compatible API — only the base URL and model name change:

Server Typical OPENAI_BASE_URL Notes
Ollama http://localhost:11434/v1 ollama pull <model>; key ignored.
LM Studio http://localhost:1234/v1 Start the local server from the app.
vLLM http://localhost:8000/v1 vllm serve <model>; OpenAI-compatible.
LocalAI http://localhost:8080/v1 Drop-in OpenAI replacement.

A local server usually ignores the API key, so you can leave OPENAI_API_KEY unset — CodeRAG sends a harmless placeholder. Set it only if your server enforces one.

3. OpenAI cloud (embeddings and/or answers)

pip install -e ".[openai]"
export OPENAI_API_KEY=sk-...
# Cloud embeddings (optional — the local default is fine for most repos):
coderag index --provider openai            # uses CODERAG_OPENAI_MODEL (text-embedding-3-small)
# Cloud answer:
coderag search "where is auth handled" --answer    # uses CODERAG_CHAT_MODEL (gpt-4o-mini)

4. Anthropic (Claude) answers

The answer LLM can be Claude instead of an OpenAI-compatible model. (Embeddings still come from the local default or OpenAI — Anthropic is answer-only here.)

pip install -e ".[anthropic]"
export CODERAG_LLM_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
coderag search "where is auth handled" --answer    # uses CODERAG_ANTHROPIC_MODEL

5. Local embeddings via an OpenAI-compatible server (advanced)

Beyond the built-in fastembed models, you can serve embeddings from a local OpenAI-compatible endpoint (e.g. Ollama embedding models, text-embeddings-inference). Use the openai provider pointed at your local URL:

export OPENAI_BASE_URL=http://localhost:11434/v1
export CODERAG_OPENAI_MODEL=nomic-embed-text
coderag index --provider openai

Changing the embedding model (its dimension) triggers a one-time index rebuild — that's expected and safe (the LanceDB store is rebuildable — re-indexing recreates it from source).

Local embedding models

The default BAAI/bge-small-en-v1.5 is the smallest/fastest; larger local models score better on code retrieval. List the curated, keyless options:

coderag eval --list-models

Then select one (downloaded once, cached under CODERAG_CACHE_DIR):

export CODERAG_MODEL=jinaai/jina-embeddings-v2-base-code   # best out-of-the-box local code retriever
coderag index --full

Want sharper top results without a cloud call? Turn on the optional local cross-encoder reranker (no key, ~30 ms/query): export CODERAG_RERANK=1.

Full settings reference

Grouped by area. Default is what you get with the variable unset; everything here is optional.

Embeddings (the search model)

Variable Default Meaning
CODERAG_PROVIDER fastembed Embedding backend: fastembed (local) · openai (OpenAI API or any OpenAI-compatible/local server) · fake (deterministic, tests).
CODERAG_MODEL BAAI/bge-small-en-v1.5 Local fastembed model id (coderag eval --list-models).
CODERAG_OPENAI_MODEL text-embedding-3-small Embedding model when provider=openai (cloud or local server).
CODERAG_CACHE_DIR ~/.cache/coderag Where downloaded local models are cached.
CODERAG_EMBED_BATCH 64 Texts embedded per batch during indexing.

Answers (the optional LLM)

Variable Default Meaning
CODERAG_LLM_PROVIDER openai Answer backend: openai (OpenAI API or any OpenAI-compatible/local server) · anthropic.
CODERAG_CHAT_MODEL gpt-4o-mini Chat model for the openai backend — set to your local model name (e.g. llama3.1) when using OPENAI_BASE_URL.
OPENAI_API_KEY OpenAI key. Needed for the OpenAI cloud (embeddings or answers); optional for a local server.
OPENAI_BASE_URL Point the OpenAI client at a self-hosted/local server (Ollama, vLLM, LM Studio, LocalAI). Enables local embeddings and local answers.
ANTHROPIC_API_KEY Anthropic (Claude) key — required when CODERAG_LLM_PROVIDER=anthropic.
CODERAG_ANTHROPIC_MODEL claude-opus-4-8 Claude model for the anthropic answer backend.
CODERAG_ANSWER_MAX_TOKENS 1024 Max tokens generated per answer.

Locations

Variable Default Meaning
CODERAG_WATCHED_DIR cwd Codebase to index/search.
CODERAG_STORE_DIR <watched-dir>/.coderag Where the LanceDB store lives. Derived from the watched dir (not the cwd) so the index is found no matter where you run coderag from.
CODERAG_INDEX_ALL_TEXT false Index any UTF-8 text file (docs/config/extensionless), not just code. Binary files are always skipped.

Retrieval & quality

Variable Default Meaning
CODERAG_TOP_K 8 Results returned.
CODERAG_FETCH_K 50 Candidates pulled from each retriever before fusion.
CODERAG_RRF_K 60 Reciprocal Rank Fusion constant.
CODERAG_DENSE_WEIGHT 1.0 Weight of dense (vector) results in fusion.
CODERAG_LEXICAL_WEIGHT 1.0 Weight of BM25 (keyword) results in fusion.
CODERAG_RERANK false Enable the optional local cross-encoder reranker (no key; ~30 ms/query).
CODERAG_RERANK_MODEL Xenova/ms-marco-MiniLM-L-12-v2 Local reranker model (coderag eval --list-models).
CODERAG_RERANK_CANDIDATES 50 Fused hits reranked before trimming to top_k.
CODERAG_ADAPTIVE_FUSION false Tilt fusion weights by query type (dense-up for NL, BM25-up for code).
CODERAG_NL_DENSE_WEIGHT / CODERAG_NL_LEXICAL_WEIGHT 1.0 / 0.4 Adaptive weights for natural-language queries.
CODERAG_CODE_DENSE_WEIGHT / CODERAG_CODE_LEXICAL_WEIGHT 1.0 / 1.0 Adaptive weights for identifier/code queries.
CODERAG_GRAPH_EXPANSION false Enrich the pool with 1-hop call-graph neighbors of the top hits — the definitions of what each hit calls (its callees). No key, no schema change. Small, consistent symbol-level lift across flask/requests/click.
CODERAG_GRAPH_SEEDS 5 Top fused hits to expand callees from.
CODERAG_GRAPH_NEIGHBORS 5 Max new callee definitions pulled per seed.
CODERAG_GRAPH_WEIGHT 0.15 Fusion weight of the neighbor list (down-weighted). 0.15 was a strict Pareto improvement across the eval repos; higher trades rank precision for marginal recall.

Scale & throughput

Variable Default Meaning
CODERAG_WORKERS 4 Worker threads for chunking + embedding (1 = serial; a big lever for remote/OpenAI embeddings).

HTTP API server (coderag serve)

Variable Default Meaning
CODERAG_API_KEY If set, the HTTP API requires it (Authorization: Bearer <key> or X-API-Key). Always set it when the server is reachable beyond localhost.
CODERAG_CORS_ORIGINS Comma-separated CORS allowlist (never *). Empty ⇒ no cross-origin browser access.

MCP server (coderag mcp)

Variable Default Meaning
CODERAG_MCP_AUTO_INDEX true Index the watched dir on startup (in the background).
CODERAG_MCP_WATCH true Keep the index live via the filesystem watcher.
CODERAG_MCP_SNIPPET_LINES 12 Lines of a chunk returned in a search_code snippet by default.

Demo mode (public, untrusted UI)

Variable Default Meaning
CODERAG_DEMO_MODE false Show a notice, hide Reindex, cap results, and rate-limit answers per browser session.
CODERAG_DEMO_MAX_ANSWERS 5 LLM answers allowed per browser session.
CODERAG_DEMO_COOLDOWN_SECONDS 20 Minimum seconds between answers in a session.

Setting configuration from Python

The same options are fields on the immutable Config object, so a library caller doesn't need environment variables:

from coderag import CodeRAG, Config

# Local embeddings + a local LLM for answers, no env vars:
cfg = Config.from_env(
    watched_dir="/path/to/repo",
    openai_base_url="http://localhost:11434/v1",
    chat_model="llama3.1",
)
cr = CodeRAG(cfg)
cr.index()

Config.from_env(**overrides) reads the environment/.env first, then applies the keyword overrides last — handy for tests and embedding CodeRAG in your own tools.