2 changes: 1 addition & 1 deletion .github/workflows/ci-tests.yml
@@ -39,7 +39,7 @@ jobs:
           enable-cache: true

       - name: Install
-        run: uv pip install --system -e ".[dev,server,openai]"
+        run: uv pip install --system -e ".[dev,server,ui,openai]"

       - name: Lint (ruff)
         run: ruff check .
4 changes: 2 additions & 2 deletions .github/workflows/docker-beta.yml
@@ -2,7 +2,7 @@ name: Docker (beta)

 # Build the container images on every PR (validation only, no push) and publish
 # the beta channel to GHCR on every push to master. Two images are built from one
-# Dockerfile: the HTTP API (:beta) and the Streamlit UI (:beta-ui).
+# Dockerfile: the HTTP API (:beta) and the web UI (:beta-ui).
 on:
   push:
     branches: [master]
@@ -53,7 +53,7 @@ jobs:
         include:
           - target: server # HTTP/REST API
             suffix: ""
-          - target: ui # Streamlit UI
+          - target: ui # Web UI
             suffix: "-ui"
     steps:
       - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
3 changes: 0 additions & 3 deletions .gitignore
@@ -38,9 +38,6 @@ node_modules/
 # Ignore Git directory
 .git/

-# Ignore Streamlit temporary files
-.streamlit/
-
 # Ignore logs and temporary files
 *.log
 *.tmp
2 changes: 1 addition & 1 deletion AGENTS.md
@@ -8,7 +8,7 @@
 - `coderag/store/`: `sqlite_store.py` (source of truth + FTS5) and `vector_index.py` (FAISS Flat/IVF cache).
 - `coderag/retrieval/`: Hybrid dense + BM25 search fused with RRF.
 - `coderag/indexer.py`, `coderag/watch.py`: Incremental indexing and the debounced watcher.
-- `coderag/surfaces/`: `cli.py`, `http_api.py` (FastAPI), `streamlit_app.py` — thin adapters over the facade.
+- `coderag/surfaces/`: `cli.py`, `http_api.py` (FastAPI), `webui.py` — thin adapters over the facade.
 - `tests/`: pytest suite (offline by default via the `fake` provider; real model behind `-m integration`).
 - `example.env` → copy to `.env`; CI lives in `.github/`.

2 changes: 1 addition & 1 deletion DEVELOPMENT.md
@@ -30,7 +30,7 @@ coderag/
 │   ├── sqlite_store.py # files/chunks/vectors + FTS5 lexical search
 │   └── vector_index.py # FaissVectorIndex: Flat (exact) / IVF (scale)
 ├── retrieval/ # Hybrid search: dense + BM25, fused with RRF
-└── surfaces/ # cli.py · http_api.py (FastAPI) · streamlit_app.py
+└── surfaces/ # cli.py · http_api.py (FastAPI) · webui.py
```

### Design invariants (don't break these)
14 changes: 6 additions & 8 deletions Dockerfile
@@ -2,7 +2,7 @@

 # CodeRAG container images — two build targets share one base:
 # docker build --target server -t coderag . # HTTP/REST API (port 8000)
-# docker build --target ui -t coderag-ui . # Streamlit UI (port 8501)
+# docker build --target ui -t coderag-ui . # Web UI (port 8501)
 # Published to GHCR as :beta / :beta-ui by .github/workflows/docker-beta.yml.

 ARG PYTHON_VERSION=3.12
@@ -47,18 +47,16 @@ HEALTHCHECK --interval=30s --timeout=5s --start-period=20s --retries=3 \
 ENTRYPOINT ["coderag"]
 CMD ["serve", "--host", "0.0.0.0", "--port", "8000"]

-# ---------- Streamlit UI image ----------
+# ---------- Web UI image ----------
 FROM base AS ui
 # Include the LLM answer backends (openai covers self-hosted OpenAI-compatible
 # servers like Ollama/vLLM too) so the UI's "Generate LLM answer" works.
 RUN uv pip install --system --no-cache ".[ui,openai,anthropic]"
-# `coderag ui` shells out to `streamlit run`; configure the server via env vars.
-ENV STREAMLIT_SERVER_ADDRESS=0.0.0.0 \
-    STREAMLIT_SERVER_PORT=8501 \
-    STREAMLIT_SERVER_HEADLESS=true \
-    STREAMLIT_BROWSER_GATHER_USAGE_STATS=false
+# `coderag ui` serves the FastAPI/Jinja UI via uvicorn; host/port come from env.
+ENV CODERAG_UI_HOST=0.0.0.0 \
+    CODERAG_UI_PORT=8501
 USER coderag
 EXPOSE 8501
 HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
-    CMD ["python", "-c", "import sys,urllib.request as u; sys.exit(0 if u.urlopen('http://127.0.0.1:8501/_stcore/health').status==200 else 1)"]
+    CMD ["python", "-c", "import sys,urllib.request as u; sys.exit(0 if u.urlopen('http://127.0.0.1:8501/healthz').status==200 else 1)"]
 CMD ["coderag", "ui"]
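Review note: the new HEALTHCHECK one-liner is dense, so here is a rough unrolled equivalent for readers of the diff. It is a sketch only. `is_healthy` and `probe_exit_code` are hypothetical names, the 5-second timeout simply mirrors `--timeout=5s`, and the only detail taken from the PR is the `/healthz` URL on port 8501.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only if `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        # (HTTPError is a subclass of URLError).
        return False


def probe_exit_code(url: str = "http://127.0.0.1:8501/healthz") -> int:
    """Map the probe result to the 0/1 exit code Docker expects."""
    return 0 if is_healthy(url) else 1
```

The one-liner in the Dockerfile behaves the same way in practice: an unreachable server or an error status makes `urlopen` raise, and the uncaught exception ends the interpreter with a non-zero exit code.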
18 changes: 11 additions & 7 deletions README.md
@@ -31,15 +31,15 @@ codebases**. Use it from the **CLI**, embed it as a **Python library**, self-hos
- **Hybrid retrieval.** Dense vector search **+** BM25 keyword search, fused with Reciprocal Rank Fusion. Great at both "what does this *mean*" and exact-identifier lookups.
- **Incremental & live.** Content-hashed indexing only re-embeds files that changed; a debounced watcher keeps the index current as you code. No duplicate or stale vectors.
- **Built to scale.** Exact `Flat` search for small repos, automatic switch to approximate `IVF` past a threshold so it stays fast at 100k+ chunks.
-- **Four surfaces, one engine.** CLI · Python library · HTTP/REST · Streamlit UI — all thin wrappers over the same `CodeRAG` object.
+- **Four surfaces, one engine.** CLI · Python library · HTTP/REST · web UI — all thin wrappers over the same `CodeRAG` object.

## 🚀 Quick start

```bash
pip install -e . # core engine (local embeddings included)
# optional extras:
pip install -e ".[server]" # HTTP/REST API
-pip install -e ".[ui]" # Streamlit web UI
+pip install -e ".[ui]" # built-in web UI (FastAPI + Jinja + Pygments)
pip install -e ".[openai]" # OpenAI (or self-hosted OpenAI-compatible) embeddings / answers
pip install -e ".[anthropic]" # Anthropic (Claude) LLM answers
pip install -e ".[all]" # everything above
@@ -107,9 +107,13 @@ Self-host it once and point any number of custom apps or teammates at a big shar

### Web UI (`coderag ui`)

-Streamlit app: search box, retrieved chunks with `path:line` citations and similarity
-scores, a one-click **Reindex** button, and an optional streamed LLM answer (when an
-OpenAI/Anthropic key or a self-hosted endpoint is configured).
+A built-in, server-rendered web UI (FastAPI + Jinja, syntax highlighting via Pygments):
+a search box with language/kind/path filters, results with `path:line` citations and
+similarity scores, an in-browser **file viewer** (cited lines highlighted), a **file
+browser**, index status, a one-click **Reindex**, and an optional streamed LLM answer
+(when an OpenAI/Anthropic key or a self-hosted endpoint is configured). It is progressively
+enhanced — every page works with JavaScript disabled, and there's no CDN/runtime network
+dependency, so it stays local-first.

## 🐳 Docker (beta)

@@ -128,7 +132,7 @@ curl "localhost:8000/search?q=where%20is%20retry%20handled&k=5"
```

```bash
-# Streamlit UI on :8501
+# Web UI on :8501
docker run --rm -p 8501:8501 \
-v "$PWD:/workspace:ro" -v coderag-index:/data \
ghcr.io/neverdecel/coderag:beta-ui
@@ -237,7 +241,7 @@ Apache License 2.0 — see [LICENSE](LICENSE-2.0.txt).

[FAISS](https://github.com/facebookresearch/faiss) · [fastembed](https://github.com/qdrant/fastembed) ·
[tree-sitter](https://tree-sitter.github.io/tree-sitter/) · [FastAPI](https://fastapi.tiangolo.com/) ·
-[Streamlit](https://streamlit.io/) · [watchdog](https://github.com/gorakhargosh/watchdog)
+[Jinja](https://jinja.palletsprojects.com/) · [Pygments](https://pygments.org/) · [watchdog](https://github.com/gorakhargosh/watchdog)

---

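Review note: the README context in this diff says dense and BM25 results are "fused with Reciprocal Rank Fusion" without showing what that means. The sketch below is the textbook form of RRF, not code from this repository. `rrf_fuse`, the example chunk ids, and `k = 60` (the constant commonly used with RRF) are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def rrf_fuse(rankings: Iterable[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each list adds 1 / (k + rank) to an item's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk ids in `path:line` form.
dense = ["a.py:10", "b.py:42", "c.py:7"]   # vector-search order
bm25 = ["b.py:42", "d.py:3", "a.py:10"]    # keyword-search order
fused = rrf_fuse([dense, bm25])
```

Because only ranks are used, the dense similarity scores and BM25 scores never have to be put on a common scale, which is the usual reason for choosing RRF in a hybrid retriever.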
16 changes: 16 additions & 0 deletions coderag/store/sqlite_store.py
@@ -118,6 +118,22 @@ def all_file_paths(self) -> List[str]:
             rows = self._conn.execute("SELECT path FROM files").fetchall()
         return [r["path"] for r in rows]

+    def distinct_languages(self) -> List[str]:
+        """Languages present in the index, sorted — used to populate UI filters."""
+        with self._lock:
+            rows = self._conn.execute(
+                "SELECT DISTINCT language FROM chunks ORDER BY language"
+            ).fetchall()
+        return [r["language"] for r in rows]
+
+    def distinct_kinds(self) -> List[str]:
+        """Chunk kinds present in the index, sorted — used to populate UI filters."""
+        with self._lock:
+            rows = self._conn.execute(
+                "SELECT DISTINCT kind FROM chunks ORDER BY kind"
+            ).fetchall()
+        return [r["kind"] for r in rows]
+
     def upsert_file(
         self, path: str, language: str, content_hash: str, mtime: float
     ) -> int:
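Review note: a small sketch of what the two new store methods return, run against an in-memory database. The two-column `chunks` table is a stand-in for the real schema, which this diff does not show. `distinct_values`, the column allow-list, and the `IS NOT NULL` filter are hypothetical additions and are not part of the PR.

```python
import sqlite3
import threading
from typing import List

# Column names cannot be bound as SQL parameters, so restrict them explicitly.
_ALLOWED_COLUMNS = {"language", "kind"}


def distinct_values(
    conn: sqlite3.Connection, lock: threading.Lock, column: str
) -> List[str]:
    """Sorted distinct non-NULL values of one allow-listed `chunks` column."""
    if column not in _ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column!r}")
    with lock:
        rows = conn.execute(
            f"SELECT DISTINCT {column} FROM chunks "
            f"WHERE {column} IS NOT NULL ORDER BY {column}"
        ).fetchall()
    return [r[column] for r in rows]


# Assumed minimal schema, for demonstration only.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE chunks (language TEXT, kind TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [("python", "function"), ("go", "function"), ("python", "class"), (None, "module")],
)
lock = threading.Lock()
languages = distinct_values(conn, lock, "language")  # ["go", "python"]
kinds = distinct_values(conn, lock, "kind")          # ["class", "function", "module"]
```

Without the `IS NOT NULL` filter, SQLite would sort a NULL first and the list would start with `None`. Whether `chunks.language` or `chunks.kind` can be NULL in the real schema is not visible here, so treat that filter as optional.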
2 changes: 1 addition & 1 deletion coderag/surfaces/__init__.py
@@ -1 +1 @@
-"""User-facing surfaces: CLI, HTTP server, and Streamlit UI — all thin over the facade."""
+"""User-facing surfaces: CLI, HTTP server, and web UI — all thin over the facade."""
37 changes: 21 additions & 16 deletions coderag/surfaces/cli.py
@@ -8,6 +8,7 @@
 import argparse
 import json
 import logging
+import os
 import sys
 import textwrap
 from pathlib import Path
@@ -113,25 +114,21 @@ def cmd_serve(args: argparse.Namespace) -> int:


 def cmd_ui(args: argparse.Namespace) -> int:
-    import subprocess
-
-    app = Path(__file__).with_name("streamlit_app.py")
     try:
-        return subprocess.call(
-            ["streamlit", "run", str(app), "--", *_passthrough(args)]
-        )
-    except FileNotFoundError:
-        print("Streamlit is not installed. Install with: pip install 'coderag[ui]'")
+        from coderag.surfaces.webui import run_ui
+    except ImportError:
+        print("The web UI needs extra deps. Install with: pip install 'coderag[ui]'")
         return 1
+    cr = CodeRAG(_build_config(args))
+    host = args.host or os.getenv("CODERAG_UI_HOST") or "127.0.0.1"
+    port = args.port if args.port is not None else _env_port("CODERAG_UI_PORT", 8501)
+    run_ui(cr, host=host, port=port)
+    return 0


-def _passthrough(args: argparse.Namespace) -> List[str]:
-    out: List[str] = []
-    if getattr(args, "watched_dir", None):
-        out += ["--watched-dir", str(args.watched_dir)]
-    if getattr(args, "store_dir", None):
-        out += ["--store-dir", str(args.store_dir)]
-    return out
+def _env_port(key: str, default: int) -> int:
+    raw = os.getenv(key)
+    return int(raw) if raw and raw.isdigit() else default


 # --- parser ---
@@ -197,7 +194,15 @@ def build_parser() -> argparse.ArgumentParser:
     _add_common(p_serve)
     p_serve.set_defaults(func=cmd_serve)

-    p_ui = sub.add_parser("ui", help="Launch the Streamlit web UI.")
+    p_ui = sub.add_parser("ui", help="Launch the built-in web UI.")
+    p_ui.add_argument(
+        "--host",
+        default=None,
+        help="Bind address (default 127.0.0.1 / CODERAG_UI_HOST).",
+    )
+    p_ui.add_argument(
+        "--port", type=int, default=None, help="Port (default 8501 / CODERAG_UI_PORT)."
+    )
     _add_common(p_ui)
     p_ui.set_defaults(func=cmd_ui)

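Review note: the new `cmd_ui` resolves host and port in the order CLI flag, then environment variable, then built-in default. The sketch below isolates that precedence so it can be exercised without argparse or a running server. `resolve_bind` and `env_port` are hypothetical names, and the environment is passed in as a mapping instead of being read through `os.getenv`. One deliberate difference from the PR is the use of `str.isdecimal()` in place of `str.isdigit()`.

```python
from typing import Mapping, Optional, Tuple


def env_port(env: Mapping[str, str], key: str, default: int) -> int:
    """Parse a port from `env[key]`, falling back to `default` if unset or malformed."""
    raw = env.get(key)
    # isdecimal() accepts only characters that int() can parse; isdigit() also
    # accepts characters such as "²", which int() rejects with ValueError.
    return int(raw) if raw and raw.isdecimal() else default


def resolve_bind(
    host_arg: Optional[str], port_arg: Optional[int], env: Mapping[str, str]
) -> Tuple[str, int]:
    """Precedence: explicit CLI value, then CODERAG_UI_* variable, then default."""
    host = host_arg or env.get("CODERAG_UI_HOST") or "127.0.0.1"
    port = port_arg if port_arg is not None else env_port(env, "CODERAG_UI_PORT", 8501)
    return host, port


container_env = {"CODERAG_UI_HOST": "0.0.0.0", "CODERAG_UI_PORT": "8501"}
local_default = resolve_bind(None, None, {})               # ("127.0.0.1", 8501)
in_container = resolve_bind(None, None, container_env)     # ("0.0.0.0", 8501)
flag_wins = resolve_bind("10.0.0.5", 9000, container_env)  # ("10.0.0.5", 9000)
```

This matches the Dockerfile change in the same PR: the image sets `CODERAG_UI_HOST=0.0.0.0` so the UI is reachable from outside the container, while a bare `coderag ui` on a workstation stays bound to loopback. A malformed `CODERAG_UI_PORT` such as `80a` or `-1` falls back to 8501 silently in both the PR and this sketch.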