AI-powered codebase intelligence tool
Point it at any repo → understand the entire codebase in hours, not weeks
Features • Demo • Frontend • Local Setup • Cloud • MCP • API • Architecture • Contributing
Codewalk analyzes any codebase and gives you:
- Module detection — groups files into logical modules automatically
- Dependency graph — extracts every import/require → builds the full dependency map
- Blast radius — "if I change this file, what breaks?"
- Reading order — optimal file reading sequence (dependencies first)
- Execution flow — entry points, module-to-module and file-to-file dependency flow
- AI chat — ask anything about the code, powered by RAG + tool-calling agent
- Code review — review git diffs for bugs, security issues, and style (context-enriched, OWASP-focused)
- Incremental reindex — re-embed only changed files using content hash comparison
- Graph intelligence — DuckDB + igraph: symbol-level call graph, cycle detection, centrality analysis, import chain tracing
- Corrective RAG — distance-based chunk filtering + LLM answer grading + query rewriting for higher quality answers
- Voice interface — talk to your codebase hands-free: mic → transcribe → Copilot routes → speak answer
Three ways to use it locally, plus optional cloud indexing:
| Interface | Best for |
|---|---|
| Web UI (Next.js) | Visual exploration — Knowledge Graph UI, diagrams, module browser, blast radius viewer |
| MCP Server | VS Code Copilot, Claude Code, Cursor — AI agents use tools directly |
| REST API | Scripts, CI/CD, custom integrations |
Cloud (optional): Push to GitHub → Codewalk Cloud indexes on the server → MCP downloads the index and queries locally. See Cloud Deployment.
🎙️ Voice is available via both MCP (codewalk_voice_ask + codewalk_speak) and the REST API (POST /voice/ask): ask questions by speaking and hear answers read aloud.
| Scenario | How Codewalk helps |
|---|---|
| New dev joins the team | Point Codewalk at the repo → get an overview, module map, and reading order. Self-onboard in hours instead of weeks of "hey, can you explain this?" |
| LLM token costs are high | Without RAG, the LLM needs your entire codebase in context — slow and expensive. Codewalk embeds code into a vector DB and retrieves only the relevant chunks per query. Faster answers, fraction of the tokens. |
| Senior dev switches modules | You know the auth module but now need to work on payments. Get module info, blast radius, and execution flow without bugging the payments team. |
| Before a refactor | Check blast radius before touching shared code. "If I change base_model.py, what breaks?" — get the answer before you break prod. |
| PR reviews | Run codewalk_run_review (MCP) or POST /review (API) — automated multi-stage review with OWASP security checks, blast radius warnings, and team guidelines matching. MCP mode returns enriched context so the calling model (Claude/GPT) performs the review directly — no separate LLM needed. |
| Documentation is outdated | Codewalk analyzes the actual code, not stale wiki pages. Always up to date. |
| Feature | Description |
|---|---|
| 🔍 Module Detection | Auto-groups files into packages/modules by directory structure |
| 🕸️ Dependency Graph | Parses imports across 15+ languages via tree-sitter |
| 💥 Blast Radius | BFS on reversed dependency graph → shows transitive impact of any change |
| 📖 Reading Order | Topological sort → "read config.py before embedder.py because embedder imports config" |
| 🔄 Execution Flow | Entry points, module/file dependency chains, Mermaid diagrams |
| 🤖 AI Chat | LangGraph agent with 7 tools, multi-turn conversation with memory |
| 🔎 Semantic Search | ChromaDB vector search on embedded code chunks (RAG) |
| 🔬 Code Review | Multi-stage review pipeline: test coverage, blast radius, guidelines RAG, context-enriched deep analysis |
| 🔄 Incremental Reindex | Content hash comparison — only re-embeds changed files, skips unchanged |
| 🧩 MCP Server | 38 MCP tools for VS Code Copilot / Claude Code / Cursor / Codex |
| 🎙️ Voice Interface | Talk to your codebase — mic recording, local STT (faster-whisper), agent-driven routing (MCP + API), TTS response |
| 🔬 Graph Intelligence | DuckDB persistent graph + igraph C-speed traversal: cycle detection, centrality, import chain tracing |
| 🧬 Corrective RAG | Distance-based chunk filtering (free) + LLM answer grading + query rewriting for reliable answers |
| 📦 Parent-Child Chunking | Full functions stored as parents, sub-chunks searched — retrieve complete context on match |
| ⚡ Parallel Embedding | Producer-consumer pipeline — CPU chunking overlaps with GPU embedding |
| 🏗️ Multi-Provider LLM | Ollama (local), OpenAI, Anthropic, Groq, Gemini, OpenRouter, DeepSeek |
| 📚 Doc Indexing | Index team docs (.md, .pdf, .txt) — search and ask questions with source citations |
| 🔄 Reflection | Actor→Critic→Improve loop used by deep research to refine cross-cutting reports |
| 🧑💻 Human-in-the-Loop | Approval gate before any code/file modification — LangGraph checkpoint + interrupt |
| 🔬 Deep Research | Fan-out parallel search → merge → synthesize → reflect for complex cross-cutting questions |
| 🏗️ Architecture Health | Graph stats, bottleneck files (betweenness centrality), PageRank, cycle detection with fix suggestions |
| 🌐 15+ Languages | Python, JS, TS, Java, Go, Rust, Ruby, PHP, C#, C++, C, Dart, Kotlin, Swift, YAML |
| Language | Extensions | Tree-sitter Parsing | Import Extraction |
|---|---|---|---|
| Python | .py | ✅ | ✅ |
| JavaScript | .js, .jsx | ✅ | ✅ |
| TypeScript | .ts, .tsx | ✅ | ✅ |
| Java | .java | ✅ | ✅ |
| Go | .go | ✅ | ✅ |
| Rust | .rs | ✅ | ✅ |
| Ruby | .rb | ✅ | ✅ |
| PHP | .php | ✅ | ✅ |
| C# | .cs | ✅ | ✅ |
| C++ | .cpp | ✅ | ✅ |
| C | .c | ✅ | ✅ |
| Kotlin | .kt | ✅ | ✅ |
| Swift | .swift | ✅ | ✅ |
| Dart | .dart | ✅ (optional install) | ✅ |
| YAML | .yaml, .yml | — | — |
| JSON | .json | — | — |
| TOML | .toml | — | — |
| Markdown | .md | — | — |
Tree-sitter parsing = extracts functions, classes, and methods for accurate chunking and function explanations.
Import extraction = builds the dependency graph, blast radius, and reading order.
Languages without tree-sitter support still get indexed via text splitting — they work with semantic search and AI chat, just without function-level granularity.
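To make the import-extraction step concrete, here is a minimal sketch for Python files only. It uses the standard-library `ast` module as a stand-in for tree-sitter, so it is not how Codewalk parses code; the function name `extract_imports` is hypothetical.

```python
import ast


def extract_imports(source: str) -> list[str]:
    """Return the sorted, de-duplicated module names imported by Python source."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b, c` contributes "a.b" and "c"
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from a.b import c` contributes "a.b"; relative imports keep their dots.
            # `from . import x` has no module name and is skipped in this sketch.
            modules.add("." * node.level + node.module)
    return sorted(modules)


if __name__ == "__main__":
    sample = "import os\nfrom config import settings\nfrom .utils import chunk\n"
    print(extract_imports(sample))  # ['.utils', 'config', 'os']
```

Each returned name becomes an edge from the scanned file to the imported module, which is the raw material for the dependency graph, blast radius, and reading order.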
Codewalk is not another AI autocomplete. It is a codebase intelligence layer: it builds a persistent dependency graph, embeds your code, indexes your docs, and exposes that intelligence through a UI, an MCP server, and an API.
If you need deep cross-file reasoning, blast-radius analysis, or AI review inside your existing IDE agent, Codewalk fits where general-purpose assistants stop.
| Use case | Typical approach | What Codewalk does differently |
|---|---|---|
| Explain this codebase | Ask a generic chat model and paste files | Builds a live graph + RAG so answers are grounded and cite real files |
| PR review | Lint + human review | LLM review with blast-radius, architecture, and custom guidelines |
| Refactor shared code | Grep for imports | Dependency graph + blast radius showing transitive impact |
| Onboard a new developer | Read wiki pages | Reading order + module map generated from actual code |
| Team knowledge | Search Confluence/Notion | Index docs alongside code and ask with citations |
| AI agent tooling | Write custom scripts or prompts | 38 MCP tools the agent can call directly |
codewalk-demo-frontend.mp4
codewalk-demo-mcp.mp4
🎥 [Video coming soon]
codewalk_demo_voice.mp4
Codewalk ships with a Next.js frontend for visual codebase exploration.
- Structural view — explore the repo as a layered dependency graph: modules, files, classes, and functions laid out as an interactive path flow.
- Knowledge view — semantic graph of entities and relationships surfaced by the AI analysis.
- Path Finder — pick a source and target node and discover import/dependency paths between them.
- Search — fuzzy + semantic search across files, symbols, and concepts.
- Blast Radius / Diff mode — visually highlight changed and affected nodes.
- Themes — switch between presets (Dark Gold, Dark Ocean, Dark Forest, Dark Rose, Light Minimal), accent colors, and heading fonts; your choice is saved locally.
- Info Panel — unified node details, metrics, source preview, and project overview.
- Sidebar tab lock: index-dependent tabs stay locked until GET /index-status reports indexed: true.
- Cloud Admin: visit /admin to register repos, list repos, trigger indexing, copy tokens, and check server health/version.
cd frontend
npm install
npm run dev
If you change frontend code and see stale chunk 404s or client-side exceptions, restart with a clean build cache:
npm run dev:clean # clears .next and starts fresh
npm run restart # kills port 3000 and restarts dev
# or
./scripts/restart-frontend.sh
Set NEXT_PUBLIC_API_URL to point at the backend (e.g. http://localhost:8000 or https://api.codewalk.xyz).
Then open http://localhost:3000, analyze a repo, and click Knowledge Graph.
🎥 [Video coming soon — add frontend walkthrough here]
Production cloud server? See FULL_SETUP_GUIDE.md — step-by-step: Hetzner,
api.codewalk.xyz, GitHub App, webhooks, MCP download.
| Tool | Version | Check |
|---|---|---|
| Python | 3.10+ | python3 --version |
| Node.js | 18+ | node --version |
| Git | Any | git --version |
| Ollama (optional) | Latest | ollama --version |
git clone https://github.com/gupta29470/codewalk.git
cd codewalk
# Create virtual environment
python3 -m venv .codewalk-env
source .codewalk-env/bin/activate # macOS / Linux
# .codewalk-env\Scripts\activate # Windows
# Install Python dependencies
pip install -r requirements.txt
⚠️ VPN / Corporate Network / Private Network Issues
If you're behind a VPN, corporate proxy, or private network, package installations and model downloads may fail due to blocked connections or SSL certificate errors.
Recommended: Use a normal (non-VPN) network for first-time setup.
Codewalk's setup downloads packages from PyPI, npm, and HuggingFace. These are one-time downloads — once installed, everything runs locally. If possible:
- Disconnect from VPN temporarily
- Run the setup steps (pip install, npm install, start the backend once to download the embedding model)
- Reconnect to VPN: everything is cached locally, so no more downloads are needed
After the first run, Codewalk works fully offline (with Ollama). The VPN/corporate network won't cause any issues.
Optional: Dart/Flutter support (tree-sitter-dart)
# If you get an SSH error, run this first:
git config --global url."https://github.com/".insteadOf "git@github.com:"
# Then install:
pip install "tree-sitter-dart @ git+https://github.com/UserNobody14/tree-sitter-dart.git"
Without this, Codewalk still works; Dart files just won't get tree-sitter parsing (they fall back to text splitting).
cd frontend
npm install
cd ..
Copy the template with cp env.local.example.txt .env, then edit it:
# ─── LLM Configuration ──────────────────────────────────────
# Provider: ollama | openai | anthropic | gemini | groq | openrouter
LLM_PROVIDER=ollama
LLM_MODEL=qwen2.5-coder:7b
# ─── Embeddings ──────────────────────────────────────────────
EMBEDDING_MODEL=jinaai/jina-code-embeddings-1.5b
# ─── API Keys (only fill the one you're using) ──────────────
# GROQ_API_KEY=gsk_...
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# GOOGLE_API_KEY=AI...
# OPENROUTER_API_KEY=sk-or-...
ollama pull qwen2.5-coder:7b
Recommended models by size
| Model | Size | Tool Calling | Best For |
|---|---|---|---|
| qwen2.5-coder:7b | 4.7 GB | ✅ | Code-focused, fast |
| qwen3.5:latest (8B) | 6.6 GB | ✅ | General + code |
| qwen3.5:27b | 17 GB | ✅ | Best accuracy |
Open two terminals in codewalk:
Terminal 1: Backend API
source .codewalk-env/bin/activate
uvicorn src.codewalk.api.main:app --reload --port 8000
Terminal 2: Frontend
cd frontend
npm run dev
If the frontend throws stale chunk 404s after pulling or editing code, restart it cleanly:
npm run dev:clean
# or from the project root
./scripts/restart-frontend.sh
Open http://localhost:3000 → click Analyze Codebase (the repo is discovered from the working directory via codewalk.yaml).
Then explore:
- Knowledge Graph — interactive structural + knowledge graph, layer/module legend, node-category filters, detail-level toggle, persona selector, Path Finder, export menu, code viewer, file explorer, tour/onboarding, mobile layout, edge styling, and diff overlay
- Overview — tech stack, modules, dependency diagram, riskiest files
- Modules — browse all modules, click one for file list + dependencies
- Blast Radius — which files break if you change each file
- Reading Order — optimal file reading sequence with risk levels
- Execution Flow — Mermaid diagram of module/file dependencies
- Chat — ask any question ("explain the authentication flow", "what does scanner.py do?")
- Code Review — review git diffs, review single files, load team guidelines
- Voice — click the mic, ask a question by speaking, hear the answer read aloud
- Smart Reindex — incremental re-embed with stats (skipped, changed, deleted)
- Cloud Admin: /admin page for repo registration, token management, and server health
See MCP Integration below.
# Start the backend
source .codewalk-env/bin/activate
uvicorn src.codewalk.api.main:app --reload --port 8000
Step 1: Analyze a codebase
# Run from inside the repo you want to analyze (repo is discovered from cwd via codewalk.yaml)
curl -X POST http://localhost:8000/analyze \
-H "Content-Type: application/json" \
-d '{"index_mode": "auto"}'
Step 2: Check index status and explore the results
# Check whether the current workspace is indexed
# Optional: ?repo_path=/path/to/repo (defaults to cwd discovery)
curl "http://localhost:8000/index-status" | python3 -m json.tool
# Project overview (tech stack, modules, riskiest files)
curl http://localhost:8000/overview | python3 -m json.tool
# List all modules
curl http://localhost:8000/modules | python3 -m json.tool
# Dive into a specific module
curl http://localhost:8000/modules/auth | python3 -m json.tool
# What breaks if I change files in the auth module?
curl http://localhost:8000/blast-radius/auth | python3 -m json.tool
# Optimal reading order
curl http://localhost:8000/reading-order | python3 -m json.tool
# Execution flow (entry points, dependency chains)
curl http://localhost:8000/execution-flow | python3 -m json.tool
Step 3: Chat with the agent
# Ask a question
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"message": "Explain this project", "thread_id": "thread-1"}'
# Follow-up (same thread_id = conversation memory)
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"message": "What does the auth module do?", "thread_id": "thread-1"}'
# After code changes — refresh analysis without re-embedding
curl -X POST http://localhost:8000/refresh
# Incremental reindex — only re-embed changed files
curl -X POST http://localhost:8000/incremental-reindex
# Review current git diff for bugs, security, style
curl -X POST http://localhost:8000/review \
-H "Content-Type: application/json" \
-d '{"staged": false, "target_branch": "master"}'
See API Reference for full request/response details on every endpoint.
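The same endpoints can be called from a script instead of curl. The following is a minimal sketch using only the Python standard library. The paths and JSON fields are taken from the curl examples above; the helper names `build_request` and `send`, the `BASE_URL` constant, and the assumption that every endpoint answers with a JSON object are illustrative, not part of Codewalk.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: backend started on port 8000


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a request; a payload is sent as a JSON body with the matching header."""
    if payload is None:
        return urllib.request.Request(BASE_URL + path, method=method)
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the response body as JSON (assumed format)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the backend running, `send(build_request("POST", "/analyze", {"index_mode": "auto"}))` mirrors the first curl call, and `send(build_request("GET", "/overview"))` mirrors the overview call.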
Codewalk runs as an MCP (Model Context Protocol) server, so any AI agent that speaks MCP can use it.
- Cloud server indexes your repo on git push (GitHub App webhook)
- Clone codewalk locally (MCP server code)
- Open target repo in Cursor/VS Code (${workspaceFolder})
- Configure MCP with cloud URL + repo token:
{
"servers": {
"codewalk": {
"command": "/path/to/codewalk/.codewalk-env/bin/python",
"args": [
"-c",
"import os, sys; sys.path.insert(0, os.environ['CODEWALK_PATH']); from src.codewalk.mcp.server import mcp; mcp.run(transport='stdio')"
],
"cwd": "${workspaceFolder}",
"env": {
"CODEWALK_PATH": "/path/to/codewalk",
"CODEWALK_SERVER_URL": "https://api.codewalk.xyz",
"CODEWALK_REPO_NAME": "owner/repo",
"CODEWALK_REPO_TOKEN": "cw_repo_xxxxxxxx"
}
}
}
}
cwd should be the target repo (where codewalk.yaml lives). CODEWALK_PATH tells Python where to find the Codewalk source package. Open the target repo in your editor so the server starts from that workspace.
Get repo_token after first index (on server):
docker compose exec postgres psql -U codewalk -d codewalk -c \
"SELECT repo_token FROM repos WHERE full_name='owner/repo';"
Run codewalk_connect_repo in Cursor or let analyze auto-download the index. Cloud sync tools include codewalk_pull_index, codewalk_connect_repo, codewalk_index_status, codewalk_check_version, and codewalk_show_knowledge_graph.
Every MCP tool is wrapped with a workspace-change guard (_refresh_state_if_moved) that re-discovers the current working directory and resets state if the workspace changes.
⚠️ One repo per MCP server process. Codewalk keeps runtime state (vector store, graph, repo path) in memory. Pointing the same running MCP server at multiple repos — or rapidly switching workspaces in the same process — can overwrite or corrupt that state. Use one editor window / one MCP connection per repo. The stdio transport is safe because each connection spawns a separate process, but do not route commands for different repos into the same server instance.
No cloud — index runs locally via codewalk_analyze_codebase. After rebuild_analysis_cache, MCP embeds with index_from_paths_parallel (same pipeline helpers as the API, but MCP scans via scan_repo_files + codewalk.yaml excludes rather than calling full_index_parallel directly).
| Surface | Local embed entrypoint | Notes |
|---|---|---|
| MCP codewalk_analyze_codebase | index_from_paths_parallel | rebuild_analysis_cache → parallel chunk/embed → write_manifest |
| API POST /analyze (+ /analyze/stream) | full_index_parallel | Same Chroma output under {repo}/.codewalk/ |
You talk to your IDE agent; the agent calls Codewalk MCP tools. Codewalk does not render UI — each host has its own approve/reject experience (Cursor approval cards, Copilot chat, Claude Code prompts, etc.). The agent must present each fix and wait for your approval through that host UI (or yes/no in chat).
- Agent runs codewalk_run_review (returns enriched context for the host LLM to review) or codewalk_review_file (runs the full pipeline on one file)
- For each finding: codewalk_finding_verdict records whether you accept or reject it
- Apply accepted fixes: codewalk_apply_accepted applies every accepted finding with recommended_code in one call; or use codewalk_approve_action → codewalk_apply_fix(..., approval_token=<token>) for a single fix
- After edits: codewalk_verify_fix → codewalk_incremental_reindex
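The approve-then-apply handshake in the steps above can be pictured as a single-use token gate. This is a minimal sketch of that pattern, not Codewalk's implementation: `approve_action`, `apply_fix`, and `_PENDING` are hypothetical stand-ins for the MCP tools, and the actual file write is left out.

```python
import secrets

# Hypothetical store of approvals that have been granted but not yet used.
_PENDING: dict[str, str] = {}


def approve_action(description: str) -> str:
    """Record a user-approved action and return a single-use approval token."""
    token = secrets.token_urlsafe(16)
    _PENDING[token] = description
    return token


def apply_fix(file_path: str, new_code: str, approval_token: str) -> str:
    """Apply a fix only if the token is known; each token works exactly once."""
    description = _PENDING.pop(approval_token, None)
    if description is None:
        raise PermissionError("missing, unknown, or already used approval token")
    # A real implementation would write new_code to file_path here.
    return f"applied '{description}' to {file_path}"
```

Popping the token on use means a second call with the same token fails, so an agent cannot reuse one approval for several edits.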
Full agent rules: src/codewalk/mcp/server.py FastMCP instructions (sent on MCP connect).
Example: @codewalk review my changes, then fix each issue only after I approve
Cloud re-download: codewalk_pull_index / codewalk_connect_repo / auto-download on analyze all replace local .codewalk/ (delete then extract). Force refresh: rm -rf .codewalk then pull.
| Tool | Index required? | Notes |
|---|---|---|
| codewalk_analyze_codebase | Builds/loads | Cloud download or local embed |
| codewalk_generate_config | No | Creates starter codewalk.yaml |
| Query tools (search, overview, modules, symbols, …) | Yes | _require_index() auto-loads disk |
| codewalk_find_circular_dependencies | Yes | Uses graph data |
| codewalk_get_architecture_health | Yes | Graph stats + cycles |
| codewalk_incremental_reindex, codewalk_refresh_analysis | Yes | |
| codewalk_run_review, codewalk_review_file, codewalk_get_stack_info | Soft / Yes | Better with index; run_review returns context for the host LLM |
| codewalk_get_review_details, codewalk_finding_verdict | Yes | Reads persisted session |
| codewalk_approve_action / codewalk_apply_fix | No / edits files | Token required for apply_fix |
| codewalk_apply_accepted | Yes | Applies all accepted findings from a session |
| codewalk_verify_fix | No | Runs static analysis + tests |
| codewalk_run_static_analysis | No | ruff/mypy/eslint/etc. |
| codewalk_run_tests | No | pytest/npm test/etc. |
| codewalk_pull_index, codewalk_connect_repo, codewalk_index_status | Cloud config | Replace .codewalk/ on download |
| Docs / guidelines / voice / check_version / show_knowledge_graph | Varies | See MCP server instructions |
- Open VS Code in the codewalk project
- Press Cmd+Shift+P (macOS) or Ctrl+Shift+P (Windows/Linux)
- Type MCP: List Servers and select it
- You'll see codewalk in the list
- Click Start Server next to codewalk
- The server starts in the background (stdio transport)
- Open Copilot Chat → type @codewalk → all Codewalk MCP tools are available
Add to .vscode/mcp.json in your desired project:
⚠️ Replace /path/to/codewalk with the actual absolute path where you cloned codewalk. cwd (${workspaceFolder}) should be the target repo so the server discovers codewalk.yaml. CODEWALK_PATH must point at the cloned Codewalk repo so src.codewalk.mcp.server resolves.
{
"servers": {
"codewalk": {
"command": "/path/to/codewalk/.codewalk-env/bin/python",
"args": [
"-c",
"import os, sys; sys.path.insert(0, os.environ['CODEWALK_PATH']); from src.codewalk.mcp.server import mcp; mcp.run(transport='stdio')"
],
"cwd": "${workspaceFolder}",
"env": {
"CODEWALK_PATH": "/path/to/codewalk"
}
}
}
}
Team config (codewalk.yaml): Put repo-specific settings in the repo root:
guidelines_path: contributing-docs
docs_path: team-docs
indexing:
  exclude:
    - tests/**
    - docs/**
    - scripts/legacy/**
    - "*.generated.*"
  include:
    - docs/architecture/**
indexing.exclude is a list of paths/patterns skipped during scanning. indexing.include overrides exclusions (and the core safety net) for specific paths. These are checked at scan time. Generate a starter config with stack-specific excludes via python -m src.codewalk.cli generate-config or @codewalk Run codewalk_generate_config.
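The include-over-exclude precedence can be sketched in a few lines. This is an illustration under stated assumptions, not Codewalk's matcher: `should_index` is a hypothetical name, and the standard-library `fnmatch` is used, whose `*` also matches across `/`, so `tests/**` behaves like a prefix match here.

```python
from fnmatch import fnmatch


def should_index(path: str, exclude: list[str], include: list[str]) -> bool:
    """Apply include-over-exclude precedence to a repo-relative POSIX path."""
    if any(fnmatch(path, pattern) for pattern in include):
        return True  # include wins, even if an exclude pattern also matches
    return not any(fnmatch(path, pattern) for pattern in exclude)


if __name__ == "__main__":
    exclude = ["tests/**", "docs/**", "scripts/legacy/**", "*.generated.*"]
    include = ["docs/architecture/**"]
    for path in ["src/app.py", "docs/intro.md", "docs/architecture/overview.md"]:
        print(path, should_index(path, exclude, include))
```

With the example config, `docs/intro.md` is skipped while `docs/architecture/overview.md` is indexed, because the include pattern is checked first.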
Customizing file filters: Codewalk uses a deterministic core safety net (src/codewalk/ingestion/file_filter.py), with no LLM involved. It always skips universally bad content (.git, node_modules, dependency/build/cache dirs, binaries, media, secrets, lock files, generated suffixes). Repo- or framework-specific exclusions (e.g., tools/, scripts/, cdk/, migrations/, story files) belong in codewalk.yaml (often generated by generate-config). If a folder or file you need is not being indexed, you have three options:
- codewalk.yaml indexing.include: override exclusions for specific paths. Example: ["docs/architecture/**", "src/migrations/schema.py"].
- codewalk.yaml indexing.exclude: repo-specific dirs/patterns. Example: ["tests/**", "docs/**", "*.generated.*"].
- .codewalkignore file: gitignore-style patterns in the repo root (see below).
You generally do not need to duplicate node_modules, .git, build dirs, etc. in codewalk.yaml; those are handled by the core safety net.
.codewalkignore: Create a .codewalkignore file in the root of the repo you're analyzing to skip specific files/directories:
# Skip test files
tests/
*_test.py
# Skip specific directories
data/
wiki/
blogs/
# Skip specific file patterns
*.config.js
setup.py
Syntax (gitignore-like):
- folder/ skips any path containing this directory
- *.pattern is a glob matched against the full path or filename
- filename matches an exact filename or path segment
- # comment lines are ignored
- blank lines are ignored
Patterns are cached in _codewalkignore_patterns (loaded once per session). If you change the repo being analyzed, reset_codewalkignore() clears the cache so the next repo's .codewalkignore gets loaded.
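The syntax rules above can be approximated with a short matcher. This is a sketch of the documented behaviour, not the code in file_filter.py: `parse_ignore` and `is_ignored` are hypothetical names, paths are assumed to be non-empty, repo-relative, and POSIX-style, and a `folder/` pattern is assumed to name a single directory.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def parse_ignore(text: str) -> list[str]:
    """Keep non-blank, non-comment lines as patterns."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Check a repo-relative path against the three documented pattern kinds."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # folder/ : any directory component equals the folder name
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif any(ch in pattern for ch in "*?["):
            # glob : match against the full path or just the filename
            if fnmatch(path, pattern) or fnmatch(parts[-1], pattern):
                return True
        elif pattern in parts:
            # plain name : exact filename or path segment
            return True
    return False
```

For example, with the patterns `tests/`, `*_test.py`, and `setup.py`, the paths `tests/unit/test_a.py`, `src/auth_test.py`, and `setup.py` are all skipped, while `src/main.py` is kept.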
Then in Copilot Chat: @codewalk → it will call codewalk_analyze_codebase automatically.
Note: After adding or modifying .vscode/mcp.json, reload the VS Code window: Cmd+Shift+P → Developer: Reload Window.
Add to ~/.claude/mcp.json:
{
"mcpServers": {
"codewalk": {
"command": "/path/to/codewalk/.codewalk-env/bin/python",
"args": [
"-c",
"import os, sys; sys.path.insert(0, os.environ['CODEWALK_PATH']); from src.codewalk.mcp.server import mcp; mcp.run(transport='stdio')"
],
"cwd": "${workspaceFolder}",
"env": {
"CODEWALK_PATH": "/path/to/codewalk"
}
}
}
}
Settings → MCP Servers → Add:
{
"codewalk": {
"command": "/path/to/codewalk/.codewalk-env/bin/python",
"args": ["-m", "src.codewalk.mcp.server"],
"cwd": "${workspaceFolder}",
"env": {
"CODEWALK_PATH": "/path/to/codewalk"
}
}
}
Exclusions now live in codewalk.yaml (indexing.exclude) or .codewalkignore, not in the EXCLUDE_PATHS env var.
Add to ~/.codex/mcp.json:
{
"mcpServers": {
"codewalk": {
"command": "/path/to/codewalk/.codewalk-env/bin/python",
"args": [
"-c",
"import os, sys; sys.path.insert(0, os.environ['CODEWALK_PATH']); from src.codewalk.mcp.server import mcp; mcp.run(transport='stdio')"
],
"cwd": "${workspaceFolder}",
"env": {
"CODEWALK_PATH": "/path/to/codewalk"
}
}
}
}
The first time you use Codewalk on a new codebase, it needs to index the files.
You just tell the AI to analyze — the AI handles the rest automatically.
┌─────────────────────────────────────────────────────────────────────┐
│ SETUP WORKFLOW (run once) │
│ │
│ Step 1 (only step) │
│ codewalk_analyze_codebase │
│ │ scans files, builds dependency graph, detects modules, │
│ │ filters with file_filter.py, chunks, embeds — all in one │
│ ▼ │
│ ✅ READY — all query tools unlocked │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ QUERY TOOLS (use after setup) │
│ │
│ codewalk_get_overview → project summary + dependency flow │
│ codewalk_search_codebase → semantic code search │
│ codewalk_lookup_symbol → find symbols by name across repo │
│ codewalk_get_module_info → inspect a specific module │
│ codewalk_explain_function → AI-powered function explanation │
│ codewalk_explain_class → AI-powered class explanation │
│ codewalk_get_blast_radius_map → change risk analysis │
│ codewalk_find_circular_dependencies → detect import cycles │
│ codewalk_get_reading_order → optimal file reading sequence │
│ codewalk_get_execution_flow → module/file dependency flow │
│ codewalk_get_architecture_health → bottlenecks, cycles, key files │
│ codewalk_call_chain(source, target) → trace import path between │
│ codewalk_show_knowledge_graph → export graph for visualization │
│ codewalk_index_docs(docs_path) → index .md/.pdf/.txt docs │
│ codewalk_search_docs(query) → search indexed documents │
│ codewalk_ask_docs(question) → RAG answer grounded in docs │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ REVIEW & HITL TOOLS │
│ │
│ codewalk_run_review → gather review context for host LLM│
│ codewalk_review_file → full pipeline review of one file │
│ codewalk_get_stack_info → deterministic stack signals │
│ codewalk_get_review_details → retrieve a persisted review │
│ codewalk_load_guidelines → load team coding standards │
│ codewalk_finding_verdict → accept/reject a finding │
│ codewalk_apply_accepted → apply all accepted fixes │
│ codewalk_approve_action(text) → HITL gate (returns approval_token)│
│ codewalk_apply_fix(..., token) → apply one fix after approval │
│ codewalk_verify_fix → static analysis + tests │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ MAINTENANCE (after code changes) │
│ │
│ codewalk_generate_config → starter codewalk.yaml │
│ codewalk_incremental_reindex → re-embed only changed files │
│ codewalk_refresh_analysis → re-scan without re-embedding │
│ codewalk_run_static_analysis → ruff/mypy/eslint/etc. on files │
│ codewalk_run_tests → pytest/npm test/etc. on files │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ CLOUD (when configured) │
│ │
│ codewalk_pull_index → download latest server index │
│ codewalk_connect_repo → one-step cloud setup │
│ codewalk_index_status → local vs cloud version │
│ codewalk_check_version → server health/version │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ VOICE (hands-free) │
│ │
│ MCP: codewalk_voice_ask → mic → transcribe │
│ Copilot picks tool → calls it → codewalk_speak(summary) │
│ │
│ API: POST /voice/ask → mic → transcribe → agent invokes tool │
│ agent answer → format_voice_response() → MP3 │
└─────────────────────────────────────────────────────────────────────┘
💡 Before indexing: Close unnecessary applications (browsers, Slack, Docker, etc.). Indexing loads the embedding model into memory and processes all files at once — freeing up RAM helps it run faster and avoids slowdowns.
You type this in Copilot Chat:
@codewalk analyze this codebase [auto(default) | reindex(update index) | full(delete existing index and generate new index)]
or
@codewalk_analyze_codebase [auto(default) | reindex(update index) | full(delete existing index and generate new index)]
What happens behind the scenes (you don't need to do anything):
- The AI calls codewalk_analyze_codebase → scans all files, filters with file_filter.py, detects modules, builds the dependency graph, and chunks and embeds everything in one call
You'll see progress like:
✓ Codebase analyzed and indexed successfully
Files found: 142
Files indexed: 121
Chunks embedded: 380
Modules found: api, analysis, embeddings, ingestion, rag
✅ Ready to answer questions — use query tools directly.
Note: After indexing, the AI agent should automatically call these tools. If it doesn't, you can invoke them manually — the hints above tell you exactly which tools to run.
Note: This only happens once. Next time you say @codewalk analyze this codebase, it detects the existing index and skips straight to "ready."
The setup is now a single call — codewalk_analyze_codebase does everything. If the AI stops after that, just call any query tool yourself:
| AI stopped after... | You call next |
|---|---|
| codewalk_analyze_codebase | Any query tool: codewalk_get_overview, codewalk_search_codebase, etc. |
Tip: Look for the ⏩ NEXT STEP line at the bottom of each tool's output — it tells you exactly what to do.
After indexing is done, here's every tool you can use.
You don't need to remember tool names — just ask naturally and the AI picks the right tool.
Tool: codewalk_get_overview — no parameters needed
You just joined a new team. You have no idea what this project does. Start here.
@codewalk give me an overview of this project
or
@codewalk_get_overview
When to use: Day 1 on a new project. You want to know what you're dealing with.
Tool: codewalk_get_module_info(module_name) — pass the module name
You saw "auth" in the overview and want to dig into it.
@codewalk tell me about the auth module
or
@codewalk_get_module_info auth
When to use: You need to work on a specific module and want to see all its files, classes, and functions at a glance.
Tool: codewalk_explain_function(function_name) — pass the function or class name
Your tech lead mentioned verify_request in a PR review. You have no idea what it does.
@codewalk explain the verify_request function
or
@codewalk_explain_function verify_request
When to use: You see a function name in code/PR/docs and want to understand exactly what it does without reading the whole file yourself.
Tool: codewalk_search_codebase(query) — pass any natural language question
You need to find where database connections are handled but don't know which file.
@codewalk how does this project handle database connections?
or
@codewalk_search_codebase how does this project handle database connections?
When to use: You have a question about a concept ("error handling", "file upload", "caching") and don't know which files to look at.
Tool: codewalk_get_blast_radius_map(target) — pass a module name, file name, or leave empty
You're about to refactor models/base.py. Before you touch it, you want to know the damage.
@codewalk what's the blast radius of base.py / auth?
or
@codewalk_get_blast_radius_map base.py / auth?
When to use: Before refactoring or making changes. "Is it safe to change this, or will half the project break?"
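The feature table describes blast radius as a BFS on the reversed dependency graph. A minimal sketch of that traversal follows; the function name `blast_radius`, the `imports` mapping (file → files it imports), and the example file names are hypothetical, and the real tool also works at module level and attaches risk information.

```python
from collections import deque


def blast_radius(imports: dict[str, set[str]], changed: str) -> list[str]:
    """Return every file that transitively depends on `changed`, nearest first."""
    # Reverse the edges: file -> files that import it.
    importers: dict[str, set[str]] = {}
    for importer, targets in imports.items():
        for target in targets:
            importers.setdefault(target, set()).add(importer)

    affected: list[str] = []
    seen = {changed}  # also protects against import cycles
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for importer in sorted(importers.get(current, ())):
            if importer not in seen:
                seen.add(importer)
                affected.append(importer)
                queue.append(importer)
    return affected


if __name__ == "__main__":
    imports = {
        "models/user.py": {"models/base.py"},
        "api/routes.py": {"models/user.py"},
        "cli.py": {"config.py"},
    }
    print(blast_radius(imports, "models/base.py"))  # ['models/user.py', 'api/routes.py']
```

Files that never reach the changed file through imports, such as `cli.py` here, stay out of the result.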
Tool: codewalk_get_reading_order(module_name) — pass a module name or leave empty for entire repo
You want to understand the agent module but don't know which file to read first.
@codewalk what order should I read the agent module?
or
@codewalk_get_reading_order
When to use: You want to understand code without constantly jumping between files wondering "wait, what's this import?"
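A "dependencies first" reading order is a topological sort of the import graph. The sketch below uses Python's standard `graphlib` module to show the idea; it is not Codewalk's own code, and the `imports` mapping and `reading_order` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def reading_order(imports: dict[str, set[str]]) -> list[str]:
    """Order files so that every file appears after the files it imports."""
    try:
        # TopologicalSorter expects node -> predecessors, which matches
        # "file -> files it imports".
        return list(TopologicalSorter(imports).static_order())
    except CycleError as exc:
        # The second element of exc.args lists the nodes forming the cycle.
        raise ValueError(f"import cycle prevents a strict order: {exc.args[1]}") from exc


# Hypothetical import graph: file -> files it imports.
imports = {
    "pipeline.py": {"embedder.py", "chain.py"},
    "embedder.py": {"config.py"},
    "chain.py": {"config.py"},
    "config.py": set(),
}
order = reading_order(imports)
```

A real tool has to decide what to do with import cycles; this sketch simply reports them as an error.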
Tool: codewalk_get_execution_flow(module_name) — pass a module name or leave empty for module-level view
You want to understand how modules connect to each other.
@codewalk show me the execution flow
or
@codewalk_get_execution_flow
When to use: You want to understand "what calls what" — the big picture of how code connects.
Tool: codewalk_refresh_analysis — no parameters needed
You added 3 new files and refactored a module. The analysis is now stale.
@codewalk refresh the analysis
or
@codewalk_refresh_analysis
When to use: After you commit code changes and want updated blast radius / reading order / execution flow results.
Tool: codewalk_incremental_reindex — no parameters needed
You changed a few files but don't want to re-embed the entire codebase.
@codewalk reindex changed files
or
@codewalk_incremental_reindex
When to use: After code changes when you want the vector search to reflect the latest code without a full re-index. Uses content hashes — only re-embeds what actually changed.
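The content-hash comparison can be pictured as follows. This is a sketch under stated assumptions: SHA-256 is an arbitrary choice here (the hash Codewalk uses is not stated above), and `plan_reindex` and its inputs are hypothetical names.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash file contents; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).hexdigest()


def plan_reindex(previous: dict[str, str], current_files: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare stored hashes with current contents and classify each file."""
    current = {path: content_hash(data) for path, data in current_files.items()}
    return {
        "new": sorted(p for p in current if p not in previous),
        "changed": sorted(p for p in current if p in previous and current[p] != previous[p]),
        "deleted": sorted(p for p in previous if p not in current),
        "skipped": sorted(p for p in current if previous.get(p) == current[p]),
    }


# Hashes recorded at the last index run (hypothetical data).
previous = {
    "a.py": content_hash(b"print('a')"),
    "b.py": content_hash(b"print('b')"),
    "old.py": content_hash(b"print('old')"),
}
# Files as they are on disk now.
current_files = {
    "a.py": b"print('a')",          # unchanged, so skipped
    "b.py": b"print('b, edited')",  # changed, so re-embedded
    "c.py": b"print('c')",          # new, so embedded
}
plan = plan_reindex(previous, current_files)
```

Only the `new` and `changed` files need to be re-embedded; `deleted` files have their chunks removed.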
Tool: codewalk_run_review — optional: staged=true, target_branch="master"
You're about to push a PR and want an automated code review.
@codewalk review my changes
or
@codewalk_run_review
@codewalk_run_review staged=true target_branch="master"
When to use: Before pushing a PR. codewalk_run_review gathers the full diff, neighborhood context, blast radius, and stack signals, then returns them to Copilot so it can perform the review directly using enriched context — no local LLM overhead, instant results.
Tool: codewalk_review_file(file_path) — pass the file path
You want to check if a file follows the project's conventions.
@codewalk review src/codewalk/pipeline.py
or
@codewalk_review_file src/codewalk/pipeline.py
When to use: When you want to review any file — no git diff needed. Reads the file directly, enriches it with caller context (who imports it), security patterns from the vector store, similar code elsewhere in the codebase, and team guidelines. Copilot performs the review natively — no local LLM, instant results.
Tool: codewalk_load_guidelines(docs_path) — pass path to guidelines directory
Your team has coding standards in markdown files.
@codewalk load guidelines from docs/standards
or
@codewalk_load_guidelines docs/standards
When to use: Once per project. After loading, codewalk_run_review and codewalk_review_file automatically include your team's standards in their context.
Tools: codewalk_voice_ask + codewalk_speak — no parameters needed
You want to ask a question by speaking instead of typing.
@codewalk_voice_ask
What happens:
- 🔔 Beep — signals "start talking"
- 🎙️ Records your voice (up to 30s, stops after 5s of silence)
- 📝 Transcribes locally via faster-whisper
- 🧠 Copilot reads the transcript and picks the right codewalk tool
- ⚙️ Copilot calls the tool and gets the result
- 🔊 Copilot calls codewalk_speak(summary), which speaks a 2-4 sentence summary aloud
When to use: Hands-free coding. You're reading code and want to ask "what does this function do?" without switching to the keyboard.
Note: Routing is done by Copilot (full LLM), not a separate model, so no Ollama is required for MCP voice. The REST API (POST /voice/ask) sends the transcript directly to the chat agent, which picks the right tool natively.
Tool: codewalk_get_architecture_health — no parameters needed
You want a health check: bottleneck files, circular dependencies, and the most important files.
@codewalk check the architecture health
or
@codewalk_get_architecture_health
Returns: Graph stats, bottleneck files (betweenness centrality), most important files (PageRank), circular dependencies with suggested fixes.
When to use: Before a refactor, code review, or whenever you suspect architectural issues.
Tool: codewalk_call_chain(source, target) — two file names
You want to trace the import chain between two files — "how does a change in config.py eventually affect server.py?"
@codewalk trace the import chain from config.py to server.py
or
@codewalk_call_chain config.py server.py
Returns: Shortest import path with hop count and full file paths.
When to use: Understanding how changes propagate, debugging import issues, or tracing dependency chains.
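A shortest import path with a hop count can be found with a breadth-first search. The sketch below is plain Python for illustration only (the README describes the real graph layer as igraph-based); `imported_by` and `import_chain` are hypothetical names.

```python
from collections import deque
from typing import Optional


def import_chain(imported_by: dict[str, list[str]], source: str, target: str) -> Optional[list[str]]:
    """Return the shortest chain from `source` to `target`, or None if unreachable."""
    parent: dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in imported_by.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical graph: file -> files that import it.
imported_by = {
    "config.py": ["db.py", "logging_setup.py"],
    "db.py": ["routes.py"],
    "routes.py": ["server.py"],
    "logging_setup.py": ["server.py"],
}
chain = import_chain(imported_by, "config.py", "server.py")
hops = len(chain) - 1 if chain else None
```

Because BFS explores the graph level by level, the first path that reaches the target has the fewest hops.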
Tool: codewalk_find_circular_dependencies() — no parameters
Detect import cycles that can cause brittle architecture or load-order bugs.
@codewalk find circular dependencies
or
@codewalk_find_circular_dependencies
When to use: Before a refactor or when investigating why two modules feel tightly coupled.
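Import cycles can be detected with a depth-first search that tracks the files currently on the stack. The sketch below returns the first cycle it finds; it is an illustration rather than Codewalk's implementation, and `find_cycle` and the sample graph are hypothetical.

```python
def find_cycle(imports: dict[str, list[str]]) -> list[str]:
    """Return one import cycle as a closed path, or [] if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS stack, finished
    color: dict[str, int] = {}
    stack: list[str] = []

    def visit(node: str) -> list[str]:
        color[node] = GRAY
        stack.append(node)
        for dep in imports.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the stack, so the stack from dep onward is a cycle.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return []

    for node in imports:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return []


# Hypothetical graph: file -> files it imports.
cyclic = {"a.py": ["b.py"], "b.py": ["c.py"], "c.py": ["a.py"], "d.py": ["a.py"]}
acyclic = {"a.py": ["b.py"], "b.py": []}
cycle = find_cycle(cyclic)
```

The recursion keeps the sketch short; a very deep import graph would call for an iterative version.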
Tool: codewalk_lookup_symbol(symbol_name) — pass a function, class, or method name
Find every definition and key references of a named symbol across the repo.
@codewalk lookup symbol authenticate_user
or
@codewalk_lookup_symbol authenticate_user
When to use: You know a name and want its exact file, line, and callers without doing a semantic search.
Tool: codewalk_generate_config() — no parameters
Create a stack-specific codewalk.yaml with sensible excludes for your repo.
@codewalk generate a codewalk.yaml for this repo
or
@codewalk_generate_config
When to use: First-time setup, before the first analyze, to avoid indexing build artifacts and tests.
Tool: codewalk_run_static_analysis(file_paths) — pass one or more files
Run language-appropriate linters/type-checkers (ruff, mypy, eslint, etc.) on the given files.
@codewalk run static analysis on src/auth.py
or
@codewalk_run_static_analysis src/auth.py
When to use: After applying a fix or editing files to catch style/type issues quickly.
Tool: codewalk_run_tests(file_paths) — pass one or more files
Auto-detect and run the relevant test command (pytest, npm test, go test, cargo test, etc.).
@codewalk run tests for src/auth.py
or
@codewalk_run_tests src/auth.py
When to use: After a fix or refactor to confirm nothing broke.
| You want to... | Just say... |
|---|---|
| First-time setup | @codewalk analyze this codebase or @codewalk_analyze_codebase |
| Big picture overview | @codewalk give me an overview or @codewalk_get_overview |
| Understand a module | @codewalk tell me about the auth module or @codewalk_get_module_info auth |
| Understand a function | @codewalk explain the verify_request function or @codewalk_explain_function verify_request |
| Find code by concept | @codewalk how does error handling work? or @codewalk_search_codebase how does error handling work? |
| Check change risk | @codewalk what's the blast radius of config.py? or @codewalk_get_blast_radius_map config.py |
| Find riskiest files | @codewalk show me the riskiest files |
| Best reading order | @codewalk what order should I read the agent module? or @codewalk_get_reading_order agent module |
| See dependency flow | @codewalk show me the execution flow or @codewalk_get_execution_flow |
| After code changes | @codewalk refresh the analysis or @codewalk_refresh_analysis |
| Update embeddings | @codewalk reindex changed files or @codewalk_incremental_reindex |
| Review git diff | @codewalk review my changes or @codewalk_run_review |
| Review a file | @codewalk review src/auth.py or @codewalk_review_file src/auth.py |
| Get stack signals | @codewalk what stack is this? or @codewalk_get_stack_info |
| Load guidelines | @codewalk load guidelines from docs/ or @codewalk_load_guidelines docs/ |
| Architecture health | @codewalk check architecture health or @codewalk_get_architecture_health |
| Trace import chain | @codewalk trace chain from config.py to server.py or @codewalk_call_chain config.py server.py |
| Find circular dependencies | @codewalk find circular dependencies or @codewalk_find_circular_dependencies |
| Lookup a symbol | @codewalk lookup symbol authenticate_user or @codewalk_lookup_symbol authenticate_user |
| Run static analysis | @codewalk run static analysis on src/auth.py or @codewalk_run_static_analysis src/auth.py |
| Run tests | @codewalk run tests for src/auth.py or @codewalk_run_tests src/auth.py |
| Generate repo config | @codewalk generate a codewalk.yaml or @codewalk_generate_config |
| Search team docs | @codewalk search docs for deployment or @codewalk_search_docs deployment |
| Ask docs a question | @codewalk how do we deploy? or @codewalk_ask_docs how do we deploy |
| Deep research | @codewalk research how error handling works across the codebase |
| Accept/reject findings | @codewalk accept finding 3 → @codewalk_finding_verdict |
| Apply accepted fixes | @codewalk apply accepted fixes or @codewalk_apply_accepted |
| Approve then apply one fix | @codewalk approve apply fix to auth.py → @codewalk_approve_action → @codewalk_apply_fix |
| Ask by speaking (hands-free) | @codewalk_voice_ask → Copilot calls tool → @codewalk_speak |
Base URL: http://localhost:8000
Start the server:
source .codewalk-env/bin/activate
uvicorn src.codewalk.api.main:app --reload --port 8000

curl -X POST http://localhost:8000/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "",
    "index_mode": "auto"
  }'

Response:
{
"status": "complete",
"repo_path": "/Users/you/projects/my-app",
"files_scanned": 142,
"chunks_created": 380,
"modules": ["api", "auth", "models", "utils", "frontend"]
}

- The repo is discovered from the current working directory via codewalk.yaml (auto-created if missing). repo_path is no longer accepted in the request body.
- index_mode: "auto" (skip if indexed), "reindex" (smart update), "full" (wipe & rebuild)
- collection_name: leave empty; reads manifest.collection_name if present, else the repo folder name
- auto + index on disk → load only (load_scoped_analysis), no re-embed (same idea as MCP codewalk_analyze_codebase)
- No index → full_index_parallel with codewalk.yaml excludes (local embed on API server)
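The index_mode rules above can be summarized as a small decision function. This is a sketch, not the server's code: `choose_index_action` is a hypothetical helper, and treating "reindex" with no index on disk as a full build is an assumption inferred from the "No index" rule.

```python
def choose_index_action(index_mode: str, index_on_disk: bool) -> str:
    """Map (index_mode, index present?) to 'load', 'reindex', or 'full'."""
    if index_mode not in {"auto", "reindex", "full"}:
        raise ValueError(f"unknown index_mode: {index_mode!r}")
    if index_mode == "full" or not index_on_disk:
        # Wipe and rebuild, or nothing to load yet (assumed for every mode).
        return "full"
    if index_mode == "reindex":
        # Smart update of an existing index.
        return "reindex"
    # "auto" with an index on disk: load only, no re-embed.
    return "load"


actions = {
    (mode, present): choose_index_action(mode, present)
    for mode in ("auto", "reindex", "full")
    for present in (True, False)
}
```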
curl -N -X POST http://localhost:8000/analyze/stream \
  -H "Content-Type: application/json" \
  -d '{"index_mode": "auto"}'

Response (Server-Sent Events). The step values come from analyze_stream() in main.py:

| step | When |
|---|---|
| init | Always first: checking existing index |
| skip | index_mode: auto + .codewalk/ on disk (load only), or non-full/reindex skip |
| scan | Full index or reindex: file scan (codewalk.yaml excludes on full) |
| chunk | Full index: parallel chunk + embed |
| embed | Full index: embed count |
| store | Full index: Chroma persist + manifest |
| reindex | index_mode: reindex (new/changed/deleted counts) |
| analyze | Dependency graph + module detection |
| agent | state.initialize (DuckDB, docs, guidelines, agent) |
| done | Success (result object on final event when complete) |
| error | Exception message |

index_mode: auto + existing .codewalk/ (fast path):
data: {"step": "init", "message": "Checking existing index..."}
data: {"step": "skip", "message": "Loaded existing index (380 chunks)"}
data: {"step": "done", "message": "Analysis complete!", "result": {...}}
index_mode: full (or empty index):
data: {"step": "init", "message": "Checking existing index..."}
data: {"step": "scan", "message": "Scanning directory..."}
data: {"step": "scan", "message": "Scanned 142 files (codewalk.yaml excludes applied)"}
data: {"step": "chunk", "message": "Chunking + embedding in parallel..."}
data: {"step": "chunk", "message": "Created 380 chunks"}
data: {"step": "embed", "message": "Embedded 380 chunks"}
data: {"step": "store", "message": "Storing in vector database..."}
data: {"step": "store", "message": "Stored 380 chunks in ChromaDB"}
data: {"step": "analyze", "message": "Building dependency graph..."}
data: {"step": "agent", "message": "Creating AI agent..."}
data: {"step": "analyze", "message": "Detected 5 modules"}
data: {"step": "done", "message": "Analysis complete!", "result": {...}}
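A client can consume the stream by reading `data:` lines and decoding each payload as JSON. The sketch below assumes every event is a single `data:` line holding one JSON object, as in the examples above; `parse_sse` is a hypothetical helper, and the `result` payload in the sample is a placeholder.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each 'data:' line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


# Sample lines shaped like the fast-path output; blank lines separate events.
sample = [
    'data: {"step": "init", "message": "Checking existing index..."}',
    "",
    'data: {"step": "skip", "message": "Loaded existing index (380 chunks)"}',
    "",
    'data: {"step": "done", "message": "Analysis complete!", "result": {}}',
]
events = list(parse_sse(sample))
steps = [event["step"] for event in events]
finished = bool(events) and events[-1]["step"] in {"done", "error"}
```

In a real client the `lines` iterable would come from a streaming HTTP response instead of a list.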
# Check cwd-discovered repo
curl http://localhost:8000/index-status | python3 -m json.tool
# Optional: check a specific repo path
curl "http://localhost:8000/index-status?repo_path=/Users/you/projects/my-app" | python3 -m json.tool

Response:
{
"indexed": true,
"repo_path": "/Users/you/projects/my-app"
}

The frontend sidebar uses this endpoint to lock index-dependent tabs until indexed: true.
All query endpoints call state.require_index() — auto-loads .codewalk/ from disk after server restart (same as MCP _require_index()).
| Endpoint | Index required? | MCP equivalent | Notes |
|---|---|---|---|
| POST /analyze | Builds or loads | codewalk_analyze_codebase | API: full_index_parallel; MCP local: index_from_paths_parallel |
| POST /analyze/stream | Builds or loads | same (SSE progress) | Steps: init→skip/scan→chunk→embed→store→analyze→agent→done |
| POST /chat, /chat/stream | Yes | agent + tools | API HITL via POST /chat/approve |
| GET /overview | Yes | codewalk_get_overview | |
| GET /modules, /modules/{name} | Yes | codewalk_get_module_info | |
| GET /blast-radius, /blast-radius/{m} | Yes | codewalk_get_blast_radius_map | |
| GET /reading-order | Yes | codewalk_get_reading_order | |
| GET /execution-flow | Yes | codewalk_get_execution_flow | |
| GET /architecture, /cycles | Yes | codewalk_get_architecture_health | |
| POST /semantic-search | Yes | codewalk_search_codebase | Chroma semantic search endpoint |
| POST /rag/expand-query | Yes | - | LLM query expansion for RAG |
| POST /rag/rerank | Yes | - | LLM chunk reranking |
| POST /rag/symbol-lookup | Yes | codewalk_lookup_symbol | DuckDB symbol lookup |
| POST /tools/static-analysis | No | codewalk_run_static_analysis | ruff/mypy/eslint/etc. on files |
| POST /tools/run-tests | No | codewalk_run_tests | pytest/npm test/etc. on files |
| GET /version | No | codewalk_check_version | Codewalk version + commit info |
| GET /staleness | Yes | - | Local vs cloud index staleness |
| POST /refresh | Yes | codewalk_refresh_analysis | No re-embed |
| POST /incremental-reindex | Yes | codewalk_incremental_reindex | team_config + manifest collection |
| POST /review | Soft (better with index) | codewalk_run_review | Works with partial context |
| POST /review/stream | Soft (better with index) | - | SSE progress events |
| POST /review/cancel | Yes | - | Cancel a running review |
| POST /review/file | Yes | codewalk_review_file | |
| POST /review/guidelines | No | codewalk_load_guidelines | Guidelines Chroma only |
| POST /review/verdict | Yes | codewalk_finding_verdict | Accept/reject a finding |
| POST /review/apply-accepted | Yes | codewalk_apply_accepted | Apply all accepted fixes |
| POST /review/apply | No (repo path only) | codewalk_apply_fix | Caller approves in UI; no token gate |
| POST /docs/index, /docs/search, /docs/ask | Doc index only | codewalk_index_docs etc. | |
| POST /chat/approve | Yes | - | Resume/reject interrupted agent |
| POST /voice/ask | Yes | codewalk_voice_ask | |
| POST /research | Yes | deep research | |
| GET /health | No | - | |
| Cloud GET /indexes/... | Download only | codewalk_pull_index | Blocked when cloud-only mode |

Cloud API server (DATABASE_URL set): query endpoints above return 400 — indexes are built on server, queried locally via MCP download.
curl -X POST http://localhost:8000/refresh

Response:
{
"status": "refreshed",
"repo_path": "/Users/you/projects/my-app",
"files": 142,
"modules": ["api", "auth", "models", "utils", "frontend"]
}

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Explain how authentication works in this project", "thread_id": "thread-1"}'

Response:
{
"answer": "The authentication flow starts in auth/middleware.py which checks JWT tokens on every request. The token validation logic is in auth/jwt.py which uses the python-jose library...",
"thread_id": "thread-1"
}

Multi-turn conversation (use the same thread_id):
# Follow-up question
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What happens if the token expires?", "thread_id": "thread-1"}'

curl http://localhost:8000/overview

Response:
{
"tech_stack": ["Python", "FastAPI", "React"],
"total_files": 142,
"total_modules": 5,
"modules": [
{"name": "api", "file_count": 12, "depends_on": ["auth", "models"]},
{"name": "auth", "file_count": 5, "depends_on": ["models"]}
],
"diagram": "graph TD\n api --> auth\n api --> models\n auth --> models",
"overview_text": "## Project Overview\nTech stack: Python, FastAPI...",
"riskiest_files": [
{"file": "models/base.py", "risk_level": "high", "affected_files": 23}
]
}

curl http://localhost:8000/modules

Response:
{
"modules": [
{"name": "api", "file_count": 12, "languages": ["python"]},
{"name": "auth", "file_count": 5, "languages": ["python"]},
{"name": "frontend", "file_count": 34, "languages": ["typescript", "css"]}
],
"total": 5
}

curl http://localhost:8000/modules/auth

Response:
{
"name": "auth",
"file_count": 5,
"files": ["auth/middleware.py", "auth/jwt.py", "auth/permissions.py", "auth/models.py", "auth/__init__.py"],
"languages": {"python": 5},
"depends_on": ["models"],
"depended_by": ["api"],
"blast_radius": [
{"file": "auth/middleware.py", "risk_level": "moderate", "affected_files": 8}
],
"module_risk": "moderate"
}

curl http://localhost:8000/blast-radius

Response:
{
"module": null,
"module_risk": "high",
"total_files": 15,
"files": [
{
"file": "models/base.py",
"risk_level": "high",
"affected_files": 23,
"direct": ["api/routes.py", "auth/models.py"],
"transitive": ["api/views.py", "auth/middleware.py"]
}
]
}

curl http://localhost:8000/blast-radius/auth

curl http://localhost:8000/reading-order

Response:
{
"order": [
{
"file": "config.py",
"position": 1,
"why": "No internal dependencies",
"risk_level": "moderate",
"affected_files": 12,
"direct": ["embedder.py", "chain.py"],
"transitive": ["pipeline.py"]
},
{
"file": "models/base.py",
"position": 2,
"why": "No internal dependencies | Used by: routes.py, views.py",
"risk_level": "high",
"affected_files": 23
}
]
}

curl http://localhost:8000/execution-flow

Response:
{
"flow": "## Execution Flow — Module Level\nEntry modules: api, cli\nTotal modules: 5\n\n### Module Dependencies\n api (12 files) → depends on: auth, models\n auth (5 files) → depends on: models\n models (8 files) → (standalone)\n utils (6 files) → (standalone)\n frontend (34 files) → (standalone)"
}

curl -X POST http://localhost:8000/incremental-reindex

Response:
{
"repo_path": "/Users/you/projects/my-app",
"files_on_disk": 142,
"files_skipped": 138,
"files_reindexed": 3,
"files_deleted": 1,
"chunks_embedded": 12,
"total_time": "2.3s"
}

How it works: Incremental reindex first performs a Chroma incremental update (only changed files are embedded/deleted). It then fully rebuilds DuckDB and knowledge-graph.json from all Chroma chunks, and re-indexes docs/guidelines. The manifest ({repo}/.codewalk/manifest.json) is updated on every write with an incremented index_version and a chunk_count reflecting total Chroma chunks.
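The manifest update described above might look roughly like this. It is a sketch under assumptions: only the `index_version` and `chunk_count` fields come from the text; `bump_manifest`, the temp-file write, and starting the version at 0 are hypothetical choices.

```python
import json
import tempfile
from pathlib import Path


def bump_manifest(manifest_path: Path, total_chunks: int) -> dict:
    """Increment index_version and record the total chunk count."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    manifest["index_version"] = int(manifest.get("index_version", 0)) + 1
    manifest["chunk_count"] = total_chunks
    # Write to a sibling temp file, then rename, so readers never see a partial file.
    tmp_path = manifest_path.with_suffix(".json.tmp")
    tmp_path.write_text(json.dumps(manifest, indent=2))
    tmp_path.replace(manifest_path)
    return manifest


# Demo in a throwaway directory: two writes, two version bumps.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "manifest.json"
    first = bump_manifest(path, total_chunks=380)
    second = bump_manifest(path, total_chunks=392)
    on_disk = json.loads(path.read_text())
```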
curl -X POST http://localhost:8000/review \
  -H "Content-Type: application/json" \
  -d '{"staged": false, "target_branch": "master"}'

Response:
{
"verdict": "request_changes",
"verdict_reason": "Critical security issue found that must be fixed before merge.",
"issues": [
{
"severity": "critical",
"category": "security",
"file_path": "src/auth/jwt.py",
"line_number": 42,
"title": "JWT secret hardcoded",
"explanation": "The JWT signing secret is hardcoded in the source file.",
"suggestion": "Move the secret to an environment variable.",
"code_snippet": "SECRET = 'my-secret-key'",
"blocking": true,
"confidence": "high"
}
],
"summary": "Found 1 critical issue in 3 files (+45 / -12 lines)",
"narrative_summary": "",
"files_reviewed": 3,
"lines_added": 45,
"lines_removed": 12,
"session_id": "25-June-2026-143052-feature-x-to-main",
"architecture_flags": {},
"schema_version": "2.0",
"merge_blockers": ["JWT secret hardcoded"],
"clusters": [],
"fixed_count": 0,
"new_count": 1,
"still_present_count": 0
}

- staged: If true, review only staged changes (--staged). Default: false.
- target_branch: Diff against a branch (e.g. "master" for full PR review). Default: null (unstaged changes).
- incremental: Carry forward previous findings when true. Default: false.
- narrative_summary: Set true for an LLM-written narrative summary (slower). Default: false.

curl -X POST http://localhost:8000/review/file \
  -H "Content-Type: application/json" \
  -d '{"file_path": "src/codewalk/pipeline.py"}'

Response:
{
"verdict": "approve_with_nits",
"verdict_reason": "Non-critical issues found. Fix recommended but not blocking.",
"issues": [
{
"severity": "suggestion",
"category": "style",
"file_path": "src/codewalk/pipeline.py",
"line_number": 120,
"title": "Consider extracting helper function",
"explanation": "The inline loop is repeated in two places.",
"suggestion": "Move the loop body into a private helper.",
"code_snippet": "for chunk in chunks:"
}
],
"summary": "Clean change with one minor style suggestion.",
"file_path": "src/codewalk/pipeline.py"
}

curl -X POST http://localhost:8000/review/guidelines \
  -H "Content-Type: application/json" \
  -d '{"docs_path": "/path/to/guidelines"}'

Response:
{
"status": "loaded",
"chunks": 24,
"path": "/path/to/guidelines"
}

Upload an audio file (webm/mp3/wav from browser mic). Codewalk transcribes it, sends it to the chat agent (which picks the right tool natively), and returns both the text answer and a spoken MP3 response.
curl -X POST http://localhost:8000/voice/ask \
-F "audio=@question.webm" \
  -F "thread_id=voice"

Response:
{
"question": "what does the auth module do?",
"answer": "The auth module contains 5 files handling JWT validation...",
"speech": "The auth module handles JWT validation and permissions.",
"audio_base64": "SUQzBAAAAAAAI1RTU0UAAAA..."
}

- audio (required): Audio file upload (webm, mp3, wav)
- thread_id (optional): Conversation thread ID. Default: "voice"
- audio_base64: Base64-encoded MP3 of the spoken answer; decode and play in the browser

Pipeline: audio upload → faster-whisper STT → chat agent (picks tool natively) → summarize → edge-tts → MP3 response
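On the client side, the `audio_base64` field only needs to be decoded before playback. The sketch below writes it to an MP3 file; `save_spoken_answer` is a hypothetical helper, and the sample payload contains placeholder bytes rather than real audio.

```python
import base64
import tempfile
from pathlib import Path


def save_spoken_answer(response: dict, out_path: Path) -> int:
    """Decode the base64 MP3 from a /voice/ask response and write it to disk."""
    audio = base64.b64decode(response["audio_base64"])
    out_path.write_bytes(audio)
    return len(audio)


# Placeholder response shaped like the example above (not real MP3 data).
fake_mp3 = b"ID3 placeholder bytes"
response = {
    "question": "what does the auth module do?",
    "answer": "The auth module contains 5 files handling JWT validation...",
    "audio_base64": base64.b64encode(fake_mp3).decode("ascii"),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "answer.mp3"
    written = save_spoken_answer(response, target)
    round_trip = target.read_bytes()
```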
curl http://localhost:8000/health

Response:
{
"status": "ok"
}

Production API: https://api.codewalk.xyz (indexing + webhooks + index download).
Marketing site (optional): https://codewalk.xyz
git push → GitHub App webhook → api.codewalk.xyz
↓
build in .incoming.{commit}/ → atomic_swap → active index
↓
Local MCP → GET /indexes/{owner}/{repo} → query locally
Indexing is server-side only — the cloud API does not serve /analyze or /chat for indexed repos. MCP downloads the index tarball and queries locally.
| Event | What happens |
|---|---|
| First git push | Auto-registers repo, incremental index, index_status: ready |
| Later pushes | Incremental re-index; cloud Postgres index_version bumps and is written to the downloaded .codewalk/manifest.json |
| Push during indexing | Older run superseded; newest commit wins |
| Deploy / API restart | Orphan jobs cancelled; catch-up re-indexes stale/pending repos (~15s) |
| Stale codewalk_version | Catch-up full re-index after semver deploy |
| Crash mid-write | Atomic swap: active index unchanged until publish succeeds |

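One common way to get "build in a staging directory, then publish atomically" is to switch a symlink with a single rename. The sketch below shows that technique on a POSIX filesystem. It is an assumption about how such a swap can work, not a description of Codewalk's atomic_store.py; `publish_index`, the `index.{commit}` directories, and the `active` link are hypothetical names.

```python
import os
import tempfile
from pathlib import Path


def publish_index(base: Path, commit: str) -> Path:
    """Promote .incoming.{commit}/ and atomically repoint the 'active' symlink."""
    incoming = base / f".incoming.{commit}"
    final = base / f"index.{commit}"
    incoming.rename(final)                 # finished build becomes a version directory
    tmp_link = base / f".active.{commit}.tmp"
    tmp_link.symlink_to(final.name)        # relative link created next to 'active'
    os.replace(tmp_link, base / "active")  # one rename: readers see old or new, never half
    return final


# Demo in a throwaway directory: publish two commits in a row.
with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)
    for commit in ("abc123", "def456"):
        build = base / f".incoming.{commit}"
        build.mkdir()
        (build / "manifest.json").write_text(commit)
        publish_index(base, commit)
    active_manifest = (base / "active" / "manifest.json").read_text()
    previous_kept = (base / "index.abc123" / "manifest.json").exists()
```

If the build crashes before `os.replace` runs, `active` still points at the previous version, which matches the "Crash mid-write" row above.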
Laptop after server index updates: codewalk_pull_index (not codewalk_analyze_codebase when cloud is configured).
Staleness banners (MCP): [Cloud] → pull index / wait for server catch-up; [Local] → codewalk_analyze_codebase. See deploy/SERVER_OPS.md §6.
Cloud Admin UI: The frontend includes an /admin page to register repos, list repos, trigger indexing, copy per-repo tokens, and check server health/version. Production API base is configured via NEXT_PUBLIC_API_URL.
Local-ahead safety: codewalk_pull_index and codewalk_connect_repo warn and require force=True when the local .codewalk/manifest.json index_version is ahead of the cloud Postgres row.
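The local-ahead check reduces to comparing two version numbers. The sketch below illustrates the rule only; `can_pull` and its messages are hypothetical and are not the MCP tools' actual code.

```python
def can_pull(local_version: int, cloud_version: int, force: bool = False) -> tuple[bool, str]:
    """Decide whether a cloud index may overwrite the local one."""
    if local_version > cloud_version and not force:
        return False, (
            f"local index_version {local_version} is ahead of cloud "
            f"{cloud_version}; pass force=True to overwrite"
        )
    return True, "ok"


blocked, reason = can_pull(local_version=7, cloud_version=5)
forced, _ = can_pull(local_version=7, cloud_version=5, force=True)
normal, _ = can_pull(local_version=5, cloud_version=7)
```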
| Component | Where | Role |
|---|---|---|
| Cloud server | Hetzner + Docker | Index on push only |
| GitHub Actions | GitHub | Build image + deploy server |
| GitHub App | GitHub | Send push webhooks (must install app on repos) |
| Local MCP | Your laptop | Download index, run queries |
- Follow FULL_SETUP_GUIDE.md (complete step-by-step)
- Server .env: cp env.server.example.txt → /opt/codewalk/.env
- GitHub App webhook: https://api.codewalk.xyz/webhooks/github
- Install App on each repo to index (creating the app is not enough)
- git push to that repo: this registers it and starts indexing (install alone is not enough)
- Verify: POST /admin/repos with X-Admin-Key → index_status: ready
- Get repo_token from DB → set CODEWALK_REPO_TOKEN in MCP config
Day-to-day server ops: deploy/SERVER_OPS.md — health, SQL, logs, reset-repo.sh (prepare/reset/delete, --dry-run).
| Step | Required for indexing? |
|---|---|
| Server running + cloud .env (App ID, PEM, webhook secret) | Yes |
| GitHub Actions secrets (HETZNER_*) | No (deploy only) |
| Install GitHub App on the repo | Yes |
| git push to the repo | Yes (triggers register + index) |
| repo_token in local MCP | Yes (for downloading the index) |

GitHub Actions deploys the server. Indexing is triggered by GitHub App push webhooks, not Actions.
Each indexed repo can have a codewalk.yaml at its root:
indexing:
branches: # only these branches trigger indexing (fnmatch)
- master
- release/**
exclude:
# Repo-specific dirs/files (the core safety net already skips
# node_modules, build artifacts, binaries, secrets, lock files, etc.)
- frontend/**
- docs/**
include:
# Override an exclusion for a specific path
- docs/architecture/**

Generate a starter config with stack-specific excludes:
python -m src.codewalk.cli generate-config

Or via MCP: @codewalk Run codewalk_generate_config.
Cloud reads codewalk.yaml on every index. Pushes to other branches are ignored. See FULL_SETUP_GUIDE.md § Phase 7.
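The branch and path rules in codewalk.yaml can be approximated with Python's `fnmatch`, which the config comment names as the matching style. This sketch is illustrative only: `branch_triggers_indexing` and `is_indexed` are hypothetical helpers, and the "include wins over exclude" precedence is inferred from the comment in the sample config.

```python
from fnmatch import fnmatch


def branch_triggers_indexing(branch: str, patterns: list[str]) -> bool:
    """True if a pushed branch matches any configured branch pattern."""
    return any(fnmatch(branch, pattern) for pattern in patterns)


def is_indexed(path: str, exclude: list[str], include: list[str]) -> bool:
    """Apply exclude patterns, letting include patterns override them."""
    if any(fnmatch(path, pattern) for pattern in include):
        return True
    return not any(fnmatch(path, pattern) for pattern in exclude)


# Values taken from the sample codewalk.yaml above.
branches = ["master", "release/**"]
exclude = ["frontend/**", "docs/**"]
include = ["docs/architecture/**"]

checks = {
    "master": branch_triggers_indexing("master", branches),
    "release/2.1": branch_triggers_indexing("release/2.1", branches),
    "feature/login": branch_triggers_indexing("feature/login", branches),
    "src/app.py": is_indexed("src/app.py", exclude, include),
    "docs/readme.md": is_indexed("docs/readme.md", exclude, include),
    "docs/architecture/overview.md": is_indexed("docs/architecture/overview.md", exclude, include),
}
```

Note that `fnmatch` lets `*` match across `/`, so `docs/**` and `docs/*` behave the same in this sketch.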
| File | Use |
|---|---|
| FULL_SETUP_GUIDE.md | Complete A→Z setup |
| deploy/DEPLOY.md | Deployment guide |
| deploy/SERVER_OPS.md | Server ops — health, indexing, SQL, reset-repo.sh |
| env.server.example.txt | Hetzner /opt/codewalk/.env |
| env.local.example.txt | Local dev .env |
| mcp.json.example | MCP config → .vscode/mcp.json |
| env.example.txt | All env vars index |
Push to master → build image → GHCR → deploy to Hetzner (deploy-server.sh syncs compose + Caddyfile).
Secrets: HETZNER_HOST, HETZNER_USER, HETZNER_SSH_KEY
GitHub Actions deploys the server. Indexing is triggered by GitHub App push webhooks, not Actions.
┌─────────────────────────────────────────────────────────┐
│ INTERFACES │
│ │
│ Next.js Web UI (:3000) MCP Server REST API │
│ ├── Overview (stdio) (:8000) │
│ ├── Modules │ │ │
│ ├── Blast Radius │ │ │
│ ├── Reading Order Voice Interface │ │
│ ├── Execution Flow (mic → speak) │ │
│ ├── Code Review │ │ │
│ ├── Smart Reindex │ │ │
│ └── Chat ──────────────────┐ │ │ │
│ ▼ ▼ ▼ │
├──────────────────────────────────────────────────────────┤
│ AGENT LAYER │
│ │
│ LangGraph StateGraph ─── LLM (bind_tools) ───┐ │
│ │ │ │
│ ▼ ▼ │
│ ┌─ 11 Agent Tools ─────────────────────────────┐ │
│ │ search_codebase get_overview │ │
│ │ get_module_info get_blast_radius_map │ │
│ │ explain_function get_reading_order │ │
│ │ get_execution_flow get_architecture_health │ │
│ │ load_guidelines apply_fix │ │
│ │ verify_fix │ │
│ └──────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────┤
│ INGESTION LAYER │
│ │
│ scanner.py ──► file_filter.py ──► tech_detect.py │
│ (file enum (skip rules (language/ │
│ & hashing) & safety net) framework id) │
├──────────────────────────────────────────────────────────┤
│ ANALYSIS LAYER │
│ │
│ code_parser.py ──► dependency_graph.py ──► module_ │
│ (tree-sitter (import extraction detector │
│ 15+ langs) → graph) │
│ │ │
│ ▼ │
│ blast_radius.py reading_order.py │
│ (BFS reverse (topological │
│ graph) sort) │
├──────────────────────────────────────────────────────────┤
│ GRAPH LAYER │
│ │
│ graph/store.py ──► graph/runtime.py │
│ (DuckDB 10-table (igraph C-speed │
│ persistent traversal: cycles, │
│ graph) centrality, paths) │
│ │
│ .codewalk/graph.duckdb ◄── files, imports, symbols, │
│ symbol_calls, chunks, │
│ modules, module_deps │
├──────────────────────────────────────────────────────────┤
│ REVIEW LAYER │
│ │
│ diff_parser.py → reviewers/ → pipeline/ → engine.py │
│ (git diff (pluggable (cluster/rank/ │
│ parsing) reviewers) verify/summary) │
│ │
│ report.py ────────► fix_applier.py │
│ (Finding dataclasses) (approved fix application) │
├──────────────────────────────────────────────────────────┤
│ EMBEDDING LAYER │
│ │
│ chunker.py ──► embedder.py ──► vector_store.py │
│ (smart code (Jina 1.5B (ChromaDB │
│ chunks) MPS/CUDA) persistent) │
├──────────────────────────────────────────────────────────┤
│ VOICE LAYER │
│ │
│ ┌── mic ──► stt.py ──► router.py ──► tool exec ──┐ │
│ │ sounddevice faster-whisper get_llm() │ │
│ │ (record) (transcribe) (route to tool) │ │
│ │ │ │
│ │ ┌─ content tool? ─┐ │ │
│ │ │ YES │ NO (admin) │ │
│ │ ▼ ▼ │ │
│ │ main LLM (get_llm()) return text only │ │
│ │ raw result → speech (no TTS) │ │
│ │ │ │ │
│ │ tts.py ◄── speech │ │
│ │ edge-tts (speak answer) │ │
│ └────────────────────────────────────────────── │
│ │
│ Voice Flow: │
│ 🔔 beep → 🎙️ record (30s max, 5s silence stop) │
│ → 📝 transcribe (faster-whisper, local) │
│ → 🧠 route (configured LLM picks the right tool + args) │
│ → ⚙️ execute tool │
│ → 🔇 admin tool? → text result only (silent) │
│ → 🔊 content tool? → main LLM → speech → edge-tts │
├──────────────────────────────────────────────────────────┤
│ CORE LAYER (v2.4–v2.7) │
│ │
│ core/reflect.py → Actor→Critic→Improve loop │
│ core/hitl.py → LangGraph interrupts + checkpoint │
│ core/fanout.py → Parallel fan-out/fan-in graphs │
│ │
│ Used by: agent (hitl), research (reflect + fanout) │
│ — generic, composable, zero duplication │
├──────────────────────────────────────────────────────────┤
│ LLM LAYER │
│ │
│ config.py ──► get_llm() factory │
│ Ollama │ OpenAI │ Anthropic │ Gemini │ Groq │ ... │
└──────────────────────────────────────────────────────────┘
codewalk/
├── src/codewalk/
│ ├── config.py # Settings + LLM provider factory
│ ├── pipeline.py # Orchestration (parallel embed)
│ ├── ingestion/ # File scanning & tech detection
│ │ ├── scanner.py # File enumeration
│ │ ├── file_filter.py # Skip rules (node_modules, etc.)
│ │ └── tech_detect.py # Language/framework detection
│ ├── analysis/ # Code parsing & dependency analysis
│ │ ├── code_parser.py # Tree-sitter (15+ languages)
│ │ ├── dependency_graph.py # Import extraction → graph
│ │ ├── module_detector.py # Auto-grouping into modules
│ │ ├── blast_radius.py # Change impact (BFS)
│ │ └── reading_order.py # Topological sort
│ ├── graph/ # Graph intelligence layer
│ │ ├── graph_store.py # DuckDB 10-table schema + stable hash IDs
│ │ └── graph_runtime.py # igraph: cycles, centrality, shortest path
│ ├── embeddings/ # Vectorization
│ │ ├── chunker.py # Code → chunks
│ │ ├── embedder.py # Chunks → vectors
│ │ └── vector_store.py # ChromaDB storage
│ ├── agent/ # LangGraph chat agent
│ │ ├── graph.py # StateGraph + fallback parser
│ │ ├── tools.py # 11 tool functions
│ │ └── prompts.py # System prompt
│ ├── rag/ # RAG pipeline
│ │ ├── chain.py # ask() + ask_corrective() (corrective RAG)
│ │ ├── retrieval_quality.py # Distance-based chunk filtering (free)
│ │ ├── answer_grader.py # LLM answer quality grading
│ │ └── query_rewriter.py # LLM query reformulation
│ ├── review/ # Code review pipeline
│ │ ├── engine.py # Main review orchestrator (run_review)
│ │ ├── report.py # Finding, ReviewReport, Severity, Category
│ │ ├── diff_parser.py # git diff → parsed DiffFile objects
│ │ ├── fix_applier.py # Apply approved fixes safely
│ │ ├── finding_store.py # Persist review findings
│ │ ├── session_store.py # Persist review sessions
│ │ ├── reviewers/ # Pluggable reviewers (generic, security, …)
│ │ ├── pipeline/ # Post-processing (cluster/rank/verify/summary)
│ │ └── renderers/ # Output formatters (markdown/cli/api)
│ ├── api/ # FastAPI REST
│ │ ├── main.py # 35+ endpoints
│ │ ├── models.py # Pydantic schemas
│ │ ├── state.py # Singleton app state + restart resilience
│ │ └── cloud.py # Cloud mode (GitHub App + webhooks)
│ ├── voice/ # Voice interface
│ │ ├── stt.py # Mic recording + faster-whisper transcription
│ │ ├── tts.py # edge-tts speech synthesis (thread-safe)
│ │ ├── router.py # LLM-based tool routing (via get_llm)
│ │ ├── backends.py # Tool execution bridge
│ │ └── companion.py # Standalone voice loop
│ ├── core/ # Reusable LangGraph patterns
│ │ ├── reflect.py # Actor→Critic→Improve loop
│ │ ├── hitl.py # Human-in-the-loop interrupts
│ │ └── fanout.py # Parallel fan-out/fan-in graphs
│ ├── research/ # Deep research mode
│ │ ├── deep_research.py # End-to-end deep research entry point
│ │ ├── researcher.py # Parallel search + synthesis
│ │ ├── planner.py # Decompose question into sub-questions
│ │ └── synthesizer.py # Merge parallel findings into report
│ ├── generation/ # Explanation / diagram generation
│ │ ├── overview_generator.py # Project overview text
│ │ ├── module_explainer.py # Module-level explanations
│ │ ├── diagram_generator.py # Mermaid diagrams
│ │ └── flow_generator.py # Execution flow diagrams
│ ├── doc_knowledge/ # Docs & guidelines indexing
│ │ ├── doc_parser.py # Parse .md/.pdf/.txt/.rst
│ │ └── doc_store.py # ChromaDB doc collection wrapper
│ ├── services/ # Deterministic service wrappers
│ │ ├── search_service.py # retrieval pipeline wrapper
│ │ └── symbol_service.py # symbol lookup wrapper
│ ├── tools/ # Agent / MCP tool implementations
│ │ ├── static_analysis.py # Static analysis runner
│ │ ├── test_runner.py # Test execution runner
│ │ └── tool_runner.py # Generic tool dispatch
│ ├── worker/ # Background cloud indexing worker
│ │ ├── indexer.py # Poll Postgres jobs, build indexes
│ │ ├── github_app.py # GitHub App token retrieval
│ │ └── atomic_store.py # Atomic index directory swap
│ ├── eval/ # Evaluation & benchmarking
│ │ ├── evaluator.py # RAGAS RAG evaluation
│ │ ├── experiments.py # A/B parameter sweeps
│ │ └── generate_multilang_review_fixtures.py # Review eval fixtures
│ ├── debug/ # Development/debug utilities
│ │ └── fanout_agent.py # Fan-out graph experiments
│ ├── cli.py # Command-line interface
│ └── mcp/ # Model Context Protocol
│ └── server.py # 38 MCP tools (stdio)
│
├── frontend/ # Next.js 14 web UI
│ └── src/app/
│ ├── page.tsx # Home (analyze form)
│ ├── chat/page.tsx # AI chat interface
│ ├── overview/page.tsx # Project overview
│ ├── modules/page.tsx # Module browser
│ ├── modules/[name]/page.tsx # Single module detail
│ ├── blast-radius/page.tsx # Change impact viewer
│ ├── reading-order/page.tsx # Reading order viewer
│ ├── execution-flow/page.tsx # Flow diagram viewer
│ ├── knowledge-graph/page.tsx # Interactive graph explorer
│ ├── review/page.tsx # Code review (diff/file/guidelines)
│ ├── voice/page.tsx # Voice assistant (mic → transcribe → speak)
│ ├── admin/page.tsx # Admin dashboard
│ ├── architecture/page.tsx # Architecture health viewer
│ ├── docs/page.tsx # Team docs search & ask
│ ├── research/page.tsx # Deep research interface
│ └── incremental-reindex/ # Smart reindex page
│ └── page.tsx
│
├── <target-repo>/.codewalk/
│ ├── chroma/ # ChromaDB persistent storage (per repo)
│ ├── graph.duckdb # DuckDB graph database (relationships)
│ ├── knowledge-graph.json # Serialized knowledge graph entities/relationships
│ └── manifest.json # Version tracking + index metadata (index_version, chunk_count)
│
├── deploy/ # Production deployment
│ ├── Dockerfile # Multi-stage Python 3.11 build
│ ├── docker-compose.yml # Postgres + API + Caddy orchestration
│ ├── Caddyfile # Reverse proxy config (IP or domain mode)
│ ├── hetzner-setup.sh # One-click Hetzner VPS provisioning
│ ├── DEPLOY.md # Full deployment guide
│ └── SERVER_OPS.md # Health, indexing, logs, SQL commands
│
├── .github/workflows/ # CI/CD
│ └── deploy.yml # Build Docker image → GHCR → deploy
│
├── requirements.txt # Python dependencies
├── codewalk.yaml # Per-repo config (branches, excludes)
├── env.example.txt # Environment variable index
├── env.server.example.txt # Hetzner server .env template
├── env.local.example.txt # Local dev .env template
├── FULL_SETUP_GUIDE.md # Complete cloud + MCP setup (step by step)
├── .env # Configuration (gitignored)
└── .vscode/mcp.json # MCP server config
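
The tree above annotates `blast_radius.py` with "BFS" and `reading_order.py` with "Topological sort". The sketch below shows what those two techniques look like on a plain import map. It is an illustration only: the `IMPORTS` data and the function names are hypothetical and are not taken from Codewalk's source.

```python
from collections import defaultdict, deque
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical import map: file -> files it imports (its dependencies).
IMPORTS = {
    "api/main.py": ["rag/chain.py", "analysis/blast_radius.py"],
    "rag/chain.py": ["embeddings/vector_store.py"],
    "analysis/blast_radius.py": ["analysis/dependency_graph.py"],
    "embeddings/vector_store.py": [],
    "analysis/dependency_graph.py": [],
}


def blast_radius(imports, changed):
    """Return every file that directly or transitively imports `changed`."""
    # Invert the edges: dependency -> files that import it.
    dependents = defaultdict(set)
    for src, deps in imports.items():
        for dep in deps:
            dependents[dep].add(src)

    # Breadth-first search over the inverted edges.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def reading_order(imports):
    """Order files so each one appears after everything it imports."""
    # TopologicalSorter treats the dict values as predecessors,
    # so dependencies are emitted first. It raises CycleError on cycles.
    return list(TopologicalSorter(imports).static_order())
```

With this sample map, changing `embeddings/vector_store.py` affects `rag/chain.py` and `api/main.py`, and the reading order places both leaf files before the files that import them.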
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `ollama` | LLM backend: ollama, openai, anthropic, gemini, groq, openrouter |
| `LLM_MODEL` | `qwen3.5:27b` | Model name (must match provider) |
| `EMBEDDING_MODEL` | `jinaai/jina-code-embeddings-1.5b` | Sentence-transformer model for code embeddings |
| `GROQ_API_KEY` | (none) | Groq API key |
| `OPENAI_API_KEY` | (none) | OpenAI API key |
| `ANTHROPIC_API_KEY` | (none) | Anthropic API key |
| `GOOGLE_API_KEY` | (none) | Google Gemini API key |
| `OPENROUTER_API_KEY` | (none) | OpenRouter API key |
| `POSTGRES_PASSWORD` | (none) | Postgres password (for Docker/server deployment) |
| `CORS_ORIGINS` | `*` | Comma-separated allowed origins (e.g. https://yourdomain.com) |
| `RATE_LIMIT_REQUESTS` | `60` | Max requests per IP per window |
| `RATE_LIMIT_WINDOW` | `60` | Rate limit window in seconds |
| `INDEX_STORAGE_PATH` | `/var/codewalk` | Path for ChromaDB/DuckDB data (Docker default) |
| `GITHUB_APP_ID` | (none) | GitHub App ID (server cloud mode) |
| `GITHUB_APP_PRIVATE_KEY_PATH` | (none) | PEM path inside container, e.g. /var/codewalk/secrets/key.pem |
| `GITHUB_WEBHOOK_SECRET` | (none) | Must match GitHub App webhook secret |
| `ADMIN_API_KEY` | (none) | `X-Admin-Key` header for `/admin/*` routes |
| `CODEWALK_SERVER_URL` | (none) | MCP: cloud API URL, e.g. https://api.codewalk.xyz |
| `CODEWALK_REPO_NAME` | (none) | MCP: owner/repo slug |
| `CODEWALK_REPO_TOKEN` | (none) | MCP: per-repo download token (cw_repo_...) |
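
To make the defaults in the table concrete, here is a minimal sketch of reading a few of these variables with the documented fallbacks. The `Settings` class and `load_settings` function are hypothetical names used for illustration; Codewalk's real configuration code may be organized differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the variables above."""
    llm_provider: str
    llm_model: str
    cors_origins: tuple
    rate_limit_requests: int
    rate_limit_window: int
    index_storage_path: str


def load_settings(env=None):
    """Read settings from `env` (defaults to os.environ), applying documented defaults."""
    env = os.environ if env is None else env
    # CORS_ORIGINS is comma-separated; drop blanks and surrounding whitespace.
    origins = tuple(
        o.strip() for o in env.get("CORS_ORIGINS", "*").split(",") if o.strip()
    )
    return Settings(
        llm_provider=env.get("LLM_PROVIDER", "ollama"),
        llm_model=env.get("LLM_MODEL", "qwen3.5:27b"),
        cors_origins=origins,
        rate_limit_requests=int(env.get("RATE_LIMIT_REQUESTS", "60")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
        index_storage_path=env.get("INDEX_STORAGE_PATH", "/var/codewalk"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test, for example `load_settings({"RATE_LIMIT_WINDOW": "30"})`.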
| Provider | Set `LLM_PROVIDER=` | API Key | Notes |
|---|---|---|---|
| Ollama | `ollama` | None | Fully local, no internet. Run `ollama serve` first |
| OpenAI | `openai` | `OPENAI_API_KEY` | GPT models, etc. |
| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` | Claude models |
| Google Gemini | `gemini` | `GOOGLE_API_KEY` | Gemini models |
| Groq | `groq` | `GROQ_API_KEY` | Groq models |
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` | Access to 100+ models |
| DeepSeek | `deepseek` | `DEEPSEEK_API_KEY` | DeepSeek V3, R1 models |
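
The provider table can be read as a lookup from `LLM_PROVIDER` to the API key it requires. The sketch below checks that pairing before startup. The mapping mirrors the table; the `check_provider` helper is a hypothetical name for illustration and is not part of Codewalk's API.

```python
import os

# Provider -> required API key variable, copied from the table above.
REQUIRED_KEY = {
    "ollama": None,  # fully local, no key needed
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def check_provider(env=None):
    """Return the selected provider, or raise ValueError if misconfigured."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama").strip().lower()
    if provider not in REQUIRED_KEY:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    key_name = REQUIRED_KEY[provider]
    if key_name is not None and not env.get(key_name):
        raise ValueError(f"LLM_PROVIDER={provider} requires {key_name} to be set")
    return provider
```

Failing early with a clear message is usually easier to debug than an authentication error raised on the first chat request.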
To wipe all indexed data and start fresh, delete the .codewalk/chroma/ directory inside the target repo:

    # From the target repo root:
    rm -rf .codewalk/chroma/

This removes all embedded chunks and collections. The next time you run `codewalk_analyze_codebase` (MCP) or `POST /analyze` (API), Codewalk re-indexes from scratch.

When to do this:
- You switched to a different repo and want a clean index
- Embeddings seem stale or corrupted
- You changed the embedding model and need to re-embed everything
- You want to use `index_mode: "full"` but it's still picking up old data
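
If you prefer to script the reset, for example on Windows where `rm -rf` is unavailable, a cross-platform equivalent might look like the sketch below. The `reset_index` function is a hypothetical helper, not something Codewalk ships; it only deletes `.codewalk/chroma/` and leaves the rest of `.codewalk/` untouched.

```python
import shutil
from pathlib import Path


def reset_index(repo_root):
    """Delete <repo_root>/.codewalk/chroma/ if present. Return True if removed."""
    chroma_dir = Path(repo_root).resolve() / ".codewalk" / "chroma"
    # Refuse to follow a symlink, and do nothing if the directory is missing.
    if chroma_dir.is_symlink() or not chroma_dir.is_dir():
        return False
    shutil.rmtree(chroma_dir)
    return True
```

Calling `reset_index(".")` from the target repo root has the same effect as the shell command above.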
Codewalk stores its index data inside each target repo at .codewalk/ (ChromaDB embeddings, DuckDB graph, version metadata). This directory should not be committed to version control.
Add this to each target repo's .gitignore:

    # Codewalk index (auto-generated)
    .codewalk/

This is only needed in the target repo you're analyzing, not in the codewalk repo itself.

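
When you analyze many repos, adding the ignore rule by hand is easy to forget. The following sketch appends the entry only if it is missing. `ensure_ignored` is a hypothetical helper written for this example and is not part of Codewalk.

```python
from pathlib import Path


def ensure_ignored(repo_root, entry=".codewalk/"):
    """Append `entry` to <repo_root>/.gitignore if absent. Return True if written."""
    gitignore = Path(repo_root) / ".gitignore"
    existing = gitignore.read_text(encoding="utf-8") if gitignore.exists() else ""
    # Compare against whole lines so a longer path does not count as a match.
    if entry in (line.strip() for line in existing.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with gitignore.open("a", encoding="utf-8") as fh:
        fh.write(f"{prefix}# Codewalk index (auto-generated)\n{entry}\n")
    return True
```

The function is idempotent, so running it before every analysis does not duplicate the entry.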
| Layer | Technology |
|---|---|
| Backend | Python 3.10+, FastAPI, Uvicorn |
| Agent | LangGraph, LangChain |
| Vector DB | ChromaDB (persistent, per-repo at .codewalk/chroma/) |
| Graph DB | DuckDB (persistent, per-repo at .codewalk/graph.duckdb) — Why DuckDB over SQLite? |
| Graph Runtime | igraph (C-speed traversal, in-memory from DuckDB) |
| Voice STT | faster-whisper (local, small model, int8) |
| Voice TTS | edge-tts (free, en-US-AriaNeural) |
| Voice Router | User's configured LLM (via get_llm()) |
| Embeddings | Jina Code Embeddings 1.5B (768-dim, Ollama/MPS) |
| Code Parsing | Tree-sitter (15+ language grammars) |
| Frontend | Next.js 14, React 18, TypeScript 5 |
| Styling | Tailwind CSS, shadcn/ui |
| Diagrams | Mermaid.js |
| MCP | Model Context Protocol (stdio transport) |
Codewalk holds one repo's state in memory at a time (vector store, dependency graph, module map, repo path). Using more than one repo in the same process can overwrite and corrupt that state: index files, cached graph handles, and the active repo path can end up pointing at the wrong workspace.
| Interface | Multi-repo behavior |
|---|---|
| MCP (stdio) | ✅ Safe per connection. Each MCP connection spawns a separate Python process, so two repos in two editor windows are isolated. However, do not point the same running MCP server at multiple repos or switch workspaces rapidly within the same process — that will corrupt in-memory state. |
| FastAPI (REST) | Concurrent `/analyze` calls for different repos will race: whoever finishes last overwrites the shared globals. Use only one repo at a time. |
| Web UI |
This is by design, not a bug. Codewalk is optimized for the common case: one developer, one repo at a time. If you need concurrent multi-repo support on the API side, it would require a dict[repo_path, SessionState] architecture — contributions welcome.
MCP users: You're already safe as long as each repo gets its own MCP server process (one VS Code window / Claude Code session / Cursor instance per repo). Do not route commands for different repos into the same stdio connection or reuse one server process across workspaces.
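
For contributors interested in the `dict[repo_path, SessionState]` idea mentioned above, a rough starting point could look like this. Everything here is an assumption made for illustration: the fields on `SessionState` are guessed from the state listed in this section, and `SessionRegistry` does not exist in the codebase.

```python
import threading
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SessionState:
    """Hypothetical per-repo state; fields mirror the in-memory state described above."""
    repo_path: Path
    vector_store: object = None
    dependency_graph: dict = field(default_factory=dict)
    module_map: dict = field(default_factory=dict)


class SessionRegistry:
    """Keeps one SessionState per resolved repo path instead of shared globals."""

    def __init__(self):
        self._sessions = {}
        self._lock = threading.Lock()

    def get(self, repo_path):
        # Resolve so "./repo" and an absolute path map to the same session.
        key = Path(repo_path).resolve()
        with self._lock:
            if key not in self._sessions:
                self._sessions[key] = SessionState(repo_path=key)
            return self._sessions[key]
```

Each request handler would then look up its session by repo path rather than reading module-level globals. The lock only protects the registry itself, so long-running indexing on a single session would still need its own synchronization.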
- Fork this repo
- Clone your fork: `git clone https://github.com/<your-username>/codewalk.git`
- Create a branch: `git checkout -b feat/my-feature`
- Make your changes and test them
- Commit: `git commit -m "feat: add my feature"`
- Push: `git push origin feat/my-feature`
- Open a Pull Request against `master`
All contributions welcome — bug fixes, new language support, UI improvements, docs, anything.
Found a bug? Open an issue with screenshots, error logs, or references — it helps us fix it faster.
⭐ If you find Codewalk useful, give it a star — it helps others discover it!
Built by gupta29470