This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
IMPORTANT: This project uses uv as the package manager. Always use uv commands - never use pip directly.
```bash
./run.sh
```

This starts the FastAPI server on port 8000 with auto-reload enabled. The application will:
- Load course documents from the `docs/` folder
- Process them into 800-char chunks with 100-char overlap
- Create/load ChromaDB embeddings (first run downloads a ~90MB embedding model)
- Serve the web interface at http://localhost:8000

The `run.sh` script uses `uv run` internally.
To run the backend manually:

```bash
cd backend
uv run uvicorn app:app --reload --port 8000
```

Install dependencies:

```bash
uv sync
```

Manage packages:

```bash
# Add a new package
uv add package-name

# Add a dev dependency
uv add --dev package-name
```

Always use `uv run` to execute Python code:

```bash
uv run python script.py
# NOT: python script.py
# NOT: pip install ...
```

Create a `.env` file with:

```
ANTHROPIC_API_KEY=sk-ant-api03-...
```
This is a tool-based RAG system where Claude decides when to search, not a traditional "always search" RAG.
Query Processing Flow:
- User query → FastAPI endpoint (`/api/query`)
- RAG System orchestrates the flow
- First Claude API call: Claude receives query + tool definition, decides if search is needed
- If search needed: Tool execution → Vector search → Format results
- Second Claude API call: Claude receives search results, synthesizes final answer
- Response + sources returned to frontend
Key Insight: There are two Claude API calls per query - one for decision-making, one for synthesis.
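The two-call flow can be sketched as follows. This is a minimal, illustrative version with a stubbed client: `client.create`, `execute_tool`, and the response dict shape are assumptions for the sketch, not the real `anthropic` SDK or this repo's actual code.

```python
# Sketch of the two-phase flow: one call for tool decision, one for synthesis.
# The client interface here is a stand-in, not the real Anthropic SDK.

def answer_query(client, query, tools, execute_tool):
    # First call: Claude sees the query plus tool definitions and may
    # request a search (stop_reason == "tool_use").
    first = client.create(messages=[{"role": "user", "content": query}], tools=tools)
    if first["stop_reason"] != "tool_use":
        # General-knowledge question: answered directly, no search.
        return first["text"]

    # Tool execution: run the vector search Claude asked for.
    results = execute_tool(first["tool_name"], first["tool_input"])

    # Second call: Claude synthesizes the final answer from the results.
    second = client.create(
        messages=[
            {"role": "user", "content": query},
            {"role": "user", "content": f"Search results:\n{results}"},
        ],
        tools=tools,
    )
    return second["text"]
```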
Frontend (frontend/)
- Vanilla JS (no framework)
- Uses `marked.js` for markdown rendering
- Session-based conversation tracking
- Displays collapsible source citations
Backend (backend/)
- app.py: FastAPI server, REST endpoints, startup document loading
- rag_system.py: Main orchestrator - coordinates all components
- ai_generator.py: Claude API wrapper with tool calling support
- System prompt defines search behavior (one search max, no meta-commentary)
- Handles two-phase tool execution (request → execute → synthesize)
- vector_store.py: ChromaDB interface with two collections
  - `course_catalog`: For fuzzy course name matching (e.g., "MCP" → full title)
  - `course_content`: Actual content chunks for semantic search
- document_processor.py: Parses structured course documents into chunks
- Sentence-based chunking (preserves semantic boundaries)
- Adds context prefixes: "Course X Lesson Y content: ..."
- search_tools.py: Tool abstraction layer
  - `CourseSearchTool`: Implements search with course/lesson filtering
  - `ToolManager`: Registers and routes tool calls from Claude
- session_manager.py: Conversation history (max 2 exchanges by default)
- config.py: Centralized configuration (see below)
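The sentence-based chunking with character overlap can be sketched like this. It is a minimal illustration; the function name and exact boundary rules are assumptions, not the real `document_processor.py`.

```python
# Illustrative sentence-based chunker: sentences are never split, and the
# tail of each chunk is carried into the next so context spans boundaries.
import re

def chunk_text(text, chunk_size=800, overlap=100):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > chunk_size:
            chunks.append(current)
            # Carry the last `overlap` characters into the next chunk.
            current = current[-overlap:] + " " + sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks
```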
Important: Course.title is used as the unique identifier throughout the system.
- Course: Contains title (ID), instructor, link, and list of Lessons
- Lesson: Contains lesson_number, title, and link
- CourseChunk: Contains content, course_title (FK), lesson_number, chunk_index
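The data model above can be expressed roughly as follows. Field names follow the text; the repo's actual classes may differ (for example, they may be Pydantic models rather than dataclasses).

```python
# Sketch of the three entities described above.
from dataclasses import dataclass, field

@dataclass
class Lesson:
    lesson_number: int
    title: str
    link: str

@dataclass
class Course:
    title: str                      # unique identifier across the system
    instructor: str
    link: str
    lessons: list = field(default_factory=list)

@dataclass
class CourseChunk:
    content: str
    course_title: str               # foreign key: Course.title
    lesson_number: int
    chunk_index: int
```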
Two-Collection Architecture:
- `course_catalog` collection:
- Purpose: Fuzzy course name resolution
- Documents: "Course: {title} taught by {instructor}" + lesson entries
- Used when user says "MCP course" → resolves to full title
- `course_content` collection:
- Purpose: Semantic search of actual content
- Documents: Text chunks with context prefixes
- Metadata: course_title, lesson_number, chunk_index, links
- Filtering: Can filter by exact course_title AND/OR lesson_number
Search Flow:
- If `course_name` provided: Query `course_catalog` to resolve the fuzzy name
- Build ChromaDB filter: `{"$and": [{"course_title": "X"}, {"lesson_number": Y}]}`
- Query `course_content` with semantic search + filters
- Return top 5 chunks by cosine similarity
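The filter-building step can be sketched as a small helper. This is illustrative; the function name is an assumption, and the real logic lives in `vector_store.py`.

```python
# Build a ChromaDB `where` clause from optional course/lesson constraints.

def build_filter(course_title=None, lesson_number=None):
    clauses = []
    if course_title is not None:
        clauses.append({"course_title": course_title})
    if lesson_number is not None:
        clauses.append({"lesson_number": lesson_number})
    if not clauses:
        return None            # no constraints: search everything
    if len(clauses) == 1:
        return clauses[0]      # a single condition needs no $and wrapper
    return {"$and": clauses}
```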
Course documents in docs/ must follow this structure:
```
Course Title: [title]
Course Link: [url]
Course Instructor: [name]

Lesson 0: [title]
Lesson Link: [url]
[content...]

Lesson 1: [title]
Lesson Link: [url]
[content...]
```

The parser (`document_processor.py`) extracts this metadata and creates chunks with context.
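A minimal sketch of the metadata extraction for this format (illustrative only; the real `document_processor.py` also performs chunking and handles edge cases):

```python
# Parse course-level metadata and lesson headers from the document format.
import re

def parse_course_metadata(text):
    meta = {"lessons": []}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Course Title:"):
            meta["title"] = line.split(":", 1)[1].strip()
        elif line.startswith("Course Link:"):
            meta["link"] = line.split(":", 1)[1].strip()
        elif line.startswith("Course Instructor:"):
            meta["instructor"] = line.split(":", 1)[1].strip()
        else:
            # "Lesson N: title" headers ("Lesson Link:" lines don't match).
            m = re.match(r"Lesson (\d+): (.*)", line)
            if m:
                meta["lessons"].append(
                    {"lesson_number": int(m.group(1)), "title": m.group(2)}
                )
    return meta
```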
Key settings to be aware of:
- `ANTHROPIC_MODEL`: "claude-sonnet-4-20250514" (Claude Sonnet 4)
- `EMBEDDING_MODEL`: "all-MiniLM-L6-v2" (384-dim vectors)
- `CHUNK_SIZE`: 800 chars (with `CHUNK_OVERLAP`: 100 chars)
- `MAX_RESULTS`: 5 search results returned to Claude
- `MAX_HISTORY`: 2 conversation exchanges kept in context
- `CHROMA_PATH`: "./chroma_db" (persistent vector storage)
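Those settings likely look roughly like the following in `config.py` (a sketch based only on the values listed above; the actual file may read some values from the environment):

```python
# Illustrative shape of the centralized configuration.
from dataclasses import dataclass

@dataclass
class Config:
    ANTHROPIC_MODEL: str = "claude-sonnet-4-20250514"
    EMBEDDING_MODEL: str = "all-MiniLM-L6-v2"
    CHUNK_SIZE: int = 800
    CHUNK_OVERLAP: int = 100
    MAX_RESULTS: int = 5
    MAX_HISTORY: int = 2
    CHROMA_PATH: str = "./chroma_db"
```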
The system prompt in ai_generator.py defines critical behavior:
- Use search tool ONLY for course-specific questions
- At most one search per query
- No meta-commentary (no "based on the search results" phrases)
- Responses must be: brief, educational, clear, example-supported
Sessions track conversation history:
- Session ID created on first query (e.g., "session_1")
- Stores the last `MAX_HISTORY * 2` messages (user + assistant pairs)
- History formatted as `"User: ...\nAssistant: ...\n..."` for context
- Appended to system prompt on subsequent queries in same session
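The history window can be sketched like this (names are assumptions; the real implementation is in `session_manager.py`):

```python
# Keep the last MAX_HISTORY * 2 messages and render them as a prompt string.

MAX_HISTORY = 2  # conversation exchanges (user + assistant pairs)

class SessionHistory:
    def __init__(self):
        self.messages = []  # list of (role, text) tuples

    def add_exchange(self, user_text, assistant_text):
        self.messages.append(("User", user_text))
        self.messages.append(("Assistant", assistant_text))
        # Trim so only the most recent MAX_HISTORY exchanges remain.
        self.messages = self.messages[-MAX_HISTORY * 2:]

    def format_for_prompt(self):
        return "\n".join(f"{role}: {text}" for role, text in self.messages)
```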
POST /api/query
- Request: `{ "query": "...", "session_id": "session_1" }` (`session_id` optional)
- Response: `{ "answer": "...", "sources": ["..."], "session_id": "..." }`
- Creates a session if one is not provided
GET /api/courses
- Response: `{ "total_courses": 4, "course_titles": ["..."] }`
- Used by the frontend sidebar
- First run: Downloads embedding model, creates collections, processes documents (~30-60 seconds)
- Subsequent runs: Loads existing ChromaDB from `./chroma_db` (fast startup)
- Documents are only reprocessed if the course title doesn't already exist in the catalog
- To rebuild: Delete the `./chroma_db` folder and restart
Adding New Documents:
- Place `.txt`, `.pdf`, or `.docx` files in the `docs/` folder
- Follow the document format structure above
- Restart server - documents auto-loaded on startup
- Check logs for: "Added new course: X (Y chunks)"
Modifying Chunk Size:
- Edit `config.py`: `CHUNK_SIZE` and `CHUNK_OVERLAP`
- Delete the `./chroma_db` folder to force reprocessing
- Restart the application
Debugging Search:
- The search tool tracks sources in its `last_sources` attribute
- Sources are shown in the UI as a collapsible section
- Check `vector_store.py` for the filter logic
Conversation Context:
- Modify `MAX_HISTORY` in `config.py` to change the context window
- History is string-formatted and prepended to the system prompt
- Trade-off: More history = more context but higher token usage
This system is NOT a traditional RAG where every query triggers a search. Instead:
- Claude analyzes each query and decides if search is warranted
- General knowledge questions answered without search
- Course-specific questions trigger tool use
- This reduces unnecessary vector searches and improves response quality
Frontend maintains:
- Current session_id in memory
- Sends with each query for conversation continuity
Backend returns:
- answer: The synthesized response from Claude
- sources: List of "Course Title - Lesson N" strings for UI
- session_id: Same or newly created session ID
Source Tracking:
- Search tool stores sources during execution
- RAG system retrieves after AI generation completes
- Sources reset after each query to prevent leakage
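The get-then-reset pattern can be sketched as follows. The `last_sources` attribute name comes from the text above; the rest is illustrative, not the repo's actual `search_tools.py`.

```python
# Record sources during search execution, then read-and-clear them after
# generation so they cannot leak into the next query's response.

class CourseSearchTool:
    def __init__(self):
        self.last_sources = []

    def execute(self, results):
        # Store "Course Title - Lesson N" strings for the UI.
        self.last_sources = [
            f"{r['course_title']} - Lesson {r['lesson_number']}" for r in results
        ]
        return results

def collect_and_reset(tool):
    sources = tool.last_sources
    tool.last_sources = []
    return sources
```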