CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Essential Commands

Development Setup

# Install dependencies
uv sync

# Add new dependencies
uv add package_name

# Set up environment variables
cp .env.example .env
# Edit .env to add your ANTHROPIC_API_KEY

Running the Application

# Quick start (recommended)
chmod +x run.sh
./run.sh

# Manual start
cd backend && uv run uvicorn app:app --reload --port 8000

Development Commands

# Run from backend directory
cd backend && uv run uvicorn app:app --reload --port 8000

# Test API endpoints directly
curl http://localhost:8000/api/courses
curl -X POST http://localhost:8000/api/query -H "Content-Type: application/json" -d '{"query":"your question here"}'
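
The same endpoints can also be called from Python. The sketch below uses only the standard library. The helper names `build_query_request` and `send_json` are hypothetical, and because this file does not document the response shape, the decoded JSON is returned as-is without assuming any fields.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # port taken from the uvicorn command above


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /api/query with a JSON body (hypothetical helper)."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(request: urllib.request.Request, timeout: float = 30.0):
    """Send a request and decode the JSON response. Requires the server to be running."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_json(build_query_request("your question here"))` mirrors the second curl command, and `send_json(urllib.request.Request(BASE_URL + "/api/courses"))` mirrors the first.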

Architecture Overview

This is a RAG (Retrieval-Augmented Generation) system with a three-layer architecture:

Core RAG Pipeline

Document Processing: Course transcripts → chunked text with metadata
Vector Storage: ChromaDB stores embeddings for semantic search
AI Generation: Claude API generates contextual responses using retrieved content
Tool Integration: AI can dynamically search the knowledge base using tools

Key Components

RAGSystem (rag_system.py) - Central orchestrator that coordinates:

Document processing and chunking
Vector storage operations
AI response generation with tool access
Session management for conversation history

Tool-Based Search Architecture - The system uses a tool-based approach where:

ToolManager registers and manages available tools
CourseSearchTool performs content searches within course materials
CourseOutlineTool retrieves complete course structures with lesson lists
Claude API calls tools dynamically during response generation
Tools return sources that are tracked and returned to the frontend with links
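
To illustrate the register, execute, and collect-sources pattern described above, here is a minimal sketch. It is not the repository's code: `ToyToolManager`, `EchoSearchTool`, the tool name `search_course_content`, and every method name are assumptions made for illustration. The real ToolManager and CourseSearchTool may expose different interfaces.

```python
class EchoSearchTool:
    """Stand-in for a search tool; a real one would query the vector store."""

    name = "search_course_content"  # hypothetical tool name

    def __init__(self) -> None:
        self.last_sources: list[str] = []

    def execute(self, **kwargs) -> str:
        query = kwargs.get("query", "")
        # Record where the (fake) result came from so the caller can display it.
        self.last_sources = ["Example Course - Lesson 1"]
        return f"results for: {query}"


class ToyToolManager:
    """Keeps a name -> tool registry and dispatches calls by tool name."""

    def __init__(self) -> None:
        self._tools = {}

    def register(self, tool) -> None:
        self._tools[tool.name] = tool

    def execute(self, name: str, **kwargs) -> str:
        if name not in self._tools:
            return f"Tool '{name}' not found"
        return self._tools[name].execute(**kwargs)

    def collect_sources(self) -> list[str]:
        # Gather sources recorded by any tool during the last call.
        sources: list[str] = []
        for tool in self._tools.values():
            sources.extend(getattr(tool, "last_sources", []))
        return sources
```

In this sketch the model's tool call would be routed through `execute`, and `collect_sources` would supply the list that is passed on to the frontend.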

Data Models (models.py):

Course: Contains title, instructor, lessons list
CourseChunk: Text chunks with course/lesson metadata for vector storage
Lesson: Individual lessons with titles and optional links
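
As a rough picture of how these three models relate, the sketch below uses standard-library dataclasses. The actual models.py may use a different base class, and the field names `link`, `course_title`, `lesson_title`, and `chunk_index` are assumptions. Only title, instructor, lessons, and the optional lesson link come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Lesson:
    title: str
    link: Optional[str] = None  # links are optional per the description above


@dataclass
class Course:
    title: str
    instructor: str
    lessons: list[Lesson] = field(default_factory=list)


@dataclass
class CourseChunk:
    text: str
    course_title: str                   # assumed metadata field name
    lesson_title: Optional[str] = None  # assumed metadata field name
    chunk_index: int = 0                # assumed: position of the chunk in its source
```

For example, `CourseChunk(text="...", course_title=course.title, lesson_title=course.lessons[0].title)` ties a piece of text back to the course and lesson it came from.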

Configuration (config.py)

Key settings:

CHUNK_SIZE: 800 - Text chunk size for vector storage
CHUNK_OVERLAP: 100 - Overlap between chunks
MAX_RESULTS: 5 - Vector search result limit
MAX_HISTORY: 2 - Conversation memory depth
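
To show how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters and that chunks are cut at fixed offsets. The actual document processor may split on sentence boundaries instead, and `chunk_text` is a hypothetical helper.

```python
CHUNK_SIZE = 800     # value documented above
CHUNK_OVERLAP = 100  # value documented above


def chunk_text(text: str, chunk_size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the documented values, each chunk starts 700 characters after the previous one, so a 2000-character text yields chunks of 800, 800, and 600 characters.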

Data Flow

Course documents in docs/ are processed into CourseChunk objects
Chunks are embedded and stored in ChromaDB (./chroma_db/)
User queries trigger tool-based searches via Claude API
Retrieved chunks provide context for AI response generation
Session history maintains conversation continuity
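
The last step depends on keeping a bounded history. The sketch below shows one way a depth of MAX_HISTORY: 2 could be enforced. It assumes that one unit of history is a full user/assistant exchange, which this file does not confirm, and `SessionHistory` is a hypothetical helper rather than the repository's session manager.

```python
from collections import deque

MAX_HISTORY = 2  # value documented in the configuration section


class SessionHistory:
    """Keeps only the most recent exchanges (hypothetical helper)."""

    def __init__(self, max_history: int = MAX_HISTORY) -> None:
        # Assumption: one unit of history is a (user, assistant) exchange.
        # A deque with maxlen silently drops the oldest entry when full.
        self._exchanges = deque(maxlen=max_history)

    def add_exchange(self, user: str, assistant: str) -> None:
        self._exchanges.append((user, assistant))

    def as_prompt_context(self) -> str:
        lines = []
        for user, assistant in self._exchanges:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        return "\n".join(lines)
```

After three exchanges, only the second and third remain in the context string.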

Frontend Integration

FastAPI serves both API endpoints (/api/*) and static frontend files
Frontend communicates via /api/query for chat and /api/courses for statistics
CORS is configured for development, with live reload support

Environment Requirements

Required environment variable:

ANTHROPIC_API_KEY=your_anthropic_api_key_here

The system expects course documents in the docs/ folder as .txt, .pdf, or .docx files.
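
A quick pre-flight check for both requirements might look like the sketch below. The helper names are hypothetical. It reads the key from the process environment only, so it assumes .env has already been loaded by whatever mechanism the application uses.

```python
import os
from pathlib import Path

SUPPORTED_SUFFIXES = {".txt", ".pdf", ".docx"}  # file types listed above


def find_course_documents(docs_dir) -> list[Path]:
    """Return supported course files directly inside docs_dir, sorted by path."""
    root = Path(docs_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def require_api_key(env=os.environ) -> str:
    """Return ANTHROPIC_API_KEY from the given mapping, or raise if it is missing."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; add it to .env")
    return key
```

Calling `require_api_key()` and `find_course_documents("docs")` before starting the server surfaces a missing key or an empty docs/ folder early.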
