Skip to content

fishdev20/rag-chat

Repository files navigation

RAG Chat with PDF (First RAG Project)

This project is practical implementation of Retrieval-Augmented Generation (RAG). The goal is simple: upload/use a PDF as knowledge, then let users chat with an AI assistant grounded in that PDF.

1. Problem Statement

A base LLM is powerful, but it does not automatically know your private PDF content. If the user asks specific questions (for example, "what is in case #32?"), the model may hallucinate or answer vaguely without data grounding.

flowchart LR
    D[Large set of PDF pages] --> LLM[LLM]
    U[User asks a specific question] --> LLM
    LLM --> R[Response]
    N[No built-in context about your private docs] -.-> LLM
Loading

2. Naive Retrieval-Based Approach (Why It Breaks)

A naive approach is: extract all text and send everything into the prompt every time.

Problems:

  • High token cost
  • Context window limits
  • Slower responses
  • Unnecessary irrelevant text in prompt
flowchart LR
    P[All PDF text] --> S[System Prompt + Full Text]
    Q[User Query] --> S
    S --> LLM[LLM]
    LLM --> A[Answer]

    C1[High cost] -.-> S
    C2[Context window limit] -.-> S
Loading

3. RAG Pipeline Overview

RAG solves this by splitting work into two phases:

  • Indexing phase
  • Retrieval + generation phase
flowchart LR
    subgraph Indexing
        A[PDF] --> B[Chunking]
        B --> C[Embedding Model]
        C --> D[(Vector DB / Qdrant)]
    end

    subgraph Query Time
        Q[User Query] --> QE[Query Embedding]
        QE --> D
        D --> R[Top-K Relevant Chunks]
        R --> G[LLM Answer Generation]
        Q --> G
    end
Loading

4. Indexing Phase (Implemented in index.py)

In this project:

  1. Load PDF (PyPDFLoader)
  2. Split text into chunks (RecursiveCharacterTextSplitter)
  3. Convert chunks to vectors (OpenAIEmbeddings)
  4. Store vectors + metadata in Qdrant (QdrantVectorStore)

Metadata (like page number and source file) is preserved so answers can point users to where information came from.

flowchart LR
    PDF[PDF Document] --> SPLIT[Chunking]
    SPLIT --> CH1[Chunk A + metadata]
    SPLIT --> CH2[Chunk B + metadata]
    SPLIT --> CH3[Chunk C + metadata]
    CH1 --> EMB[Embedding Model]
    CH2 --> EMB
    CH3 --> EMB
    EMB --> VDB[(Qdrant Vector DB)]
Loading

5. Retrieval + Answering Phase (Implemented in chat.py)

At query time:

  1. User asks a question
  2. Query is embedded
  3. Similar chunks are retrieved from Qdrant
  4. Retrieved context is injected into prompt
  5. LLM answers based on that context
flowchart LR
    U[User Query] --> E[Query Embedding]
    E --> VDB[(Qdrant Vector DB)]
    VDB --> K[Top Relevant Chunks]
    K --> P[Prompt with Context]
    U --> P
    P --> LLM[GPT Model]
    LLM --> OUT[Grounded Answer + Page References]
Loading

6. Project Files

  • index.py: indexing pipeline (PDF -> chunks -> embeddings -> Qdrant)
  • chat.py: retrieval + answer generation loop
  • constants.py: shared constants (Qdrant URL, collection name, embedding model)
  • .env.example: required environment variables template

7. Environment Variables

Use .env based on .env.example:

OPENAI_API_KEY=your_openai_api_key_here

8. Run Steps

  1. Start Qdrant:
docker compose up -d
  1. Build the vector index:
python3 index.py
  1. Start chat:
python3s chat.py

9. What I Learned (First RAG)

  • LLM alone is not enough for private knowledge grounding.
  • Sending full documents in every prompt is expensive and limited.
  • RAG gives a scalable pattern: index once, retrieve relevant chunks, generate grounded answers.
  • Good chunking and metadata quality directly improve answer quality.

About

This project is practical implementation of Retrieval-Augmented Generation (RAG). The goal is simple: upload/use a PDF as knowledge, then let users chat with an AI assistant grounded in that PDF.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages