RAG Chat with PDF (First RAG Project)

This project is a practical implementation of Retrieval-Augmented Generation (RAG). The goal is simple: use a PDF as the knowledge source, then let users chat with an AI assistant whose answers are grounded in that PDF.

1. Problem Statement

A base LLM is powerful, but it does not automatically know your private PDF content. If the user asks specific questions (for example, "what is in case #32?"), the model may hallucinate or answer vaguely without data grounding.

flowchart LR
    D[Large set of PDF pages] --> LLM[LLM]
    U[User asks a specific question] --> LLM
    LLM --> R[Response]
    N[No built-in context about your private docs] -.-> LLM

2. Naive Retrieval-Based Approach (Why It Breaks)

A naive approach is to extract all the text from the PDF and send it in the prompt with every question.

Problems:

High token cost
Context window limits
Slower responses
Irrelevant text in the prompt

flowchart LR
    P[All PDF text] --> S[System Prompt + Full Text]
    Q[User Query] --> S
    S --> LLM[LLM]
    LLM --> A[Answer]

    C1[High cost] -.-> S
    C2[Context window limit] -.-> S
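
To make the cost difference concrete, here is a rough back-of-the-envelope sketch. The page count, characters per page, chunk size, and the 4-characters-per-token rule of thumb are all illustrative assumptions, not measurements from this project.

```python
import math


def estimate_tokens(text_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; ~4 characters per token is a common rule of thumb."""
    return math.ceil(text_chars / chars_per_token)


def naive_prompt_tokens(pages: int, chars_per_page: int, questions: int) -> int:
    """Input tokens when the whole PDF text is resent with every question."""
    return estimate_tokens(pages * chars_per_page) * questions


def rag_prompt_tokens(top_k: int, chunk_chars: int, questions: int) -> int:
    """Input tokens when only the top-k retrieved chunks are sent per question."""
    return estimate_tokens(top_k * chunk_chars) * questions


# Hypothetical workload: a 200-page PDF, 2,000 characters per page, 50 questions.
naive = naive_prompt_tokens(pages=200, chars_per_page=2000, questions=50)
# Same workload with retrieval: 4 chunks of 1,000 characters per question.
rag = rag_prompt_tokens(top_k=4, chunk_chars=1000, questions=50)

print(f"naive: {naive:,} input tokens")  # naive: 5,000,000 input tokens
print(f"rag:   {rag:,} input tokens")    # rag:   50,000 input tokens
```

Under these assumptions each naive prompt carries about 100,000 tokens of document text, which may already exceed the context window of some models before the question is even added.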

3. RAG Pipeline Overview

RAG solves this by splitting work into two phases:

Indexing phase
Retrieval + generation phase

flowchart LR
    subgraph Indexing
        A[PDF] --> B[Chunking]
        B --> C[Embedding Model]
        C --> D[(Vector DB / Qdrant)]
    end

    subgraph Query Time
        Q[User Query] --> QE[Query Embedding]
        QE --> D
        D --> R[Top-K Relevant Chunks]
        R --> G[LLM Answer Generation]
        Q --> G
    end
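
The two phases can be illustrated with a toy, dependency-free sketch. It swaps the real embedding model for simple word counts and the vector database for a Python list, so it only shows the shape of the pipeline (index once, then embed the query and rank chunks by similarity). The sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Indexing phase: embed every chunk once and keep text and vector together."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Query phase: embed the query and return the k most similar chunks."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Made-up sample chunks standing in for text extracted from a PDF.
chunks = [
    "Node.js uses an event loop to handle asynchronous I/O.",
    "The fs module reads and writes files.",
    "Express is a web framework for building HTTP servers.",
]
index = build_index(chunks)
top = retrieve(index, "How does the event loop work?", k=1)
print(top[0])  # Node.js uses an event loop to handle asynchronous I/O.
```

Only the retrieved chunks, not the whole document, would then be passed to the LLM together with the question.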

4. Indexing Phase (Implemented in `index.py`)

In this project:

Load PDF (PyPDFLoader)
Split text into chunks (RecursiveCharacterTextSplitter)
Convert chunks to vectors (OpenAIEmbeddings)
Store vectors + metadata in Qdrant (QdrantVectorStore)

Metadata (like page number and source file) is preserved so answers can point users to where information came from.

flowchart LR
    PDF[PDF Document] --> SPLIT[Chunking]
    SPLIT --> CH1[Chunk A + metadata]
    SPLIT --> CH2[Chunk B + metadata]
    SPLIT --> CH3[Chunk C + metadata]
    CH1 --> EMB[Embedding Model]
    CH2 --> EMB
    CH3 --> EMB
    EMB --> VDB[(Qdrant Vector DB)]
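
A minimal sketch of how the four steps above might fit together. This is not the actual contents of `index.py`: the function name, parameters, and chunk sizes are assumptions, and the import paths assume recent `langchain-community`, `langchain-text-splitters`, `langchain-openai`, and `langchain-qdrant` packages (older LangChain versions expose these classes under different modules).

```python
def build_index(
    pdf_path: str,
    qdrant_url: str,
    collection_name: str,
    embedding_model: str,
    chunk_size: int = 1000,    # illustrative value, tune for your documents
    chunk_overlap: int = 200,  # illustrative value
) -> int:
    """Load a PDF, chunk it, embed the chunks, store them in Qdrant.

    Returns the number of chunks written. Requires OPENAI_API_KEY in the
    environment and a reachable Qdrant instance.
    """
    # Imported here so this module can be loaded without the packages installed.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_openai import OpenAIEmbeddings
    from langchain_qdrant import QdrantVectorStore
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # 1. Load: one Document per page, with source and page kept in metadata.
    pages = PyPDFLoader(pdf_path).load()

    # 2. Split: chunks inherit the metadata of the page they came from.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    chunks = splitter.split_documents(pages)

    # 3 + 4. Embed each chunk and store vector, text, and metadata in Qdrant.
    embeddings = OpenAIEmbeddings(model=embedding_model)
    QdrantVectorStore.from_documents(
        documents=chunks,
        embedding=embeddings,
        url=qdrant_url,
        collection_name=collection_name,
    )
    return len(chunks)
```

A call might look like `build_index("nodejs.pdf", "http://localhost:6333", "pdf_chunks", "text-embedding-3-small")`, where the URL, collection name, and model are placeholder values; the project keeps its real ones in `constants.py`.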

5. Retrieval + Answering Phase (Implemented in `chat.py`)

At query time:

User asks a question
Query is embedded
Similar chunks are retrieved from Qdrant
Retrieved context is injected into prompt
LLM answers based on that context

flowchart LR
    U[User Query] --> E[Query Embedding]
    E --> VDB[(Qdrant Vector DB)]
    VDB --> K[Top Relevant Chunks]
    K --> P[Prompt with Context]
    U --> P
    P --> LLM[GPT Model]
    LLM --> OUT[Grounded Answer + Page References]
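
The query-time steps might be sketched as follows. This is an illustration, not the actual `chat.py`: the prompt wording, function names, and the `Chunk` stand-in class are assumptions. `retrieve` uses LangChain's `QdrantVectorStore.from_existing_collection` and `similarity_search`, while `build_prompt` is plain Python and works with any object that has `page_content` and `metadata` attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Stand-in for a retrieved document: chunk text plus its metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_prompt(query: str, chunks: list) -> str:
    """Inject retrieved chunks, labelled with source and page, into the prompt."""
    blocks = []
    for chunk in chunks:
        page = chunk.metadata.get("page", "unknown")
        source = chunk.metadata.get("source", "unknown")
        blocks.append(f"[source: {source}, page: {page}]\n{chunk.page_content}")
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the context below. "
        "Cite the page number for each fact. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def retrieve(query: str, qdrant_url: str, collection_name: str,
             embedding_model: str, k: int = 4) -> list:
    """Embed the query and fetch the k most similar chunks from Qdrant."""
    # Imported here so build_prompt can be used without these packages.
    from langchain_openai import OpenAIEmbeddings
    from langchain_qdrant import QdrantVectorStore

    store = QdrantVectorStore.from_existing_collection(
        embedding=OpenAIEmbeddings(model=embedding_model),
        url=qdrant_url,
        collection_name=collection_name,
    )
    return store.similarity_search(query, k=k)


# Example with a hand-made chunk, so no API key or database is needed.
sample = [Chunk("The event loop processes callbacks.", {"page": 3, "source": "nodejs.pdf"})]
prompt = build_prompt("What is the event loop?", sample)
```

The resulting `prompt` string would then be sent to the chat model. Keeping the page number next to each chunk is what lets the model include page references in its answer.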

6. Project Files

index.py: indexing pipeline (PDF -> chunks -> embeddings -> Qdrant)
chat.py: retrieval + answer generation loop
constants.py: shared constants (Qdrant URL, collection name, embedding model)
.env.example: required environment variables template

7. Environment Variables

Create a .env file based on .env.example:

OPENAI_API_KEY=your_openai_api_key_here
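
A small fail-fast check can make a missing key obvious before any API call is attempted. This is a sketch with a hypothetical helper name (`require_env`); it reads from the process environment, so it assumes the values in .env have already been loaded (for example by exporting them in the shell or by a loader such as python-dotenv, which this README does not confirm the project uses).

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Missing environment variable {name}; "
            "copy .env.example to .env and set it."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of a script turns a confusing authentication failure later on into an immediate, readable error.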

8. Run Steps

Start Qdrant:

docker compose up -d

Build the vector index:

python3 index.py

Start chat:

python3 chat.py
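
If indexing fails with a connection error, it can help to confirm that Qdrant is reachable before running `index.py`. The sketch below assumes Qdrant's default HTTP port (6333) on localhost; the real URL for this project lives in `constants.py`, and the helper name is hypothetical.

```python
import urllib.error
import urllib.request


def qdrant_is_up(url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to the Qdrant root URL succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For example, `print(qdrant_is_up())` should print `True` once the container started by `docker compose up -d` is ready.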

9. What I Learned (First RAG)

An LLM alone is not enough for grounding answers in private knowledge.
Sending full documents in every prompt is expensive and capped by the context window.
RAG gives a scalable pattern: index once, retrieve relevant chunks, generate grounded answers.
Good chunking and metadata quality directly improve answer quality.
