Upload any PDF and query it in natural language. Returns grounded answers with exact source page references.
Built to explore RAG (Retrieval Augmented Generation), a core pattern for serving AI over internal documents without hallucination.
- Upload any PDF via API
- System chunks, embeds, and indexes it locally using FAISS
- Ask any question in natural language
- Get an answer with the exact page numbers it came from
If the answer isn't in the document, the system says so instead of hallucinating.
```
PDF Upload → PyPDF Loader → Text Chunker (1000 chars, 200 overlap)
                                  ↓
                  HuggingFace Embeddings (local, free)
                                  ↓
                        FAISS Vector Store
                                  ↓
User Question → Embed Question → Similarity Search → Top 4 Chunks
                                  ↓
                    Groq (llama-3.1-8b-instant)
                                  ↓
                   Grounded Answer + Source Pages
```
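The retrieval hop in this pipeline (embed the question, rank stored chunk vectors, keep the top 4) can be sketched in plain Python. The toy 3-dimensional vectors below stand in for real MiniLM embeddings, and the function names are illustrative, not the project's actual API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k_chunks(question_vec, indexed_chunks, k=4):
    # indexed_chunks: (chunk_text, embedding) pairs, as a vector
    # store like FAISS would hold them after ingestion.
    ranked = sorted(indexed_chunks,
                    key=lambda item: cosine(question_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "embeddings" for illustration only.
chunks = [
    ("stipend details", [0.9, 0.1, 0.0]),
    ("office address",  [0.0, 0.8, 0.2]),
    ("leave policy",    [0.1, 0.1, 0.9]),
]
print(top_k_chunks([1.0, 0.0, 0.0], chunks, k=2))
# → ['stipend details', 'leave policy']
```

FAISS does the same ranking, but over an optimized index rather than a linear scan.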
Built using LCEL (LangChain Expression Language) — LangChain's modern chain composition pattern.
LLMs hallucinate when asked about information they weren't trained on. RAG solves this by retrieving relevant chunks from a document at inference time and grounding the LLM's answer in that context only.
This means:
- Answers are traceable to specific pages
- The model is constrained to the retrieved context, so it won't invent answers
- No retraining needed when documents change
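A sketch of what that grounding looks like at the prompt level (a hypothetical helper, not the project's actual prompt template): the retrieved chunks become the only context the LLM sees, with an explicit instruction to refuse when the context falls short.

```python
def build_grounded_prompt(question, retrieved):
    # retrieved: (page_number, chunk_text) pairs from the vector store.
    context = "\n\n".join(f"[page {page}] {text}" for page, text in retrieved)
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, reply: \"I don't have enough information "
        "to answer this.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What is the internship stipend?",
    [(7, "The internship stipend is INR 50,000 per month.")],
)
```

Tagging each chunk with its page number is also what makes the `source_pages` field in the API response possible.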
Documents are split into 1000-character chunks with 200-character overlap. Without overlap, a sentence spanning a chunk boundary loses context at the edges. Overlap ensures meaning is preserved across chunk boundaries.
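The sliding-window arithmetic behind that splitting can be sketched as follows (the project itself uses a LangChain text splitter; `chunk_text` here is a hand-rolled illustration of the same idea):

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    # Slide a window of chunk_size characters, stepping by
    # chunk_size - overlap so consecutive chunks share `overlap` chars.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With small numbers the overlap is easy to see:
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Each chunk repeats the tail of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.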
Upload:
POST /upload → { "session_id": "Honasa", "status": "ready" }
Query:
```json
{
  "session_id": "Honasa",
  "question": "What is the internship stipend?"
}
```

Response:
```json
{
  "answer": "The internship stipend is INR 50,000 per month.",
  "source_pages": [7],
  "latency_seconds": 1.8
}
```

When the answer isn't in the document:
```json
{
  "answer": "I don't have enough information to answer this.",
  "source_pages": [],
  "latency_seconds": 1.2
}
```

Upload a PDF for indexing.
Request: multipart/form-data with a PDF file
Response:
```json
{
  "session_id": "filename",
  "status": "ready"
}
```

Ask a question against an uploaded PDF.
Request:
```json
{
  "session_id": "filename",
  "question": "your question here"
}
```

Response:
```json
{
  "answer": "...",
  "source_pages": [3, 7],
  "latency_seconds": 1.8
}
```

Health check response:

```json
{
  "status": "ok",
  "service": "rag-chatbot"
}
```
```bash
git clone https://github.com/Shreshthaaa/rag-chatbot
cd rag-chatbot
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Add your Groq API key to .env
cd app
uvicorn main:app --reload
```

Open http://localhost:8000/docs for the interactive API UI.
```bash
docker build -t rag-chatbot .
docker run -p 8000:8000 --env-file .env rag-chatbot
```

`.env`:

```
GROQ_API_KEY=your_groq_key_here
```
Embeddings run fully locally via HuggingFace all-MiniLM-L6-v2.
| Tool | Purpose |
|---|---|
| Python | Core language |
| LangChain (LCEL) | RAG pipeline composition |
| HuggingFace (all-MiniLM-L6-v2) | Local embeddings, zero API cost |
| FAISS | Local vector store for similarity search |
| Groq (llama-3.1-8b-instant) | LLM for answer generation |
| FastAPI | API layer |
| Docker | Containerization |
```
rag-chatbot/
├── app/
│   ├── main.py           # FastAPI app + endpoints
│   ├── rag_pipeline.py   # Ingestion + LCEL chain
│   ├── schemas.py        # Pydantic request/response models
│   └── data/             # Uploaded PDFs (gitignored)
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md
```
This is a prototype. In a production system I would:
- Vector store — replace FAISS with Vertex AI Vector Search or Pinecone for multi-user scale
- Session management — replace in-memory dict with Redis so sessions persist across restarts
- Authentication — API key auth on all endpoints
- Async ingestion — large PDFs block the upload endpoint; use a background task queue (Celery/ARQ) instead
- Eval logging — track answer quality over time, flag low-confidence responses for human review
- Multi-format support — extend beyond PDF to .docx, .csv, web URLs
- Monitoring — latency, token usage, error rates via LangSmith or Prometheus