
RAG Document Chatbot

Upload any PDF and query it in natural language. Returns grounded answers with exact source page references.

Built to explore RAG (Retrieval Augmented Generation), a core pattern for serving AI over internal documents without hallucination.


What it does

  1. Upload any PDF via API
  2. System chunks, embeds, and indexes it locally using FAISS
  3. Ask any question in natural language
  4. Get an answer with the exact page numbers it came from

If the answer isn't in the document, the system says so; it doesn't hallucinate.
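The four steps above can be sketched end to end in plain Python. This is a toy stand-in, not the actual pipeline: a bag-of-words counter plays the role of the HuggingFace embedding model, cosine similarity plays the role of FAISS, and a canned string plays the role of the Groq call. It does show the key behavior, though: when nothing relevant is retrieved, the system refuses rather than hallucinating.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def answer(question: str, pages: dict, threshold: float = 0.1) -> dict:
    # Embed the question, score every page chunk, keep the best match.
    q = embed(question)
    scored = sorted(
        ((cosine(q, embed(text)), page) for page, text in pages.items()),
        reverse=True,
    )
    best_score, best_page = scored[0]
    if best_score < threshold:
        # Nothing relevant retrieved: refuse instead of hallucinating.
        return {"answer": "I don't have enough information to answer this.",
                "source_pages": []}
    return {"answer": f"(answer grounded in page {best_page})",
            "source_pages": [best_page]}

pages = {1: "the office opens at nine",
         7: "the internship stipend is INR 50000 per month"}
print(answer("what is the internship stipend", pages)["source_pages"])  # [7]
print(answer("capital of France", pages)["source_pages"])               # []
```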


Architecture

PDF Upload → PyPDF Loader → Text Chunker (1000 chars, 200 overlap)
                                        ↓
                            HuggingFace Embeddings (local, free)
                                        ↓
                                 FAISS Vector Store
                                        ↓
User Question → Embed Question → Similarity Search → Top 4 Chunks
                                        ↓
                            Groq (llama-3.1-8b-instant)
                                        ↓
                          Grounded Answer + Source Pages

Built using LCEL (LangChain Expression Language) — LangChain's modern chain composition pattern.
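The core idea behind LCEL is pipe composition: each stage is a callable, and `|` chains them left to right (as `retriever | prompt | llm` does in a real chain). The sketch below reimplements that idea in plain Python with stub stages; it uses none of the actual LangChain Runnable classes and is only meant to show why the composition reads like the architecture diagram above.

```python
class Step:
    """Minimal pipe-composable stage, loosely imitating LCEL's `|`."""
    def __init__(self, fn):
        self.fn = fn
    def __or__(self, other):
        # Compose: run self first, feed its output into the next step.
        return Step(lambda x: other.fn(self.fn(x)))
    def invoke(self, x):
        return self.fn(x)

# Stub stages standing in for the real retriever, prompt, and Groq LLM.
retrieve = Step(lambda q: {"question": q,
                           "context": "page 7: stipend is INR 50,000"})
prompt   = Step(lambda d: "Answer from context only.\n"
                          f"{d['context']}\nQ: {d['question']}")
llm      = Step(lambda p: "stub LLM output for: " + p.splitlines()[-1])

chain = retrieve | prompt | llm
print(chain.invoke("What is the stipend?"))
```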


Why RAG?

LLMs hallucinate when asked about information they weren't trained on. RAG solves this by retrieving relevant chunks from a document at inference time and grounding the LLM's answer in that context only.

This means:

  • Answers are traceable to specific pages
  • The model won't make things up
  • No retraining needed when documents change

Why chunk_overlap matters

Documents are split into 1000-character chunks with 200-character overlap. Without overlap, a sentence spanning a chunk boundary loses context at the edges. Overlap ensures meaning is preserved across chunk boundaries.
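A minimal sketch of fixed-size chunking with overlap, using the same sizes as the pipeline (the real implementation uses LangChain's text splitter, which also respects sentence and paragraph boundaries; this version splits on raw character offsets only):

```python
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200

def chunk(text: str, size: int = CHUNK_SIZE,
          overlap: int = CHUNK_OVERLAP) -> list:
    # Each chunk starts `size - overlap` characters after the previous
    # one, so consecutive chunks share `overlap` characters.
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 2500
chunks = chunk(doc)
# 3 chunks; chunks[0] and chunks[1] share their last/first 200 characters.
```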


Real output example

Upload:

POST /upload → { "session_id": "Honasa", "status": "ready" }

Query:

{
  "session_id": "Honasa",
  "question": "What is the internship stipend?"
}

Response:

{
  "answer": "The internship stipend is INR 50,000 per month.",
  "source_pages": [7],
  "latency_seconds": 1.8
}

When the answer isn't in the document:

{
  "answer": "I don't have enough information to answer this.",
  "source_pages": [],
  "latency_seconds": 1.2
}

API Endpoints

POST /upload

Upload a PDF for indexing.

Request: multipart/form-data with a PDF file

Response:

{
  "session_id": "filename",
  "status": "ready"
}

POST /query

Ask a question against an uploaded PDF.

Request:

{
  "session_id": "filename",
  "question": "your question here"
}

Response:

{
  "answer": "...",
  "source_pages": [3, 7],
  "latency_seconds": 1.8
}

GET /health

{
  "status": "ok",
  "service": "rag-chatbot"
}

How to Run

Without Docker

git clone https://github.com/Shreshthaaa/rag-chatbot
cd rag-chatbot
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Add your Groq API key to .env
cd app
uvicorn main:app --reload

Open http://localhost:8000/docs for the interactive API UI.

With Docker

docker build -t rag-chatbot .
docker run -p 8000:8000 --env-file .env rag-chatbot

Environment Variables

GROQ_API_KEY=your_groq_key_here

Embeddings run fully locally via HuggingFace all-MiniLM-L6-v2.


Tech Stack

Tool                            Purpose
Python                          Core language
LangChain (LCEL)                RAG pipeline composition
HuggingFace (all-MiniLM-L6-v2)  Local embeddings, zero API cost
FAISS                           Local vector store for similarity search
Groq (llama-3.1-8b-instant)     LLM for answer generation
FastAPI                         API layer
Docker                          Containerization

Project Structure

rag-chatbot/
├── app/
│   ├── main.py            # FastAPI app + endpoints
│   ├── rag_pipeline.py    # Ingestion + LCEL chain
│   ├── schemas.py         # Pydantic request/response models
│   └── data/              # Uploaded PDFs (gitignored)
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md

Production Considerations

This is a prototype. In a production system I would:

  • Vector store — replace FAISS with Vertex AI Vector Search or Pinecone for multi-user scale
  • Session management — replace in-memory dict with Redis so sessions persist across restarts
  • Authentication — API key auth on all endpoints
  • Async ingestion — large PDFs block the upload endpoint; use a background task queue (Celery/ARQ) instead
  • Eval logging — track answer quality over time, flag low-confidence responses for human review
  • Multi-format support — extend beyond PDF to .docx, .csv, web URLs
  • Monitoring — latency, token usage, error rates via LangSmith or Prometheus
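The session-management point above comes down to putting the prototype's in-memory dict behind a small interface. A sketch of that seam (the interface and class names are hypothetical, not from the repo): the endpoints depend only on `put`/`get`, so a Redis-backed implementation with the same two methods could be swapped in later without touching them.

```python
from typing import Optional, Protocol

class SessionStore(Protocol):
    """What the endpoints need from a session store."""
    def put(self, session_id: str, payload: dict) -> None: ...
    def get(self, session_id: str) -> Optional[dict]: ...

class InMemorySessionStore:
    """What the prototype effectively does: state is lost on restart."""
    def __init__(self) -> None:
        self._data: dict = {}
    def put(self, session_id: str, payload: dict) -> None:
        self._data[session_id] = payload
    def get(self, session_id: str) -> Optional[dict]:
        return self._data.get(session_id)

# A RedisSessionStore exposing the same two methods (SET/GET over
# serialized payloads) would persist sessions across restarts.
store = InMemorySessionStore()
store.put("Honasa", {"status": "ready"})
```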
