Upload any PDF and query it in natural language. Returns grounded answers with exact source page references.
Built to explore RAG (Retrieval Augmented Generation), a core pattern for serving AI over internal documents without hallucination.
- Upload any PDF via API
- System chunks, embeds, and indexes it locally using FAISS
- Ask any question in natural language
- Get an answer with the exact page numbers it came from
If the answer isn't in the document, the system says so instead of hallucinating.
```
PDF Upload → PyPDF Loader → Text Chunker (1000 chars, 200 overlap)
                                  ↓
                  HuggingFace Embeddings (local, free)
                                  ↓
                        FAISS Vector Store
                                  ↓
User Question → Embed Question → Similarity Search → Top 4 Chunks
                                  ↓
                    Groq (llama-3.1-8b-instant)
                                  ↓
                   Grounded Answer + Source Pages
```
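The retrieval hop in this pipeline (embed the question, rank stored chunk vectors, keep the top 4) can be sketched in plain Python. The toy 3-dimensional vectors below stand in for real MiniLM embeddings, and the function names are illustrative, not the project's actual API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k_chunks(question_vec, indexed_chunks, k=4):
    # indexed_chunks: (chunk_text, embedding) pairs, as a vector
    # store like FAISS would hold them after ingestion.
    ranked = sorted(indexed_chunks,
                    key=lambda item: cosine(question_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "embeddings" for illustration only.
chunks = [
    ("stipend details", [0.9, 0.1, 0.0]),
    ("office address",  [0.0, 0.8, 0.2]),
    ("leave policy",    [0.1, 0.1, 0.9]),
]
print(top_k_chunks([1.0, 0.0, 0.0], chunks, k=2))
# → ['stipend details', 'leave policy']
```

FAISS does the same ranking, but over an optimized index rather than a linear scan.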
Built using LCEL (LangChain Expression Language) — LangChain's modern chain composition pattern.
LLMs hallucinate when asked about information they weren't trained on. RAG solves this by retrieving relevant chunks from a document at inference time and grounding the LLM's answer in that context only.
This means:
- Answers are traceable to specific pages
- The model is constrained to the retrieved context, so it won't invent answers
- No retraining needed when documents change
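A sketch of what that grounding looks like at the prompt level (a hypothetical helper, not the project's actual prompt template): the retrieved chunks become the only context the LLM sees, with an explicit instruction to refuse when the context falls short.

```python
def build_grounded_prompt(question, retrieved):
    # retrieved: (page_number, chunk_text) pairs from the vector store.
    context = "\n\n".join(f"[page {page}] {text}" for page, text in retrieved)
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, reply: \"I don't have enough information "
        "to answer this.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What is the internship stipend?",
    [(7, "The internship stipend is INR 50,000 per month.")],
)
```

Tagging each chunk with its page number is also what makes the `source_pages` field in the API response possible.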
Documents are split into 1000-character chunks with 200-character overlap. Without overlap, a sentence spanning a chunk boundary loses context at the edges. Overlap ensures meaning is preserved across chunk boundaries.
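The sliding-window arithmetic behind that splitting can be sketched as follows (the project itself uses a LangChain text splitter; `chunk_text` here is a hand-rolled illustration of the same idea):

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    # Slide a window of chunk_size characters, stepping by
    # chunk_size - overlap so consecutive chunks share `overlap` chars.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With small numbers the overlap is easy to see:
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Each chunk repeats the tail of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.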
Upload:
POST /upload → { "session_id": "Honasa", "status": "ready" }
Query:
```json
{
  "session_id": "Honasa",
  "question": "What is the internship stipend?"
}
```

Response:
```json
{
  "answer": "The internship stipend is INR 50,000 per month.",
  "source_pages": [7],
  "latency_seconds": 1.8
}
```

When the answer isn't in the document:
```json
{
  "answer": "I don't have enough information to answer this.",
  "source_pages": [],
  "latency_seconds": 1.2
}
```

Upload a PDF for indexing.
Request: multipart/form-data with a PDF file
Response:
```json
{
  "session_id": "filename",
  "status": "ready"
}
```

Ask a question against an uploaded PDF.
Request:
```json
{
  "session_id": "filename",
  "question": "your question here"
}
```

Response:
```json
{
  "answer": "...",
  "source_pages": [3, 7],
  "latency_seconds": 1.8
}
```

Health check response:

```json
{
  "status": "ok",
  "service": "rag-chatbot"
}
```
```bash
git clone https://github.com/Shreshthaaa/rag-chatbot
cd rag-chatbot
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Add your Groq API key to .env
cd app
uvicorn main:app --reload
```

Open http://localhost:8000/docs for the interactive API UI.
```bash
docker build -t rag-chatbot .
docker run -p 8000:8000 --env-file .env rag-chatbot
```

`.env`:

```
GROQ_API_KEY=your_groq_key_here
```
Embeddings run fully locally via HuggingFace all-MiniLM-L6-v2.
| Tool | Purpose |
|---|---|
| Python | Core language |
| LangChain (LCEL) | RAG pipeline composition |
| HuggingFace (all-MiniLM-L6-v2) | Local embeddings, zero API cost |
| FAISS | Local vector store for similarity search |
| Groq (llama-3.1-8b-instant) | LLM for answer generation |
| FastAPI | API layer |
| Docker | Containerization |
```
rag-chatbot/
├── app/
│   ├── main.py           # FastAPI app + endpoints
│   ├── rag_pipeline.py   # Ingestion + LCEL chain
│   ├── schemas.py        # Pydantic request/response models
│   └── data/             # Uploaded PDFs (gitignored)
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md
```
This is a prototype. In a production system I would:
- Vector store — replace FAISS with Vertex AI Vector Search or Pinecone for multi-user scale
- Session management — replace in-memory dict with Redis so sessions persist across restarts
- Authentication — API key auth on all endpoints
- Async ingestion — large PDFs block the upload endpoint; use a background task queue (Celery/ARQ) instead
- Eval logging — track answer quality over time, flag low-confidence responses for human review
- Multi-format support — extend beyond PDF to .docx, .csv, web URLs
- Monitoring — latency, token usage, error rates via LangSmith or Prometheus