🚀 Enterprise RAG Architecture: RazorpayX AI Support

An enterprise-grade, multi-tenant Retrieval-Augmented Generation (RAG) system. This project demonstrates a highly scalable Modular Monolith backend architecture, resilient LLM fallback patterns, and strict Role-Based Access Control (RBAC) enforced at the vector database layer.

🏗 System Architecture

The application is decoupled into a clear Client-Server model to ensure strict separation of concerns, maintainability, and API-first design.

1. The Backend (FastAPI Modular Monolith)

  • API Gateway & Routing: Utilizes FastAPI APIRouter to strictly separate domain logic into micro-namespaces (/api/v1/chat, /api/v1/knowledge, /api/v1/logs).
  • Data Validation: Strict JSON payload validation using Pydantic DTOs (Data Transfer Objects) to prevent malformed queries from reaching the AI layer.
  • Resilient Model Factory (Circuit Breaker): Implements automated fallback routing. If the primary model (Gemini 2.5 Pro) returns a 429 Rate Limit or 500 Server Error, traffic is seamlessly re-routed to a faster tier (Gemini 2.5 Flash), and ultimately to Azure OpenAI (GPT-4o), ensuring high availability.
  • Vector Storage: Local, persistent ChromaDB for fast semantic search and document retrieval.
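
The fallback routing above can be sketched as a simple priority chain. This is a minimal illustration under stated assumptions, not the project's actual code: the provider callables stand in for real Gemini/Azure clients, and `ModelUnavailable` is a hypothetical wrapper for retryable HTTP 429/500 errors.

```python
class ModelUnavailable(Exception):
    """Hypothetical wrapper for retryable upstream errors (HTTP 429 / 500)."""

def generate_with_fallback(prompt, providers):
    """Try each (name, callable) provider in priority order.

    Returns (provider_name, response) from the first tier that succeeds;
    raises RuntimeError only when every tier has failed.
    """
    last_err = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ModelUnavailable as err:
            last_err = err  # retryable error: fall through to the next tier
    raise RuntimeError("all model providers exhausted") from last_err

def flaky_pro(prompt):        # stands in for Gemini 2.5 Pro hitting a 429
    raise ModelUnavailable("429 rate limit")

def healthy_flash(prompt):    # stands in for Gemini 2.5 Flash succeeding
    return f"flash-answer: {prompt}"

tier, answer = generate_with_fallback(
    "What is RazorpayX?",
    [("gemini-2.5-pro", flaky_pro), ("gemini-2.5-flash", healthy_flash)],
)
# tier == "gemini-2.5-flash": traffic fell through to the second tier
```

A full circuit breaker would additionally remember recent failures and skip a tripped tier for a cooldown window; the chain above only shows the per-request fallback path.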

2. The Frontend (Streamlit Client)

  • Dumb Client Pattern: The UI layer contains zero database or AI logic. It operates purely as a presentation layer, communicating with the FastAPI backend via standard REST HTTP POST/GET requests.
  • State Management: Secure session state handling for user authentication, role tracking, and chat history.
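
As a sketch of the dumb-client pattern, the UI only needs to assemble a REST call. The endpoint path follows the /api/v1/chat namespace above; the payload fields, bearer-token scheme, and function name are illustrative assumptions.

```python
API_BASE = "http://127.0.0.1:8000"  # FastAPI backend from the setup section

def build_chat_request(base_url, question, token):
    """Assemble kwargs for requests.post(); no DB or AI logic lives here."""
    return {
        "url": f"{base_url}/api/v1/chat",
        "headers": {"Authorization": f"Bearer {token}"},
        "json": {"question": question},
        "timeout": 30,
    }

req = build_chat_request(API_BASE, "How do I reset my API key?", "demo-token")
```

In the Streamlit app, `requests.post(**req)` would be issued with the token pulled from session state and the JSON response appended to the chat history.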

🔐 Security & Role-Based Access Control (RBAC)

Security is strictly enforced at the database query level, rather than just the UI level.

  • Internal vs. External Sovereignty: Users authenticating with @razorpay.com domains are granted INTERNAL roles; all others are EXTERNAL.
  • Vector Metadata Filtering: During document ingestion, vectors are tagged with an access_level (INTERNAL or EXTERNAL). When an external user queries the API, the backend systematically injects {"access_level": "EXTERNAL"} into the ChromaDB where filter, making it impossible at the query layer for external users to retrieve internal knowledge chunks.
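
A minimal sketch of the two rules above. The role names and the where-clause shape follow this README; the function names are hypothetical.

```python
def role_for_email(email):
    """Domain-based role assignment: @razorpay.com => INTERNAL."""
    domain = email.rsplit("@", 1)[-1].lower()
    return "INTERNAL" if domain == "razorpay.com" else "EXTERNAL"

def build_where_filter(user_role):
    """Metadata filter injected server-side before querying ChromaDB.

    The returned dict is passed as `where=` to collection.query(), so the
    restriction is enforced inside the vector store, never in the UI.
    """
    if user_role == "INTERNAL":
        return None  # no filter: internal chunks are retrievable
    return {"access_level": "EXTERNAL"}  # everyone else is pinned here
```

At query time this becomes, roughly, `collection.query(query_texts=[q], where=build_where_filter(role))`, which is why a tampered client still cannot widen its own access.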

⚙️ Core Mechanics & Observability

RAG Ingestion Pipeline

  • Chunking Strategy: Documents (PDF, DOCX, CSV, TXT, URLs) are parsed via LangChain using a RecursiveCharacterTextSplitter.
  • Optimization: Chunk size is strictly bounded to 1000 characters with a 200 character overlap to maintain semantic continuity.
  • Self-Healing Knowledge (Teach Mode): Administrators can inject curated Q&A pairs directly into the vector database (tagged as "manual_fix") to correct LLM knowledge gaps without retraining.
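
The chunking bounds can be illustrated with a plain-Python sliding window. The project itself delegates this to LangChain's RecursiveCharacterTextSplitter, which additionally prefers splitting on paragraph and sentence boundaries rather than at fixed offsets.

```python
CHUNK_SIZE, CHUNK_OVERLAP = 1000, 200  # the bounds stated above

def chunk_text(text, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Naive character window: step forward size - overlap chars each time,
    so every neighbouring pair of chunks shares `overlap` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 2500)
# 3 chunks of at most 1000 chars; neighbours overlap by 200 chars
```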

Production Observability (MCP)

  • Model Context Protocol (MCP): Integrates directly with Coralogix via a local binary proxy.
  • Telemetry Queries: The backend communicates via JSON-RPC handshakes (tools/call -> search_logs), allowing internal admins to run deep Lucene searches against production telemetry directly from the support UI.
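
The handshake reduces to standard JSON-RPC 2.0 envelopes. The tools/call method and search_logs tool name come from the description above; the Lucene query string and the "query" argument key are illustrative assumptions.

```python
import itertools

_ids = itertools.count(1)  # JSON-RPC requires a unique id per request

def make_tools_call(tool, arguments):
    """Build an MCP tools/call request envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

req = make_tools_call("search_logs", {"query": 'severity:ERROR AND app:"support"'})
```

The envelope is handed to the local Coralogix proxy binary (over stdio or HTTP, depending on the proxy's transport), and the matching response carries the same id.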

💻 Tech Stack

  • Backend Framework: Python 3.11, FastAPI, Uvicorn
  • Frontend Framework: Streamlit, Requests
  • AI & LLM Integration: Google Gemini (Pro/Flash), Azure OpenAI, LangChain
  • Vector Database: ChromaDB (Persistent)
  • Package Management: uv (fast Python dependency resolver)

🛠 Local Setup & Installation

1. Prerequisites

Ensure you have Python 3.11+ and uv installed on your machine. Clone the repository and configure your environment:

# Create the environment file
touch .env

Add the following keys to your .env file:

GEMINI_API_KEY="your_google_api_key"
AZURE_OPENAI_API_KEY="your_azure_api_key"
AZURE_OPENAI_ENDPOINT="your_azure_endpoint"
CORA_AUTH_TOKEN="your_coralogix_token" # Optional: For system logs

2. Install Dependencies

This project uses uv for lightning-fast, reproducible dependency management.

uv sync

3. Run the Services

Because this is a decoupled architecture, you must run the Backend API and the Frontend Client as separate processes.

Terminal 1: Start the FastAPI Backend

uv run --env-file .env uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Tip: Once running, visit http://127.0.0.1:8000/docs to view the auto-generated Swagger/OpenAPI documentation.

Terminal 2: Start the Streamlit Frontend

uv run streamlit run app.py
