An enterprise-grade, multi-tenant Retrieval-Augmented Generation (RAG) system. This project demonstrates a highly scalable Modular Monolith backend architecture, resilient LLM fallback patterns, and strict Role-Based Access Control (RBAC) enforced at the vector database layer.
The application is decoupled into a clear Client-Server model to ensure strict separation of concerns, maintainability, and API-first design.
- API Gateway & Routing: Uses FastAPI `APIRouter` to separate domain logic into namespaced routers (`/api/v1/chat`, `/api/v1/knowledge`, `/api/v1/logs`).
- Data Validation: Strict JSON payload validation using Pydantic DTOs (Data Transfer Objects) prevents malformed queries from reaching the AI layer.
- Resilient Model Factory (Circuit Breaker): Implements automated fallback routing. If the primary model (Gemini 2.5 Pro) returns a `429 Rate Limit` or `500 Server Error`, traffic is seamlessly re-routed to a faster tier (Gemini 2.5 Flash), and ultimately to Azure OpenAI (GPT-4o), ensuring high availability.
- Vector Storage: Local, persistent ChromaDB for fast semantic search and document retrieval.
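The fallback routing described above can be sketched as a simple tiered chain. This is a minimal illustration, not the project's actual factory: the model identifiers are taken from the list above, while `ModelError` and the `call_model` callable are hypothetical stand-ins for the real SDK clients.

```python
# Sketch of the fallback routing ("circuit breaker") idea: try each model
# tier in order and fall through on transient 429/500 errors.
# ModelError and call_model are illustrative assumptions, not real APIs.
class ModelError(Exception):
    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status

FALLBACK_CHAIN = ["gemini-2.5-pro", "gemini-2.5-flash", "azure-gpt-4o"]

def generate(prompt: str, call_model) -> tuple[str, str]:
    """Try each tier; re-route on 429/500, re-raise anything else."""
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            return model, call_model(model, prompt)
        except ModelError as err:
            if err.status not in (429, 500):
                raise  # non-transient failure: do not mask it
            last_error = err  # transient failure: fall through to next tier
    raise RuntimeError("all model tiers exhausted") from last_error
```

A 4xx error other than 429 (for example a malformed request) is deliberately re-raised rather than retried, since it would fail identically on every tier.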
- Dumb Client Pattern: The UI layer contains zero database or AI logic. It operates purely as a presentation layer, communicating with the FastAPI backend via standard REST HTTP POST/GET requests.
- State Management: Secure session state handling for user authentication, role tracking, and chat history.
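The "dumb client" pattern above can be illustrated with the call the frontend assembles. This sketch uses the standard library's `urllib` for self-containment (the real client uses the `requests` library), and the endpoint path and payload field are assumptions:

```python
# Sketch of the "dumb client" pattern: the UI layer only builds REST calls
# against the backend and holds no database or AI logic.
# The payload shape and auth header are illustrative assumptions.
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"

def build_chat_call(query: str, session_token: str) -> urllib.request.Request:
    """Assemble the POST the frontend would send to the backend."""
    body = json.dumps({"query": query}).encode()
    return urllib.request.Request(
        url=f"{API_BASE}/api/v1/chat",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {session_token}",
        },
    )
```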
Security is strictly enforced at the database query level, rather than just the UI level.
- Internal vs. External Sovereignty: Users authenticating with `@razorpay.com` domains are granted `INTERNAL` roles; all others are `EXTERNAL`.
- Vector Metadata Filtering: During document ingestion, vectors are tagged with an `access_level` (`INTERNAL` or `EXTERNAL`). When an external user queries the API, the backend systematically injects `{"access_level": "EXTERNAL"}` into the ChromaDB `where` filter, so external users cannot retrieve internal knowledge chunks at the query level.
- Chunking Strategy: Documents (PDF, DOCX, CSV, TXT, URLs) are parsed via LangChain using a `RecursiveCharacterTextSplitter`.
- Optimization: Chunk size is bounded to `1000` characters with a `200`-character overlap to maintain semantic continuity.
- Self-Healing Knowledge (Teach Mode): Administrators can inject curated Q&A pairs directly into the vector database (tagged as `"manual_fix"`) to correct LLM knowledge gaps without retraining.
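The 1000/200 chunking parameters can be pictured as a sliding window. This is a simplified sketch of the arithmetic only; the actual pipeline uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers splitting at paragraph, sentence, and word boundaries rather than at fixed offsets:

```python
# Simplified sliding-window view of the chunking parameters: 1000-char
# chunks, each overlapping its predecessor by 200 chars, so consecutive
# chunks share context. (The real splitter also respects text boundaries.)
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200

def split_text(text: str, size: int = CHUNK_SIZE,
               overlap: int = CHUNK_OVERLAP) -> list[str]:
    step = size - overlap  # each chunk starts 800 chars after the previous
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```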
- Model Context Protocol (MCP): Integrates directly with Coralogix via a local binary proxy.
- Telemetry Queries: The backend communicates via JSON-RPC handshakes (`tools/call` -> `search_logs`), allowing internal admins to run deep Lucene searches against production telemetry directly from the support UI.
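The JSON-RPC envelope for such a call looks roughly as follows. The `tools/call` method name follows the MCP specification; the tool argument names (`query`, `time_range`) are assumptions here, since the exact schema is defined by the Coralogix MCP server:

```python
# Sketch of a JSON-RPC 2.0 tools/call envelope for the search_logs tool.
# Argument names inside "arguments" are illustrative assumptions.
import json

def build_search_logs_request(lucene_query: str, request_id: int = 1) -> str:
    envelope = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_logs",
            "arguments": {"query": lucene_query, "time_range": "1h"},
        },
    }
    return json.dumps(envelope)
```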
| Domain | Technology |
|---|---|
| Backend Framework | Python 3.11, FastAPI, Uvicorn |
| Frontend Framework | Streamlit, Requests |
| AI & LLM Integration | Google Gemini (Pro/Flash), Azure OpenAI, LangChain |
| Vector Database | ChromaDB (Persistent) |
| Package Management | uv (Fast Python dependency resolver) |
Ensure you have Python 3.11+ and uv installed on your machine.
Clone the repository and configure your environment:
```shell
# Create the environment file
touch .env
```

Add the following keys to your `.env` file:

```shell
GEMINI_API_KEY="your_google_api_key"
AZURE_OPENAI_API_KEY="your_azure_api_key"
AZURE_OPENAI_ENDPOINT="your_azure_endpoint"
CORA_AUTH_TOKEN="your_coralogix_token"  # Optional: for system logs
```

This project uses uv for lightning-fast, reproducible dependency management.
```shell
uv sync
```

Because this is a decoupled architecture, you must run the Backend API and the Frontend Client as separate processes.
Terminal 1: Start the FastAPI Backend
```shell
uv run --env-file .env uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Tip: Once running, visit http://127.0.0.1:8000/docs to view the auto-generated Swagger/OpenAPI documentation.
Terminal 2: Start the Streamlit Frontend
```shell
uv run streamlit run app.py
```