An enterprise-grade, multi-tenant Retrieval-Augmented Generation (RAG) system. This project demonstrates a highly scalable Modular Monolith backend architecture, resilient LLM fallback patterns, and strict Role-Based Access Control (RBAC) enforced at the vector database layer.
The application is decoupled into a clear Client-Server model to ensure strict separation of concerns, maintainability, and API-first design.
- API Gateway & Routing: Uses FastAPI `APIRouter` to separate domain logic into namespaced routers (`/api/v1/chat`, `/api/v1/knowledge`, `/api/v1/logs`).
- Data Validation: Strict JSON payload validation using Pydantic DTOs (Data Transfer Objects) prevents malformed queries from reaching the AI layer.
- Resilient Model Factory (Circuit Breaker): Implements automated fallback routing. If the primary model (Gemini 2.5 Pro) returns a `429 Rate Limit` or `500 Server Error`, traffic is seamlessly re-routed to a faster tier (Gemini 2.5 Flash), and ultimately to Azure OpenAI (GPT-4o), ensuring high availability.
- Vector Storage: Local, persistent ChromaDB for fast semantic search and document retrieval.
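The fallback routing described above can be sketched as a simple tiered chain. This is a minimal illustration, not the project's actual factory: the model identifiers are taken from the list above, while `ModelError` and the `call_model` callable are hypothetical stand-ins for the real SDK clients.

```python
# Sketch of the fallback routing ("circuit breaker") idea: try each model
# tier in order and fall through on transient 429/500 errors.
# ModelError and call_model are illustrative assumptions, not real APIs.
class ModelError(Exception):
    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status

FALLBACK_CHAIN = ["gemini-2.5-pro", "gemini-2.5-flash", "azure-gpt-4o"]

def generate(prompt: str, call_model) -> tuple[str, str]:
    """Try each tier; re-route on 429/500, re-raise anything else."""
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            return model, call_model(model, prompt)
        except ModelError as err:
            if err.status not in (429, 500):
                raise  # non-transient failure: do not mask it
            last_error = err  # transient failure: fall through to next tier
    raise RuntimeError("all model tiers exhausted") from last_error
```

A 4xx error other than 429 (for example a malformed request) is deliberately re-raised rather than retried, since it would fail identically on every tier.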
- Dumb Client Pattern: The UI layer contains zero database or AI logic. It operates purely as a presentation layer, communicating with the FastAPI backend via standard REST HTTP POST/GET requests.
- State Management: Secure session state handling for user authentication, role tracking, and chat history.
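The "dumb client" pattern above can be illustrated with the call the frontend assembles. This sketch uses the standard library's `urllib` for self-containment (the real client uses the `requests` library), and the endpoint path and payload field are assumptions:

```python
# Sketch of the "dumb client" pattern: the UI layer only builds REST calls
# against the backend and holds no database or AI logic.
# The payload shape and auth header are illustrative assumptions.
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"

def build_chat_call(query: str, session_token: str) -> urllib.request.Request:
    """Assemble the POST the frontend would send to the backend."""
    body = json.dumps({"query": query}).encode()
    return urllib.request.Request(
        url=f"{API_BASE}/api/v1/chat",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {session_token}",
        },
    )
```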
Security is strictly enforced at the database query level, rather than just the UI level.
- Internal vs. External Sovereignty: Users authenticating with `@razorpay.com` domains are granted `INTERNAL` roles; all others are `EXTERNAL`.
- Vector Metadata Filtering: During document ingestion, vectors are tagged with an `access_level` (`INTERNAL` or `EXTERNAL`). When an external user queries the API, the backend systematically injects `{"access_level": "EXTERNAL"}` into the ChromaDB `where` filter, so external users cannot retrieve internal knowledge chunks at the query level.
- Chunking Strategy: Documents (PDF, DOCX, CSV, TXT, URLs) are parsed via LangChain using a `RecursiveCharacterTextSplitter`.
- Optimization: Chunk size is bounded to `1000` characters with a `200`-character overlap to maintain semantic continuity.
- Self-Healing Knowledge (Teach Mode): Administrators can inject curated Q&A pairs directly into the vector database (tagged as `"manual_fix"`) to correct LLM knowledge gaps without retraining.
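The 1000/200 chunking parameters can be pictured as a sliding window. This is a simplified sketch of the arithmetic only; the actual pipeline uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers splitting at paragraph, sentence, and word boundaries rather than at fixed offsets:

```python
# Simplified sliding-window view of the chunking parameters: 1000-char
# chunks, each overlapping its predecessor by 200 chars, so consecutive
# chunks share context. (The real splitter also respects text boundaries.)
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200

def split_text(text: str, size: int = CHUNK_SIZE,
               overlap: int = CHUNK_OVERLAP) -> list[str]:
    step = size - overlap  # each chunk starts 800 chars after the previous
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```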
- Model Context Protocol (MCP): Integrates directly with Coralogix via a local binary proxy.
- Telemetry Queries: The backend communicates via JSON-RPC handshakes (`tools/call` -> `search_logs`), allowing internal admins to run deep Lucene searches against production telemetry directly from the support UI.
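The JSON-RPC envelope for such a call looks roughly as follows. The `tools/call` method name follows the MCP specification; the tool argument names (`query`, `time_range`) are assumptions here, since the exact schema is defined by the Coralogix MCP server:

```python
# Sketch of a JSON-RPC 2.0 tools/call envelope for the search_logs tool.
# Argument names inside "arguments" are illustrative assumptions.
import json

def build_search_logs_request(lucene_query: str, request_id: int = 1) -> str:
    envelope = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_logs",
            "arguments": {"query": lucene_query, "time_range": "1h"},
        },
    }
    return json.dumps(envelope)
```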
| Domain | Technology |
|---|---|
| Backend Framework | Python 3.11, FastAPI, Uvicorn |
| Frontend Framework | Streamlit, Requests |
| AI & LLM Integration | Google Gemini (Pro/Flash), Azure OpenAI, LangChain |
| Vector Database | ChromaDB (Persistent) |
| Package Management | uv (Fast Python dependency resolver) |
Ensure you have Python 3.11+ and uv installed on your machine.
Clone the repository and configure your environment:
```shell
# Create the environment file
touch .env
```

Add the following keys to your `.env` file:

```shell
GEMINI_API_KEY="your_google_api_key"
AZURE_OPENAI_API_KEY="your_azure_api_key"
AZURE_OPENAI_ENDPOINT="your_azure_endpoint"
CORA_AUTH_TOKEN="your_coralogix_token"  # Optional: for system logs
```

This project uses uv for lightning-fast, reproducible dependency management.
```shell
uv sync
```

Because this is a decoupled architecture, you must run the Backend API and the Frontend Client as separate processes.
Terminal 1: Start the FastAPI Backend
```shell
uv run --env-file .env uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Tip: Once running, visit http://127.0.0.1:8000/docs to view the auto-generated Swagger/OpenAPI documentation.
Terminal 2: Start the Streamlit Frontend
```shell
uv run streamlit run app.py
```