Nyay Mitra is a full-stack, production-grade legal AI platform built for the Indian legal system. It uses a LangGraph-based agentic execution pipeline that replaces traditional single-LLM chatbot patterns with a controlled, parallel, role-aware multi-agent reasoning system.
Unlike conventional chatbots, Nyay Mitra:
- Routes queries through a deterministic + semantic intent classification system
- Executes multiple agents in parallel (RAG, Chat, Document Analysis)
- Aggregates and synthesizes results with context-aware memory
- Supports lawyer workspace mode with persistent matter context
- Handles multilingual Indian legal queries with translation pipelines
- Normal Mode — Direct lightweight chat for quick legal queries
- Agent Mode — Full LangGraph execution with RAG, Document Analysis, and Chat
- Document Upload & Chat — Upload PDFs/text files and ask questions
- Multilingual Support — Query in Hindi, Bengali, Urdu, Tamil, Telugu, and more
- Contextual Memory — Rolling conversation summary for follow-on queries
- Matter-Based Workspace — Persistent legal memory per matter (case)
- Workspace Memory Injection — Party summaries, legal issues, chronology injected into every AI response
- Multi-Document RAG — Vector-indexed matter documents with semantic retrieval
- Document Analysis — Structured legal analysis with clause-by-clause breakdown
- Contract Review — Professional (lawyer) and client-friendly review modes
- Client Communications — Auto-generate WhatsApp messages, emails, and voice note scripts
- Team Collaboration — Multiple lawyers on a matter with role-based access
- Adaptive RAG — Direct, map-reduce, and section-based retrieval strategies based on document size
- Parallel Execution — LangGraph fan-out with asyncio.gather for concurrent agent execution
- Intelligent Synthesis — Single LLM call produces both final response and updated memory summary
- Document Deduplication — SHA-256 content hashing prevents duplicate ingestion
- HuggingFace Hub Storage — Documents stored on HF Hub with background reindexing on restart
- Legal Document Generation — Jinja2 template-based legal draft generation
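The SHA-256 deduplication mentioned above can be sketched in a few lines. This is an illustrative stand-in, not the project's actual ingestion code: hash the raw document bytes and skip ingestion when the digest has been seen before.

```python
import hashlib

# Illustrative sketch of SHA-256 content deduplication: hash the raw
# document bytes and skip ingestion when the digest was seen before.
_seen_hashes: set[str] = set()

def should_ingest(content: bytes) -> bool:
    """Return True only the first time this exact content is seen."""
    digest = hashlib.sha256(content).hexdigest()
    if digest in _seen_hashes:
        return False
    _seen_hashes.add(digest)
    return True
```

Because the hash is computed over content rather than filename, re-uploading the same PDF under a different name (or after a restart, once hashes are restored) is still detected as a duplicate.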
┌─────────────────────────────────────────────────────────────────┐
│ Frontend (React) │
└─────────────────────────┬───────────────────────────────────────┘
│
┌─────────────────────────▼───────────────────────────────────────┐
│ Node.js BFF (Express + Prisma) │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ Citizen API │ │ Lawyer API │ │ Workspace Memory API │ │
│ │ /citizen │ │ /lawyer │ │ Matter · Documents │ │
│ └─────────────┘ └──────────────┘ └────────────────────────┘ │
│ │
│ PostgreSQL (via Prisma) — Conversations, Matters, Memory │
└─────────────────────────┬───────────────────────────────────────┘
│ HTTP
┌─────────────────────────▼───────────────────────────────────────┐
│ Python FastAPI Backend (v3) │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ LangGraph Orchestrator │ │
│ │ │ │
│ │workspace_context → language_detection → translation_query│ │
│ │ ↓ ↓ │ │
│ │ intent_router <- document_ingestion │ │
│ │ ↓ │ │
│ │ ┌──────────────┬──────────┐ │ │
│ │ ▼ ▼ ▼ │ │
│ │ chat_node rag_node doc_analysis │ │
│ │ └──────────────┴──────────┘ │ │
│ │ ↓ │ │
│ │ aggregator │ │
│ │ ↓ │ │
│ │ synthesizer │ │
│ │ ↓ │ │
│ │ translation_response │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ RAG Service │ │ Chat Service │ │ Translation Service │ │
│ │ ChromaDB │ │ Gemini LLM │ │ Gemini (Argos │ │
│ │ Gemini Emb │ │ Rolling Mem │ │ fallback planned) │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────▼────────────┐
│ HuggingFace Hub │
│ Document Storage │
│ (PDF/TXT uploads) │
└────────────────────────┘
[START]
│
▼
workspace_context_node ← Injects matter/workspace context for lawyer mode
│
▼
language_detection_node ← Detects language, sets translation flags
│ Early exit if input_language = "en"
├──[needs translation]──► translation_query_node
│
▼
document_ingestion_node ← SHA-256 dedup, chunking, Gemini embeddings → ChromaDB
│ Skips if no storage_url
▼
intent_router_node ← 3-layer routing:
│ L1: Deterministic (has_document, is_summary, is_web)
│ L2: LLM semantic classifier (7 intent classes)
│ L3: Safety net (chat fallback)
│
├──[use_chat]────────────► chat_node
├──[use_rag]─────────────► rag_node ← Parallel fan-out
├──[use_document_analysis]► doc_analysis_node ← via asyncio.gather
└──[use_web]─────────────► web_node (planned)
│
▼
aggregator_node ← Collects outputs from all parallel nodes
│
▼
synthesizer_node ← Context-aware LLM synthesis
│ previous_summary + all outputs → final_response + updated_summary
│ Single LLM call for both outputs
├──[chat only]──► passthrough (no LLM call)
│
▼
translation_response_node ← Translates response if output_language ≠ en
│
▼
[END] → final_response + updated_summary → BFF
| Layer | Trigger | Method | Latency |
|---|---|---|---|
| L1 Deterministic | has_document, is_summary, is_web | State-based rules | ~0ms |
| L2 LLM Semantic | General queries, no document | Gemini classifier (7 intents) | ~1-2s |
| L3 Safety Net | LLM timeout/failure | Fallback to chat | ~0ms |
7 Intent Classes:
legal_explanation → chat only
document_question → rag only
document_summary → document_analysis only
comparison → rag + chat
legal_research → web + chat
greeting → chat only
followup → chat only
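The intent-to-agent mapping above can be expressed as a small routing table. This is a hypothetical sketch; the real router sets flags such as `use_chat`/`use_rag` on the graph state rather than returning a set.

```python
# Sketch of the 7-class intent routing table; unknown intents fall
# through to the chat agent (the L3 safety net).
INTENT_ROUTES = {
    "legal_explanation": {"chat"},
    "document_question": {"rag"},
    "document_summary": {"document_analysis"},
    "comparison": {"rag", "chat"},
    "legal_research": {"web", "chat"},
    "greeting": {"chat"},
    "followup": {"chat"},
}

def route(intent: str) -> set:
    # Safety net: any unrecognized or failed classification → chat only
    return INTENT_ROUTES.get(intent, {"chat"})
```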
Document chunks:
≤ 6 chunks → Direct analysis (single LLM call)
≤ 20 chunks → Summarization analysis (parallel batch LLM calls)
> 20 chunks → Section-based analysis (concurrent section LLM calls)
Query retrieval:
≤ 4 chunks → Direct RAG (single LLM call)
> 4 chunks → Map-Reduce RAG (parallel batches → reduce)
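The threshold logic above reduces to two small selector functions. The code is illustrative, mirroring the documented chunk counts rather than quoting the actual implementation:

```python
# Sketch of the adaptive strategy selection based on chunk counts.
def analysis_strategy(num_chunks: int) -> str:
    if num_chunks <= 6:
        return "direct"          # single LLM call
    if num_chunks <= 20:
        return "summarization"   # parallel batch LLM calls
    return "section"             # concurrent section LLM calls

def rag_strategy(num_chunks: int) -> str:
    return "direct" if num_chunks <= 4 else "map_reduce"
```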
Query → Semantic Embedding → ChromaDB Vector Search
→ Top-K Retrieval (semantic / full mode)
→ Context Construction (token budget: 8000)
→ Direct / Map-Reduce LLM Execution
→ Answer + Sources
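The map-reduce path above can be sketched as: split retrieved chunks into batches, "map" each batch to a partial answer concurrently, then "reduce" the partials into one answer. `llm()` here is a stand-in for the real Gemini call, and the batch size mirrors the documented `RAG_BATCH_SIZE`.

```python
import asyncio

RAG_BATCH_SIZE = 4  # chunks per map-reduce batch (from the documented config)

async def llm(prompt: str) -> str:
    # Stand-in for the real Gemini call
    return f"partial({prompt[:20]})"

async def map_reduce(query: str, chunks: list[str]) -> str:
    batches = [chunks[i:i + RAG_BATCH_SIZE]
               for i in range(0, len(chunks), RAG_BATCH_SIZE)]
    # Map: answer against each batch concurrently
    partials = await asyncio.gather(
        *(llm(f"{query}\n" + "\n".join(b)) for b in batches)
    )
    # Reduce: combine partial answers in one final call
    return await llm("reduce:\n" + "\n".join(partials))

answer = asyncio.run(map_reduce("define bail", [f"chunk{i}" for i in range(10)]))
```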
| Scenario | Before Optimization | After Optimization |
|---|---|---|
| Document Analysis (13 chunks) | ~46s sequential | ~5-8s parallel |
| RAG Query (large doc) | ~36s sequential | ~6-8s map-reduce |
| English query routing | ~5s (LLM detection) | ~0ms (early exit) |
| First doc upload | — | ~5-10s |
| Follow-on doc query | — | ~3-5s |
- Wraps `chat_service.get_chatbot_response()`
- Memory: `previous_summary` (rolling, 10-sentence LLM summary) + `history[-6:]` (exact window)
- Two concurrent LLM calls: chat response + summary update (`asyncio.gather`)
- Workspace context injection for lawyer mode
- Selects retrieval strategy based on document size and query type
- Meta-query detection (avoids vector search for "what document is this?" type queries)
- Two prompt variants: semantic precision and comprehensive retrieval
- Three analysis tiers (direct, summarization, section-based)
- Analysis cache: keyed by `document_id:mode:top_k`, max 100 entries FIFO
- All batch/section LLM calls parallelized
- Wraps `translate_service.translate_text()`
- Gemini-based translation with legal terminology preservation
- Prefix stripping for clean output
- `langdetect` library with confidence scoring
- User-provided language early exit (skips agent entirely)
- Falls back to English for unsupported languages
previous_summary (rolling, 10 sentences, LLM-generated)
+ history[-6:] (exact last 6 messages, 800 char/msg cap)
→ chat_service.main_chat_prompt
→ response + updated_summary (parallel)
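The memory block above might be assembled as follows; the constants mirror the documented config, but the prompt layout and function name are illustrative assumptions:

```python
HISTORY_WINDOW = 6       # exact recent-message window
MESSAGE_CHAR_CAP = 800   # per-message character cap

def build_memory_block(previous_summary: str, history: list) -> str:
    # Keep the last 6 messages, truncating each to 800 characters
    recent = [
        f"{m['role']}: {m['content'][:MESSAGE_CHAR_CAP]}"
        for m in history[-HISTORY_WINDOW:]
    ]
    return (
        f"Summary so far:\n{previous_summary}\n\n"
        "Recent messages:\n" + "\n".join(recent)
    )
```

The rolling summary bounds total prompt size: no matter how long the conversation runs, the model sees at most 10 summary sentences plus six capped messages.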
previous_summary (conversation-level rolling memory)
+ workspace_context (matter-level persistent memory)
- partySummary, legalIssues, factChronology
- lawyerNotes, aiSummary, keyDates, documentIndex
+ history[-10:] (larger window for lawyer workflows)
→ injected into chat prompt + synthesizer prompt
For all non-chat paths (RAG, Analysis, multi-source):
previous_summary + query + all node outputs
→ Single LLM call → final_response + updated_summary
No extra LLM call — both outputs from same inference
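The single-call synthesis above can be sketched by asking the LLM for one JSON object carrying both outputs and parsing it. `fake_llm()` is a stand-in for the real Gemini call, and the prompt wording is a hypothetical example:

```python
import json

def fake_llm(prompt: str) -> str:
    # Stand-in: a real call would send `prompt` to Gemini
    return json.dumps({"final_response": "Bail is...",
                       "updated_summary": "Asked about bail."})

def synthesize(previous_summary: str, query: str, outputs: dict):
    prompt = (
        f"Summary: {previous_summary}\nQuery: {query}\n"
        f"Agent outputs: {json.dumps(outputs)}\n"
        'Reply as JSON: {"final_response": ..., "updated_summary": ...}'
    )
    data = json.loads(fake_llm(prompt))
    # Both the user-facing answer and the refreshed memory come from
    # the same inference — no second LLM round-trip.
    return data["final_response"], data["updated_summary"]
```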
python_backend/
├── main.py # FastAPI app, lifespan, routers
├── orchestrator/
│ ├── graph.py # LangGraph StateGraph definition
│ ├── state.py # GraphState with reducers
│ └── service_registry.py # Agent singletons
├── nodes/
│ ├── workspace_context_node.py # Lawyer workspace injection
│ ├── language_detection_node.py # Language detection + early exit
│ ├── translation_query_node.py # Query translation
│ ├── document_ingestion_node.py # Upload, chunk, embed, store
│ ├── intent_router_node.py # 3-layer intent classification
│ ├── chat_node.py # General chat execution
│ ├── rag_node.py # RAG execution
│ ├── document_analysis_node.py # Document analysis execution
│ ├── aggregator_node.py # Parallel output collection
│ ├── synthesizer_node.py # Context-aware synthesis
│ └── translation_response_node.py # Response translation
├── agents/
│ ├── general_chat_agent.py # Chat agent wrapper
│ ├── rag_agent.py # RAG agent (no inner LangGraph)
│ ├── document_analysis_agent.py # Analysis agent (no inner LangGraph)
│ ├── translation_agent.py # Translation agent
│ └── language_detection_agent.py # Language detection agent
├── services/
│ ├── rag_service.py # ChromaDB, embeddings, retrieval, analysis
│ ├── chat_service.py # Chat LLM, rolling memory, summary
│ ├── docgen_service.py # Document generation service
│ ├── translate_service.py # Gemini translation
│ └── language_detection_service.py # langdetect wrapper
├── routers/
│ ├── agent_router.py # /agent/chat, /agent/upload-and-chat
│ ├── chatbot_router.py # Legacy direct chat
│ ├── translate_router.py # Translation API
│ ├── language_detection_router.py # Language detection API
│ └── docgen_router.py # Document generation API
└── utils/
├── llm_service.py # llm_generate with fence stripping + JSON extraction
├── graph_utils.py # is_executed() guard
├── logger.py # Structured logging
├── config.py # Settings from env
└── constants.py # Node name constants
node_backend/
├── src/
│ ├── module/
│ │ ├── lawyer/
│ │ │ ├── chat/ # Lawyer chat service + controller
│ │ │ ├── comms/ # Client communication generation
│ │ │ ├── contract/ # Contract review
│ │ │ ├── document/ # Matter document management
│ │ │ ├── matter/ # Matter CRUD
│ │ │ └── workspace/ # WorkspaceMemory service
│ │ ├── citizen/ # Citizen chat + agent mode
│ │ └── auth/ # JWT auth
│ ├── services/
│ │ └── python-backend.service.ts # Python API client
│ └── config/
│ └── database.ts # Prisma client
└── prisma/
└── schema.prisma # Matter, WorkspaceMemory, LawyerConversation...
Full LangGraph agent chat for text queries.
{
"query": "what is anticipatory bail under CrPC",
"session_id": "uuid",
"document_id": "chromadb-uuid",
"history": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}],
"previous_summary": "Prior conversation summary...",
"conversation_id": "conv-uuid",
"conversation_type": "matter_workspace | standalone | client_comms_generation",
"user_role": "LAWYER | CITIZEN",
"matter_id": "matter-uuid",
"workspace_memory": {
"partySummary": "...",
"factChronology": "...",
"legalIssues": "...",
"lawyerNotes": "...",
"aiSummary": "...",
"keyDates": [{"label": "Next hearing", "date": "2024-08-01"}],
"documentIndex": [{"id": "...", "title": "...", "vectorDocId": "..."}]
},
"input_language": "en",
"output_language": "hi"
}
Response:
{
"response": "...",
"session_id": "uuid",
"updated_summary": "...",
"agents_used": ["intent_router", "rag_agent", "synthesizer"],
"execution_trace": [...],
"agent_outputs": {...},
"language_info": {"detected_language": "en", "translated_query": "..."}
}
Upload a document and ask a question in one request (multipart/form-data).
| Field | Type | Required |
|---|---|---|
| `file` | binary | ✅ |
| `query` | string | ✅ |
| `session_id` | string | — |
| `previous_summary` | string | — |
| `input_language` | string | — |
| `output_language` | string | — |
| `user_role` | string | — |
Response adds document_id and storage_url to the standard agent response.
Translate text using Gemini with legal terminology preservation.
{
"text": "...",
"source_lang": "en",
"target_lang": "hi"
}
System health check — RAG service, translation service, language detection service.
The Python backend is designed for HuggingFace Spaces deployment:
# In your HF Space app.py / main.py
# ChromaDB uses /tmp/chroma_db (ephemeral but restored on restart via _reindex_from_hub)
# Background reindexing runs on startup — server is immediately available
On startup the system:
- Initializes all services (translation, language detection, RAG)
- Starts background re-indexing from HF Hub (non-blocking)
- Restores all documents from HF Hub storage within 30-60s
- `/tmp/chroma_db` — ephemeral, restored on restart via `_reindex_from_hub()`
- Documents uploaded to HF Hub dataset survive restarts
- Translation service is Gemini-based — no package downloads needed
- Language detection uses `langdetect` (lightweight, no model downloads)
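The non-blocking startup described above can be sketched with a background task: reindexing is scheduled but not awaited, so the server can answer requests immediately while documents are restored. `reindex_from_hub()` here is a stand-in for the real restore routine:

```python
import asyncio

async def reindex_from_hub(restored: list) -> None:
    # Stand-in for downloading documents from HF Hub and re-embedding
    await asyncio.sleep(0)
    restored.append("doc-1")

async def startup() -> list:
    restored: list = []
    # Schedule reindexing without awaiting it — startup is not blocked
    asyncio.create_task(reindex_from_hub(restored))
    # The server is available from this point; restoration continues
    # in the background (30-60s in practice for real documents)
    await asyncio.sleep(0.01)  # let the background task run in this demo
    return restored

docs = asyncio.run(startup())
```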
- JWT authentication for all lawyer endpoints
- `authenticateLawyer` middleware on all lawyer routes
- Matter ownership verified on every request
- Document access scoped to matter membership
- CORS configured (restrict `allow_origins` in production)
- API keys via environment variables only
| Decision | Rationale |
|---|---|
| LangGraph over custom orchestrator | Reliable parallel execution, built-in state management, conditional routing |
| Gemini 2.5 Flash over GPT-4 | Cost efficiency, strong multilingual Indian language support |
| ChromaDB over Pinecone | Self-hosted, no per-query cost, sufficient for current scale |
| Rolling summary over full history | Bounded token cost regardless of conversation length |
| Synthesizer owns `updated_summary` | Single source of truth — no inconsistency across agents |
| SHA-256 deduplication | Prevents re-ingestion of same document across uploads/restarts |
| Background reindexing | Server available immediately — documents restored async |
| Gemini translation over Argos | Quality for Indian legal content — no model downloads, correct terminology |
CHUNK_SIZE = 800
CHUNK_OVERLAP = 150
MAX_CHUNKS_PER_QUERY = 16
MAX_CONTEXT_TOKENS = 8000
MAX_ANALYSIS_TOKENS = 20000
RAG_DIRECT_THRESHOLD = 4 # chunks — below this: single LLM call
RAG_BATCH_SIZE = 4 # chunks per map-reduce batch
_llm_semaphore = Semaphore(5) # max concurrent Gemini calls
_analysis_cache_max = 100    # FIFO eviction
HISTORY_WINDOW = 6           # recent messages (exact)
SUMMARY_MAX_SENTENCES = 10 # rolling summary cap
MESSAGE_CHAR_CAP = 800 # per message in recent_block
TEMPERATURE_CHAT = 0.5 # balanced for legal accuracy
TEMPERATURE_SUMMARY = 0.0    # deterministic summaries
LLM_CLASSIFY_TIMEOUT = 10.0  # seconds before fallback
GET /docs → Swagger UI
POST /api/v3/agent/chat
curl http://localhost:8080/health
curl -X POST http://localhost:8080/api/v3/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "What is bail in Indian law", "source_lang": "en", "target_lang": "hi"}'
- Web search agent integration (Serper/Tavily)
- Workspace context injection for lawyer mode (in progress)
- Hinglish (Roman script Hindi) detection and handling
- Multi-document RAG (query across multiple documents simultaneously)
- Persistent ChromaDB storage (upgrade to paid HF Spaces)
- Query rewriting for ambiguous pronoun resolution
- Streaming responses
- Rate limiting and request queuing
- Gemini paid tier upgrade (removes 20 req/day limit)
Nyay Mitra is an AI-powered tool for legal information purposes only. It does not constitute legal advice. Always consult a qualified legal professional for advice specific to your situation.
Tejasvi Aryan — AI/ML Architecture, LangGraph Pipeline, RAG System, Python Backend
Anubrata Guin — Node.js BFF, Prisma Schema, Lawyer Dashboard, Workspace System
⭐ Star this repository if you find it useful
Made with ❤️ for Indian legal accessibility