
⚖️ Nyay Mitra

Production-Grade Agentic Legal AI Platform for Indian Law

Multi-agent reasoning · Adaptive RAG · Lawyer workspace · Multilingual support


📌 Overview

Nyay Mitra is a full-stack, production-grade legal AI platform built for the Indian legal system. It uses a LangGraph-based agentic execution pipeline that replaces traditional single-LLM chatbot patterns with a controlled, parallel, role-aware multi-agent reasoning system.

Unlike conventional chatbots, Nyay Mitra:

  • Routes queries through a deterministic + semantic intent classification system
  • Executes multiple agents in parallel (RAG, Chat, Document Analysis)
  • Aggregates and synthesizes results with context-aware memory
  • Supports lawyer workspace mode with persistent matter context
  • Handles multilingual Indian legal queries with translation pipelines

✨ Features

For Citizens (Normal Users)

  • Normal Mode — Direct lightweight chat for quick legal queries
  • Agent Mode — Full LangGraph execution with RAG, Document Analysis, and Chat
  • Document Upload & Chat — Upload PDFs/text files and ask questions
  • Multilingual Support — Query in Hindi, Bengali, Urdu, Tamil, Telugu, and more
  • Contextual Memory — Rolling conversation summary for follow-on queries

For Lawyers

  • Matter-Based Workspace — Persistent legal memory per matter (case)
  • Workspace Memory Injection — Party summaries, legal issues, chronology injected into every AI response
  • Multi-Document RAG — Vector-indexed matter documents with semantic retrieval
  • Document Analysis — Structured legal analysis with clause-by-clause breakdown
  • Contract Review — Professional (lawyer) and client-friendly review modes
  • Client Communications — Auto-generate WhatsApp messages, emails, and voice note scripts
  • Team Collaboration — Multiple lawyers on a matter with role-based access

Platform

  • Adaptive RAG — Direct, map-reduce, and section-based retrieval strategies based on document size
  • Parallel Execution — LangGraph fan-out with asyncio.gather for concurrent agent execution
  • Intelligent Synthesis — Single LLM call produces both final response and updated memory summary
  • Document Deduplication — SHA-256 content hashing prevents duplicate ingestion
  • HuggingFace Hub Storage — Documents stored on HF Hub with background reindexing on restart
  • Legal Document Generation — Jinja2 template-based legal draft generation
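The SHA-256 deduplication step can be sketched as below. This is a minimal in-memory illustration, not the actual service API: the names `content_hash` and `DedupRegistry` are invented here, and the real registry persists its state alongside the HF Hub document storage.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Deterministic fingerprint of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()

class DedupRegistry:
    """In-memory sketch: skip re-ingestion when a hash is already known."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # hash -> existing document_id

    def register(self, data: bytes, document_id: str) -> str:
        digest = content_hash(data)
        if digest in self._seen:
            return self._seen[digest]  # duplicate: reuse the original id
        self._seen[digest] = document_id
        return document_id
```

Because the hash is computed over content rather than filename, the same PDF uploaded twice (or re-sent after a restart) resolves to the original document id instead of being embedded again.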

🏗️ System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Frontend (React)                         │
└─────────────────────────┬───────────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────────┐
│              Node.js BFF (Express + Prisma)                     │
│                                                                 │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────────────────┐  │
│  │ Citizen API │  │  Lawyer API  │  │  Workspace Memory API  │  │
│  │  /citizen   │  │   /lawyer    │  │  Matter · Documents    │  │
│  └─────────────┘  └──────────────┘  └────────────────────────┘  │
│                                                                 │
│  PostgreSQL (via Prisma) — Conversations, Matters, Memory       │
└─────────────────────────┬───────────────────────────────────────┘
                          │ HTTP
┌─────────────────────────▼───────────────────────────────────────┐
│              Python FastAPI Backend (v3)                        │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  LangGraph Orchestrator                  │   │
│  │                                                          │   │
│  │workspace_context → language_detection → translation_query│   │
│  │                            ↓                   ↓         │   │
│  │                      intent_router <- document_ingestion │   │ 
│  │                             ↓                            │   │
│  │              ┌──────────────┬──────────┐                 │   │
│  │              ▼              ▼          ▼                 │   │
│  │          chat_node      rag_node   doc_analysis          │   │
│  │              └──────────────┴──────────┘                 │   │
│  │                            ↓                             │   │
│  │                       aggregator                         │   │
│  │                            ↓                             │   │
│  │                       synthesizer                        │   │
│  │                            ↓                             │   │
│  │                   translation_response                   │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │
│  │  RAG Service │  │ Chat Service │  │ Translation Service  │   │
│  │  ChromaDB    │  │  Gemini LLM  │  │   Gemini (Argos      │   │
│  │  Gemini Emb  │  │  Rolling Mem │  │   fallback planned)  │   │
│  └──────────────┘  └──────────────┘  └──────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                          │
              ┌───────────▼────────────┐
              │   HuggingFace Hub      │
              │   Document Storage     │
              │   (PDF/TXT uploads)    │
              └────────────────────────┘

🧠 LangGraph Execution Pipeline

Execution Graph

[START]
   │
   ▼
workspace_context_node     ← Injects matter/workspace context for lawyer mode
   │
   ▼
language_detection_node    ← Detects language, sets translation flags
   │                          Early exit if input_language = "en"
   ├──[needs translation]──► translation_query_node
   │
   ▼
document_ingestion_node    ← SHA-256 dedup, chunking, Gemini embeddings → ChromaDB
   │                          Skips if no storage_url
   ▼
intent_router_node         ← 3-layer routing:
   │                          L1: Deterministic (has_document, is_summary, is_web)
   │                          L2: LLM semantic classifier (7 intent classes)
   │                          L3: Safety net (chat fallback)
   │
   ├──[use_chat]────────────► chat_node
   ├──[use_rag]─────────────► rag_node          ← Parallel fan-out
   ├──[use_document_analysis]► doc_analysis_node ← via asyncio.gather
   └──[use_web]─────────────► web_node (planned)
                │
                ▼
          aggregator_node    ← Collects outputs from all parallel nodes
                │
                ▼
          synthesizer_node   ← Context-aware LLM synthesis
                │              previous_summary + all outputs → final_response + updated_summary
                │              Single LLM call for both outputs
                ├──[chat only]──► passthrough (no LLM call)
                │
                ▼
   translation_response_node ← Translates response if output_language ≠ en
                │
                ▼
            [END] → final_response + updated_summary → BFF
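The parallel fan-out and aggregation steps above can be sketched with `asyncio.gather`. The node bodies here are placeholders (the real nodes call the RAG, chat, and analysis services), and `fan_out` is an illustrative name for what LangGraph's conditional edges plus the aggregator node do together:

```python
import asyncio

# Placeholder nodes; each returns its partial contribution to the state.
async def chat_node(state: dict) -> dict:
    return {"chat": f"chat answer to: {state['query']}"}

async def rag_node(state: dict) -> dict:
    return {"rag": "retrieved context + answer"}

async def doc_analysis_node(state: dict) -> dict:
    return {"doc_analysis": "structured analysis"}

async def fan_out(state: dict) -> dict:
    """Run only the nodes the intent router flagged, concurrently,
    then merge their outputs (the aggregator step)."""
    selected = []
    if state.get("use_chat"):
        selected.append(chat_node(state))
    if state.get("use_rag"):
        selected.append(rag_node(state))
    if state.get("use_document_analysis"):
        selected.append(doc_analysis_node(state))
    results = await asyncio.gather(*selected)
    merged: dict = {}
    for partial in results:
        merged.update(partial)
    return merged
```

Running only the flagged nodes keeps latency at roughly the slowest selected agent rather than the sum of all of them.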

Intent Classification (3 Layers)

Layer               Trigger                            Method                           Latency
L1  Deterministic   has_document, is_summary, is_web   State-based rules                ~0ms
L2  LLM Semantic    General queries, no document       Gemini classifier (7 intents)    ~1-2s
L3  Safety Net      LLM timeout/failure                Fallback to chat                 ~0ms

7 Intent Classes:

legal_explanation  → chat only
document_question  → rag only
document_summary   → document_analysis only
comparison         → rag + chat
legal_research     → web + chat
greeting           → chat only
followup           → chat only
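The three routing layers can be sketched as a single function. The intent-to-node mapping follows the table above; the exact precedence of the L1 state flags and the `route_intent` signature are simplifying assumptions, not the actual node code:

```python
import asyncio

# Intent -> nodes mapping for the 7 classes listed above.
INTENT_TO_NODES = {
    "legal_explanation": ["chat"],
    "document_question": ["rag"],
    "document_summary": ["document_analysis"],
    "comparison": ["rag", "chat"],
    "legal_research": ["web", "chat"],
    "greeting": ["chat"],
    "followup": ["chat"],
}

async def route_intent(state: dict, llm_classify, timeout: float = 10.0) -> list[str]:
    # L1: deterministic, state-based rules (~0ms)
    if state.get("has_document"):
        return ["document_analysis"] if state.get("is_summary") else ["rag"]
    if state.get("is_web"):
        return ["web", "chat"]
    # L2: LLM semantic classifier over the 7 intent classes
    try:
        intent = await asyncio.wait_for(llm_classify(state["query"]), timeout)
        return INTENT_TO_NODES.get(intent, ["chat"])
    # L3: safety net; timeouts and classifier failures fall back to chat
    except Exception:
        return ["chat"]
```

Because L1 and L3 never touch the LLM, a classifier outage degrades routing quality but never blocks a response.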

🔄 Adaptive RAG System

Retrieval Strategy Selection

Document chunks:
  ≤ 6 chunks  → Direct analysis (single LLM call)
  ≤ 20 chunks → Summarization analysis (parallel batch LLM calls)
  > 20 chunks → Section-based analysis (concurrent section LLM calls)

Query retrieval:
  ≤ 4 chunks  → Direct RAG (single LLM call)
  > 4 chunks  → Map-Reduce RAG (parallel batches → reduce)
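The threshold logic above reduces to two small selector functions. These mirror the documented cutoffs; the function names are illustrative, not the actual `rag_service` identifiers:

```python
def pick_analysis_strategy(num_chunks: int) -> str:
    """Map a document's chunk count to an analysis tier (thresholds above)."""
    if num_chunks <= 6:
        return "direct"
    if num_chunks <= 20:
        return "summarization"
    return "section_based"

def pick_rag_strategy(num_retrieved: int, direct_threshold: int = 4) -> str:
    """Query-time retrieval: direct below the threshold, map-reduce above."""
    return "direct" if num_retrieved <= direct_threshold else "map_reduce"
```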

RAG Pipeline

Query → Semantic Embedding → ChromaDB Vector Search
     → Top-K Retrieval (semantic / full mode)
     → Context Construction (token budget: 8000)
     → Direct / Map-Reduce LLM Execution
     → Answer + Sources
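The context-construction step packs ranked chunks greedily until the 8000-token budget runs out. The sketch below approximates token count as characters divided by four, which is a common rule of thumb and an assumption here; the real service may use an actual tokenizer:

```python
def build_context(chunks: list[str], max_tokens: int = 8000) -> str:
    """Greedily pack ranked chunks until the token budget is exhausted.
    Token cost is approximated as len(text) // 4 + 1 (assumption)."""
    budget, selected = max_tokens, []
    for chunk in chunks:  # chunks arrive ranked by similarity
        cost = len(chunk) // 4 + 1
        if cost > budget:
            break
        selected.append(chunk)
        budget -= cost
    return "\n\n".join(selected)
```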

Performance Metrics

Scenario                        Before Optimization    After Optimization
Document Analysis (13 chunks)   ~46s sequential        ~5-8s parallel
RAG Query (large doc)           ~36s sequential        ~6-8s map-reduce
English query routing           ~5s (LLM detection)    ~0ms (early exit)
First doc upload                -                      ~5-10s
Follow-on doc query             -                      ~3-5s

🧩 Agent Internals

general_chat_agent

  • Wraps chat_service.get_chatbot_response()
  • Memory: previous_summary (rolling, 10-sentence LLM summary) + history[-6:] (exact window)
  • Two concurrent LLM calls: chat response + summary update (asyncio.gather)
  • Workspace context injection for lawyer mode

rag_agent

  • Selects retrieval strategy based on document size and query type
  • Meta-query detection (avoids vector search for "what document is this?" type queries)
  • Two prompt variants: semantic precision and comprehensive retrieval

document_analysis_agent

  • Three analysis tiers (direct, summarization, section-based)
  • Analysis cache: keyed by document_id:mode:top_k, max 100 entries FIFO
  • All batch/section LLM calls parallelized
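The FIFO analysis cache can be sketched with an `OrderedDict`. The class and method names are illustrative rather than the agent's actual internals; the key format and 100-entry cap follow the description above:

```python
from collections import OrderedDict

class AnalysisCache:
    """FIFO-evicting cache keyed by document_id:mode:top_k (sketch)."""

    def __init__(self, max_entries: int = 100):
        self._max = max_entries
        self._data: OrderedDict[str, str] = OrderedDict()

    @staticmethod
    def key(document_id: str, mode: str, top_k: int) -> str:
        return f"{document_id}:{mode}:{top_k}"

    def get(self, key: str):
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        if key in self._data:
            self._data[key] = value  # refresh value, keep insertion order
            return
        if len(self._data) >= self._max:
            self._data.popitem(last=False)  # evict oldest insertion (FIFO)
        self._data[key] = value
```

FIFO (as opposed to LRU) keeps eviction trivially cheap and predictable, which is sufficient for a 100-entry cache of expensive LLM analyses.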

translation_agent

  • Wraps translate_service.translate_text()
  • Gemini-based translation with legal terminology preservation
  • Prefix stripping for clean output

language_detection_agent

  • langdetect library with confidence scoring
  • User-provided language early exit (skips agent entirely)
  • Falls back to English for unsupported languages

🧾 Memory Architecture

Citizen Conversations

previous_summary (rolling, 10 sentences, LLM-generated)
    + history[-6:] (exact last 6 messages, 800 char/msg cap)
    → chat_service.main_chat_prompt
    → response + updated_summary (parallel)
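The bounded memory block above can be sketched as a single prompt-assembly helper. The section labels inside the string are invented for illustration; the window and character cap follow the documented constants:

```python
HISTORY_WINDOW = 6       # exact recent-message window
MESSAGE_CHAR_CAP = 800   # per-message truncation

def build_memory_block(previous_summary: str, history: list[dict]) -> str:
    """Bounded memory: rolling summary plus the last HISTORY_WINDOW
    messages, each truncated to MESSAGE_CHAR_CAP characters."""
    recent = [
        f"{m['role']}: {m['content'][:MESSAGE_CHAR_CAP]}"
        for m in history[-HISTORY_WINDOW:]
    ]
    return (
        f"Summary so far:\n{previous_summary}\n\n"
        "Recent messages:\n" + "\n".join(recent)
    )
```

Because both the summary length and the message window are capped, prompt size stays bounded no matter how long a conversation runs.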

Lawyer Workspace Conversations

previous_summary (conversation-level rolling memory)
    + workspace_context (matter-level persistent memory)
        - partySummary, legalIssues, factChronology
        - lawyerNotes, aiSummary, keyDates, documentIndex
    + history[-10:] (larger window for lawyer workflows)
    → injected into chat prompt + synthesizer prompt

Synthesizer Memory

For all non-chat paths (RAG, Analysis, multi-source):
    previous_summary + query + all node outputs
    → Single LLM call → final_response + updated_summary
    No extra LLM call — both outputs from same inference
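One way to get both outputs from a single inference is to ask the model for one JSON object and parse both fields from it. The prompt wording and JSON schema below are assumptions for illustration, not the synthesizer's actual prompt:

```python
import json

def build_synthesis_prompt(previous_summary: str, query: str,
                           node_outputs: dict[str, str]) -> str:
    """One prompt requesting both artifacts in a single JSON reply."""
    outputs = "\n".join(f"[{name}]\n{text}" for name, text in node_outputs.items())
    return (
        f"Previous summary:\n{previous_summary}\n\n"
        f"User query: {query}\n\nAgent outputs:\n{outputs}\n\n"
        'Reply with JSON: {"final_response": "...", "updated_summary": "..."}'
    )

def parse_synthesis(raw: str) -> tuple[str, str]:
    """Extract both fields from the single LLM reply."""
    data = json.loads(raw)
    return data["final_response"], data["updated_summary"]
```

In practice the raw reply would first pass through the fence-stripping and JSON-extraction utilities in `utils/llm_service.py` before parsing.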

📁 Project Structure

python_backend/
├── main.py                          # FastAPI app, lifespan, routers
├── orchestrator/
│   ├── graph.py                     # LangGraph StateGraph definition
│   ├── state.py                     # GraphState with reducers
│   └── service_registry.py          # Agent singletons
├── nodes/
│   ├── workspace_context_node.py    # Lawyer workspace injection
│   ├── language_detection_node.py   # Language detection + early exit
│   ├── translation_query_node.py    # Query translation
│   ├── document_ingestion_node.py   # Upload, chunk, embed, store
│   ├── intent_router_node.py        # 3-layer intent classification
│   ├── chat_node.py                 # General chat execution
│   ├── rag_node.py                  # RAG execution
│   ├── document_analysis_node.py    # Document analysis execution
│   ├── aggregator_node.py           # Parallel output collection
│   ├── synthesizer_node.py          # Context-aware synthesis
│   └── translation_response_node.py # Response translation
├── agents/
│   ├── general_chat_agent.py        # Chat agent wrapper
│   ├── rag_agent.py                 # RAG agent (no inner LangGraph)
│   ├── document_analysis_agent.py   # Analysis agent (no inner LangGraph)
│   ├── translation_agent.py         # Translation agent
│   └── language_detection_agent.py  # Language detection agent
├── services/
│   ├── rag_service.py               # ChromaDB, embeddings, retrieval, analysis
│   ├── chat_service.py              # Chat LLM, rolling memory, summary
│   ├── docgen_service.py            # Document generation service
│   ├── translate_service.py         # Gemini translation
│   └── language_detection_service.py # langdetect wrapper
├── routers/
│   ├── agent_router.py              # /agent/chat, /agent/upload-and-chat
│   ├── chatbot_router.py            # Legacy direct chat
│   ├── translate_router.py          # Translation API
│   ├── language_detection_router.py # Language detection API
│   └── docgen_router.py             # Document generation API
└── utils/
    ├── llm_service.py               # llm_generate with fence stripping + JSON extraction
    ├── graph_utils.py               # is_executed() guard
    ├── logger.py                    # Structured logging
    ├── config.py                    # Settings from env
    └── constants.py                 # Node name constants

node_backend/
├── src/
│   ├── module/
│   │   ├── lawyer/
│   │   │   ├── chat/                # Lawyer chat service + controller
│   │   │   ├── comms/               # Client communication generation
│   │   │   ├── contract/            # Contract review
│   │   │   ├── document/            # Matter document management
│   │   │   ├── matter/              # Matter CRUD
│   │   │   └── workspace/           # WorkspaceMemory service
│   │   ├── citizen/                 # Citizen chat + agent mode
│   │   └── auth/                    # JWT auth
│   ├── services/
│   │   └── python-backend.service.ts # Python API client
│   └── config/
│       └── database.ts              # Prisma client
└── prisma/
    └── schema.prisma                # Matter, WorkspaceMemory, LawyerConversation...

🌐 API Reference


Agent Endpoints (Python — /api/v3)

POST /agent/chat

Full LangGraph agent chat for text queries.

{
  "query": "what is anticipatory bail under CrPC",
  "session_id": "uuid",
  "document_id": "chromadb-uuid",
  "history": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}],
  "previous_summary": "Prior conversation summary...",
  "conversation_id": "conv-uuid",
  "conversation_type": "matter_workspace | standalone | client_comms_generation",
  "user_role": "LAWYER | CITIZEN",
  "matter_id": "matter-uuid",
  "workspace_memory": {
    "partySummary": "...",
    "factChronology": "...",
    "legalIssues": "...",
    "lawyerNotes": "...",
    "aiSummary": "...",
    "keyDates": [{"label": "Next hearing", "date": "2024-08-01"}],
    "documentIndex": [{"id": "...", "title": "...", "vectorDocId": "..."}]
  },
  "input_language": "en",
  "output_language": "hi"
}

Response:

{
  "response": "...",
  "session_id": "uuid",
  "updated_summary": "...",
  "agents_used": ["intent_router", "rag_agent", "synthesizer"],
  "execution_trace": [...],
  "agent_outputs": {...},
  "language_info": {"detected_language": "en", "translated_query": "..."}
}

POST /agent/upload-and-chat

Upload a document and ask a question in one request (multipart/form-data).

Field             Type
file              binary
query             string
session_id        string
previous_summary  string
input_language    string
output_language   string
user_role         string

Response adds document_id and storage_url to the standard agent response.

POST /translate

Translate text using Gemini with legal terminology preservation.

{
  "text": "...",
  "source_lang": "en",
  "target_lang": "hi"
}

GET /health

System health check — RAG service, translation service, language detection service.


🚀 Deployment

HuggingFace Spaces (Python Backend)

The Python backend is designed for HuggingFace Spaces deployment:

# In your HF Space app.py / main.py
# ChromaDB uses /tmp/chroma_db (ephemeral but restored on restart via _reindex_from_hub)
# Background reindexing runs on startup — server is immediately available

On startup the system:

  1. Initializes all services (translation, language detection, RAG)
  2. Starts background re-indexing from HF Hub (non-blocking)
  3. Restores all documents from HF Hub storage within 30-60s

Environment Notes

  • /tmp/chroma_db — ephemeral, restored on restart via _reindex_from_hub()
  • Documents uploaded to HF Hub dataset survive restarts
  • Translation service is Gemini-based — no package downloads needed
  • Language detection uses langdetect (lightweight, no model downloads)

🔐 Security

  • JWT authentication for all lawyer endpoints
  • authenticateLawyer middleware on all lawyer routes
  • Matter ownership verified on every request
  • Document access scoped to matter membership
  • CORS configured (restrict allow_origins in production)
  • API keys via environment variables only

📊 Key Technical Decisions

Decision Rationale
LangGraph over custom orchestrator Reliable parallel execution, built-in state management, conditional routing
Gemini 2.5 Flash over GPT-4 Cost efficiency, strong multilingual Indian language support
ChromaDB over Pinecone Self-hosted, no per-query cost, sufficient for current scale
Rolling summary over full history Bounded token cost regardless of conversation length
Synthesizer owns updated_summary Single source of truth — no inconsistency across agents
SHA-256 deduplication Prevents re-ingestion of same document across uploads/restarts
Background reindexing Server available immediately — documents restored async
Gemini translation over Argos Quality for Indian legal content — no model downloads, correct terminology

🔧 Configuration Reference

RAG Service Constants

CHUNK_SIZE = 800
CHUNK_OVERLAP = 150
MAX_CHUNKS_PER_QUERY = 16
MAX_CONTEXT_TOKENS = 8000
MAX_ANALYSIS_TOKENS = 20000
RAG_DIRECT_THRESHOLD = 4      # chunks — below this: single LLM call
RAG_BATCH_SIZE = 4             # chunks per map-reduce batch
_llm_semaphore = Semaphore(5)  # max concurrent Gemini calls
_analysis_cache_max = 100      # FIFO eviction
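The chunking constants above imply a splitter along these lines. This is a character-based sketch using the documented size and overlap; the actual service may split on sentence or token boundaries instead:

```python
CHUNK_SIZE = 800
CHUNK_OVERLAP = 150

def chunk_text(text: str, size: int = CHUNK_SIZE,
               overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Fixed-size character chunks where each chunk repeats the last
    `overlap` characters of the previous one, so no span of text is
    ever split without context on at least one side."""
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```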

Chat Service Constants

HISTORY_WINDOW = 6             # recent messages (exact)
SUMMARY_MAX_SENTENCES = 10     # rolling summary cap
MESSAGE_CHAR_CAP = 800         # per message in recent_block
TEMPERATURE_CHAT = 0.5         # balanced for legal accuracy
TEMPERATURE_SUMMARY = 0.0      # deterministic summaries

Intent Router Constants

LLM_CLASSIFY_TIMEOUT = 10.0   # seconds before fallback

🧪 Testing

Agent Chat (Swagger UI)

GET /docs → Swagger UI
POST /api/v3/agent/chat

Health Check

curl http://localhost:8080/health

Translation Test

curl -X POST http://localhost:8080/api/v3/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "What is bail in Indian law", "source_lang": "en", "target_lang": "hi"}'

📈 Roadmap

  • Web search agent integration (Serper/Tavily)
  • Workspace context injection for lawyer mode (in progress)
  • Hinglish (Roman script Hindi) detection and handling
  • Multi-document RAG (query across multiple documents simultaneously)
  • Persistent ChromaDB storage (upgrade to paid HF Spaces)
  • Query rewriting for ambiguous pronoun resolution
  • Streaming responses
  • Rate limiting and request queuing
  • Gemini paid tier upgrade (removes 20 req/day limit)

⚠️ Disclaimer

Nyay Mitra is an AI-powered tool for legal information purposes only. It does not constitute legal advice. Always consult a qualified legal professional for advice specific to your situation.


👨‍💻 Authors

Tejasvi Aryan — AI/ML Architecture, LangGraph Pipeline, RAG System, Python Backend

Anubrata Guin — Node.js BFF, Prisma Schema, Lawyer Dashboard, Workspace System


⭐ Star this repository if you find it useful

Made with ❤️ for Indian legal accessibility