BharatRAG: Distributed Legal Data Engine

BharatRAG is a fault-tolerant, horizontally scalable Retrieval-Augmented Generation (RAG) pipeline designed to process large volumes of unstructured legal and government data in resource-constrained environments.
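To illustrate the "retrieval" half of RAG, here is a minimal sketch that ranks documents by how many query terms they contain. This is a toy lexical stand-in, not the engine's actual retriever (the engine builds on a Haystack pipeline and downloaded models); the stopword list, scoring function, and sample documents are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; real pipelines use a proper tokenizer and model.
STOPWORDS = {"in", "of", "the", "a", "an", "and", "to", "for", "on", "by"}


def tokenize(text: str) -> list:
    # Lowercase and keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: str) -> int:
    # Count occurrences of each distinct non-stopword query term in the document.
    terms = {t for t in tokenize(query) if t not in STOPWORDS}
    counts = Counter(tokenize(doc))
    return sum(counts[t] for t in terms)


def retrieve(query: str, docs: dict, top_k: int = 2) -> list:
    # Sort by score (descending); documents with no matching terms are dropped.
    ranked = sorted(docs.items(), key=lambda kv: score(query, kv[1]), reverse=True)
    return [doc_id for doc_id, text in ranked[:top_k] if score(query, text) > 0]


# Hypothetical sample corpus, for illustration only.
DOCS = {
    "rev_order_12": "Revenue order on land ownership dispute in Bhopal district.",
    "gazette_7": "Gazette notification on water tax rates in Indore.",
    "court_44": "High Court ruling: ownership of agricultural land transferred by sale deed.",
}
```

Calling `retrieve("land ownership in Bhopal", DOCS)` returns `["rev_order_12", "court_44"]`; in a full RAG pipeline the retrieved passages would then be passed to a generator model as context.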

🚀 How to Start

1. Prerequisites

Python 3.9+
A working internet connection (to download models)

2. Installation

Install the required dependencies using pip:

pip install -r requirements.txt

3. Running Locally (Simulation Mode)

To see the system in action with simulated OCR delays and fault tolerance:

python3 bharat_rag_engine.py --mode local

What to expect:

You will see logs of "Ingestion Workers" starting up.
The system will process dummy PDF/Image files.
The dummy set deliberately includes "corrupt" files; these fail ingestion and are logged to failed_ingestion_log.jsonl.
Finally, it will perform a search query ("land ownership in Bhopal") and show results.
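The failure handling described above can be sketched as an ingestion loop that catches per-file errors and appends one JSON record per failure, so a single bad file never stops the batch. This is a minimal sketch under assumptions: `fake_ocr`, `CorruptFileError`, the "file names containing `corrupt` fail" rule, and the record fields (`file`, `error`, `ts`) are all hypothetical and may not match what bharat_rag_engine.py actually writes.

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("failed_ingestion_log.jsonl")


class CorruptFileError(Exception):
    """Hypothetical error raised when a file cannot be read."""


def fake_ocr(path: str, delay: float = 0.0) -> str:
    # Simulate OCR latency; names containing "corrupt" fail, mimicking the demo data.
    time.sleep(delay)
    if "corrupt" in path:
        raise CorruptFileError(f"unreadable file: {path}")
    return f"text extracted from {path}"


def ingest(paths, log_path=LOG_PATH, delay: float = 0.0) -> dict:
    """Extract text from each file; log failures instead of aborting the batch."""
    docs = {}
    with open(log_path, "a", encoding="utf-8") as log:
        for p in paths:
            try:
                docs[p] = fake_ocr(p, delay)
            except CorruptFileError as exc:
                # JSON Lines: one object per line, easy to append to and stream.
                record = {"file": p, "error": str(exc), "ts": time.time()}
                log.write(json.dumps(record) + "\n")
    return docs
```

Writing failures as JSON Lines keeps the log append-only, so it can be inspected later (for example with `json.loads` on each line) to retry or triage the failed files.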

4. Running as a Deployment (API)

To run as a scalable Ray Serve deployment:

python3 bharat_rag_engine.py --mode deploy

This exposes the engine as an HTTP API (Ray Serve listens on http://127.0.0.1:8000 by default).
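A client for the deployed API might look like the sketch below, using only the standard library. It assumes Ray Serve's default address; the `/` route and the `{"query": ...}` JSON payload are hypothetical, so check the deployment class in bharat_rag_engine.py for the real route and request format.

```python
import json
import urllib.request

# Assumption: Ray Serve's default host/port. The "/" route is hypothetical.
BASE_URL = "http://127.0.0.1:8000/"


def build_query_request(query: str, url: str = BASE_URL) -> urllib.request.Request:
    # Hypothetical payload shape: {"query": "<text>"} sent as JSON in a POST body.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, url: str = BASE_URL, timeout: float = 30.0):
    # Send the request and decode the JSON response body.
    req = build_query_request(query, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the deployment running, `search("land ownership in Bhopal")` would send the same query used in local mode. Keeping request construction separate from sending makes it easy to adjust the route or payload once the real interface is confirmed.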

📂 Project Structure

bharat_rag_engine.py: Core logic for Ray Actors and Haystack pipeline.
bharat_config.yaml: Configuration for workers, models, and scaling.
RESUME_POINTS.md: Technical summary for your portfolio/resume.

Local Execution Notes

If Ray is not installed, the engine runs in LOCAL SIMULATION mode.
Some ingestion failures (e.g., corrupted or unreadable files) are expected and handled gracefully.
On Windows, Hugging Face may show cache/symlink warnings; these do not affect execution and can be safely ignored.
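The Ray fallback described in these notes is commonly implemented as an optional import. The sketch below shows that pattern; the `choose_mode` helper and its rule of downgrading `deploy` to `local` when Ray is missing are hypothetical illustrations, not confirmed behavior of bharat_rag_engine.py.

```python
# Optional dependency: fall back to local simulation when Ray is not installed.
try:
    import ray  # noqa: F401
    RAY_AVAILABLE = True
except ImportError:
    ray = None
    RAY_AVAILABLE = False

VALID_MODES = {"local", "deploy"}


def choose_mode(requested: str) -> str:
    """Hypothetical helper: use local simulation if Ray is unavailable."""
    if requested not in VALID_MODES:
        raise ValueError(f"unknown mode: {requested!r}")
    if not RAY_AVAILABLE:
        # Without Ray there are no actors or Serve deployment, so simulate locally.
        return "local"
    return requested
```

Guarding the import this way lets the same script run on a laptop without Ray while still using Ray actors and Ray Serve when they are installed.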
