BharatRAG is a fault-tolerant, horizontally scalable RAG (Retrieval-Augmented Generation) pipeline designed for processing large volumes of unstructured legal and government data in resource-constrained environments.
Prerequisites:
- Python 3.9+
- A working internet connection (to download models)

Install the required dependencies using pip:

    pip install -r requirements.txt

To see the system in action with simulated OCR delays and fault tolerance:

    python3 bharat_rag_engine.py --mode local

What to expect:
- You will see logs of "Ingestion Workers" starting up.
- The system will process dummy PDF/Image files.
- It will intentionally fail on "corrupt" files (and log them to failed_ingestion_log.jsonl).
- Finally, it will perform a search query ("land ownership in Bhopal") and show results.
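
The failure handling described above can be pictured with a minimal sketch. It is not the code in bharat_rag_engine.py: the `ingest`, `ingest_all`, and `read_failures` helpers and the `{"file", "error"}` record layout are hypothetical, chosen only to show how one bad file can be written to a JSON Lines log without stopping the rest of the batch.

```python
import json
import os
import tempfile


def ingest(path):
    """Hypothetical stand-in for OCR/parsing; fails on names containing 'corrupt'."""
    if "corrupt" in path:
        raise ValueError(f"unreadable file: {path}")
    return f"text extracted from {path}"


def ingest_all(paths, log_path):
    """Process every file; append failures to a JSONL log instead of aborting."""
    documents = []
    with open(log_path, "a", encoding="utf-8") as log:
        for path in paths:
            try:
                documents.append(ingest(path))
            except Exception as exc:
                # Assumed record layout: one JSON object per line.
                record = {"file": path, "error": str(exc)}
                log.write(json.dumps(record) + "\n")
    return documents


def read_failures(log_path):
    """Load the failure log back as a list of dicts, skipping blank lines."""
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


if __name__ == "__main__":
    # Write the demo log into a temporary directory, not the project folder.
    demo_log = os.path.join(tempfile.mkdtemp(), "failed_ingestion_log.jsonl")
    docs = ingest_all(["deed.pdf", "corrupt_scan.png", "notice.pdf"], demo_log)
    print(len(docs), "documents ingested")
    print(read_failures(demo_log))
```

Because each record is a complete JSON object on its own line, the log can be appended to by a long-running job and still be parsed line by line afterwards.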

To run as a scalable Ray Serve deployment:

    python3 bharat_rag_engine.py --mode deploy

This exposes the engine as an HTTP API.
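
As a rough illustration of calling the deployed engine, the sketch below sends a query with Python's standard library. Ray Serve listens on http://127.0.0.1:8000 by default, but the route (`/`), the POST method, and the `{"query": ...}` payload are assumptions made for this example; check bharat_rag_engine.py for the actual request format.

```python
import json
import urllib.request

# Ray Serve's default HTTP address; the "/" route is an assumption.
BASE_URL = "http://127.0.0.1:8000/"


def build_query_request(query, url=BASE_URL):
    """Build a JSON POST request. The payload schema is assumed, not confirmed."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, url=BASE_URL, timeout=30):
    """Send the query and decode the response, assuming the API returns JSON."""
    request = build_query_request(query, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(search("land ownership in Bhopal"))
    except OSError as exc:
        # Raised when the deployment is not running or not reachable.
        print(f"Could not reach the engine: {exc}")
```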

Project files:
- bharat_rag_engine.py: Core logic for Ray Actors and Haystack pipeline.
- bharat_config.yaml: Configuration for workers, models, and scaling.
- RESUME_POINTS.md: Technical summary for your portfolio/resume.

Notes:
- If Ray is not installed, the engine runs in LOCAL SIMULATION mode.
- Some ingestion failures (e.g., corrupted or unreadable files) are expected and handled gracefully.
- On Windows, HuggingFace may show cache/symlink warnings; these do not affect execution and can be safely ignored.
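
The fallback to local simulation when Ray is missing is commonly implemented with an optional import. The sketch below shows that general pattern only; `RAY_AVAILABLE` and `run_tasks` are hypothetical names, and the engine's own fallback logic may differ.

```python
try:
    import ray  # optional dependency
    RAY_AVAILABLE = True
except ImportError:
    ray = None
    RAY_AVAILABLE = False


def run_tasks(func, items, use_ray=RAY_AVAILABLE):
    """Run func over items, on Ray if available, else sequentially in-process."""
    if use_ray:
        ray.init(ignore_reinit_error=True)
        remote_func = ray.remote(func)
        return ray.get([remote_func.remote(item) for item in items])
    # Local simulation: same results, no cluster required.
    return [func(item) for item in items]


def word_count(text):
    """Toy workload standing in for a real ingestion step."""
    return len(text.split())


if __name__ == "__main__":
    mode = "Ray" if RAY_AVAILABLE else "LOCAL SIMULATION"
    print(f"Running in {mode} mode")
    print(run_tasks(word_count, ["land ownership in Bhopal", "sale deed"], use_ray=False))
```

Keeping both paths behind one function means the rest of the pipeline does not need to know which mode is active.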