A local playground for running a Retrieval-Augmented Generation (RAG) system specifically designed for pharmacy and medical knowledge retrieval. The system runs entirely locally, ensuring maximum data privacy and security for healthcare organizations.
- 100% Local Operation: All models run locally via Ollama - no data leaves your infrastructure
- No External API Calls: Zero dependency on cloud services or third-party APIs
- HIPAA-Ready Architecture: Designed with healthcare data privacy in mind
- Local Vector Storage: ChromaDB runs locally with no external database connections
- Multi-Format Document Support: Process PDF, DOCX, and TXT medical documents
- Advanced Retrieval: Semantic search with configurable relevance scoring
- Source Attribution: Every answer includes specific document references and page numbers
- Comprehensive Testing: Industry-standard benchmarking for accuracy and safety
- Performance Monitoring: Built-in metrics for response time and quality assessment
- Python: 3.8 or higher
- Memory: Minimum 8GB RAM (16GB recommended for optimal performance)
- Storage: 10GB+ free space for models and vector database
- OS: Linux, macOS, or Windows
- Ollama: Local LLM runtime environment
```bash
git clone https://github.com/Lapintam/pharmacy-rag-example.git
cd pharmacy-rag-example

# Run the automated setup script
python setup.py
```

The setup script will:
- Check system requirements
- Install Python dependencies
- Verify Ollama installation
- Check for required models
- Provide next steps guidance
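For reference, here is a minimal sketch of the kind of checks `setup.py` performs (illustrative only; the actual script may differ):

```python
# Illustrative sketch of setup-style checks; not the repository's actual setup.py.
import shutil
import sys

def check_requirements() -> bool:
    """Verify the Python version and that the Ollama CLI is on PATH."""
    ok = True
    if sys.version_info < (3, 8):
        print("Python 3.8+ is required")
        ok = False
    if shutil.which("ollama") is None:
        print("Ollama not found -- install it from https://ollama.com")
        ok = False
    return ok

if __name__ == "__main__":
    sys.exit(0 if check_requirements() else 1)
```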
```bash
git clone https://github.com/Lapintam/pharmacy-rag-example.git
cd pharmacy-rag-example

# Create isolated virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

```bash
# Visit https://ollama.com for installation instructions
# Then pull the recommended model:
ollama pull llama3.2:3b

# Start Ollama service
ollama serve
```

```bash
# Place your pharmacy/medical documents in the data directory
# Supported formats: PDF, DOCX, TXT
cp your_medical_documents/* data/
```

```bash
# Process and index all documents
python process_documents.py
```

```bash
# Interactive querying
python query_data.py "What are the contraindications for beta-blockers?"

# With custom parameters
python query_data.py "Acetaminophen overdose treatment" --k 10 --model llama3.2:3b
```

Run comprehensive benchmarks to validate system accuracy and safety:

```bash
# Run full test suite with industry-standard metrics
python test_rag_system.py
```

- Accuracy Testing: Keyword-based relevance scoring (see the sketch after this list)
- Safety-Critical Validation: Special focus on medication safety
- Performance Metrics: Response time and throughput analysis
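A hypothetical keyword-based relevance check in the spirit of `test_rag_system.py` (the test case and names are illustrative assumptions, not the shipped suite):

```python
# Hypothetical keyword-based relevance scoring; the case below is an
# illustration, not taken from test_rag_system.py.
from typing import List

SAFETY_CASES = [
    {
        "query": "Acetaminophen overdose treatment",
        "expected_keywords": ["n-acetylcysteine", "nac", "hepatotoxicity"],
    },
]

def keyword_score(answer: str, expected_keywords: List[str]) -> float:
    """Return the fraction of expected keywords found in the answer."""
    text = answer.lower()
    return sum(kw in text for kw in expected_keywords) / len(expected_keywords)
```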
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│     Medical     │    │     Document     │    │     Vector      │
│    Documents    │───▶│    Processor     │───▶│    Database     │
│   (PDF/DOCX)    │    │     (Local)      │    │   (ChromaDB)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐            │
│      Query      │    │    RAG Engine    │◀───────────┘
│    Interface    │───▶│   (Local LLM)    │
│                 │    │    via Ollama    │
└─────────────────┘    └──────────────────┘
```
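In code, the query path in the diagram looks roughly like the following (a sketch assuming the LangChain Chroma and Ollama integrations; the function and prompt are illustrative, not the repository's actual implementation):

```python
# Sketch of the retrieve-then-generate path; assumes langchain-chroma and
# langchain-ollama are installed. Names are illustrative assumptions.
from langchain_chroma import Chroma
from langchain_ollama import OllamaLLM

from get_embedding_function import get_embedding_function

def answer(query: str, k: int = 5, model: str = "llama3.2:3b") -> str:
    """Retrieve the top-k chunks from ChromaDB and ask a local LLM."""
    db = Chroma(persist_directory="chroma",
                embedding_function=get_embedding_function())
    # Semantic search: the k most relevant chunks with similarity scores
    results = db.similarity_search_with_score(query, k=k)
    context = "\n\n---\n\n".join(doc.page_content for doc, _score in results)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return OllamaLLM(model=model).invoke(prompt)
```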
Query parameters for `query_data.py`:

- `--k`: Number of relevant chunks to retrieve (default: 5)
- `--model`: Ollama model to use (default: "llama3.2:3b")
- `--format`: Output format, "text" or "json" (default: "text")
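Because JSON output is supported, results can feed downstream tooling directly:

```bash
# Machine-readable output for scripting or downstream tooling
python query_data.py "Warfarin drug interactions" --format json
```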
Modify `get_embedding_function.py` to use a different embedding model:

- Default: `all-MiniLM-L6-v2` (fast, efficient)
- Alternative: `all-MiniLM-L12-v2` (higher accuracy)
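For reference, a minimal sketch of what `get_embedding_function.py` might look like, assuming LangChain's HuggingFace wrapper around sentence-transformers:

```python
# Minimal sketch assuming the langchain-huggingface package; the actual
# file may use a different wrapper.
from langchain_huggingface import HuggingFaceEmbeddings

def get_embedding_function(model_name: str = "all-MiniLM-L6-v2"):
    """Return the embedding function used for indexing and querying."""
    # Swap in "all-MiniLM-L12-v2" for higher accuracy at some speed cost.
    return HuggingFaceEmbeddings(model_name=model_name)
```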
Adjust chunk size and overlap in `process_documents.py`:

```python
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # Adjust based on document complexity
    chunk_overlap=100,  # Increase for better context preservation
)
```

Project structure:

```
pharmacy-rag-example/
├── data/                      # Medical document storage
│   ├── USERS LOCAL DATA 1/    # E.g. Cardiovascular medications
│   ├── USERS LOCAL DATA 2/    # E.g. Poison control & antidotes
│   ├── USERS LOCAL DATA 3/    # E.g. Pediatric dosing guidelines
│   └── ...                    # Additional medical specialties
├── process_documents.py       # Document ingestion pipeline
├── chroma/                    # Vector database (auto-generated)
├── query_data.py              # Query interface and RAG engine
├── get_embedding_function.py  # Embedding model configuration
├── test_rag_system.py         # Comprehensive testing suite
├── setup.py                   # Automated setup and system check
└── requirements.txt           # Python dependencies
```
- No Cloud Dependencies: All processing occurs locally
- Encrypted Storage: Vector database can be encrypted at rest
- Access Control: Implement file-system level permissions
- Audit Logging: Track all queries and system access
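One way to implement the audit-logging point above (a minimal sketch; the log file name and record fields are assumptions):

```python
# Illustrative audit-logging helper; file name and fields are assumptions.
import json
import logging
from datetime import datetime, timezone

audit = logging.getLogger("rag.audit")
audit.addHandler(logging.FileHandler("audit.log"))
audit.setLevel(logging.INFO)

def log_query(user: str, query: str, n_sources: int) -> None:
    """Append a structured audit record for each query."""
    audit.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "query": query,
        "sources_returned": n_sources,
    }))
```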
- HIPAA: Architecture supports HIPAA compliance requirements
- SOC 2: Local operation eliminates many third-party risks
- GDPR: No data transmission to external services
- FDA 21 CFR Part 11: Audit trail capabilities for regulated environments
- CPU: Multi-core processor (8+ cores recommended)
- RAM: 16GB+ for optimal model performance
- Storage: SSD for faster vector database operations
- GPU: Optional, but improves embedding generation speed
- Horizontal Scaling: Deploy multiple instances behind load balancer
- Database Optimization: Tune ChromaDB settings for large document sets (see the example below)
- Model Selection: Balance accuracy vs. speed based on use case
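For example, ChromaDB exposes HNSW index parameters at collection creation time (a sketch; the collection name and values are illustrative starting points, not project defaults):

```python
# Illustrative ChromaDB tuning; names and values are assumptions to adapt.
import chromadb

client = chromadb.PersistentClient(path="chroma")
collection = client.get_or_create_collection(
    name="pharmacy_docs",             # Hypothetical collection name
    metadata={
        "hnsw:space": "cosine",       # Distance metric for semantic search
        "hnsw:construction_ef": 200,  # Higher = better index, slower build
        "hnsw:search_ef": 100,        # Higher = better recall, slower queries
    },
)
```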
Database Not Found

```bash
# Rebuild the vector database
python process_documents.py
```

Ollama Connection Issues

```bash
# Ensure Ollama is running
ollama serve

# Verify model availability
ollama list
```

Poor Query Results

```bash
# Run benchmarks to identify issues
python test_rag_system.py
# Check document quality and relevance
```

Memory Issues

- Reduce chunk size in `process_documents.py`
- Use smaller embedding model
- Limit concurrent queries
- Fork the repository
- Create a feature branch
- Add comprehensive tests
- Ensure all benchmarks pass
- Submit a pull request
MIT License - See LICENSE file for details.
This system is designed for informational purposes and clinical decision support. It should not replace professional medical judgment or be used as the sole basis for patient care decisions. Always consult current medical literature and follow institutional protocols.