This is a production-ready Retrieval-Augmented Generation (RAG) system built with enterprise-grade components:
- Framework: LangChain (RAG orchestration)
- Embeddings: Google Gemini API (semantic understanding)
- Vector DB: Pinecone (scalable vector storage)
- LLM: Google Gemini Pro (generation)
- UI: Streamlit (web interface)
```
┌───────────────────────────────────────────────────────────┐
│                      User Interface                       │
│                   (Streamlit Web App)                     │
│   - File Upload   - Chat Interface   - Display Results    │
└─────────────────────────────┬─────────────────────────────┘
                              │
┌─────────────────────────────┴─────────────────────────────┐
│                  RAG Processing Pipeline                  │
├───────────────────────────────────────────────────────────┤
│                                                           │
│  1. Document Processor                                    │
│     - Text Extraction (.txt, .pdf, .docx)                 │
│     - Text Chunking (with overlap)                        │
│                                                           │
│  2. Embedding Service                                     │
│     - Generate vectors via Google Gemini                  │
│     - Dimension: 768                                      │
│                                                           │
│  3. Vector Storage                                        │
│     - Pinecone index management                           │
│     - Metadata storage                                    │
│                                                           │
│  4. RAG Chain                                             │
│     - Retrieval (semantic search)                         │
│     - Generation (LLM response)                           │
│     - Prompt engineering                                  │
│                                                           │
└───────────────────────────────────────────────────────────┘
```
```
rag-project/
│
├── src/                          # Main source code
│   ├── __init__.py               # Package initialization
│   │
│   ├── config/                   # Configuration module
│   │   ├── __init__.py
│   │   └── config.py             # Centralized config (reads .env)
│   │
│   ├── rag/                      # Core RAG implementation
│   │   ├── __init__.py
│   │   ├── pinecone_manager.py   # Pinecone CRUD operations
│   │   ├── embedding_service.py  # Google Gemini embeddings
│   │   ├── document_processor.py # Document pipeline
│   │   └── rag_chain.py          # LangChain RAG chain
│   │
│   └── utils/                    # Utility modules
│       ├── __init__.py
│       ├── helpers.py            # Logging, formatting
│       ├── chunking.py           # Text splitting logic
│       └── text_processor.py     # File parsing
│
├── app.py                        # Streamlit web interface
├── main.py                       # CLI entry point
├── setup_project.py              # Setup automation
├── requirements.txt              # Python dependencies
│
├── .env.template                 # Config template
├── .env                          # Config (create from template)
│
└── README.md                     # User documentation
```
```
User Uploads File
        ↓
Extract Text (text_processor.py)
        ↓
Split into Chunks (chunking.py)
        ↓
Generate Embeddings (embedding_service.py)
        ↓
Upsert to Pinecone (pinecone_manager.py)
        ↓
Document Ready for Queries
```
```
User Question (Chat Interface)
        ↓
Generate Question Embedding (embedding_service.py)
        ↓
Search Pinecone for Similar Chunks (pinecone_manager.py)
        ↓
Retrieve Top-K Results (default: 5)
        ↓
Format Context from Retrieved Chunks
        ↓
Send to Gemini with Custom Prompt (rag_chain.py)
        ↓
Stream Response to User
```
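The "Format Context from Retrieved Chunks" step above can be sketched as joining each retrieved chunk's text with a source label before it reaches the prompt. This is a minimal illustration; the actual formatting inside `rag_chain.py` may differ.

```python
def format_context(chunks: list[dict]) -> str:
    """Join retrieved chunks into one context string, tagging each
    with its source file so the generated answer can cite it."""
    parts = [
        f"[{chunk['metadata']['source']}]\n{chunk['metadata']['text']}"
        for chunk in chunks
    ]
    return "\n\n---\n\n".join(parts)

# Toy retrieved results in the shape of the vector metadata described below
retrieved = [
    {"metadata": {"source": "a.txt", "text": "Alpha facts."}},
    {"metadata": {"source": "b.txt", "text": "Beta facts."}},
]
print(format_context(retrieved))
```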
Google Gemini:
- Models Used:
  - `models/embedding-001` - Text embeddings
  - `gemini-2.5-flash` - Text generation
- Key Operations:
  - `embed_content()` - Generate embeddings
  - `ChatGoogleGenerativeAI()` - LLM interface

Pinecone:
- Index: `rag-documents-index` (configurable)
- Key Operations:
  - `create_index()` - Initialize vector database
  - `upsert()` - Store vectors with metadata
  - `query()` - Semantic search
```json
{
  "id": "filename_0_a1b2c3d4",
  "values": [0.23, 0.45, ...],   // 768-dimensional embedding
  "metadata": {
    "chunk_index": 0,
    "source": "document.txt",
    "text": "First 500 characters of chunk..."
  }
}
```

- `chunk_index`: Position in source document
- `source`: Original filename
- `text`: Content preview (first 500 chars)
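The ID pattern `filename_0_a1b2c3d4` suggests `<source>_<chunk_index>_<short hash>`. A sketch of how such an ID could be built; the exact hashing scheme here (MD5 of the chunk text, truncated to 8 hex characters) is an assumption, not necessarily what the project uses.

```python
import hashlib

def make_chunk_id(source: str, chunk_index: int, chunk_text: str) -> str:
    """Build a deterministic vector ID of the form <source>_<index>_<hash8>.

    Any stable digest of the chunk text works; determinism matters so that
    re-processing the same file overwrites vectors instead of duplicating them.
    """
    stem = source.rsplit(".", 1)[0]  # "document.txt" -> "document"
    digest = hashlib.md5(chunk_text.encode("utf-8")).hexdigest()[:8]
    return f"{stem}_{chunk_index}_{digest}"

print(make_chunk_id("document.txt", 0, "First chunk of text"))
```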
| Parameter | Default | Purpose |
|---|---|---|
| `CHUNK_SIZE` | 1000 | Characters per chunk |
| `CHUNK_OVERLAP` | 200 | Character overlap between chunks |
| `RETRIEVAL_TOP_K` | 5 | Number of results to retrieve |
| `EMBEDDING_DIMENSION` | 768 | Embedding vector dimension |
| `LANGCHAIN_VERBOSE` | False | Enable verbose logging |
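A minimal sketch of how `config.py` might expose these settings with the documented defaults. Reading straight from the environment is an assumption; the real module loads `.env` first (e.g. via python-dotenv's `load_dotenv()`).

```python
import os

# Defaults match the table above; environment variables override them.
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "1000"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "200"))
RETRIEVAL_TOP_K = int(os.getenv("RETRIEVAL_TOP_K", "5"))
EMBEDDING_DIMENSION = int(os.getenv("EMBEDDING_DIMENSION", "768"))
LANGCHAIN_VERBOSE = os.getenv("LANGCHAIN_VERBOSE", "False").lower() == "true"

def validate() -> None:
    """Fail fast on inconsistent values, mirroring the Config.validate() idea."""
    assert CHUNK_OVERLAP < CHUNK_SIZE, "overlap must be smaller than chunk size"
    assert RETRIEVAL_TOP_K > 0 and EMBEDDING_DIMENSION > 0

validate()
```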
- Hallucination Prevention
  - Custom prompt instructs model to refuse out-of-scope questions
  - Fallback response: "I don't have information in the uploaded documents to answer that."
- Context Verification
  - Only uses retrieved documents as context
  - No external data sources
- Source Attribution
  - Links answers back to source documents
  - Shows document excerpts
- Logging
  - All operations logged for audit trail
  - Configurable log levels
Chunking:
- Recursive character splitting
- Respects semantic boundaries (paragraphs, sentences)
- Configurable size and overlap
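The size/overlap behavior can be sketched with a fixed-window splitter. This is a simplification for illustration: real recursive character splitting (as in LangChain's `RecursiveCharacterTextSplitter`, which `chunking.py` likely wraps) also prefers paragraph and sentence boundaries over hard cuts.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunks of chunk_size characters, each sharing
    chunk_overlap characters with the previous chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    # Stop before producing a final chunk fully contained in the previous one.
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = split_text(text)
print(len(chunks))  # 3 chunks: 1000, 1000, and 900 characters
```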
Embeddings:
- Batch processing for multiple texts
- Caching ready (can be added)
- Async support ready
Vector Search:
- Vector similarity search (cosine distance)
- Top-K filtering
- Metadata filtering support
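Pinecone performs this ranking at scale; the core idea of cosine-similarity top-K can be shown in pure Python. Illustrative only: in the real system this work happens inside `query_vectors()` on the Pinecone side.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Rank stored vectors by cosine similarity to the query, highest first."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 2-d "embeddings" (the real ones are 768-dimensional)
store = {"doc_0": [1.0, 0.0], "doc_1": [0.7, 0.7], "doc_2": [0.0, 1.0]}
print(top_k([1.0, 0.1], store, k=2))  # doc_0 ranks first
```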
- New File Formats

  ```python
  # Add to text_processor.py
  elif file_ext == ".new_format":
      return TextProcessor._extract_from_new_format(file_path)
  ```

- Custom Prompt Templates

  ```python
  # Modify in rag_chain.py _create_qa_chain()
  CUSTOM_PROMPT = PromptTemplate(
      template="Your custom template...",
      input_variables=["context", "question"]
  )
  ```

- New Retrieval Strategies

  ```python
  # Create in rag_chain.py
  def retrieve_with_reranking(self, question: str):
      # Custom retrieval logic
      ...
  ```
Enable verbose logging:

```
# In .env file
LANGCHAIN_VERBOSE=True
LOG_LEVEL=DEBUG
```

Inspect the Pinecone index:

```python
from src.rag import PineconeManager

pm = PineconeManager()
stats = pm.get_index_stats()
print(stats)  # Shows vector counts, dimensions
```

Process a test document from the CLI:

```bash
# Create test file
echo "Test content" > test.txt

# Process it
python main.py process test.txt
```

Run a test query:

```python
from src.rag import RAGChain

chain = RAGChain()
result = chain.query("Test question?")
print(result["answer"])
```

Process a file programmatically:

```python
from src.rag import DocumentProcessor

processor = DocumentProcessor()
chunks = processor.process_file("document.txt", "document.txt")
print(f"Created {chunks} chunks")
```

Query with source attribution:

```python
from src.rag import RAGChain

chain = RAGChain()
result = chain.query("What is the main topic?")
print(result["answer"])
for doc in result["source_documents"]:
    print(f"Source: {doc.metadata['source']}")
```

- Process new documents (appends to index)
- Use the `--namespace` flag for isolation
- Clear index if needed: update `PINECONE_INDEX_NAME` in .env
- Streamlit UI: "Clear All Data" button
- CLI: create a new index with a different name
Good fit for:
- Small to medium document repositories (millions of vectors)
- Real-time query performance needs
- Multi-tenant support (via namespaces)
- Cost-effective vector storage

When scaling further:
- Consider vector database partitioning
- Implement caching layer
- Add async batch processing
- Monitor Pinecone index size
| Issue | Cause | Solution |
|---|---|---|
| No embeddings generated | Invalid API key | Check GOOGLE_API_KEY |
| Connection refused | API timeout | Check internet, retry |
| Hallucinated answers | Prompt design | Adjust prompt template |
| Slow queries | Large TOP_K | Reduce RETRIEVAL_TOP_K |
| Memory issues | Large documents | Reduce CHUNK_SIZE |
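For the transient "Connection refused / API timeout" row, retrying with exponential backoff usually suffices. A generic sketch; the wrapper and the flaky function are hypothetical, not part of the project's API.

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying on ConnectionError/TimeoutError.
    Delay doubles after each failure; the last failure is re-raised."""
    for attempt in range(attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Simulated flaky API call: fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated API timeout")
    return "ok"

result = with_retries(flaky, base_delay=0.01)
print(result)  # ok
```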
- `Config` - class with all settings
  - `validate()` - config checks
- `EmbeddingService` - generate vectors
  - `embed_text()` - single text
  - `embed_texts()` - batch processing
- `PineconeManager` - index operations
  - `create_index()` - setup
  - `upsert_vectors()` - store
  - `query_vectors()` - retrieve
- `DocumentProcessor` - full pipeline
  - `process_file()` - single file
  - `process_multiple_files()` - batch
- `RAGChain` - Q&A
  - `query()` - get answers
  - `is_relevant_to_documents()` - check relevance
- Beginner: Use Streamlit UI only
- Intermediate: Explore CLI commands
- Advanced: Modify code and add features
- Expert: Integrate into production systems
- Google Generative AI: https://ai.google.dev/
- Pinecone Docs: https://docs.pinecone.io/
- LangChain: https://python.langchain.com/
- Streamlit: https://docs.streamlit.io/
Version: 1.0.0
Last Updated: December 2024
Status: Production Ready