Production-grade Retrieval-Augmented Generation (RAG) system powering the OmniBioAI ecosystem documentation, architecture search, workflow discovery, and developer assistant APIs.
- FAISS-native vector search
- Incremental indexing
- Ollama local embeddings + local LLM inference
- FastAPI API server
- Streaming responses (SSE)
- Chunk-level document retrieval
- Repository-wide multi-project indexing
- Fully local execution
- No OpenAI dependency
- Production-safe embedding normalization
- Hybrid-ready architecture
- V6 dimension consistency enforcement
Repositories
↓
Document Loader
↓
Chunker
↓
Ollama Embeddings (768-d)
↓
FAISS Vector Index
↓
RAG Engine
↓
FastAPI API
↓
LLM Answer Generation
Previous versions used brute-force cosine scanning across vectors.
V6 uses:
faiss.IndexFlatIP
Benefits:
- 10–50x faster retrieval
- scalable search
- lower latency
- future ANN support
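Inner-product search ranks results by cosine similarity only when every vector has unit length, which is presumably why embedding normalization appears in the feature list. The following is a minimal NumPy sketch of that relationship, not the project's code: the names `l2_normalize` and `top_k_inner_product` are hypothetical, and the exhaustive scan only mimics conceptually what an exact flat index computes.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length; all-zero rows stay zero."""
    vectors = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)


def top_k_inner_product(index_vectors: np.ndarray, query: np.ndarray, k: int):
    """Return (scores, ids) of the k rows with the largest inner product."""
    scores = index_vectors @ query
    ids = np.argsort(-scores)[:k]
    return scores[ids], ids


# Toy 3-d vectors for readability (the real system uses 768-d embeddings).
docs = l2_normalize(np.array([[1.0, 0.0, 0.0],
                              [0.0, 2.0, 0.0],
                              [3.0, 3.0, 0.0]]))
query = l2_normalize(np.array([[4.0, 2.0, 0.0]]))[0]

# On unit vectors the inner product equals the cosine similarity,
# so the original vector lengths no longer affect the ranking.
scores, ids = top_k_inner_product(docs, query, k=2)
print(ids, scores)
```

With FAISS itself, `faiss.normalize_L2` applied to float32 arrays before adding them to `faiss.IndexFlatIP` (and before searching) gives the same cosine-style ranking.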
A major issue in previous builds was a mismatch between the embedding model used for indexing and the one used for querying.
| Stage | Model | Dimension |
|---|---|---|
| Indexing | all-MiniLM-L6-v2 | 384 |
| Querying | nomic-embed-text | 768 |
This caused FAISS assertion failures:
AssertionError: d == self.d
Now BOTH ingestion and retrieval use:
nomic-embed-text
Dimension:
768
This guarantees:
- stable retrieval
- no dimension mismatch
- deterministic FAISS behavior
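One way to make the dimension guarantee explicit is a small check that runs before any vector reaches the index. This is a sketch under assumptions: `check_dimension` and `EmbeddingDimensionError` are hypothetical names, and the README does not say how V6 enforces consistency internally.

```python
EXPECTED_DIM = 768  # output size of nomic-embed-text, per this README


class EmbeddingDimensionError(ValueError):
    """Raised when a vector does not match the index dimension."""


def check_dimension(vector, expected_dim: int = EXPECTED_DIM) -> None:
    """Fail early with a readable message instead of a bare FAISS assertion."""
    actual = len(vector)
    if actual != expected_dim:
        raise EmbeddingDimensionError(
            f"embedding has {actual} dimensions, index expects {expected_dim}; "
            "check that indexing and querying use the same embedding model"
        )


check_dimension([0.0] * 768)      # passes silently
try:
    check_dimension([0.0] * 384)  # e.g. a vector from all-MiniLM-L6-v2
except EmbeddingDimensionError as exc:
    print(exc)
```

Calling such a check in both the ingestion path and the query path turns a model mix-up into an actionable error message.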
omnibioai-dev-hub/
│
├── api/
│ ├── main.py
│ └── routes/
│
├── rag/
│ ├── engine.py
│ └── control_plane.py
│
├── index/
│ └── vector_store.py
│
├── embeddings/
│ └── embedder.py
│
├── ingestion/
│ └── doc_loader.py
│
├── processing/
│ └── chunker.py
│
├── scripts/
│ └── build_index.py
│
└── data/
Recommended:
Python 3.11
Install:
curl -fsSL https://ollama.com/install.sh | sh
Pull the embedding model:
ollama pull nomic-embed-text
Recommended generation model:
ollama pull mistral
Optional:
ollama pull llama3
ollama pull deepseek-coder
ollama pull deepseek-r1
Create the environment:
conda create -n chemoinfo python=3.11 -y
conda activate chemoinfo
Install dependencies:
pip install fastapi uvicorn requests numpy faiss-cpu sentence-transformers
Clear any old index:
rm -rf data/*
Build the index:
python scripts/build_index.py
Expected output:
🚀 Incremental V6 Indexing Starting...
✅ V6 Index Complete
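The README does not describe how incremental indexing decides which files to re-embed. One common approach, shown here as an assumption rather than as what `scripts/build_index.py` actually does, is a content-hash manifest: only files whose hash changed since the last run are processed again. All function names and the manifest file are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to detect changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_reindex(paths, manifest_path: Path):
    """Return (changed_paths, new_manifest) relative to the saved manifest."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new, changed = {}, []
    for path in paths:
        digest = file_digest(path)
        new[str(path)] = digest
        # New files and files with different content both need re-embedding.
        if old.get(str(path)) != digest:
            changed.append(path)
    return changed, new


def save_manifest(manifest: dict, manifest_path: Path) -> None:
    """Persist the manifest so the next run can compare against it."""
    manifest_path.write_text(json.dumps(manifest, indent=2))
```

A build script following this pattern would embed only the returned `changed_paths` and call `save_manifest` after the index update succeeds, so a failed run is retried in full.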
Start the API server:
uvicorn api.main:app --host 0.0.0.0 --port 8082 --reload
Query the API:
curl -X POST http://localhost:8082/rag/query \
-H "Content-Type: application/json" \
-d '{"query":"What is workflow engine in OmniBioAI?"}'
Example response:
{
  "query": "What is workflow engine in OmniBioAI?",
  "answer": "According to the provided context...",
  "sources": [
    "../omnibioai-workflow-bundles/README.md"
  ],
  "context_used": 5,
  "version": "v6-faiss",
  "api_version": "v6"
}
Endpoint:
POST /rag/stream
Uses:
- Server-Sent Events (SSE)
- token streaming
- real-time generation
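On the client side, an SSE stream is a sequence of `data:` lines separated by blank lines. The sketch below parses that generic wire format; it is an illustration only, since the README does not specify what each event from `/rag/stream` contains. The function name `iter_sse_data` is hypothetical, and fields other than `data` are ignored for brevity.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event.

    A blank line ends an event, several data: lines in one event are
    joined with newlines, and lines starting with ':' are comments.
    An event that is not terminated by a blank line is dropped.
    """
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Exactly one leading space after the colon is stripped.
            buffer.append(value[1:] if value.startswith(" ") else value)


# Hand-written sample stream; real token payloads may look different.
sample = [
    "data: The\n", "\n",
    "data:  workflow\n", "\n",
    ": keep-alive\n", "data:  engine\n", "\n",
]
print("".join(iter_sse_data(sample)))
```

With the `requests` library, the lines of a response opened with `stream=True` can be fed to this function through `iter_lines(decode_unicode=True)`; whether `/rag/stream` accepts the same JSON body as `/rag/query` is an assumption to verify against the API.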
Documents are split into semantic chunks.
Each chunk is embedded using:
nomic-embed-text
Output dimension:
768
Vectors are stored in:
faiss.IndexFlatIP
The user query is embedded using the SAME embedding model.
FAISS retrieves the nearest chunks.
The retrieved chunks become the context.
The prompt is sent to the local Ollama model.
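The two pure-text steps of the flow above, chunking and prompt assembly, can be sketched as follows. This is an illustration under assumptions: the README says chunks are semantic, while `chunk_text` uses plain overlapping character windows as a stand-in, and `build_prompt`, the prompt wording, and the `source`/`text` keys are hypothetical rather than taken from `processing/chunker.py` or `rag/engine.py`.

```python
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size character windows."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks, start = [], 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def build_prompt(query: str, chunks: list[dict]) -> str:
    """Assemble retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] ({chunk['source']})\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


retrieved = [
    {"source": "../omnibioai-workflow-bundles/README.md",
     "text": "Example chunk text."},
]
print(build_prompt("What is workflow engine in OmniBioAI?", retrieved))
```

Numbering the chunks and including their source paths makes it straightforward to return a `sources` list alongside the answer.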
Current indexing targets:
repos = [
"../omnibioai",
"../omnibioai-rag",
"../omnibioai-toolserver",
"../omnibioai-sdk",
"../omnibioai-workflow-bundles",
"../omnibioai-control-center",
"../omnibioai-lims",
"../omnibioai-model-registry",
"../omnibioai-dev-docker"
]
Previous versions:
- brute-force cosine scan
- slow retrieval
- dimension mismatch bugs
- unstable indexing
V6:
- FAISS-native retrieval
- stable dimensions
- fast semantic search
- local-only execution
- scalable architecture
Error:
AssertionError: d == self.d
Cause:
Different embedding models used during indexing vs querying.
Fix:
Rebuild index using the SAME embedding model.
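To catch this mismatch before it reaches FAISS, the model name and dimension used at build time can be recorded next to the index and compared at query time. This is a hypothetical sketch: the sidecar file and the function names are assumptions, not part of the project.

```python
import json
from pathlib import Path


def write_index_meta(meta_path: Path, model: str, dim: int) -> None:
    """Record which embedding model and dimension built the index."""
    meta_path.write_text(json.dumps({"embedding_model": model, "dim": dim}))


def assert_same_embedding(meta_path: Path, model: str, dim: int) -> None:
    """Raise if the query-time embedder differs from the one used to index."""
    meta = json.loads(meta_path.read_text())
    if (meta["embedding_model"], meta["dim"]) != (model, dim):
        raise RuntimeError(
            f"index was built with {meta['embedding_model']} "
            f"({meta['dim']}-d) but the query uses {model} ({dim}-d); "
            "rebuild the index with the same embedding model"
        )
```

Calling `assert_same_embedding` once at API startup would surface the problem immediately rather than on the first query.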
Error:
Read timed out
Fix:
Use a smaller generation model:
model="mistral"
instead of:
deepseek-r1
Check that the vector store accepts 768-d vectors:
python -c "
from index.vector_store import VectorStore
import numpy as np
vs = VectorStore()
vs.add([np.random.rand(768)], [{'text':'test'}])
print(vs.index.ntotal)
"
Expected:
1
Planned improvements:
- IVF indexes
- HNSW search
- metadata filtering
- hybrid BM25 + vector search
- reranking
- cross-encoder scoring
- persistent FAISS storage
- multi-user collections
- distributed indexing
- workflow-aware retrieval
- graph RAG
- plugin-aware retrieval
Internal OmniBioAI Development License.
RAG V6 powers:
- architecture discovery
- workflow documentation search
- plugin documentation retrieval
- developer assistant APIs
- AI infrastructure exploration
- cross-repository semantic search
- internal engineering copilots