A semantic search and discovery tool for OpenAPI specifications powered by RAG (Retrieval Augmented Generation). Service Discovery RAG helps developers quickly find relevant APIs, endpoints, and services by understanding natural language queries.
- 🔍 Semantic Search: Find APIs using natural language queries instead of exact keyword matching
- 📚 OpenAPI Support: Automatically parses and indexes OpenAPI/Swagger specifications
- 🤖 LLM-Powered: Uses local LLMs for query understanding and response generation
- 🎯 Intelligent Reranking: Multi-stage retrieval with reranking for better results
- 💬 Interactive CLI: Beautiful command-line interface with rich formatting
- 🚀 Vector Search: Fast similarity search using Qdrant vector database
- 📊 Confidence Scoring: Provides confidence levels for search results
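Confidence levels can be derived by thresholding a retrieval or reranker relevance score. A minimal sketch of that idea (the thresholds, labels, and function name here are illustrative assumptions, not the project's actual scoring):

```python
def confidence_label(score: float) -> str:
    """Map a normalized relevance score in [0.0, 1.0] to a coarse confidence level.

    Thresholds are illustrative; tune them against your reranker's score distribution.
    """
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```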
Service Discovery RAG uses a multi-stage RAG pipeline:
- Ingestion: Parses OpenAPI specs → Chunks content → Generates embeddings → Stores in Qdrant
- Query Processing: Natural language query → LLM expansion → Intent classification
- Retrieval: Vector similarity search → Top-K results
- Reranking: Cross-encoder reranking for improved relevance
- Generation: LLM generates natural language response with citations
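The stages above compose into a single flow. The sketch below shows that data flow as a plain function composition; every callable is a stand-in for the project's real components, and all names and signatures are illustrative, not the actual API:

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    chunk: str
    score: float

def answer(query, expand, retrieve, rerank, generate, top_k=20, rerank_k=5):
    """Run the RAG pipeline: expansion -> retrieval -> reranking -> generation."""
    expanded = expand(query)                     # LLM query expansion
    candidates = retrieve(expanded, top_k)       # vector similarity search (top-K)
    best = rerank(query, candidates)[:rerank_k]  # cross-encoder reranking, keep the best
    return generate(query, best)                 # natural language response with citations
```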
- Python 3.9+
- Docker and Docker Compose (for Qdrant)
- Local LLM server (llama.cpp)
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd service-discovery-rag
  ```

- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Start Qdrant (vector database):

  ```bash
  docker-compose up -d
  ```

- Configure environment (optional). Create a `.env` file to override default settings:

  ```
  QDRANT_URL=http://localhost:6333
  LLAMA_URL=http://127.0.0.1:8080
  EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
  ```

- Set up the LLM server:

  ```bash
  # Download a GGUF model to data/models/
  # Start the llama.cpp server
  llama-server --model data/models/mistral-7b-instruct.Q4_K_M.gguf
  ```
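Once `llama-server` is running, it can be queried over HTTP via its native `/completion` endpoint. A minimal client sketch (the payload fields follow llama.cpp's server API; the helper names and the `LLAMA_URL` value, taken from the `.env` example, are assumptions):

```python
import json
import urllib.request

LLAMA_URL = "http://127.0.0.1:8080"  # matches the .env example; adjust as needed

def build_payload(prompt: str, n_predict: int = 128) -> dict:
    """Build the JSON body for llama.cpp's /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict, "temperature": 0.2}

def complete(prompt: str, n_predict: int = 128) -> str:
    """POST a prompt to the llama.cpp server and return the generated text."""
    req = urllib.request.Request(
        f"{LLAMA_URL}/completion",
        data=json.dumps(build_payload(prompt, n_predict)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```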
First, add your OpenAPI/Swagger JSON files to `data/specs/`, then run:

```bash
python scripts/ingest_specs.py
```

This will:
- Parse all OpenAPI specifications
- Chunk the content intelligently
- Generate embeddings
- Store everything in Qdrant
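One common chunking strategy for OpenAPI specs is one chunk per operation (path + HTTP method), each carrying the text to embed plus metadata for citations. A simplified sketch of that idea; the project's actual chunker may split differently, and all names here are illustrative:

```python
def chunk_spec(spec: dict) -> list[dict]:
    """Produce one searchable chunk per (path, method) operation in an OpenAPI spec."""
    chunks = []
    service = spec.get("info", {}).get("title", "unknown")
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            # Combine summary and description into the text the embedder will encode.
            text = f"{method.upper()} {path}: {op.get('summary', '')}\n{op.get('description', '')}"
            chunks.append({
                "service": service,          # metadata used later for citations
                "path": path,
                "method": method.upper(),
                "text": text.strip(),
            })
    return chunks
```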
Single query:

```bash
python -m src.cli.app search "How do I create a payment?"
```

Interactive mode:

```bash
python -m src.cli.app interactive
```

Example queries:
- "How do I authenticate a user?"
- "Find endpoints for account management"
- "What APIs handle payment processing?"
- "Show me POST endpoints for creating orders"
Configuration is managed in `config/settings.py`. Key settings:

- Qdrant: Vector database URL and collection names
- Embedding Model: Model for generating embeddings (default: `BAAI/bge-large-en-v1.5`)
- LLM: Local LLM server URL and parameters
- Reranker: Model for reranking results (default: `BAAI/bge-reranker-base`)
- Search Parameters: Top-K values for retrieval and reranking
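A settings object along these lines could read the `.env` overrides with sensible fallbacks. This is a sketch, not the project's actual `config/settings.py`; the field names and the Top-K values are assumptions:

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Defaults mirror the README's .env example; names and Top-K values are illustrative."""
    qdrant_url: str = field(
        default_factory=lambda: os.environ.get("QDRANT_URL", "http://localhost:6333"))
    llama_url: str = field(
        default_factory=lambda: os.environ.get("LLAMA_URL", "http://127.0.0.1:8080"))
    embedding_model: str = field(
        default_factory=lambda: os.environ.get("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5"))
    reranker_model: str = "BAAI/bge-reranker-base"
    retrieval_top_k: int = 20  # assumed value
    rerank_top_k: int = 5      # assumed value
```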
```
service-discovery-rag/
├── config/                # Configuration settings
├── data/
│   ├── models/            # LLM model files (GGUF)
│   ├── qdrant_storage/    # Qdrant database storage
│   └── specs/             # OpenAPI/Swagger JSON files
├── scripts/
│   └── ingest_specs.py    # Ingestion script
├── src/
│   ├── cli/               # Command-line interface
│   ├── ingestion/         # Parsing, chunking, embedding
│   ├── models/            # Data models and LLM manager
│   ├── query/             # Query processing, retrieval, reranking
│   ├── storage/           # Qdrant client
│   └── utils/             # Logging utilities
├── docker-compose.yml     # Qdrant service
└── requirements.txt       # Python dependencies
```
- Parser (`src/ingestion/parser.py`): Parses OpenAPI specs into structured data
- Chunker (`src/ingestion/chunker.py`): Intelligently chunks API documentation
- Embedder (`src/ingestion/embedder.py`): Generates embeddings using sentence transformers
- Retriever (`src/query/retriever.py`): Performs vector similarity search
- Reranker (`src/query/reranker.py`): Reranks results using a cross-encoder
- Generator (`src/query/generator.py`): Generates natural language responses
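The reranking step differs from retrieval in that a cross-encoder scores each (query, document) pair jointly instead of comparing precomputed embeddings. The core logic reduces to a sort; in this sketch, `score` is a stand-in for a cross-encoder such as `BAAI/bge-reranker-base`, and the function name is illustrative:

```python
from typing import Callable

def rerank(query: str, docs: list[str], score: Callable[[str, str], float]) -> list[str]:
    """Order candidate documents by pairwise relevance to the query, highest first."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)
```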
Logs are written to `logs/service-discovery-rag-YYYY-MM-DD.log` with detailed information about:
- Query processing
- Retrieval results
- LLM interactions
- Errors and warnings
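A dated log file like the one above can be produced with the standard library alone. A minimal sketch, assuming a plain `logging` setup (the project's actual configuration in `src/utils/` may differ):

```python
import logging
from datetime import date
from pathlib import Path

def setup_logging(log_dir: str = "logs") -> logging.Logger:
    """Configure file logging with a date-stamped filename, e.g.
    logs/service-discovery-rag-2024-01-31.log."""
    Path(log_dir).mkdir(exist_ok=True)
    path = Path(log_dir) / f"service-discovery-rag-{date.today():%Y-%m-%d}.log"
    logging.basicConfig(
        filename=path,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return logging.getLogger("service-discovery-rag")
```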
- Vector Database: Qdrant
- Embeddings: Sentence Transformers (BGE models)
- LLM: llama.cpp (local inference)
- Reranking: Cross-encoder models
- CLI: Click + Rich
- Parsing: Prance (OpenAPI validation)