A semantic search and discovery tool for OpenAPI specifications powered by RAG (Retrieval Augmented Generation). Service Discovery RAG helps developers quickly find relevant APIs, endpoints, and services by understanding natural language queries.
- 🔍 Semantic Search: Find APIs using natural language queries instead of exact keyword matching
- 📚 OpenAPI Support: Automatically parses and indexes OpenAPI/Swagger specifications
- 🤖 LLM-Powered: Uses local LLMs for query understanding and response generation
- 🎯 Intelligent Reranking: Multi-stage retrieval with reranking for better results
- 💬 Interactive CLI: Beautiful command-line interface with rich formatting
- 🚀 Vector Search: Fast similarity search using Qdrant vector database
- 📊 Confidence Scoring: Provides confidence levels for search results
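Confidence levels can be derived by thresholding a retrieval or reranker relevance score. A minimal sketch of that idea (the thresholds, labels, and function name here are illustrative assumptions, not the project's actual scoring):

```python
def confidence_label(score: float) -> str:
    """Map a normalized relevance score in [0.0, 1.0] to a coarse confidence level.

    Thresholds are illustrative; tune them against your reranker's score distribution.
    """
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```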
Service Discovery RAG uses a multi-stage RAG pipeline:
- Ingestion: Parses OpenAPI specs → Chunks content → Generates embeddings → Stores in Qdrant
- Query Processing: Natural language query → LLM expansion → Intent classification
- Retrieval: Vector similarity search → Top-K results
- Reranking: Cross-encoder reranking for improved relevance
- Generation: LLM generates natural language response with citations
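The stages above compose into a single flow. The sketch below shows that data flow as a plain function composition; every callable is a stand-in for the project's real components, and all names and signatures are illustrative, not the actual API:

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    chunk: str
    score: float

def answer(query, expand, retrieve, rerank, generate, top_k=20, rerank_k=5):
    """Run the RAG pipeline: expansion -> retrieval -> reranking -> generation."""
    expanded = expand(query)                     # LLM query expansion
    candidates = retrieve(expanded, top_k)       # vector similarity search (top-K)
    best = rerank(query, candidates)[:rerank_k]  # cross-encoder reranking, keep the best
    return generate(query, best)                 # natural language response with citations
```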
- Python 3.9+
- Docker and Docker Compose (for Qdrant)
- Local LLM server (llama.cpp)
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd service-discovery-rag
  ```

- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Start Qdrant (vector database):

  ```bash
  docker-compose up -d
  ```

- Configure environment (optional). Create a `.env` file to override default settings:

  ```
  QDRANT_URL=http://localhost:6333
  LLAMA_URL=http://127.0.0.1:8080
  EMBEDDING_MODEL=BAAI/bge-large-en-v1.5
  ```

- Set up the LLM server:

  ```bash
  # Download a GGUF model to data/models/
  # Start the llama.cpp server
  llama-server --model data/models/mistral-7b-instruct.Q4_K_M.gguf
  ```
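Once `llama-server` is running, it can be queried over HTTP via its native `/completion` endpoint. A minimal client sketch (the payload fields follow llama.cpp's server API; the helper names and the `LLAMA_URL` value, taken from the `.env` example, are assumptions):

```python
import json
import urllib.request

LLAMA_URL = "http://127.0.0.1:8080"  # matches the .env example; adjust as needed

def build_payload(prompt: str, n_predict: int = 128) -> dict:
    """Build the JSON body for llama.cpp's /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict, "temperature": 0.2}

def complete(prompt: str, n_predict: int = 128) -> str:
    """POST a prompt to the llama.cpp server and return the generated text."""
    req = urllib.request.Request(
        f"{LLAMA_URL}/completion",
        data=json.dumps(build_payload(prompt, n_predict)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```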
First, add your OpenAPI/Swagger JSON files to `data/specs/`, then run:

```bash
python scripts/ingest_specs.py
```

This will:
- Parse all OpenAPI specifications
- Chunk the content intelligently
- Generate embeddings
- Store everything in Qdrant
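One common chunking strategy for OpenAPI specs is one chunk per operation (path + HTTP method), each carrying the text to embed plus metadata for citations. A simplified sketch of that idea; the project's actual chunker may split differently, and all names here are illustrative:

```python
def chunk_spec(spec: dict) -> list[dict]:
    """Produce one searchable chunk per (path, method) operation in an OpenAPI spec."""
    chunks = []
    service = spec.get("info", {}).get("title", "unknown")
    for path, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            # Combine summary and description into the text the embedder will encode.
            text = f"{method.upper()} {path}: {op.get('summary', '')}\n{op.get('description', '')}"
            chunks.append({
                "service": service,          # metadata used later for citations
                "path": path,
                "method": method.upper(),
                "text": text.strip(),
            })
    return chunks
```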
Single query:

```bash
python -m src.cli.app search "How do I create a payment?"
```

Interactive mode:

```bash
python -m src.cli.app interactive
```

Example queries:
- "How do I authenticate a user?"
- "Find endpoints for account management"
- "What APIs handle payment processing?"
- "Show me POST endpoints for creating orders"
Configuration is managed in `config/settings.py`. Key settings:

- Qdrant: Vector database URL and collection names
- Embedding Model: Model for generating embeddings (default: `BAAI/bge-large-en-v1.5`)
- LLM: Local LLM server URL and parameters
- Reranker: Model for reranking results (default: `BAAI/bge-reranker-base`)
- Search Parameters: Top-K values for retrieval and reranking
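A settings object along these lines could read the `.env` overrides with sensible fallbacks. This is a sketch, not the project's actual `config/settings.py`; the field names and the Top-K values are assumptions:

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Defaults mirror the README's .env example; names and Top-K values are illustrative."""
    qdrant_url: str = field(
        default_factory=lambda: os.environ.get("QDRANT_URL", "http://localhost:6333"))
    llama_url: str = field(
        default_factory=lambda: os.environ.get("LLAMA_URL", "http://127.0.0.1:8080"))
    embedding_model: str = field(
        default_factory=lambda: os.environ.get("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5"))
    reranker_model: str = "BAAI/bge-reranker-base"
    retrieval_top_k: int = 20  # assumed value
    rerank_top_k: int = 5      # assumed value
```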
```
service-discovery-rag/
├── config/                # Configuration settings
├── data/
│   ├── models/            # LLM model files (GGUF)
│   ├── qdrant_storage/    # Qdrant database storage
│   └── specs/             # OpenAPI/Swagger JSON files
├── scripts/
│   └── ingest_specs.py    # Ingestion script
├── src/
│   ├── cli/               # Command-line interface
│   ├── ingestion/         # Parsing, chunking, embedding
│   ├── models/            # Data models and LLM manager
│   ├── query/             # Query processing, retrieval, reranking
│   ├── storage/           # Qdrant client
│   └── utils/             # Logging utilities
├── docker-compose.yml     # Qdrant service
└── requirements.txt       # Python dependencies
```
- Parser (`src/ingestion/parser.py`): Parses OpenAPI specs into structured data
- Chunker (`src/ingestion/chunker.py`): Intelligently chunks API documentation
- Embedder (`src/ingestion/embedder.py`): Generates embeddings using sentence transformers
- Retriever (`src/query/retriever.py`): Performs vector similarity search
- Reranker (`src/query/reranker.py`): Reranks results using a cross-encoder
- Generator (`src/query/generator.py`): Generates natural language responses
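The reranking step differs from retrieval in that a cross-encoder scores each (query, document) pair jointly instead of comparing precomputed embeddings. The core logic reduces to a sort; in this sketch, `score` is a stand-in for a cross-encoder such as `BAAI/bge-reranker-base`, and the function name is illustrative:

```python
from typing import Callable

def rerank(query: str, docs: list[str], score: Callable[[str, str], float]) -> list[str]:
    """Order candidate documents by pairwise relevance to the query, highest first."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)
```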
Logs are written to `logs/service-discovery-rag-YYYY-MM-DD.log` with detailed information about:
- Query processing
- Retrieval results
- LLM interactions
- Errors and warnings
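A dated log file like the one above can be produced with the standard library alone. A minimal sketch, assuming a plain `logging` setup (the project's actual configuration in `src/utils/` may differ):

```python
import logging
from datetime import date
from pathlib import Path

def setup_logging(log_dir: str = "logs") -> logging.Logger:
    """Configure file logging with a date-stamped filename, e.g.
    logs/service-discovery-rag-2024-01-31.log."""
    Path(log_dir).mkdir(exist_ok=True)
    path = Path(log_dir) / f"service-discovery-rag-{date.today():%Y-%m-%d}.log"
    logging.basicConfig(
        filename=path,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return logging.getLogger("service-discovery-rag")
```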
- Vector Database: Qdrant
- Embeddings: Sentence Transformers (BGE models)
- LLM: llama.cpp (local inference)
- Reranking: Cross-encoder models
- CLI: Click + Rich
- Parsing: Prance (OpenAPI validation)