Service Discovery RAG

A semantic search and discovery tool for OpenAPI specifications powered by RAG (Retrieval Augmented Generation). Service Discovery RAG helps developers quickly find relevant APIs, endpoints, and services by understanding natural language queries.

Features

  • 🔍 Semantic Search: Find APIs using natural language queries instead of exact keyword matching
  • 📚 OpenAPI Support: Automatically parses and indexes OpenAPI/Swagger specifications
  • 🤖 LLM-Powered: Uses local LLMs for query understanding and response generation
  • 🎯 Intelligent Reranking: Multi-stage retrieval with reranking for better results
  • 💬 Interactive CLI: Beautiful command-line interface with rich formatting
  • 🚀 Vector Search: Fast similarity search using Qdrant vector database
  • 📊 Confidence Scoring: Provides confidence levels for search results

Architecture

Service Discovery RAG runs each query through a retrieval-augmented generation pipeline (a code sketch follows this list):

  1. Ingestion: Parses OpenAPI specs → Chunks content → Generates embeddings → Stores in Qdrant
  2. Query Processing: Natural language query → LLM expansion → Intent classification
  3. Retrieval: Vector similarity search → Top-K results
  4. Reranking: Cross-encoder reranking for improved relevance
  5. Generation: LLM generates natural language response with citations
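
The query path (stages 2 through 5) can be sketched with the libraries listed under Technologies below. This is an illustration only, not the repository's code: the collection name (api_chunks), the payload field (text), the prompt format, and the top-K values are assumptions, and the query-expansion and intent-classification steps are elided.

# Illustrative query path, assuming Qdrant and llama.cpp are running locally.
# Collection name, payload fields, and prompt format are invented for the sketch.
import requests
from qdrant_client import QdrantClient
from sentence_transformers import CrossEncoder, SentenceTransformer

query = "How do I create a payment?"

# Stage 3 -- retrieval: embed the query, run vector similarity search
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
client = QdrantClient(url="http://localhost:6333")
hits = client.search(
    collection_name="api_chunks",                     # assumed collection name
    query_vector=embedder.encode(query).tolist(),
    limit=20,                                         # retrieval top-K
)

# Stage 4 -- reranking: score (query, chunk) pairs with a cross-encoder
reranker = CrossEncoder("BAAI/bge-reranker-base")
scores = reranker.predict([(query, h.payload["text"]) for h in hits])
top = [h for _, h in sorted(zip(scores, hits), key=lambda p: -p[0])][:5]

# Stage 5 -- generation: ask the local llama.cpp server for an answer
context = "\n\n".join(h.payload["text"] for h in top)
resp = requests.post(
    "http://127.0.0.1:8080/completion",               # llama.cpp HTTP API
    json={"prompt": f"Context:\n{context}\n\nQuestion: {query}\nAnswer:",
          "n_predict": 256},
)
print(resp.json()["content"])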

Installation

Prerequisites

  • Python 3.9+
  • Docker and Docker Compose (for Qdrant)
  • Local LLM server (llama.cpp)

Setup

  1. Clone the repository

    git clone <repository-url>
    cd service-discovery-rag

  2. Create a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies

    pip install -r requirements.txt

  4. Start Qdrant (vector database)

    docker-compose up -d

  5. Configure environment (optional). Create a .env file to override default settings:

    QDRANT_URL=http://localhost:6333
    LLAMA_URL=http://127.0.0.1:8080
    EMBEDDING_MODEL=BAAI/bge-large-en-v1.5

  6. Set up the LLM server

    # Download a GGUF model to data/models/
    # Start the llama.cpp server
    llama-server --model data/models/mistral-7b-instruct.Q4_K_M.gguf
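
With the server started, a quick reachability check avoids confusing failures later; llama.cpp's HTTP server exposes a /health endpoint. A minimal check, assuming the default port 8080 used above:

# Minimal check that the llama.cpp server is up and the model is loaded
import requests

r = requests.get("http://127.0.0.1:8080/health", timeout=5)
print(r.status_code, r.text)  # expect HTTP 200 once the model has loaded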

Usage

Ingest API Specifications

First, add your OpenAPI/Swagger JSON files to data/specs/, then run:

python scripts/ingest_specs.py

This will (see the code sketch after this list):

  • Parse all OpenAPI specifications
  • Chunk the content intelligently
  • Generate embeddings
  • Store everything in Qdrant
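
For orientation, the core of such an ingestion loop might look roughly like the sketch below, assembled from the libraries listed under Technologies. The collection name, the one-chunk-per-operation strategy, and the payload fields are assumptions, not the script's actual behavior:

# Rough sketch of ingestion: parse -> chunk -> embed -> upsert.
# Collection name ("api_chunks") and the chunk/payload shape are invented.
import glob
import uuid
from prance import ResolvingParser
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")   # 1024-dim vectors
client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name="api_chunks",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

for path in glob.glob("data/specs/*.json"):
    spec = ResolvingParser(path).specification             # parse and validate
    # Naive chunking: one chunk per (path, method) operation
    chunks = []
    for route, methods in spec.get("paths", {}).items():
        for method, op in methods.items():
            if isinstance(op, dict):
                chunks.append(f"{method.upper()} {route}: "
                              f"{op.get('summary', '')} {op.get('description', '')}")
    if not chunks:
        continue
    vectors = embedder.encode(chunks)
    client.upsert(
        collection_name="api_chunks",
        points=[PointStruct(id=str(uuid.uuid4()), vector=v.tolist(),
                            payload={"text": c, "source": path})
                for v, c in zip(vectors, chunks)],
    )

One chunk per operation keeps every vector tied to a single endpoint, which makes citations precise; the repository's chunker is described as more intelligent than this.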

Search APIs

Single query:

python -m src.cli.app search "How do I create a payment?"

Interactive mode:

python -m src.cli.app interactive

Example queries:

  • "How do I authenticate a user?"
  • "Find endpoints for account management"
  • "What APIs handle payment processing?"
  • "Show me POST endpoints for creating orders"

Configuration

Configuration is managed in config/settings.py. Key settings (a shape sketch follows the list):

  • Qdrant: Vector database URL and collection names
  • Embedding Model: Model for generating embeddings (default: BAAI/bge-large-en-v1.5)
  • LLM: Local LLM server URL and parameters
  • Reranker: Model for reranking results (default: BAAI/bge-reranker-base)
  • Search Parameters: Top-K values for retrieval and reranking
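
The actual contents of config/settings.py are specific to this repository; purely as a sketch of the shape such a module could take, with every name and default below assumed:

# Hypothetical shape of config/settings.py; all names and defaults are assumed.
# (.env loading, e.g. via python-dotenv, is omitted here.)
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    qdrant_url: str = os.getenv("QDRANT_URL", "http://localhost:6333")
    collection: str = "api_chunks"                          # assumed name
    embedding_model: str = os.getenv("EMBEDDING_MODEL", "BAAI/bge-large-en-v1.5")
    llama_url: str = os.getenv("LLAMA_URL", "http://127.0.0.1:8080")
    reranker_model: str = "BAAI/bge-reranker-base"
    retrieval_top_k: int = 20                               # candidates from Qdrant
    rerank_top_k: int = 5                                   # kept after reranking

settings = Settings()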

Project Structure

service-discovery-rag/
├── config/              # Configuration settings
├── data/
│   ├── models/         # LLM model files (GGUF)
│   ├── qdrant_storage/ # Qdrant database storage
│   └── specs/          # OpenAPI/Swagger JSON files
├── scripts/
│   └── ingest_specs.py # Ingestion script
├── src/
│   ├── cli/            # Command-line interface
│   ├── ingestion/      # Parsing, chunking, embedding
│   ├── models/         # Data models and LLM manager
│   ├── query/          # Query processing, retrieval, reranking
│   ├── storage/        # Qdrant client
│   └── utils/          # Logging utilities
├── docker-compose.yml  # Qdrant service
└── requirements.txt    # Python dependencies

Development

Key Components

  • Parser (src/ingestion/parser.py): Parses OpenAPI specs into structured data
  • Chunker (src/ingestion/chunker.py): Intelligently chunks API documentation
  • Embedder (src/ingestion/embedder.py): Generates embeddings using sentence transformers
  • Retriever (src/query/retriever.py): Performs vector similarity search
  • Reranker (src/query/reranker.py): Reranks results using a cross-encoder (see the example below)
  • Generator (src/query/generator.py): Generates natural language responses
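
Of these, the reranker is the least self-evident: a cross-encoder scores each (query, passage) pair jointly rather than comparing precomputed vectors, which is slower than vector search but more accurate. A standalone example with invented passages:

# Cross-encoder reranking in isolation; the passages are invented examples.
from sentence_transformers import CrossEncoder

query = "How do I create a payment?"
passages = [
    "POST /payments: Create a new payment for an order.",
    "GET /accounts/{id}: Retrieve account details.",
    "POST /orders: Create a new customer order.",
]

model = CrossEncoder("BAAI/bge-reranker-base")
scores = model.predict([(query, p) for p in passages])
for score, passage in sorted(zip(scores, passages), reverse=True):
    print(f"{score:.3f}  {passage}")

The payments passage should score highest here; in the real pipeline this reordering is applied to the top-K candidates returned by Qdrant.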

Logging

Logs are written to logs/service-discovery-rag-YYYY-MM-DD.log with detailed information about:

  • Query processing
  • Retrieval results
  • LLM interactions
  • Errors and warnings

Technologies

  • Vector Database: Qdrant
  • Embeddings: Sentence Transformers (BGE models)
  • LLM: llama.cpp (local inference)
  • Reranking: Cross-encoder models
  • CLI: Click + Rich
  • Parsing: Prance (OpenAPI validation)
