NVISIA/sdd-globobank-rag-governance-solution

GloboBank RAG Governance Solution

A hierarchical, multi-tier Retrieval-Augmented Generation (RAG) system designed for enterprise governance knowledge management and AI-powered Spec-Driven Development (SDD).

🎯 Overview

This solution gives development teams and AI coding agents governance guidance through a RAG system that ranks results by organizational tier (Organization, Center of Excellence, Domain). It is built for GitHub Copilot integration and Spec-Driven Development workflows.

Key Features

  • 🏗️ Three-Tier Knowledge Architecture: Organization → Center of Excellence → Domain levels
  • 🤖 MCP Server Integration: Direct integration with AI coding agents and GitHub Copilot
  • 🔍 Multi-Pass Retrieval: Intelligent query analysis and hierarchical search
  • 🐳 Docker-Based Deployment: Complete containerized solution with Qdrant vector database
  • 📊 Semantic Document Processing: Advanced chunking and embedding strategies
  • Real-time Query Processing: Fast, contextual governance guidance

🚀 Quick Start

Prerequisites

  • Docker Desktop or Docker Engine
  • Git
  • VS Code with GitHub Copilot (recommended)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd sdd-globobank-rag-governance-solution
  2. Start the system:

    # Windows
    .\scripts\start.bat
    
    # Linux/Mac
    ./scripts/start.sh
  3. Verify installation:

First Query

Test the system with a governance query:

curl -X POST "http://localhost:8000/mcp/call_tool" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "governance_guidance",
    "arguments": {
      "query": "API security requirements for customer data",
      "context": "Building a new customer management API"
    }
  }'
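The same request can be issued from Python. This is a minimal sketch using only the standard library; the endpoint, tool name, and argument names come from the curl example above, while the helper names `build_request` and `send` are hypothetical. The response format is not documented here, so `send` returns the raw body as text.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
MCP_URL = "http://localhost:8000/mcp/call_tool"


def build_request(query: str, context: str, url: str = MCP_URL) -> urllib.request.Request:
    """Build the POST request for the governance_guidance tool."""
    payload = {
        "name": "governance_guidance",
        "arguments": {"query": query, "context": context},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body (requires a running stack)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the services running, `send(build_request("API security requirements for customer data", "Building a new customer management API"))` mirrors the curl call.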

📁 Project Structure

sdd-globobank-rag-governance-solution/
├── README.md                          # This file
├── DEVELOPER_GUIDE.md                 # Comprehensive development guide
├── docker-compose.yml                 # Docker orchestration
├── Dockerfile.rag-server             # RAG server container definition
├── requirements.txt                   # Python dependencies
├── 
├── config/
│   └── settings.py                    # Configuration management
├── 
├── src/
│   ├── models/
│   │   └── schema.py                  # Data models and schemas
│   ├── ingestion/
│   │   └── document_processor.py      # Document processing pipeline
│   ├── rag_engine/
│   │   └── retrieval.py              # Multi-pass retrieval engine
│   └── rag_server/
│       └── mcp_server.py             # MCP server implementation
├── 
├── scripts/
│   ├── start.sh                       # Linux/Mac startup script
│   ├── start.bat                      # Windows startup script
│   ├── stop.sh                        # Linux/Mac stop script
│   └── stop.bat                       # Windows stop script
├── 
├── knowledge-bases/                   # Knowledge base documents
│   ├── organization/                  # Enterprise-wide policies (Priority 1.0)
│   ├── coe/                          # Center of Excellence (Priority 0.8)
│   └── domain/                       # Domain-specific knowledge (Priority 0.6)
├── 
├── data/                              # Persistent data (created on startup)
│   └── qdrant_storage/               # Vector database storage
├──
└── logs/                              # Application logs (created on startup)

🏛️ Architecture

System Components

graph TB
    A[GitHub Copilot] --> B[MCP Server :8000]
    B --> C[RAG Engine]
    C --> D[Qdrant Vector DB :6333]
    C --> E[OpenAI Query Analyzer]
    F[Knowledge Base] --> G[Document Processor]
    G --> D
    
    subgraph "Knowledge Hierarchy"
        H[Organization Level<br/>Priority: 1.0]
        I[COE Level<br/>Priority: 0.8]
        J[Domain Level<br/>Priority: 0.6]
    end
    
    D --> H
    D --> I
    D --> J

Knowledge Tiers

| Tier | Priority | Description | Examples |
|------|----------|-------------|----------|
| Organization | 1.0 | Enterprise-wide policies and standards | Security policies, compliance frameworks |
| COE | 0.8 | Center of Excellence best practices | Architecture patterns, development standards |
| Domain | 0.6 | Domain-specific procedures | API specifications, testing guidelines |
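One way tier priorities could influence ranking is to scale each hit's vector similarity by its tier priority. The sketch below assumes that multiplicative scheme; this README does not state how the retrieval engine actually combines the two, and `Hit`, `weighted_score`, and `rank` are hypothetical names.

```python
from dataclasses import dataclass

# Tier priorities from the Knowledge Tiers table.
TIER_PRIORITY = {"organization": 1.0, "coe": 0.8, "domain": 0.6}


@dataclass
class Hit:
    doc_id: str
    tier: str
    similarity: float  # raw vector similarity, assumed to lie in [0, 1]


def weighted_score(hit: Hit) -> float:
    # Assumption: the tier priority scales the similarity multiplicatively.
    return hit.similarity * TIER_PRIORITY[hit.tier]


def rank(hits: list[Hit]) -> list[Hit]:
    """Order hits by weighted score, best first."""
    return sorted(hits, key=weighted_score, reverse=True)
```

Under this scheme a domain document needs a noticeably higher similarity than an organization-level policy to outrank it, which keeps enterprise-wide rules near the top of the results.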

🔧 Configuration

Environment Variables

Create a .env file to customize the system:

# Vector Database
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION_NAME=governance_knowledge

# Embedding Configuration
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
EMBEDDING_DIMENSION=384

# Query Analysis
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-3.5-turbo
OPENAI_TEMPERATURE=0.1

# Retrieval Settings
MAX_RETRIEVAL_RESULTS=10
RELEVANCE_THRESHOLD=0.7
ENABLE_MULTI_PASS=true

# Server Configuration
RAG_SERVER_HOST=0.0.0.0
RAG_SERVER_PORT=8000
LOG_LEVEL=INFO

# Tier Weights (for retrieval prioritization)
ORGANIZATION_WEIGHT=1.0
COE_WEIGHT=0.8
DOMAIN_WEIGHT=0.6
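As an illustration of how these variables might be consumed, the sketch below reads a few of them with the defaults shown above. It is not the contents of `config/settings.py`; the `Settings` class and `load_settings` function are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    qdrant_host: str
    qdrant_port: int
    relevance_threshold: float
    max_retrieval_results: int
    enable_multi_pass: bool


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the README defaults."""
    env = os.environ if env is None else env
    multi_pass = env.get("ENABLE_MULTI_PASS", "true").strip().lower()
    return Settings(
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        relevance_threshold=float(env.get("RELEVANCE_THRESHOLD", "0.7")),
        max_retrieval_results=int(env.get("MAX_RETRIEVAL_RESULTS", "10")),
        # Treat common truthy spellings as enabled; anything else disables multi-pass.
        enable_multi_pass=multi_pass in {"1", "true", "yes"},
    )
```

Passing a plain dictionary instead of relying on `os.environ` makes the loader easy to exercise in tests.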

Knowledge Base Structure

Organize your governance documents following this structure:

knowledge-bases/
├── organization/
│   ├── governance/
│   │   ├── enterprise-security-policy.md
│   │   └── compliance-framework.md
│   ├── security/
│   └── compliance/
├── 
├── coe/
│   ├── architecture/
│   │   ├── api-design-standards.md
│   │   └── microservices-patterns.md
│   ├── development/
│   └── testing/
├──
└── domain/
    ├── payments/
    ├── lending/
    └── customer-mgmt/
        ├── api-specifications.md
        └── data-models.md
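Because the tier and category are encoded in the directory layout, they can be derived from a document's path. The following is a sketch of that idea, assuming POSIX-style relative paths; `classify` is a hypothetical helper, not a function from this repository.

```python
from pathlib import PurePosixPath

TIERS = ("organization", "coe", "domain")


def classify(path: str, root: str = "knowledge-bases") -> tuple[str, str]:
    """Return (tier, category) for a document path under the knowledge base root."""
    # relative_to raises ValueError when the path is not under root.
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) < 3 or parts[0] not in TIERS:
        raise ValueError(f"unexpected knowledge base path: {path}")
    return parts[0], parts[1]
```

For example, `classify("knowledge-bases/coe/architecture/api-design-standards.md")` yields the tier `coe` and the category `architecture`.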

🤖 GitHub Copilot Integration

VS Code Configuration

Add to your .vscode/settings.json:

{
  "github.copilot.enable": {
    "*": true,
    "yaml": true,
    "plaintext": true,
    "markdown": true
  },
  "mcp.servers": {
    "globobank-rag": {
      "url": "http://localhost:8000",
      "enabled": true,
      "timeout": 30000
    }
  }
}

Usage Examples

# Ask Copilot to use governance knowledge
"""
@copilot: Get GloboBank API security requirements

Context: Building a customer data access API
Query: "API authentication and authorization standards"
"""

# Validate code against policies
"""
@copilot: Review this API design against GloboBank governance

Context: REST API for payment processing
Query: "payment API security and compliance requirements"
"""

📋 Spec-Driven Development

SDD Workflow Integration

  1. Specification Creation: Use RAG for governance alignment
  2. Validation: Automated compliance checking
  3. Implementation: Governance-informed code generation
  4. Review: Policy compliance verification

Example SDD Process

# spec.yaml
spec:
  title: "Customer Account API"
  version: "1.0.0"
  governance_validation:
    query: "customer account API requirements"
    compliance_areas: ["data_privacy", "security", "audit"]

implementation:
  framework: "FastAPI"
  authentication: "OAuth2"  # From governance guidance
  logging: "structured"     # From governance guidance
  monitoring: "prometheus"  # From governance guidance
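To connect a spec to the RAG server, the `governance_validation` section could be turned into a `governance_guidance` tool call. The sketch below assumes the spec has already been parsed into a dictionary (for instance with a YAML library); `governance_call_from_spec` and the way the context string is assembled are illustrative assumptions.

```python
def governance_call_from_spec(spec: dict) -> dict:
    """Build a governance_guidance tool-call payload from a parsed spec."""
    section = spec["spec"]
    validation = section["governance_validation"]
    areas = ", ".join(validation.get("compliance_areas", []))
    # Assumption: title, version, and compliance areas are useful query context.
    context = f'{section["title"]} v{section["version"]}'
    if areas:
        context += f" (compliance areas: {areas})"
    return {
        "name": "governance_guidance",
        "arguments": {"query": validation["query"], "context": context},
    }
```

The resulting dictionary has the same shape as the JSON body posted to `/mcp/call_tool` in the First Query example.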

🛠️ Development

Adding Knowledge Documents

  1. Create markdown files in the appropriate tier folder

  2. Follow the document template:

    # Document Title
    
    ## Metadata
    - **Tier**: organization|coe|domain
    - **Category**: governance|security|architecture
    - **Tags**: api, security, compliance
    - **Last Updated**: 2024-01-15
    
    ## Content
    Your governance content here...
  3. Restart the ingestion process:

    docker-compose restart rag-server
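The metadata block in the template is regular enough to parse with a small regular expression. This sketch shows one way to extract it; it is not the repository's `document_processor.py`, and `parse_metadata` is a hypothetical helper.

```python
import re

# Matches template lines such as "- **Tier**: organization".
_META_LINE = re.compile(r"^\s*-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.+?)\s*$")


def parse_metadata(markdown: str) -> dict[str, str]:
    """Collect key/value pairs listed under the '## Metadata' heading."""
    meta: dict[str, str] = {}
    in_meta = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Only lines between "## Metadata" and the next "## " heading count.
            in_meta = stripped[3:].strip().lower() == "metadata"
            continue
        if in_meta:
            match = _META_LINE.match(line)
            if match:
                meta[match.group("key").strip().lower()] = match.group("value")
    return meta
```

Keys are lower-cased, so the template's `Last Updated` field is returned under `last updated`.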

Testing Queries

Use the interactive API documentation:

  1. Open http://localhost:8000/docs
  2. Navigate to the /mcp/call_tool endpoint
  3. Test with sample queries

Local Development

# Set up development environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Start Qdrant only
docker-compose up qdrant -d

# Run RAG server locally
python -m src.rag_server.mcp_server

📊 Monitoring and Maintenance

Health Checks

# System health
curl http://localhost:8000/health

# Qdrant status
curl http://localhost:6333/collections/governance_knowledge

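# Document count (Qdrant's count endpoint expects a POST request)
curl -X POST http://localhost:6333/collections/governance_knowledge/points/count \
  -H "Content-Type: application/json" \
  -d '{"exact": true}'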
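For scripted monitoring, the GET-based checks can be wrapped in a small Python helper. This is a sketch using the standard library and the URLs shown above; `is_up` and `run_checks` are hypothetical names, and a check counts as passing only on an HTTP 2xx response.

```python
import urllib.error
import urllib.request

# URLs taken from the health check commands above.
CHECKS = {
    "rag-server": "http://localhost:8000/health",
    "qdrant-collection": "http://localhost:6333/collections/governance_knowledge",
}


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on url answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def run_checks(checks=CHECKS, probe=is_up) -> dict:
    """Probe every named URL and report which ones responded successfully."""
    return {name: probe(url) for name, url in checks.items()}
```

Calling `run_checks()` with the stack running returns a dictionary mapping each service name to `True` or `False`.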

Log Analysis

# Application logs
docker-compose logs -f rag-server

# Query performance
docker-compose logs rag-server | grep "query_time"

# Error monitoring
docker-compose logs rag-server | grep -i "error\|exception"

Performance Tuning

Adjust these settings in .env for optimal performance:

# Raise for stricter, more relevant matches; lower to return more results
RELEVANCE_THRESHOLD=0.75

# Optimize for your use case
MAX_RETRIEVAL_RESULTS=15
EMBEDDING_BATCH_SIZE=50

# Memory optimization
QDRANT_MEMORY_LIMIT=2g
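The interaction between `RELEVANCE_THRESHOLD` and `MAX_RETRIEVAL_RESULTS` can be pictured as a filter followed by a cap. The sketch below assumes that behaviour; the actual retrieval engine may apply these settings differently, and `select_hits` is a hypothetical helper.

```python
def select_hits(scored, threshold=0.75, max_results=15):
    """Keep (doc_id, score) pairs at or above threshold, best first, capped at max_results."""
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_results]
```

A higher threshold shrinks the candidate set before the cap applies, so very strict values can return fewer than `MAX_RETRIEVAL_RESULTS` documents, or none at all.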

🚨 Troubleshooting

Common Issues

| Issue | Symptoms | Solution |
|-------|----------|----------|
| Connection Error | Cannot reach services | docker-compose restart |
| Poor Results | Irrelevant responses | Adjust RELEVANCE_THRESHOLD |
| Slow Queries | High response times | Check docker stats, increase memory |
| No Documents | Empty results | Verify knowledge base structure |

Debug Mode

Enable detailed logging:

echo "LOG_LEVEL=DEBUG" >> .env
docker-compose restart rag-server
docker-compose logs -f rag-server

📚 Documentation

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -am 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Development Guidelines

  • Follow existing code style and patterns
  • Add tests for new functionality
  • Update documentation for changes
  • Ensure governance compliance checks pass

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋‍♂️ Support

For enterprise support, contact the GloboBank Architecture Team.


🎉 Acknowledgments

  • Built for GitHub Copilot and Spec-Driven Development
  • Powered by Qdrant Vector Database
  • Uses Sentence Transformers for embeddings
  • Inspired by enterprise governance best practices
