GloboBank RAG Governance Solution

A hierarchical, multi-tier Retrieval-Augmented Generation (RAG) system designed for enterprise governance knowledge management and AI-powered Spec-Driven Development (SDD).

🎯 Overview

This solution gives development teams and AI coding agents governance guidance through a RAG system that accounts for organizational hierarchy and context. It is built specifically for GitHub Copilot integration and Spec-Driven Development workflows.

Key Features

🏗️ Three-Tier Knowledge Architecture: Organization → Center of Excellence → Domain levels
🤖 MCP Server Integration: Direct integration with AI coding agents and GitHub Copilot
🔍 Multi-Pass Retrieval: Intelligent query analysis and hierarchical search
🐳 Docker-Based Deployment: Complete containerized solution with Qdrant vector database
📊 Semantic Document Processing: Advanced chunking and embedding strategies
⚡ Real-time Query Processing: Fast, contextual governance guidance

🚀 Quick Start

Prerequisites

Docker Desktop or Docker Engine
Git
VS Code with GitHub Copilot (recommended)

Installation

Clone the repository:

git clone <repository-url>
cd sdd-globobank-rag-governance-solution

Start the system:

# Windows
.\scripts\start.bat

# Linux/Mac
./scripts/start.sh

Verify installation:
- Qdrant Dashboard: http://localhost:6333/dashboard
- RAG Server Health: http://localhost:8000/health
- API Documentation: http://localhost:8000/docs

First Query

Test the system with a governance query:

curl -X POST "http://localhost:8000/mcp/call_tool" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "governance_guidance",
    "arguments": {
      "query": "API security requirements for customer data",
      "context": "Building a new customer management API"
    }
  }'
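
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the endpoint returns a JSON body, which this README does not document, and the function names `build_tool_call` and `call_governance_tool` are illustrative rather than part of the project.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
MCP_URL = "http://localhost:8000/mcp/call_tool"


def build_tool_call(query: str, context: str) -> dict:
    """Build the same JSON body that the curl example sends."""
    return {
        "name": "governance_guidance",
        "arguments": {"query": query, "context": context},
    }


def call_governance_tool(query: str, context: str,
                         url: str = MCP_URL, timeout: float = 30.0):
    """POST the tool call and return the decoded response.

    Assumption: the server answers with JSON. The response schema is not
    described in this README, so the result is returned unmodified.
    """
    body = json.dumps(build_tool_call(query, context)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the stack running, `call_governance_tool("API security requirements for customer data", "Building a new customer management API")` should mirror the curl call.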

📁 Project Structure

sdd-globobank-rag-governance-solution/
├── README.md                          # This file
├── DEVELOPER_GUIDE.md                 # Comprehensive development guide
├── docker-compose.yml                 # Docker orchestration
├── Dockerfile.rag-server              # RAG server container definition
├── requirements.txt                   # Python dependencies
│
├── config/
│   └── settings.py                    # Configuration management
│
├── src/
│   ├── models/
│   │   └── schema.py                  # Data models and schemas
│   ├── ingestion/
│   │   └── document_processor.py      # Document processing pipeline
│   ├── rag_engine/
│   │   └── retrieval.py               # Multi-pass retrieval engine
│   └── rag_server/
│       └── mcp_server.py              # MCP server implementation
│
├── scripts/
│   ├── start.sh                       # Linux/Mac startup script
│   ├── start.bat                      # Windows startup script
│   ├── stop.sh                        # Linux/Mac stop script
│   └── stop.bat                       # Windows stop script
│
├── knowledge-bases/                   # Knowledge base documents
│   ├── organization/                  # Enterprise-wide policies (Priority 1.0)
│   ├── coe/                           # Center of Excellence (Priority 0.8)
│   └── domain/                        # Domain-specific knowledge (Priority 0.6)
│
├── data/                              # Persistent data (created on startup)
│   └── qdrant_storage/                # Vector database storage
│
└── logs/                              # Application logs (created on startup)

🏛️ Architecture

System Components

graph TB
    A[GitHub Copilot] --> B[MCP Server :8000]
    B --> C[RAG Engine]
    C --> D[Qdrant Vector DB :6333]
    C --> E[OpenAI Query Analyzer]
    F[Knowledge Base] --> G[Document Processor]
    G --> D
    
    subgraph "Knowledge Hierarchy"
        H[Organization Level<br/>Priority: 1.0]
        I[COE Level<br/>Priority: 0.8]
        J[Domain Level<br/>Priority: 0.6]
    end
    
    D --> H
    D --> I
    D --> J

Knowledge Tiers

| Tier | Priority | Description | Examples |
| --- | --- | --- | --- |
| Organization | 1.0 | Enterprise-wide policies and standards | Security policies, compliance frameworks |
| COE | 0.8 | Center of Excellence best practices | Architecture patterns, development standards |
| Domain | 0.6 | Domain-specific procedures | API specifications, testing guidelines |
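
To make the tier priorities concrete, here is one way they could influence ranking. This is a sketch under stated assumptions, not the logic in `src/rag_engine/retrieval.py`: it assumes similarity scores lie in [0, 1], that the relevance threshold applies to the raw similarity, and that the final score is the similarity multiplied by the tier priority. The `Hit` type and `rank_hits` function are hypothetical names.

```python
from dataclasses import dataclass

# Tier priorities from the Knowledge Tiers table.
TIER_WEIGHTS = {"organization": 1.0, "coe": 0.8, "domain": 0.6}


@dataclass
class Hit:
    doc_id: str
    tier: str
    similarity: float  # raw vector similarity, assumed to be in [0, 1]


def rank_hits(hits, threshold=0.7, limit=10):
    """Filter by raw similarity, then order by similarity * tier weight.

    Assumption: multiplying by the tier weight is one plausible way to
    prioritize higher tiers; the project may combine scores differently.
    """
    kept = [h for h in hits if h.similarity >= threshold]
    kept.sort(key=lambda h: h.similarity * TIER_WEIGHTS[h.tier], reverse=True)
    return kept[:limit]
```

Under this scheme an organization-level hit with similarity 0.75 outranks a COE hit with similarity 0.90 (0.75 versus 0.72), which is the kind of top-down precedence the hierarchy suggests.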

🔧 Configuration

Environment Variables

Create a .env file to customize the system:

# Vector Database
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION_NAME=governance_knowledge

# Embedding Configuration
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
EMBEDDING_DIMENSION=384

# Query Analysis
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-3.5-turbo
OPENAI_TEMPERATURE=0.1

# Retrieval Settings
MAX_RETRIEVAL_RESULTS=10
RELEVANCE_THRESHOLD=0.7
ENABLE_MULTI_PASS=true

# Server Configuration
RAG_SERVER_HOST=0.0.0.0
RAG_SERVER_PORT=8000
LOG_LEVEL=INFO

# Tier Weights (for retrieval prioritization)
ORGANIZATION_WEIGHT=1.0
COE_WEIGHT=0.8
DOMAIN_WEIGHT=0.6
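
For orientation, the variables above could be read into a typed settings object along the following lines. This is an illustrative sketch, not the contents of `config/settings.py`: the `Settings` class and `load_settings` function are assumed names, only a subset of variables is covered, and the code reads the process environment rather than parsing the `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    qdrant_host: str
    qdrant_port: int
    collection_name: str
    max_retrieval_results: int
    relevance_threshold: float
    enable_multi_pass: bool


def _as_bool(raw: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping, falling back to the defaults listed above."""
    env = os.environ if env is None else env
    return Settings(
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        collection_name=env.get("QDRANT_COLLECTION_NAME", "governance_knowledge"),
        max_retrieval_results=int(env.get("MAX_RETRIEVAL_RESULTS", "10")),
        relevance_threshold=float(env.get("RELEVANCE_THRESHOLD", "0.7")),
        enable_multi_pass=_as_bool(env.get("ENABLE_MULTI_PASS", "true")),
    )
```

Passing an explicit mapping, for example `load_settings({"QDRANT_PORT": "7000"})`, keeps the loader easy to test without touching the real environment.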

Knowledge Base Structure

Organize your governance documents following this structure:

knowledge-bases/
├── organization/
│   ├── governance/
│   │   ├── enterprise-security-policy.md
│   │   └── compliance-framework.md
│   ├── security/
│   └── compliance/
│
├── coe/
│   ├── architecture/
│   │   ├── api-design-standards.md
│   │   └── microservices-patterns.md
│   ├── development/
│   └── testing/
│
└── domain/
    ├── payments/
    ├── lending/
    └── customer-mgmt/
        ├── api-specifications.md
        └── data-models.md
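
Because the folder layout encodes the tier, a document's tier can be derived from its path. The sketch below assumes paths are given relative to `knowledge-bases/` with forward slashes and that the second path segment names the category or business area. `classify_document` is a hypothetical helper; the actual ingestion code in `src/ingestion/document_processor.py` may work differently.

```python
from pathlib import PurePosixPath

TIERS = ("organization", "coe", "domain")


def classify_document(relative_path: str) -> dict:
    """Derive tier, area, and document name from a knowledge-base path.

    Example input: "coe/architecture/api-design-standards.md"
    """
    path = PurePosixPath(relative_path)
    parts = path.parts
    if len(parts) < 3 or parts[0] not in TIERS:
        raise ValueError(f"unexpected knowledge-base path: {relative_path!r}")
    return {"tier": parts[0], "area": parts[1], "name": path.stem}
```

A path outside the three tier folders raises `ValueError`, which surfaces misplaced files early instead of silently ingesting them without a tier.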

🤖 GitHub Copilot Integration

VS Code Configuration

Add to your .vscode/settings.json:

{
  "github.copilot.enable": {
    "*": true,
    "yaml": true,
    "plaintext": true,
    "markdown": true
  },
  "mcp.servers": {
    "globobank-rag": {
      "url": "http://localhost:8000",
      "enabled": true,
      "timeout": 30000
    }
  }
}

Usage Examples

# Ask Copilot to use governance knowledge
"""
@copilot: Get GloboBank API security requirements

Context: Building a customer data access API
Query: "API authentication and authorization standards"
"""

# Validate code against policies
"""
@copilot: Review this API design against GloboBank governance

Context: REST API for payment processing
Query: "payment API security and compliance requirements"
"""

📋 Spec-Driven Development

SDD Workflow Integration

1. Specification Creation: Use RAG for governance alignment
2. Validation: Automated compliance checking
3. Implementation: Governance-informed code generation
4. Review: Policy compliance verification

Example SDD Process

# spec.yaml
spec:
  title: "Customer Account API"
  version: "1.0.0"
  governance_validation:
    query: "customer account API requirements"
    compliance_areas: ["data_privacy", "security", "audit"]

implementation:
  framework: "FastAPI"
  authentication: "OAuth2"  # From governance guidance
  logging: "structured"     # From governance guidance
  monitoring: "prometheus"  # From governance guidance
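
One way to connect such a spec to the RAG server is to turn its `governance_validation` section into a `governance_guidance` tool call. The sketch below is an assumption about how the pieces could fit together, not documented behavior: it takes the spec as an already-parsed dictionary (for example from a YAML loader), and `spec_to_tool_call` is a hypothetical name.

```python
def spec_to_tool_call(spec_doc: dict) -> dict:
    """Build a governance_guidance request body from a parsed spec.

    Assumption: the spec's title, version, and compliance areas are
    folded into the free-text "context" argument.
    """
    spec = spec_doc["spec"]
    validation = spec["governance_validation"]
    context = f'{spec["title"]} v{spec["version"]}'
    areas = ", ".join(validation.get("compliance_areas", []))
    if areas:
        context += f"; compliance areas: {areas}"
    return {
        "name": "governance_guidance",
        "arguments": {"query": validation["query"], "context": context},
    }
```

The resulting dictionary has the same shape as the request body in the First Query example, so it can be sent to `/mcp/call_tool` unchanged.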

🛠️ Development

Adding Knowledge Documents

Create markdown files in the appropriate tier folder

Follow the document template:

# Document Title

## Metadata
- **Tier**: organization|coe|domain
- **Category**: governance|security|architecture
- **Tags**: api, security, compliance
- **Last Updated**: 2024-01-15

## Content
Your governance content here...

Restart the ingestion process:
```
docker-compose restart rag-server
```
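
To illustrate how the template's Metadata section could be consumed, here is a small parser sketch. It assumes metadata lines follow the exact `- **Key**: value` form shown in the template and that tags are comma-separated; `parse_metadata` is a hypothetical helper and not necessarily how the ingestion pipeline reads documents.

```python
import re

# Matches lines such as "- **Tier**: coe".
_META_LINE = re.compile(r"^-\s+\*\*(?P<key>[^*]+)\*\*:\s*(?P<value>.*)$")


def parse_metadata(markdown: str) -> dict:
    """Collect key/value pairs from the "## Metadata" section only."""
    meta = {}
    in_metadata = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Entering or leaving the Metadata section.
            in_metadata = stripped[3:].strip().lower() == "metadata"
            continue
        if in_metadata:
            match = _META_LINE.match(stripped)
            if match:
                key = match.group("key").strip().lower().replace(" ", "_")
                meta[key] = match.group("value").strip()
    if "tags" in meta:
        meta["tags"] = [t.strip() for t in meta["tags"].split(",") if t.strip()]
    return meta
```

Keys are normalized to lowercase with underscores, so `Last Updated` becomes `last_updated`, and bullet lines in other sections are ignored.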

Testing Queries

Use the interactive API documentation:

Open http://localhost:8000/docs
Navigate to the /mcp/call_tool endpoint
Test with sample queries

Local Development

# Set up development environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Start Qdrant only
docker-compose up qdrant -d

# Run RAG server locally
python -m src.rag_server.mcp_server

📊 Monitoring and Maintenance

Health Checks

# System health
curl http://localhost:8000/health

# Qdrant status
curl http://localhost:6333/collections/governance_knowledge

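# Point (chunk) count; Qdrant's count endpoint expects a POST request
curl -X POST http://localhost:6333/collections/governance_knowledge/points/count \
  -H "Content-Type: application/json" \
  -d '{"exact": true}'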

Log Analysis

# Application logs
docker-compose logs -f rag-server

# Query performance
docker-compose logs rag-server | grep "query_time"

# Error monitoring
docker-compose logs rag-server | grep -i "error\|exception"

Performance Tuning

Adjust these settings in .env for optimal performance:

# Raise for stricter matching (fewer, more relevant results); lower for broader recall
RELEVANCE_THRESHOLD=0.75

# Optimize for your use case
MAX_RETRIEVAL_RESULTS=15
EMBEDDING_BATCH_SIZE=50

# Memory optimization
QDRANT_MEMORY_LIMIT=2g

🚨 Troubleshooting

Common Issues

| Issue | Symptoms | Solution |
| --- | --- | --- |
| Connection Error | Cannot reach services | `docker-compose restart` |
| Poor Results | Irrelevant responses | Adjust `RELEVANCE_THRESHOLD` |
| Slow Queries | High response times | Check `docker stats`, increase memory |
| No Documents | Empty results | Verify knowledge base structure |

Debug Mode

Enable detailed logging:

echo "LOG_LEVEL=DEBUG" >> .env
docker-compose restart rag-server
docker-compose logs -f rag-server

📚 Documentation

Developer Guide: Comprehensive development documentation
API Reference: Interactive API documentation
Architecture Decisions: Architectural decision records
Examples: Usage examples and templates

🤝 Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Commit changes: `git commit -am 'Add amazing feature'`
4. Push to branch: `git push origin feature/amazing-feature`
5. Open a Pull Request

Development Guidelines

Follow existing code style and patterns
Add tests for new functionality
Update documentation for changes
Ensure governance compliance checks pass

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙋‍♂️ Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: Developer Guide

For enterprise support, contact the GloboBank Architecture Team.

🎉 Acknowledgments

Built for GitHub Copilot and Spec-Driven Development
Powered by Qdrant Vector Database
Uses Sentence Transformers for embeddings
Inspired by enterprise governance best practices
