LlamaGate is a lean, production-ready, OpenAI-compatible API gateway for local LLMs (Ollama). It lets you point existing OpenAI SDKs (Python, Node, etc.) at local models as a drop-in replacement, with streaming, tool/function calling (via MCP), authentication, rate limiting, caching, and structured logging.
🚀 New to LlamaGate? Quick Start Guide — Get running in 2 minutes.
- ✅ OpenAI-Compatible API: Drop-in replacement for OpenAI API endpoints
- ✅ Streaming Chat Completions: Full support for Server-Sent Events (SSE) streaming
- ✅ Tool / Function Calling: Execute MCP tools in multi-round loops with safety limits (round limits, call limits, timeouts, size limits, allow/deny lists)
- ✅ Authentication: Optional API key authentication via headers
- ✅ Rate Limiting: Configurable rate limiting using leaky bucket algorithm
- ✅ Request Correlation & Structured Logging: JSON logging with request IDs using Zerolog
- ✅ Caching: In-memory caching for identical prompts to reduce Ollama load
- ✅ MCP Client Support: Connect to MCP servers and expose their tools to models (MCP Guide | Quick Start)
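The rate limiter listed above follows the leaky bucket model: each request adds to a bucket that drains at a fixed rate, and requests that would overflow it are rejected. As a toy illustration of the idea (LlamaGate's actual limiter is implemented in Go; the numbers here are made up):

```python
import time

class LeakyBucket:
    """Toy leaky-bucket limiter: the bucket drains at `rps` requests/second."""

    def __init__(self, rps: float, burst: int):
        self.rps = rps            # drain rate
        self.burst = burst        # bucket capacity
        self.level = 0.0          # current fill level
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain for the elapsed time, then try to add this request.
        self.level = max(0.0, self.level - (now - self.last) * self.rps)
        self.last = now
        if self.level + 1 <= self.burst:
            self.level += 1
            return True
        return False  # bucket full: the gateway would answer HTTP 429

bucket = LeakyBucket(rps=50, burst=10)
print(bucket.allow())  # True while under the limit
```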
Note: Agentic modules and the extension system (workflows, middleware, observability) were removed in Phase 1. LlamaGate is now a core-only OpenAI-compatible gateway. See Core Contract and Phase 1 Removal.
- 📖 Quick Start Guide - Get running in 2 minutes
- 📚 Full Documentation Index - Browse all documentation
- 🔧 MCP Integration - Model Context Protocol guide
- 🚀 MCP Quick Start - Get started with MCP in 5 minutes
- 🎯 MCP Demo Guide - Full demo with multiple servers
- 🌐 MCP HTTP API - Complete API reference for MCP management
- 📋 Core Contract - Core endpoints and config (post–Phase 1)
- 🧪 Testing Guide - Testing your setup
- 📦 Installation Guide - Detailed installation instructions
- ✅ Manual Acceptance Test - Comprehensive acceptance test checklist for human verification
- 🎯 MCP Examples - See MCP Demo Guide and MCP Quick Start.
Copy and paste one command - it downloads the installer and runs it automatically!
This method downloads a pre-built binary (no Go required):
Windows (PowerShell):
If you have the repo locally, run install\windows\install-binary.ps1. Otherwise build from source (see Method 3).
Unix/Linux/macOS:
If you have the repo locally, run install/unix/install-binary.sh. Otherwise build from source (see Method 3).
What happens:

- Downloads the pre-built binary for your platform (or you build from source; see Method 3)
- Places the executable on your system
- Creates a `.env` configuration file if one is missing
That's it! You're ready to run LlamaGate.
If you've already cloned the repository, you can run the installer directly:
Binary installer (downloads pre-built binary):
Windows:

```cmd
install\windows\install-binary.cmd
```

Unix/Linux/macOS:

```bash
chmod +x install/unix/install-binary.sh
./install/unix/install-binary.sh
```

Source installer (builds from source):

Windows:

```cmd
install\windows\install.cmd
```

Unix/Linux/macOS:

```bash
chmod +x install/unix/install.sh
./install/unix/install.sh
```

The source installer will:
- ✅ Check for Go and install it if needed
- ✅ Check for Ollama and guide you to install it
- ✅ Install all Go dependencies
- ✅ Build the LlamaGate binary from source
- ✅ Automatically create a `.env` configuration file (from `.env.example` or with defaults)
If you need to build from source, you have two options:
Option A: One-Line Command (Downloads Source Installer)
Windows (PowerShell):

```powershell
# From repo root:
.\install\windows\install.ps1
```

Unix/Linux/macOS:

```bash
# From repo root:
./install/unix/install.sh
```

This downloads and runs the source installer, which handles Go installation and builds from source.
Option B: Manual Build (If You Have Go Installed)
Before building the main binary, make sure a full build of all packages (`go build ./...`) succeeds; downstream tooling (CI, E2E, forked automation) builds LlamaGate from source and depends on it. Then build the main binary:
Unix/Linux/macOS:

```bash
# Clone the repository (replace with your repo URL if you host elsewhere)
git clone <your-llamagate-repo-url>.git
cd LlamaGate

# Build all packages (required for CI/E2E/build-from-source integrators)
go build ./...

# Build main binary
go build -o llamagate ./cmd/llamagate

# Or install to $GOPATH/bin
go install ./cmd/llamagate
```

Windows (PowerShell):
```powershell
# Clone the repository (handle stderr output)
$ErrorActionPreference = "Continue"  # Git writes progress to stderr
git clone <your-llamagate-repo-url>.git
$ErrorActionPreference = "Stop"  # Restore if needed
cd LlamaGate

# Build all packages (required for CI/E2E/build-from-source integrators)
go build ./...

# Build main binary
go build -o llamagate.exe ./cmd/llamagate

# Or install to $GOPATH/bin
go install ./cmd/llamagate
```

Note: Git writes progress messages to stderr even on success. In PowerShell with `$ErrorActionPreference = "Stop"`, this can cause spurious failures. See the Troubleshooting section below for details.
Build and run LlamaGate in a container. The image does not include Ollama; set OLLAMA_HOST to your Ollama instance (host, another container, or service URL).
Build:

```bash
docker build -t llamagate .
```

Run (Ollama on host):

```bash
# Windows/macOS: host.docker.internal resolves to the host out of the box
docker run -p 11435:11435 -e OLLAMA_HOST=http://host.docker.internal:11434 llamagate

# Linux: map host.docker.internal to the host gateway (Docker 20.10+)
docker run --add-host=host.docker.internal:host-gateway -p 11435:11435 -e OLLAMA_HOST=http://host.docker.internal:11434 llamagate
```

Run with an optional API key:

```bash
docker run -p 11435:11435 -e OLLAMA_HOST=http://host.docker.internal:11434 -e API_KEY=sk-your-key llamagate
```

Key env vars: `OLLAMA_HOST` (required if Ollama is not on localhost), `PORT` (default 11435), `API_KEY` (optional), `MCP_ENABLED` (optional). See Configuration for details and `.env.example` for all supported env vars.
Run with Ollama in one command: docker compose up — see below.
From the repo root, run both LlamaGate and Ollama with one command:
```bash
docker compose up -d
```

- LlamaGate: http://localhost:11435
- Ollama: http://localhost:11434 (pull models with `ollama pull llama2` or use the UI)

To rebuild LlamaGate after code changes: `docker compose up -d --build`. See `docker-compose.yml` for env overrides (e.g. `API_KEY`).
Note: This section enhances LlamaGate's existing installation methods by adding a developer-focused one-command workflow. It complements (does not replace) the existing binary installer, source installer, and manual build methods documented above.
For developers integrating LlamaGate into projects, the one-command setup process provides a single command that automates the complete development workflow:
- ✅ Validates environment (Go, ports) - Catches issues before build
- ✅ LlamaGate auto-starts Ollama if not running - Built into LlamaGate application
- ✅ Auto-clones LlamaGate if missing (standardized sibling directory)
- ✅ Smart build - Only rebuilds if source is newer than binary
- ✅ Auto-starts LlamaGate - No manual start needed
- ✅ Verifies it's running - Health check confirmation
This enhances existing methods by:
- Adding developer workflow automation (complements installation methods)
- Providing standardized directory structure guidance
- Enabling smart rebuilds (only when needed)
- Automating the complete setup-to-running workflow
For integration projects, use this recommended structure for consistency:
```
YourProjectParent/
├── LlamaGate/      # ← Clone LlamaGate here (sibling directory)
└── YourProject/    # ← Your application
```
Why sibling directory?
- Consistent across all integration projects
- Easy to reference with relative paths
- Works well with version control
- Standard practice for integration workflows
Windows PowerShell:
Save this script to your project (e.g., scripts/setup-llamagate-dev.ps1):
# One-Command LlamaGate Development Setup
# Standardized process: Validate → Clone (if needed) → Build → Start → Verify
# Based on community best practices for LlamaGate integration
param(
[string]$LlamaGatePath = "..\LlamaGate",
[int]$LlamaGatePort = 11435,
[switch]$SkipClone,
[switch]$SkipBuild
)
$ErrorActionPreference = "Stop"
Write-Host ""
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "LlamaGate One-Command Development Setup" -ForegroundColor Cyan
Write-Host "========================================" -ForegroundColor Cyan
Write-Host ""
# Step 1: Environment Validation
Write-Host "[1/6] Validating environment..." -ForegroundColor Yellow
# Check Go
try {
$goVersion = go version 2>&1
if ($LASTEXITCODE -ne 0) {
Write-Host " ✗ Go is not installed" -ForegroundColor Red
Write-Host " Please install Go 1.19+ from: https://go.dev/dl/" -ForegroundColor Yellow
exit 1
}
Write-Host " ✓ Go: $goVersion" -ForegroundColor Green
} catch {
Write-Host " ✗ Go is not installed" -ForegroundColor Red
Write-Host " Please install Go 1.19+ from: https://go.dev/dl/" -ForegroundColor Yellow
exit 1
}
# Note: LlamaGate will automatically start Ollama if not running
Write-Host " Note: LlamaGate will auto-start Ollama if needed" -ForegroundColor Gray
# Step 2: Check if LlamaGate is already running
Write-Host "[2/6] Checking if LlamaGate is already running..." -ForegroundColor Yellow
try {
$response = Invoke-WebRequest -Uri "http://localhost:$LlamaGatePort/health" -Method GET -TimeoutSec 2 -ErrorAction SilentlyContinue
if ($response.StatusCode -eq 200) {
Write-Host " ✓ LlamaGate is already running" -ForegroundColor Green
Write-Host ""
Write-Host "LlamaGate is ready!" -ForegroundColor Green
Write-Host "URL: http://localhost:$LlamaGatePort" -ForegroundColor Cyan
exit 0
}
} catch {
Write-Host " LlamaGate is not running" -ForegroundColor Yellow
}
# Step 3: Find or Clone LlamaGate (Standardized: Sibling Directory)
Write-Host "[3/6] Locating LlamaGate source..." -ForegroundColor Yellow
# Standardized approach: Primary is sibling directory
$siblingPath = Resolve-Path "..\LlamaGate" -ErrorAction SilentlyContinue
$foundPath = $null
if ($siblingPath -and (Test-Path (Join-Path $siblingPath "cmd\llamagate"))) {
$foundPath = $siblingPath
Write-Host " ✓ Found at sibling directory: $foundPath" -ForegroundColor Green
} else {
# Check environment variable override
$envPath = $env:LLAMAGATE_PATH
if ($envPath -and (Test-Path (Join-Path $envPath "cmd\llamagate"))) {
$foundPath = Resolve-Path $envPath
Write-Host " ✓ Found via LLAMAGATE_PATH: $foundPath" -ForegroundColor Green
} else {
if ($SkipClone) {
Write-Host " ✗ LlamaGate source not found" -ForegroundColor Red
Write-Host " Expected location: $(Resolve-Path ".." -ErrorAction SilentlyContinue)\LlamaGate" -ForegroundColor Yellow
Write-Host " Or set LLAMAGATE_PATH environment variable" -ForegroundColor Yellow
exit 1
} else {
Write-Host " LlamaGate source not found" -ForegroundColor Yellow
Write-Host " Cloning LlamaGate as sibling directory..." -ForegroundColor Yellow
# Clone as sibling directory (standardized)
$parentDir = Resolve-Path ".." -ErrorAction Stop
$clonePath = Join-Path $parentDir "LlamaGate"
if (Test-Path $clonePath) {
Write-Host " ✗ Directory already exists: $clonePath" -ForegroundColor Red
Write-Host " Please remove it or use -SkipClone to skip cloning" -ForegroundColor Yellow
exit 1
}
Push-Location $parentDir
try {
Write-Host " Cloning from GitHub..." -ForegroundColor Gray
git clone <your-llamagate-repo-url>.git
if ($LASTEXITCODE -ne 0) {
Write-Host " ✗ Clone failed" -ForegroundColor Red
Pop-Location
exit 1
}
$foundPath = Resolve-Path "LlamaGate"
Write-Host " ✓ Cloned successfully to: $foundPath" -ForegroundColor Green
} catch {
Write-Host " ✗ Clone failed: $_" -ForegroundColor Red
Pop-Location
exit 1
} finally {
Pop-Location
}
}
}
}
# Step 4: Check Port Availability
Write-Host "[4/6] Checking port availability..." -ForegroundColor Yellow
try {
$tcpClient = New-Object System.Net.Sockets.TcpClient
$asyncResult = $tcpClient.BeginConnect("localhost", $LlamaGatePort, $null, $null)
$wait = $asyncResult.AsyncWaitHandle.WaitOne(500, $false)
if ($wait) {
$tcpClient.EndConnect($asyncResult)
$tcpClient.Close()
Write-Host " ✗ Port $LlamaGatePort is already in use" -ForegroundColor Red
Write-Host " Please stop the process using this port or use a different port" -ForegroundColor Yellow
exit 1
}
} catch {
# Port is free (connection failed is expected)
}
Write-Host " ✓ Port $LlamaGatePort is available" -ForegroundColor Green
# Step 5: Build LlamaGate
if (-not $SkipBuild) {
Write-Host "[5/6] Building LlamaGate from source..." -ForegroundColor Yellow
Push-Location $foundPath
try {
# Check if binary exists and is newer than source
$binaryPath = Join-Path $foundPath "llamagate.exe"
$needsBuild = $true
if (Test-Path $binaryPath) {
$binaryTime = (Get-Item $binaryPath).LastWriteTime
$sourceTime = (Get-ChildItem -Path (Join-Path $foundPath "cmd\llamagate") -Recurse -File |
Measure-Object -Property LastWriteTime -Maximum).Maximum
if ($binaryTime -gt $sourceTime) {
Write-Host " Binary is up to date, skipping build" -ForegroundColor Gray
$needsBuild = $false
}
}
if ($needsBuild) {
Write-Host " Building (this may take a few minutes)..." -ForegroundColor Gray
$buildOutput = go build -o llamagate.exe ./cmd/llamagate 2>&1
if ($LASTEXITCODE -ne 0) {
Write-Host " ✗ Build failed" -ForegroundColor Red
Write-Host $buildOutput -ForegroundColor Red
Pop-Location
exit 1
}
if (-not (Test-Path "llamagate.exe")) {
Write-Host " ✗ Binary not found after build" -ForegroundColor Red
Pop-Location
exit 1
}
Write-Host " ✓ Build successful" -ForegroundColor Green
}
} catch {
Write-Host " ✗ Build error: $_" -ForegroundColor Red
Pop-Location
exit 1
} finally {
Pop-Location
}
} else {
Write-Host "[5/6] Skipping build (requested)" -ForegroundColor Yellow
}
# Step 6: Start LlamaGate
Write-Host "[6/6] Starting LlamaGate..." -ForegroundColor Yellow
Push-Location $foundPath
try {
# Ensure .env exists with default configuration
$envFile = Join-Path $foundPath ".env"
$envExampleFile = Join-Path $foundPath ".env.example"
if (-not (Test-Path $envFile)) {
Write-Host " Creating .env with default configuration..." -ForegroundColor Gray
if (Test-Path $envExampleFile) {
# Copy from .env.example
Copy-Item $envExampleFile $envFile
Write-Host " ✓ Created .env from .env.example" -ForegroundColor Green
} else {
# Create with default values
$defaultEnv = @"
# LlamaGate Configuration
# Generated by one-command setup with default values
# Ollama server URL
OLLAMA_HOST=http://localhost:11434
# API key for authentication (leave empty to disable authentication)
API_KEY=
# Rate limit (requests per second)
RATE_LIMIT_RPS=50
# Enable debug logging (true/false)
DEBUG=false
# Server port
PORT=$LlamaGatePort
# Log file path (leave empty to log only to console)
LOG_FILE=
# HTTP client timeout for Ollama requests (e.g., 5m, 30s, 30m - max 30 minutes)
TIMEOUT=5m
"@
$defaultEnv | Out-File -FilePath $envFile -Encoding UTF8 -NoNewline
Write-Host " ✓ Created .env with default configuration" -ForegroundColor Green
}
}
$binaryPath = Join-Path $foundPath "llamagate.exe"
if (-not (Test-Path $binaryPath)) {
Write-Host " ✗ Binary not found: $binaryPath" -ForegroundColor Red
Write-Host " Please build first or remove -SkipBuild flag" -ForegroundColor Yellow
Pop-Location
exit 1
}
Write-Host " Starting process..." -ForegroundColor Gray
Write-Host " Note: Windows may prompt for authorization (firewall/UAC)" -ForegroundColor Cyan
Write-Host " Please approve the prompt if it appears" -ForegroundColor Cyan
$process = Start-Process -FilePath ".\llamagate.exe" -PassThru -WindowStyle Normal
# Wait for LlamaGate to be ready
$maxWait = 30
$waited = 0
$started = $false
Write-Host " Waiting for LlamaGate to be ready..." -ForegroundColor Gray -NoNewline
while ($waited -lt $maxWait) {
Start-Sleep -Seconds 1
$waited++
Write-Host "." -ForegroundColor Gray -NoNewline
try {
$testResponse = Invoke-WebRequest -Uri "http://localhost:$LlamaGatePort/health" -Method GET -TimeoutSec 1 -ErrorAction SilentlyContinue
if ($testResponse.StatusCode -eq 200) {
$started = $true
Write-Host ""
Write-Host " ✓ LlamaGate started successfully" -ForegroundColor Green
Write-Host " PID: $($process.Id)" -ForegroundColor Cyan
break
}
} catch {
continue
}
}
if (-not $started) {
Write-Host ""
Write-Host " ✗ LlamaGate failed to start within $maxWait seconds" -ForegroundColor Red
Stop-Process -Id $process.Id -Force -ErrorAction SilentlyContinue
Pop-Location
exit 1
}
} catch {
Write-Host " ✗ Error starting LlamaGate: $_" -ForegroundColor Red
Pop-Location
exit 1
} finally {
Pop-Location
}
# Success
Write-Host ""
Write-Host "========================================" -ForegroundColor Green
Write-Host "LlamaGate is ready for development!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green
Write-Host ""
Write-Host "PID: $($process.Id)" -ForegroundColor Cyan
Write-Host "URL: http://localhost:$LlamaGatePort" -ForegroundColor Cyan
Write-Host "Health: http://localhost:$LlamaGatePort/health" -ForegroundColor Cyan
Write-Host ""
Write-Host "To stop: Stop-Process -Id $($process.Id)" -ForegroundColor Gray
Write-Host ""
exit 0

Then run from your project directory:

```powershell
.\scripts\setup-llamagate-dev.ps1
```

Unix/macOS:
[Unix script if available, or instructions to adapt]
Use one-command setup when:
- Integrating LlamaGate into your project
- Development workflow automation
- Standardized setup process
- Frequent rebuilds during development
Use installation methods when:
- End-user installation
- Production deployment
- Quick setup without Go
- One-time installation
Note: This one-command process enhances LlamaGate's existing installation methods. For end-user installation, use the binary or source installers documented above.
For Windows users, convenient batch files are provided:
- `scripts/windows/run.cmd` - Run with default settings (no authentication)
- `scripts/windows/run-with-auth.cmd` - Run with API key authentication enabled
- `scripts/windows/run-debug.cmd` - Run with debug logging enabled
- `scripts/windows/build.cmd` - Build the binary (`llamagate.exe`)
Run from command prompt:
```cmd
scripts\windows\run.cmd
```

To customize settings, edit the batch file or set environment variables before running:

```cmd
set OLLAMA_HOST=http://localhost:11434
set API_KEY=sk-llamagate
set RATE_LIMIT_RPS=20
scripts\windows\run.cmd
```

LlamaGate can be configured via:
- `.env` file (recommended for development) - Create a `.env` file in the project root
- Environment variables - Take precedence over `.env` file values
- Default values - Used if neither `.env` nor environment variables are set
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `API_KEY` | (empty) | API key for authentication (optional) |
| `RATE_LIMIT_RPS` | `50` | Requests per second limit |
| `DEBUG` | `false` | Enable debug logging |
| `PORT` | `11435` | Server port |
| `LOG_FILE` | (empty) | Path to log file (optional, logs to console if empty) |
| `TLS_ENABLED` | `false` | Enable HTTPS/TLS |
| `TLS_CERT_FILE` | (empty) | Path to TLS certificate file (required if `TLS_ENABLED=true`) |
| `TLS_KEY_FILE` | (empty) | Path to TLS private key file (required if `TLS_ENABLED=true`) |
| `TIMEOUT` | `5m` | HTTP client timeout for Ollama requests (e.g. `5m`, `30s`, `30m`; max 30 minutes) |
| `MCP_ENABLED` | `false` | Enable MCP client functionality (see MCP docs) |
| `MCP_MAX_TOOL_ROUNDS` | `10` | Maximum tool execution rounds |
| `MCP_MAX_TOOL_CALLS_PER_ROUND` | `10` | Maximum tool calls per round |
| `MCP_DEFAULT_TOOL_TIMEOUT` | `30s` | Default timeout for tool execution |
| `MCP_MAX_TOOL_RESULT_SIZE` | `1048576` | Maximum tool result size in bytes (1 MB) |
| `MCP_ALLOW_TOOLS` | (empty) | Comma-separated glob patterns for allowed tools |
| `MCP_DENY_TOOLS` | (empty) | Comma-separated glob patterns for denied tools |
Note: MCP server configuration is best done via YAML/JSON config file. See mcp-config.example.yaml and MCP Documentation.
- Removed endpoints: All `/v1/extensions` routes (GET/PUT/POST list, get, upsert, execute, refresh) and any dynamic extension endpoints. Requests to these paths now return 404.
- Config: `EXTENSIONS_UPSERT_ENABLED` has been removed; remove it from your `.env` or config if present.
- CLI: The `llamagate-cli` tool (import/export/list/remove/enable/disable extensions and agentic modules, migrate, sync) has been removed.
- What remains: Core OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/models`), `/health`, `/v1/hardware/recommendations`, and, when MCP is enabled, all `/v1/mcp/*` endpoints. See Core Contract.
Create a .env file in the project root (copy from .env.example):
```bash
# .env (recommended for documentation examples)
OLLAMA_HOST=http://localhost:11434
API_KEY=sk-llamagate
RATE_LIMIT_RPS=50
DEBUG=false
PORT=11435
LOG_FILE=
TIMEOUT=5m
```

Note: Set `API_KEY=sk-llamagate` to match the documentation examples. Leave it empty (`API_KEY=`) to disable authentication. For development/testing, you may also want to set:

- `DEBUG=true` (to enable debug logging)
- `LOG_FILE=llamagate.log` (to log to a file)
The .env file is automatically loaded when the application starts. Environment variables set directly will override .env file values, making it easy to override settings for specific runs.
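In other words, resolution order is: built-in defaults, overridden by `.env`, overridden by the process environment. A hypothetical Python sketch of that order (LlamaGate itself does this in Go; the parser here is deliberately minimal):

```python
import os

DEFAULTS = {"PORT": "11435", "RATE_LIMIT_RPS": "50",
            "OLLAMA_HOST": "http://localhost:11434"}

def load_dotenv_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file; comments and blanks are skipped."""
    values = {}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass
    return values

def resolve(key, path=".env"):
    # The process environment wins, then .env, then the built-in default.
    return os.environ.get(key, load_dotenv_file(path).get(key, DEFAULTS.get(key)))

print(resolve("PORT"))  # "11435" unless overridden in .env or the environment
```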
When API_KEY is configured, all API endpoints (except /health) require authentication.
LlamaGate supports two authentication header formats:
```bash
curl -H "X-API-Key: sk-llamagate" http://localhost:11435/v1/models
curl -H "Authorization: Bearer sk-llamagate" http://localhost:11435/v1/models
```

The `X-API-Key` header is checked first. If not present, `Authorization: Bearer` is checked.
Note: The /health endpoint does not require authentication and can be used for monitoring and load balancer health checks.
Note: These are testing/development values. For production, use defaults or configure via .env file.
```bash
export OLLAMA_HOST="http://localhost:11434"
export API_KEY="sk-llamagate"
export RATE_LIMIT_RPS=20
export DEBUG=true
export PORT=11435
llamagate
```

Note: These are testing/development values. For production, use the defaults or configure via a `.env` file.
```cmd
set OLLAMA_HOST=http://localhost:11434
set API_KEY=sk-llamagate
set RATE_LIMIT_RPS=20
set DEBUG=true
set PORT=11435
llamagate.exe
```

Or use the provided batch files (see Windows Quick Start above).
Note: If you use a .env file, you don't need to set environment variables manually - just create .env and run the application!
LlamaGate supports two authentication header formats (both are case-insensitive):
```bash
curl -H "X-API-Key: sk-llamagate" http://localhost:11435/v1/models
```

The header name is case-insensitive. All of the following are accepted:

- `X-API-Key`
- `x-api-key`
- `X-Api-Key`
- Any other case variation

```bash
curl -H "Authorization: Bearer sk-llamagate" http://localhost:11435/v1/models
```

The "Bearer" scheme is case-insensitive. All of the following are accepted:

- `Authorization: Bearer sk-llamagate`
- `Authorization: bearer sk-llamagate`
- `Authorization: BEARER sk-llamagate`
Header Priority: The X-API-Key header is checked first. If not present, Authorization: Bearer is checked.
- When Authentication is Required: All endpoints except `/health` require authentication when `API_KEY` is configured.
- When Authentication is Missing: Requests without a valid authentication header return `401 Unauthorized` with an OpenAI-compatible error response.
- When Authentication is Invalid: Requests with an invalid or incorrect API key return `401 Unauthorized` with an OpenAI-compatible error response.
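The header lookup order described above can be sketched as a tiny resolver (illustrative only; LlamaGate's real middleware is written in Go):

```python
def extract_api_key(headers):
    """Return the client key: X-API-Key first, then Authorization: Bearer.

    Header names and the Bearer scheme are matched case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-api-key" in lowered:
        return lowered["x-api-key"]
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token:
        return token.strip()
    return None  # no credentials supplied -> 401 if API_KEY is configured

print(extract_api_key({"X-Api-Key": "sk-llamagate"}))  # sk-llamagate
```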
Authentication errors return HTTP 401 Unauthorized with a JSON response in OpenAI-compatible format:
```json
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "request_id": "req-123456"
  }
}
```

- API keys are never logged: Authentication failures are logged with a generic message ("Authentication failed"); the provided API key or bearer token is never included in logs.
- Constant-time comparison: API key validation uses constant-time comparison to prevent timing attacks.
- Health endpoint bypass: The `/health` endpoint does not require authentication and can be used for monitoring and load balancer health checks.
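"Constant-time" means the comparison takes the same time whether the first or the last byte differs, so response latency leaks nothing about the configured key. Python's standard-library equivalent is `hmac.compare_digest` (in Go one would typically use `crypto/subtle`):

```python
import hmac

def key_matches(provided: str, configured: str) -> bool:
    # compare_digest's runtime does not depend on where the inputs differ
    return hmac.compare_digest(provided.encode(), configured.encode())

print(key_matches("sk-llamagate", "sk-llamagate"))  # True
print(key_matches("sk-wrong-key", "sk-llamagate"))  # False
```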
💡 Migrating from OpenAI? See the Quick Start Guide for step-by-step migration examples.
🔧 Using MCP Tools? See the MCP Quick Start Guide to get started with MCP integration. For complete details, see the MCP Documentation.
🎯 Want to see MCP in action? Check out the MCP Demo QuickStart for a complete example with multiple document processing servers.
📚 Looking for more examples? Check out our example repositories:
- Use the OpenAI SDK with `base_url` pointing at your LlamaGate instance (see API).
- More example repositories coming soon: MCP examples
All examples below assume:
- LlamaGate running locally on `http://localhost:11435` (default port)
- Ollama running locally on `http://localhost:11434` (default port)
- Default configuration (no authentication unless specified)
💡 Model Names: Examples use `"mistral"` (Mistral 7B) as the default; it works on most business hardware (8GB VRAM or CPU). See our Top 5 Model Recommendations for other options. Check available models with: `curl http://localhost:11435/v1/models`

⚠️ Production Note: Always specify models explicitly in production code. Examples use `"mistral"` for demonstration purposes only.
```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

Response:
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "mistral",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! I'm doing well, thank you for asking..."
    },
    "finish_reason": "stop"
  }]
}
```

```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [
      {"role": "user", "content": "Tell me a short story"}
    ],
    "stream": true
  }'
```

Response (Server-Sent Events):
```
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"mistral","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"mistral","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}

data: [DONE]
```
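If you are not using an SDK, the stream above can be consumed directly: read the body line by line, strip the `data: ` prefix, parse the JSON chunk, and stop at `[DONE]`. A minimal standard-library sketch (assumes LlamaGate on the default port):

```python
import json
import urllib.request

def parse_sse_line(line):
    """Return the content delta carried by one SSE line, or None."""
    if not line.startswith("data: "):
        return None  # blank keep-alive line or comment
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None  # end of stream
    delta = json.loads(payload)["choices"][0].get("delta", {})
    return delta.get("content")

def stream_chat(prompt, base_url="http://localhost:11435"):
    """Yield content tokens from a streaming chat completion."""
    body = json.dumps({
        "model": "mistral",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            token = parse_sse_line(raw.decode().strip())
            if token:
                yield token

# Usage (needs a running LlamaGate):
# for token in stream_chat("Tell me a short story"):
#     print(token, end="", flush=True)
```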
Point the OpenAI Python SDK to LlamaGate using a custom base_url:
```python
from openai import OpenAI

# Configure client to use LlamaGate instead of OpenAI
client = OpenAI(
    base_url="http://localhost:11435/v1",  # LlamaGate endpoint
    api_key="not-needed"  # Only needed if API_KEY is set in LlamaGate
)

# Use it exactly like the OpenAI API
response = client.chat.completions.create(
    model="mistral",  # Use any model available in your local Ollama
    messages=[
        {"role": "user", "content": "Hello! How are you?"}
    ]
)

print(response.choices[0].message.content)
```

Streaming with Python SDK:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="not-needed"
)

stream = client.chat.completions.create(
    model="mistral",
    messages=[
        {"role": "user", "content": "Count to 5"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Point the OpenAI Node.js SDK to LlamaGate using a custom `baseURL`:
```javascript
import OpenAI from 'openai';

// Configure client to use LlamaGate instead of OpenAI
const client = new OpenAI({
  baseURL: 'http://localhost:11435/v1', // LlamaGate endpoint
  apiKey: 'not-needed' // Only needed if API_KEY is set in LlamaGate
});

// Use it exactly like the OpenAI API
const response = await client.chat.completions.create({
  model: 'mistral', // Use any model available in your local Ollama
  messages: [
    { role: 'user', content: 'Hello! How are you?' }
  ]
});

console.log(response.choices[0].message.content);
```

Streaming with Node.js SDK:
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11435/v1',
  apiKey: 'not-needed'
});

const stream = await client.chat.completions.create({
  model: 'mistral',
  messages: [
    { role: 'user', content: 'Count to 5' }
  ],
  stream: true
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
```

If you've set `API_KEY` in your LlamaGate configuration, include it in requests:
curl with authentication:
```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-llamagate" \
  -d '{
    "model": "mistral",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

Python SDK with authentication:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="sk-llamagate"  # Your API_KEY from the LlamaGate config
)

response = client.chat.completions.create(
    model="mistral",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
```

Node.js SDK with authentication:
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11435/v1',
  apiKey: 'sk-llamagate' // Your API_KEY from the LlamaGate config
});

const response = await client.chat.completions.create({
  model: 'mistral',
  messages: [
    { role: 'user', content: 'Hello!' }
  ]
});

console.log(response.choices[0].message.content);
```

Note: Authentication is optional. If `API_KEY` is not set in LlamaGate, you can omit the `api_key` parameter or use any value.
📚 For SDK usage, point your OpenAI client at your LlamaGate URL; see API and Testing.
```bash
curl http://localhost:11435/health
curl http://localhost:11435/v1/models
```

```python
from langchain_openai import ChatOpenAI

# Use ChatOpenAI with the LlamaGate endpoint
llm = ChatOpenAI(
    model="mistral",  # Default: Mistral 7B (CPU-only or 8GB VRAM)
    base_url="http://localhost:11435/v1",  # Use base_url instead of openai_api_base
    api_key="not-needed"  # Only if API_KEY is set in LlamaGate
)

response = llm.invoke("Hello, how are you?")
print(response.content)
```

Note: This example uses `langchain_openai` (LangChain v0.1+). For older versions, use `from langchain.chat_models import ChatOpenAI` and the `openai_api_base` parameter.
Handle errors gracefully in your applications:
```python
from openai import OpenAI, APIError

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="not-needed"
)

try:
    response = client.chat.completions.create(
        model="mistral",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)
except APIError as e:
    print(f"API Error: {e.status_code} - {e.message}")
    if e.status_code == 401:
        print("Authentication failed. Check your API key.")
    elif e.status_code == 429:
        print("Rate limit exceeded. Please retry later.")
except Exception as e:
    print(f"Unexpected error: {e}")
```

Use MCP tools for extended capabilities:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="not-needed"
)

# Request with tool calling enabled
response = client.chat.completions.create(
    model="mistral",
    messages=[
        {"role": "user", "content": "What files are in the /tmp directory?"}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "mcp.filesystem.list_files",
            "description": "List files in a directory"
        }
    }],
    tool_choice="auto"  # Let the model decide when to use tools
)

# Handle the response
message = response.choices[0].message
print(f"Content: {message.content}")

# Check for tool calls
if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"Tool: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")
```

Note: MCP must be enabled and configured in LlamaGate for tool calling to work. See MCP Quick Start for setup instructions.
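When MCP is enabled, LlamaGate's tool loop normally executes MCP tools server-side. If you run a tool yourself instead, the standard OpenAI pattern applies: echo the assistant turn, append one `tool` message per call, then ask the model again. A sketch of the message bookkeeping (the tool name and result here are illustrative):

```python
import json

def tool_result_messages(assistant_turn, results):
    """Build follow-up messages answering each tool call with its result.

    `assistant_turn` is the assistant message (as a dict) holding `tool_calls`;
    `results` maps tool_call id -> the Python value your tool produced.
    """
    followups = [assistant_turn]  # the assistant turn must be echoed first
    for call in assistant_turn.get("tool_calls", []):
        followups.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["id"]]),
        })
    return followups

assistant_turn = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "mcp.filesystem.list_files",
                     "arguments": '{"path": "/tmp"}'},
    }],
}
followups = tool_result_messages(assistant_turn, {"call_1": ["a.txt", "b.log"]})
print(followups[1]["role"])  # tool

# Append `followups` to the original messages list and call
# client.chat.completions.create(...) again for the model's final answer.
```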
Configure the OpenAI client using environment variables:
```python
import os
from openai import OpenAI

# Set environment variables (or use a .env file)
os.environ["OPENAI_BASE_URL"] = "http://localhost:11435/v1"
os.environ["OPENAI_API_KEY"] = os.getenv("LLAMAGATE_API_KEY", "not-needed")

# The client automatically picks up these environment variables
client = OpenAI()

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```

Environment Variables:

- `OPENAI_BASE_URL` - LlamaGate endpoint URL
- `OPENAI_API_KEY` - API key (if authentication is enabled)
For production use, add retries, timeouts, and connection pooling:
```python
import httpx
from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_exponential

# Configure the HTTP client with timeouts and connection pooling
client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="not-needed",
    http_client=httpx.Client(
        timeout=httpx.Timeout(30.0, connect=5.0),
        limits=httpx.Limits(max_keepalive_connections=5, max_connections=10)
    )
)

# Add retry logic for transient failures
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def chat_with_retry(messages):
    return client.chat.completions.create(
        model="mistral",
        messages=messages
    )

# Use the retry wrapper
try:
    response = chat_with_retry([
        {"role": "user", "content": "Hello!"}
    ])
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Failed after retries: {e}")
```

Production Best Practices:
- Use connection pooling for better performance
- Set appropriate timeouts (30s default, 5s connect)
- Implement retry logic for transient failures
- Monitor rate limits and adjust request patterns
- Use structured logging for debugging
📚 See more: API reference and Testing guide for request examples.
OpenAI-compatible chat completions endpoint. Forwards requests to Ollama /api/chat.
Request Body:
{
  "model": "mistral",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "temperature": 0.7
}

Lists available Ollama models. Forwards requests to Ollama /api/tags and converts to OpenAI format.
Health check endpoint that verifies both server and Ollama connectivity.
Response (healthy):
{
  "status": "healthy",
  "ollama": "connected",
  "ollama_host": "http://localhost:11434"
}

Response (unhealthy):

{
  "status": "unhealthy",
  "error": "Ollama unreachable",
  "ollama_host": "http://localhost:11434"
}

Returns 200 OK when healthy, 503 Service Unavailable when Ollama is unreachable.
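For monitoring scripts, the health semantics can be sketched as a small client-side check. `is_healthy` is a hypothetical helper, not part of LlamaGate; it simply mirrors the 200/"healthy" contract described above:

```python
import json

def is_healthy(status_code: int, body: str) -> bool:
    """Interpret a LlamaGate /health response: 200 plus status "healthy" means OK."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("status") == "healthy"
    except json.JSONDecodeError:
        return False

# Example payloads mirroring the responses shown above
healthy = '{"status": "healthy", "ollama": "connected", "ollama_host": "http://localhost:11434"}'
unhealthy = '{"status": "unhealthy", "error": "Ollama unreachable", "ollama_host": "http://localhost:11434"}'
print(is_healthy(200, healthy))    # True
print(is_healthy(503, unhealthy))  # False
```

A load balancer would typically treat anything other than a 200 as unhealthy; the JSON body is useful for distinguishing a down gateway from an unreachable Ollama.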
When API_KEY is configured, all API endpoints (except /health) require authentication.
LlamaGate supports two authentication header formats (both are case-insensitive):
curl -H "X-API-Key: sk-llamagate" http://localhost:11435/v1/modelsThe header name is case-insensitive. All of the following are accepted:
X-API-Keyx-api-keyX-Api-Key- Any other case variation
curl -H "Authorization: Bearer sk-llamagate" http://localhost:11435/v1/modelsThe "Bearer" scheme is case-insensitive. All of the following are accepted:
Authorization: Bearer sk-llamagateAuthorization: bearer sk-llamagateAuthorization: BEARER sk-llamagate
Header Priority: The X-API-Key header is checked first. If not present, Authorization: Bearer is checked.
- When Authentication is Required: All endpoints except `/health` require authentication when `API_KEY` is configured.
- When Authentication is Missing: Requests without a valid authentication header return `401 Unauthorized` with an OpenAI-compatible error response.
- When Authentication is Invalid: Requests with an invalid or incorrect API key return `401 Unauthorized` with an OpenAI-compatible error response.
Authentication errors return HTTP 401 Unauthorized with a JSON response in OpenAI-compatible format:
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "request_id": "req-123456"
  }
}

- API keys are never logged: Authentication failures are logged with a generic message ("Authentication failed"), but the provided API key or bearer token is never included in logs.
- Constant-time comparison: API key validation uses constant-time comparison to prevent timing attacks.
- Health endpoint bypass: The `/health` endpoint does not require authentication and can be used for monitoring and load balancer health checks.
If API_KEY is not set, authentication is disabled and all requests are allowed.
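The constant-time comparison mentioned above can be illustrated with Python's `hmac.compare_digest`. This is a sketch of the technique only, not LlamaGate's actual Go implementation (which would use Go's `crypto/subtle` package):

```python
import hmac

def api_key_valid(configured_key: str, provided_key: str) -> bool:
    # compare_digest takes time independent of where the strings first
    # differ, so an attacker cannot recover the key byte-by-byte by
    # measuring response latency.
    return hmac.compare_digest(configured_key.encode(), provided_key.encode())

print(api_key_valid("sk-llamagate", "sk-llamagate"))  # True
print(api_key_valid("sk-llamagate", "sk-wrong"))      # False
```

A naive `==` comparison short-circuits at the first mismatched byte, which is exactly the timing signal constant-time comparison removes.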
LlamaGate caches responses for non-streaming requests. The cache key is based on:
- Model name
- Messages content
Identical requests (same model + same messages) will return cached responses, reducing load on Ollama.
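One way such a key can be derived — hashing a canonical serialization of model plus messages — is sketched below. This is illustrative; LlamaGate's internal key scheme may differ:

```python
import hashlib
import json

def cache_key(model: str, messages: list) -> str:
    # Serialize deterministically (sorted keys, fixed separators) so
    # semantically identical requests always hash to the same key.
    payload = json.dumps({"model": model, "messages": messages},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()

a = cache_key("mistral", [{"role": "user", "content": "Hello!"}])
b = cache_key("mistral", [{"role": "user", "content": "Hello!"}])
c = cache_key("mistral", [{"role": "user", "content": "Hi!"}])
print(a == b)  # True  - identical requests share a key (cache hit)
print(a == c)  # False - different messages miss the cache
```

Note that streaming requests bypass the cache, since a cached SSE stream cannot be replayed meaningfully.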
Rate limiting is implemented using a leaky bucket algorithm. The default limit is 50 requests per second, configurable via RATE_LIMIT_RPS.
When the limit is exceeded, requests receive a 429 Too Many Requests response with:
- HTTP Status: `429 Too Many Requests`
- Retry-After Header: Number of seconds to wait before retrying (e.g., `Retry-After: 1`)
- Response Body: OpenAI-compatible JSON error format
Status: 429 Too Many Requests
Headers:
Retry-After: 1
Response Body:
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "request_id": "req-123456"
  }
}

Rate-limited requests are logged with structured fields including request ID, IP address, path, retry time, and limiter decision.
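The leaky bucket algorithm can be sketched in a few lines. This is an illustration of the technique, not LlamaGate's Go implementation; timestamps are passed explicitly to keep the example deterministic:

```python
class LeakyBucket:
    """Minimal leaky-bucket sketch: the bucket drains at rate_rps
    requests per second and rejects once it holds `capacity` requests."""

    def __init__(self, rate_rps: float, capacity: float):
        self.rate = rate_rps
        self.capacity = capacity
        self.level = 0.0   # current "water" in the bucket
        self.last = 0.0    # timestamp of the previous check

    def allow(self, now: float) -> bool:
        # Drain continuously since the last request was seen.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False  # caller should respond 429 with Retry-After

bucket = LeakyBucket(rate_rps=2.0, capacity=2.0)
print(bucket.allow(0.0))  # True  - bucket empty
print(bucket.allow(0.0))  # True  - bucket now full
print(bucket.allow(0.0))  # False - burst exceeded, would return 429
print(bucket.allow(0.5))  # True  - 0.5s later, one slot has drained
```

Unlike a fixed-window counter, the bucket drains continuously, so traffic just over the limit is smoothed rather than rejected in bursts at window boundaries.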
LlamaGate implements consistent request correlation and secure logging across all components.
Every inbound request receives a unique request ID:
- If an `X-Request-ID` header is provided: LlamaGate uses the provided request ID
- If no header is provided: LlamaGate generates a UUID v4 request ID

The request ID is:
- Included in the `X-Request-ID` response header
- Propagated to all downstream components:
  - Ollama upstream calls (via `X-Request-ID` header)
  - Tool/function calling (via context)
  - MCP tool calls (via context and HTTP headers)
- Included in all structured log entries for the request
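The resolution logic above can be sketched as a tiny helper (illustrative Python, not LlamaGate's Go middleware):

```python
import uuid

def resolve_request_id(headers: dict) -> str:
    """Use the caller-supplied X-Request-ID if present, else mint a UUID v4."""
    provided = headers.get("X-Request-ID")
    return provided if provided else str(uuid.uuid4())

# Caller supplied an ID: it is echoed back unchanged
print(resolve_request_id({"X-Request-ID": "req-abc-123"}))  # req-abc-123

# No header: a fresh UUID v4 is generated
print(resolve_request_id({}))
```

Accepting caller-supplied IDs lets an upstream service correlate its own logs with LlamaGate's, while the UUID fallback guarantees every request is traceable.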
LlamaGate automatically redacts sensitive values from logs to prevent secret leakage:
Redacted Values:
- API keys (`X-API-Key` header values)
- Bearer tokens (`Authorization: Bearer` header values)
- Any other secrets in headers, environment variables, or configuration
What is Logged:
- Request method, path, status code, latency
- Request ID for correlation
- Client IP address
- Error messages (without sensitive data)
- Authentication failures (generic message only)
What is NOT Logged:
- API key values
- Bearer token values
- Authorization header contents
- Any header values that contain secrets
Example Log Entry:
{
  "level": "info",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "method": "POST",
  "path": "/v1/chat/completions",
  "status": 200,
  "latency": "1.234s",
  "ip": "192.168.1.100",
  "time": "2026-01-12T10:00:00Z",
  "message": "HTTP request"
}

Notice that the API key is not present in the log entry, even though it was sent in the request headers.
LlamaGate implements graceful shutdown to ensure clean termination without dropping in-flight requests.
When LlamaGate receives SIGINT or SIGTERM:
- Stop accepting new requests: The server immediately stops accepting new connections
- Allow in-flight requests to complete: Active requests are allowed to finish up to a configurable timeout
- Close downstream connections cleanly:
- Ollama HTTP client connections are closed
- MCP server connections are closed
- Cache cleanup goroutines are stopped
- Handle streaming responses safely: Streaming responses check for context cancellation and stop gracefully when the server shuts down
The shutdown timeout is configurable via the SHUTDOWN_TIMEOUT environment variable:
# Default: 30 seconds
SHUTDOWN_TIMEOUT=30s

# Examples:
SHUTDOWN_TIMEOUT=10s     # 10 seconds
SHUTDOWN_TIMEOUT=1m      # 1 minute
SHUTDOWN_TIMEOUT=2m30s   # 2 minutes 30 seconds

Timeout Behavior:
- If all in-flight requests complete before the timeout: Clean shutdown
- If the timeout is reached: Remaining requests are terminated, and the server exits
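The duration strings accepted by `SHUTDOWN_TIMEOUT` (`30s`, `1m`, `2m30s`) follow Go's `time.ParseDuration` format. As a rough illustration of that format — a simplified subset, not the full Go grammar — a Python validator might look like:

```python
import re

def parse_go_duration(s: str) -> float:
    """Parse a simplified subset of Go durations ("30s", "1m", "2m30s") into seconds."""
    units = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
    # "ms" must precede "m" in the alternation so "500ms" parses correctly.
    matches = re.findall(r"(\d+(?:\.\d+)?)(h|ms|m|s)", s)
    # Reject strings with unmatched leftovers (e.g. "abc", "10x").
    if not matches or "".join(n + u for n, u in matches) != s:
        raise ValueError(f"invalid duration: {s}")
    return sum(float(n) * units[u] for n, u in matches)

print(parse_go_duration("30s"))    # 30.0
print(parse_go_duration("2m30s"))  # 150.0
```

Validating the value at deploy time avoids a startup failure from a typo like `SHUTDOWN_TIMEOUT=30sec`.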
1. Signal received (`SIGINT` or `SIGTERM`)
2. Server stops accepting new requests
3. Cache cleanup goroutines stopped
4. Downstream connections closed (Ollama, MCP)
5. In-flight requests allowed to complete (up to timeout)
6. Server exits gracefully
Note: Streaming responses automatically detect server shutdown via context cancellation and stop gracefully, preventing abrupt connection resets.
LlamaGate supports native HTTPS/TLS encryption. To enable HTTPS:
1. Set TLS configuration in `.env`:

   TLS_ENABLED=true
   TLS_CERT_FILE=/path/to/certificate.crt
   TLS_KEY_FILE=/path/to/private.key

2. Or use YAML config:

   tls_enabled: true
   tls_cert_file: /path/to/certificate.crt
   tls_key_file: /path/to/private.key

3. Generate a self-signed certificate (for testing):

   openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes

4. For production with Let's Encrypt, use a reverse proxy (nginx, Caddy) for automatic certificate management and renewal.
Note: When TLS_ENABLED=true, the server will use HTTPS. Make sure to use https:// in your client URLs.
LlamaGate uses structured JSON logging with Zerolog. Each request is assigned a unique request ID.
Log Levels:
- `INFO`: Default level, logs all requests and important events
- `DEBUG`: Enabled with `DEBUG=true`, includes detailed debugging information
Log Output:
- By default, logs are written to stdout (console)
- To also write logs to a file, set the `LOG_FILE` environment variable: `LOG_FILE=llamagate.log`
- When `LOG_FILE` is set, logs are written to both console and file
- The log file is created automatically if it doesn't exist, and new logs are appended to it
- Note: The log file is not automatically rotated. For production use, consider using a log rotation tool or process manager
Example log entry:
{
  "level": "info",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "method": "POST",
  "path": "/v1/chat/completions",
  "status": 200,
  "latency": "1.234s",
  "ip": "127.0.0.1",
  "time": 1703123456
}

See docs/TESTING.md for a comprehensive testing guide, or use the provided test script:
Windows:
scripts\windows\test.cmd

Unix/Linux/macOS:

./scripts/unix/test.sh

This will test all endpoints, caching, authentication, and more.
To validate installer scripts before deployment, see docs/INSTALLER_TESTING.md:
# Test all installers
.\tests\installer\test-all-installers.ps1
# Test Windows installer only
.\tests\installer\test-installer-windows.ps1
# Test Unix installer (requires bash/WSL)
chmod +x tests/installer/test-installer-unix.sh
./tests/installer/test-installer-unix.sh

Using the installer (recommended):
install\windows\install.cmd

Manual build:
go build -o llamagate ./cmd/llamagate

Or use the build script:
scripts\windows\build.cmd

go test ./...

# Make sure Ollama is running
ollama serve
# In another terminal, run LlamaGate
go run ./cmd/llamagate

LlamaGate includes support for the Model Context Protocol (MCP) as a client. This allows you to:
- Connect to MCP servers and discover their tools, resources, and prompts
- Expose tools to chat completion requests
- Execute tool calls in multi-round loops
- Reference MCP resources directly in chat messages using `mcp://` URIs
- Enforce security with allow/deny lists
- Access MCP management via HTTP API
See MCP Documentation for full details and MCP Quick Start for a getting started guide.
You can reference MCP resources directly in chat completion messages:
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-llamagate" \
  -d '{
    "model": "mistral",
    "messages": [{
      "role": "user",
      "content": "Summarize mcp://filesystem/file:///docs/readme.txt"
    }]
  }'

LlamaGate will automatically fetch the resource content and inject it as context.
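The reference above has the shape `mcp://<server>/<resource-uri>`. The split can be illustrated with a hypothetical parser — the authoritative URI grammar is defined by the MCP documentation, not this sketch:

```python
def parse_mcp_uri(uri: str):
    """Split mcp://<server>/<resource-uri> into (server, resource URI).

    Illustrative only: the exact grammar is defined by LlamaGate's
    MCP documentation.
    """
    if not uri.startswith("mcp://"):
        raise ValueError(f"not an MCP resource URI: {uri}")
    rest = uri[len("mcp://"):]
    # The first path segment names the MCP server; the remainder is
    # the resource URI understood by that server.
    server, _, resource = rest.partition("/")
    return server, resource

print(parse_mcp_uri("mcp://filesystem/file:///docs/readme.txt"))
# ('filesystem', 'file:///docs/readme.txt')
```

Here `filesystem` selects the configured MCP server, and `file:///docs/readme.txt` is the resource URI that server resolves.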
LlamaGate Core is Open Source
This repository contains the core LlamaGate functionality:
- OpenAI-compatible API gateway
- MCP client support
- Caching, authentication, rate limiting
- Basic tool execution
Advanced Features (Separate Modules)
The following features are not included in this open-source core and are available as separate modules:
- Advanced workflow automation packs
- Enterprise connectors and integrations
- Cloud fallback capabilities
- Compatibility validation suites
- Premium support and consulting
These advanced features are maintained separately and are not part of this repository.
Issue: When using PowerShell with $ErrorActionPreference = "Stop", git clone may fail even though the clone succeeds. This happens because Git writes progress messages to stderr, which PowerShell treats as errors.
Symptoms:
- Script fails with an error even though `git clone` completes successfully
- An error message appears, but the repository is actually cloned
Solution (PowerShell):
Option 1: Temporarily change error action (Recommended)
# Save current setting
$oldErrorAction = $ErrorActionPreference

# Temporarily allow errors during git clone
$ErrorActionPreference = "Continue"

# Clone the repository (replace with your repo URL if you host elsewhere)
git clone <your-llamagate-repo-url>.git

# Restore original setting
$ErrorActionPreference = $oldErrorAction

Option 2: Check exit code instead
# Clone and check exit code
git clone <your-llamagate-repo-url>.git 2>&1 | Out-Null
if ($LASTEXITCODE -ne 0) {
    Write-Error "Git clone failed with exit code $LASTEXITCODE"
    exit 1
}

Option 3: Redirect stderr
# Redirect stderr to null (suppress progress messages)
git clone <your-llamagate-repo-url>.git 2>$null

Note: This is a known Git behavior - progress messages go to stderr even on success. The installers handle this automatically, but manual git clone commands in PowerShell scripts need this workaround.
If the binary installer fails because binaries aren't available yet:
- Use the source installer instead (see Installation section)
- Or wait for binaries to be published to releases
Make the binary executable:
chmod +x llamagate

- Make sure you're in the directory where the binary was installed
- Or add the directory to your PATH
- Or use the full path: `/path/to/llamagate`
If you need a different architecture than what's available:
- Build from source (see Installation section)
- The installers automatically detect your platform
- ✅ Windows (amd64)
- ✅ Linux (amd64, arm64)
- ✅ macOS (amd64, arm64)
- ✅ Ollama - Fully supported (primary backend)
- ❌ Direct OpenAI API - Not included (use OpenAI SDK directly)
- ❌ Other LLM providers - Not included in core
- ✅ stdio transport - Fully implemented
- ⚠️ SSE transport - Interface prepared, implementation pending
- ✅ Tool execution - Multi-round loops supported
- ✅ Security guardrails - Allow/deny lists, timeouts, size limits
- HTTPS/TLS - Native HTTPS support available via `TLS_ENABLED`, `TLS_CERT_FILE`, and `TLS_KEY_FILE` configuration. For production with Let's Encrypt, a reverse proxy (nginx, Caddy) is still recommended for automatic certificate management.
- In-memory cache only - Cache is lost on restart (persistent cache not included in core)
- Global rate limiting - Per-IP rate limiting not included in core
- No cloud fallback - Core is designed for local Ollama instances only
- Single binary deployment - No built-in clustering or load balancing
- Single instance per machine - Only one LlamaGate instance should run per machine. Multiple applications can connect to the same instance. If you try to start a second instance, you'll get a clear error message indicating the port is already in use.
- Database persistence (cache, logs, etc.)
- Multi-tenant isolation
- Advanced monitoring/observability dashboards
- Enterprise SSO/authentication providers
- High-availability/clustering features
.
├── cmd/
│ └── llamagate/
│ └── main.go # Entry point
├── internal/
│ ├── config/
│ │ └── config.go # Configuration management
│ ├── logger/
│ │ └── logger.go # Logger initialization
│ ├── cache/
│ │ └── cache.go # In-memory cache
│ ├── mcpclient/
│ │ ├── client.go # MCP client implementation
│ │ ├── stdio.go # stdio transport
│ │ ├── sse.go # SSE transport (stub)
│ │ ├── types.go # MCP protocol types
│ │ └── errors.go # MCP errors
│ ├── tools/
│ │ ├── manager.go # Tool registry and management
│ │ ├── mapper.go # MCP to OpenAI format conversion
│ │ ├── guardrails.go # Security and limits
│ │ └── types.go # Tool types
│ ├── middleware/
│ │ ├── auth.go # Authentication middleware
│ │ ├── rate_limit.go # Rate limiting middleware
│ │ └── request_id.go # Request ID middleware
│ └── proxy/
│ ├── proxy.go # Proxy handlers
│ └── tool_loop.go # Tool execution loop
├── docs/
│ ├── MCP.md # MCP documentation
│ └── MCP_QUICKSTART.md # MCP quick start guide
├── mcp-config.example.yaml # MCP configuration example
├── Dockerfile
├── go.mod
├── go.sum
└── README.md
MIT
Contributions are welcome! Please feel free to submit a Pull Request.