LlamaGate is a lean, production-ready, OpenAI-compatible API gateway for local LLMs (Ollama). It lets you point existing OpenAI SDKs (Python, Node, etc.) at local models as a drop-in replacement, with streaming, tool/function calling (via MCP), authentication, rate limiting, caching, and structured logging.
🚀 New to LlamaGate? Quick Start Guide — Get running in 2 minutes.
- ✅ OpenAI-Compatible API: Drop-in replacement for OpenAI API endpoints
- ✅ Streaming Chat Completions: Full support for Server-Sent Events (SSE) streaming
- ✅ Tool / Function Calling: Execute MCP tools in multi-round loops with safety limits (round limits, call limits, timeouts, size limits, allow/deny lists)
- ✅ Authentication: Optional API key authentication via headers
- ✅ Rate Limiting: Configurable rate limiting using leaky bucket algorithm
- ✅ Request Correlation & Structured Logging: JSON logging with request IDs using Zerolog
- ✅ Caching: In-memory caching for identical prompts to reduce Ollama load
- ✅ MCP Client Support: Connect to MCP servers and expose their tools to models (MCP Guide | Quick Start)
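The rate limiter listed above follows the leaky bucket model: each request adds to a bucket that drains at a fixed rate, and requests that would overflow it are rejected. As a toy illustration of the idea (LlamaGate's actual limiter is implemented in Go; the numbers here are made up):

```python
import time

class LeakyBucket:
    """Toy leaky-bucket limiter: the bucket drains at `rps` requests/second."""

    def __init__(self, rps: float, burst: int):
        self.rps = rps            # drain rate
        self.burst = burst        # bucket capacity
        self.level = 0.0          # current fill level
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain for the elapsed time, then try to add this request.
        self.level = max(0.0, self.level - (now - self.last) * self.rps)
        self.last = now
        if self.level + 1 <= self.burst:
            self.level += 1
            return True
        return False  # bucket full: the gateway would answer HTTP 429

bucket = LeakyBucket(rps=50, burst=10)
print(bucket.allow())  # True while under the limit
```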
Note: Agentic modules and the extension system (workflows, middleware, observability) were removed in Phase 1. LlamaGate is now a core-only OpenAI-compatible gateway. See Core Contract and Phase 1 Removal.
- 📖 Quick Start Guide - Get running in 2 minutes
- 📚 Full Documentation Index - Browse all documentation
- 🔧 MCP Integration - Model Context Protocol guide
- 🚀 MCP Quick Start - Get started with MCP in 5 minutes
- 🎯 MCP Demo Guide - Full demo with multiple servers
- 🌐 MCP HTTP API - Complete API reference for MCP management
- 📋 Core Contract - Core endpoints and config (post–Phase 1)
- 🧪 Testing Guide - Testing your setup
- 📦 Installation Guide - Detailed installation instructions
- ✅ Manual Acceptance Test - Comprehensive acceptance test checklist for human verification
- 🎯 MCP Examples - See MCP Demo Guide and MCP Quick Start.
Copy and paste one command - it downloads the installer and runs it automatically!
This method downloads a pre-built binary (no Go required):
Windows (PowerShell):
If you have the repo locally, run install\windows\install-binary.ps1. Otherwise build from source (see Method 3).
Unix/Linux/macOS:
If you have the repo locally, run install/unix/install-binary.sh. Otherwise build from source (see Method 3).
What happens:

- Downloads the pre-built binary for your platform (or you build from source; see Method 3)
- Places the executable on your system
- Creates a `.env` configuration file if one is missing
That's it! You're ready to run LlamaGate.
If you've already cloned the repository, you can run the installer directly:
Binary installer (downloads pre-built binary):
Windows:

```cmd
install\windows\install-binary.cmd
```

Unix/Linux/macOS:

```bash
chmod +x install/unix/install-binary.sh
./install/unix/install-binary.sh
```

Source installer (builds from source):

Windows:

```cmd
install\windows\install.cmd
```

Unix/Linux/macOS:

```bash
chmod +x install/unix/install.sh
./install/unix/install.sh
```

The source installer will:
- ✅ Check for Go and install it if needed
- ✅ Check for Ollama and guide you to install it
- ✅ Install all Go dependencies
- ✅ Build the LlamaGate binary from source
- ✅ Automatically create a `.env` configuration file (from `.env.example` or with defaults)
If you need to build from source, you have two options:
Option A: One-Line Command (Downloads Source Installer)
Windows (PowerShell):

```powershell
# From repo root:
.\install\windows\install.ps1
```

Unix/Linux/macOS:

```bash
# From repo root:
./install/unix/install.sh
```

This downloads and runs the source installer, which handles Go installation and builds from source.
Option B: Manual Build (If You Have Go Installed)
Before building the main binary, make sure a full build of all packages (`go build ./...`) succeeds; downstream tooling (CI, E2E, forked automation) builds LlamaGate from source and depends on it. Then build the main binary:
Unix/Linux/macOS:

```bash
# Clone the repository (replace with your repo URL if you host elsewhere)
git clone <your-llamagate-repo-url>.git
cd LlamaGate

# Build all packages (required for CI/E2E/build-from-source integrators)
go build ./...

# Build main binary
go build -o llamagate ./cmd/llamagate

# Or install to $GOPATH/bin
go install ./cmd/llamagate
```

Windows (PowerShell):
```powershell
# Clone the repository (handle stderr output)
$ErrorActionPreference = "Continue"  # Git writes progress to stderr
git clone <your-llamagate-repo-url>.git
$ErrorActionPreference = "Stop"  # Restore if needed
cd LlamaGate

# Build all packages (required for CI/E2E/build-from-source integrators)
go build ./...

# Build main binary
go build -o llamagate.exe ./cmd/llamagate

# Or install to $GOPATH/bin
go install ./cmd/llamagate
```

Note: Git writes progress messages to stderr even on success. In PowerShell with `$ErrorActionPreference = "Stop"`, this can cause spurious failures. See the Troubleshooting section below for details.
Build and run LlamaGate in a container. The image does not include Ollama; set OLLAMA_HOST to your Ollama instance (host, another container, or service URL).
Build:

```bash
docker build -t llamagate .
```

Run (Ollama on host):

```bash
# Windows/macOS: host.docker.internal resolves to the host out of the box
docker run -p 11435:11435 -e OLLAMA_HOST=http://host.docker.internal:11434 llamagate

# Linux: map host.docker.internal to the host gateway (Docker 20.10+)
docker run --add-host=host.docker.internal:host-gateway -p 11435:11435 -e OLLAMA_HOST=http://host.docker.internal:11434 llamagate
```

Run with an optional API key:

```bash
docker run -p 11435:11435 -e OLLAMA_HOST=http://host.docker.internal:11434 -e API_KEY=sk-your-key llamagate
```

Key env vars: `OLLAMA_HOST` (required if Ollama is not on localhost), `PORT` (default 11435), `API_KEY` (optional), `MCP_ENABLED` (optional). See Configuration for details and `.env.example` for all supported env vars.
Run with Ollama in one command: docker compose up — see below.
From the repo root, run both LlamaGate and Ollama with one command:
```bash
docker compose up -d
```

- LlamaGate: http://localhost:11435
- Ollama: http://localhost:11434 (pull models with `ollama pull llama2` or use the UI)

To rebuild LlamaGate after code changes: `docker compose up -d --build`. See `docker-compose.yml` for env overrides (e.g. `API_KEY`).
Note: This section enhances LlamaGate's existing installation methods by adding a developer-focused one-command workflow. It complements (does not replace) the existing binary installer, source installer, and manual build methods documented above.
For developers integrating LlamaGate into projects, the one-command setup process provides a single command that automates the complete development workflow:
- ✅ Validates environment (Go, ports) - Catches issues before build
- ✅ LlamaGate auto-starts Ollama if not running - Built into LlamaGate application
- ✅ Auto-clones LlamaGate if missing (standardized sibling directory)
- ✅ Smart build - Only rebuilds if source is newer than binary
- ✅ Auto-starts LlamaGate - No manual start needed
- ✅ Verifies it's running - Health check confirmation
This enhances existing methods by:
- Adding developer workflow automation (complements installation methods)
- Providing standardized directory structure guidance
- Enabling smart rebuilds (only when needed)
- Automating the complete setup-to-running workflow
For integration projects, use this recommended structure for consistency:
```
YourProjectParent/
├── LlamaGate/      # ← Clone LlamaGate here (sibling directory)
└── YourProject/    # ← Your application
```
Why sibling directory?
- Consistent across all integration projects
- Easy to reference with relative paths
- Works well with version control
- Standard practice for integration workflows
Windows PowerShell:
Save this script to your project (e.g., scripts/setup-llamagate-dev.ps1):
# One-Command LlamaGate Development Setup
# Standardized process: Validate → Clone (if needed) → Build → Start → Verify
# Based on community best practices for LlamaGate integration
param(
[string]$LlamaGatePath = "..\LlamaGate",
[int]$LlamaGatePort = 11435,
[switch]$SkipClone,
[switch]$SkipBuild
)
$ErrorActionPreference = "Stop"
Write-Host ""
Write-Host "========================================" -ForegroundColor Cyan
Write-Host "LlamaGate One-Command Development Setup" -ForegroundColor Cyan
Write-Host "========================================" -ForegroundColor Cyan
Write-Host ""
# Step 1: Environment Validation
Write-Host "[1/6] Validating environment..." -ForegroundColor Yellow
# Check Go
try {
$goVersion = go version 2>&1
if ($LASTEXITCODE -ne 0) {
Write-Host " ✗ Go is not installed" -ForegroundColor Red
Write-Host " Please install Go 1.19+ from: https://go.dev/dl/" -ForegroundColor Yellow
exit 1
}
Write-Host " ✓ Go: $goVersion" -ForegroundColor Green
} catch {
Write-Host " ✗ Go is not installed" -ForegroundColor Red
Write-Host " Please install Go 1.19+ from: https://go.dev/dl/" -ForegroundColor Yellow
exit 1
}
# Note: LlamaGate will automatically start Ollama if not running
Write-Host " Note: LlamaGate will auto-start Ollama if needed" -ForegroundColor Gray
# Step 2: Check if LlamaGate is already running
Write-Host "[2/6] Checking if LlamaGate is already running..." -ForegroundColor Yellow
try {
$response = Invoke-WebRequest -Uri "http://localhost:$LlamaGatePort/health" -Method GET -TimeoutSec 2 -ErrorAction SilentlyContinue
if ($response.StatusCode -eq 200) {
Write-Host " ✓ LlamaGate is already running" -ForegroundColor Green
Write-Host ""
Write-Host "LlamaGate is ready!" -ForegroundColor Green
Write-Host "URL: http://localhost:$LlamaGatePort" -ForegroundColor Cyan
exit 0
}
} catch {
Write-Host " LlamaGate is not running" -ForegroundColor Yellow
}
# Step 3: Find or Clone LlamaGate (Standardized: Sibling Directory)
Write-Host "[3/6] Locating LlamaGate source..." -ForegroundColor Yellow
# Standardized approach: Primary is sibling directory
$siblingPath = Resolve-Path "..\LlamaGate" -ErrorAction SilentlyContinue
$foundPath = $null
if ($siblingPath -and (Test-Path (Join-Path $siblingPath "cmd\llamagate"))) {
$foundPath = $siblingPath
Write-Host " ✓ Found at sibling directory: $foundPath" -ForegroundColor Green
} else {
# Check environment variable override
$envPath = $env:LLAMAGATE_PATH
if ($envPath -and (Test-Path (Join-Path $envPath "cmd\llamagate"))) {
$foundPath = Resolve-Path $envPath
Write-Host " ✓ Found via LLAMAGATE_PATH: $foundPath" -ForegroundColor Green
} else {
if ($SkipClone) {
Write-Host " ✗ LlamaGate source not found" -ForegroundColor Red
Write-Host " Expected location: $(Resolve-Path ".." -ErrorAction SilentlyContinue)\LlamaGate" -ForegroundColor Yellow
Write-Host " Or set LLAMAGATE_PATH environment variable" -ForegroundColor Yellow
exit 1
} else {
Write-Host " LlamaGate source not found" -ForegroundColor Yellow
Write-Host " Cloning LlamaGate as sibling directory..." -ForegroundColor Yellow
# Clone as sibling directory (standardized)
$parentDir = Resolve-Path ".." -ErrorAction Stop
$clonePath = Join-Path $parentDir "LlamaGate"
if (Test-Path $clonePath) {
Write-Host " ✗ Directory already exists: $clonePath" -ForegroundColor Red
Write-Host " Please remove it or use -SkipClone to skip cloning" -ForegroundColor Yellow
exit 1
}
Push-Location $parentDir
try {
Write-Host " Cloning from GitHub..." -ForegroundColor Gray
git clone <your-llamagate-repo-url>.git
if ($LASTEXITCODE -ne 0) {
Write-Host " ✗ Clone failed" -ForegroundColor Red
Pop-Location
exit 1
}
$foundPath = Resolve-Path "LlamaGate"
Write-Host " ✓ Cloned successfully to: $foundPath" -ForegroundColor Green
} catch {
Write-Host " ✗ Clone failed: $_" -ForegroundColor Red
Pop-Location
exit 1
} finally {
Pop-Location
}
}
}
}
# Step 4: Check Port Availability
Write-Host "[4/6] Checking port availability..." -ForegroundColor Yellow
try {
$tcpClient = New-Object System.Net.Sockets.TcpClient
$asyncResult = $tcpClient.BeginConnect("localhost", $LlamaGatePort, $null, $null)
$wait = $asyncResult.AsyncWaitHandle.WaitOne(500, $false)
if ($wait) {
$tcpClient.EndConnect($asyncResult)
$tcpClient.Close()
Write-Host " ✗ Port $LlamaGatePort is already in use" -ForegroundColor Red
Write-Host " Please stop the process using this port or use a different port" -ForegroundColor Yellow
exit 1
}
} catch {
# Port is free (connection failed is expected)
}
Write-Host " ✓ Port $LlamaGatePort is available" -ForegroundColor Green
# Step 5: Build LlamaGate
if (-not $SkipBuild) {
Write-Host "[5/6] Building LlamaGate from source..." -ForegroundColor Yellow
Push-Location $foundPath
try {
# Check if binary exists and is newer than source
$binaryPath = Join-Path $foundPath "llamagate.exe"
$needsBuild = $true
if (Test-Path $binaryPath) {
$binaryTime = (Get-Item $binaryPath).LastWriteTime
$sourceTime = (Get-ChildItem -Path (Join-Path $foundPath "cmd\llamagate") -Recurse -File |
Measure-Object -Property LastWriteTime -Maximum).Maximum
if ($binaryTime -gt $sourceTime) {
Write-Host " Binary is up to date, skipping build" -ForegroundColor Gray
$needsBuild = $false
}
}
if ($needsBuild) {
Write-Host " Building (this may take a few minutes)..." -ForegroundColor Gray
$buildOutput = go build -o llamagate.exe ./cmd/llamagate 2>&1
if ($LASTEXITCODE -ne 0) {
Write-Host " ✗ Build failed" -ForegroundColor Red
Write-Host $buildOutput -ForegroundColor Red
Pop-Location
exit 1
}
if (-not (Test-Path "llamagate.exe")) {
Write-Host " ✗ Binary not found after build" -ForegroundColor Red
Pop-Location
exit 1
}
Write-Host " ✓ Build successful" -ForegroundColor Green
}
} catch {
Write-Host " ✗ Build error: $_" -ForegroundColor Red
Pop-Location
exit 1
} finally {
Pop-Location
}
} else {
Write-Host "[5/6] Skipping build (requested)" -ForegroundColor Yellow
}
# Step 6: Start LlamaGate
Write-Host "[6/6] Starting LlamaGate..." -ForegroundColor Yellow
Push-Location $foundPath
try {
# Ensure .env exists with default configuration
$envFile = Join-Path $foundPath ".env"
$envExampleFile = Join-Path $foundPath ".env.example"
if (-not (Test-Path $envFile)) {
Write-Host " Creating .env with default configuration..." -ForegroundColor Gray
if (Test-Path $envExampleFile) {
# Copy from .env.example
Copy-Item $envExampleFile $envFile
Write-Host " ✓ Created .env from .env.example" -ForegroundColor Green
} else {
# Create with default values
$defaultEnv = @"
# LlamaGate Configuration
# Generated by one-command setup with default values
# Ollama server URL
OLLAMA_HOST=http://localhost:11434
# API key for authentication (leave empty to disable authentication)
API_KEY=
# Rate limit (requests per second)
RATE_LIMIT_RPS=50
# Enable debug logging (true/false)
DEBUG=false
# Server port
PORT=$LlamaGatePort
# Log file path (leave empty to log only to console)
LOG_FILE=
# HTTP client timeout for Ollama requests (e.g., 5m, 30s, 30m - max 30 minutes)
TIMEOUT=5m
"@
$defaultEnv | Out-File -FilePath $envFile -Encoding UTF8 -NoNewline
Write-Host " ✓ Created .env with default configuration" -ForegroundColor Green
}
}
$binaryPath = Join-Path $foundPath "llamagate.exe"
if (-not (Test-Path $binaryPath)) {
Write-Host " ✗ Binary not found: $binaryPath" -ForegroundColor Red
Write-Host " Please build first or remove -SkipBuild flag" -ForegroundColor Yellow
Pop-Location
exit 1
}
Write-Host " Starting process..." -ForegroundColor Gray
Write-Host " Note: Windows may prompt for authorization (firewall/UAC)" -ForegroundColor Cyan
Write-Host " Please approve the prompt if it appears" -ForegroundColor Cyan
$process = Start-Process -FilePath ".\llamagate.exe" -PassThru -WindowStyle Normal
# Wait for LlamaGate to be ready
$maxWait = 30
$waited = 0
$started = $false
Write-Host " Waiting for LlamaGate to be ready..." -ForegroundColor Gray -NoNewline
while ($waited -lt $maxWait) {
Start-Sleep -Seconds 1
$waited++
Write-Host "." -ForegroundColor Gray -NoNewline
try {
$testResponse = Invoke-WebRequest -Uri "http://localhost:$LlamaGatePort/health" -Method GET -TimeoutSec 1 -ErrorAction SilentlyContinue
if ($testResponse.StatusCode -eq 200) {
$started = $true
Write-Host ""
Write-Host " ✓ LlamaGate started successfully" -ForegroundColor Green
Write-Host " PID: $($process.Id)" -ForegroundColor Cyan
break
}
} catch {
continue
}
}
if (-not $started) {
Write-Host ""
Write-Host " ✗ LlamaGate failed to start within $maxWait seconds" -ForegroundColor Red
Stop-Process -Id $process.Id -Force -ErrorAction SilentlyContinue
Pop-Location
exit 1
}
} catch {
Write-Host " ✗ Error starting LlamaGate: $_" -ForegroundColor Red
Pop-Location
exit 1
} finally {
Pop-Location
}
# Success
Write-Host ""
Write-Host "========================================" -ForegroundColor Green
Write-Host "LlamaGate is ready for development!" -ForegroundColor Green
Write-Host "========================================" -ForegroundColor Green
Write-Host ""
Write-Host "PID: $($process.Id)" -ForegroundColor Cyan
Write-Host "URL: http://localhost:$LlamaGatePort" -ForegroundColor Cyan
Write-Host "Health: http://localhost:$LlamaGatePort/health" -ForegroundColor Cyan
Write-Host ""
Write-Host "To stop: Stop-Process -Id $($process.Id)" -ForegroundColor Gray
Write-Host ""
exit 0

Then run from your project directory:

```powershell
.\scripts\setup-llamagate-dev.ps1
```

Unix/macOS:
[Unix script if available, or instructions to adapt]
Use one-command setup when:
- Integrating LlamaGate into your project
- Development workflow automation
- Standardized setup process
- Frequent rebuilds during development
Use installation methods when:
- End-user installation
- Production deployment
- Quick setup without Go
- One-time installation
Note: This one-command process enhances LlamaGate's existing installation methods. For end-user installation, use the binary or source installers documented above.
For Windows users, convenient batch files are provided:
- `scripts/windows/run.cmd` - Run with default settings (no authentication)
- `scripts/windows/run-with-auth.cmd` - Run with API key authentication enabled
- `scripts/windows/run-debug.cmd` - Run with debug logging enabled
- `scripts/windows/build.cmd` - Build the binary (`llamagate.exe`)
Run from command prompt:
```cmd
scripts\windows\run.cmd
```

To customize settings, edit the batch file or set environment variables before running:

```cmd
set OLLAMA_HOST=http://localhost:11434
set API_KEY=sk-llamagate
set RATE_LIMIT_RPS=20
scripts\windows\run.cmd
```

LlamaGate can be configured via:
- `.env` file (recommended for development) - Create a `.env` file in the project root
- Environment variables - Take precedence over `.env` file values
- Default values - Used if neither `.env` nor environment variables are set
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `API_KEY` | (empty) | API key for authentication (optional) |
| `RATE_LIMIT_RPS` | `50` | Requests per second limit |
| `DEBUG` | `false` | Enable debug logging |
| `PORT` | `11435` | Server port |
| `LOG_FILE` | (empty) | Path to log file (optional, logs to console if empty) |
| `TLS_ENABLED` | `false` | Enable HTTPS/TLS |
| `TLS_CERT_FILE` | (empty) | Path to TLS certificate file (required if `TLS_ENABLED=true`) |
| `TLS_KEY_FILE` | (empty) | Path to TLS private key file (required if `TLS_ENABLED=true`) |
| `TIMEOUT` | `5m` | HTTP client timeout for Ollama requests (e.g. `5m`, `30s`, `30m`; max 30 minutes) |
| `MCP_ENABLED` | `false` | Enable MCP client functionality (see MCP docs) |
| `MCP_MAX_TOOL_ROUNDS` | `10` | Maximum tool execution rounds |
| `MCP_MAX_TOOL_CALLS_PER_ROUND` | `10` | Maximum tool calls per round |
| `MCP_DEFAULT_TOOL_TIMEOUT` | `30s` | Default timeout for tool execution |
| `MCP_MAX_TOOL_RESULT_SIZE` | `1048576` | Maximum tool result size in bytes (1 MB) |
| `MCP_ALLOW_TOOLS` | (empty) | Comma-separated glob patterns for allowed tools |
| `MCP_DENY_TOOLS` | (empty) | Comma-separated glob patterns for denied tools |
Note: MCP server configuration is best done via YAML/JSON config file. See mcp-config.example.yaml and MCP Documentation.
- Removed endpoints: All `/v1/extensions` routes (GET/PUT/POST list, get, upsert, execute, refresh) and any dynamic extension endpoints. Requests to these paths now return 404.
- Config: `EXTENSIONS_UPSERT_ENABLED` has been removed; remove it from your `.env` or config if present.
- CLI: The `llamagate-cli` tool (import/export/list/remove/enable/disable extensions and agentic modules, migrate, sync) has been removed.
- What remains: Core OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/models`), `/health`, `/v1/hardware/recommendations`, and, when MCP is enabled, all `/v1/mcp/*` endpoints. See Core Contract.
Create a .env file in the project root (copy from .env.example):
```bash
# .env (recommended for documentation examples)
OLLAMA_HOST=http://localhost:11434
API_KEY=sk-llamagate
RATE_LIMIT_RPS=50
DEBUG=false
PORT=11435
LOG_FILE=
TIMEOUT=5m
```

Note: Set `API_KEY=sk-llamagate` to match the documentation examples. Leave it empty (`API_KEY=`) to disable authentication. For development/testing, you may also want to set:

- `DEBUG=true` (to enable debug logging)
- `LOG_FILE=llamagate.log` (to log to a file)
The .env file is automatically loaded when the application starts. Environment variables set directly will override .env file values, making it easy to override settings for specific runs.
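In other words, resolution order is: built-in defaults, overridden by `.env`, overridden by the process environment. A hypothetical Python sketch of that order (LlamaGate itself does this in Go; the parser here is deliberately minimal):

```python
import os

DEFAULTS = {"PORT": "11435", "RATE_LIMIT_RPS": "50",
            "OLLAMA_HOST": "http://localhost:11434"}

def load_dotenv_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file; comments and blanks are skipped."""
    values = {}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass
    return values

def resolve(key, path=".env"):
    # The process environment wins, then .env, then the built-in default.
    return os.environ.get(key, load_dotenv_file(path).get(key, DEFAULTS.get(key)))

print(resolve("PORT"))  # "11435" unless overridden in .env or the environment
```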
When API_KEY is configured, all API endpoints (except /health) require authentication.
LlamaGate supports two authentication header formats:
```bash
curl -H "X-API-Key: sk-llamagate" http://localhost:11435/v1/models
curl -H "Authorization: Bearer sk-llamagate" http://localhost:11435/v1/models
```

The `X-API-Key` header is checked first. If not present, `Authorization: Bearer` is checked.
Note: The /health endpoint does not require authentication and can be used for monitoring and load balancer health checks.
Note: These are testing/development values. For production, use defaults or configure via .env file.
```bash
export OLLAMA_HOST="http://localhost:11434"
export API_KEY="sk-llamagate"
export RATE_LIMIT_RPS=20
export DEBUG=true
export PORT=11435
llamagate
```

Note: These are testing/development values. For production, use the defaults or configure via a `.env` file.
```cmd
set OLLAMA_HOST=http://localhost:11434
set API_KEY=sk-llamagate
set RATE_LIMIT_RPS=20
set DEBUG=true
set PORT=11435
llamagate.exe
```

Or use the provided batch files (see Windows Quick Start above).
Note: If you use a .env file, you don't need to set environment variables manually - just create .env and run the application!
LlamaGate supports two authentication header formats (both are case-insensitive):
```bash
curl -H "X-API-Key: sk-llamagate" http://localhost:11435/v1/models
```

The header name is case-insensitive. All of the following are accepted:

- `X-API-Key`
- `x-api-key`
- `X-Api-Key`
- Any other case variation

```bash
curl -H "Authorization: Bearer sk-llamagate" http://localhost:11435/v1/models
```

The "Bearer" scheme is case-insensitive. All of the following are accepted:

- `Authorization: Bearer sk-llamagate`
- `Authorization: bearer sk-llamagate`
- `Authorization: BEARER sk-llamagate`
Header Priority: The X-API-Key header is checked first. If not present, Authorization: Bearer is checked.
- When Authentication is Required: All endpoints except `/health` require authentication when `API_KEY` is configured.
- When Authentication is Missing: Requests without a valid authentication header return `401 Unauthorized` with an OpenAI-compatible error response.
- When Authentication is Invalid: Requests with an invalid or incorrect API key return `401 Unauthorized` with an OpenAI-compatible error response.
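The header lookup order described above can be sketched as a tiny resolver (illustrative only; LlamaGate's real middleware is written in Go):

```python
def extract_api_key(headers):
    """Return the client key: X-API-Key first, then Authorization: Bearer.

    Header names and the Bearer scheme are matched case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-api-key" in lowered:
        return lowered["x-api-key"]
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token:
        return token.strip()
    return None  # no credentials supplied -> 401 if API_KEY is configured

print(extract_api_key({"X-Api-Key": "sk-llamagate"}))  # sk-llamagate
```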
Authentication errors return HTTP 401 Unauthorized with a JSON response in OpenAI-compatible format:
```json
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "request_id": "req-123456"
  }
}
```

- API keys are never logged: Authentication failures are logged with a generic message ("Authentication failed"); the provided API key or bearer token is never included in logs.
- Constant-time comparison: API key validation uses constant-time comparison to prevent timing attacks.
- Health endpoint bypass: The `/health` endpoint does not require authentication and can be used for monitoring and load balancer health checks.
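"Constant-time" means the comparison takes the same time whether the first or the last byte differs, so response latency leaks nothing about the configured key. Python's standard-library equivalent is `hmac.compare_digest` (in Go one would typically use `crypto/subtle`):

```python
import hmac

def key_matches(provided: str, configured: str) -> bool:
    # compare_digest's runtime does not depend on where the inputs differ
    return hmac.compare_digest(provided.encode(), configured.encode())

print(key_matches("sk-llamagate", "sk-llamagate"))  # True
print(key_matches("sk-wrong-key", "sk-llamagate"))  # False
```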
💡 Migrating from OpenAI? See the Quick Start Guide for step-by-step migration examples.
🔧 Using MCP Tools? See the MCP Quick Start Guide to get started with MCP integration. For complete details, see the MCP Documentation.
🎯 Want to see MCP in action? Check out the MCP Demo QuickStart for a complete example with multiple document processing servers.
📚 Looking for more examples? Check out our example repositories:
- Use the OpenAI SDK with `base_url` pointing at your LlamaGate instance (see API).
- More example repositories coming soon: MCP examples
All examples below assume:
- LlamaGate running locally on `http://localhost:11435` (default port)
- Ollama running locally on `http://localhost:11434` (default port)
- Default configuration (no authentication unless specified)
💡 Model Names: Examples use `"mistral"` (Mistral 7B) as the default; it works on most business hardware (8GB VRAM or CPU). See our Top 5 Model Recommendations for other options. Check available models with: `curl http://localhost:11435/v1/models`

⚠️ Production Note: Always specify models explicitly in production code. Examples use `"mistral"` for demonstration purposes only.
```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

Response:
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "mistral",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! I'm doing well, thank you for asking..."
    },
    "finish_reason": "stop"
  }]
}
```

```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [
      {"role": "user", "content": "Tell me a short story"}
    ],
    "stream": true
  }'
```

Response (Server-Sent Events):
```
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"mistral","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1234567890,"model":"mistral","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}

data: [DONE]
```
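If you are not using an SDK, the stream above can be consumed directly: read the body line by line, strip the `data: ` prefix, parse the JSON chunk, and stop at `[DONE]`. A minimal standard-library sketch (assumes LlamaGate on the default port):

```python
import json
import urllib.request

def parse_sse_line(line):
    """Return the content delta carried by one SSE line, or None."""
    if not line.startswith("data: "):
        return None  # blank keep-alive line or comment
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None  # end of stream
    delta = json.loads(payload)["choices"][0].get("delta", {})
    return delta.get("content")

def stream_chat(prompt, base_url="http://localhost:11435"):
    """Yield content tokens from a streaming chat completion."""
    body = json.dumps({
        "model": "mistral",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            token = parse_sse_line(raw.decode().strip())
            if token:
                yield token

# Usage (needs a running LlamaGate):
# for token in stream_chat("Tell me a short story"):
#     print(token, end="", flush=True)
```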
Point the OpenAI Python SDK to LlamaGate using a custom base_url:
```python
from openai import OpenAI

# Configure client to use LlamaGate instead of OpenAI
client = OpenAI(
    base_url="http://localhost:11435/v1",  # LlamaGate endpoint
    api_key="not-needed"  # Only needed if API_KEY is set in LlamaGate
)

# Use it exactly like the OpenAI API
response = client.chat.completions.create(
    model="mistral",  # Use any model available in your local Ollama
    messages=[
        {"role": "user", "content": "Hello! How are you?"}
    ]
)

print(response.choices[0].message.content)
```

Streaming with Python SDK:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="not-needed"
)

stream = client.chat.completions.create(
    model="mistral",
    messages=[
        {"role": "user", "content": "Count to 5"}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

Point the OpenAI Node.js SDK to LlamaGate using a custom `baseURL`:
```javascript
import OpenAI from 'openai';

// Configure client to use LlamaGate instead of OpenAI
const client = new OpenAI({
  baseURL: 'http://localhost:11435/v1', // LlamaGate endpoint
  apiKey: 'not-needed' // Only needed if API_KEY is set in LlamaGate
});

// Use it exactly like the OpenAI API
const response = await client.chat.completions.create({
  model: 'mistral', // Use any model available in your local Ollama
  messages: [
    { role: 'user', content: 'Hello! How are you?' }
  ]
});

console.log(response.choices[0].message.content);
```

Streaming with Node.js SDK:
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11435/v1',
  apiKey: 'not-needed'
});

const stream = await client.chat.completions.create({
  model: 'mistral',
  messages: [
    { role: 'user', content: 'Count to 5' }
  ],
  stream: true
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    process.stdout.write(chunk.choices[0].delta.content);
  }
}
```

If you've set `API_KEY` in your LlamaGate configuration, include it in requests:
curl with authentication:
```bash
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-llamagate" \
  -d '{
    "model": "mistral",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

Python SDK with authentication:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="sk-llamagate"  # Your API_KEY from the LlamaGate config
)

response = client.chat.completions.create(
    model="mistral",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
```

Node.js SDK with authentication:
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11435/v1',
  apiKey: 'sk-llamagate' // Your API_KEY from the LlamaGate config
});

const response = await client.chat.completions.create({
  model: 'mistral',
  messages: [
    { role: 'user', content: 'Hello!' }
  ]
});

console.log(response.choices[0].message.content);
```

Note: Authentication is optional. If `API_KEY` is not set in LlamaGate, you can omit the `api_key` parameter or use any value.
📚 For SDK usage, point your OpenAI client at your LlamaGate URL; see API and Testing.
```bash
curl http://localhost:11435/health
curl http://localhost:11435/v1/models
```

```python
from langchain_openai import ChatOpenAI

# Use ChatOpenAI with the LlamaGate endpoint
llm = ChatOpenAI(
    model="mistral",  # Default: Mistral 7B (CPU-only or 8GB VRAM)
    base_url="http://localhost:11435/v1",  # Use base_url instead of openai_api_base
    api_key="not-needed"  # Only if API_KEY is set in LlamaGate
)

response = llm.invoke("Hello, how are you?")
print(response.content)
```

Note: This example uses `langchain_openai` (LangChain v0.1+). For older versions, use `from langchain.chat_models import ChatOpenAI` and the `openai_api_base` parameter.
Handle errors gracefully in your applications:
```python
from openai import OpenAI, APIError

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="not-needed"
)

try:
    response = client.chat.completions.create(
        model="mistral",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)
except APIError as e:
    print(f"API Error: {e.status_code} - {e.message}")
    if e.status_code == 401:
        print("Authentication failed. Check your API key.")
    elif e.status_code == 429:
        print("Rate limit exceeded. Please retry later.")
except Exception as e:
    print(f"Unexpected error: {e}")
```

Use MCP tools for extended capabilities:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="not-needed"
)

# Request with tool calling enabled
response = client.chat.completions.create(
    model="mistral",
    messages=[
        {"role": "user", "content": "What files are in the /tmp directory?"}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "mcp.filesystem.list_files",
            "description": "List files in a directory"
        }
    }],
    tool_choice="auto"  # Let the model decide when to use tools
)

# Handle the response
message = response.choices[0].message
print(f"Content: {message.content}")

# Check for tool calls
if message.tool_calls:
    for tool_call in message.tool_calls:
        print(f"Tool: {tool_call.function.name}")
        print(f"Arguments: {tool_call.function.arguments}")
```

Note: MCP must be enabled and configured in LlamaGate for tool calling to work. See MCP Quick Start for setup instructions.
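When MCP is enabled, LlamaGate's tool loop normally executes MCP tools server-side. If you run a tool yourself instead, the standard OpenAI pattern applies: echo the assistant turn, append one `tool` message per call, then ask the model again. A sketch of the message bookkeeping (the tool name and result here are illustrative):

```python
import json

def tool_result_messages(assistant_turn, results):
    """Build follow-up messages answering each tool call with its result.

    `assistant_turn` is the assistant message (as a dict) holding `tool_calls`;
    `results` maps tool_call id -> the Python value your tool produced.
    """
    followups = [assistant_turn]  # the assistant turn must be echoed first
    for call in assistant_turn.get("tool_calls", []):
        followups.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["id"]]),
        })
    return followups

assistant_turn = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "mcp.filesystem.list_files",
                     "arguments": '{"path": "/tmp"}'},
    }],
}
followups = tool_result_messages(assistant_turn, {"call_1": ["a.txt", "b.log"]})
print(followups[1]["role"])  # tool

# Append `followups` to the original messages list and call
# client.chat.completions.create(...) again for the model's final answer.
```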
Configure the OpenAI client using environment variables:
```python
import os
from openai import OpenAI

# Set environment variables (or use a .env file)
os.environ["OPENAI_BASE_URL"] = "http://localhost:11435/v1"
os.environ["OPENAI_API_KEY"] = os.getenv("LLAMAGATE_API_KEY", "not-needed")

# The client automatically picks up these environment variables
client = OpenAI()

response = client.chat.completions.create(
    model="mistral",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```

Environment Variables:

- `OPENAI_BASE_URL` - LlamaGate endpoint URL
- `OPENAI_API_KEY` - API key (if authentication is enabled)
For production use, add retries, timeouts, and connection pooling:
```python
import httpx
from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_exponential

# Configure the HTTP client with timeouts and connection pooling
client = OpenAI(
    base_url="http://localhost:11435/v1",
    api_key="not-needed",
    http_client=httpx.Client(
        timeout=httpx.Timeout(30.0, connect=5.0),
        limits=httpx.Limits(max_keepalive_connections=5, max_connections=10)
    )
)

# Add retry logic for transient failures
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def chat_with_retry(messages):
    return client.chat.completions.create(
        model="mistral",
        messages=messages
    )

# Use the retry wrapper
try:
    response = chat_with_retry([
        {"role": "user", "content": "Hello!"}
    ])
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Failed after retries: {e}")
```

Production Best Practices:
- Use connection pooling for better performance
- Set appropriate timeouts (30s default, 5s connect)
- Implement retry logic for transient failures
- Monitor rate limits and adjust request patterns
- Use structured logging for debugging
📚 See more: API reference and Testing guide for request examples.
OpenAI-compatible chat completions endpoint. Forwards requests to Ollama /api/chat.
Request Body:
{
  "model": "mistral",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ],
  "stream": false,
  "temperature": 0.7
}

Lists available Ollama models. Forwards requests to Ollama /api/tags and converts to OpenAI format.
Health check endpoint that verifies both server and Ollama connectivity.
Response (healthy):
{
  "status": "healthy",
  "ollama": "connected",
  "ollama_host": "http://localhost:11434"
}

Response (unhealthy):

{
  "status": "unhealthy",
  "error": "Ollama unreachable",
  "ollama_host": "http://localhost:11434"
}

Returns 200 OK when healthy, 503 Service Unavailable when Ollama is unreachable.
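For monitoring scripts, the health semantics can be sketched as a small client-side check. `is_healthy` is a hypothetical helper, not part of LlamaGate; it simply mirrors the 200/"healthy" contract described above:

```python
import json

def is_healthy(status_code: int, body: str) -> bool:
    """Interpret a LlamaGate /health response: 200 plus status "healthy" means OK."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("status") == "healthy"
    except json.JSONDecodeError:
        return False

# Example payloads mirroring the responses shown above
healthy = '{"status": "healthy", "ollama": "connected", "ollama_host": "http://localhost:11434"}'
unhealthy = '{"status": "unhealthy", "error": "Ollama unreachable", "ollama_host": "http://localhost:11434"}'
print(is_healthy(200, healthy))    # True
print(is_healthy(503, unhealthy))  # False
```

A load balancer would typically treat anything other than a 200 as unhealthy; the JSON body is useful for distinguishing a down gateway from an unreachable Ollama.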
When API_KEY is configured, all API endpoints (except /health) require authentication.
LlamaGate supports two authentication header formats (both are case-insensitive):
curl -H "X-API-Key: sk-llamagate" http://localhost:11435/v1/modelsThe header name is case-insensitive. All of the following are accepted:
X-API-Keyx-api-keyX-Api-Key- Any other case variation
curl -H "Authorization: Bearer sk-llamagate" http://localhost:11435/v1/modelsThe "Bearer" scheme is case-insensitive. All of the following are accepted:
Authorization: Bearer sk-llamagateAuthorization: bearer sk-llamagateAuthorization: BEARER sk-llamagate
Header Priority: The X-API-Key header is checked first. If not present, Authorization: Bearer is checked.
- When Authentication is Required: All endpoints except `/health` require authentication when `API_KEY` is configured.
- When Authentication is Missing: Requests without a valid authentication header return `401 Unauthorized` with an OpenAI-compatible error response.
- When Authentication is Invalid: Requests with an invalid or incorrect API key return `401 Unauthorized` with an OpenAI-compatible error response.
Authentication errors return HTTP 401 Unauthorized with a JSON response in OpenAI-compatible format:
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "request_id": "req-123456"
  }
}

- API keys are never logged: Authentication failures are logged with a generic message ("Authentication failed"), but the provided API key or bearer token is never included in logs.
- Constant-time comparison: API key validation uses constant-time comparison to prevent timing attacks.
- Health endpoint bypass: The `/health` endpoint does not require authentication and can be used for monitoring and load balancer health checks.
If API_KEY is not set, authentication is disabled and all requests are allowed.
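The constant-time comparison mentioned above can be illustrated with Python's `hmac.compare_digest`. This is a sketch of the technique only, not LlamaGate's actual Go implementation (which would use Go's `crypto/subtle` package):

```python
import hmac

def api_key_valid(configured_key: str, provided_key: str) -> bool:
    # compare_digest takes time independent of where the strings first
    # differ, so an attacker cannot recover the key byte-by-byte by
    # measuring response latency.
    return hmac.compare_digest(configured_key.encode(), provided_key.encode())

print(api_key_valid("sk-llamagate", "sk-llamagate"))  # True
print(api_key_valid("sk-llamagate", "sk-wrong"))      # False
```

A naive `==` comparison short-circuits at the first mismatched byte, which is exactly the timing signal constant-time comparison removes.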
LlamaGate caches responses for non-streaming requests. The cache key is based on:
- Model name
- Messages content
Identical requests (same model + same messages) will return cached responses, reducing load on Ollama.
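One way such a key can be derived — hashing a canonical serialization of model plus messages — is sketched below. This is illustrative; LlamaGate's internal key scheme may differ:

```python
import hashlib
import json

def cache_key(model: str, messages: list) -> str:
    # Serialize deterministically (sorted keys, fixed separators) so
    # semantically identical requests always hash to the same key.
    payload = json.dumps({"model": model, "messages": messages},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()

a = cache_key("mistral", [{"role": "user", "content": "Hello!"}])
b = cache_key("mistral", [{"role": "user", "content": "Hello!"}])
c = cache_key("mistral", [{"role": "user", "content": "Hi!"}])
print(a == b)  # True  - identical requests share a key (cache hit)
print(a == c)  # False - different messages miss the cache
```

Note that streaming requests bypass the cache, since a cached SSE stream cannot be replayed meaningfully.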
Rate limiting is implemented using a leaky bucket algorithm. The default limit is 50 requests per second, configurable via RATE_LIMIT_RPS.
When the limit is exceeded, requests receive a 429 Too Many Requests response with:
- HTTP Status: `429 Too Many Requests`
- Retry-After Header: Number of seconds to wait before retrying (e.g., `Retry-After: 1`)
- Response Body: OpenAI-compatible JSON error format
Status: 429 Too Many Requests
Headers:
Retry-After: 1
Response Body:
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "request_id": "req-123456"
  }
}

Rate-limited requests are logged with structured fields including request ID, IP address, path, retry time, and limiter decision.
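The leaky bucket algorithm can be sketched in a few lines. This is an illustration of the technique, not LlamaGate's Go implementation; timestamps are passed explicitly to keep the example deterministic:

```python
class LeakyBucket:
    """Minimal leaky-bucket sketch: the bucket drains at rate_rps
    requests per second and rejects once it holds `capacity` requests."""

    def __init__(self, rate_rps: float, capacity: float):
        self.rate = rate_rps
        self.capacity = capacity
        self.level = 0.0   # current "water" in the bucket
        self.last = 0.0    # timestamp of the previous check

    def allow(self, now: float) -> bool:
        # Drain continuously since the last request was seen.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False  # caller should respond 429 with Retry-After

bucket = LeakyBucket(rate_rps=2.0, capacity=2.0)
print(bucket.allow(0.0))  # True  - bucket empty
print(bucket.allow(0.0))  # True  - bucket now full
print(bucket.allow(0.0))  # False - burst exceeded, would return 429
print(bucket.allow(0.5))  # True  - 0.5s later, one slot has drained
```

Unlike a fixed-window counter, the bucket drains continuously, so traffic just over the limit is smoothed rather than rejected in bursts at window boundaries.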
LlamaGate implements consistent request correlation and secure logging across all components.
Every inbound request receives a unique request ID:
- If an `X-Request-ID` header is provided: LlamaGate uses the provided request ID
- If no header is provided: LlamaGate generates a UUID v4 request ID

The request ID is:
- Included in the `X-Request-ID` response header
- Propagated to all downstream components:
  - Ollama upstream calls (via `X-Request-ID` header)
  - Tool/function calling (via context)
  - MCP tool calls (via context and HTTP headers)
- Included in all structured log entries for the request
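The resolution logic above can be sketched as a tiny helper (illustrative Python, not LlamaGate's Go middleware):

```python
import uuid

def resolve_request_id(headers: dict) -> str:
    """Use the caller-supplied X-Request-ID if present, else mint a UUID v4."""
    provided = headers.get("X-Request-ID")
    return provided if provided else str(uuid.uuid4())

# Caller supplied an ID: it is echoed back unchanged
print(resolve_request_id({"X-Request-ID": "req-abc-123"}))  # req-abc-123

# No header: a fresh UUID v4 is generated
print(resolve_request_id({}))
```

Accepting caller-supplied IDs lets an upstream service correlate its own logs with LlamaGate's, while the UUID fallback guarantees every request is traceable.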
LlamaGate automatically redacts sensitive values from logs to prevent secret leakage:
Redacted Values:
- API keys (`X-API-Key` header values)
- Bearer tokens (`Authorization: Bearer` header values)
- Any other secrets in headers, environment variables, or configuration
What is Logged:
- Request method, path, status code, latency
- Request ID for correlation
- Client IP address
- Error messages (without sensitive data)
- Authentication failures (generic message only)
What is NOT Logged:
- API key values
- Bearer token values
- Authorization header contents
- Any header values that contain secrets
Example Log Entry:
{
  "level": "info",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "method": "POST",
  "path": "/v1/chat/completions",
  "status": 200,
  "latency": "1.234s",
  "ip": "192.168.1.100",
  "time": "2026-01-12T10:00:00Z",
  "message": "HTTP request"
}

Notice that the API key is not present in the log entry, even though it was sent in the request headers.
LlamaGate implements graceful shutdown to ensure clean termination without dropping in-flight requests.
When LlamaGate receives SIGINT or SIGTERM:
- Stop accepting new requests: The server immediately stops accepting new connections
- Allow in-flight requests to complete: Active requests are allowed to finish up to a configurable timeout
- Close downstream connections cleanly:
- Ollama HTTP client connections are closed
- MCP server connections are closed
- Cache cleanup goroutines are stopped
- Handle streaming responses safely: Streaming responses check for context cancellation and stop gracefully when the server shuts down
The shutdown timeout is configurable via the SHUTDOWN_TIMEOUT environment variable:
# Default: 30 seconds
SHUTDOWN_TIMEOUT=30s

# Examples:
SHUTDOWN_TIMEOUT=10s     # 10 seconds
SHUTDOWN_TIMEOUT=1m      # 1 minute
SHUTDOWN_TIMEOUT=2m30s   # 2 minutes 30 seconds

Timeout Behavior:
- If all in-flight requests complete before the timeout: Clean shutdown
- If the timeout is reached: Remaining requests are terminated, and the server exits
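The duration strings accepted by `SHUTDOWN_TIMEOUT` (`30s`, `1m`, `2m30s`) follow Go's `time.ParseDuration` format. As a rough illustration of that format — a simplified subset, not the full Go grammar — a Python validator might look like:

```python
import re

def parse_go_duration(s: str) -> float:
    """Parse a simplified subset of Go durations ("30s", "1m", "2m30s") into seconds."""
    units = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
    # "ms" must precede "m" in the alternation so "500ms" parses correctly.
    matches = re.findall(r"(\d+(?:\.\d+)?)(h|ms|m|s)", s)
    # Reject strings with unmatched leftovers (e.g. "abc", "10x").
    if not matches or "".join(n + u for n, u in matches) != s:
        raise ValueError(f"invalid duration: {s}")
    return sum(float(n) * units[u] for n, u in matches)

print(parse_go_duration("30s"))    # 30.0
print(parse_go_duration("2m30s"))  # 150.0
```

Validating the value at deploy time avoids a startup failure from a typo like `SHUTDOWN_TIMEOUT=30sec`.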
1. Signal received (`SIGINT` or `SIGTERM`)
2. Server stops accepting new requests
3. Cache cleanup goroutines stopped
4. Downstream connections closed (Ollama, MCP)
5. In-flight requests allowed to complete (up to timeout)
6. Server exits gracefully
Note: Streaming responses automatically detect server shutdown via context cancellation and stop gracefully, preventing abrupt connection resets.
LlamaGate supports native HTTPS/TLS encryption. To enable HTTPS:
1. Set TLS configuration in `.env`:

   TLS_ENABLED=true
   TLS_CERT_FILE=/path/to/certificate.crt
   TLS_KEY_FILE=/path/to/private.key

2. Or use YAML config:

   tls_enabled: true
   tls_cert_file: /path/to/certificate.crt
   tls_key_file: /path/to/private.key

3. Generate a self-signed certificate (for testing):

   openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes

4. For production with Let's Encrypt, use a reverse proxy (nginx, Caddy) for automatic certificate management and renewal.
Note: When TLS_ENABLED=true, the server will use HTTPS. Make sure to use https:// in your client URLs.
LlamaGate uses structured JSON logging with Zerolog. Each request is assigned a unique request ID.
Log Levels:
- `INFO`: Default level, logs all requests and important events
- `DEBUG`: Enabled with `DEBUG=true`, includes detailed debugging information
Log Output:
- By default, logs are written to stdout (console)
- To also write logs to a file, set the `LOG_FILE` environment variable: `LOG_FILE=llamagate.log`
- When `LOG_FILE` is set, logs are written to both console and file
- The log file is created automatically if it doesn't exist, and new logs are appended to it
- Note: The log file is not automatically rotated. For production use, consider using a log rotation tool or process manager
Example log entry:
{
  "level": "info",
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "method": "POST",
  "path": "/v1/chat/completions",
  "status": 200,
  "latency": "1.234s",
  "ip": "127.0.0.1",
  "time": 1703123456
}

See docs/TESTING.md for a comprehensive testing guide, or use the provided test script:
Windows:
scripts\windows\test.cmd

Unix/Linux/macOS:

./scripts/unix/test.sh

This will test all endpoints, caching, authentication, and more.
To validate installer scripts before deployment, see docs/INSTALLER_TESTING.md:
# Test all installers
.\tests\installer\test-all-installers.ps1
# Test Windows installer only
.\tests\installer\test-installer-windows.ps1
# Test Unix installer (requires bash/WSL)
chmod +x tests/installer/test-installer-unix.sh
./tests/installer/test-installer-unix.sh

Using the installer (recommended):
install\windows\install.cmd

Manual build:
go build -o llamagate ./cmd/llamagate

Or use the build script:
scripts\windows\build.cmd

go test ./...

# Make sure Ollama is running
ollama serve
# In another terminal, run LlamaGate
go run ./cmd/llamagate

LlamaGate includes support for the Model Context Protocol (MCP) as a client. This allows you to:
- Connect to MCP servers and discover their tools, resources, and prompts
- Expose tools to chat completion requests
- Execute tool calls in multi-round loops
- Reference MCP resources directly in chat messages using `mcp://` URIs
- Enforce security with allow/deny lists
- Access MCP management via HTTP API
See MCP Documentation for full details and MCP Quick Start for a getting started guide.
You can reference MCP resources directly in chat completion messages:
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-llamagate" \
  -d '{
    "model": "mistral",
    "messages": [{
      "role": "user",
      "content": "Summarize mcp://filesystem/file:///docs/readme.txt"
    }]
  }'

LlamaGate will automatically fetch the resource content and inject it as context.
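The reference above has the shape `mcp://<server>/<resource-uri>`. The split can be illustrated with a hypothetical parser — the authoritative URI grammar is defined by the MCP documentation, not this sketch:

```python
def parse_mcp_uri(uri: str):
    """Split mcp://<server>/<resource-uri> into (server, resource URI).

    Illustrative only: the exact grammar is defined by LlamaGate's
    MCP documentation.
    """
    if not uri.startswith("mcp://"):
        raise ValueError(f"not an MCP resource URI: {uri}")
    rest = uri[len("mcp://"):]
    # The first path segment names the MCP server; the remainder is
    # the resource URI understood by that server.
    server, _, resource = rest.partition("/")
    return server, resource

print(parse_mcp_uri("mcp://filesystem/file:///docs/readme.txt"))
# ('filesystem', 'file:///docs/readme.txt')
```

Here `filesystem` selects the configured MCP server, and `file:///docs/readme.txt` is the resource URI that server resolves.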
LlamaGate Core is Open Source
This repository contains the core LlamaGate functionality:
- OpenAI-compatible API gateway
- MCP client support
- Caching, authentication, rate limiting
- Basic tool execution
Advanced Features (Separate Modules)
The following features are not included in this open-source core and are available as separate modules:
- Advanced workflow automation packs
- Enterprise connectors and integrations
- Cloud fallback capabilities
- Compatibility validation suites
- Premium support and consulting
These advanced features are maintained separately and are not part of this repository.
Issue: When using PowerShell with $ErrorActionPreference = "Stop", git clone may fail even though the clone succeeds. This happens because Git writes progress messages to stderr, which PowerShell treats as errors.
Symptoms:
- Script fails with an error even though `git clone` completes successfully
- An error message appears, but the repository is actually cloned
Solution (PowerShell):
Option 1: Temporarily change error action (Recommended)
# Save current setting
$oldErrorAction = $ErrorActionPreference

# Temporarily allow errors during git clone
$ErrorActionPreference = "Continue"

# Clone the repository (replace with your repo URL if you host elsewhere)
git clone <your-llamagate-repo-url>.git

# Restore original setting
$ErrorActionPreference = $oldErrorAction

Option 2: Check exit code instead
# Clone and check exit code
git clone <your-llamagate-repo-url>.git 2>&1 | Out-Null
if ($LASTEXITCODE -ne 0) {
    Write-Error "Git clone failed with exit code $LASTEXITCODE"
    exit 1
}

Option 3: Redirect stderr
# Redirect stderr to null (suppress progress messages)
git clone <your-llamagate-repo-url>.git 2>$null

Note: This is a known Git behavior - progress messages go to stderr even on success. The installers handle this automatically, but manual git clone commands in PowerShell scripts need this workaround.
If the binary installer fails because binaries aren't available yet:
- Use the source installer instead (see Installation section)
- Or wait for binaries to be published to releases
Make the binary executable:
chmod +x llamagate

- Make sure you're in the directory where the binary was installed
- Or add the directory to your PATH
- Or use the full path: `/path/to/llamagate`
If you need a different architecture than what's available:
- Build from source (see Installation section)
- The installers automatically detect your platform
- ✅ Windows (amd64)
- ✅ Linux (amd64, arm64)
- ✅ macOS (amd64, arm64)
- ✅ Ollama - Fully supported (primary backend)
- ❌ Direct OpenAI API - Not included (use OpenAI SDK directly)
- ❌ Other LLM providers - Not included in core
- ✅ stdio transport - Fully implemented
- ⚠️ SSE transport - Interface prepared, implementation pending
- ✅ Tool execution - Multi-round loops supported
- ✅ Security guardrails - Allow/deny lists, timeouts, size limits
- HTTPS/TLS - Native HTTPS support available via `TLS_ENABLED`, `TLS_CERT_FILE`, and `TLS_KEY_FILE` configuration. For production with Let's Encrypt, a reverse proxy (nginx, Caddy) is still recommended for automatic certificate management.
- In-memory cache only - Cache is lost on restart (persistent cache not included in core)
- Global rate limiting - Per-IP rate limiting not included in core
- No cloud fallback - Core is designed for local Ollama instances only
- Single binary deployment - No built-in clustering or load balancing
- Single instance per machine - Only one LlamaGate instance should run per machine. Multiple applications can connect to the same instance. If you try to start a second instance, you'll get a clear error message indicating the port is already in use.
- Database persistence (cache, logs, etc.)
- Multi-tenant isolation
- Advanced monitoring/observability dashboards
- Enterprise SSO/authentication providers
- High-availability/clustering features
.
├── cmd/
│ └── llamagate/
│ └── main.go # Entry point
├── internal/
│ ├── config/
│ │ └── config.go # Configuration management
│ ├── logger/
│ │ └── logger.go # Logger initialization
│ ├── cache/
│ │ └── cache.go # In-memory cache
│ ├── mcpclient/
│ │ ├── client.go # MCP client implementation
│ │ ├── stdio.go # stdio transport
│ │ ├── sse.go # SSE transport (stub)
│ │ ├── types.go # MCP protocol types
│ │ └── errors.go # MCP errors
│ ├── tools/
│ │ ├── manager.go # Tool registry and management
│ │ ├── mapper.go # MCP to OpenAI format conversion
│ │ ├── guardrails.go # Security and limits
│ │ └── types.go # Tool types
│ ├── middleware/
│ │ ├── auth.go # Authentication middleware
│ │ ├── rate_limit.go # Rate limiting middleware
│ │ └── request_id.go # Request ID middleware
│ └── proxy/
│ ├── proxy.go # Proxy handlers
│ └── tool_loop.go # Tool execution loop
├── docs/
│ ├── MCP.md # MCP documentation
│ └── MCP_QUICKSTART.md # MCP quick start guide
├── mcp-config.example.yaml # MCP configuration example
├── Dockerfile
├── go.mod
├── go.sum
└── README.md
MIT
Contributions are welcome! Please feel free to submit a Pull Request.