Welcome to the LLM Interactive Proxy User Guide. This guide provides comprehensive documentation for end-users who want to use and configure the proxy.
- Quick Start Guide - Get up and running in minutes with installation, basic configuration, and first steps
- Configuration Guide - Learn about configuration methods, precedence, and common scenarios
- Access Modes - Single-user vs multi-user access mode behavior and configuration
- CLI Parameters Reference - Complete reference for all CLI arguments and environment variables
- Database Configuration - Database setup for SQLite (default) and PostgreSQL
Advanced features that enhance the proxy's capabilities:
- SSO Identity Provider Overview - Overview of supported Identity Providers and configuration
- Quality Verifier System - Real-time response verification using a secondary model
- Tool Access Control - Fine-grained control over which tools models can access
- Dangerous Command Protection - Prevent execution of potentially harmful commands
- Dangerous Command Protection (Dev Tools) - Explain safe developer tool exemptions
- File Access Sandboxing - Restrict file system access to specific directories
- SSO Agent Setup - Setting up SSO with agent integrations
- SSO Authentication - Authentication flow details
- SSO Authorization - Authorization modes and configuration
- SSO Configuration - Detailed SSO configuration guide
- SSO Identity Provider Overview - Overview of supported Identity Providers
- SSO Identity Provider Setup - Setting up specific Identity Providers
- SSO Security - Security considerations and best practices
- SSO Troubleshooting - Common issues and solutions
- Hybrid Backend - Use two models in sequence for reasoning and execution phases (experimental)
- Model Name Rewrites - Transform model names dynamically with aliases and patterns
- URI Model Parameters - Specify model parameters directly in model name strings
- Planning Phase Overrides - Use stronger models for planning phases in coding workflows
- Random Model Replacement - Probabilistically replace models to improve session diversity and resilience
- Replacement Metrics - Track activation rates, turn counts, and opt-outs for replacements
- Think Tags Fix - Correct improperly formatted thinking tags in model responses
- Edit Precision Tuning - Automatically adjust temperature and top_p for code editing tasks
- ProxyMem: Cross-Session Memory - Persistent context across sessions with LLM-generated summaries and intelligent context injection
- Pytest Output Compression - Compress verbose pytest output to save context tokens
- Pytest Context Saving - Automatically add helpful pytest flags for better output
- Pytest Full-Suite Steering - Prevent agents from running entire test suites inadvertently
- Inline Python Steering - Control Python code execution within responses
- Test Execution Reminder - Remind agents to run tests before completing tasks
- Session Management - Intelligent session handling and state management
- Context Compaction - Intelligent context compaction to reduce prompt size
- Context Window Enforcement - Enforce context window limits and prevent overruns
- Windows Double-Ampersand Fixer - Automatically fix
&&command separators for Windows clients - Unified Steering Telemetry Migration - Migration guide for the unified steering framework telemetry changes
- Monitoring Overview - Overview of all monitoring and analytics capabilities
- Backend Health Checks - Automated health monitoring and circuit breaker for backend API endpoints
- Connection Activity Monitoring - Real-time visibility into active connections with RX/TX byte counters
- Usage Tracking and Statistics - Comprehensive monitoring of token consumption, costs, performance metrics, and request patterns across all backends
- Failure Handling - Automatic retry and failover for backend errors
- Request Deduplication - Prevent duplicate requests from exhausting rate limits
- Resilience Scoping - Personal vs shared cooldown state for OAuth and enterprise backends
- Codebuff Quick Start - Get started with Codebuff in 5 minutes
- Codebuff Backend Compatibility - WebSocket server for Codebuff coding agent protocol
- Codebuff Protocol Reference - Complete protocol specification for Codebuff WebSocket communication
- WebSocket Transport for Responses API - Low-latency WebSocket transport for
/v1/responses - Client Identity Override - Override client identity headers for compatibility with specific tools
Frontend APIs where clients connect to the proxy:
- Frontend Overview - Understanding frontends vs backends, choosing a frontend
- OpenAI Chat Completions -
/v1/chat/completionsAPI for most OpenAI-compatible clients - OpenAI Responses API -
/v1/responsesAPI for structured JSON output - Anthropic Messages -
/anthropic/v1/messagesAPI for Claude-compatible clients - Google Gemini v1beta -
/v1beta/modelsAPI for Gemini-compatible clients
Backend provider configuration and usage:
-
Backend Overview - Supported backends, choosing a backend, and switching between providers
-
OpenAI Backend - OpenAI API and ChatGPT OAuth configuration
-
OpenAI Codex Backend - Codex CLI authentication and debugging-only usage
-
Anthropic Backend - Claude API and OAuth configuration
-
Anthropic OAuth Backend - Claude Code OAuth configuration
-
Cline Backend - Internal development & debugging backend
-
Gemini Backends - Google Gemini API, OAuth, and GCP configurations
-
Gemini OAuth Auto Backend - Multi-account Google Gemini with automatic rotation
-
Antigravity OAuth Backend - Internal Antigravity OAuth configuration
-
Kiro OAuth Auto Backend - Amazon Kiro / Q Developer streaming via self-managed OAuth
-
Kimi Code Backend - Kimi For Coding via OpenAI-compatible API
-
OpenRouter Backend - OpenRouter multi-model access
-
Nvidia Backend - NVIDIA NIM OpenAI-compatible API
-
ZAI Backend - Zhipu/Z.ai configuration
-
Qwen Backend - Alibaba Qwen OAuth configuration
-
Minimax Backend - Minimax API configuration
-
InternLM Backend - InternLM AI models with API key rotation
-
Zenmux Backend - Zenmux API configuration
-
OpenCode Zen Backend - OpenCode Zen API configuration
-
Custom Backends - Creating and configuring custom backend connectors
Tools and techniques for troubleshooting:
- Wire Capture - Record and analyze HTTP requests and responses
- CBOR Capture - Binary wire capture format with simulation capabilities
- Troubleshooting Guide - Common issues and solutions
Authentication and security best practices:
- Authentication - API key authentication and access control
- Brute-Force Protection - Rate limiting and attack prevention
- Key Hygiene - API key redaction and secure handling
- Development Guide - For contributors and developers
- CHANGELOG - Version history and release notes
- CONTRIBUTING - How to contribute to the project
- LICENSE - Project license information
If you encounter issues or have questions:
- Check the Troubleshooting Guide
- Review the relevant feature or backend documentation
- Search existing GitHub Issues
- Open a new issue with detailed information about your problem
- First-time setup: Start with Quick Start Guide
- Production deployment: Review Configuration Guide, Database Configuration, and Authentication
- Debugging issues: See Wire Capture and Troubleshooting
- Advanced features: Browse the Features section
- Backend setup: Check Backend Overview
- End Users: Quick Start, Configuration, Features, Backends
- Security Administrators: Security section, Tool Access Control, Authentication
- Developers: Development Guide, Debugging section, Wire Capture
- DevOps: Configuration, Authentication, Troubleshooting