User Guide

Welcome to the LLM Interactive Proxy User Guide. This guide is for end users who want to configure and use the proxy.

Getting Started

Quick Start Guide - Get up and running in minutes with installation, basic configuration, and first steps
Configuration Guide - Learn about configuration methods, precedence, and common scenarios
Access Modes - Single-user vs multi-user access mode behavior and configuration
CLI Parameters Reference - Complete reference for all CLI arguments and environment variables
Database Configuration - Database setup for SQLite (default) and PostgreSQL

Features

Advanced features that enhance the proxy's capabilities:

Security Features

SSO Identity Provider Overview - Overview of supported Identity Providers and configuration
Quality Verifier System - Real-time response verification using a secondary model
Tool Access Control - Fine-grained control over which tools models can access
Dangerous Command Protection - Prevent execution of potentially harmful commands
Dangerous Command Protection (Dev Tools) - Safe exemptions for developer tools
File Access Sandboxing - Restrict file system access to specific directories

Single Sign-On (SSO)

SSO Agent Setup - Setting up SSO with agent integrations
SSO Authentication - Authentication flow details
SSO Authorization - Authorization modes and configuration
SSO Configuration - Detailed SSO configuration guide
SSO Identity Provider Overview - Overview of supported Identity Providers
SSO Identity Provider Setup - Setting up specific Identity Providers
SSO Security - Security considerations and best practices
SSO Troubleshooting - Common issues and solutions

Model Management

Hybrid Backend - Use two models in sequence for reasoning and execution phases (experimental)
Model Name Rewrites - Transform model names dynamically with aliases and patterns
URI Model Parameters - Specify model parameters directly in model name strings
Planning Phase Overrides - Use stronger models for planning phases in coding workflows
Random Model Replacement - Probabilistically replace models to improve session diversity and resilience
Replacement Metrics - Track activation rates, turn counts, and opt-outs for replacements

Response Processing

Think Tags Fix - Correct improperly formatted thinking tags in model responses
Edit Precision Tuning - Automatically adjust temperature and top_p for code editing tasks

Session Memory

ProxyMem: Cross-Session Memory - Persistent context across sessions with LLM-generated summaries and intelligent context injection

Development Tools

Pytest Output Compression - Compress verbose pytest output to save context tokens
Pytest Context Saving - Automatically add helpful pytest flags for better output
Pytest Full-Suite Steering - Prevent agents from running entire test suites inadvertently
Inline Python Steering - Control Python code execution within responses
Test Execution Reminder - Remind agents to run tests before completing tasks
Session Management - Intelligent session handling and state management
Context Compaction - Intelligent context compaction to reduce prompt size
Context Window Enforcement - Enforce context window limits and prevent overruns
Windows Double-Ampersand Fixer - Automatically fix && command separators for Windows clients
Unified Steering Telemetry Migration - Migration guide for the unified steering framework telemetry changes

Monitoring and Analytics

Monitoring Overview - Overview of all monitoring and analytics capabilities
Backend Health Checks - Automated health monitoring and circuit breaker for backend API endpoints
Connection Activity Monitoring - Real-time visibility into active connections with RX/TX byte counters
Usage Tracking and Statistics - Comprehensive monitoring of token consumption, costs, performance metrics, and request patterns across all backends

Reliability and Resilience

Failure Handling - Automatic retry and failover for backend errors
Request Deduplication - Prevent duplicate requests from exhausting rate limits
Resilience Scoping - Personal vs shared cooldown state for OAuth and enterprise backends

Client Integration

Codebuff Quick Start - Get started with Codebuff in 5 minutes
Codebuff Backend Compatibility - WebSocket server for Codebuff coding agent protocol
Codebuff Protocol Reference - Complete protocol specification for Codebuff WebSocket communication
WebSocket Transport for Responses API - Low-latency WebSocket transport for /v1/responses
Client Identity Override - Override client identity headers for compatibility with specific tools

Frontends

Frontend APIs where clients connect to the proxy:

Frontend Overview - Understanding frontends vs backends, choosing a frontend
OpenAI Chat Completions - /v1/chat/completions API for most OpenAI-compatible clients
OpenAI Responses API - /v1/responses API for structured JSON output
Anthropic Messages - /anthropic/v1/messages API for Claude-compatible clients
Google Gemini v1beta - /v1beta/models API for Gemini-compatible clients
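
The endpoint paths listed above can be combined with the proxy's address to form the URL a client connects to. The following is a minimal sketch only: the base URL `http://localhost:8000` and the dictionary keys are hypothetical placeholders chosen for illustration, not values taken from the proxy's documentation. Only the paths themselves come from the list above.

```python
# Paths as listed in the Frontends section; the keys are hypothetical labels.
FRONTEND_PATHS = {
    "openai_chat": "/v1/chat/completions",
    "openai_responses": "/v1/responses",
    "anthropic_messages": "/anthropic/v1/messages",
    "gemini_v1beta": "/v1beta/models",
}


def frontend_url(base_url: str, frontend: str) -> str:
    """Join the proxy base URL with the path of the chosen frontend."""
    if frontend not in FRONTEND_PATHS:
        raise KeyError(f"unknown frontend: {frontend!r}")
    # Strip a trailing slash so the result never contains '//' before the path.
    return base_url.rstrip("/") + FRONTEND_PATHS[frontend]


if __name__ == "__main__":
    # Assumed address; substitute the host and port your proxy listens on.
    base = "http://localhost:8000"
    for name in FRONTEND_PATHS:
        print(name, "->", frontend_url(base, name))
```

Point each client at the URL that matches the API it speaks; the individual frontend pages describe the actual request formats.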

Backends

Backend provider configuration and usage:

Backend Overview - Supported backends, choosing a backend, and switching between providers
OpenAI Backend - OpenAI API and ChatGPT OAuth configuration
OpenAI Codex Backend - Codex CLI authentication and debugging-only usage
Anthropic Backend - Claude API and OAuth configuration
Anthropic OAuth Backend - Claude Code OAuth configuration
Cline Backend - Internal development and debugging backend
Gemini Backends - Google Gemini API, OAuth, and GCP configurations
Gemini OAuth Auto Backend - Multi-account Google Gemini with automatic rotation
Antigravity OAuth Backend - Internal Antigravity OAuth configuration
Kiro OAuth Auto Backend - Amazon Kiro / Q Developer streaming via self-managed OAuth
Kimi Code Backend - Kimi For Coding via OpenAI-compatible API
OpenRouter Backend - OpenRouter multi-model access
Nvidia Backend - NVIDIA NIM OpenAI-compatible API
ZAI Backend - Zhipu/Z.ai configuration
Qwen Backend - Alibaba Qwen OAuth configuration
Minimax Backend - Minimax API configuration
InternLM Backend - InternLM AI models with API key rotation
Zenmux Backend - Zenmux API configuration
OpenCode Zen Backend - OpenCode Zen API configuration
Custom Backends - Creating and configuring custom backend connectors

Debugging

Tools and techniques for troubleshooting:

Wire Capture - Record and analyze HTTP requests and responses
CBOR Capture - Binary wire capture format with simulation capabilities
Troubleshooting Guide - Common issues and solutions

Security

Authentication and security best practices:

Authentication - API key authentication and access control
Brute-Force Protection - Rate limiting and attack prevention
Key Hygiene - API key redaction and secure handling

Additional Resources

Development Guide - For contributors and developers
CHANGELOG - Version history and release notes
CONTRIBUTING - How to contribute to the project
LICENSE - Project license information

Getting Help

If you encounter issues or have questions:

Check the Troubleshooting Guide
Review the relevant feature or backend documentation
Search existing GitHub Issues
Open a new issue with detailed information about your problem

Quick Navigation

By Use Case

First-time setup: Start with the Quick Start Guide
Production deployment: Review the Configuration Guide, Database Configuration, and Authentication
Debugging issues: See Wire Capture and the Troubleshooting Guide
Advanced features: Browse the Features section
Backend setup: Check the Backend Overview

By Role

End Users: Quick Start, Configuration, Features, Backends
Security Administrators: Security section, Tool Access Control, Authentication
Developers: Development Guide, Debugging section, Wire Capture
DevOps: Configuration, Authentication, Troubleshooting
