
# Backend Overview

The LLM Interactive Proxy supports multiple backend providers, allowing you to route requests to different LLM services while maintaining a consistent front-end API. This flexibility enables you to choose the best provider for your use case, switch providers without changing client code, and implement failover strategies.
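Because the front-end API stays consistent, switching providers is a matter of changing the model string rather than the client code. The sketch below illustrates this with an OpenAI-style chat payload; the `backend:model` prefix convention is inferred from the `!/oneoff` example later in this document, and is an assumption here rather than a guarantee:

```python
# Sketch: the client-side request keeps one (OpenAI-style) shape no matter
# which backend the proxy routes to; only the model string changes.
# The backend:model prefix is an assumed convention for illustration.

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-format chat request; `model` selects the backend."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Same client code, different providers:
openai_req = chat_payload("openai:gpt-4o", "Hello")
claude_req = chat_payload("anthropic:claude-3-5-sonnet-20241022", "Hello")

assert openai_req["messages"] == claude_req["messages"]
```

This is what enables failover strategies: a client can retry the same payload against a different backend without reshaping the request.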

## Supported Backends

The proxy supports the following backend providers out of the box:

| Backend ID | Provider | Authentication | Best For |
|---|---|---|---|
| `openai` | OpenAI | API Key | Production applications, standard OpenAI models |
| `openai-codex` | OpenAI (ChatGPT/Codex OAuth) | Local OAuth token | Using ChatGPT login instead of API key |
| `anthropic` | Anthropic | API Key | Claude models via standard API |
| `anthropic-oauth` | Anthropic (OAuth) | Local OAuth token | Claude via OAuth credential flow |
| `cline` | Cline | Local OAuth token | Internal development & debugging |
| `gemini` | Google Gemini | API Key | Metered API usage, production apps |
| `gemini-oauth-plan` | Google Gemini (CLI) | OAuth | Users with Google One subscription |
| `gemini-oauth-free` | Google Gemini (CLI) | OAuth | Free tier users |
| `gemini-cli-cloud-project` | Google Gemini (GCP) | OAuth + GCP Project | Enterprise, team workflows, central billing |
| `openrouter` | OpenRouter | API Key | Access to many hosted models |
| `nvidia` | NVIDIA (NIM / OpenAI-compatible) | API Key (`NVIDIA_API_KEY`) | Hosted NVIDIA integration or self-hosted NIM |
| `zenmux` | ZenMux | API Key | OpenAI-compatible ZenMux router |
| `zai` | ZAI | API Key | Zhipu/Z.ai access |
| `zai-coding-plan` | ZAI Coding Plan | API Key | Coding-specific workflows |
| `kimi-code` | Kimi | API Key | Kimi For Coding (OpenAI-compatible) |
| `minimax` | Minimax | API Key | Minimax AI models |
| `qwen-oauth` | Alibaba Qwen | Local OAuth token | Qwen CLI OAuth |
| `internlm` | InternLM AI | API Key | InternLM models with key rotation |
| `hybrid` | Virtual (orchestrates two models) | Inherits from sub-backends | Two-phase reasoning + execution |
| `antigravity-oauth` | Google Gemini (Antigravity) | Antigravity Token | Internal debugging (Gemini models) |

## Frontend APIs

The proxy exposes multiple frontend APIs where clients connect. Each frontend implements a different LLM provider's API specification.

For detailed frontend API documentation, see the Frontend Overview.

## Choosing a Backend

When selecting a backend, consider:

- **Cost**: API key-based backends typically charge per token, while OAuth-based backends may have subscription or free-tier limits
- **Performance**: Different providers have different latency and throughput characteristics
- **Model availability**: Each provider offers different models with varying capabilities
- **Authentication**: Choose between API keys (simpler setup) and OAuth (may offer free tiers)
- **Use case**: Some backends are optimized for specific tasks (e.g., `zai-coding-plan` for coding)

## Configuration

Backends are configured through environment variables and the proxy configuration file:

### Basic Setup

```bash
# Set API keys for the backends you want to use
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AIza..."
export OPENROUTER_API_KEY="sk-or-..."
export NVIDIA_API_KEY="..."
export ZENMUX_API_KEY="..."
export ZAI_API_KEY="..."
export KIMI_API_KEY="..."
export MINIMAX_API_KEY="..."
export INTERNAI_API_KEY="..."

# For GCP-based Gemini
export GOOGLE_CLOUD_PROJECT="your-project-id"
```
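A quick way to see which API-key backends are ready is to check which of these variables are set. This is an illustrative helper, not part of the proxy; it covers a subset of the variables listed above:

```python
import os

# Sketch: report which API-key backends have credentials configured,
# using the environment variable names from the Basic Setup section.
KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "nvidia": "NVIDIA_API_KEY",
}

def configured_backends(env=os.environ) -> list[str]:
    """List backends whose API key variable is set and non-empty."""
    return [name for name, var in KEY_VARS.items() if env.get(var)]

print("configured:", configured_backends())
```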

### Starting the Proxy

```bash
# Start with a specific default backend
python -m src.core.cli --default-backend openai

# Or specify in config file
python -m src.core.cli --config config/config.yaml
```

### Config File Example

```yaml
# config.yaml
backends:
  openai:
    type: openai
  anthropic:
    type: anthropic
  gemini:
    type: gemini

default_backend: openai
```

## Switching Backends

You can switch backends dynamically during a session using in-chat commands:

```text
!/backend(anthropic)
!/model(claude-3-5-sonnet-20241022)
```

Or use one-off commands for a single request:

```text
!/oneoff(openrouter:qwen/qwen3-coder)
```
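The command grammar visible in these examples is `!/command(argument)`, with `!/oneoff` taking a `backend:model` argument. A minimal parser sketch, assuming only this grammar (the proxy's real parser may accept more):

```python
import re

# Sketch of parsing the in-chat command syntax shown above. The grammar
# is inferred from the examples; this regex only handles !/command(argument).
CMD_RE = re.compile(r"!/(?P<cmd>[\w-]+)\((?P<arg>[^)]*)\)")

def parse_command(text: str):
    """Return (cmd, arg) or (cmd, backend, model) for oneoff; None if no match."""
    m = CMD_RE.fullmatch(text.strip())
    if not m:
        return None
    cmd, arg = m.group("cmd"), m.group("arg")
    if cmd == "oneoff" and ":" in arg:
        backend, model = arg.split(":", 1)
        return (cmd, backend, model)
    return (cmd, arg)

assert parse_command("!/backend(anthropic)") == ("backend", "anthropic")
assert parse_command("!/oneoff(openrouter:qwen/qwen3-coder)") == (
    "oneoff", "openrouter", "qwen/qwen3-coder"
)
```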

## Backend-Specific Documentation

For detailed configuration and usage information, see the documentation page for each backend.

## Related Features