LLM Proxy Mode Implementation Plan

Overview

Implement a flexible LLM proxy system that routes requests through a central proxy server, enabling unified API access, rate limiting, cost tracking, and request logging across multiple providers.

Providers to Implement

Anthropic - Claude models
OpenAI - GPT models
GLM (ZhipuAI) - Chinese language models
Groq - Fast inference
Grok - xAI models

Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Worker/Agent  │────▶│  Proxy Server    │────▶│  Provider APIs  │
│                 │◀────│  (SSE Streaming) │◀────│  (Anthropic,    │
└─────────────────┘     │  - Rate Limit    │     │   OpenAI, etc.) │
                        │  - Cost Track    │     └─────────────────┘
                        │  - Logging       │
                        └──────────────────┘

Package Structure

internal/llmproxy/
├── provider/
│   ├── provider.go          # Provider interface
│   ├── anthropic.go         # Anthropic implementation
│   ├── openai.go            # OpenAI implementation
│   ├── glm.go               # GLM (ZhipuAI) implementation
│   ├── groq.go              # Groq implementation
│   └── grok.go              # Grok implementation
├── server/
│   ├── server.go            # HTTP server
│   ├── handlers.go          # Request handlers
│   ├── streaming.go         # SSE streaming
│   └── middleware.go        # Rate limiting, auth
├── client/
│   └── client.go            # Proxy client for workers
├── config/
│   └── config.go            # Proxy configuration
├── cost/
│   └── tracker.go           # Cost tracking
├── logging/
│   └── logger.go            # Request/response logging
└── types.go                 # Shared types

Implementation Tasks

Task LP-1: Define Provider Interface and Types

File: internal/llmproxy/provider/provider.go

type Provider interface {
    Name() string
    StreamChat(ctx context.Context, req *ChatRequest) (<-chan ChatChunk, error)
    Chat(ctx context.Context, req *ChatRequest) (*ChatResponse, error)
    GetModels() []Model
    Validate() error
}

type ChatRequest struct {
    Model    string
    Messages []Message
    Tools    []Tool
    // ... common fields
}

type ChatChunk struct {
    Content  string
    Delta    *MessageDelta
    Metadata map[string]interface{}
}

Task LP-2: Implement Anthropic Provider

File: internal/llmproxy/provider/anthropic.go

Use Anthropic Go SDK or direct HTTP client
Support Claude 3.5 Sonnet, Haiku, Opus
Implement streaming via SSE
Tool/function calling support

Task LP-3: Implement OpenAI Provider

File: internal/llmproxy/provider/openai.go

Use OpenAI Go SDK
Support GPT-4o, GPT-4o-mini, o1
Streaming support
Tool calling support

Task LP-4: Implement GLM (ZhipuAI) Provider

File: internal/llmproxy/provider/glm.go

GLM-4, GLM-4-Flash, GLM-4-Air
API endpoint: https://open.bigmodel.cn/api/paas/v4/chat/completions
JWT token generation for auth

Task LP-5: Implement Groq Provider

File: internal/llmproxy/provider/groq.go

LLaMA 3, Mixtral models
API endpoint: https://api.groq.com/openai/v1/chat/completions
OpenAI-compatible API

Task LP-6: Implement Grok Provider

File: internal/llmproxy/provider/grok.go

Grok-2, Grok-beta
API endpoint: https://api.x.ai/v1/chat/completions
OpenAI-compatible API

Task LP-7: Implement Proxy HTTP Server

File: internal/llmproxy/server/server.go

type Server struct {
    config    *Config
    providers map[string]Provider
    costTracker *cost.Tracker
    logger    *logging.Logger
    rateLimit *rate.Limiter
}

// Endpoints:
// POST /v1/chat/completions - Unified chat endpoint
// GET  /v1/models - List available models
// GET  /health - Health check
// GET  /metrics - Cost and usage metrics

Task LP-8: Add SSE Streaming Support

File: internal/llmproxy/server/streaming.go

Server-Sent Events for streaming responses
Chunk aggregation and forwarding
Connection cleanup on disconnect

Task LP-9: Implement Proxy Client

File: internal/llmproxy/client/client.go

HTTP client for workers to use
Automatic retries with exponential backoff
Fallback between providers

Task LP-10: Add Rate Limiting

File: internal/llmproxy/server/middleware.go

Token bucket rate limiting
Per-IP and per-API-key limits
Configurable limits per provider

Task LP-11: Add Cost Tracking

File: internal/llmproxy/cost/tracker.go

Track token usage per provider/model
Calculate costs based on pricing
Budget enforcement with configurable limits

Task LP-12: Add Request/Response Logging

File: internal/llmproxy/logging/logger.go

Log all requests and responses
Include timing, token usage, costs
Support structured logging (JSON)
Optional PII redaction

Task LP-13: Configuration and CLI Integration

File: cmd/drover/main.go

New proxy serve command
Configuration file support (YAML)
Provider API key management
Model aliases and mappings

Configuration Example

llmproxy:
  listen_addr: ":8080"
  log_level: "info"
  providers:
    anthropic:
      api_key: "${ANTHROPIC_API_KEY}"
      models:
        - name: "claude-sonnet-4"
          alias: "default"
    openai:
      api_key: "${OPENAI_API_KEY}"
      base_url: "https://api.openai.com/v1"
    glm:
      api_key: "${GLM_API_KEY}"
    groq:
      api_key: "${GROQ_API_KEY}"
    grok:
      api_key: "${GROK_API_KEY}"
  rate_limits:
    requests_per_minute: 100
    tokens_per_minute: 100000
  cost_budget:
    hourly_limit: 10.0
    daily_limit: 100.0

Dependencies

// New imports in go.mod
require (
    github.com/anthropic-sdk/anthropic-go/v1 v1.0.0
    github.com/sashabaranov/go-openai v1.20.0
    github.com/redis/go-redis/v9 v9.0.0  // For rate limiting cache
)

Testing Strategy

Unit tests for each provider
Integration tests with mock servers
Load testing for rate limiting
End-to-end tests with real API keys (optional)

Migration Path

Phase 1: Core infrastructure (interface, server, client)
Phase 2: Anthropic + OpenAI providers
Phase 3: GLM, Groq, Grok providers
Phase 4: Cost tracking and advanced features
Phase 5: TUI and web dashboard integration (future)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLM Proxy Mode Implementation Plan

Overview

Providers to Implement

Architecture

Package Structure

Implementation Tasks

Task LP-1: Define Provider Interface and Types

Task LP-2: Implement Anthropic Provider

Task LP-3: Implement OpenAI Provider

Task LP-4: Implement GLM (ZhipuAI) Provider

Task LP-5: Implement Groq Provider

Task LP-6: Implement Grok Provider

Task LP-7: Implement Proxy HTTP Server

Task LP-8: Add SSE Streaming Support

Task LP-9: Implement Proxy Client

Task LP-10: Add Rate Limiting

Task LP-11: Add Cost Tracking

Task LP-12: Add Request/Response Logging

Task LP-13: Configuration and CLI Integration

Configuration Example

Dependencies

Testing Strategy

Migration Path

FilesExpand file tree

plan.md

Latest commit

History

plan.md

File metadata and controls

LLM Proxy Mode Implementation Plan

Overview

Providers to Implement

Architecture

Package Structure

Implementation Tasks

Task LP-1: Define Provider Interface and Types

Task LP-2: Implement Anthropic Provider

Task LP-3: Implement OpenAI Provider

Task LP-4: Implement GLM (ZhipuAI) Provider

Task LP-5: Implement Groq Provider

Task LP-6: Implement Grok Provider

Task LP-7: Implement Proxy HTTP Server

Task LP-8: Add SSE Streaming Support

Task LP-9: Implement Proxy Client

Task LP-10: Add Rate Limiting

Task LP-11: Add Cost Tracking

Task LP-12: Add Request/Response Logging

Task LP-13: Configuration and CLI Integration

Configuration Example

Dependencies

Testing Strategy

Migration Path