rho-llm

Multi-provider LLM client for Go. Streaming, tool use, image/vision support, extended thinking, auth pool rotation. Pool rotation is thread-safe, preventing redundant HTTP client allocations during concurrent rate-limit failovers. Zero external dependencies (stdlib only).

Requires Go 1.26+ (go 1.26.0 in go.mod).

Install

go get github.com/bds421/rho-llm

Supported Providers

| Provider | Protocol | Auth | Default BaseURL |
|---|---|---|---|
| Anthropic/Claude | Native | x-api-key | api.anthropic.com |
| Google Gemini | Native | x-goog-api-key | generativelanguage.googleapis.com |
| OpenAI | OpenAI-compat / Responses | Bearer | api.openai.com/v1 |
| xAI/Grok | OpenAI-compat | Bearer | api.x.ai/v1 |
| Groq | OpenAI-compat | Bearer | api.groq.com/openai/v1 |
| Cerebras | OpenAI-compat | Bearer | api.cerebras.ai/v1 |
| Mistral | OpenAI-compat | Bearer | api.mistral.ai/v1 |
| OpenRouter | OpenAI-compat | Bearer | openrouter.ai/api/v1 |
| DashScope/Qwen | OpenAI-compat | Bearer | dashscope-intl.aliyuncs.com/compatible-mode/v1 |
| DeepSeek | OpenAI-compat | Bearer | api.deepseek.com |
| Cohere | OpenAI-compat | Bearer | api.cohere.ai/compatibility/v1 |
| Ollama | OpenAI-compat | None | localhost:11434/v1 |
| vLLM | OpenAI-compat | None | localhost:8000/v1 |
| LM Studio | OpenAI-compat | None | localhost:1234/v1 |

Quick Start

This example demonstrates a complete request using Google Gemini; the code is identical for every supported provider.

import _ "github.com/bds421/rho-llm/provider" // required: register adapters

// 1. Configure and initialize
cfg := llm.Config{
    Provider: "gemini",
    Model:    "flash", // resolves to gemini-2.5-flash
    APIKey:   os.Getenv("GEMINI_API_KEY"),
}
client, err := llm.NewClient(cfg)
if err != nil {
    panic(err)
}
defer client.Close()

// 2. Send a message
req := llm.Request{
    Messages: []llm.Message{
        llm.NewTextMessage(llm.RoleUser, "Explain quantum entanglement in one sentence."),
    },
}
resp, err := client.Complete(context.Background(), req)

fmt.Println(resp.Content)

Provider Recipes

Ollama (local, no API key)

cfg := llm.Config{
    Provider: "ollama",
    Model:    "llama3", // or "mistral", "phi3", etc.
}

Custom OpenAI-compatible endpoint

Unknown providers (not in the presets list) must set BaseURL. Without it, NewClient returns an error to prevent typos from silently defaulting to an incorrect endpoint.

cfg := llm.Config{
    Provider:   "custom",
    BaseURL:    "http://my-proxy:8080/v1", // required for unknown providers
    APIKey:     "my-key",
}

Image / Vision Support

Send images to vision-capable models using ContentImage parts with base64-encoded data. All three protocol adapters serialize images to the correct wire format automatically.

import (
    "encoding/base64"
    "os"
)

imgBytes, _ := os.ReadFile("photo.png")
imgData := base64.StdEncoding.EncodeToString(imgBytes)

req := llm.Request{
    Messages: []llm.Message{{
        Role: llm.RoleUser,
        Content: []llm.ContentPart{
            {Type: llm.ContentText, Text: "What do you see in this image?"},
            {Type: llm.ContentImage, Source: &llm.ImageSource{
                Type: "base64", MediaType: "image/png", Data: imgData,
            }},
        },
    }},
}
resp, err := client.Complete(ctx, req)

// Or use the convenience helper for single-image messages:
msg := llm.NewImageMessage(llm.RoleUser, "image/png", imgData)

Supported media types: image/jpeg, image/png, image/gif, image/webp.

Streaming

client.Stream() returns a Go 1.23 iterator (iter.Seq2[StreamEvent, error]) that yields events as the model generates tokens. This lets you display partial output in real time rather than waiting for the full response. Use break to abort early — the iterator cleans up the underlying HTTP connection automatically.

for event, err := range client.Stream(ctx, req) {
    if err != nil {
        fmt.Printf("Stream error: %v\n", err)
        break
    }
    switch event.Type {
    case llm.EventContent:
        fmt.Print(event.Text)           // Partial text token
    case llm.EventToolUse:
        // Model wants to call a tool (see Tool Use below)
        fmt.Printf("Tool call: %s(%v)\n", event.ToolCall.Name, event.ToolCall.Input)
    case llm.EventThinking:
        fmt.Print(event.Thinking)       // Extended thinking (Anthropic with ThinkingLevel set)
    case llm.EventDone:
        // Final metadata — stop reason and token usage
        fmt.Printf("\nDone: %s (in=%d, out=%d)\n",
            event.StopReason, event.InputTokens, event.OutputTokens)
    }
}

Event types

| Event | Fields | Description |
|---|---|---|
| EventContent | Text | A chunk of generated text. Concatenate all chunks for the full response. |
| EventToolUse | ToolCall (ID, Name, Input) | The model is invoking a tool. Handle it and continue the conversation with the result. |
| EventThinking | Thinking | Extended thinking output (requires ThinkingLevel in config). |
| EventDone | StopReason, InputTokens, OutputTokens | Stream completed. StopReason is normalized across all providers: end_turn, tool_use, or max_tokens. |
| EventError | Error | An error occurred mid-stream. |

Stream completion: EventDone is emitted when the API sends a completion signal (finish reason + usage stats). If the connection drops or the API response is malformed, the iterator may exhaust without EventDone. Handle iterator exhaustion as the authoritative "stream ended" signal; treat EventDone as optional metadata. Token counts use the sentinel llm.TokensNotReported (-1) when the provider did not report usage; compare against this constant to distinguish "not reported" from "zero tokens" (0).

Malformed events: If a provider sends an SSE event with invalid JSON, the iterator yields an error for that event and continues parsing subsequent events. Callers should check err on every iteration and decide whether to break or continue. This ensures data corruption is never silent.
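
Putting both points together, a defensive consumption pattern might look like the sketch below: iterator exhaustion ends the stream, the token sentinel is checked, and malformed events are skipped rather than aborting.

var full string
sawDone := false
for event, err := range client.Stream(ctx, req) {
    if err != nil {
        // Malformed event or transport error: report it and keep reading;
        // the iterator continues past bad SSE payloads where it can.
        fmt.Printf("stream event error: %v\n", err)
        continue
    }
    switch event.Type {
    case llm.EventContent:
        full += event.Text
    case llm.EventDone:
        sawDone = true
        if event.OutputTokens != llm.TokensNotReported {
            fmt.Printf("output tokens: %d\n", event.OutputTokens)
        }
    }
}
// Iterator exhaustion is the authoritative end-of-stream signal; EventDone is
// optional metadata and may be missing after a dropped connection.
if !sawDone {
    fmt.Println("stream ended without EventDone")
}
fmt.Println(full)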

Tool Use

Tool use (function calling) lets the model invoke functions you define. When a model wants to call a tool, resp.StopReason will be "tool_use". You manage the conversation by executing the tool locally and feeding llm.NewToolResultMessage back into the message history.

Use llm.NewAssistantMessage(resp) to append the assistant's response — it preserves both text and tool_use blocks. Using NewTextMessage(RoleAssistant, resp.Content) would drop tool call blocks, causing the next request to fail.

// In the tool use loop:
req.Messages = append(req.Messages, llm.NewAssistantMessage(resp))  // text + tool_use blocks
req.Messages = append(req.Messages, llm.NewToolResultMessage(tc.ID, result, false))

For a full working example of an agentic Tool Use loop, see examples/tool_use/main.go.
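
A condensed sketch of that loop is shown here; the resp.ToolCalls field name and the executeTool helper are assumptions for illustration, so check the example above and the package docs for the exact names.

for {
    resp, err := client.Complete(ctx, req)
    if err != nil {
        panic(err)
    }
    if resp.StopReason != "tool_use" {
        fmt.Println(resp.Content)
        break
    }
    // Preserve both text and tool_use blocks, then answer each pending call.
    req.Messages = append(req.Messages, llm.NewAssistantMessage(resp))
    for _, tc := range resp.ToolCalls {
        result, err := executeTool(tc.Name, tc.Input) // your own dispatcher (hypothetical)
        req.Messages = append(req.Messages, llm.NewToolResultMessage(tc.ID, result, err != nil))
    }
}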

If a tool execution fails, you can pass isError: true so the model knows the call failed and can attempt to recover:

req.Messages = append(req.Messages, llm.NewToolResultMessage(tc.ID, "location not found", true))

Thinking & Reasoning

Many modern models support reasoning (chain-of-thought): they expose their internal thought process before producing the final answer.

You can check whether a given model natively supports extended thinking by querying the registry. ThinkingLevel is supported by anthropic, gemini, and openai (via the Responses API for GPT-5 family models, where it maps to reasoning effort). The OpenAI Chat Completions adapter (openai_compat) returns an error if ThinkingLevel is set — use the openai provider instead, which auto-routes GPT-5 models to the Responses API.

All four adapters parse thinking content from responses: Anthropic via thinking blocks, Gemini via thought: true parts, OpenAI Responses via reasoning_summary, and OpenAI-compat via the reasoning_content field (used by Ollama Qwen3, DeepSeek-R1, etc.).

info, ok := llm.GetModelInfo("claude-opus-4-6")

// 1. API-controlled thinking budgets (e.g. Anthropic)
if ok && info.SupportsThinking {
    fmt.Println("Model supports extended thinking budgets")
    
    // Opt-in via config
    cfg := llm.Config{
        Provider:      "anthropic",
        Model:         "claude-opus-4-6",
        ThinkingLevel: llm.ThinkingLow, // or llm.ThinkingMedium / llm.ThinkingHigh
    }
}

// 2. Intrinsic reasoning models (e.g. DeepSeek-R1, Grok 4 R)
if ok && info.Thinking {
    fmt.Println("Model uses intrinsic reasoning natively")
}

You can override the default token budget for a specific request using ThinkingBudget:

req := llm.Request{
    Messages:       messages,
    ThinkingLevel:  llm.ThinkingMedium,
    ThinkingBudget: 8192, // overrides ThinkingMedium's default of 16384
}

Note: Anthropic's API requires temperature = 1.0 when extended thinking is enabled. The adapter enforces this automatically. A warning-level log is emitted when a temperature override occurs.

If extended thinking is enabled, you can read it synchronously via resp.Thinking or asynchronously in a stream via llm.EventThinking and event.Thinking.
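
For example, a non-streaming call might surface it like this:

resp, err := client.Complete(ctx, req)
if err != nil {
    panic(err)
}
if resp.Thinking != "" {
    fmt.Println("Reasoning:", resp.Thinking)
}
fmt.Println("Answer:", resp.Content)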

Context Caching

Context caching reduces cost and latency by reusing previously processed input. Anthropic and Gemini both support caching, but through different mechanisms.

Anthropic (inline cache_control)

Mark content blocks, system prompts, or tool definitions as cacheable. Anthropic caches the marked prefix and reuses it on subsequent requests with the same prefix.

req := llm.Request{
    System:             "You are a helpful assistant with extensive knowledge...",
    SystemCacheControl: true, // Cache the system prompt
    Messages: []llm.Message{{
        Role: llm.RoleUser,
        Content: []llm.ContentPart{
            {Type: llm.ContentText, Text: longDocument, CacheControl: true}, // Cache this block
            {Type: llm.ContentText, Text: "Summarize the above."},
        },
    }},
    Tools: []llm.Tool{{
        Name: "search", Description: "Search the web",
        InputSchema:  map[string]interface{}{"type": "object"},
        CacheControl: true, // Cache this tool definition
    }},
}

resp, _ := client.Complete(ctx, req)
fmt.Printf("Cache write: %d tokens, Cache read: %d tokens\n",
    resp.CacheCreationTokens, resp.CacheReadTokens)

Cache token usage is also available in streaming via EventDone:

for event, err := range client.Stream(ctx, req) {
    if event.Type == llm.EventDone {
        fmt.Printf("Cache: write=%d read=%d\n",
            event.CacheCreationTokens, event.CacheReadTokens)
    }
}

Gemini (cached content reference)

Gemini uses a two-stage model: create a cache resource externally, then reference it by name. Cache lifecycle (create/list/delete) is managed outside the SDK.

req := llm.Request{
    CachedContent: "cachedContents/abc123", // Pre-created via Gemini API
    Messages:      []llm.Message{llm.NewTextMessage(llm.RoleUser, "Summarize.")},
}
resp, _ := client.Complete(ctx, req)
fmt.Printf("Cached tokens: %d\n", resp.CacheReadTokens)

OpenAI-compatible

Cache fields are silently ignored — no error, no effect.

Automatic Retry, Circuit Breaker & Auth Pool Rotation

All clients get automatic retry with exponential backoff (1s→2s→4s, capped at 30s) and a circuit breaker (opens after 5 consecutive failures, probes after 30s) — including keyless local providers like Ollama and vLLM. A solo developer hitting a transient 502 or 429 gets the same resilience as an enterprise with 10 keys.

The rotation engine is thread-safe. During concurrent rate-limit events, rotation is synchronized to prevent redundant HTTP client allocations, ensuring all in-flight requests seamlessly fail over to the next available endpoint.

cfg := llm.Config{
    Provider:  "anthropic",
    Model:     "claude-sonnet-4-6",
    APIKey:    os.Getenv("ANTHROPIC_API_KEY"),
    MaxTokens: 8192,
    Timeout:   120 * time.Second,
}

// Single-key: gets retry/backoff + circuit breaker on transient errors
client, err := llm.NewClient(cfg)

// Multi-key: rotates between keys on failure
keys := []string{"key1", "key2", "key3"}
client, err := llm.NewClientWithKeys(cfg, keys)

Circuit Breaker

When an endpoint is degraded (returning 503s), the circuit breaker prevents request storms by opening after consecutive failures and allowing a single probe after cooldown:

cfg := llm.DefaultConfig()                     // circuit breaker enabled by default
cfg.CircuitThreshold = 3                        // open after 3 consecutive failures
cfg.CircuitCooldown  = 15 * time.Second         // probe after 15s

Auth errors (401/403) do not trip the circuit — a bad key is not a broken endpoint.

Configurable Retry & Cooldowns

cfg.RetryPolicy = &llm.RetryPolicy{
    BaseDelay: 500 * time.Millisecond,         // faster retries for local providers
    MaxDelay:  10 * time.Second,
    Factor:    2.0,
    Jitter:    0.25,
}
cfg.CooldownRateLimit = 30 * time.Second       // 429 cooldown (default: 60s)
cfg.CooldownOverload  = 15 * time.Second       // 503 cooldown (default: 30s)
cfg.CooldownDefault   = 5 * time.Second        // other errors (default: 10s)

Retry Observability

cfg.RetryHook = func(evt llm.RetryEvent) {
    // Hook into your own instrumentation; "metrics" here stands in for any counter API.
    metrics.Counter("llm_retries", "type", evt.Type.String()).Inc()
}

Per-Profile Endpoints

Keys can include a custom BaseURL using the API_KEY|BASE_URL format. This enables failover across different backends:

keys := []string{
    "sk-primary-key",                              // Uses cfg.BaseURL (or provider default)
    "sk-backup-key|https://azure-proxy.example.com/v1",  // Uses Azure proxy
    "local-key|http://localhost:8000/v1",          // Falls back to local vLLM
}
client, err := llm.NewClientWithKeys(cfg, keys)

When a key fails, the pool rotates to the next profile — which may use an entirely different endpoint.

Error handling:

  • Transient errors (429, 503, 502): Backoff and retry, rotating to other keys if available
  • Auth errors (401, 403): Key is permanently disabled; rotates to other keys or fails immediately if none remain
  • Bad request (400): Returns immediately — the request is broken, not the key

Structured Errors

All API errors are returned as *APIError with the HTTP status code, enabling reliable classification. (Note: if you use NewClientWithKeys for auth pool rotation, retries happen automatically; these helpers are for manual flow control with a single client or for application-level retries.)

resp, err := client.Complete(ctx, req)
if err != nil {
    switch {
    case llm.IsRateLimited(err):
        // 429 - back off and retry
    case llm.IsOverloaded(err):
        // 503 - server busy, retry later
    case llm.IsAuthError(err):
        // 401/403 - check API key
    case llm.IsContextLength(err):
        // 400 - input too long, truncate
    case llm.IsRetryable(err):
        // Any retryable error (429, 503, 500, 502, 408)
    default:
        // Non-retryable error
    }
}
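
Combining these helpers with the Backoff utility (described below), a simple application-level retry loop might look like this sketch:

for attempt := 0; attempt < 3; attempt++ {
    resp, err := client.Complete(ctx, req)
    if err == nil {
        fmt.Println(resp.Content)
        break
    }
    if !llm.IsRetryable(err) {
        fmt.Printf("permanent error: %v\n", err)
        break
    }
    // Wait with exponential backoff before the next attempt.
    time.Sleep(llm.Backoff(attempt, 1*time.Second, 30*time.Second))
}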

Cost Estimation

Estimate cost from token counts using registry pricing data:

cost := llm.EstimateCost(llm.CostInput{
    Model:             "claude-sonnet-4-6",
    InputTokens:       resp.InputTokens,
    OutputTokens:      resp.OutputTokens,
    ThinkingTokens:    resp.ThinkingTokens,
    CacheCreateTokens: resp.CacheCreationTokens,
    CacheReadTokens:   resp.CacheReadTokens,
})
fmt.Printf("Cost: $%.6f\n", cost)

// Access pricing data directly
info, _ := llm.GetModelInfo("claude-opus-4-6")
fmt.Printf("Context: %d tokens, Input: $%.2f/1M\n", info.ContextWindow, info.InputPricePer1M)

Request Logging (Middleware)

Enable metadata-only logging (no message content) via LogRequests:

cfg := llm.Config{
    Provider:    "anthropic",
    Model:       "claude-sonnet-4-6",
    APIKey:      apiKey,
    LogRequests: true,  // Logs provider, model, tokens, cost, elapsed time
}
client, _ := llm.NewClient(cfg)

Or wrap an existing client manually:

client = llm.WithLogging(client)
client = llm.WithLoggingPrefix(client, "[MyApp]")

Exponential Backoff

The pool uses configurable exponential backoff with jitter (default: 1s base, 30s cap). Override via Config.RetryPolicy or use the utility directly:

// Available as a utility for custom retry logic
delay := llm.Backoff(attempt, 1*time.Second, 30*time.Second)
// attempt 0: ~1s, 1: ~2s, 2: ~4s, 3: ~8s, ...

// Or use RetryPolicy directly
p := llm.RetryPolicy{BaseDelay: 100*time.Millisecond, MaxDelay: 5*time.Second, Factor: 3.0, Jitter: 0.1}
delay = p.Delay(attempt)

Config Reference

| Field | Type | Default | Description |
|---|---|---|---|
| Provider | string | "anthropic" | Provider name |
| Model | string | "claude-sonnet-4-6" | Model identifier |
| APIKey | string | "" | API key (empty OK for local providers) |
| MaxTokens | int | 8192 | Max output tokens |
| Temperature | *float64 | nil | Sampling temperature (nil = provider default, omitted from wire) |
| ThinkingLevel | ThinkingLevel | "" | Extended thinking: ThinkingLow/ThinkingMedium/ThinkingHigh |
| Timeout | Duration | 120s | HTTP timeout |
| BaseURL | string | "" | Override provider endpoint |
| AuthHeader | string | "Bearer" | Override auth header format |
| ProviderName | string | "" | Override Client.Provider() |
| LogRequests | bool | false | Enable request/response metadata logging |
| RetryPolicy | *RetryPolicy | nil | Configurable backoff (nil = DefaultRetryPolicy: 1s–30s, 2x, ±25% jitter) |
| CircuitThreshold | int | 5 | Consecutive failures to open circuit (0 = disabled) |
| CircuitCooldown | Duration | 30s | Open→half-open cooldown |
| CooldownRateLimit | Duration | 60s | Profile cooldown for 429 errors |
| CooldownOverload | Duration | 30s | Profile cooldown for 503 errors |
| CooldownDefault | Duration | 10s | Profile cooldown for other transient errors |
| RetryHook | RetryHook | nil | Observability hook for retry lifecycle events |
| MaxRetries | int | 10 | Cap on retry/rotation iterations (min effective: 3) |
| BetaFeatures | []string | ["interleaved-thinking-..."] | Provider beta flags (Anthropic: anthropic-beta header) |
| AnthropicVersion | string | "2023-06-01" | Override Anthropic API version header |
| MaxErrorBodyBytes | int | 1 MB | Cap on error response body reads |
| MaxSSELineBytes | int | 256 KB | Cap on per-line SSE buffer |
| MaxResponseBodyBytes | int | 32 MB | Cap on success response body reads |
| MaxToolInputBytes | int | 1 MB | Cap on accumulated tool input JSON |
| MaxErrorMessageLen | int | 4096 | Cap on stored error message length |
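
For orientation, a config touching several of these knobs might look like this (values are illustrative, not recommendations):

cfg := llm.Config{
    Provider:          "anthropic",
    Model:             "claude-sonnet-4-6",
    APIKey:            os.Getenv("ANTHROPIC_API_KEY"),
    MaxTokens:         4096,
    Timeout:           60 * time.Second,
    LogRequests:       true,
    CircuitThreshold:  3,
    CircuitCooldown:   15 * time.Second,
    CooldownRateLimit: 30 * time.Second,
}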

Model Registry

Use ResolveModelAlias() for short aliases:

model := llm.ResolveModelAlias("opus")   // -> "claude-opus-4-6"
model = llm.ResolveModelAlias("grok")    // -> "grok-4.20-beta"
model = llm.ResolveModelAlias("flash")   // -> "gemini-2.5-flash"

Anthropic aliases

| Alias | Resolves to |
|---|---|
| opus | claude-opus-4-6 |
| sonnet | claude-sonnet-4-6 |
| haiku | claude-haiku-4-5-20251001 |
| claude | claude-sonnet-4-6 |

xAI / Grok aliases

| Alias | Resolves to |
|---|---|
| grok, grok4.2, grok4.20, grok4 | grok-4.20-beta |
| grok4.1, grok-4.1 | grok-4-1-fast-non-reasoning |
| grok-reasoning, grok-4-reasoning | grok-4-fast-reasoning |
| grok-4.1-reasoning | grok-4-1-fast-reasoning |
| grok-code | grok-code-fast-1 |
| grok-mini | grok-3-mini |

OpenAI aliases

| Alias | Resolves to |
|---|---|
| gpt, gpt5.4, gpt5 | gpt-5.4 |
| gpt5.3, gpt-instant | gpt-5.3-chat-latest |
| gpt5.4-mini | gpt-5.4-mini |
| gpt5.4-nano | gpt-5.4-nano |
| gpt5.2 | gpt-5.2 |
| gpt5.1 | gpt-5.1 |
| gpt5-mini | gpt-5-mini |
| gpt5-nano | gpt-5-nano |
| gpt4.1 | gpt-4.1 |

Groq aliases

| Alias | Resolves to |
|---|---|
| groq | llama-3.3-70b-versatile |
| llama, llama-70b | llama-3.3-70b-versatile |
| llama-8b | llama-3.1-8b-instant |
| llama4, llama-4-scout | meta-llama/llama-4-scout-17b-16e-instruct |
| gpt-oss | openai/gpt-oss-120b |
| gpt-oss-20b | openai/gpt-oss-20b |

Mistral aliases

| Alias | Resolves to |
|---|---|
| mistral-large | mistral-large-2512 |
| mistral-medium | mistral-medium-latest |
| mistral-small | mistral-small-2603 |
| magistral | magistral-medium-2509 |
| codestral | codestral-2508 |
| devstral | devstral-2512 |
| devstral-small | devstral-small-2-2512 |
| ministral | ministral-8b-2512 |

DeepSeek aliases

| Alias | Resolves to |
|---|---|
| deepseek-cloud, deepseek-v4 | deepseek-chat |

Cohere aliases

| Alias | Resolves to |
|---|---|
| cohere, command-a | command-a-03-2025 |

DashScope/Qwen aliases

| Alias | Resolves to |
|---|---|
| qwen-cloud | qwen3.6-plus |
| qwen-omni | qwen3.5-omni-plus |

Ollama aliases

| Alias | Resolves to |
|---|---|
| deepseek | deepseek-r1:14b |
| mistral-local | mistral-small3.2:24b |
| qwen-local | qwen3:8b |
| qwen-code | qwen3-coder:30b |
| gemma, gemma4 | gemma4:e4b |
| gemma3 | gemma3:12b |

Gemini aliases

| Alias | Resolves to |
|---|---|
| gemini, flash-lite | gemini-3.1-flash-lite-preview |
| flash | gemini-2.5-flash |
| gemini-pro, gemini3.1, gemini3, gemini-3 | gemini-3.1-pro-preview |

Gemini 3 note: gemini-3-pro-preview and gemini-3-flash-preview use ThoughtSignature — the model returns an opaque signature in tool call responses that must be echoed in the corresponding tool_result. The adapter handles this automatically; no changes to calling code are required.

Cost and metadata

// Estimate cost from token counts (cache-aware)
cost := llm.EstimateCost(llm.CostInput{Model: "claude-sonnet-4-6", InputTokens: inputTokens, OutputTokens: outputTokens})

// Query per-model metadata (context window, pricing, thinking support)
info, ok := llm.GetModelInfo("grok-4.20-beta")
fmt.Printf("Context: %d tokens\n", info.ContextWindow)

// Detect provider from model ID
provider := llm.ProviderForModel("gemini-2.5-flash") // -> "gemini"

// Get the default model for a provider
model := llm.GetDefaultModel("xai") // -> "grok-4.20-beta"

Package Structure

Provider implementations live in sub-packages under provider/, following the database/sql driver registration pattern:

llm/
  types.go, config.go, errors.go, ...   # Core types and interfaces
  register.go                            # RegisterProvider() registry
  factory.go                             # NewClient() -> registry lookup
  retrypolicy.go                         # RetryPolicy + RetryHook (configurable backoff)
  circuitbreaker.go                      # CircuitBreaker (3-state machine)
  provider/
    all.go                               # Blank-imports all sub-packages
    anthropic/anthropic.go               # Anthropic Claude adapter
    gemini/gemini.go                     # Google Gemini adapter
    openaicompat/openaicompat.go         # OpenAI-compatible adapter (13+ providers)

Consumers that call llm.NewClient() must add a blank import in their main.go:

import _ "github.com/bds421/rho-llm/provider"  // register all provider adapters

Consumers that only use types (llm.Client, llm.Config, llm.Message, etc.) need no blank import.
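
For example, a downstream package might accept an already-constructed client through the llm.Client interface (sketched here under the assumption that Client is the interface returned by NewClient, with the Complete method shown above):

func Summarize(ctx context.Context, c llm.Client, text string) (string, error) {
    resp, err := c.Complete(ctx, llm.Request{
        Messages: []llm.Message{
            llm.NewTextMessage(llm.RoleUser, "Summarize in one paragraph:\n\n"+text),
        },
    })
    if err != nil {
        return "", err
    }
    return resp.Content, nil
}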

Acknowledgements

The name rho is a tribute to pi — rho is the next letter in the Greek alphabet. pi's ai package is a great multi-provider LLM library; rho brings the same idea to Go.
