20 changes: 20 additions & 0 deletions .env.template
@@ -266,6 +266,26 @@
# (e.g. /v1/chat/completions and /v1/responses items).
# ENABLE_GUARDRAILS_FOR_BATCH_PROCESSING=false

# ----------------------------------------------------------------------------
# Intelligent model routing (default: disabled)
# Analyzes the request with a cheap analyzer model and selects the best catalog
# model for execution. Only triggers for intelligent selectors (auto/smart/
# auto-cost/auto-quality) or intelligent virtual models, unless mode is observe.
# Configure analyzers/selectors in config.yaml under intelligent_routing.
# See docs/dev/intelligent-model.md.

Contributor

📐 Maintainability & Code Quality | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether the referenced dev doc exists and list any intelligent-routing docs.
fd -t f 'intelligent' docs
rg -n 'docs/dev/intelligent-model\.md' --glob '!**/vendor/**'

Repository: ENTERPILOT/GoModel

Length of output: 194


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the relevant references and whether the referenced dev doc exists.
git ls-files '.env.template' 'config/intelligent_routing.go' 'CLAUDE.md' 'docs/dev/*' 'docs/features/*'

printf '\n--- refs ---\n'
rg -n 'docs/dev/intelligent-model\.md|intelligent-model\.md|intelligent-routing\.mdx' .env.template config/intelligent_routing.go CLAUDE.md docs -g '!**/vendor/**' || true

printf '\n--- docs/dev listing ---\n'
fd -t f . docs/dev || true

printf '\n--- docs/features listing ---\n'
fd -t f . docs/features || true

Repository: ENTERPILOT/GoModel

Length of output: 2366


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the exact lines around the references.
sed -n '268,280p' .env.template
printf '\n--- config/intelligent_routing.go ---\n'
sed -n '1,40p' config/intelligent_routing.go
printf '\n--- CLAUDE.md ---\n'
sed -n '120,140p' CLAUDE.md

Repository: ENTERPILOT/GoModel

Length of output: 8641


Update the intelligent-routing doc link
.env.template, config/intelligent_routing.go, and CLAUDE.md still reference docs/dev/intelligent-model.md; point them to docs/features/intelligent-routing.mdx instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.env.template at line 275, Update the stale intelligent-routing
documentation reference so it points to docs/features/intelligent-routing.mdx
instead of docs/dev/intelligent-model.md. Make the same link change wherever the
old path appears in the related config/docs references, including the
.env.template entry and the intelligent routing config/docs references such as
config/intelligent_routing.go and CLAUDE.md, so all unique symbols and comments
consistently reference the new doc.

# ----------------------------------------------------------------------------
# INTELLIGENT_ROUTING_ENABLED=false
# INTELLIGENT_ROUTING_MODE=off # off | observe | enforce

Contributor

Again, if I understand correctly, these two settings behave identically:

INTELLIGENT_ROUTING_ENABLED=false with INTELLIGENT_ROUTING_MODE=off
INTELLIGENT_ROUTING_ENABLED=true with INTELLIGENT_ROUTING_MODE=off

What is the reason for mode=off?
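
To make the question concrete, here is a minimal sketch of how the two settings might combine. It assumes the `enabled` flag simply overrides `mode`; the PR's actual resolution logic is not shown in this diff, and `effectiveMode` is a hypothetical helper, not a function from the codebase.

```go
package main

import "fmt"

// effectiveMode is a hypothetical helper (not from the PR). It assumes that
// a disabled feature always resolves to "off", regardless of the mode value.
func effectiveMode(enabled bool, mode string) string {
	if !enabled {
		return "off"
	}
	switch mode {
	case "observe", "enforce":
		return mode
	default:
		// "off" and any unknown value leave routing inactive in this sketch.
		return "off"
	}
}

func main() {
	fmt.Println(effectiveMode(false, "off"))    // disabled flag, mode off
	fmt.Println(effectiveMode(true, "off"))     // enabled flag, mode off
	fmt.Println(effectiveMode(true, "observe")) // enabled flag, dry-run mode
}
```

Under that assumption the first two calls resolve to the same state, which is the redundancy being asked about: either the flag or the `off` value alone would be enough to express "disabled".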

# INTELLIGENT_ROUTING_DEFAULT_STRATEGY=balanced # cost | balanced | quality | latency
Comment on lines +278 to +279

Contributor

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Avoid inline # … comments on value lines.

Lines 278-279 put the enumerations on the same line as the assignment (INTELLIGENT_ROUTING_MODE=off # off | observe | enforce). The rest of this template keeps descriptions on separate comment lines, and several env_file loaders (e.g. Docker Compose) do not strip trailing inline comments, so an uncommented value would become off # off | observe | enforce. Move the option lists to preceding comment lines for consistency and safety.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.env.template around lines 278 - 279, The INTELLIGENT_ROUTING_MODE and
INTELLIGENT_ROUTING_DEFAULT_STRATEGY entries still use inline value-line
comments, which can be parsed as part of the value by env loaders. Update the
.env.template entries to match the rest of the template by moving the option
descriptions to separate preceding comment lines and keeping the assignment
lines as plain values; use the existing INTELLIGENT_ROUTING_MODE and
INTELLIGENT_ROUTING_DEFAULT_STRATEGY keys to locate the affected lines.
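
The failure mode described in this comment can be sketched with a deliberately naive loader. `parseNaive` below is a hypothetical illustration, not the parser of Docker Compose or any other real tool; it only assumes a loader that splits on the first `=` and keeps the rest of the line verbatim.

```go
package main

import (
	"fmt"
	"strings"
)

// parseNaive is a hypothetical env-line parser that splits on the first "="
// and keeps everything after it as the value, including any trailing
// "# ..." text. Blank lines and full-line comments are skipped.
func parseNaive(line string) (string, string, bool) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return "", "", false
	}
	key, value, found := strings.Cut(line, "=")
	if !found {
		return "", "", false
	}
	return strings.TrimSpace(key), strings.TrimSpace(value), true
}

func main() {
	// Uncommented template line with the option list on the same line.
	_, inline, _ := parseNaive("INTELLIGENT_ROUTING_MODE=off # off | observe | enforce")
	fmt.Printf("%q\n", inline)

	// Same assignment with the option list moved to a preceding comment line.
	_, plain, _ := parseNaive("INTELLIGENT_ROUTING_MODE=off")
	fmt.Printf("%q\n", plain)
}
```

With a loader like this, the first value carries the whole option list and would not match any valid mode, while the second is the plain `off` the operator intended.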

# INTELLIGENT_ROUTING_MAX_ANALYSIS_TOKENS=256
# INTELLIGENT_ROUTING_TIMEOUT=1500ms # Go duration string
# INTELLIGENT_ROUTING_MIN_SAVINGS_RATIO=0.15
# INTELLIGENT_ROUTING_MIN_CONFIDENCE=0.7
# INTELLIGENT_ROUTING_FALLBACK_MODEL=
# INTELLIGENT_ROUTING_ANALYSIS_USER_PATH=/intelligent-router
# INTELLIGENT_ROUTING_CANDIDATES_ALLOW=
# INTELLIGENT_ROUTING_CANDIDATES_DENY=

# In-memory buffer size before flushing to storage (default: 1000)
# USAGE_BUFFER_SIZE=1000

1 change: 1 addition & 0 deletions CLAUDE.md
@@ -128,6 +128,7 @@ Full reference: `.env.template` and `config/config.yaml`
- **HTTP client:** `HTTP_TIMEOUT` (600s), `HTTP_RESPONSE_HEADER_TIMEOUT` (600s)
- **Resilience:** Configured via `config/config.yaml` - global `resilience.retry.*` and `resilience.circuit_breaker.*` defaults with optional per-provider overrides under `providers.<name>.resilience.retry.*` and `providers.<name>.resilience.circuit_breaker.*`. Retry defaults: `max_retries` (3), `initial_backoff` (1s), `max_backoff` (30s), `backoff_factor` (2.0), `jitter_factor` (0.1). Circuit breaker defaults: `failure_threshold` (5), `success_threshold` (2), `timeout` (30s)
- **Metrics:** `METRICS_ENABLED` (false), `METRICS_ENDPOINT` (/metrics)
- **Intelligent routing:** Disabled by default. When enabled, the gateway classifies each request with a cheap analyzer model and selects the best catalog model for execution. Configured via `config/config.yaml` under `intelligent_routing` (key env vars: `INTELLIGENT_ROUTING_ENABLED`, `INTELLIGENT_ROUTING_MODE` (`off`/`observe`/`enforce`), `INTELLIGENT_ROUTING_DEFAULT_STRATEGY` (`cost`/`balanced`/`quality`/`latency`), `INTELLIGENT_ROUTING_MAX_ANALYSIS_TOKENS`, `INTELLIGENT_ROUTING_TIMEOUT`, `INTELLIGENT_ROUTING_MIN_SAVINGS_RATIO`, `INTELLIGENT_ROUTING_FALLBACK_MODEL`). It only triggers for intelligent selectors (`auto`, `smart`, `auto-cost`, `auto-quality`) or intelligent virtual models, unless `mode` is `observe` (dry-run that records the recommendation without changing the executed model). The default example pool ships with `codex/gpt-5.4-mini`, `zai/glm-5-turbo`, and `anthropic/claude-haiku-4-5` as ordered analyzers (tried in order with failover). Analysis cost is attributed to `analysis_user_path` (`/intelligent-router` by default) to keep it separate from the main execution in usage reports. See `docs/dev/intelligent-model.md`.

Contributor

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Analyzer provider inconsistency with the example config.

This line documents the default analyzer pool as codex/gpt-5.4-mini, but config/config.example.yaml (lines 190-191) registers gpt-5.4-mini under provider: "openai", and docs/features/intelligent-routing.mdx also uses openai. Align these so users aren't pointed at a provider name (codex) that isn't in the shipped examples.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` at line 131, The intelligent routing docs describe the default
analyzer pool with an inconsistent provider name for the first model entry;
update the wording in CLAUDE.md so it matches the shipped examples and feature
docs that use the OpenAI provider for gpt-5.4-mini. Use the intelligent routing
section and its analyzer pool description as the target, and keep the
provider/model naming aligned with config/config.example.yaml and
docs/features/intelligent-routing.mdx so the documented default pool is
consistent.

- **Guardrails:** Configured via `config/config.yaml` only (except `GUARDRAILS_ENABLED` env var)
- **Providers:** `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `USE_GOOGLE_GEMINI_NATIVE_API` (true by default; false uses Gemini's OpenAI-compatible chat API), `XAI_API_KEY`, `GROQ_API_KEY`, `OPENROUTER_API_KEY`, `ZAI_API_KEY`, `ZAI_BASE_URL` (optional Z.ai endpoint override), `MINIMAX_API_KEY`, `MINIMAX_BASE_URL` (optional MiniMax endpoint override), `XIAOMI_API_KEY`, `XIAOMI_BASE_URL` (optional Xiaomi MiMo endpoint override), `OPENCODE_GO_API_KEY`, `OPENCODE_GO_BASE_URL` (optional OpenCode Go/Zen endpoint override; default `https://opencode.ai/zen/go/v1`), `OPENCODE_GO_MESSAGES_MODELS` (optional comma-separated model IDs routed to the Anthropic-native `/messages` endpoint instead of `/chat/completions`; default `qwen3.7-max`), `BAILIAN_API_KEY`, `BAILIAN_BASE_URL` (optional Bailian base URL for region switching; default `https://dashscope.aliyuncs.com/compatible-mode/v1`), `AZURE_API_KEY`, `AZURE_BASE_URL` (Azure OpenAI deployment base URL), `AZURE_API_VERSION` (optional Azure API version), `ORACLE_API_KEY` (Oracle API key), `ORACLE_BASE_URL` (Oracle OpenAI-compatible base URL), `<PROVIDER>[_SUFFIX]_MODELS` (comma-separated configured model list for any provider type), `OLLAMA_BASE_URL`, `VLLM_BASE_URL`, `VLLM_API_KEY` (optional upstream vLLM bearer token)
- **Provider model metadata:** `providers.<name>.models` accepts either model IDs (strings) or `{id, metadata}` objects. When `metadata` is supplied (`display_name`, `context_window`, `max_output_tokens`, `modes`, `capabilities`, `pricing`, …) it is merged onto the remote ai-model-list entry during enrichment, with operator values winning per-field. Primary use case: advertising context windows, capabilities, and pricing for local models (Ollama) and other custom endpoints whose IDs are not in the upstream registry.
41 changes: 41 additions & 0 deletions config/config.example.yaml
@@ -176,6 +176,47 @@ guardrails:
# skip_content_prefix: "### safe"
# # prompt: "Custom rewrite instructions here."

# Intelligent model routing (optional, disabled by default).
# When enabled, the gateway classifies the request with a cheap analyzer model
# and selects the best catalog model for execution. Only triggers for
# intelligent selectors (auto/smart/auto-cost/auto-quality) or intelligent
# virtual models, unless mode is observe. See docs/dev/intelligent-model.md.
intelligent_routing:
enabled: false
mode: "off" # off | observe | enforce; observe classifies but keeps the requested model

Contributor

There is still 'off' mode here. What is it for?

analyzers:
# Ordered pool of cheap models used to classify the request. Tried in
# order; on failure or timeout the next analyzer is used.
- model: "gpt-5.4-mini"
provider: "openai"
max_tokens: 256
- model: "glm-5-turbo"
provider: "zai"
max_tokens: 256
- model: "claude-haiku-4-5"
provider: "anthropic"
max_tokens: 256
defaults:
strategy: "balanced" # cost | balanced | quality | latency
max_analysis_tokens: 256
timeout: "8000ms"
min_savings_ratio: 0.15 # minimum estimated savings to switch to a cheaper model in enforce
min_confidence: 0.7 # below this, a stronger model is chosen
selectors:
- name: "auto"
strategy: "balanced"
- name: "smart"
strategy: "balanced"
- name: "auto-cost"
strategy: "cost"
- name: "auto-quality"
strategy: "quality"
# candidates: # optional allow/deny over the catalog
# allow: ["openai/gpt-4o-mini", "anthropic/claude-sonnet-*"]
# deny: []
fallback_model: "openai/gpt-4o-mini" # used when all analyzers fail; empty falls back to model_not_found
analysis_user_path: "/intelligent-router" # scopes analyzer usage/audit cost separately

fallback:
default_mode: "manual" # "off", "manual", or "auto"; default is "manual"
manual_rules_path: "config/fallback.example.json" # optional JSON map: {"model": ["fallback-1", "provider/model"]}; when omitted, manual mode has no fallback candidates
36 changes: 21 additions & 15 deletions config/config.go
@@ -13,20 +13,21 @@ import (

// Config holds the application configuration.
type Config struct {
-	Server     ServerConfig     `yaml:"server"`
-	Models     ModelsConfig     `yaml:"models"`
-	Cache      CacheConfig      `yaml:"cache"`
-	Storage    StorageConfig    `yaml:"storage"`
-	Logging    LogConfig        `yaml:"logging"`
-	Usage      UsageConfig      `yaml:"usage"`
-	Budgets    BudgetsConfig    `yaml:"budgets"`
-	Metrics    MetricsConfig    `yaml:"metrics"`
-	HTTP       HTTPConfig       `yaml:"http"`
-	Admin      AdminConfig      `yaml:"admin"`
-	Guardrails GuardrailsConfig `yaml:"guardrails"`
-	Fallback   FallbackConfig   `yaml:"fallback"`
-	Workflows  WorkflowsConfig  `yaml:"workflows"`
-	Resilience ResilienceConfig `yaml:"resilience"`
+	Server             ServerConfig             `yaml:"server"`
+	Models             ModelsConfig             `yaml:"models"`
+	Cache              CacheConfig              `yaml:"cache"`
+	Storage            StorageConfig            `yaml:"storage"`
+	Logging            LogConfig                `yaml:"logging"`
+	Usage              UsageConfig              `yaml:"usage"`
+	Budgets            BudgetsConfig            `yaml:"budgets"`
+	Metrics            MetricsConfig            `yaml:"metrics"`
+	HTTP               HTTPConfig               `yaml:"http"`
+	Admin              AdminConfig              `yaml:"admin"`
+	Guardrails         GuardrailsConfig         `yaml:"guardrails"`
+	Fallback           FallbackConfig           `yaml:"fallback"`
+	Workflows          WorkflowsConfig          `yaml:"workflows"`
+	Resilience         ResilienceConfig         `yaml:"resilience"`
+	IntelligentRouting IntelligentRoutingConfig `yaml:"intelligent_routing"`
}

// LoadResult is returned by Load and bundles the application config with the raw
@@ -127,7 +128,8 @@ func buildDefaultConfig() *Config {
LiveLogsReplayLimit: 1000,
LiveLogsHeartbeatSeconds: 15,
},
-		Guardrails: GuardrailsConfig{},
+		Guardrails:         GuardrailsConfig{},
+		IntelligentRouting: DefaultIntelligentRoutingConfig(),
}
}

@@ -193,6 +195,10 @@ func Load() (*LoadResult, error) {
return nil, err
}

+	if err := ValidateIntelligentRoutingConfig(&cfg.IntelligentRouting); err != nil {
+		return nil, err
+	}
+
return &LoadResult{
Config: cfg,
RawProviders: rawProviders,
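
The body of `ValidateIntelligentRoutingConfig` is not part of this hunk. As a rough idea of what load-time validation could cover, the sketch below checks the enumerations and value formats listed in `.env.template` and `config.example.yaml`. The `routingSettings` type, its field names, and the `[0, 1]` range checks are assumptions made for illustration, not the PR's code.

```go
package main

import (
	"fmt"
	"time"
)

// routingSettings is a hypothetical subset of the intelligent_routing
// settings; the real IntelligentRoutingConfig type is not shown in the diff.
type routingSettings struct {
	Mode, Strategy, Timeout        string
	MinSavingsRatio, MinConfidence float64
}

// validate rejects values outside the documented enumerations, timeouts
// that are not positive Go duration strings, and ratios outside [0, 1]
// (the range is an assumption).
func validate(s routingSettings) error {
	switch s.Mode {
	case "off", "observe", "enforce":
	default:
		return fmt.Errorf("intelligent_routing.mode: unknown value %q", s.Mode)
	}
	switch s.Strategy {
	case "cost", "balanced", "quality", "latency":
	default:
		return fmt.Errorf("intelligent_routing.defaults.strategy: unknown value %q", s.Strategy)
	}
	if d, err := time.ParseDuration(s.Timeout); err != nil || d <= 0 {
		return fmt.Errorf("intelligent_routing.defaults.timeout: invalid duration %q", s.Timeout)
	}
	if s.MinSavingsRatio < 0 || s.MinSavingsRatio > 1 {
		return fmt.Errorf("intelligent_routing.defaults.min_savings_ratio: %v is outside [0, 1]", s.MinSavingsRatio)
	}
	if s.MinConfidence < 0 || s.MinConfidence > 1 {
		return fmt.Errorf("intelligent_routing.defaults.min_confidence: %v is outside [0, 1]", s.MinConfidence)
	}
	return nil
}

func main() {
	ok := routingSettings{Mode: "observe", Strategy: "balanced", Timeout: "1500ms", MinSavingsRatio: 0.15, MinConfidence: 0.7}
	fmt.Println(validate(ok))

	bad := ok
	bad.Mode = "off # off | observe | enforce"
	fmt.Println(validate(bad))
}
```

Failing fast in `Load` on a malformed mode or duration surfaces a bad configuration at startup rather than on the first routed request.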