feat(router): add intelligent model routing with auto selectors #431
base: main
Changes from all commits
9ddbb3d
140e94c
c601282
995a4c8
6901217
27a9ccc
11b1f42
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -266,6 +266,26 @@ | |
| # (e.g. /v1/chat/completions and /v1/responses items). | ||
| # ENABLE_GUARDRAILS_FOR_BATCH_PROCESSING=false | ||
|
|
||
| # ---------------------------------------------------------------------------- | ||
| # Intelligent model routing (default: disabled) | ||
| # Analyzes the request with a cheap analyzer model and selects the best catalog | ||
| # model for execution. Only triggers for intelligent selectors (auto/smart/ | ||
| # auto-cost/auto-quality) or intelligent virtual models, unless mode is observe. | ||
| # Configure analyzers/selectors in config.yaml under intelligent_routing. | ||
| # See docs/dev/intelligent-model.md. | ||
| # ---------------------------------------------------------------------------- | ||
| # INTELLIGENT_ROUTING_ENABLED=false | ||
| # INTELLIGENT_ROUTING_MODE=off # off | observe | enforce | ||
|
Contributor
Again, IIUC:
What's the reason for mode=off? |
||
| # INTELLIGENT_ROUTING_DEFAULT_STRATEGY=balanced # cost | balanced | quality | latency | ||
|
Comment on lines +278 to +279
Contributor
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Avoid inline
Lines 278-279 put the enumerations on the same line as the assignment ( |
||
| # INTELLIGENT_ROUTING_MAX_ANALYSIS_TOKENS=256 | ||
| # INTELLIGENT_ROUTING_TIMEOUT=1500ms # Go duration string | ||
| # INTELLIGENT_ROUTING_MIN_SAVINGS_RATIO=0.15 | ||
| # INTELLIGENT_ROUTING_MIN_CONFIDENCE=0.7 | ||
| # INTELLIGENT_ROUTING_FALLBACK_MODEL= | ||
| # INTELLIGENT_ROUTING_ANALYSIS_USER_PATH=/intelligent-router | ||
| # INTELLIGENT_ROUTING_CANDIDATES_ALLOW= | ||
| # INTELLIGENT_ROUTING_CANDIDATES_DENY= | ||
|
|
||
| # In-memory buffer size before flushing to storage (default: 1000) | ||
| # USAGE_BUFFER_SIZE=1000 | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -128,6 +128,7 @@ Full reference: `.env.template` and `config/config.yaml` | |
| - **HTTP client:** `HTTP_TIMEOUT` (600s), `HTTP_RESPONSE_HEADER_TIMEOUT` (600s) | ||
| - **Resilience:** Configured via `config/config.yaml` - global `resilience.retry.*` and `resilience.circuit_breaker.*` defaults with optional per-provider overrides under `providers.<name>.resilience.retry.*` and `providers.<name>.resilience.circuit_breaker.*`. Retry defaults: `max_retries` (3), `initial_backoff` (1s), `max_backoff` (30s), `backoff_factor` (2.0), `jitter_factor` (0.1). Circuit breaker defaults: `failure_threshold` (5), `success_threshold` (2), `timeout` (30s) | ||
| - **Metrics:** `METRICS_ENABLED` (false), `METRICS_ENDPOINT` (/metrics) | ||
| - **Intelligent routing:** Disabled by default. When enabled, the gateway classifies each request with a cheap analyzer model and selects the best catalog model for execution. Configured via `config/config.yaml` under `intelligent_routing` (key env vars: `INTELLIGENT_ROUTING_ENABLED`, `INTELLIGENT_ROUTING_MODE` (`off`/`observe`/`enforce`), `INTELLIGENT_ROUTING_DEFAULT_STRATEGY` (`cost`/`balanced`/`quality`/`latency`), `INTELLIGENT_ROUTING_MAX_ANALYSIS_TOKENS`, `INTELLIGENT_ROUTING_TIMEOUT`, `INTELLIGENT_ROUTING_MIN_SAVINGS_RATIO`, `INTELLIGENT_ROUTING_FALLBACK_MODEL`). It only triggers for intelligent selectors (`auto`, `smart`, `auto-cost`, `auto-quality`) or intelligent virtual models, unless `mode` is `observe` (dry-run that records the recommendation without changing the executed model). The default example pool ships with `codex/gpt-5.4-mini`, `zai/glm-5-turbo`, and `anthropic/claude-haiku-4-5` as ordered analyzers (tried in order with failover). Analysis cost is attributed to `analysis_user_path` (`/intelligent-router` by default) to keep it separate from the main execution in usage reports. See `docs/dev/intelligent-model.md`. | ||
|
Contributor
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Analyzer provider inconsistency with the example config.
This line documents the default analyzer pool as |
||
| - **Guardrails:** Configured via `config/config.yaml` only (except `GUARDRAILS_ENABLED` env var) | ||
| - **Providers:** `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `USE_GOOGLE_GEMINI_NATIVE_API` (true by default; false uses Gemini's OpenAI-compatible chat API), `XAI_API_KEY`, `GROQ_API_KEY`, `OPENROUTER_API_KEY`, `ZAI_API_KEY`, `ZAI_BASE_URL` (optional Z.ai endpoint override), `MINIMAX_API_KEY`, `MINIMAX_BASE_URL` (optional MiniMax endpoint override), `XIAOMI_API_KEY`, `XIAOMI_BASE_URL` (optional Xiaomi MiMo endpoint override), `OPENCODE_GO_API_KEY`, `OPENCODE_GO_BASE_URL` (optional OpenCode Go/Zen endpoint override; default `https://opencode.ai/zen/go/v1`), `OPENCODE_GO_MESSAGES_MODELS` (optional comma-separated model IDs routed to the Anthropic-native `/messages` endpoint instead of `/chat/completions`; default `qwen3.7-max`), `BAILIAN_API_KEY`, `BAILIAN_BASE_URL` (optional Bailian base URL for region switching; default `https://dashscope.aliyuncs.com/compatible-mode/v1`), `AZURE_API_KEY`, `AZURE_BASE_URL` (Azure OpenAI deployment base URL), `AZURE_API_VERSION` (optional Azure API version), `ORACLE_API_KEY` (Oracle API key), `ORACLE_BASE_URL` (Oracle OpenAI-compatible base URL), `<PROVIDER>[_SUFFIX]_MODELS` (comma-separated configured model list for any provider type), `OLLAMA_BASE_URL`, `VLLM_BASE_URL`, `VLLM_API_KEY` (optional upstream vLLM bearer token) | ||
| - **Provider model metadata:** `providers.<name>.models` accepts either model IDs (strings) or `{id, metadata}` objects. When `metadata` is supplied (`display_name`, `context_window`, `max_output_tokens`, `modes`, `capabilities`, `pricing`, …) it is merged onto the remote ai-model-list entry during enrichment, with operator values winning per-field. Primary use case: advertising context windows, capabilities, and pricing for local models (Ollama) and other custom endpoints whose IDs are not in the upstream registry. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -176,6 +176,47 @@ guardrails: | |
| # skip_content_prefix: "### safe" | ||
| # # prompt: "Custom rewrite instructions here." | ||
|
|
||
| # Intelligent model routing (optional, disabled by default). | ||
| # When enabled, the gateway classifies the request with a cheap analyzer model | ||
| # and selects the best catalog model for execution. Only triggers for | ||
| # intelligent selectors (auto/smart/auto-cost/auto-quality) or intelligent | ||
| # virtual models, unless mode is observe. See docs/dev/intelligent-model.md. | ||
| intelligent_routing: | ||
| enabled: false | ||
| mode: "off" # off | observe | enforce; observe classifies but keeps the requested model | ||
|
Contributor
There is still 'off' mode here. What is it for? |
||
| analyzers: | ||
| # Ordered pool of cheap models used to classify the request. Tried in | ||
| # order; on failure or timeout the next analyzer is used. | ||
| - model: "gpt-5.4-mini" | ||
| provider: "openai" | ||
| max_tokens: 256 | ||
| - model: "glm-5-turbo" | ||
| provider: "zai" | ||
| max_tokens: 256 | ||
| - model: "claude-haiku-4-5" | ||
| provider: "anthropic" | ||
| max_tokens: 256 | ||
| defaults: | ||
| strategy: "balanced" # cost | balanced | quality | latency | ||
| max_analysis_tokens: 256 | ||
| timeout: "8000ms" | ||
| min_savings_ratio: 0.15 # minimum estimated savings to switch to a cheaper model in enforce | ||
| min_confidence: 0.7 # below this, a stronger model is chosen | ||
| selectors: | ||
| - name: "auto" | ||
| strategy: "balanced" | ||
| - name: "smart" | ||
| strategy: "balanced" | ||
| - name: "auto-cost" | ||
| strategy: "cost" | ||
| - name: "auto-quality" | ||
| strategy: "quality" | ||
| # candidates: # optional allow/deny over the catalog | ||
| # allow: ["openai/gpt-4o-mini", "anthropic/claude-sonnet-*"] | ||
| # deny: [] | ||
| fallback_model: "openai/gpt-4o-mini" # used when all analyzers fail; empty falls back to model_not_found | ||
| analysis_user_path: "/intelligent-router" # scopes analyzer usage/audit cost separately | ||
|
|
||
| fallback: | ||
| default_mode: "manual" # "off", "manual", or "auto"; default is "manual" | ||
| manual_rules_path: "config/fallback.example.json" # optional JSON map: {"model": ["fallback-1", "provider/model"]}; when omitted, manual mode has no fallback candidates | ||
|
|
||
📐 Maintainability & Code Quality | 🟡 Minor
Repository: ENTERPILOT/GoModel
Update the intelligent-routing doc link
`.env.template`, `config/intelligent_routing.go`, and `CLAUDE.md` still reference `docs/dev/intelligent-model.md`; point them to `docs/features/intelligent-routing.mdx` instead.