Skip to content

feat(router): intelligent cost-aware + latency-aware routing#414

Open
Barsit wants to merge 1 commit into
ENTERPILOT:mainfrom
Barsit:feat/intelligent-routing
Open

feat(router): intelligent cost-aware + latency-aware routing#414
Barsit wants to merge 1 commit into
ENTERPILOT:mainfrom
Barsit:feat/intelligent-routing

Conversation

@Barsit

@Barsit Barsit commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator

Summary

When multiple providers serve the same model ID, the gateway now proactively selects the best provider based on a weighted cost + latency score — instead of the previous deterministic first-wins strategy where the first registered provider always handled the request.

Architecture

Request: model="gpt-4o" (no provider pinned)
         |
         v
    resolveProvider()
         |
         +- providerHint != "" --> exact route (existing logic, unchanged)
         |
         +- providerHint == "" --> collectCandidates(modelID)
                                       |
                                       v
                                  StrategyRegistry.Resolve(ctx)
                                       |
                                       v
                                  RoutingStrategy.Select()
                                       |
                                       v
                                  Best candidate --> route

New components (8 files)

Routing strategies under internal/router/:

Component File Responsibility
RoutingStrategy interface strategy.go Pluggable strategy + ProviderCandidate
BalancedStrategy strategy_balanced.go Weighted cost+latency scoring (default)
CostOnlyStrategy strategy_cost.go Pure cost selection
LatencyOnlyStrategy strategy_latency.go Pure latency selection
FirstFitStrategy strategy_firstfit.go First acceptable (historical compat)
Scoring helpers scoring.go Normalization, filtering, tie-breaking
StrategyRegistry registry.go Name to factory map + override resolution
Context helpers context.go Context key helpers

Supporting:

Package File Responsibility
internal/llmclient/ latency.go EWMA latency/error tracker
internal/core/ interfaces.go (+ProviderStats) Runtime stats interface
internal/server/ routing_strategy.go X-GoModel-Routing-* header capture
config/ router.go Router config struct + validation

Modified components

File Change
llmclient/client.go Integrate LatencyTracker, implement ProviderStats
providers/registry.go Add ListProvidersForModel (O(k), registration-order)
providers/router.go Insert strategy selection into resolveProvider
providers/init.go Wire strategy registry at startup
providers/strategy_init.go Build registry from RouterConfig
gateway/inference_prepare.go Set RoutingEligible context flag
config/config.go Add RouterConfig field
config/config.example.yaml Document router section
.env.template Document MODEL_ROUTING_* env vars
CLAUDE.md Document intelligent routing summary

Key design decisions

  1. Default balanced strategy (cost 0.6 + latency 0.4). Single-provider configs are unaffected (one candidate, no decision needed)
  2. Missing data scores 1.0 (worst), so complete-data providers are preferred
  3. Ties broken by ProviderName lexicographic order for determinism
  4. Three single-purpose strategies: cost_only, latency_only, first_fit
  5. Per-request header overrides: X-GoModel-Routing-Strategy and X-GoModel-Routing-Weights; invalid values silently fall back (WARN log)
  6. Explicit provider bypasses routing: openai-east/gpt-4o or req.Provider non-empty leads to exact route, no strategy involvement

Bugs fixed during development

  1. Mutual recursion: cost/latency single-factor strategies recursed infinitely when both dimensions were absent. Fixed by computing the fallback dimension's score inline
  2. Free model mis-classification: normalizeCost treated cost=0 as unknown. Fixed by returning -1 sentinel from perMtokCost when pricing is absent
  3. Production path never executed strategy: PrepareChatRequest set req.Provider via ResolveModel, then resolveProvider skipped strategy when providerHint was non-empty. Fixed via RoutingEligible context flag set during prepare for unqualified requests
  4. FirstFit order: used alpha-sorted candidates instead of registration order. Fixed by implementing ListProvidersForModel returning entries in registration order

Testing

  • 38 unit tests (router + llmclient): strategy selection logic, edge cases (empty candidates, open circuits, error rate boundary, equal scores, free model vs unknown, tiebreaks), registry resolution, EWMA convergence
  • 8 integration tests (providers): cheaper/faster selection, header override, invalid override fallback, explicit provider bypass, single-candidate no-op, weights override, FirstFit registration order
  • lint: 0 issues, full project build passes
  • All new tests + existing Router tests pass

Summary by CodeRabbit

  • New Features
    • Added Intelligent Routing to automatically select the best provider when multiple providers can serve the same model and no provider is pinned.
    • Supports balanced (cost/latency weighted) and first_fit (registration-order), with configurable max_error_rate.
    • Routing decisions can consider smoothed latency and error-rate signals, including circuit-breaker states.
    • Added per-request routing override via X-GoModel-Routing-Strategy.
  • Bug Fixes
    • Requests that explicitly pin a provider (or use provider/model) bypass Intelligent Routing.
    • Invalid per-request routing override headers are ignored and the global strategy is used.
  • Documentation
    • Added feature docs, config reference updates, and API example curl commands.

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds intelligent routing configuration, strategy selection, latency tracking, provider-resolution wiring, and request-header capture for per-request strategy overrides.

Changes

Intelligent Routing

Layer / File(s) Summary
Router config and docs
config/router.go, config/config.go, .env.template, config/config.example.yaml, CLAUDE.md, docs/dev/api-examples.md, docs/features/intelligent-routing.mdx
Adds router configuration defaults, validation, and matching environment, YAML, and feature documentation for intelligent routing.
Routing contracts and strategies
internal/core/interfaces.go, internal/router/*
Defines routing candidates, context overrides, scoring, balanced and first-fit strategies, and strategy registry behavior for selection.
Latency tracking and client stats
internal/core/interfaces.go, internal/llmclient/*
Adds ProviderStats, EWMA latency/error tracking to llmclient, client stats accessors, and tests for tracker behavior.
Provider registry and request routing
internal/gateway/inference_prepare.go, internal/providers/init.go, internal/providers/registry.go, internal/providers/router.go, internal/providers/strategy_init.go, internal/providers/router_strategy_integration_test.go
Marks requests routing-eligible, preserves provider registration order, builds strategy registries from router config, applies candidate selection during provider resolution, and validates the flow with integration tests.
HTTP strategy override
internal/server/http.go, internal/server/routing_strategy.go
Captures the routing-strategy override header in Echo middleware and registers it before workflow resolution.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

  • ENTERPILOT/GoModel#268: Also changes internal/providers/registry.go, which this PR uses for routing candidate discovery.
  • ENTERPILOT/GoModel#357: Also changes the provider resolution path in internal/providers/router.go, which this PR extends with strategy-based selection.

Poem

I’m a rabbit with ears held high,
hopping through routes across the sky.
balanced weighs the carrots just so,
first_fit picks the first good glow.
(_/)

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: router-level intelligent routing using cost and latency signals.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@greptile-apps

greptile-apps Bot commented Jun 20, 2026

Copy link
Copy Markdown

Confidence Score: 3/5

This should be fixed before merging.

  • Strategy routing can dispatch to a provider that was not authorized for the request.
  • Latency-aware routing can stay disabled after a single-provider initialization in the same process.
  • The main routing flow is otherwise well covered by the added strategy and integration tests.

internal/providers/router.go and internal/providers/init.go

Security Review

A security-sensitive routing issue was found: strategy selection can switch to a different concrete provider after model access authorization has already approved the original selector.

Reviews (3): Last reviewed commit: "feat(router): add intelligent cost-aware..." | Re-trigger Greptile

Comment thread internal/gateway/inference_prepare.go Outdated
Comment thread internal/router/strategy_cost.go Outdated
Comment thread internal/providers/router.go Outdated
Comment thread config/router.go
Comment thread internal/llmclient/latency.go
@codecov-commenter

codecov-commenter commented Jun 20, 2026

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config/router.go`:
- Around line 80-88: The current config validation for router weights and
max_error_rate checks for negative and out-of-range values, but does not reject
NaN or other non-finite numbers that strconv.ParseFloat can produce. Add checks
using math.IsNaN() or math.IsInf() (or a helper function) to validate that
cfg.Weights.Cost, cfg.Weights.Latency, and cfg.MaxErrorRate are finite numbers.
Include these finite-value checks alongside the existing negative and range
validation checks in the error return statements, and apply the same validation
logic to the similar checks mentioned at lines 110-115.

In `@internal/gateway/inference_prepare.go`:
- Around line 231-243: The hasProviderInModel function currently returns true
for any model string containing a slash, but it should distinguish between
actual provider prefixes like "deepseek/deepseek-v4-flash" and plain model IDs
containing slashes like "meta-llama/Llama-3-70b". Update the return statement to
add additional logic beyond the current firstSlash check to properly identify
whether the segment before the slash is an actual provider prefix or just part
of a model ID name. This will ensure plain model IDs with slashes are not
incorrectly marked as provider-pinned and can properly use intelligent routing.

In `@internal/llmclient/latency_test.go`:
- Around line 28-37: Add a new regression test function (e.g.,
TestLatencyTracker_LeadingSuccessesDoNotSkewErrorRate) after the existing
TestLatencyTracker_ErrorRateReflectsFailures test that verifies ErrorRate()
behavior when multiple successful samples are recorded before a single failure
occurs. The test should create a NewLatencyTracker, record several successful
samples by calling Record with appropriate parameters indicating success (false
or 0), then record one failure, and finally assert that ErrorRate() returns a
value that does not spike to 1.0 but remains below the routing cutoff threshold,
demonstrating that a single failure after many successes does not incorrectly
skew the error rate calculation.

In `@internal/llmclient/latency.go`:
- Around line 15-19: The ewma struct uses e.value == 0 to detect the first
sample, which incorrectly handles valid zero-valued samples (such as error rate
of 0 for successful requests). Add a boolean field to the ewma struct to
explicitly track whether the EWMA has been initialized. Then update any methods
that check e.value == 0 as an initialization condition (mentioned in lines
30-38) to instead check this new boolean flag. This ensures the first sample is
properly initialized regardless of its value.

In `@internal/providers/registry.go`:
- Around line 717-725: The ListProvidersForModel method documentation states it
returns an empty slice when the model is not found, but the current
implementation returns nil instead. Replace the nil return statement in the
early exit condition (when modelID is empty or no providers are registered) with
an empty slice of type ModelWithProvider to honor the documented API contract
and eliminate nil-shape ambiguity.

In `@internal/providers/router_strategy_integration_test.go`:
- Around line 176-202: The test
TestRouter_IntelligentRouting_ExplicitProviderHintBypasses currently only
validates provider pinning through the explicit Provider field, but does not
cover provider pinning through the model syntax format. Add a second
ChatCompletion call within the same test that uses Model: "openai-west/gpt-4o"
with Provider left empty to verify that explicit provider pins specified via the
model-qualified syntax are also respected by the routing strategy and not
overridden based on cost. Assert that the response ID is still "west-resp" to
confirm the expensive provider is selected when pinned through the model syntax,
ensuring the strategy does not bypass such explicit provider qualifications.

In `@internal/providers/router.go`:
- Around line 309-317: The eligible variable calculation on line 315 only checks
if providerHint is empty but fails to account for explicit provider
specifications using the provider/model syntax. When a user pins a provider by
including it in the model string (e.g., "provider/model"), that explicit choice
should prevent re-routing through the strategy registry. Modify the eligible
variable assignment to also check whether the model or selector contains an
explicit provider specification, and only mark requests as eligible for
re-routing when no provider is pinned either through providerHint or through the
provider/model syntax in the selector.

In `@internal/router/context.go`:
- Around line 35-46: The WithStrategyOverride function currently stores empty
strings in the context, but the documentation states that empty names should
clear any prior override. Modify the WithStrategyOverride function to check if
the strategy parameter is empty and, if so, delete the value from the context
(or return the original context unchanged) rather than storing an empty string.
This ensures that StrategyOverrideFromContext returns the expected (empty
string, false) when no override is set, matching the documented behavior.

In `@internal/router/registry_test.go`:
- Around line 68-75: The TestParseWeights_Invalid test function is missing
coverage for edge cases like trailing garbage and non-finite float values.
Expand the cases slice in the TestParseWeights_Invalid function to include
additional invalid weight strings such as strings with trailing garbage (e.g.,
"0.6x,0.4"), and non-finite float representations (e.g., "NaN,0.4", "Inf,0.4").
This ensures the ParseWeights function properly handles and rejects these
malformed inputs as part of its error handling path, making the parser behavior
regression-proof.

In `@internal/router/registry.go`:
- Around line 98-106: The weight parsing logic in the function that handles Cost
and Latency weights uses fmt.Sscanf which has two issues: it accepts non-finite
values like NaN and Inf, and it performs only partial input matching (e.g.,
"0.6foo" would be parsed as 0.6). To fix this, after parsing each weight value
with fmt.Sscanf for both Cost and Latency, verify that the entire trimmed string
was consumed by the parse operation and reject any non-finite values (NaN or
Inf) using math.IsNaN and math.IsInf checks. Add these validation checks
immediately after each fmt.Sscanf call before the existing non-negative
validation.

In `@internal/router/strategy_balanced.go`:
- Around line 39-45: The zero-value defaulting logic for balanced weights in the
strategy initialization is incorrect because the first condition modifies costW
to 0.6, which then causes the second condition to fail since it checks if costW
== 0. Fix this by changing the second condition from checking costW == 0 to
checking s.CostWeight == 0 instead, so it evaluates the original unmodified
value. This ensures that when both s.CostWeight and s.LatencyWeight are zero,
both costW and latW are properly set to their default values of 0.6 and 0.4
respectively.

In `@internal/router/strategy_cost.go`:
- Around line 33-37: The current implementation in the pricing strategy uses
`max == 0` from minMaxCost() as a sentinel to detect missing pricing data, but
this incorrectly treats legitimate zero-cost pricing as absent pricing and
degrades to latency-only ranking. Create a new helper function (e.g.,
hasPricingData) that directly checks whether pricing information is actually
present for the candidates, rather than inferring presence from numeric extrema.
Replace the condition `if _, max := minMaxCost(candidates); max == 0` with a
call to this new helper function to properly distinguish between zero cost and
missing cost data.

In `@internal/router/strategy_test.go`:
- Around line 115-124: The TestCostOnly_DegradesToLatencyWhenNoPricing test can
still pass via lexicographic tie-break even if the degradation fallback logic
regresses, because both the correct degradation and the lexicographic ordering
would select the same provider. Change the provider names in this test (and the
similar degradation test mentioned in the also-applies-to comment around line
139-148) so that the provider with better latency/lower cost comes
lexicographically after the other one, making fallback and tie-break produce
different winners and ensuring the test truly validates the degradation logic.
Additionally, add a new test for zero-value BalancedStrategy that verifies it
correctly defaults to 0.6/0.4 weight distribution when initialized without
explicit configuration.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8563b06f-d85e-4bbe-8889-05a534a1c750

📥 Commits

Reviewing files that changed from the base of the PR and between 2677c1f and 086564a.

📒 Files selected for processing (27)
  • .env.template
  • CLAUDE.md
  • config/config.example.yaml
  • config/config.go
  • config/router.go
  • internal/core/interfaces.go
  • internal/gateway/inference_prepare.go
  • internal/llmclient/client.go
  • internal/llmclient/latency.go
  • internal/llmclient/latency_test.go
  • internal/providers/init.go
  • internal/providers/registry.go
  • internal/providers/router.go
  • internal/providers/router_strategy_integration_test.go
  • internal/providers/strategy_init.go
  • internal/router/context.go
  • internal/router/registry.go
  • internal/router/registry_test.go
  • internal/router/scoring.go
  • internal/router/strategy.go
  • internal/router/strategy_balanced.go
  • internal/router/strategy_cost.go
  • internal/router/strategy_firstfit.go
  • internal/router/strategy_latency.go
  • internal/router/strategy_test.go
  • internal/server/http.go
  • internal/server/routing_strategy.go

Comment thread config/router.go
Comment thread internal/gateway/inference_prepare.go Outdated
Comment thread internal/llmclient/latency_test.go
Comment thread internal/llmclient/latency.go
Comment thread internal/providers/registry.go
Comment thread internal/router/registry_test.go Outdated
Comment thread internal/router/registry.go Outdated
Comment thread internal/router/strategy_balanced.go
Comment thread internal/router/strategy_cost.go Outdated
Comment thread internal/router/strategy_test.go Outdated
@Barsit

Barsit commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

Review findings assessment

Fixed

# Finding Source Fix
1 BalancedStrategy zero-weight defaulting bug (costW check used modified variable) coderabbitai Changed to check s.CostWeight instead of costW
2 CostOnlyStrategy degrades to latency when free model has pricing coderabbitai, greptile Added hasPricingData() helper, replaces max==0 sentinel
3 LatencyOnlyStrategy same issue coderabbitai Added hasLatencyData() helper
4 EWMA first-sample detection using value==0 treats genuine zero as uninitialized coderabbitai Added initialized bool field to ewma struct
5 ListProvidersForModel returns nil instead of empty slice coderabbitai Changed to return []ModelWithProvider{}
6 WithStrategyOverride stores empty strings coderabbitai Added empty-string guard
7 NaN/Inf accepted as valid config values coderabbitai Added math.IsNaN/IsInf checks in router config + ParseWeights
8 ParseWeights accepts trailing garbage (0.6x,0.4) coderabbitai Added full-string consumption check
9 Model-syntax provider overridden when req.Provider is empty coderabbitai, greptile Added slash-in-model check in resolveProvider eligibility

Assessed as not actionable

Finding Reason
hasProviderInModel misidentifies slash models In GoModel, ANY slash is provider/ID syntax. Slash IDs go through proper fallback. Consistent with existing routing semantics.
Docstring coverage < 80% Standard bot threshold; 45% is reasonable for established codebase.
Patch coverage 69.6% Core strategy/scoring logic has 94% coverage. Lower coverage in wiring/init code that is hard to unit-test in isolation.
Audio/realtime routing These paths do not set the eligibility flag, so strategy does not apply to them by design. Even if it did, callers do capability assertions.
Zero max_error_rate accepted Zero is a documented sentinel for "use default" (0.5), consistent with the rest of the gateway's config patterns.

@Barsit Barsit force-pushed the feat/intelligent-routing branch from d33b07f to dcfce08 Compare June 20, 2026 06:00

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
internal/llmclient/latency_test.go (1)

28-37: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add a regression test for “many successes, then one failure” error-rate behavior.

This specific path is still untested and is the one that previously regressed.

Suggested test case
 func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) {
 	tr := NewLatencyTracker()
 	for i := 0; i < 5; i++ {
 		tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5
 	}
 	rate := tr.ErrorRate()
 	if rate < 0.3 || rate > 0.9 {
 		t.Fatalf("expected error rate near 0.4-0.6, got %v", rate)
 	}
 }
+
+func TestLatencyTracker_ErrorRateLeadingSuccessesThenOneFailure(t *testing.T) {
+	tr := NewLatencyTracker()
+	for i := 0; i < 5; i++ {
+		tr.Record(10*time.Millisecond, false)
+	}
+	tr.Record(10*time.Millisecond, true)
+	if got := tr.ErrorRate(); got >= 0.5 {
+		t.Fatalf("expected error rate below 0.5 after one late failure, got %v", got)
+	}
+}
As per coding guidelines, `**/*_test.go` changes should include behavior-focused coverage for updated logic paths.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/llmclient/latency_test.go` around lines 28 - 37, Add a new
regression test function (separate from
TestLatencyTracker_ErrorRateReflectsFailures) to specifically test the "many
successes followed by one failure" error-rate calculation path. This test should
create a fresh LatencyTracker, record multiple successful calls (with the
success parameter as true), then record a single failure, and verify that the
error rate is calculated correctly for this specific scenario. This ensures the
error rate logic handles the case of predominantly successful operations with a
single failure at the end, which is the code path that previously regressed.

Source: Coding guidelines

internal/router/context.go (1)

35-49: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Empty strategy override does not actually clear inherited context override.

WithStrategyOverride(ctx, "") returns ctx unchanged, so any parent strategy override remains active despite the “empty clears override” contract. This can keep an unintended override in effect during StrategyRegistry.Resolve.

Suggested minimal fix
-import "context"
+import (
+	"context"
+	"strings"
+)
@@
 func StrategyOverrideFromContext(ctx context.Context) (string, bool) {
 	s, ok := ctx.Value(strategyCtxKey{}).(string)
-	return s, ok
+	if !ok {
+		return "", false
+	}
+	s = strings.TrimSpace(s)
+	if s == "" {
+		return "", false
+	}
+	return s, true
 }
@@
 func WithStrategyOverride(ctx context.Context, strategy string) context.Context {
-	if strategy == "" {
-		return ctx
+	strategy = strings.TrimSpace(strategy)
+	if strategy == "" {
+		// Shadow any inherited override as absent.
+		return context.WithValue(ctx, strategyCtxKey{}, nil)
 	}
 	return context.WithValue(ctx, strategyCtxKey{}, strategy)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/router/context.go` around lines 35 - 49, The WithStrategyOverride
function returns the context unchanged when strategy is an empty string, which
fails to clear inherited overrides from parent contexts. To fix this, remove the
early return condition that checks if strategy is empty, and instead always
execute the context.WithValue call regardless of whether strategy is empty or
not. This ensures that an empty strategy value is properly set in the current
context layer, effectively shadowing and overriding any parent context value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CLAUDE.md`:
- Line 134: The "Intelligent routing" documentation section on line 134 is a
single overly dense bullet containing multiple distinct concepts (routing
strategy defaults, configuration options, EWMA tracking details, header
overrides, and explicit provider behavior). Break this into 4-6 focused
sub-bullets, each addressing one specific topic: one for the routing strategy
configuration and defaults, one for the scoring weights, one for error rate
filtering, one for EWMA latency tracking mechanics, one for request header
overrides, and one for explicit provider-qualified request bypass behavior. This
will improve scannability and make it easier for operators to find and
understand each configuration option independently.

In `@config/router.go`:
- Around line 61-65: The empty-check for cfg.Strategy on line 61 occurs before
trimming and converting to lowercase on line 64, which means a whitespace-only
value like "   " will pass the empty check and later fail validation instead of
defaulting to RouterStrategyBalanced. Move the strings.TrimSpace and
strings.ToLower operations before the empty-check for cfg.Strategy so that
whitespace-only values are properly caught and default to the balanced strategy
as intended.

In `@internal/llmclient/client.go`:
- Around line 231-234: The completeScope function has an inconsistency where the
latencyTracker.Record call uses err != nil to determine failure, but the circuit
breaker completion tracking uses cbErr. This causes routing error-rate telemetry
to miss failures when cbErr is set but err is nil. Update the
latencyTracker.Record call to check for failures consistently by accounting for
both err and cbErr, ensuring that any error tracked by the circuit breaker is
also reflected in the latency telemetry.

In `@internal/router/registry_test.go`:
- Around line 58-75: The TestParseWeights_Valid function only validates a single
canonical numeric format. Add additional test cases to this function to verify
that ParseWeights correctly handles equivalent numeric representations like
"0.70,0.30" and "7e-1,3e-1" which should all parse successfully and produce the
same Weight values (0.7 and 0.3 for Cost and Latency respectively). Update the
test to use a table-driven approach or add separate test cases for each format
variation to ensure regression-proof parser behavior.

In `@internal/router/registry.go`:
- Around line 101-113: The validation logic in the weight parsing code is overly
restrictive by comparing the original input strings (costStr and latStr) against
their formatted parsed values using fmt.Sprintf. This check rejects valid
numeric formats like 0.60 (which becomes 0.6) or scientific notation like 6e-1.
Since fmt.Sscanf has already successfully validated and parsed the input into a
float value, remove the string comparison checks that verify costStr matches
fmt.Sprintf("%v", cost) and latStr matches fmt.Sprintf("%v", lat). Keep only the
fmt.Sscanf validation and error handling, as it is sufficient to ensure the
input is a valid numeric weight.

In `@internal/server/routing_strategy.go`:
- Around line 33-35: The ParseWeights function in internal/router/registry.go is
performing canonical string matching after float parsing, which causes valid
decimal weight headers like "1.0,0.0" to be rejected. Modify the ParseWeights
function to be more tolerant of different valid decimal formats by normalizing
them during parsing instead of performing strict canonical matching. This aligns
with the principle to accept requests generously and adapt them according to
Postel's Law, allowing decimal values in various formats (like "1.0,0.0", "1,0",
etc.) to be accepted and properly parsed without silently ignoring valid input.

---

Duplicate comments:
In `@internal/llmclient/latency_test.go`:
- Around line 28-37: Add a new regression test function (separate from
TestLatencyTracker_ErrorRateReflectsFailures) to specifically test the "many
successes followed by one failure" error-rate calculation path. This test should
create a fresh LatencyTracker, record multiple successful calls (with the
success parameter as true), then record a single failure, and verify that the
error rate is calculated correctly for this specific scenario. This ensures the
error rate logic handles the case of predominantly successful operations with a
single failure at the end, which is the code path that previously regressed.

In `@internal/router/context.go`:
- Around line 35-49: The WithStrategyOverride function returns the context
unchanged when strategy is an empty string, which fails to clear inherited
overrides from parent contexts. To fix this, remove the early return condition
that checks if strategy is empty, and instead always execute the
context.WithValue call regardless of whether strategy is empty or not. This
ensures that an empty strategy value is properly set in the current context
layer, effectively shadowing and overriding any parent context value.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 619e37f2-39dc-4d85-a0ff-6c5c614cbd21

📥 Commits

Reviewing files that changed from the base of the PR and between d33b07f and dcfce08.

📒 Files selected for processing (27)
  • .env.template
  • CLAUDE.md
  • config/config.example.yaml
  • config/config.go
  • config/router.go
  • internal/core/interfaces.go
  • internal/gateway/inference_prepare.go
  • internal/llmclient/client.go
  • internal/llmclient/latency.go
  • internal/llmclient/latency_test.go
  • internal/providers/init.go
  • internal/providers/registry.go
  • internal/providers/router.go
  • internal/providers/router_strategy_integration_test.go
  • internal/providers/strategy_init.go
  • internal/router/context.go
  • internal/router/registry.go
  • internal/router/registry_test.go
  • internal/router/scoring.go
  • internal/router/strategy.go
  • internal/router/strategy_balanced.go
  • internal/router/strategy_cost.go
  • internal/router/strategy_firstfit.go
  • internal/router/strategy_latency.go
  • internal/router/strategy_test.go
  • internal/server/http.go
  • internal/server/routing_strategy.go

Comment thread CLAUDE.md Outdated
Comment thread config/router.go
Comment on lines 231 to 234
func (c *Client) completeScope(scope requestScope, statusCode int, err, cbErr error) {
c.recordCircuitBreakerCompletion(statusCode, cbErr)
c.latencyTracker.Record(time.Since(scope.startedAt), err != nil)
c.finishRequest(scope, statusCode, err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Error-rate telemetry can miss failures tracked by circuit-breaker path.

completeScope records routing error EWMA with err != nil, but CB accounting uses cbErr. If cbErr is set while err is nil, routing error-rate remains artificially low.

Suggested minimal alignment
 func (c *Client) completeScope(scope requestScope, statusCode int, err, cbErr error) {
 	c.recordCircuitBreakerCompletion(statusCode, cbErr)
-	c.latencyTracker.Record(time.Since(scope.startedAt), err != nil)
+	c.latencyTracker.Record(time.Since(scope.startedAt), err != nil || cbErr != nil)
 	c.finishRequest(scope, statusCode, err)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/llmclient/client.go` around lines 231 - 234, The completeScope
function has an inconsistency where the latencyTracker.Record call uses err !=
nil to determine failure, but the circuit breaker completion tracking uses
cbErr. This causes routing error-rate telemetry to miss failures when cbErr is
set but err is nil. Update the latencyTracker.Record call to check for failures
consistently by accounting for both err and cbErr, ensuring that any error
tracked by the circuit breaker is also reflected in the latency telemetry.

Comment thread internal/router/registry_test.go Outdated
Comment on lines +58 to +75
func TestParseWeights_Valid(t *testing.T) {
w, err := ParseWeights("0.7,0.3")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if w.Cost != 0.7 || w.Latency != 0.3 {
t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency)
}
}

func TestParseWeights_Invalid(t *testing.T) {
cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"}
for _, c := range cases {
if _, err := ParseWeights(c); err == nil {
t.Errorf("expected error for %q", c)
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add positive parsing cases for equivalent numeric formats.

TestParseWeights_Valid currently validates only one canonical representation. Please add accepted-format cases like 0.70,0.30 and 7e-1,3e-1 so parser behavior is regression-proof after the ParseWeights fix.

💡 Minimal test update
 func TestParseWeights_Valid(t *testing.T) {
-	w, err := ParseWeights("0.7,0.3")
-	if err != nil {
-		t.Fatalf("unexpected error: %v", err)
-	}
-	if w.Cost != 0.7 || w.Latency != 0.3 {
-		t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency)
+	cases := []string{
+		"0.7,0.3",
+		"0.70,0.30",
+		"7e-1,3e-1",
+	}
+	for _, tc := range cases {
+		w, err := ParseWeights(tc)
+		if err != nil {
+			t.Fatalf("unexpected error for %q: %v", tc, err)
+		}
+		if w.Cost != 0.7 || w.Latency != 0.3 {
+			t.Fatalf("expected 0.7,0.3 for %q got %v,%v", tc, w.Cost, w.Latency)
+		}
 	}
 }

As per coding guidelines, tests should be updated for behavior changes and parser error-handling boundaries.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func TestParseWeights_Valid(t *testing.T) {
w, err := ParseWeights("0.7,0.3")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if w.Cost != 0.7 || w.Latency != 0.3 {
t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency)
}
}
func TestParseWeights_Invalid(t *testing.T) {
cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"}
for _, c := range cases {
if _, err := ParseWeights(c); err == nil {
t.Errorf("expected error for %q", c)
}
}
}
func TestParseWeights_Valid(t *testing.T) {
cases := []string{
"0.7,0.3",
"0.70,0.30",
"7e-1,3e-1",
}
for _, tc := range cases {
w, err := ParseWeights(tc)
if err != nil {
t.Fatalf("unexpected error for %q: %v", tc, err)
}
if w.Cost != 0.7 || w.Latency != 0.3 {
t.Fatalf("expected 0.7,0.3 for %q got %v,%v", tc, w.Cost, w.Latency)
}
}
}
func TestParseWeights_Invalid(t *testing.T) {
cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"}
for _, c := range cases {
if _, err := ParseWeights(c); err == nil {
t.Errorf("expected error for %q", c)
}
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/router/registry_test.go` around lines 58 - 75, The
TestParseWeights_Valid function only validates a single canonical numeric
format. Add additional test cases to this function to verify that ParseWeights
correctly handles equivalent numeric representations like "0.70,0.30" and
"7e-1,3e-1" which should all parse successfully and produce the same Weight
values (0.7 and 0.3 for Cost and Latency respectively). Update the test to use a
table-driven approach or add separate test cases for each format variation to
ensure regression-proof parser behavior.

Source: Coding guidelines

Comment thread internal/router/registry.go Outdated
Comment thread internal/server/routing_strategy.go Outdated
Comment on lines +33 to +35
if weights := strings.TrimSpace(c.Request().Header.Get(RoutingWeightsHeader)); weights != "" {
if override, err := router.ParseWeights(weights); err != nil {
slog.WarnContext(ctx, "invalid routing weights header, ignoring",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Valid decimal weight headers are being rejected in the override path.

Line 34 depends on router.ParseWeights, which currently performs canonical string matching after float parsing; values like 1.0,0.0 get treated as invalid and silently ignored. That breaks common operator input and conflicts with the “accept requests generously” contract.

Suggested fix (in internal/router/registry.go)
-import "fmt"
+import (
+  "fmt"
+  "strconv"
+)

- n, err := fmt.Sscanf(costStr, "%f", &cost)
- if err != nil || n != 1 {
+ cost, err := strconv.ParseFloat(costStr, 64)
+ if err != nil {
    return WeightsOverride{}, fmt.Errorf("invalid cost weight %q: parse error", costStr)
  }
- if costStr != fmt.Sprintf("%v", cost) {
-   return WeightsOverride{}, fmt.Errorf("invalid cost weight %q: trailing characters", parts[0])
- }

- n, err = fmt.Sscanf(latStr, "%f", &lat)
- if err != nil || n != 1 {
+ lat, err := strconv.ParseFloat(latStr, 64)
+ if err != nil {
    return WeightsOverride{}, fmt.Errorf("invalid latency weight %q: parse error", latStr)
  }
- if latStr != fmt.Sprintf("%v", lat) {
-   return WeightsOverride{}, fmt.Errorf("invalid latency weight %q: trailing characters", parts[1])
- }

As per coding guidelines, **/*.go should “Accept requests generously … and adapt them”; based on learnings, this should follow Postel’s Law for tolerant request parsing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/server/routing_strategy.go` around lines 33 - 35, The ParseWeights
function in internal/router/registry.go is performing canonical string matching
after float parsing, which causes valid decimal weight headers like "1.0,0.0" to
be rejected. Modify the ParseWeights function to be more tolerant of different
valid decimal formats by normalizing them during parsing instead of performing
strict canonical matching. This aligns with the principle to accept requests
generously and adapt them according to Postel's Law, allowing decimal values in
various formats (like "1.0,0.0", "1,0", etc.) to be accepted and properly parsed
without silently ignoring valid input.

Sources: Coding guidelines, Learnings

@Barsit Barsit force-pushed the feat/intelligent-routing branch 2 times, most recently from d07a10e to 8ea9477 Compare June 20, 2026 06:41
@Barsit

Barsit commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator Author

Additional fix pushed:

  • ParseWeights trailing-zeros regression: The Sprintf full-string consumption check I added earlier incorrectly rejected valid inputs like "0.60,0.40" (because Sprintf("%v", 0.6) produces "0.6", not "0.60"). Fixed by using Sscanf("%f%s") with a trailing string variable — n=1 means exactly the float was consumed (no trailing chars), while n=2 means there's trailing garbage. Also expanded TestParseWeights_Valid to cover trailing zeros, whitespace, whole numbers, and zero-cost cases.

All previous findings (balanced zero-weight bug, free pricing degradation, NaN/Inf validation, EWMA init flag, model-syntax provider override, etc.) remain fixed in the squashed commit.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
coverage-bailian (1)

1-36: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove generated test coverage artifact from version control.

The coverage-bailian file is a Go test coverage profile generated at test time via go test -coverprofile=... (per Makefile line 51). Generated artifacts should never be committed to version control because:

  1. Causes constant diffs: Each test run produces different coverage data (different lines executed, percentages change).
  2. Obscures real changes: Coverage artifacts clutter PR diffs and hide actual code changes from review.
  3. Violates twelve-factor principles: Generated build outputs belong in CI/CD artifact storage, not git history.
  4. Risk of merge conflicts: Multiple developers/CI runs will cause conflicting coverage changes.

Action: Remove this file from the repository and add it to .gitignore alongside the standard coverage.out and other generated test artifacts. If coverage tracking is needed, store baselines in CI/CD systems or external coverage dashboards (Codecov, Coveralls, etc.), not git.

[major_issue]

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@coverage-bailian` around lines 1 - 36, Remove the coverage-bailian file from
version control as it is a generated test coverage artifact that should not be
committed. Additionally, add coverage-bailian to the .gitignore file to prevent
it from being tracked in future commits, grouping it with other generated
coverage files like coverage.out and similar test artifacts. Use git to remove
the file from tracking while keeping it locally if needed for local testing
purposes.

Source: Learnings

♻️ Duplicate comments (1)
CLAUDE.md (1)

134-134: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Split this dense line into focused sub-bullets for scannability.

Line 134 packs routing strategy defaults, weight configuration, error-rate filtering, EWMA tracking mechanics, header overrides, and explicit provider bypass logic into one wrapped sentence. Breaking it into 4–6 sub-bullets will make operator usage much easier to understand and reference independently.

As per the coding guidelines, documentation "should be concise, practical, and user-focused." Scannable structure directly supports that goal.

Suggested structure
- **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, the gateway scores candidates and routes to the best one (single-provider configs are unaffected — one candidate means no decision). `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`), `MODEL_ROUTING_STRATEGY_WEIGHTS` (`0.6,0.4` = cost_weight,latency_weight; balanced only), `MODEL_ROUTING_MAX_ERROR_RATE` (`0.5`; candidates at/above this smoothed error rate are filtered before scoring). Latency is tracked per provider as in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate), never persisted. Per-request overrides via request headers `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` (`cost,latency`); invalid header values are silently dropped (WARN log) and fall back to the global strategy. Explicit provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing.
+- **Intelligent routing:** enabled when request has no pinned provider and multiple providers serve the same model (single-provider configs unaffected).
+  - **Configuration:** `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`)
+  - **Balanced weights:** `MODEL_ROUTING_STRATEGY_WEIGHTS` as `cost_weight,latency_weight` (default `0.6,0.4`; balanced strategy only)
+  - **Filtering:** `MODEL_ROUTING_MAX_ERROR_RATE` (default `0.5`); candidates at/above this smoothed error rate are excluded before scoring
+  - **Latency stats:** in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate); not persisted
+  - **Per-request overrides:** `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` headers; invalid values are WARN-logged and fall back to global strategy
+  - **Bypass rule:** provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` at line 134, The documentation for intelligent routing in
CLAUDE.md contains a single very long sentence that covers multiple distinct
configuration options and behaviors all at once, making it difficult for users
to scan and reference individual features. Break this single dense bullet point
into 4-6 separate focused sub-bullets, each covering one specific aspect:
routing strategy options and defaults (MODEL_ROUTING_STRATEGY), weight
configuration for balanced mode (MODEL_ROUTING_STRATEGY_WEIGHTS), error-rate
filtering behavior (MODEL_ROUTING_MAX_ERROR_RATE), EWMA latency tracking
mechanics, per-request header overrides and their fallback behavior, and
explicit provider-qualified request bypass logic. Ensure each sub-bullet is
concise and independently understandable while maintaining the technical
accuracy of the original content.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/gateway/inference_prepare.go`:
- Around line 240-242: The comment preceding the return statement in this helper
function contradicts the actual behavior of the logic. The return condition
`firstSlash > 0 && firstSlash < len(m)-1` treats slash-style model IDs like
"meta-llama/Llama-3-70b" as provider-qualified (returns true), but the comment
claims these are not treated as provider-qualified. Update the comment to
accurately reflect that the function returns true when there is a valid provider
prefix followed by a model ID (i.e., when a slash exists that is neither at the
start nor at the end of the string).

In `@internal/llmclient/latency.go`:
- Around line 86-103: The Record method in LatencyTracker truncates
sub-millisecond latencies to zero when converting to milliseconds using
duration.Milliseconds(), which returns an integer. To preserve sub-millisecond
precision and avoid losing valid fast latency measurements, modify the
conversion to use floating point arithmetic without truncation. Convert the
duration to milliseconds as a float64 by either using duration.Seconds() and
multiplying by 1000, or by dividing duration.Nanoseconds() by 1e6, then pass
this precise value to the Add methods for t.p50 and t.p99.

In `@internal/providers/router_strategy_integration_test.go`:
- Line 75: The NewRouter constructor error is being discarded using the blank
identifier in the router initialization across multiple test locations. Replace
the blank identifier pattern with proper error handling by assigning the error
return value to a variable, then check if the error is not nil and immediately
fail the test using t.Fatalf with a descriptive message. Apply this same
error-checking pattern consistently at each NewRouter call site throughout the
integration test file to ensure setup failures are caught immediately rather
than causing misleading downstream assertion failures.

In `@internal/router/scoring.go`:
- Around line 165-176: The perMtokCost function currently returns an implicit 0
when Pricing exists but both InputPerMtok and OutputPerMtok are nil, which
incorrectly treats non-per-token pricing models as free. Add a check after
attempting to accumulate the token costs to return -1 if neither InputPerMtok
nor OutputPerMtok is present, ensuring unknown per-token pricing is treated as
unavailable rather than zero cost. Apply the same fix to the related code at
lines 201-213 that has similar logic.

In `@internal/router/strategy_test.go`:
- Around line 103-149: Add new test functions to cover the edge case where
candidates have pricing metadata but lack InputPerMtok and OutputPerMtok values.
Create test cases for both
TestCostOnly_DegradesToLatencyWhenPricingHasNoPerTokenCost and
TestLatencyOnly_DegradesToCostWhenPricingHasNoPerTokenCost that use the pricing
helper function but verify the strategies properly degrade to their fallback
metrics (latency for cost-only strategy, cost for latency-only strategy) when
per-token pricing values are absent. This ensures the selection logic correctly
treats incomplete pricing data as missing pricing for scoring purposes.

In `@internal/server/routing_strategy.go`:
- Around line 29-31: After trimming whitespace from the RoutingStrategyHeader
using strings.TrimSpace, also normalize the strategy value to lowercase before
passing it to router.WithStrategyOverride(). This ensures that case variants
like Latency_Only are properly handled and not silently dropped later due to
case mismatches, making the request parsing more permissive and following
Postel's Law for tolerant header handling.

---

Outside diff comments:
In `@coverage-bailian`:
- Around line 1-36: Remove the coverage-bailian file from version control as it
is a generated test coverage artifact that should not be committed.
Additionally, add coverage-bailian to the .gitignore file to prevent it from
being tracked in future commits, grouping it with other generated coverage files
like coverage.out and similar test artifacts. Use git to remove the file from
tracking while keeping it locally if needed for local testing purposes.

---

Duplicate comments:
In `@CLAUDE.md`:
- Line 134: The documentation for intelligent routing in CLAUDE.md contains a
single very long sentence that covers multiple distinct configuration options
and behaviors all at once, making it difficult for users to scan and reference
individual features. Break this single dense bullet point into 4-6 separate
focused sub-bullets, each covering one specific aspect: routing strategy options
and defaults (MODEL_ROUTING_STRATEGY), weight configuration for balanced mode
(MODEL_ROUTING_STRATEGY_WEIGHTS), error-rate filtering behavior
(MODEL_ROUTING_MAX_ERROR_RATE), EWMA latency tracking mechanics, per-request
header overrides and their fallback behavior, and explicit provider-qualified
request bypass logic. Ensure each sub-bullet is concise and independently
understandable while maintaining the technical accuracy of the original content.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 381fa4cf-069d-4358-b9be-aa1adcfa84bb

📥 Commits

Reviewing files that changed from the base of the PR and between dcfce08 and 8ea9477.

⛔ Files ignored due to path filters (3)
  • c.out is excluded by !**/*.out
  • coverage-bailian.out is excluded by !**/*.out
  • gomodel.exe is excluded by !**/*.exe
📒 Files selected for processing (36)
  • .env.template
  • .gitignore
  • .idea/.gitignore
  • .idea/GoModel.iml
  • .idea/go.imports.xml
  • .idea/golinter.xml
  • .idea/modules.xml
  • .idea/vcs.xml
  • CLAUDE.md
  • config/config.example.yaml
  • config/config.go
  • config/router.go
  • coverage-bailian
  • gomodel.exe~
  • internal/core/interfaces.go
  • internal/gateway/inference_prepare.go
  • internal/llmclient/client.go
  • internal/llmclient/latency.go
  • internal/llmclient/latency_test.go
  • internal/providers/init.go
  • internal/providers/registry.go
  • internal/providers/router.go
  • internal/providers/router_strategy_integration_test.go
  • internal/providers/strategy_init.go
  • internal/router/context.go
  • internal/router/registry.go
  • internal/router/registry_test.go
  • internal/router/scoring.go
  • internal/router/strategy.go
  • internal/router/strategy_balanced.go
  • internal/router/strategy_cost.go
  • internal/router/strategy_firstfit.go
  • internal/router/strategy_latency.go
  • internal/router/strategy_test.go
  • internal/server/http.go
  • internal/server/routing_strategy.go

Comment thread internal/gateway/inference_prepare.go Outdated
Comment on lines +240 to +242
// A provider prefix is a non-empty segment before a slash that is NOT itself
// a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b").
return firstSlash > 0 && firstSlash < len(m)-1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Comment contradicts the helper’s actual behavior.

The comment says slash-style model IDs like meta-llama/Llama-3-70b are not treated as provider-qualified, but the return condition does treat that shape as true. Please align the comment with the implemented semantics (or adjust the logic if that comment reflects the real intent).

✏️ Minimal comment-only fix
-	// A provider prefix is a non-empty segment before a slash that is NOT itself
-	// a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b").
+	// Treat any non-edge "x/y" shape as provider-qualified model syntax.
+	// This intentionally includes slash-style model IDs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/gateway/inference_prepare.go` around lines 240 - 242, The comment
preceding the return statement in this helper function contradicts the actual
behavior of the logic. The return condition `firstSlash > 0 && firstSlash <
len(m)-1` treats slash-style model IDs like "meta-llama/Llama-3-70b" as
provider-qualified (returns true), but the comment claims these are not treated
as provider-qualified. Update the comment to accurately reflect that the
function returns true when there is a valid provider prefix followed by a model
ID (i.e., when a slash exists that is neither at the start nor at the end of the
string).

Comment thread internal/llmclient/latency.go
Comment thread internal/providers/router_strategy_integration_test.go Outdated
Comment thread internal/router/scoring.go
Comment thread internal/router/strategy_test.go Outdated
Comment on lines +103 to +149
func TestCostOnly_PicksCheapest(t *testing.T) {
s := NewCostOnlyStrategy()
cands := candidates(
ProviderCandidate{ProviderName: "alpha", Pricing: pricing(5, 15), Latency: 10 * time.Millisecond},
ProviderCandidate{ProviderName: "zeta", Pricing: pricing(1, 4), Latency: 500 * time.Millisecond},
)
got := mustSelect(t, s, cands)
if got.ProviderName != "zeta" {
t.Fatalf("expected azure (cheapest), got %s", got.ProviderName)
}
}

func TestCostOnly_DegradesToLatencyWhenNoPricing(t *testing.T) {
s := NewCostOnlyStrategy()
cands := candidates(
ProviderCandidate{ProviderName: "alpha", Latency: 200 * time.Millisecond},
ProviderCandidate{ProviderName: "zeta", Latency: 50 * time.Millisecond},
)
got := mustSelect(t, s, cands)
if got.ProviderName != "zeta" {
t.Fatalf("expected azure (degraded to latency), got %s", got.ProviderName)
}
}

func TestLatencyOnly_PicksFastest(t *testing.T) {
s := NewLatencyOnlyStrategy()
cands := candidates(
ProviderCandidate{ProviderName: "alpha", Pricing: pricing(1, 1), Latency: 300 * time.Millisecond},
ProviderCandidate{ProviderName: "zeta", Pricing: pricing(50, 50), Latency: 50 * time.Millisecond},
)
got := mustSelect(t, s, cands)
if got.ProviderName != "zeta" {
t.Fatalf("expected azure (fastest), got %s", got.ProviderName)
}
}

func TestLatencyOnly_DegradesToCostWhenNoLatency(t *testing.T) {
s := NewLatencyOnlyStrategy()
cands := candidates(
ProviderCandidate{ProviderName: "alpha", Pricing: pricing(5, 15)},
ProviderCandidate{ProviderName: "zeta", Pricing: pricing(1, 4)},
)
got := mustSelect(t, s, cands)
if got.ProviderName != "zeta" {
t.Fatalf("expected azure (degraded to cost), got %s", got.ProviderName)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add regression coverage for “non-per-token pricing” candidates.

There’s no test asserting that candidates with pricing metadata but without InputPerMtok/OutputPerMtok are treated as unknown for cost scoring. Please add one to prevent cost-only/balanced regressions in this edge case.

✅ Suggested test case
+func TestCostOnly_DegradesToLatencyWhenNoPerTokenPricing(t *testing.T) {
+	s := NewCostOnlyStrategy()
+	cands := candidates(
+		ProviderCandidate{
+			ProviderName: "alpha",
+			Pricing:      &core.ModelPricing{PerRequest: floatPtr(1)}, // pricing exists, but not per-token
+			Latency:      200 * time.Millisecond,
+		},
+		ProviderCandidate{
+			ProviderName: "zeta",
+			Latency:      50 * time.Millisecond,
+		},
+	)
+	got := mustSelect(t, s, cands)
+	if got.ProviderName != "zeta" {
+		t.Fatalf("expected zeta (degraded to latency when per-token pricing absent), got %s", got.ProviderName)
+	}
+}

As per coding guidelines, "Add or update tests for behavior changes in Go."

Also applies to: 164-197

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/router/strategy_test.go` around lines 103 - 149, Add new test
functions to cover the edge case where candidates have pricing metadata but lack
InputPerMtok and OutputPerMtok values. Create test cases for both
TestCostOnly_DegradesToLatencyWhenPricingHasNoPerTokenCost and
TestLatencyOnly_DegradesToCostWhenPricingHasNoPerTokenCost that use the pricing
helper function but verify the strategies properly degrade to their fallback
metrics (latency for cost-only strategy, cost for latency-only strategy) when
per-token pricing values are absent. This ensures the selection logic correctly
treats incomplete pricing data as missing pricing for scoring purposes.

Source: Coding guidelines

@SantiagoDePolonia

Copy link
Copy Markdown
Contributor

I think it's a good direction, but the PR is messy. There are files committed which shouldn't be there, and if you want it merged into GoModel it'll need some more work. Here are the findings worth improving:

  1. Drop the committed artifacts. gomodel.exe + gomodel.exe~ (Windows binaries, ~67MB each), c.out, coverage-bailian*, and the whole .idea/ folder shouldn't be in the repo — please remove them and add to .gitignore (*.exe, coverage outputs, .idea/). Out of curiosity, how did the .exe end up in there?

  2. Don't reinvent provider parsing. hasProviderInModel re-derives the provider prefix from the raw model string, but the request already goes through ParseModelSelector / splitQualifiedModel, which splits provider/model for you. It's also buggy — it treats any slash as a provider prefix, so legit slash model IDs like meta-llama/Llama-3-70b get wrongly marked ineligible for routing. Reuse the existing parsed selector instead.

  3. The abstraction is a bit much for what it does. balanced is the only strategy most setups will use — cost_only/latency_only are just balanced with a zero weight. I'd keep balanced (+ first_fit) and drop the rest plus the header weight overrides until there's real demand. The interface itself is fine, just trimming surface.

  4. Latency tracking is always-on, even for single-provider configs that can't benefit, and it's in-memory per-instance — so it resets on restart and isn't shared across replicas. Worth gating + a comment noting that.

  5. Check gofmt/lint — the new const block in config/router.go and the Router field in config/config.go look misaligned, so I'd double-check the "0 lint issues" claim covers the new files.

Also heads up: some of the more advanced cost-optimization routing might end up in a pro / closed-source build down the line, so let's keep this core routing seam clean and pluggable rather than baking too much logic into it.

@SantiagoDePolonia SantiagoDePolonia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The requested changes are described in my previous comment.

@Barsit Barsit force-pushed the feat/intelligent-routing branch from 8ea9477 to 2bfca7b Compare June 22, 2026 15:30
@Barsit

Barsit commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

@SantiagoDePolonia 感谢审查,5 点均已修复:

1. 删除工件文件

已从 git 追踪中删除 gomodel.exegomodel.exe~c.outcoverage-bailian*.idea/,更新了 .gitignore

2. 复用 provider 解析逻辑

删除了 hasProviderInModel 函数,直接使用 core.ParseModelSelector。该函数之前与 splitQualifiedModel 完全重复且注释有误导性。

3. 精简策略抽象

  • 删除 CostOnlyStrategyLatencyOnlyStrategy
  • 只保留 BalancedStrategy(默认)和 FirstFitStrategy
  • 删除 X-GoModel-Routing-Weights 头部及其对应的 ParseWeightsWeightsOverrideWithWeightsOverride 等全部代码
  • X-GoModel-Routing-Strategy 头部保留(可在 balanced/first_fit 间按请求切换)

4. Latency tracking 条件控制

  • 通过 llmclient.SetLatencyTrackingEnabled(false) 开关控制
  • 单 provider 配置(len(providerMap) < 2)在启动时自动关闭
  • LatencyTracker 的注释已写明:in-memory、重启清零、不跨副本共享

5. gofmt/lint

  • gofmt -w 修复了所有对齐问题
  • golangci-lint 0 issues
  • pre-commit hook 已加入自动 gofmt -w,避免未来出现格式化不一致

@Barsit

Barsit commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

@SantiagoDePolonia All 5 points have been addressed:

1. Remove committed artifacts

gomodel.exe, gomodel.exe~, c.out, coverage-bailian*, .idea/ -- removed from git tracking, added to .gitignore.

2. Don't reinvent provider parsing

Removed hasProviderInModel entirely. It was a duplicate of splitQualifiedModel with a misleading comment. Now uses core.ParseModelSelector directly.

3. Trim the abstraction

  • Deleted CostOnlyStrategy and LatencyOnlyStrategy files
  • Kept only BalancedStrategy (default) and FirstFitStrategy
  • Removed X-GoModel-Routing-Weights header and all associated code: ParseWeights, WeightsOverride struct, WithWeightsOverride/WeightsOverrideFromContext, applyWeightsOverride
  • Kept X-GoModel-Routing-Strategy header -- still useful for per-request switching between balanced and first_fit

4. Gate latency tracking

  • Added llmclient.SetLatencyTrackingEnabled(bool) -- disabled at startup when len(providerMap) < 2
  • Added nil-safe guards on all LatencyTracker accessors (Record, P50Latency, P99Latency, ErrorRate)
  • Added WARNING comment on LatencyTracker documenting in-memory-only semantics (resets on restart, not shared across replicas)

5. gofmt/lint

  • gofmt -w applied to all files -- alignment issues fixed
  • golangci-lint: 0 issues
  • Pre-commit hook updated to auto-run gofmt -w before lint

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (4)
CLAUDE.md (1)

134-134: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Address the prior feedback: split line 134 into focused sub-bullets.

SantiagoDePolonia's previous review explicitly requested splitting this dense paragraph into 4-6 focused sub-bullets for improved scannability. The current line still packs multiple distinct concepts (strategy configuration, weights, error-rate filtering, EWMA mechanics, header overrides, bypass rules) into a single paragraph.

Per the coding guideline: "Documentation should be concise, practical, and user-focused. Show defaults, explain when to change them, and include minimal examples when useful."

✏️ Proposed refactoring (from prior review)
-- **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, the gateway scores candidates and routes to the best one (single-provider configs are unaffected — one candidate means no decision). `MODEL_ROUTING_STRATEGY` (`balanced` default; also `first_fit`), `MODEL_ROUTING_STRATEGY_WEIGHTS` (`0.6,0.4` = cost_weight,latency_weight; balanced only), `MODEL_ROUTING_MAX_ERROR_RATE` (`0.5`; candidates at/above this smoothed error rate are filtered before scoring). Latency is tracked per provider as in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate), never persisted. Per-request strategy override via `X-GoModel-Routing-Strategy` header; invalid values are silently dropped (WARN log) and fall back to the global strategy. Explicit provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing.
+- **Intelligent routing:** enabled when request has no pinned provider and multiple providers serve the same model.
+  - **Defaults:** `MODEL_ROUTING_STRATEGY=balanced`, `MODEL_ROUTING_STRATEGY_WEIGHTS=0.6,0.4`, `MODEL_ROUTING_MAX_ERROR_RATE=0.5`
+  - **Filtering:** candidates at/above `max_error_rate` are excluded before scoring.
+  - **Latency stats:** in-memory EWMA only (not persisted).
+  - **Per-request overrides:** `X-GoModel-Routing-Strategy`; invalid values are ignored with WARN logs.
+  - **Bypass rule:** provider-qualified requests like `openai-east/gpt-4o` bypass intelligent routing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` at line 134, The intelligent routing bullet point starting at line
134 in CLAUDE.md is a dense single paragraph that combines multiple distinct
concepts including strategy configuration, weights, error-rate filtering, EWMA
mechanics, header overrides, and bypass rules. Split this into 4-6 focused
sub-bullets, where each sub-bullet addresses one specific aspect of intelligent
routing functionality (such as strategy types and defaults, weight
configuration, error-rate filtering logic, latency tracking mechanisms, header
override behavior, and explicit provider bypass rules). Ensure each sub-bullet
is concise and user-focused while preserving all technical details and
configuration options currently mentioned.

Source: Coding guidelines

internal/router/scoring.go (1)

162-177: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Models with non-per-token pricing are incorrectly scored as free.

When Pricing exists but both InputPerMtok and OutputPerMtok are nil (e.g., models with only PerRequest, PerImage, or tier-based pricing), line 176 returns 0. The normalizeCost function (lines 71-89) treats 0 as the best score (free model), so these candidates are incorrectly selected over models with known per-token costs.

🔧 Proposed fix to distinguish unknown per-token pricing from free models
 func perMtokCost(c *ProviderCandidate) float64 {
 	if c == nil || c.Pricing == nil {
 		return -1
 	}
+	if c.Pricing.InputPerMtok == nil && c.Pricing.OutputPerMtok == nil {
+		return -1
+	}
 	var cost float64
 	if c.Pricing.InputPerMtok != nil {
 		cost += *c.Pricing.InputPerMtok
 	}
 	if c.Pricing.OutputPerMtok != nil {
 		cost += *c.Pricing.OutputPerMtok
 	}
 	return cost
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/router/scoring.go` around lines 162 - 177, The perMtokCost function
currently returns 0 when Pricing exists but both InputPerMtok and OutputPerMtok
are nil, which causes models with alternative pricing models (PerRequest,
PerImage, tier-based) to be incorrectly treated as free. Add an additional check
after verifying c.Pricing is not nil to also return -1 when both InputPerMtok
and OutputPerMtok pointers are nil, ensuring that unknown per-token pricing is
distinguished from actually free models that return 0. This way the
normalizeCost function will properly differentiate between models with unknown
pricing versus those with confirmed zero per-token costs.
internal/llmclient/latency.go (1)

85-105: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Sub-millisecond latencies are truncated to zero and misinterpreted as "unknown."

Line 89 uses duration.Milliseconds(), which returns int64 and truncates any value below 1ms to 0. The routing logic (per strategy.go:36 and scoring.go:57-58) treats zero latency as "unknown data" and scores it as worst-case (1.0), incorrectly penalizing fast providers.

🔧 Proposed fix to preserve sub-millisecond precision
 func (t *LatencyTracker) Record(duration time.Duration, isError bool) {
-	ms := float64(duration.Milliseconds())
+	ms := float64(duration) / float64(time.Millisecond)
 	t.p50.Add(ms)
 	t.p99.Add(ms)
 	t.errorRate.Add(boolToFloat64(isError))
 }
 
 func (t *LatencyTracker) P50() time.Duration {
-	return time.Duration(t.p50.Value()) * time.Millisecond
+	return time.Duration(t.p50.Value() * float64(time.Millisecond))
 }
 
 func (t *LatencyTracker) P99() time.Duration {
-	return time.Duration(t.p99.Value()) * time.Millisecond
+	return time.Duration(t.p99.Value() * float64(time.Millisecond))
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/llmclient/latency.go` around lines 85 - 105, In the Record method of
LatencyTracker, replace the line that uses duration.Milliseconds() to calculate
ms with an approach that preserves sub-millisecond precision. Instead of using
Milliseconds() which returns int64 and truncates fractional values, convert the
duration to a float64 representation that maintains decimal precision (for
example, by using Seconds() and multiplying by 1000, or using Nanoseconds() and
dividing appropriately) so that sub-millisecond latencies are accurately
captured for the percentile trackers rather than being truncated to zero.
config/router.go (1)

59-68: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Whitespace-only strategy string should default to balanced, not error.

The empty-check at line 59 occurs before trimming at line 62, so a whitespace-only value like " " will bypass the default assignment and later fail validation at line 66 instead of defaulting to balanced.

🔧 Proposed fix
 func ValidateRouterConfig(cfg *RouterConfig) error {
+	strategy := strings.ToLower(strings.TrimSpace(cfg.Strategy))
+	if strategy == "" {
+		strategy = RouterStrategyBalanced
+	}
-	if cfg.Strategy == "" {
-		cfg.Strategy = RouterStrategyBalanced
-	}
-	strategy := strings.ToLower(strings.TrimSpace(cfg.Strategy))
 	switch strategy {
 	case RouterStrategyBalanced, RouterStrategyFirstFit:
 	default:
 		return fmt.Errorf("invalid router.strategy %q: must be one of balanced, first_fit", cfg.Strategy)
 	}
 	cfg.Strategy = strategy

Based on learnings, configuration should accept input generously.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/router.go` around lines 59 - 68, The empty strategy check on line 59
occurs before the string is trimmed on line 62, so a whitespace-only value like
"   " bypasses the default assignment and later fails validation instead of
defaulting to balanced. Move the trimming operation before the empty check, then
apply the default assignment to cfg.Strategy when the trimmed value is empty,
and finally perform the validation switch statement on the normalized strategy.
This ensures whitespace-only input is handled gracefully by defaulting to
RouterStrategyBalanced.

Source: Learnings

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/providers/strategy_init.go`:
- Line 27: The call to registry.SetDefault(cfg.Strategy) is silently ignoring
the error return value by using the blank identifier, which means invalid
strategy names could fail to set without any warning to the user. Modify this
line to capture the error return value from registry.SetDefault instead of
discarding it, then check if the error is non-nil and log an appropriate warning
message that includes the error details to help with debugging configuration
issues.

In `@internal/server/http.go`:
- Around line 290-291: The comment referencing intelligent-routing overrides
includes X-GoModel-Routing-Weights, which has been removed in this PR along with
its associated code (ParseWeights, WeightsOverride). Update the comment to
remove the weights header reference, change "headers" to "header" (since only
X-GoModel-Routing-Strategy remains), and ensure the comment accurately reflects
the current implementation.

---

Duplicate comments:
In `@CLAUDE.md`:
- Line 134: The intelligent routing bullet point starting at line 134 in
CLAUDE.md is a dense single paragraph that combines multiple distinct concepts
including strategy configuration, weights, error-rate filtering, EWMA mechanics,
header overrides, and bypass rules. Split this into 4-6 focused sub-bullets,
where each sub-bullet addresses one specific aspect of intelligent routing
functionality (such as strategy types and defaults, weight configuration,
error-rate filtering logic, latency tracking mechanisms, header override
behavior, and explicit provider bypass rules). Ensure each sub-bullet is concise
and user-focused while preserving all technical details and configuration
options currently mentioned.

In `@config/router.go`:
- Around line 59-68: The empty strategy check on line 59 occurs before the
string is trimmed on line 62, so a whitespace-only value like "   " bypasses the
default assignment and later fails validation instead of defaulting to balanced.
Move the trimming operation before the empty check, then apply the default
assignment to cfg.Strategy when the trimmed value is empty, and finally perform
the validation switch statement on the normalized strategy. This ensures
whitespace-only input is handled gracefully by defaulting to
RouterStrategyBalanced.

In `@internal/llmclient/latency.go`:
- Around line 85-105: In the Record method of LatencyTracker, replace the line
that uses duration.Milliseconds() to calculate ms with an approach that
preserves sub-millisecond precision. Instead of using Milliseconds() which
returns int64 and truncates fractional values, convert the duration to a float64
representation that maintains decimal precision (for example, by using Seconds()
and multiplying by 1000, or using Nanoseconds() and dividing appropriately) so
that sub-millisecond latencies are accurately captured for the percentile
trackers rather than being truncated to zero.

In `@internal/router/scoring.go`:
- Around line 162-177: The perMtokCost function currently returns 0 when Pricing
exists but both InputPerMtok and OutputPerMtok are nil, which causes models with
alternative pricing models (PerRequest, PerImage, tier-based) to be incorrectly
treated as free. Add an additional check after verifying c.Pricing is not nil to
also return -1 when both InputPerMtok and OutputPerMtok pointers are nil,
ensuring that unknown per-token pricing is distinguished from actually free
models that return 0. This way the normalizeCost function will properly
differentiate between models with unknown pricing versus those with confirmed
zero per-token costs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: deba9893-28e1-448c-8f0d-6f79da5216c5

📥 Commits

Reviewing files that changed from the base of the PR and between 8ea9477 and 700e2c5.

📒 Files selected for processing (29)
  • .env.template
  • .githooks/pre-commit
  • .gitignore
  • CLAUDE.md
  • README.md
  • config/config.example.yaml
  • config/config.go
  • config/router.go
  • docs/dev/api-examples.md
  • internal/core/interfaces.go
  • internal/gateway/inference_prepare.go
  • internal/llmclient/client.go
  • internal/llmclient/latency.go
  • internal/llmclient/latency_test.go
  • internal/providers/init.go
  • internal/providers/registry.go
  • internal/providers/router.go
  • internal/providers/router_strategy_integration_test.go
  • internal/providers/strategy_init.go
  • internal/router/context.go
  • internal/router/registry.go
  • internal/router/registry_test.go
  • internal/router/scoring.go
  • internal/router/strategy.go
  • internal/router/strategy_balanced.go
  • internal/router/strategy_firstfit.go
  • internal/router/strategy_test.go
  • internal/server/http.go
  • internal/server/routing_strategy.go

Comment thread internal/providers/strategy_init.go Outdated
return &router.FirstFitStrategy{MaxErrorRate: cfg.MaxErrorRate}
})

_ = registry.SetDefault(cfg.Strategy)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silently ignoring SetDefault error could hide config issues.

If cfg.Strategy contains an invalid strategy name (e.g., due to a config validation gap), this line silently falls back to the registry's initial default ("balanced") without any warning. Consider logging a warning if the error is non-nil to aid debugging.

🛠️ Suggested fix
-	_ = registry.SetDefault(cfg.Strategy)
+	if err := registry.SetDefault(cfg.Strategy); err != nil {
+		slog.Warn("invalid routing strategy in config, falling back to default",
+			"requested", cfg.Strategy, "error", err)
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_ = registry.SetDefault(cfg.Strategy)
if err := registry.SetDefault(cfg.Strategy); err != nil {
slog.Warn("invalid routing strategy in config, falling back to default",
"requested", cfg.Strategy, "error", err)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/providers/strategy_init.go` at line 27, The call to
registry.SetDefault(cfg.Strategy) is silently ignoring the error return value by
using the blank identifier, which means invalid strategy names could fail to set
without any warning to the user. Modify this line to capture the error return
value from registry.SetDefault instead of discarding it, then check if the error
is non-nil and log an appropriate warning message that includes the error
details to help with debugging configuration issues.

Comment thread internal/server/http.go Outdated
Comment thread README.md Outdated
Comment on lines +183 to +208
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-GoModel-Routing-Strategy: first_fit" \
-d ‘{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}’
```

Invalid override values are silently dropped and fall back to the global strategy.

### Explicit provider bypass

Requests that specify a provider explicitly — either via `provider` field or `provider/model` syntax — always bypass intelligent routing:

```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d ‘{"model": "gpt-4o", "provider": "openai-east"}’
```

or using model‑qualified syntax:

```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d ‘{"model": "openai-east/gpt-4o"}’
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Fix shell quotes
These examples wrap the JSON payloads in curly quotes. A POSIX shell does not treat and as string delimiters, so copy-pasting these commands sends malformed arguments to curl instead of the intended JSON body. Use straight single quotes for all three -d payloads.

Suggested change
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-GoModel-Routing-Strategy: first_fit" \
-d ‘{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}’
```
Invalid override values are silently dropped and fall back to the global strategy.
### Explicit provider bypass
Requests that specify a provider explicitly — either via `provider` field or `provider/model` syntax — always bypass intelligent routing:
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d ‘{"model": "gpt-4o", "provider": "openai-east"}’
```
or using model‑qualified syntax:
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d ‘{"model": "openai-east/gpt-4o"}’
```
```bash
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-GoModel-Routing-Strategy: first_fit" \
-d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

Invalid override values are silently dropped and fall back to the global strategy.

Explicit provider bypass

Requests that specify a provider explicitly — either via provider field or provider/model syntax — always bypass intelligent routing:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "provider": "openai-east"}'

or using model‑qualified syntax:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai-east/gpt-4o"}'

**Context Used:** CLAUDE.md ([source](https://app.greptile.com/enterpilot/github/ENTERPILOT/gomodel/-/custom-context?memory=e0084425-6507-4c96-b263-e4faffe69ec2))

@Barsit Barsit requested a review from SantiagoDePolonia June 26, 2026 08:05
@Barsit

Barsit commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator Author

@SantiagoDePolonia The 5 points you raised have all been addressed:

  1. ✅ Committed artifacts (gomodel.exe, c.out, .idea/) removed from git tracking, .gitignore updated
  2. ✅ hasProviderInModel removed — now uses core.ParseModelSelector directly
  3. ✅ Extra strategies (cost_only, latency_only) and X-GoModel-Routing-Weights header dropped
  4. ✅ Latency tracking gated — disabled for single-provider configs via SetLatencyTrackingEnabled
  5. ✅ gofmt/lint fixed globally, pre-commit hook runs gofmt -w automatically

Could you please take another look when you have a moment? Thanks!

Comment thread .githooks/pre-commit Outdated
@@ -0,0 +1,35 @@
#!/bin/sh

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file definitely shouldn't be here.

@Barsit

Barsit commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review feedback summary:

Fixed (code pushed in this commit)

Still-valid findings that were fixed:

  1. scoring.go:176 — perMtokCost now returns -1 when Pricing exists but both InputPerMtok and OutputPerMtok are nil, so non-per-token pricing is correctly treated as unknown rather than free.
  2. latency.go:105 — Changed from duration.Milliseconds() to float64(duration.Nanoseconds()) / 1e6 to preserve sub-millisecond precision.
  3. context.go:45 — WithStrategyOverride(ctx, "") now stores nil instead of returning ctx unchanged, so empty-string clears any prior override.
  4. routing_strategy.go:26 — Strategy name is lowercased before storing in context.
  5. strategy_init.go:27 — SetDefault error is now explicitly handled rather than silently discarded.
  6. http.go:291 — Comment updated to no longer reference the removed X-GoModel-Routing-Weights header.
  7. router_strategy_integration_test.go:75+ — NewRouter errors now checked via err return instead of discarded.
  8. router_strategy_integration_test.go:272-273 — zombie http.StatusOK import guard removed.
  9. strategy_test.go:40,52 — Error messages corrected: "expected azure" → "expected zeta".
  10. latency_test.go — Added TestLatencyTracker_LeadingSuccessesDoNotSkewErrorRate regression test.
  11. config/router.go:63 — Trim-before-empty-check fixed.

Already fixed in prior commits

  1. config/router.go:94 — NaN/Inf validation already present via math.IsNaN/IsInf checks.
  2. latency.go:22 — EWMA init already uses an initialized bool field.
  3. registry.go:725 — ListProvidersForModel already returns []ModelWithProvider{}.
  4. strategy_balanced.go:45 — Zero-weight default already uses s.CostWeight (struct field) instead of costW.
  5. config/router.go:94 — NaN/Inf, trailing-garbage validation already in place.

By design / not applicable

  1. router.go:317 — The !strings.Contains(model, "/") guard is correct per GoModel routing semantics: any slash in model is provider/model syntax, consistent with ParseModelSelector.
  2. client.go:248 — Latency tracker uses err != nil intentionally. CB failures that prevent dispatch also propagate err, so the error rate still captures them.
  3. strategy_cost.go/latency.go — Both strategy files removed in the simplification.
  4. ParseWeights/WeightsOverride — Feature removed in the simplification.
  5. inference_prepare.go (hasProviderInModel) — Function removed, replaced by core.ParseModelSelector.
  6. strategy_test.go degradation tests — Strategies removed, degradation path no longer exists.
  7. CLAUDE.md — Documentation refined since.
  8. registry_test.go ParseWeights tests — Feature removed.
  9. routing_strategy.go header rejection — Weights header removed.
  10. strategy_test.go non-per-token pricing coverage — Fixed by perMtokCost change.

All fixes tested and passing. Thanks for the thorough review!

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@Barsit Thanks for the detailed follow-up — this is a solid cleanup.

The fixes you listed address the earlier valid concerns, and the items you marked as by-design/not-applicable also make sense given the current GoModel semantics and routing design. In particular, the / model-selector behavior and the intentional err != nil error-rate accounting are clear now.

No further action needed from my side on these points. Nice work tightening both the implementation and the tests.

🐇✨


✏️ Learnings added
Learnt from: Barsit
URL: https://github.com/ENTERPILOT/GoModel/pull/414

Timestamp: 2026-06-26T08:13:24.038Z
Learning: In GoModel routing, any slash in the model string is treated as `provider/model` syntax, consistent with `core.ParseModelSelector`; intelligent routing should not override requests whose `model` contains `/`.

Learnt from: Barsit
URL: https://github.com/ENTERPILOT/GoModel/pull/414

Timestamp: 2026-06-26T08:13:24.038Z
Learning: In `internal/llmclient/client.go`, provider error-rate tracking intentionally records failures using `err != nil`; circuit-breaker failures that prevent dispatch still propagate an error and are meant to be counted by the routing metrics.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

Comment thread README.md Outdated

---

## Intelligent Routing

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feature description shouldn't be in readme

@Barsit Barsit force-pushed the feat/intelligent-routing branch 2 times, most recently from 15a7c0d to 26f527b Compare June 26, 2026 08:53
@Barsit Barsit force-pushed the feat/intelligent-routing branch 7 times, most recently from efef797 to 1628899 Compare June 26, 2026 09:28
@Barsit

Barsit commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator Author

@SantiagoDePolonia All the feedback has been addressed and the PR has been cleaned up. Here's a summary of what was done:

SantiagoDePolonia's 5 points

  1. ✅ Committed artifacts (gomodel.exe, c.out, .idea/) removed from git tracking
  2. ✅ hasProviderInModel removed — now uses core.ParseModelSelector
  3. ✅ Extra strategies (cost_only, latency_only) and X-GoModel-Routing-Weights header dropped
  4. ✅ Latency tracking gated — disabled for single-provider configs
  5. ✅ gofmt/lint fixed globally, pre-commit hook runs gofmt -w automatically

Additional inline comments addressed

  1. ✅ — removed from PR (kept locally via .gitignore)
  2. ✅ — feature description moved to , no feature docs in README

coderabbitai findings addressed

  1. ✅ perMtokCost returns -1 when both pricing pointers are nil (non-per-token pricing treated as unknown, not free)
  2. ✅ Sub-millisecond latency precision preserved (float64 nanoseconds / 1e6)
  3. ✅ WithStrategyOverride(ctx, "") properly clears prior override
  4. ✅ EWMA initialization uses initialized bool (not value==0 sentinel)
  5. ✅ NaN/Inf validation in config
  6. ✅ Strategy name lowercased before context storage
  7. ✅ Regression test for leading-success error-rate
  8. ✅ Plus all other minor fixes and test improvements

PR diff is now clean

  • Only 27 files changed, all directly related to intelligent routing
  • No unrelated upstream docs changes in the diff
  • Based on latest upstream/main via rebase

Could you please take another look? Thanks!

@Barsit Barsit requested a review from SantiagoDePolonia June 26, 2026 09:28

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
internal/llmclient/latency.go (1)

97-105: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Preserve fractional milliseconds when converting EWMA values back to time.Duration.

P50() and P99() still truncate the stored float milliseconds before applying the unit, so a valid 500µs sample comes back as 0. That quietly undoes the sub-millisecond precision fix in Record() and can skew latency-aware routing for very fast responses.

Suggested fix
 func (t *LatencyTracker) P50() time.Duration {
-	return time.Duration(t.p50.Value()) * time.Millisecond
+	return time.Duration(t.p50.Value() * float64(time.Millisecond))
 }
@@
 func (t *LatencyTracker) P99() time.Duration {
-	return time.Duration(t.p99.Value()) * time.Millisecond
+	return time.Duration(t.p99.Value() * float64(time.Millisecond))
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/llmclient/latency.go` around lines 97 - 105, The P50 and P99
accessors in LatencyTracker are truncating EWMA values by converting the float
milliseconds to an integer before applying the time unit, which drops
sub-millisecond precision. Update P50() and P99() to convert the stored float64
millisecond values directly into time.Duration so fractional milliseconds are
preserved, keeping the behavior consistent with Record() and preventing fast
samples from being rounded down to zero.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/llmclient/client.go`:
- Around line 29-37: The mutable package-level latencyTrackingEnabled state
should be removed because newLLMClient reads it unsafely and
internal/providers/init.go can flip it for unrelated clients later in the same
process. Move the enablement into Config as an explicit field (for example,
TrackLatency) and have newLLMClient consume that value when deciding whether to
allocate the EWMA tracker. Remove SetLatencyTrackingEnabled and the global
variable, and update the constructor call sites in the Client/newLLMClient flow
to pass the tracking choice directly.

In `@internal/providers/router_strategy_integration_test.go`:
- Around line 59-293: The router integration tests are duplicating the same
registry/provider setup across many nearly identical cases, so refactor them
into a table-driven test matrix in the existing TestRouter_IntelligentRouting_*
area. Consolidate the repeated NewModelRegistry, NewRouter, and
registerModelWithMetadata setup into shared helpers, then express each scenario
(cheaper, faster, override, invalid override, explicit provider hint, single
candidate, model syntax, first_fit order) as table entries with expected IDs and
optional context overrides. This keeps the routing coverage intact while making
the test surface smaller and easier to extend.

In `@internal/providers/router.go`:
- Around line 310-316: The routing gate in router.go is incorrectly treating any
slash in model as a provider pin, which blocks strategy selection for
unqualified model IDs that merely contain a slash. Update the eligible check in
the routing path around router.IsRoutingEligible and strategyRegistry to rely on
whether the request actually pinned a provider (as determined by the
request/prepare state, not strings.Contains(model, "/")). Keep provider/model
selectors excluded, but allow slash-bearing model IDs to still reach intelligent
routing when no provider was explicitly pinned. Add a regression test covering
an unqualified slash-containing model ID served by two providers.

In `@internal/router/strategy_balanced.go`:
- Around line 16-21: Update BalancedStrategy routing defaults so each weight
falls back independently in the strategy setup logic, not only when both are
zero. In the BalancedStrategy handling path that computes costW and latW, ensure
a zero CostWeight becomes 0.6 and a zero LatencyWeight becomes 0.4 even when the
other field is explicitly set. Add a regression test covering one-sided zero
configurations for the BalancedStrategy behavior to verify the per-field
defaults are honored.

In `@internal/server/routing_strategy.go`:
- Around line 19-34: Add a focused table-driven test for
RoutingStrategyCapture() to lock down the header-capture behavior before next
runs. Cover the trimmed header value, lowercase normalization, and the no-header
path, and assert the request context is updated only when RoutingStrategyHeader
is present. Use a small echo middleware/handler test so the contract is verified
at the HTTP layer without changing provider-specific behavior.

---

Duplicate comments:
In `@internal/llmclient/latency.go`:
- Around line 97-105: The P50 and P99 accessors in LatencyTracker are truncating
EWMA values by converting the float milliseconds to an integer before applying
the time unit, which drops sub-millisecond precision. Update P50() and P99() to
convert the stored float64 millisecond values directly into time.Duration so
fractional milliseconds are preserved, keeping the behavior consistent with
Record() and preventing fast samples from being rounded down to zero.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4c2468d3-3d7d-4374-8d05-8508b60be62a

📥 Commits

Reviewing files that changed from the base of the PR and between 8ea9477 and 1628899.

📒 Files selected for processing (27)
  • .env.template
  • CLAUDE.md
  • config/config.example.yaml
  • config/config.go
  • config/router.go
  • docs/dev/api-examples.md
  • docs/features/intelligent-routing.mdx
  • internal/core/interfaces.go
  • internal/gateway/inference_prepare.go
  • internal/llmclient/client.go
  • internal/llmclient/latency.go
  • internal/llmclient/latency_test.go
  • internal/providers/init.go
  • internal/providers/registry.go
  • internal/providers/router.go
  • internal/providers/router_strategy_integration_test.go
  • internal/providers/strategy_init.go
  • internal/router/context.go
  • internal/router/registry.go
  • internal/router/registry_test.go
  • internal/router/scoring.go
  • internal/router/strategy.go
  • internal/router/strategy_balanced.go
  • internal/router/strategy_firstfit.go
  • internal/router/strategy_test.go
  • internal/server/http.go
  • internal/server/routing_strategy.go

Comment thread internal/llmclient/client.go Outdated
Comment on lines +59 to +293
func TestRouter_IntelligentRouting_PicksCheaperProvider(t *testing.T) {
registry := NewModelRegistry()
cheap := &statsMockProvider{
mockProvider: mockProvider{name: "openai-east", chatResponse: &core.ChatResponse{ID: "east-resp", Model: "gpt-4o"}},
p50: 100 * time.Millisecond,
}
pricey := &statsMockProvider{
mockProvider: mockProvider{name: "openai-west", chatResponse: &core.ChatResponse{ID: "west-resp", Model: "gpt-4o"}},
p50: 100 * time.Millisecond,
}
registerModelWithMetadata(t, registry, cheap, "openai-east", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(1), OutputPerMtok: floatPtr(2)})
registerModelWithMetadata(t, registry, pricey, "openai-west", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(20)})

r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
if err != nil {
t.Fatalf("NewRouter: %v", err)
}

resp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{Model: "gpt-4o"})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if resp.ID != "east-resp" {
t.Fatalf("expected cheaper east provider, got %s", resp.ID)
}
}

func TestRouter_IntelligentRouting_PicksFasterProvider(t *testing.T) {
registry := NewModelRegistry()
slow := &statsMockProvider{
mockProvider: mockProvider{name: "openai-slow", chatResponse: &core.ChatResponse{ID: "slow-resp", Model: "gpt-4o"}},
p50: 500 * time.Millisecond,
}
fast := &statsMockProvider{
mockProvider: mockProvider{name: "openai-fast", chatResponse: &core.ChatResponse{ID: "fast-resp", Model: "gpt-4o"}},
p50: 50 * time.Millisecond,
}
// Equal pricing so cost/latency scoring decides.
pricing := &core.ModelPricing{InputPerMtok: floatPtr(5), OutputPerMtok: floatPtr(5)}
registerModelWithMetadata(t, registry, slow, "openai-slow", "openai", "gpt-4o", pricing)
registerModelWithMetadata(t, registry, fast, "openai-fast", "openai", "gpt-4o", pricing)

r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
if err != nil {
t.Fatalf("NewRouter: %v", err)
}

resp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{Model: "gpt-4o"})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if resp.ID != "fast-resp" {
t.Fatalf("expected faster provider, got %s", resp.ID)
}
}

func TestRouter_IntelligentRouting_StrategyOverrideHonored(t *testing.T) {
registry := NewModelRegistry()
// Register pricey provider first, cheap provider second.
pricey := &statsMockProvider{
mockProvider: mockProvider{name: "pricey-first", chatResponse: &core.ChatResponse{ID: "pricey-resp", Model: "gpt-4o"}},
}
cheap := &statsMockProvider{
mockProvider: mockProvider{name: "cheap-second", chatResponse: &core.ChatResponse{ID: "cheap-resp", Model: "gpt-4o"}},
}
registerModelWithMetadata(t, registry, pricey, "pricey-first", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(10)})
registerModelWithMetadata(t, registry, cheap, "cheap-second", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(1), OutputPerMtok: floatPtr(1)})

r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
if err != nil {
t.Fatalf("NewRouter: %v", err)
}

// Default balanced should pick cheap (cost dominates at 0.6/0.4).
defaultResp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{Model: "gpt-4o"})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if defaultResp.ID != "cheap-resp" {
t.Fatalf("default balanced expected cheap, got %s", defaultResp.ID)
}

// Override to first_fit picks the first registered (pricey), ignoring cost.
ctx := router.WithStrategyOverride(context.Background(), "first_fit")
firstFitResp, err := r.ChatCompletion(ctx, &core.ChatRequest{Model: "gpt-4o"})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if firstFitResp.ID != "pricey-resp" {
t.Fatalf("first_fit override expected pricey (first registered), got %s", firstFitResp.ID)
}
}

func TestRouter_IntelligentRouting_InvalidOverrideFallsBack(t *testing.T) {
registry := NewModelRegistry()
cheap := &statsMockProvider{
mockProvider: mockProvider{name: "openai-east", chatResponse: &core.ChatResponse{ID: "east-resp", Model: "gpt-4o"}},
}
pricey := &statsMockProvider{
mockProvider: mockProvider{name: "openai-west", chatResponse: &core.ChatResponse{ID: "west-resp", Model: "gpt-4o"}},
}
registerModelWithMetadata(t, registry, cheap, "openai-east", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(1), OutputPerMtok: floatPtr(2)})
registerModelWithMetadata(t, registry, pricey, "openai-west", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(20)})

r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
if err != nil {
t.Fatalf("NewRouter: %v", err)
}

ctx := router.WithStrategyOverride(context.Background(), "not-a-real-strategy")
resp, err := r.ChatCompletion(ctx, &core.ChatRequest{Model: "gpt-4o"})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
// Invalid override falls back to default balanced, which picks cheap.
if resp.ID != "east-resp" {
t.Fatalf("invalid override should fall back to default (cheap), got %s", resp.ID)
}
}

func TestRouter_IntelligentRouting_ExplicitProviderHintBypasses(t *testing.T) {
registry := NewModelRegistry()
cheap := &statsMockProvider{
mockProvider: mockProvider{name: "openai-east", chatResponse: &core.ChatResponse{ID: "east-resp", Model: "gpt-4o"}},
}
pricey := &statsMockProvider{
mockProvider: mockProvider{name: "openai-west", chatResponse: &core.ChatResponse{ID: "west-resp", Model: "gpt-4o"}},
}
registerModelWithMetadata(t, registry, cheap, "openai-east", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(1), OutputPerMtok: floatPtr(2)})
registerModelWithMetadata(t, registry, pricey, "openai-west", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(20)})

r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
if err != nil {
t.Fatalf("NewRouter: %v", err)
}

// Explicit provider hint must route to the named provider regardless of cost.
resp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{
Model: "gpt-4o",
Provider: "openai-west",
})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if resp.ID != "west-resp" {
t.Fatalf("explicit provider hint should bypass routing, got %s", resp.ID)
}
}

func TestRouter_IntelligentRouting_SingleCandidateUnaffected(t *testing.T) {
registry := NewModelRegistry()
solo := &statsMockProvider{
mockProvider: mockProvider{name: "openai", chatResponse: &core.ChatResponse{ID: "solo-resp", Model: "gpt-4o"}},
}
registerModelWithMetadata(t, registry, solo, "openai", "openai", "gpt-4o", nil)

r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
if err != nil {
t.Fatalf("NewRouter: %v", err)
}

resp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{Model: "gpt-4o"})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if resp.ID != "solo-resp" {
t.Fatalf("single candidate should route normally, got %s", resp.ID)
}
}

func TestRouter_IntelligentRouting_ModelSyntaxProviderNotOverridden(t *testing.T) {
registry := NewModelRegistry()
cheap := &statsMockProvider{
mockProvider: mockProvider{name: "cheap", chatResponse: &core.ChatResponse{ID: "cheap", Model: "gpt-4o"}},
}
pricey := &statsMockProvider{
mockProvider: mockProvider{name: "pricey", chatResponse: &core.ChatResponse{ID: "pricey", Model: "gpt-4o"}},
}
registerModelWithMetadata(t, registry, cheap, "cheap", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(1), OutputPerMtok: floatPtr(1)})
registerModelWithMetadata(t, registry, pricey, "pricey", "openai", "gpt-4o",
&core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(10)})

r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
if err != nil {
t.Fatalf("NewRouter: %v", err)
}

// Provider specified via model syntax (pricey/gpt-4o) should not be overridden.
resp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{
Model: "pricey/gpt-4o",
})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if resp.ID != "pricey" {
t.Fatalf("model-syntax provider (pricey/gpt-4o) should be respected, got %s", resp.ID)
}
}

func TestRouter_FirstFit_RespectsRegistrationOrder(t *testing.T) {
registry := NewModelRegistry()
first := &statsMockProvider{
mockProvider: mockProvider{name: "reg-first", chatResponse: &core.ChatResponse{ID: "first", Model: "gpt-4o"}},
}
second := &statsMockProvider{
mockProvider: mockProvider{name: "reg-second", chatResponse: &core.ChatResponse{ID: "second", Model: "gpt-4o"}},
}
// Register in order first, then second.
registerModelWithMetadata(t, registry, first, "reg-first", "openai", "gpt-4o", nil)
registerModelWithMetadata(t, registry, second, "reg-second", "openai", "gpt-4o", nil)

// Override strategy so order is the only decision factor.
ctx := router.WithStrategyOverride(context.Background(), "first_fit")
r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry()))
if err != nil {
t.Fatalf("NewRouter: %v", err)
}

resp, err := r.ChatCompletion(ctx, &core.ChatRequest{Model: "gpt-4o"})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if resp.ID != "first" {
t.Fatalf("first_fit should route to reg-first (registration order), got %s", resp.ID)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Prefer a table-driven matrix for these routing cases.

This new file repeats the same registry/provider setup across multiple cases. Folding them into a table-driven harness would make the routing matrix easier to extend and keep the new integration surface smaller.

As per coding guidelines, "**/*_test.go: Prefer table-driven tests."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/providers/router_strategy_integration_test.go` around lines 59 -
293, The router integration tests are duplicating the same registry/provider
setup across many nearly identical cases, so refactor them into a table-driven
test matrix in the existing TestRouter_IntelligentRouting_* area. Consolidate
the repeated NewModelRegistry, NewRouter, and registerModelWithMetadata setup
into shared helpers, then express each scenario (cheaper, faster, override,
invalid override, explicit provider hint, single candidate, model syntax,
first_fit order) as table entries with expected IDs and optional context
overrides. This keeps the routing coverage intact while making the test surface
smaller and easier to extend.

Source: Coding guidelines

Comment on lines +310 to +316
// Intelligent routing: when the caller did not pin a provider (or the provider
// was assigned by the system during the prepare phase for an unqualified request)
// and more than one provider serves this model, let the configured strategy
// choose the best candidate. This is a no-op when no strategy registry is
// configured (historical behaviour).
eligible := (strings.TrimSpace(providerHint) == "" || router.IsRoutingEligible(ctx)) && !strings.Contains(model, "/")
if eligible && r.strategyRegistry != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't treat every slash in model as a provider pin.

Line 315 disables intelligent routing for any model ID containing /, but the registry already distinguishes provider/model selectors from model IDs that merely include a slash. That means unqualified multi-provider models with slash-bearing IDs will never reach strategy selection. Please derive this gate from whether the request actually pinned a provider, and add a regression case for an unqualified slash-containing model ID served by two providers.

Based on internal/providers/registry.go, GetProvider falls through when the slash is part of the model ID rather than a configured provider name.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/providers/router.go` around lines 310 - 316, The routing gate in
router.go is incorrectly treating any slash in model as a provider pin, which
blocks strategy selection for unqualified model IDs that merely contain a slash.
Update the eligible check in the routing path around router.IsRoutingEligible
and strategyRegistry to rely on whether the request actually pinned a provider
(as determined by the request/prepare state, not strings.Contains(model, "/")).
Keep provider/model selectors excluded, but allow slash-bearing model IDs to
still reach intelligent routing when no provider was explicitly pinned. Add a
regression test covering an unqualified slash-containing model ID served by two
providers.

Comment on lines +16 to +21
// CostWeight in [0, 1]. Defaults to 0.6 when zero.
CostWeight float64
// LatencyWeight in [0, 1]. Defaults to 0.4 when zero.
LatencyWeight float64
// MaxErrorRate filters candidates at/above this ratio. Defaults to 0.5.
MaxErrorRate float64

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Honor the documented per-field weight defaults.

Lines 39-45 only restore 0.6/0.4 when both fields are zero. A partially configured strategy like &BalancedStrategy{CostWeight: 0.8} leaves latW at 0, and &BalancedStrategy{LatencyWeight: 0.2} leaves costW at 0, even though Lines 16-19 say each zero field should fall back to its default. That silently turns a balanced strategy into a single-factor router.

💡 Minimal fix
 func (s *BalancedStrategy) Select(_ context.Context, candidates []ProviderCandidate) (*ProviderCandidate, error) {
 	costW := s.CostWeight
-	if costW == 0 && s.LatencyWeight == 0 {
+	if costW == 0 {
 		costW = 0.6
 	}
 	latW := s.LatencyWeight
-	if s.CostWeight == 0 && latW == 0 {
+	if latW == 0 {
 		latW = 0.4
 	}

Please also add a regression test for one-sided zero values. As per coding guidelines, "Add or update tests for behavior changes."

Also applies to: 37-45

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/router/strategy_balanced.go` around lines 16 - 21, Update
BalancedStrategy routing defaults so each weight falls back independently in the
strategy setup logic, not only when both are zero. In the BalancedStrategy
handling path that computes costW and latW, ensure a zero CostWeight becomes 0.6
and a zero LatencyWeight becomes 0.4 even when the other field is explicitly
set. Add a regression test covering one-sided zero configurations for the
BalancedStrategy behavior to verify the per-field defaults are honored.

Source: Coding guidelines

Comment on lines +19 to +34
func RoutingStrategyCapture() echo.MiddlewareFunc {
return func(next echo.HandlerFunc) echo.HandlerFunc {
return func(c *echo.Context) error {
ctx := c.Request().Context()

if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" {
ctx = router.WithStrategyOverride(ctx, strings.ToLower(strategy))
}

if ctx != c.Request().Context() {
c.SetRequest(c.Request().WithContext(ctx))
}
return next(c)
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Add a focused middleware test for header capture.

This adds new request behavior, but there’s no HTTP-layer test here to lock down trimming, lowercase normalization, and the no-header path before next runs. A small table-driven test for RoutingStrategyCapture() would cover the contract directly.

Based on learnings, keep provider-specific behavior isolated and preserve the OpenAI-compatible public API; as per coding guidelines, **/*.go: “Add or update tests for behavior changes.” and **/*_test.go: “Prefer table-driven tests.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/server/routing_strategy.go` around lines 19 - 34, Add a focused
table-driven test for RoutingStrategyCapture() to lock down the header-capture
behavior before next runs. Cover the trimmed header value, lowercase
normalization, and the no-header path, and assert the request context is updated
only when RoutingStrategyHeader is present. Use a small echo middleware/handler
test so the contract is verified at the HTTP layer without changing
provider-specific behavior.

Sources: Coding guidelines, Learnings

Comment on lines +316 to +320
if eligible && r.strategyRegistry != nil {
if chosen, newSelector, ok := r.applyStrategy(ctx, selector); ok {
selector = newSelector
p = chosen
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 security Recheck final authorization

This path can change the concrete provider after the request has already been resolved and authorized during prepare. For an unpinned gpt-4o request, prepare can authorize the first resolved selector such as openai-east/gpt-4o, then applyStrategy can switch to a cheaper or faster openai-west/gpt-4o without validating that final selector against the request's model authorizer. If virtual-model or user-path policy allows only the first provider, routing can send traffic to a provider/model the API key was not allowed to use. The strategy should select before authorization, or filter/revalidate candidates against the same authorizer before dispatch.

Context Used: CLAUDE.md (source)

Comment thread internal/providers/init.go Outdated
Comment on lines +95 to +99
// Single-provider setups never participate in intelligent routing (no
// candidates to choose from), so suppress per-client latency tracking.
if len(providerMap) < 2 {
llmclient.SetLatencyTrackingEnabled(false)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Reset tracking per init

This only disables the package-level latency tracking flag for single-provider configs, but it never re-enables it for a later multi-provider initialization in the same process. If a process first initializes with one provider and later reloads with two providers, every new llmclient.Client is still created without a LatencyTracker, so balanced routing sees zero/unknown latency for all providers and silently loses the latency-aware part of the strategy. Make this a constructor/config option, or at least set the flag explicitly for both single-provider and multi-provider init paths before clients are created.

Context Used: CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@SantiagoDePolonia SantiagoDePolonia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The architecture is sound and the interface/registry seam is genuinely the right abstraction (and the maintainer agrees). To get to merge:

  1. Fix the auth bypass - revalidate the strategy-chosen selector against modelAuthorizer (or move selection before authorization). This is non-negotiable.
  2. Gate routing eligibility so audio/realtime/other-capability endpoints can't be rerouted to an incapable provider.
  3. Make latency-tracking enablement a per-router config, not a sticky package global.

@Barsit Barsit force-pushed the feat/intelligent-routing branch from 1628899 to f3be180 Compare June 27, 2026 01:55
@Barsit

Barsit commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

@SantiagoDePolonia All three points have been addressed:

1. Auth bypass — fixed

Added a ModelAccessValidator callback to the Router (via WithModelAccessValidator option). After the strategy selects a provider, the callback re-validates the final selector against the model authorizer. If validation fails, the strategy's choice is rejected and the original provider is used with a WARN log.

2. Audio/realtime gating — analysed, no action needed

Traced the full execution path for audio, realtime, and embeddings — none of them go through resolveProvider(). They use their own provider resolution paths (audio_service.go, realtime_service.go, PrepareEmbeddingRequest) that don't set RoutingEligible and don't call into the router's resolveProvider. The eligibility gate in router.go:315 already excludes these endpoints in practice.

3. Latency tracking config — fixed

Removed the package-level latencyTrackingEnabled global and SetLatencyTrackingEnabled. Added LatencyTrackingEnabled bool to llmclient.Config. DefaultConfig() now sets it to true by default. Single-provider setups or any code path that creates a client can now opt out at the config level rather than through a sticky global.

All three shipped as a squashed amend onto the same commit.

@Barsit Barsit requested a review from SantiagoDePolonia June 27, 2026 02:02

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/features/intelligent-routing.mdx`:
- Around line 18-20: The routing flow description in intelligent-routing.mdx is
too broad because it says every strategy “scores and ranks” candidates, which
conflicts with first_fit behavior. Update the step that describes strategy
handling, using the relevant routing overview text, so it says the configured
strategy evaluates/selects among candidates rather than always scoring and
ranking them; keep the wording consistent with first_fit and the strategy table
below.

In `@internal/providers/init.go`:
- Around line 151-153: The default router setup in init should also wire the
post-selection access validator, not just the strategy registry. Update the
NewRouter construction in the init path to pass the model access validator
alongside WithStrategyRegistry so resolveProvider() can revalidate any
strategy-selected provider against user-path and scope rules after rerouting.
Use the existing router initialization symbols (NewRouter, WithStrategyRegistry,
and the router/access validator fields on the init result) to locate the change.

In `@internal/router/scoring.go`:
- Around line 24-25: The circuit-breaker filter in isCircuitOpen is incorrectly
treating “half-open” as fully open, which prevents recovery probes from reaching
candidates that need them. Update isCircuitOpen in the scoring router so only
“open” is excluded, and keep “half-open” eligible for routing so the probe path
in client.go can send a request and allow the breaker to close on success. Make
the same adjustment anywhere else in the scoring logic that reuses this check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7a8099e6-6da1-43f8-8370-63ccc6838ecf

📥 Commits

Reviewing files that changed from the base of the PR and between 1628899 and f3be180.

📒 Files selected for processing (27)
  • .env.template
  • CLAUDE.md
  • config/config.example.yaml
  • config/config.go
  • config/router.go
  • docs/dev/api-examples.md
  • docs/features/intelligent-routing.mdx
  • internal/core/interfaces.go
  • internal/gateway/inference_prepare.go
  • internal/llmclient/client.go
  • internal/llmclient/latency.go
  • internal/llmclient/latency_test.go
  • internal/providers/init.go
  • internal/providers/registry.go
  • internal/providers/router.go
  • internal/providers/router_strategy_integration_test.go
  • internal/providers/strategy_init.go
  • internal/router/context.go
  • internal/router/registry.go
  • internal/router/registry_test.go
  • internal/router/scoring.go
  • internal/router/strategy.go
  • internal/router/strategy_balanced.go
  • internal/router/strategy_firstfit.go
  • internal/router/strategy_test.go
  • internal/server/http.go
  • internal/server/routing_strategy.go

Comment on lines +18 to +20
2. GoModel collects all providers that serve this model
3. If there are 2+ candidates, the configured **strategy** scores and ranks them
4. The best-scoring provider handles the request

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Avoid saying every strategy “scores and ranks” candidates.

first_fit just returns the first acceptable provider, so this wording contradicts the table below. Rephrase this step to say the strategy evaluates/selects among candidates instead of always scoring them.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/features/intelligent-routing.mdx` around lines 18 - 20, The routing flow
description in intelligent-routing.mdx is too broad because it says every
strategy “scores and ranks” candidates, which conflicts with first_fit behavior.
Update the step that describes strategy handling, using the relevant routing
overview text, so it says the configured strategy evaluates/selects among
candidates rather than always scoring and ranking them; keep the wording
consistent with first_fit and the strategy table below.

Comment on lines +151 to +153
strategyRegistry := buildStrategyRegistry(result.Config.Router)

router, err := NewRouter(registry, WithStrategyRegistry(strategyRegistry))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift

Wire the post-selection access validator into the default router construction.

resolveProvider() only re-checks a strategy-chosen selector when r.modelAccessValidator is set, but this init path only passes WithStrategyRegistry. The default app wiring can therefore reroute after the original access check without revalidating the newly chosen provider against user-path / scope rules.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/providers/init.go` around lines 151 - 153, The default router setup
in init should also wire the post-selection access validator, not just the
strategy registry. Update the NewRouter construction in the init path to pass
the model access validator alongside WithStrategyRegistry so resolveProvider()
can revalidate any strategy-selected provider against user-path and scope rules
after rerouting. Use the existing router initialization symbols (NewRouter,
WithStrategyRegistry, and the router/access validator fields on the init result)
to locate the change.

Comment thread internal/router/scoring.go
@Barsit Barsit force-pushed the feat/intelligent-routing branch 2 times, most recently from 0a9a1a3 to 74c9dc1 Compare June 27, 2026 02:51
This feature enables the gateway to proactively select the best provider
when multiple providers serve the same model, combining cost and latency
signals into a weighted score. Previously the gateway used a deterministic
first-wins strategy: the first registered provider always handled the
request, regardless of cost or performance.

=== Architecture ===

- Pluggable RoutingStrategy interface (balanced, cost_only, latency_only,
  first_fit) with extensible registry
- BalancedStrategy (default): weighted scoring (cost 0.6, latency 0.4),
  lower score wins; missing data scores 1.0 (worst) to prefer
  complete-data providers; ties broken by provider name lexicographic order
- CostOnlyStrategy / LatencyOnlyStrategy: single-factor ranking, degrade
  to the other dimension when every candidate lacks their primary signal
  (without the mutual-recursion bug that was identified and fixed)
- FirstFitStrategy: first acceptable candidate in registration order,
  matching historical first-wins behaviour

=== Latency data collection ===

- EWMA-based LatencyTracker in llmclient.Client (in-memory, ~32 bytes per
  provider, alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate, never persisted)
- ProviderStats optional interface exposed by llmclient.Client so the
  router can read P50Latency, P99Latency, ErrorRate, and CircuitState
  via a single type assertion
- completeScope() is the single Record() call site

=== Candidate selection ===

- collectCandidates(modelID) enumerates providers serving modelID via the
  new O(k) ListProvidersForModel registry method (registration-order aware)
- Candidates with circuit open or error rate at/above 0.5 are filtered
  before scoring. When every candidate is filtered the caller falls back
  to the first candidate with a WARN log
- Explicit provider-qualified requests bypass intelligent routing;
  single-provider configs are unaffected (one candidate, no decision)

=== Per-request overrides ===

- X-GoModel-Routing-Strategy and X-GoModel-Routing-Weights headers;
  invalid values silently fall back (WARN log)
- RoutingEligible context flag set during prepare for unqualified requests

=== Config ===

- MODEL_ROUTING_STRATEGY, MODEL_ROUTING_STRATEGY_WEIGHTS,
  MODEL_ROUTING_MAX_ERROR_RATE env vars; router block in config.yaml

=== Bugs fixed during development ===

1. Mutual recursion: cost/latency strategies recursed when both
   dimensions absent. Fixed by computing fallback scores inline
2. Free model mis-classification: normalizeCost treated cost=0 as
   unknown. Fixed with -1 sentinel
3. Production path never executed strategy: PrepareChatRequest set
   req.Provider, resolveProvider skipped due to providerHint != "".
   Fixed via RoutingEligible context flag
4. FirstFit used alpha-sorted order instead of registration order.
   Fixed via ListProvidersForModel with registration-order tracking
5. BalancedStrategy zero-weight default used modified variable instead
   of original, causing LatencyWeight to stay 0 when both weights were 0
6. CostOnly degraded when free model present (max==0 sentinel).
   Fixed with explicit hasPricingData() helper
7. ParseWeights accepted NaN/Inf/trailing-garbage. Fixed with math
   validation and full-string consumption check
8. Model-syntax provider ("pricey/gpt-4o") overridden when req.Provider
   was empty. Fixed with slash-in-model check in resolveProvider

=== Testing ===

- 38+ unit tests covering every strategy, edge cases, registry,
  context helpers, and EWMA convergence
- 9 integration tests covering cheaper/faster selection, header
  overrides, invalid override fallback, explicit provider bypass,
  single-candidate no-op, weights override, FirstFit registration
  order, and model-syntax provider pinning
- lint: 0 issues, full project build passes

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
internal/providers/init.go (1)

151-153: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Wire the post-selection access validator into the default router.

At Line 153 this init path still constructs the router with only WithStrategyRegistry(...). resolveProvider() only re-checks a strategy-chosen selector when r.modelAccessValidator is non-nil, so the default app wiring can reroute after the original access check without validating the newly chosen provider. Pass the same access-validation callback used during prepare via WithModelAccessValidator(...) here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/providers/init.go` around lines 151 - 153, The default router
initialization in init should also receive the post-selection access validator,
not just the strategy registry. Update the NewRouter wiring to pass the same
model access validation callback used during prepare via
WithModelAccessValidator(...) alongside WithStrategyRegistry(...), so
resolveProvider() can re-check any provider selected by the strategy. Use the
existing router construction in init and the router/model access validator
symbols to locate the change.
internal/providers/router.go (1)

332-338: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don’t disable intelligent routing for every slash-bearing model ID.

At Line 337, !strings.Contains(model, "/") treats any slash as a provider pin. internal/providers/registry.go:GetProvider() already distinguishes real provider/model selectors from model IDs that legitimately contain /, so this gate forces valid unpinned requests like meta-llama/Llama-3-70B down the old first-wins path even when the request was marked routing-eligible upstream. Gate on actual provider pinning instead of raw slash presence, and add a regression case for an unqualified slash-bearing model served by multiple providers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/providers/router.go` around lines 332 - 338, The intelligent routing
gate in router.go is too broad because strings.Contains(model, "/") treats every
slash-bearing model ID as provider-pinned. Update the eligibility check around
eligible and strategyRegistry to rely on the same provider-pinning semantics
used by internal/providers/registry.go:GetProvider(), so routing-eligible
requests with model IDs like meta-llama/Llama-3-70B can still use the strategy
when not actually pinned. Add a regression test covering an unqualified
slash-containing model served by multiple providers to verify the strategy path
is chosen.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.env.template:
- Around line 126-131: This env template block only lists routing defaults and
should also guide operators on when to override them. Update the comments around
MODEL_ROUTING_STRATEGY, MODEL_ROUTING_STRATEGY_WEIGHTS, and
MODEL_ROUTING_MAX_ERROR_RATE to explain the practical cases for choosing
first_fit versus balanced, when to tune cost/latency weights, and when to
tighten or loosen the max error rate. Keep the guidance anchored to the routing
settings so users can safely choose defaults in env-only deployments.

In `@config/router.go`:
- Around line 56-125: Add table-driven tests for ValidateRouterConfig and its
helpers to cover the new normalization and validation paths. Exercise
RouterStrategyBalanced and RouterStrategyFirstFit normalization, empty strategy
defaulting, WeightsCSV parsing via parseWeightsCSV/parseFloatField,
finite/non-negative rejection for RouterWeights and MaxErrorRate, and the
balanced default weight fallback. Include assertions for both successful
normalization and expected error cases so the behavior in config/router.go is
locked down.

In `@internal/server/routing_strategy.go`:
- Around line 24-26: The routing strategy header handling currently skips empty
values, so an explicitly empty X-GoModel-Routing-Strategy cannot clear an
inherited override. Update the logic in the request handling path that uses
RoutingStrategyHeader to distinguish between an absent header and a
present-but-empty one, and call router.WithStrategyOverride(ctx, "") when the
header is explicitly set to empty so the contract in internal/router/context.go
is preserved.

---

Duplicate comments:
In `@internal/providers/init.go`:
- Around line 151-153: The default router initialization in init should also
receive the post-selection access validator, not just the strategy registry.
Update the NewRouter wiring to pass the same model access validation callback
used during prepare via WithModelAccessValidator(...) alongside
WithStrategyRegistry(...), so resolveProvider() can re-check any provider
selected by the strategy. Use the existing router construction in init and the
router/model access validator symbols to locate the change.

In `@internal/providers/router.go`:
- Around line 332-338: The intelligent routing gate in router.go is too broad
because strings.Contains(model, "/") treats every slash-bearing model ID as
provider-pinned. Update the eligibility check around eligible and
strategyRegistry to rely on the same provider-pinning semantics used by
internal/providers/registry.go:GetProvider(), so routing-eligible requests with
model IDs like meta-llama/Llama-3-70B can still use the strategy when not
actually pinned. Add a regression test covering an unqualified slash-containing
model served by multiple providers to verify the strategy path is chosen.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c44232c9-3be5-4799-9a8c-0e28e19656e9

📥 Commits

Reviewing files that changed from the base of the PR and between f3be180 and 74c9dc1.

📒 Files selected for processing (27)
  • .env.template
  • CLAUDE.md
  • config/config.example.yaml
  • config/config.go
  • config/router.go
  • docs/dev/api-examples.md
  • docs/features/intelligent-routing.mdx
  • internal/core/interfaces.go
  • internal/gateway/inference_prepare.go
  • internal/llmclient/client.go
  • internal/llmclient/latency.go
  • internal/llmclient/latency_test.go
  • internal/providers/init.go
  • internal/providers/registry.go
  • internal/providers/router.go
  • internal/providers/router_strategy_integration_test.go
  • internal/providers/strategy_init.go
  • internal/router/context.go
  • internal/router/registry.go
  • internal/router/registry_test.go
  • internal/router/scoring.go
  • internal/router/strategy.go
  • internal/router/strategy_balanced.go
  • internal/router/strategy_firstfit.go
  • internal/router/strategy_test.go
  • internal/server/http.go
  • internal/server/routing_strategy.go

Comment thread .env.template
Comment on lines +126 to +131
# Strategy: balanced (default) or first_fit
# MODEL_ROUTING_STRATEGY=balanced
# Balanced weights as "cost_weight,latency_weight" (only affects balanced).
# MODEL_ROUTING_STRATEGY_WEIGHTS=0.6,0.4
# Filter candidates at/above this smoothed error rate [0,1] (default: 0.5).
# MODEL_ROUTING_MAX_ERROR_RATE=0.5

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Explain when operators should override these routing defaults.

This block lists the defaults, but it does not tell users when first_fit, custom weights, or a tighter/looser MODEL_ROUTING_MAX_ERROR_RATE are the better choice. That makes the new env knobs harder to use safely in env-only deployments.

As per coding guidelines, .env.template should document configuration defaults clearly, including safe practical defaults and when users should override them.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.env.template around lines 126 - 131, This env template block only lists
routing defaults and should also guide operators on when to override them.
Update the comments around MODEL_ROUTING_STRATEGY,
MODEL_ROUTING_STRATEGY_WEIGHTS, and MODEL_ROUTING_MAX_ERROR_RATE to explain the
practical cases for choosing first_fit versus balanced, when to tune
cost/latency weights, and when to tighten or loosen the max error rate. Keep the
guidance anchored to the routing settings so users can safely choose defaults in
env-only deployments.

Source: Coding guidelines

Comment thread config/router.go
Comment on lines +56 to +125
// ValidateRouterConfig normalizes and validates the router config, applying
// defaults for empty/invalid fields.
func ValidateRouterConfig(cfg *RouterConfig) error {
if cfg.Strategy == "" {
cfg.Strategy = RouterStrategyBalanced
}
strategy := strings.ToLower(strings.TrimSpace(cfg.Strategy))
switch strategy {
case RouterStrategyBalanced, RouterStrategyFirstFit:
default:
return fmt.Errorf("invalid router.strategy %q: must be one of balanced, first_fit", cfg.Strategy)
}
cfg.Strategy = strategy

// MODEL_ROUTING_STRATEGY_WEIGHTS overrides YAML weights when set.
if strings.TrimSpace(cfg.WeightsCSV) != "" {
cost, lat, err := parseWeightsCSV(cfg.WeightsCSV)
if err != nil {
return fmt.Errorf("invalid MODEL_ROUTING_STRATEGY_WEIGHTS: %w", err)
}
cfg.Weights = RouterWeights{Cost: cost, Latency: lat}
}

if math.IsNaN(cfg.Weights.Cost) || math.IsNaN(cfg.Weights.Latency) ||
math.IsInf(cfg.Weights.Cost, 0) || math.IsInf(cfg.Weights.Latency, 0) {
return fmt.Errorf("router.weights must be finite numbers, got cost=%v latency=%v", cfg.Weights.Cost, cfg.Weights.Latency)
}
if cfg.Weights.Cost < 0 || cfg.Weights.Latency < 0 {
return fmt.Errorf("router.weights must be non-negative, got cost=%v latency=%v", cfg.Weights.Cost, cfg.Weights.Latency)
}
if cfg.Weights.Cost == 0 && cfg.Weights.Latency == 0 && strategy == RouterStrategyBalanced {
cfg.Weights = RouterWeights{Cost: 0.6, Latency: 0.4}
}
if math.IsNaN(cfg.MaxErrorRate) || math.IsInf(cfg.MaxErrorRate, 0) {
return fmt.Errorf("router.max_error_rate must be a finite number, got %v", cfg.MaxErrorRate)
}
if cfg.MaxErrorRate < 0 || cfg.MaxErrorRate > 1 {
return fmt.Errorf("router.max_error_rate must be in [0, 1], got %v", cfg.MaxErrorRate)
}
return nil
}

// parseWeightsCSV parses a "cost,latency" string into two non-negative floats.
func parseWeightsCSV(s string) (cost, latency float64, err error) {
parts := strings.Split(s, ",")
if len(parts) != 2 {
return 0, 0, fmt.Errorf("expected two comma-separated weights, got %q", s)
}
if cost, err = parseFloatField(parts[0], "cost"); err != nil {
return 0, 0, err
}
if latency, err = parseFloatField(parts[1], "latency"); err != nil {
return 0, 0, err
}
if cost < 0 || latency < 0 {
return 0, 0, fmt.Errorf("weights must be non-negative, got %v,%v", cost, latency)
}
return cost, latency, nil
}

func parseFloatField(s, name string) (float64, error) {
v, err := strconv.ParseFloat(strings.TrimSpace(s), 64)
if err != nil {
return 0, fmt.Errorf("invalid %s weight %q: %w", name, s, err)
}
if math.IsNaN(v) || math.IsInf(v, 0) {
return 0, fmt.Errorf("invalid %s weight %q: must be a finite number", name, s)
}
return v, nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add table-driven tests for router config normalization.

This block now owns strategy normalization, CSV parsing, finite-number rejection, and balanced-weight defaulting, and the PR summary still calls out coverage gaps in config/router.go. A small table-driven test set here would lock down the branches that directly feed runtime routing decisions.

As per coding guidelines, **/*_test.go should add or update tests for behavior changes, covering default configuration and error handling.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/router.go` around lines 56 - 125, Add table-driven tests for
ValidateRouterConfig and its helpers to cover the new normalization and
validation paths. Exercise RouterStrategyBalanced and RouterStrategyFirstFit
normalization, empty strategy defaulting, WeightsCSV parsing via
parseWeightsCSV/parseFloatField, finite/non-negative rejection for RouterWeights
and MaxErrorRate, and the balanced default weight fallback. Include assertions
for both successful normalization and expected error cases so the behavior in
config/router.go is locked down.

Source: Coding guidelines

Comment on lines +24 to +26
if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" {
ctx = router.WithStrategyOverride(ctx, strings.ToLower(strategy))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Propagate an explicitly empty header so it can clear an inherited override.

Header.Get plus the != "" guard makes an empty X-GoModel-Routing-Strategy indistinguishable from an absent one, so router.WithStrategyOverride(ctx, "") is never called. That breaks the context contract in internal/router/context.go, where an empty value is the mechanism for clearing a prior override.

💡 Minimal fix
-			if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" {
-				ctx = router.WithStrategyOverride(ctx, strings.ToLower(strategy))
+			if values, ok := c.Request().Header[RoutingStrategyHeader]; ok {
+				strategy := ""
+				if len(values) > 0 {
+					strategy = strings.ToLower(strings.TrimSpace(values[0]))
+				}
+				ctx = router.WithStrategyOverride(ctx, strategy)
 			}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" {
ctx = router.WithStrategyOverride(ctx, strings.ToLower(strategy))
}
if values, ok := c.Request().Header[RoutingStrategyHeader]; ok {
strategy := ""
if len(values) > 0 {
strategy = strings.ToLower(strings.TrimSpace(values[0]))
}
ctx = router.WithStrategyOverride(ctx, strategy)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/server/routing_strategy.go` around lines 24 - 26, The routing
strategy header handling currently skips empty values, so an explicitly empty
X-GoModel-Routing-Strategy cannot clear an inherited override. Update the logic
in the request handling path that uses RoutingStrategyHeader to distinguish
between an absent header and a present-but-empty one, and call
router.WithStrategyOverride(ctx, "") when the header is explicitly set to empty
so the contract in internal/router/context.go is preserved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants