feat(router): intelligent cost-aware + latency-aware routing#414
Conversation
📝 WalkthroughWalkthroughAdds intelligent routing configuration, strategy selection, latency tracking, provider-resolution wiring, and request-header capture for per-request strategy overrides. ChangesIntelligent Routing
Estimated code review effort🎯 5 (Critical) | ⏱️ ~90+ minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Confidence Score: 3/5This should be fixed before merging.
internal/providers/router.go and internal/providers/init.go
|
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@config/router.go`:
- Around line 80-88: The current config validation for router weights and
max_error_rate checks for negative and out-of-range values, but does not reject
NaN or other non-finite numbers that strconv.ParseFloat can produce. Add checks
using math.IsNaN() or math.IsInf() (or a helper function) to validate that
cfg.Weights.Cost, cfg.Weights.Latency, and cfg.MaxErrorRate are finite numbers.
Include these finite-value checks alongside the existing negative and range
validation checks in the error return statements, and apply the same validation
logic to the similar checks mentioned at lines 110-115.
In `@internal/gateway/inference_prepare.go`:
- Around line 231-243: The hasProviderInModel function currently returns true
for any model string containing a slash, but it should distinguish between
actual provider prefixes like "deepseek/deepseek-v4-flash" and plain model IDs
containing slashes like "meta-llama/Llama-3-70b". Update the return statement to
add additional logic beyond the current firstSlash check to properly identify
whether the segment before the slash is an actual provider prefix or just part
of a model ID name. This will ensure plain model IDs with slashes are not
incorrectly marked as provider-pinned and can properly use intelligent routing.
In `@internal/llmclient/latency_test.go`:
- Around line 28-37: Add a new regression test function (e.g.,
TestLatencyTracker_LeadingSuccessesDoNotSkewErrorRate) after the existing
TestLatencyTracker_ErrorRateReflectsFailures test that verifies ErrorRate()
behavior when multiple successful samples are recorded before a single failure
occurs. The test should create a NewLatencyTracker, record several successful
samples by calling Record with appropriate parameters indicating success (false
or 0), then record one failure, and finally assert that ErrorRate() returns a
value that does not spike to 1.0 but remains below the routing cutoff threshold,
demonstrating that a single failure after many successes does not incorrectly
skew the error rate calculation.
In `@internal/llmclient/latency.go`:
- Around line 15-19: The ewma struct uses e.value == 0 to detect the first
sample, which incorrectly handles valid zero-valued samples (such as error rate
of 0 for successful requests). Add a boolean field to the ewma struct to
explicitly track whether the EWMA has been initialized. Then update any methods
that check e.value == 0 as an initialization condition (mentioned in lines
30-38) to instead check this new boolean flag. This ensures the first sample is
properly initialized regardless of its value.
In `@internal/providers/registry.go`:
- Around line 717-725: The ListProvidersForModel method documentation states it
returns an empty slice when the model is not found, but the current
implementation returns nil instead. Replace the nil return statement in the
early exit condition (when modelID is empty or no providers are registered) with
an empty slice of type ModelWithProvider to honor the documented API contract
and eliminate nil-shape ambiguity.
In `@internal/providers/router_strategy_integration_test.go`:
- Around line 176-202: The test
TestRouter_IntelligentRouting_ExplicitProviderHintBypasses currently only
validates provider pinning through the explicit Provider field, but does not
cover provider pinning through the model syntax format. Add a second
ChatCompletion call within the same test that uses Model: "openai-west/gpt-4o"
with Provider left empty to verify that explicit provider pins specified via the
model-qualified syntax are also respected by the routing strategy and not
overridden based on cost. Assert that the response ID is still "west-resp" to
confirm the expensive provider is selected when pinned through the model syntax,
ensuring the strategy does not bypass such explicit provider qualifications.
In `@internal/providers/router.go`:
- Around line 309-317: The eligible variable calculation on line 315 only checks
if providerHint is empty but fails to account for explicit provider
specifications using the provider/model syntax. When a user pins a provider by
including it in the model string (e.g., "provider/model"), that explicit choice
should prevent re-routing through the strategy registry. Modify the eligible
variable assignment to also check whether the model or selector contains an
explicit provider specification, and only mark requests as eligible for
re-routing when no provider is pinned either through providerHint or through the
provider/model syntax in the selector.
In `@internal/router/context.go`:
- Around line 35-46: The WithStrategyOverride function currently stores empty
strings in the context, but the documentation states that empty names should
clear any prior override. Modify the WithStrategyOverride function to check if
the strategy parameter is empty and, if so, delete the value from the context
(or return the original context unchanged) rather than storing an empty string.
This ensures that StrategyOverrideFromContext returns the expected (empty
string, false) when no override is set, matching the documented behavior.
In `@internal/router/registry_test.go`:
- Around line 68-75: The TestParseWeights_Invalid test function is missing
coverage for edge cases like trailing garbage and non-finite float values.
Expand the cases slice in the TestParseWeights_Invalid function to include
additional invalid weight strings such as strings with trailing garbage (e.g.,
"0.6x,0.4"), and non-finite float representations (e.g., "NaN,0.4", "Inf,0.4").
This ensures the ParseWeights function properly handles and rejects these
malformed inputs as part of its error handling path, making the parser behavior
regression-proof.
In `@internal/router/registry.go`:
- Around line 98-106: The weight parsing logic in the function that handles Cost
and Latency weights uses fmt.Sscanf which has two issues: it accepts non-finite
values like NaN and Inf, and it performs only partial input matching (e.g.,
"0.6foo" would be parsed as 0.6). To fix this, after parsing each weight value
with fmt.Sscanf for both Cost and Latency, verify that the entire trimmed string
was consumed by the parse operation and reject any non-finite values (NaN or
Inf) using math.IsNaN and math.IsInf checks. Add these validation checks
immediately after each fmt.Sscanf call before the existing non-negative
validation.
In `@internal/router/strategy_balanced.go`:
- Around line 39-45: The zero-value defaulting logic for balanced weights in the
strategy initialization is incorrect because the first condition modifies costW
to 0.6, which then causes the second condition to fail since it checks if costW
== 0. Fix this by changing the second condition from checking costW == 0 to
checking s.CostWeight == 0 instead, so it evaluates the original unmodified
value. This ensures that when both s.CostWeight and s.LatencyWeight are zero,
both costW and latW are properly set to their default values of 0.6 and 0.4
respectively.
In `@internal/router/strategy_cost.go`:
- Around line 33-37: The current implementation in the pricing strategy uses
`max == 0` from minMaxCost() as a sentinel to detect missing pricing data, but
this incorrectly treats legitimate zero-cost pricing as absent pricing and
degrades to latency-only ranking. Create a new helper function (e.g.,
hasPricingData) that directly checks whether pricing information is actually
present for the candidates, rather than inferring presence from numeric extrema.
Replace the condition `if _, max := minMaxCost(candidates); max == 0` with a
call to this new helper function to properly distinguish between zero cost and
missing cost data.
In `@internal/router/strategy_test.go`:
- Around line 115-124: The TestCostOnly_DegradesToLatencyWhenNoPricing test can
still pass via lexicographic tie-break even if the degradation fallback logic
regresses, because both the correct degradation and the lexicographic ordering
would select the same provider. Change the provider names in this test (and the
similar degradation test mentioned in the also-applies-to comment around line
139-148) so that the provider with better latency/lower cost comes
lexicographically after the other one, making fallback and tie-break produce
different winners and ensuring the test truly validates the degradation logic.
Additionally, add a new test for zero-value BalancedStrategy that verifies it
correctly defaults to 0.6/0.4 weight distribution when initialized without
explicit configuration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8563b06f-d85e-4bbe-8889-05a534a1c750
📒 Files selected for processing (27)
.env.templateCLAUDE.mdconfig/config.example.yamlconfig/config.goconfig/router.gointernal/core/interfaces.gointernal/gateway/inference_prepare.gointernal/llmclient/client.gointernal/llmclient/latency.gointernal/llmclient/latency_test.gointernal/providers/init.gointernal/providers/registry.gointernal/providers/router.gointernal/providers/router_strategy_integration_test.gointernal/providers/strategy_init.gointernal/router/context.gointernal/router/registry.gointernal/router/registry_test.gointernal/router/scoring.gointernal/router/strategy.gointernal/router/strategy_balanced.gointernal/router/strategy_cost.gointernal/router/strategy_firstfit.gointernal/router/strategy_latency.gointernal/router/strategy_test.gointernal/server/http.gointernal/server/routing_strategy.go
Review findings assessmentFixed
Assessed as not actionable
|
d33b07f to
dcfce08
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (2)
internal/llmclient/latency_test.go (1)
28-37: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd a regression test for “many successes, then one failure” error-rate behavior.
This specific path is still untested and is the one that previously regressed.
As per coding guidelines, `**/*_test.go` changes should include behavior-focused coverage for updated logic paths.Suggested test case
func TestLatencyTracker_ErrorRateReflectsFailures(t *testing.T) { tr := NewLatencyTracker() for i := 0; i < 5; i++ { tr.Record(10*time.Millisecond, i%2 == 0) // 3 errors of 5 } rate := tr.ErrorRate() if rate < 0.3 || rate > 0.9 { t.Fatalf("expected error rate near 0.4-0.6, got %v", rate) } } + +func TestLatencyTracker_ErrorRateLeadingSuccessesThenOneFailure(t *testing.T) { + tr := NewLatencyTracker() + for i := 0; i < 5; i++ { + tr.Record(10*time.Millisecond, false) + } + tr.Record(10*time.Millisecond, true) + if got := tr.ErrorRate(); got >= 0.5 { + t.Fatalf("expected error rate below 0.5 after one late failure, got %v", got) + } +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/llmclient/latency_test.go` around lines 28 - 37, Add a new regression test function (separate from TestLatencyTracker_ErrorRateReflectsFailures) to specifically test the "many successes followed by one failure" error-rate calculation path. This test should create a fresh LatencyTracker, record multiple successful calls (with the success parameter as true), then record a single failure, and verify that the error rate is calculated correctly for this specific scenario. This ensures the error rate logic handles the case of predominantly successful operations with a single failure at the end, which is the code path that previously regressed.Source: Coding guidelines
internal/router/context.go (1)
35-49:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEmpty strategy override does not actually clear inherited context override.
WithStrategyOverride(ctx, "")returnsctxunchanged, so any parent strategy override remains active despite the “empty clears override” contract. This can keep an unintended override in effect duringStrategyRegistry.Resolve.Suggested minimal fix
-import "context" +import ( + "context" + "strings" +) @@ func StrategyOverrideFromContext(ctx context.Context) (string, bool) { s, ok := ctx.Value(strategyCtxKey{}).(string) - return s, ok + if !ok { + return "", false + } + s = strings.TrimSpace(s) + if s == "" { + return "", false + } + return s, true } @@ func WithStrategyOverride(ctx context.Context, strategy string) context.Context { - if strategy == "" { - return ctx + strategy = strings.TrimSpace(strategy) + if strategy == "" { + // Shadow any inherited override as absent. + return context.WithValue(ctx, strategyCtxKey{}, nil) } return context.WithValue(ctx, strategyCtxKey{}, strategy) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/router/context.go` around lines 35 - 49, The WithStrategyOverride function returns the context unchanged when strategy is an empty string, which fails to clear inherited overrides from parent contexts. To fix this, remove the early return condition that checks if strategy is empty, and instead always execute the context.WithValue call regardless of whether strategy is empty or not. This ensures that an empty strategy value is properly set in the current context layer, effectively shadowing and overriding any parent context value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CLAUDE.md`:
- Line 134: The "Intelligent routing" documentation section on line 134 is a
single overly dense bullet containing multiple distinct concepts (routing
strategy defaults, configuration options, EWMA tracking details, header
overrides, and explicit provider behavior). Break this into 4-6 focused
sub-bullets, each addressing one specific topic: one for the routing strategy
configuration and defaults, one for the scoring weights, one for error rate
filtering, one for EWMA latency tracking mechanics, one for request header
overrides, and one for explicit provider-qualified request bypass behavior. This
will improve scannability and make it easier for operators to find and
understand each configuration option independently.
In `@config/router.go`:
- Around line 61-65: The empty-check for cfg.Strategy on line 61 occurs before
trimming and converting to lowercase on line 64, which means a whitespace-only
value like " " will pass the empty check and later fail validation instead of
defaulting to RouterStrategyBalanced. Move the strings.TrimSpace and
strings.ToLower operations before the empty-check for cfg.Strategy so that
whitespace-only values are properly caught and default to the balanced strategy
as intended.
In `@internal/llmclient/client.go`:
- Around line 231-234: The completeScope function has an inconsistency where the
latencyTracker.Record call uses err != nil to determine failure, but the circuit
breaker completion tracking uses cbErr. This causes routing error-rate telemetry
to miss failures when cbErr is set but err is nil. Update the
latencyTracker.Record call to check for failures consistently by accounting for
both err and cbErr, ensuring that any error tracked by the circuit breaker is
also reflected in the latency telemetry.
In `@internal/router/registry_test.go`:
- Around line 58-75: The TestParseWeights_Valid function only validates a single
canonical numeric format. Add additional test cases to this function to verify
that ParseWeights correctly handles equivalent numeric representations like
"0.70,0.30" and "7e-1,3e-1" which should all parse successfully and produce the
same Weight values (0.7 and 0.3 for Cost and Latency respectively). Update the
test to use a table-driven approach or add separate test cases for each format
variation to ensure regression-proof parser behavior.
In `@internal/router/registry.go`:
- Around line 101-113: The validation logic in the weight parsing code is overly
restrictive by comparing the original input strings (costStr and latStr) against
their formatted parsed values using fmt.Sprintf. This check rejects valid
numeric formats like 0.60 (which becomes 0.6) or scientific notation like 6e-1.
Since fmt.Sscanf has already successfully validated and parsed the input into a
float value, remove the string comparison checks that verify costStr matches
fmt.Sprintf("%v", cost) and latStr matches fmt.Sprintf("%v", lat). Keep only the
fmt.Sscanf validation and error handling, as it is sufficient to ensure the
input is a valid numeric weight.
In `@internal/server/routing_strategy.go`:
- Around line 33-35: The ParseWeights function in internal/router/registry.go is
performing canonical string matching after float parsing, which causes valid
decimal weight headers like "1.0,0.0" to be rejected. Modify the ParseWeights
function to be more tolerant of different valid decimal formats by normalizing
them during parsing instead of performing strict canonical matching. This aligns
with the principle to accept requests generously and adapt them according to
Postel's Law, allowing decimal values in various formats (like "1.0,0.0", "1,0",
etc.) to be accepted and properly parsed without silently ignoring valid input.
---
Duplicate comments:
In `@internal/llmclient/latency_test.go`:
- Around line 28-37: Add a new regression test function (separate from
TestLatencyTracker_ErrorRateReflectsFailures) to specifically test the "many
successes followed by one failure" error-rate calculation path. This test should
create a fresh LatencyTracker, record multiple successful calls (with the
success parameter as true), then record a single failure, and verify that the
error rate is calculated correctly for this specific scenario. This ensures the
error rate logic handles the case of predominantly successful operations with a
single failure at the end, which is the code path that previously regressed.
In `@internal/router/context.go`:
- Around line 35-49: The WithStrategyOverride function returns the context
unchanged when strategy is an empty string, which fails to clear inherited
overrides from parent contexts. To fix this, remove the early return condition
that checks if strategy is empty, and instead always execute the
context.WithValue call regardless of whether strategy is empty or not. This
ensures that an empty strategy value is properly set in the current context
layer, effectively shadowing and overriding any parent context value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 619e37f2-39dc-4d85-a0ff-6c5c614cbd21
📒 Files selected for processing (27)
.env.templateCLAUDE.mdconfig/config.example.yamlconfig/config.goconfig/router.gointernal/core/interfaces.gointernal/gateway/inference_prepare.gointernal/llmclient/client.gointernal/llmclient/latency.gointernal/llmclient/latency_test.gointernal/providers/init.gointernal/providers/registry.gointernal/providers/router.gointernal/providers/router_strategy_integration_test.gointernal/providers/strategy_init.gointernal/router/context.gointernal/router/registry.gointernal/router/registry_test.gointernal/router/scoring.gointernal/router/strategy.gointernal/router/strategy_balanced.gointernal/router/strategy_cost.gointernal/router/strategy_firstfit.gointernal/router/strategy_latency.gointernal/router/strategy_test.gointernal/server/http.gointernal/server/routing_strategy.go
| func (c *Client) completeScope(scope requestScope, statusCode int, err, cbErr error) { | ||
| c.recordCircuitBreakerCompletion(statusCode, cbErr) | ||
| c.latencyTracker.Record(time.Since(scope.startedAt), err != nil) | ||
| c.finishRequest(scope, statusCode, err) |
There was a problem hiding this comment.
Error-rate telemetry can miss failures tracked by circuit-breaker path.
completeScope records routing error EWMA with err != nil, but CB accounting uses cbErr. If cbErr is set while err is nil, routing error-rate remains artificially low.
Suggested minimal alignment
func (c *Client) completeScope(scope requestScope, statusCode int, err, cbErr error) {
c.recordCircuitBreakerCompletion(statusCode, cbErr)
- c.latencyTracker.Record(time.Since(scope.startedAt), err != nil)
+ c.latencyTracker.Record(time.Since(scope.startedAt), err != nil || cbErr != nil)
c.finishRequest(scope, statusCode, err)
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/llmclient/client.go` around lines 231 - 234, The completeScope
function has an inconsistency where the latencyTracker.Record call uses err !=
nil to determine failure, but the circuit breaker completion tracking uses
cbErr. This causes routing error-rate telemetry to miss failures when cbErr is
set but err is nil. Update the latencyTracker.Record call to check for failures
consistently by accounting for both err and cbErr, ensuring that any error
tracked by the circuit breaker is also reflected in the latency telemetry.
| func TestParseWeights_Valid(t *testing.T) { | ||
| w, err := ParseWeights("0.7,0.3") | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| if w.Cost != 0.7 || w.Latency != 0.3 { | ||
| t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency) | ||
| } | ||
| } | ||
|
|
||
| func TestParseWeights_Invalid(t *testing.T) { | ||
| cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"} | ||
| for _, c := range cases { | ||
| if _, err := ParseWeights(c); err == nil { | ||
| t.Errorf("expected error for %q", c) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add positive parsing cases for equivalent numeric formats.
TestParseWeights_Valid currently validates only one canonical representation. Please add accepted-format cases like 0.70,0.30 and 7e-1,3e-1 so parser behavior is regression-proof after the ParseWeights fix.
💡 Minimal test update
func TestParseWeights_Valid(t *testing.T) {
- w, err := ParseWeights("0.7,0.3")
- if err != nil {
- t.Fatalf("unexpected error: %v", err)
- }
- if w.Cost != 0.7 || w.Latency != 0.3 {
- t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency)
+ cases := []string{
+ "0.7,0.3",
+ "0.70,0.30",
+ "7e-1,3e-1",
+ }
+ for _, tc := range cases {
+ w, err := ParseWeights(tc)
+ if err != nil {
+ t.Fatalf("unexpected error for %q: %v", tc, err)
+ }
+ if w.Cost != 0.7 || w.Latency != 0.3 {
+ t.Fatalf("expected 0.7,0.3 for %q got %v,%v", tc, w.Cost, w.Latency)
+ }
}
}As per coding guidelines, tests should be updated for behavior changes and parser error-handling boundaries.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func TestParseWeights_Valid(t *testing.T) { | |
| w, err := ParseWeights("0.7,0.3") | |
| if err != nil { | |
| t.Fatalf("unexpected error: %v", err) | |
| } | |
| if w.Cost != 0.7 || w.Latency != 0.3 { | |
| t.Fatalf("expected 0.7,0.3 got %v,%v", w.Cost, w.Latency) | |
| } | |
| } | |
| func TestParseWeights_Invalid(t *testing.T) { | |
| cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"} | |
| for _, c := range cases { | |
| if _, err := ParseWeights(c); err == nil { | |
| t.Errorf("expected error for %q", c) | |
| } | |
| } | |
| } | |
| func TestParseWeights_Valid(t *testing.T) { | |
| cases := []string{ | |
| "0.7,0.3", | |
| "0.70,0.30", | |
| "7e-1,3e-1", | |
| } | |
| for _, tc := range cases { | |
| w, err := ParseWeights(tc) | |
| if err != nil { | |
| t.Fatalf("unexpected error for %q: %v", tc, err) | |
| } | |
| if w.Cost != 0.7 || w.Latency != 0.3 { | |
| t.Fatalf("expected 0.7,0.3 for %q got %v,%v", tc, w.Cost, w.Latency) | |
| } | |
| } | |
| } | |
| func TestParseWeights_Invalid(t *testing.T) { | |
| cases := []string{"0.6", "0.6,0.4,0.2", "abc,0.4", "-1,0.4", "0.6x,0.4", "NaN,0.4", "Inf,0.4"} | |
| for _, c := range cases { | |
| if _, err := ParseWeights(c); err == nil { | |
| t.Errorf("expected error for %q", c) | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/router/registry_test.go` around lines 58 - 75, The
TestParseWeights_Valid function only validates a single canonical numeric
format. Add additional test cases to this function to verify that ParseWeights
correctly handles equivalent numeric representations like "0.70,0.30" and
"7e-1,3e-1" which should all parse successfully and produce the same Weight
values (0.7 and 0.3 for Cost and Latency respectively). Update the test to use a
table-driven approach or add separate test cases for each format variation to
ensure regression-proof parser behavior.
Source: Coding guidelines
| if weights := strings.TrimSpace(c.Request().Header.Get(RoutingWeightsHeader)); weights != "" { | ||
| if override, err := router.ParseWeights(weights); err != nil { | ||
| slog.WarnContext(ctx, "invalid routing weights header, ignoring", |
There was a problem hiding this comment.
Valid decimal weight headers are being rejected in the override path.
Line 34 depends on router.ParseWeights, which currently performs canonical string matching after float parsing; values like 1.0,0.0 get treated as invalid and silently ignored. That breaks common operator input and conflicts with the “accept requests generously” contract.
Suggested fix (in internal/router/registry.go)
-import "fmt"
+import (
+ "fmt"
+ "strconv"
+)
- n, err := fmt.Sscanf(costStr, "%f", &cost)
- if err != nil || n != 1 {
+ cost, err := strconv.ParseFloat(costStr, 64)
+ if err != nil {
return WeightsOverride{}, fmt.Errorf("invalid cost weight %q: parse error", costStr)
}
- if costStr != fmt.Sprintf("%v", cost) {
- return WeightsOverride{}, fmt.Errorf("invalid cost weight %q: trailing characters", parts[0])
- }
- n, err = fmt.Sscanf(latStr, "%f", &lat)
- if err != nil || n != 1 {
+ lat, err := strconv.ParseFloat(latStr, 64)
+ if err != nil {
return WeightsOverride{}, fmt.Errorf("invalid latency weight %q: parse error", latStr)
}
- if latStr != fmt.Sprintf("%v", lat) {
- return WeightsOverride{}, fmt.Errorf("invalid latency weight %q: trailing characters", parts[1])
- }As per coding guidelines, **/*.go should “Accept requests generously … and adapt them”; based on learnings, this should follow Postel’s Law for tolerant request parsing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/server/routing_strategy.go` around lines 33 - 35, The ParseWeights
function in internal/router/registry.go is performing canonical string matching
after float parsing, which causes valid decimal weight headers like "1.0,0.0" to
be rejected. Modify the ParseWeights function to be more tolerant of different
valid decimal formats by normalizing them during parsing instead of performing
strict canonical matching. This aligns with the principle to accept requests
generously and adapt them according to Postel's Law, allowing decimal values in
various formats (like "1.0,0.0", "1,0", etc.) to be accepted and properly parsed
without silently ignoring valid input.
Sources: Coding guidelines, Learnings
d07a10e to
8ea9477
Compare
|
Additional fix pushed:
All previous findings (balanced zero-weight bug, free pricing degradation, NaN/Inf validation, EWMA init flag, model-syntax provider override, etc.) remain fixed in the squashed commit. |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
coverage-bailian (1)
1-36:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRemove generated test coverage artifact from version control.
The
coverage-bailianfile is a Go test coverage profile generated at test time viago test -coverprofile=...(per Makefile line 51). Generated artifacts should never be committed to version control because:
- Causes constant diffs: Each test run produces different coverage data (different lines executed, percentages change).
- Obscures real changes: Coverage artifacts clutter PR diffs and hide actual code changes from review.
- Violates twelve-factor principles: Generated build outputs belong in CI/CD artifact storage, not git history.
- Risk of merge conflicts: Multiple developers/CI runs will cause conflicting coverage changes.
Action: Remove this file from the repository and add it to
.gitignorealongside the standardcoverage.outand other generated test artifacts. If coverage tracking is needed, store baselines in CI/CD systems or external coverage dashboards (Codecov, Coveralls, etc.), not git.[major_issue]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@coverage-bailian` around lines 1 - 36, Remove the coverage-bailian file from version control as it is a generated test coverage artifact that should not be committed. Additionally, add coverage-bailian to the .gitignore file to prevent it from being tracked in future commits, grouping it with other generated coverage files like coverage.out and similar test artifacts. Use git to remove the file from tracking while keeping it locally if needed for local testing purposes.Source: Learnings
♻️ Duplicate comments (1)
CLAUDE.md (1)
134-134: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winSplit this dense line into focused sub-bullets for scannability.
Line 134 packs routing strategy defaults, weight configuration, error-rate filtering, EWMA tracking mechanics, header overrides, and explicit provider bypass logic into one wrapped sentence. Breaking it into 4–6 sub-bullets will make operator usage much easier to understand and reference independently.
As per the coding guidelines, documentation "should be concise, practical, and user-focused." Scannable structure directly supports that goal.
Suggested structure
- **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, the gateway scores candidates and routes to the best one (single-provider configs are unaffected — one candidate means no decision). `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`), `MODEL_ROUTING_STRATEGY_WEIGHTS` (`0.6,0.4` = cost_weight,latency_weight; balanced only), `MODEL_ROUTING_MAX_ERROR_RATE` (`0.5`; candidates at/above this smoothed error rate are filtered before scoring). Latency is tracked per provider as in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate), never persisted. Per-request overrides via request headers `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` (`cost,latency`); invalid header values are silently dropped (WARN log) and fall back to the global strategy. Explicit provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing. +- **Intelligent routing:** enabled when request has no pinned provider and multiple providers serve the same model (single-provider configs unaffected). + - **Configuration:** `MODEL_ROUTING_STRATEGY` (`balanced` default; also `cost_only`, `latency_only`, `first_fit`) + - **Balanced weights:** `MODEL_ROUTING_STRATEGY_WEIGHTS` as `cost_weight,latency_weight` (default `0.6,0.4`; balanced strategy only) + - **Filtering:** `MODEL_ROUTING_MAX_ERROR_RATE` (default `0.5`); candidates at/above this smoothed error rate are excluded before scoring + - **Latency stats:** in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate); not persisted + - **Per-request overrides:** `X-GoModel-Routing-Strategy` and `X-GoModel-Routing-Weights` headers; invalid values are WARN-logged and fall back to global strategy + - **Bypass rule:** provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CLAUDE.md` at line 134, The documentation for intelligent routing in CLAUDE.md contains a single very long sentence that covers multiple distinct configuration options and behaviors all at once, making it difficult for users to scan and reference individual features. Break this single dense bullet point into 4-6 separate focused sub-bullets, each covering one specific aspect: routing strategy options and defaults (MODEL_ROUTING_STRATEGY), weight configuration for balanced mode (MODEL_ROUTING_STRATEGY_WEIGHTS), error-rate filtering behavior (MODEL_ROUTING_MAX_ERROR_RATE), EWMA latency tracking mechanics, per-request header overrides and their fallback behavior, and explicit provider-qualified request bypass logic. Ensure each sub-bullet is concise and independently understandable while maintaining the technical accuracy of the original content.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/gateway/inference_prepare.go`:
- Around line 240-242: The comment preceding the return statement in this helper
function contradicts the actual behavior of the logic. The return condition
`firstSlash > 0 && firstSlash < len(m)-1` treats slash-style model IDs like
"meta-llama/Llama-3-70b" as provider-qualified (returns true), but the comment
claims these are not treated as provider-qualified. Update the comment to
accurately reflect that the function returns true when there is a valid provider
prefix followed by a model ID (i.e., when a slash exists that is neither at the
start nor at the end of the string).
In `@internal/llmclient/latency.go`:
- Around line 86-103: The Record method in LatencyTracker truncates
sub-millisecond latencies to zero when converting to milliseconds using
duration.Milliseconds(), which returns an integer. To preserve sub-millisecond
precision and avoid losing valid fast latency measurements, modify the
conversion to use floating point arithmetic without truncation. Convert the
duration to milliseconds as a float64 by either using duration.Seconds() and
multiplying by 1000, or by dividing duration.Nanoseconds() by 1e6, then pass
this precise value to the Add methods for t.p50 and t.p99.
In `@internal/providers/router_strategy_integration_test.go`:
- Line 75: The NewRouter constructor error is being discarded using the blank
identifier in the router initialization across multiple test locations. Replace
the blank identifier pattern with proper error handling by assigning the error
return value to a variable, then check if the error is not nil and immediately
fail the test using t.Fatalf with a descriptive message. Apply this same
error-checking pattern consistently at each NewRouter call site throughout the
integration test file to ensure setup failures are caught immediately rather
than causing misleading downstream assertion failures.
In `@internal/router/scoring.go`:
- Around line 165-176: The perMtokCost function currently returns an implicit 0
when Pricing exists but both InputPerMtok and OutputPerMtok are nil, which
incorrectly treats non-per-token pricing models as free. Add a check after
attempting to accumulate the token costs to return -1 if neither InputPerMtok
nor OutputPerMtok is present, ensuring unknown per-token pricing is treated as
unavailable rather than zero cost. Apply the same fix to the related code at
lines 201-213 that has similar logic.
In `@internal/router/strategy_test.go`:
- Around line 103-149: Add new test functions to cover the edge case where
candidates have pricing metadata but lack InputPerMtok and OutputPerMtok values.
Create test cases for both
TestCostOnly_DegradesToLatencyWhenPricingHasNoPerTokenCost and
TestLatencyOnly_DegradesToCostWhenPricingHasNoPerTokenCost that use the pricing
helper function but verify the strategies properly degrade to their fallback
metrics (latency for cost-only strategy, cost for latency-only strategy) when
per-token pricing values are absent. This ensures the selection logic correctly
treats incomplete pricing data as missing pricing for scoring purposes.
In `@internal/server/routing_strategy.go`:
- Around line 29-31: After trimming whitespace from the RoutingStrategyHeader
using strings.TrimSpace, also normalize the strategy value to lowercase before
passing it to router.WithStrategyOverride(). This ensures that case variants
like Latency_Only are properly handled and not silently dropped later due to
case mismatches, making the request parsing more permissive and following
Postel's Law for tolerant header handling.
---
Outside diff comments:
In `@coverage-bailian`:
- Around line 1-36: Remove the coverage-bailian file from version control as it
is a generated test coverage artifact that should not be committed.
Additionally, add coverage-bailian to the .gitignore file to prevent it from
being tracked in future commits, grouping it with other generated coverage files
like coverage.out and similar test artifacts. Use git to remove the file from
tracking while keeping it locally if needed for local testing purposes.
---
Duplicate comments:
In `@CLAUDE.md`:
- Line 134: The documentation for intelligent routing in CLAUDE.md contains a
single very long sentence that covers multiple distinct configuration options
and behaviors all at once, making it difficult for users to scan and reference
individual features. Break this single dense bullet point into 4-6 separate
focused sub-bullets, each covering one specific aspect: routing strategy options
and defaults (MODEL_ROUTING_STRATEGY), weight configuration for balanced mode
(MODEL_ROUTING_STRATEGY_WEIGHTS), error-rate filtering behavior
(MODEL_ROUTING_MAX_ERROR_RATE), EWMA latency tracking mechanics, per-request
header overrides and their fallback behavior, and explicit provider-qualified
request bypass logic. Ensure each sub-bullet is concise and independently
understandable while maintaining the technical accuracy of the original content.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 381fa4cf-069d-4358-b9be-aa1adcfa84bb
⛔ Files ignored due to path filters (3)
c.outis excluded by!**/*.outcoverage-bailian.outis excluded by!**/*.outgomodel.exeis excluded by!**/*.exe
📒 Files selected for processing (36)
.env.template.gitignore.idea/.gitignore.idea/GoModel.iml.idea/go.imports.xml.idea/golinter.xml.idea/modules.xml.idea/vcs.xmlCLAUDE.mdconfig/config.example.yamlconfig/config.goconfig/router.gocoverage-bailiangomodel.exe~internal/core/interfaces.gointernal/gateway/inference_prepare.gointernal/llmclient/client.gointernal/llmclient/latency.gointernal/llmclient/latency_test.gointernal/providers/init.gointernal/providers/registry.gointernal/providers/router.gointernal/providers/router_strategy_integration_test.gointernal/providers/strategy_init.gointernal/router/context.gointernal/router/registry.gointernal/router/registry_test.gointernal/router/scoring.gointernal/router/strategy.gointernal/router/strategy_balanced.gointernal/router/strategy_cost.gointernal/router/strategy_firstfit.gointernal/router/strategy_latency.gointernal/router/strategy_test.gointernal/server/http.gointernal/server/routing_strategy.go
| // A provider prefix is a non-empty segment before a slash that is NOT itself | ||
| // a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b"). | ||
| return firstSlash > 0 && firstSlash < len(m)-1 |
There was a problem hiding this comment.
Comment contradicts the helper’s actual behavior.
The comment says slash-style model IDs like meta-llama/Llama-3-70b are not treated as provider-qualified, but the return condition does treat that shape as true. Please align the comment with the implemented semantics (or adjust the logic if that comment reflects the real intent).
✏️ Minimal comment-only fix
- // A provider prefix is a non-empty segment before a slash that is NOT itself
- // a plain model-ID containing a slash (e.g. "meta-llama/Llama-3-70b").
+ // Treat any non-edge "x/y" shape as provider-qualified model syntax.
+ // This intentionally includes slash-style model IDs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/gateway/inference_prepare.go` around lines 240 - 242, The comment
preceding the return statement in this helper function contradicts the actual
behavior of the logic. The return condition `firstSlash > 0 && firstSlash <
len(m)-1` treats slash-style model IDs like "meta-llama/Llama-3-70b" as
provider-qualified (returns true), but the comment claims these are not treated
as provider-qualified. Update the comment to accurately reflect that the
function returns true when there is a valid provider prefix followed by a model
ID (i.e., when a slash exists that is neither at the start nor at the end of the
string).
| func TestCostOnly_PicksCheapest(t *testing.T) { | ||
| s := NewCostOnlyStrategy() | ||
| cands := candidates( | ||
| ProviderCandidate{ProviderName: "alpha", Pricing: pricing(5, 15), Latency: 10 * time.Millisecond}, | ||
| ProviderCandidate{ProviderName: "zeta", Pricing: pricing(1, 4), Latency: 500 * time.Millisecond}, | ||
| ) | ||
| got := mustSelect(t, s, cands) | ||
| if got.ProviderName != "zeta" { | ||
| t.Fatalf("expected azure (cheapest), got %s", got.ProviderName) | ||
| } | ||
| } | ||
|
|
||
| func TestCostOnly_DegradesToLatencyWhenNoPricing(t *testing.T) { | ||
| s := NewCostOnlyStrategy() | ||
| cands := candidates( | ||
| ProviderCandidate{ProviderName: "alpha", Latency: 200 * time.Millisecond}, | ||
| ProviderCandidate{ProviderName: "zeta", Latency: 50 * time.Millisecond}, | ||
| ) | ||
| got := mustSelect(t, s, cands) | ||
| if got.ProviderName != "zeta" { | ||
| t.Fatalf("expected azure (degraded to latency), got %s", got.ProviderName) | ||
| } | ||
| } | ||
|
|
||
| func TestLatencyOnly_PicksFastest(t *testing.T) { | ||
| s := NewLatencyOnlyStrategy() | ||
| cands := candidates( | ||
| ProviderCandidate{ProviderName: "alpha", Pricing: pricing(1, 1), Latency: 300 * time.Millisecond}, | ||
| ProviderCandidate{ProviderName: "zeta", Pricing: pricing(50, 50), Latency: 50 * time.Millisecond}, | ||
| ) | ||
| got := mustSelect(t, s, cands) | ||
| if got.ProviderName != "zeta" { | ||
| t.Fatalf("expected azure (fastest), got %s", got.ProviderName) | ||
| } | ||
| } | ||
|
|
||
| func TestLatencyOnly_DegradesToCostWhenNoLatency(t *testing.T) { | ||
| s := NewLatencyOnlyStrategy() | ||
| cands := candidates( | ||
| ProviderCandidate{ProviderName: "alpha", Pricing: pricing(5, 15)}, | ||
| ProviderCandidate{ProviderName: "zeta", Pricing: pricing(1, 4)}, | ||
| ) | ||
| got := mustSelect(t, s, cands) | ||
| if got.ProviderName != "zeta" { | ||
| t.Fatalf("expected azure (degraded to cost), got %s", got.ProviderName) | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add regression coverage for “non-per-token pricing” candidates.
There’s no test asserting that candidates with pricing metadata but without InputPerMtok/OutputPerMtok are treated as unknown for cost scoring. Please add one to prevent cost-only/balanced regressions in this edge case.
✅ Suggested test case
+func TestCostOnly_DegradesToLatencyWhenNoPerTokenPricing(t *testing.T) {
+ s := NewCostOnlyStrategy()
+ cands := candidates(
+ ProviderCandidate{
+ ProviderName: "alpha",
+ Pricing: &core.ModelPricing{PerRequest: floatPtr(1)}, // pricing exists, but not per-token
+ Latency: 200 * time.Millisecond,
+ },
+ ProviderCandidate{
+ ProviderName: "zeta",
+ Latency: 50 * time.Millisecond,
+ },
+ )
+ got := mustSelect(t, s, cands)
+ if got.ProviderName != "zeta" {
+ t.Fatalf("expected zeta (degraded to latency when per-token pricing absent), got %s", got.ProviderName)
+ }
+}As per coding guidelines, "Add or update tests for behavior changes in Go."
Also applies to: 164-197
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/router/strategy_test.go` around lines 103 - 149, Add new test
functions to cover the edge case where candidates have pricing metadata but lack
InputPerMtok and OutputPerMtok values. Create test cases for both
TestCostOnly_DegradesToLatencyWhenPricingHasNoPerTokenCost and
TestLatencyOnly_DegradesToCostWhenPricingHasNoPerTokenCost that use the pricing
helper function but verify the strategies properly degrade to their fallback
metrics (latency for cost-only strategy, cost for latency-only strategy) when
per-token pricing values are absent. This ensures the selection logic correctly
treats incomplete pricing data as missing pricing for scoring purposes.
Source: Coding guidelines
|
I think it's a good direction, but the PR is messy. There are files committed which shouldn't be there, and if you want it merged into GoModel it'll need some more work. Here are the findings worth improving:
Also heads up: some of the more advanced cost-optimization routing might end up in a pro / closed-source build down the line, so let's keep this core routing seam clean and pluggable rather than baking too much logic into it. |
8ea9477 to
2bfca7b
Compare
|
@SantiagoDePolonia 感谢审查,5 点均已修复: 1. 删除工件文件已从 git 追踪中删除 2. 复用 provider 解析逻辑删除了 3. 精简策略抽象
4. Latency tracking 条件控制
5. gofmt/lint
|
|
@SantiagoDePolonia All 5 points have been addressed: 1. Remove committed artifactsgomodel.exe, gomodel.exe~, c.out, coverage-bailian*, .idea/ -- removed from git tracking, added to .gitignore. 2. Don't reinvent provider parsingRemoved hasProviderInModel entirely. It was a duplicate of splitQualifiedModel with a misleading comment. Now uses core.ParseModelSelector directly. 3. Trim the abstraction
4. Gate latency tracking
5. gofmt/lint
|
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (4)
CLAUDE.md (1)
134-134:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAddress the prior feedback: split line 134 into focused sub-bullets.
SantiagoDePolonia's previous review explicitly requested splitting this dense paragraph into 4-6 focused sub-bullets for improved scannability. The current line still packs multiple distinct concepts (strategy configuration, weights, error-rate filtering, EWMA mechanics, header overrides, bypass rules) into a single paragraph.
Per the coding guideline: "Documentation should be concise, practical, and user-focused. Show defaults, explain when to change them, and include minimal examples when useful."
✏️ Proposed refactoring (from prior review)
-- **Intelligent routing:** When a request does not pin a provider and multiple configured providers serve the same model ID, the gateway scores candidates and routes to the best one (single-provider configs are unaffected — one candidate means no decision). `MODEL_ROUTING_STRATEGY` (`balanced` default; also `first_fit`), `MODEL_ROUTING_STRATEGY_WEIGHTS` (`0.6,0.4` = cost_weight,latency_weight; balanced only), `MODEL_ROUTING_MAX_ERROR_RATE` (`0.5`; candidates at/above this smoothed error rate are filtered before scoring). Latency is tracked per provider as in-memory EWMA (alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate), never persisted. Per-request strategy override via `X-GoModel-Routing-Strategy` header; invalid values are silently dropped (WARN log) and fall back to the global strategy. Explicit provider-qualified requests (`openai-east/gpt-4o`) always bypass intelligent routing. +- **Intelligent routing:** enabled when request has no pinned provider and multiple providers serve the same model. + - **Defaults:** `MODEL_ROUTING_STRATEGY=balanced`, `MODEL_ROUTING_STRATEGY_WEIGHTS=0.6,0.4`, `MODEL_ROUTING_MAX_ERROR_RATE=0.5` + - **Filtering:** candidates at/above `max_error_rate` are excluded before scoring. + - **Latency stats:** in-memory EWMA only (not persisted). + - **Per-request overrides:** `X-GoModel-Routing-Strategy`; invalid values are ignored with WARN logs. + - **Bypass rule:** provider-qualified requests like `openai-east/gpt-4o` bypass intelligent routing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CLAUDE.md` at line 134, The intelligent routing bullet point starting at line 134 in CLAUDE.md is a dense single paragraph that combines multiple distinct concepts including strategy configuration, weights, error-rate filtering, EWMA mechanics, header overrides, and bypass rules. Split this into 4-6 focused sub-bullets, where each sub-bullet addresses one specific aspect of intelligent routing functionality (such as strategy types and defaults, weight configuration, error-rate filtering logic, latency tracking mechanisms, header override behavior, and explicit provider bypass rules). Ensure each sub-bullet is concise and user-focused while preserving all technical details and configuration options currently mentioned.Source: Coding guidelines
internal/router/scoring.go (1)
162-177:⚠️ Potential issue | 🟠 Major | ⚡ Quick winModels with non-per-token pricing are incorrectly scored as free.
When
Pricingexists but bothInputPerMtokandOutputPerMtokare nil (e.g., models with onlyPerRequest,PerImage, or tier-based pricing), line 176 returns0. ThenormalizeCostfunction (lines 71-89) treats0as the best score (free model), so these candidates are incorrectly selected over models with known per-token costs.🔧 Proposed fix to distinguish unknown per-token pricing from free models
func perMtokCost(c *ProviderCandidate) float64 { if c == nil || c.Pricing == nil { return -1 } + if c.Pricing.InputPerMtok == nil && c.Pricing.OutputPerMtok == nil { + return -1 + } var cost float64 if c.Pricing.InputPerMtok != nil { cost += *c.Pricing.InputPerMtok } if c.Pricing.OutputPerMtok != nil { cost += *c.Pricing.OutputPerMtok } return cost }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/router/scoring.go` around lines 162 - 177, The perMtokCost function currently returns 0 when Pricing exists but both InputPerMtok and OutputPerMtok are nil, which causes models with alternative pricing models (PerRequest, PerImage, tier-based) to be incorrectly treated as free. Add an additional check after verifying c.Pricing is not nil to also return -1 when both InputPerMtok and OutputPerMtok pointers are nil, ensuring that unknown per-token pricing is distinguished from actually free models that return 0. This way the normalizeCost function will properly differentiate between models with unknown pricing versus those with confirmed zero per-token costs.internal/llmclient/latency.go (1)
85-105:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSub-millisecond latencies are truncated to zero and misinterpreted as "unknown."
Line 89 uses
duration.Milliseconds(), which returnsint64and truncates any value below 1ms to 0. The routing logic (perstrategy.go:36andscoring.go:57-58) treats zero latency as "unknown data" and scores it as worst-case (1.0), incorrectly penalizing fast providers.🔧 Proposed fix to preserve sub-millisecond precision
func (t *LatencyTracker) Record(duration time.Duration, isError bool) { - ms := float64(duration.Milliseconds()) + ms := float64(duration) / float64(time.Millisecond) t.p50.Add(ms) t.p99.Add(ms) t.errorRate.Add(boolToFloat64(isError)) } func (t *LatencyTracker) P50() time.Duration { - return time.Duration(t.p50.Value()) * time.Millisecond + return time.Duration(t.p50.Value() * float64(time.Millisecond)) } func (t *LatencyTracker) P99() time.Duration { - return time.Duration(t.p99.Value()) * time.Millisecond + return time.Duration(t.p99.Value() * float64(time.Millisecond)) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/llmclient/latency.go` around lines 85 - 105, In the Record method of LatencyTracker, replace the line that uses duration.Milliseconds() to calculate ms with an approach that preserves sub-millisecond precision. Instead of using Milliseconds() which returns int64 and truncates fractional values, convert the duration to a float64 representation that maintains decimal precision (for example, by using Seconds() and multiplying by 1000, or using Nanoseconds() and dividing appropriately) so that sub-millisecond latencies are accurately captured for the percentile trackers rather than being truncated to zero.config/router.go (1)
59-68:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winWhitespace-only strategy string should default to balanced, not error.
The empty-check at line 59 occurs before trimming at line 62, so a whitespace-only value like
" "will bypass the default assignment and later fail validation at line 66 instead of defaulting tobalanced.🔧 Proposed fix
func ValidateRouterConfig(cfg *RouterConfig) error { + strategy := strings.ToLower(strings.TrimSpace(cfg.Strategy)) + if strategy == "" { + strategy = RouterStrategyBalanced + } - if cfg.Strategy == "" { - cfg.Strategy = RouterStrategyBalanced - } - strategy := strings.ToLower(strings.TrimSpace(cfg.Strategy)) switch strategy { case RouterStrategyBalanced, RouterStrategyFirstFit: default: return fmt.Errorf("invalid router.strategy %q: must be one of balanced, first_fit", cfg.Strategy) } cfg.Strategy = strategyBased on learnings, configuration should accept input generously.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@config/router.go` around lines 59 - 68, The empty strategy check on line 59 occurs before the string is trimmed on line 62, so a whitespace-only value like " " bypasses the default assignment and later fails validation instead of defaulting to balanced. Move the trimming operation before the empty check, then apply the default assignment to cfg.Strategy when the trimmed value is empty, and finally perform the validation switch statement on the normalized strategy. This ensures whitespace-only input is handled gracefully by defaulting to RouterStrategyBalanced.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/providers/strategy_init.go`:
- Line 27: The call to registry.SetDefault(cfg.Strategy) is silently ignoring
the error return value by using the blank identifier, which means invalid
strategy names could fail to set without any warning to the user. Modify this
line to capture the error return value from registry.SetDefault instead of
discarding it, then check if the error is non-nil and log an appropriate warning
message that includes the error details to help with debugging configuration
issues.
In `@internal/server/http.go`:
- Around line 290-291: The comment referencing intelligent-routing overrides
includes X-GoModel-Routing-Weights, which has been removed in this PR along with
its associated code (ParseWeights, WeightsOverride). Update the comment to
remove the weights header reference, change "headers" to "header" (since only
X-GoModel-Routing-Strategy remains), and ensure the comment accurately reflects
the current implementation.
---
Duplicate comments:
In `@CLAUDE.md`:
- Line 134: The intelligent routing bullet point starting at line 134 in
CLAUDE.md is a dense single paragraph that combines multiple distinct concepts
including strategy configuration, weights, error-rate filtering, EWMA mechanics,
header overrides, and bypass rules. Split this into 4-6 focused sub-bullets,
where each sub-bullet addresses one specific aspect of intelligent routing
functionality (such as strategy types and defaults, weight configuration,
error-rate filtering logic, latency tracking mechanisms, header override
behavior, and explicit provider bypass rules). Ensure each sub-bullet is concise
and user-focused while preserving all technical details and configuration
options currently mentioned.
In `@config/router.go`:
- Around line 59-68: The empty strategy check on line 59 occurs before the
string is trimmed on line 62, so a whitespace-only value like " " bypasses the
default assignment and later fails validation instead of defaulting to balanced.
Move the trimming operation before the empty check, then apply the default
assignment to cfg.Strategy when the trimmed value is empty, and finally perform
the validation switch statement on the normalized strategy. This ensures
whitespace-only input is handled gracefully by defaulting to
RouterStrategyBalanced.
In `@internal/llmclient/latency.go`:
- Around line 85-105: In the Record method of LatencyTracker, replace the line
that uses duration.Milliseconds() to calculate ms with an approach that
preserves sub-millisecond precision. Instead of using Milliseconds() which
returns int64 and truncates fractional values, convert the duration to a float64
representation that maintains decimal precision (for example, by using Seconds()
and multiplying by 1000, or using Nanoseconds() and dividing appropriately) so
that sub-millisecond latencies are accurately captured for the percentile
trackers rather than being truncated to zero.
In `@internal/router/scoring.go`:
- Around line 162-177: The perMtokCost function currently returns 0 when Pricing
exists but both InputPerMtok and OutputPerMtok are nil, which causes models with
alternative pricing models (PerRequest, PerImage, tier-based) to be incorrectly
treated as free. Add an additional check after verifying c.Pricing is not nil to
also return -1 when both InputPerMtok and OutputPerMtok pointers are nil,
ensuring that unknown per-token pricing is distinguished from actually free
models that return 0. This way the normalizeCost function will properly
differentiate between models with unknown pricing versus those with confirmed
zero per-token costs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: deba9893-28e1-448c-8f0d-6f79da5216c5
📒 Files selected for processing (29)
.env.template.githooks/pre-commit.gitignoreCLAUDE.mdREADME.mdconfig/config.example.yamlconfig/config.goconfig/router.godocs/dev/api-examples.mdinternal/core/interfaces.gointernal/gateway/inference_prepare.gointernal/llmclient/client.gointernal/llmclient/latency.gointernal/llmclient/latency_test.gointernal/providers/init.gointernal/providers/registry.gointernal/providers/router.gointernal/providers/router_strategy_integration_test.gointernal/providers/strategy_init.gointernal/router/context.gointernal/router/registry.gointernal/router/registry_test.gointernal/router/scoring.gointernal/router/strategy.gointernal/router/strategy_balanced.gointernal/router/strategy_firstfit.gointernal/router/strategy_test.gointernal/server/http.gointernal/server/routing_strategy.go
| return &router.FirstFitStrategy{MaxErrorRate: cfg.MaxErrorRate} | ||
| }) | ||
|
|
||
| _ = registry.SetDefault(cfg.Strategy) |
There was a problem hiding this comment.
Silently ignoring SetDefault error could hide config issues.
If cfg.Strategy contains an invalid strategy name (e.g., due to a config validation gap), this line silently falls back to the registry's initial default ("balanced") without any warning. Consider logging a warning if the error is non-nil to aid debugging.
🛠️ Suggested fix
- _ = registry.SetDefault(cfg.Strategy)
+ if err := registry.SetDefault(cfg.Strategy); err != nil {
+ slog.Warn("invalid routing strategy in config, falling back to default",
+ "requested", cfg.Strategy, "error", err)
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _ = registry.SetDefault(cfg.Strategy) | |
| if err := registry.SetDefault(cfg.Strategy); err != nil { | |
| slog.Warn("invalid routing strategy in config, falling back to default", | |
| "requested", cfg.Strategy, "error", err) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/providers/strategy_init.go` at line 27, The call to
registry.SetDefault(cfg.Strategy) is silently ignoring the error return value by
using the blank identifier, which means invalid strategy names could fail to set
without any warning to the user. Modify this line to capture the error return
value from registry.SetDefault instead of discarding it, then check if the error
is non-nil and log an appropriate warning message that includes the error
details to help with debugging configuration issues.
| ```bash | ||
| curl http://localhost:8080/v1/chat/completions \ | ||
| -H "Content-Type: application/json" \ | ||
| -H "X-GoModel-Routing-Strategy: first_fit" \ | ||
| -d ‘{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}’ | ||
| ``` | ||
|
|
||
| Invalid override values are silently dropped and fall back to the global strategy. | ||
|
|
||
| ### Explicit provider bypass | ||
|
|
||
| Requests that specify a provider explicitly — either via `provider` field or `provider/model` syntax — always bypass intelligent routing: | ||
|
|
||
| ```bash | ||
| curl http://localhost:8080/v1/chat/completions \ | ||
| -H "Content-Type: application/json" \ | ||
| -d ‘{"model": "gpt-4o", "provider": "openai-east"}’ | ||
| ``` | ||
|
|
||
| or using model‑qualified syntax: | ||
|
|
||
| ```bash | ||
| curl http://localhost:8080/v1/chat/completions \ | ||
| -H "Content-Type: application/json" \ | ||
| -d ‘{"model": "openai-east/gpt-4o"}’ | ||
| ``` |
There was a problem hiding this comment.
Fix shell quotes
These examples wrap the JSON payloads in curly quotes. A POSIX shell does not treat ‘ and ’ as string delimiters, so copy-pasting these commands sends malformed arguments to curl instead of the intended JSON body. Use straight single quotes for all three -d payloads.
| ```bash | |
| curl http://localhost:8080/v1/chat/completions \ | |
| -H "Content-Type: application/json" \ | |
| -H "X-GoModel-Routing-Strategy: first_fit" \ | |
| -d ‘{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}’ | |
| ``` | |
| Invalid override values are silently dropped and fall back to the global strategy. | |
| ### Explicit provider bypass | |
| Requests that specify a provider explicitly — either via `provider` field or `provider/model` syntax — always bypass intelligent routing: | |
| ```bash | |
| curl http://localhost:8080/v1/chat/completions \ | |
| -H "Content-Type: application/json" \ | |
| -d ‘{"model": "gpt-4o", "provider": "openai-east"}’ | |
| ``` | |
| or using model‑qualified syntax: | |
| ```bash | |
| curl http://localhost:8080/v1/chat/completions \ | |
| -H "Content-Type: application/json" \ | |
| -d ‘{"model": "openai-east/gpt-4o"}’ | |
| ``` | |
| ```bash | |
| curl http://localhost:8080/v1/chat/completions \ | |
| -H "Content-Type: application/json" \ | |
| -H "X-GoModel-Routing-Strategy: first_fit" \ | |
| -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}' |
Invalid override values are silently dropped and fall back to the global strategy.
Explicit provider bypass
Requests that specify a provider explicitly — either via provider field or provider/model syntax — always bypass intelligent routing:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o", "provider": "openai-east"}'or using model‑qualified syntax:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "openai-east/gpt-4o"}'
**Context Used:** CLAUDE.md ([source](https://app.greptile.com/enterpilot/github/ENTERPILOT/gomodel/-/custom-context?memory=e0084425-6507-4c96-b263-e4faffe69ec2))
|
@SantiagoDePolonia The 5 points you raised have all been addressed:
Could you please take another look when you have a moment? Thanks! |
| @@ -0,0 +1,35 @@ | |||
| #!/bin/sh | |||
There was a problem hiding this comment.
This file definitely shouldn't be here.
|
@coderabbitai review feedback summary: Fixed (code pushed in this commit)Still-valid findings that were fixed:
Already fixed in prior commits
By design / not applicable
All fixes tested and passing. Thanks for the thorough review! |
|
Tip For best results, initiate chat on the files or code changes.
The fixes you listed address the earlier valid concerns, and the items you marked as by-design/not-applicable also make sense given the current GoModel semantics and routing design. In particular, the No further action needed from my side on these points. Nice work tightening both the implementation and the tests. 🐇✨ ✏️ Learnings added
If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
|
|
||
| --- | ||
|
|
||
| ## Intelligent Routing |
There was a problem hiding this comment.
The feature description shouldn't be in readme
15a7c0d to
26f527b
Compare
efef797 to
1628899
Compare
|
@SantiagoDePolonia All the feedback has been addressed and the PR has been cleaned up. Here's a summary of what was done: SantiagoDePolonia's 5 points
Additional inline comments addressed
coderabbitai findings addressed
PR diff is now clean
Could you please take another look? Thanks! |
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
internal/llmclient/latency.go (1)
97-105: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winPreserve fractional milliseconds when converting EWMA values back to
time.Duration.
P50()andP99()still truncate the stored float milliseconds before applying the unit, so a valid500µssample comes back as0. That quietly undoes the sub-millisecond precision fix inRecord()and can skew latency-aware routing for very fast responses.Suggested fix
func (t *LatencyTracker) P50() time.Duration { - return time.Duration(t.p50.Value()) * time.Millisecond + return time.Duration(t.p50.Value() * float64(time.Millisecond)) } @@ func (t *LatencyTracker) P99() time.Duration { - return time.Duration(t.p99.Value()) * time.Millisecond + return time.Duration(t.p99.Value() * float64(time.Millisecond)) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/llmclient/latency.go` around lines 97 - 105, The P50 and P99 accessors in LatencyTracker are truncating EWMA values by converting the float milliseconds to an integer before applying the time unit, which drops sub-millisecond precision. Update P50() and P99() to convert the stored float64 millisecond values directly into time.Duration so fractional milliseconds are preserved, keeping the behavior consistent with Record() and preventing fast samples from being rounded down to zero.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/llmclient/client.go`:
- Around line 29-37: The mutable package-level latencyTrackingEnabled state
should be removed because newLLMClient reads it unsafely and
internal/providers/init.go can flip it for unrelated clients later in the same
process. Move the enablement into Config as an explicit field (for example,
TrackLatency) and have newLLMClient consume that value when deciding whether to
allocate the EWMA tracker. Remove SetLatencyTrackingEnabled and the global
variable, and update the constructor call sites in the Client/newLLMClient flow
to pass the tracking choice directly.
In `@internal/providers/router_strategy_integration_test.go`:
- Around line 59-293: The router integration tests are duplicating the same
registry/provider setup across many nearly identical cases, so refactor them
into a table-driven test matrix in the existing TestRouter_IntelligentRouting_*
area. Consolidate the repeated NewModelRegistry, NewRouter, and
registerModelWithMetadata setup into shared helpers, then express each scenario
(cheaper, faster, override, invalid override, explicit provider hint, single
candidate, model syntax, first_fit order) as table entries with expected IDs and
optional context overrides. This keeps the routing coverage intact while making
the test surface smaller and easier to extend.
In `@internal/providers/router.go`:
- Around line 310-316: The routing gate in router.go is incorrectly treating any
slash in model as a provider pin, which blocks strategy selection for
unqualified model IDs that merely contain a slash. Update the eligible check in
the routing path around router.IsRoutingEligible and strategyRegistry to rely on
whether the request actually pinned a provider (as determined by the
request/prepare state, not strings.Contains(model, "/")). Keep provider/model
selectors excluded, but allow slash-bearing model IDs to still reach intelligent
routing when no provider was explicitly pinned. Add a regression test covering
an unqualified slash-containing model ID served by two providers.
In `@internal/router/strategy_balanced.go`:
- Around line 16-21: Update BalancedStrategy routing defaults so each weight
falls back independently in the strategy setup logic, not only when both are
zero. In the BalancedStrategy handling path that computes costW and latW, ensure
a zero CostWeight becomes 0.6 and a zero LatencyWeight becomes 0.4 even when the
other field is explicitly set. Add a regression test covering one-sided zero
configurations for the BalancedStrategy behavior to verify the per-field
defaults are honored.
In `@internal/server/routing_strategy.go`:
- Around line 19-34: Add a focused table-driven test for
RoutingStrategyCapture() to lock down the header-capture behavior before next
runs. Cover the trimmed header value, lowercase normalization, and the no-header
path, and assert the request context is updated only when RoutingStrategyHeader
is present. Use a small echo middleware/handler test so the contract is verified
at the HTTP layer without changing provider-specific behavior.
---
Duplicate comments:
In `@internal/llmclient/latency.go`:
- Around line 97-105: The P50 and P99 accessors in LatencyTracker are truncating
EWMA values by converting the float milliseconds to an integer before applying
the time unit, which drops sub-millisecond precision. Update P50() and P99() to
convert the stored float64 millisecond values directly into time.Duration so
fractional milliseconds are preserved, keeping the behavior consistent with
Record() and preventing fast samples from being rounded down to zero.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4c2468d3-3d7d-4374-8d05-8508b60be62a
📒 Files selected for processing (27)
.env.templateCLAUDE.mdconfig/config.example.yamlconfig/config.goconfig/router.godocs/dev/api-examples.mddocs/features/intelligent-routing.mdxinternal/core/interfaces.gointernal/gateway/inference_prepare.gointernal/llmclient/client.gointernal/llmclient/latency.gointernal/llmclient/latency_test.gointernal/providers/init.gointernal/providers/registry.gointernal/providers/router.gointernal/providers/router_strategy_integration_test.gointernal/providers/strategy_init.gointernal/router/context.gointernal/router/registry.gointernal/router/registry_test.gointernal/router/scoring.gointernal/router/strategy.gointernal/router/strategy_balanced.gointernal/router/strategy_firstfit.gointernal/router/strategy_test.gointernal/server/http.gointernal/server/routing_strategy.go
| func TestRouter_IntelligentRouting_PicksCheaperProvider(t *testing.T) { | ||
| registry := NewModelRegistry() | ||
| cheap := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "openai-east", chatResponse: &core.ChatResponse{ID: "east-resp", Model: "gpt-4o"}}, | ||
| p50: 100 * time.Millisecond, | ||
| } | ||
| pricey := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "openai-west", chatResponse: &core.ChatResponse{ID: "west-resp", Model: "gpt-4o"}}, | ||
| p50: 100 * time.Millisecond, | ||
| } | ||
| registerModelWithMetadata(t, registry, cheap, "openai-east", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(1), OutputPerMtok: floatPtr(2)}) | ||
| registerModelWithMetadata(t, registry, pricey, "openai-west", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(20)}) | ||
|
|
||
| r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry())) | ||
| if err != nil { | ||
| t.Fatalf("NewRouter: %v", err) | ||
| } | ||
|
|
||
| resp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{Model: "gpt-4o"}) | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| if resp.ID != "east-resp" { | ||
| t.Fatalf("expected cheaper east provider, got %s", resp.ID) | ||
| } | ||
| } | ||
|
|
||
| func TestRouter_IntelligentRouting_PicksFasterProvider(t *testing.T) { | ||
| registry := NewModelRegistry() | ||
| slow := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "openai-slow", chatResponse: &core.ChatResponse{ID: "slow-resp", Model: "gpt-4o"}}, | ||
| p50: 500 * time.Millisecond, | ||
| } | ||
| fast := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "openai-fast", chatResponse: &core.ChatResponse{ID: "fast-resp", Model: "gpt-4o"}}, | ||
| p50: 50 * time.Millisecond, | ||
| } | ||
| // Equal pricing so cost/latency scoring decides. | ||
| pricing := &core.ModelPricing{InputPerMtok: floatPtr(5), OutputPerMtok: floatPtr(5)} | ||
| registerModelWithMetadata(t, registry, slow, "openai-slow", "openai", "gpt-4o", pricing) | ||
| registerModelWithMetadata(t, registry, fast, "openai-fast", "openai", "gpt-4o", pricing) | ||
|
|
||
| r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry())) | ||
| if err != nil { | ||
| t.Fatalf("NewRouter: %v", err) | ||
| } | ||
|
|
||
| resp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{Model: "gpt-4o"}) | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| if resp.ID != "fast-resp" { | ||
| t.Fatalf("expected faster provider, got %s", resp.ID) | ||
| } | ||
| } | ||
|
|
||
| func TestRouter_IntelligentRouting_StrategyOverrideHonored(t *testing.T) { | ||
| registry := NewModelRegistry() | ||
| // Register pricey provider first, cheap provider second. | ||
| pricey := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "pricey-first", chatResponse: &core.ChatResponse{ID: "pricey-resp", Model: "gpt-4o"}}, | ||
| } | ||
| cheap := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "cheap-second", chatResponse: &core.ChatResponse{ID: "cheap-resp", Model: "gpt-4o"}}, | ||
| } | ||
| registerModelWithMetadata(t, registry, pricey, "pricey-first", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(10)}) | ||
| registerModelWithMetadata(t, registry, cheap, "cheap-second", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(1), OutputPerMtok: floatPtr(1)}) | ||
|
|
||
| r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry())) | ||
| if err != nil { | ||
| t.Fatalf("NewRouter: %v", err) | ||
| } | ||
|
|
||
| // Default balanced should pick cheap (cost dominates at 0.6/0.4). | ||
| defaultResp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{Model: "gpt-4o"}) | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| if defaultResp.ID != "cheap-resp" { | ||
| t.Fatalf("default balanced expected cheap, got %s", defaultResp.ID) | ||
| } | ||
|
|
||
| // Override to first_fit picks the first registered (pricey), ignoring cost. | ||
| ctx := router.WithStrategyOverride(context.Background(), "first_fit") | ||
| firstFitResp, err := r.ChatCompletion(ctx, &core.ChatRequest{Model: "gpt-4o"}) | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| if firstFitResp.ID != "pricey-resp" { | ||
| t.Fatalf("first_fit override expected pricey (first registered), got %s", firstFitResp.ID) | ||
| } | ||
| } | ||
|
|
||
| func TestRouter_IntelligentRouting_InvalidOverrideFallsBack(t *testing.T) { | ||
| registry := NewModelRegistry() | ||
| cheap := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "openai-east", chatResponse: &core.ChatResponse{ID: "east-resp", Model: "gpt-4o"}}, | ||
| } | ||
| pricey := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "openai-west", chatResponse: &core.ChatResponse{ID: "west-resp", Model: "gpt-4o"}}, | ||
| } | ||
| registerModelWithMetadata(t, registry, cheap, "openai-east", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(1), OutputPerMtok: floatPtr(2)}) | ||
| registerModelWithMetadata(t, registry, pricey, "openai-west", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(20)}) | ||
|
|
||
| r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry())) | ||
| if err != nil { | ||
| t.Fatalf("NewRouter: %v", err) | ||
| } | ||
|
|
||
| ctx := router.WithStrategyOverride(context.Background(), "not-a-real-strategy") | ||
| resp, err := r.ChatCompletion(ctx, &core.ChatRequest{Model: "gpt-4o"}) | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| // Invalid override falls back to default balanced, which picks cheap. | ||
| if resp.ID != "east-resp" { | ||
| t.Fatalf("invalid override should fall back to default (cheap), got %s", resp.ID) | ||
| } | ||
| } | ||
|
|
||
| func TestRouter_IntelligentRouting_ExplicitProviderHintBypasses(t *testing.T) { | ||
| registry := NewModelRegistry() | ||
| cheap := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "openai-east", chatResponse: &core.ChatResponse{ID: "east-resp", Model: "gpt-4o"}}, | ||
| } | ||
| pricey := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "openai-west", chatResponse: &core.ChatResponse{ID: "west-resp", Model: "gpt-4o"}}, | ||
| } | ||
| registerModelWithMetadata(t, registry, cheap, "openai-east", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(1), OutputPerMtok: floatPtr(2)}) | ||
| registerModelWithMetadata(t, registry, pricey, "openai-west", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(20)}) | ||
|
|
||
| r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry())) | ||
| if err != nil { | ||
| t.Fatalf("NewRouter: %v", err) | ||
| } | ||
|
|
||
| // Explicit provider hint must route to the named provider regardless of cost. | ||
| resp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{ | ||
| Model: "gpt-4o", | ||
| Provider: "openai-west", | ||
| }) | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| if resp.ID != "west-resp" { | ||
| t.Fatalf("explicit provider hint should bypass routing, got %s", resp.ID) | ||
| } | ||
| } | ||
|
|
||
| func TestRouter_IntelligentRouting_SingleCandidateUnaffected(t *testing.T) { | ||
| registry := NewModelRegistry() | ||
| solo := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "openai", chatResponse: &core.ChatResponse{ID: "solo-resp", Model: "gpt-4o"}}, | ||
| } | ||
| registerModelWithMetadata(t, registry, solo, "openai", "openai", "gpt-4o", nil) | ||
|
|
||
| r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry())) | ||
| if err != nil { | ||
| t.Fatalf("NewRouter: %v", err) | ||
| } | ||
|
|
||
| resp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{Model: "gpt-4o"}) | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| if resp.ID != "solo-resp" { | ||
| t.Fatalf("single candidate should route normally, got %s", resp.ID) | ||
| } | ||
| } | ||
|
|
||
| func TestRouter_IntelligentRouting_ModelSyntaxProviderNotOverridden(t *testing.T) { | ||
| registry := NewModelRegistry() | ||
| cheap := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "cheap", chatResponse: &core.ChatResponse{ID: "cheap", Model: "gpt-4o"}}, | ||
| } | ||
| pricey := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "pricey", chatResponse: &core.ChatResponse{ID: "pricey", Model: "gpt-4o"}}, | ||
| } | ||
| registerModelWithMetadata(t, registry, cheap, "cheap", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(1), OutputPerMtok: floatPtr(1)}) | ||
| registerModelWithMetadata(t, registry, pricey, "pricey", "openai", "gpt-4o", | ||
| &core.ModelPricing{InputPerMtok: floatPtr(10), OutputPerMtok: floatPtr(10)}) | ||
|
|
||
| r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry())) | ||
| if err != nil { | ||
| t.Fatalf("NewRouter: %v", err) | ||
| } | ||
|
|
||
| // Provider specified via model syntax (pricey/gpt-4o) should not be overridden. | ||
| resp, err := r.ChatCompletion(context.Background(), &core.ChatRequest{ | ||
| Model: "pricey/gpt-4o", | ||
| }) | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| if resp.ID != "pricey" { | ||
| t.Fatalf("model-syntax provider (pricey/gpt-4o) should be respected, got %s", resp.ID) | ||
| } | ||
| } | ||
|
|
||
| func TestRouter_FirstFit_RespectsRegistrationOrder(t *testing.T) { | ||
| registry := NewModelRegistry() | ||
| first := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "reg-first", chatResponse: &core.ChatResponse{ID: "first", Model: "gpt-4o"}}, | ||
| } | ||
| second := &statsMockProvider{ | ||
| mockProvider: mockProvider{name: "reg-second", chatResponse: &core.ChatResponse{ID: "second", Model: "gpt-4o"}}, | ||
| } | ||
| // Register in order first, then second. | ||
| registerModelWithMetadata(t, registry, first, "reg-first", "openai", "gpt-4o", nil) | ||
| registerModelWithMetadata(t, registry, second, "reg-second", "openai", "gpt-4o", nil) | ||
|
|
||
| // Override strategy so order is the only decision factor. | ||
| ctx := router.WithStrategyOverride(context.Background(), "first_fit") | ||
| r, err := NewRouter(registry, WithStrategyRegistry(router.NewStrategyRegistry())) | ||
| if err != nil { | ||
| t.Fatalf("NewRouter: %v", err) | ||
| } | ||
|
|
||
| resp, err := r.ChatCompletion(ctx, &core.ChatRequest{Model: "gpt-4o"}) | ||
| if err != nil { | ||
| t.Fatalf("unexpected error: %v", err) | ||
| } | ||
| if resp.ID != "first" { | ||
| t.Fatalf("first_fit should route to reg-first (registration order), got %s", resp.ID) | ||
| } | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Prefer a table-driven matrix for these routing cases.
This new file repeats the same registry/provider setup across multiple cases. Folding them into a table-driven harness would make the routing matrix easier to extend and keep the new integration surface smaller.
As per coding guidelines, "**/*_test.go: Prefer table-driven tests."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/providers/router_strategy_integration_test.go` around lines 59 -
293, The router integration tests are duplicating the same registry/provider
setup across many nearly identical cases, so refactor them into a table-driven
test matrix in the existing TestRouter_IntelligentRouting_* area. Consolidate
the repeated NewModelRegistry, NewRouter, and registerModelWithMetadata setup
into shared helpers, then express each scenario (cheaper, faster, override,
invalid override, explicit provider hint, single candidate, model syntax,
first_fit order) as table entries with expected IDs and optional context
overrides. This keeps the routing coverage intact while making the test surface
smaller and easier to extend.
Source: Coding guidelines
| // Intelligent routing: when the caller did not pin a provider (or the provider | ||
| // was assigned by the system during the prepare phase for an unqualified request) | ||
| // and more than one provider serves this model, let the configured strategy | ||
| // choose the best candidate. This is a no-op when no strategy registry is | ||
| // configured (historical behaviour). | ||
| eligible := (strings.TrimSpace(providerHint) == "" || router.IsRoutingEligible(ctx)) && !strings.Contains(model, "/") | ||
| if eligible && r.strategyRegistry != nil { |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Don't treat every slash in model as a provider pin.
Line 315 disables intelligent routing for any model ID containing /, but the registry already distinguishes provider/model selectors from model IDs that merely include a slash. That means unqualified multi-provider models with slash-bearing IDs will never reach strategy selection. Please derive this gate from whether the request actually pinned a provider, and add a regression case for an unqualified slash-containing model ID served by two providers.
Based on internal/providers/registry.go, GetProvider falls through when the slash is part of the model ID rather than a configured provider name.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/providers/router.go` around lines 310 - 316, The routing gate in
router.go is incorrectly treating any slash in model as a provider pin, which
blocks strategy selection for unqualified model IDs that merely contain a slash.
Update the eligible check in the routing path around router.IsRoutingEligible
and strategyRegistry to rely on whether the request actually pinned a provider
(as determined by the request/prepare state, not strings.Contains(model, "/")).
Keep provider/model selectors excluded, but allow slash-bearing model IDs to
still reach intelligent routing when no provider was explicitly pinned. Add a
regression test covering an unqualified slash-containing model ID served by two
providers.
| // CostWeight in [0, 1]. Defaults to 0.6 when zero. | ||
| CostWeight float64 | ||
| // LatencyWeight in [0, 1]. Defaults to 0.4 when zero. | ||
| LatencyWeight float64 | ||
| // MaxErrorRate filters candidates at/above this ratio. Defaults to 0.5. | ||
| MaxErrorRate float64 |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Honor the documented per-field weight defaults.
Lines 39-45 only restore 0.6/0.4 when both fields are zero. A partially configured strategy like &BalancedStrategy{CostWeight: 0.8} leaves latW at 0, and &BalancedStrategy{LatencyWeight: 0.2} leaves costW at 0, even though Lines 16-19 say each zero field should fall back to its default. That silently turns a balanced strategy into a single-factor router.
💡 Minimal fix
func (s *BalancedStrategy) Select(_ context.Context, candidates []ProviderCandidate) (*ProviderCandidate, error) {
costW := s.CostWeight
- if costW == 0 && s.LatencyWeight == 0 {
+ if costW == 0 {
costW = 0.6
}
latW := s.LatencyWeight
- if s.CostWeight == 0 && latW == 0 {
+ if latW == 0 {
latW = 0.4
}Please also add a regression test for one-sided zero values. As per coding guidelines, "Add or update tests for behavior changes."
Also applies to: 37-45
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/router/strategy_balanced.go` around lines 16 - 21, Update
BalancedStrategy routing defaults so each weight falls back independently in the
strategy setup logic, not only when both are zero. In the BalancedStrategy
handling path that computes costW and latW, ensure a zero CostWeight becomes 0.6
and a zero LatencyWeight becomes 0.4 even when the other field is explicitly
set. Add a regression test covering one-sided zero configurations for the
BalancedStrategy behavior to verify the per-field defaults are honored.
Source: Coding guidelines
| func RoutingStrategyCapture() echo.MiddlewareFunc { | ||
| return func(next echo.HandlerFunc) echo.HandlerFunc { | ||
| return func(c *echo.Context) error { | ||
| ctx := c.Request().Context() | ||
|
|
||
| if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" { | ||
| ctx = router.WithStrategyOverride(ctx, strings.ToLower(strategy)) | ||
| } | ||
|
|
||
| if ctx != c.Request().Context() { | ||
| c.SetRequest(c.Request().WithContext(ctx)) | ||
| } | ||
| return next(c) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Add a focused middleware test for header capture.
This adds new request behavior, but there’s no HTTP-layer test here to lock down trimming, lowercase normalization, and the no-header path before next runs. A small table-driven test for RoutingStrategyCapture() would cover the contract directly.
Based on learnings, keep provider-specific behavior isolated and preserve the OpenAI-compatible public API; as per coding guidelines, **/*.go: “Add or update tests for behavior changes.” and **/*_test.go: “Prefer table-driven tests.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/server/routing_strategy.go` around lines 19 - 34, Add a focused
table-driven test for RoutingStrategyCapture() to lock down the header-capture
behavior before next runs. Cover the trimmed header value, lowercase
normalization, and the no-header path, and assert the request context is updated
only when RoutingStrategyHeader is present. Use a small echo middleware/handler
test so the contract is verified at the HTTP layer without changing
provider-specific behavior.
Sources: Coding guidelines, Learnings
| if eligible && r.strategyRegistry != nil { | ||
| if chosen, newSelector, ok := r.applyStrategy(ctx, selector); ok { | ||
| selector = newSelector | ||
| p = chosen | ||
| } |
There was a problem hiding this comment.
This path can change the concrete provider after the request has already been resolved and authorized during prepare. For an unpinned gpt-4o request, prepare can authorize the first resolved selector such as openai-east/gpt-4o, then applyStrategy can switch to a cheaper or faster openai-west/gpt-4o without validating that final selector against the request's model authorizer. If virtual-model or user-path policy allows only the first provider, routing can send traffic to a provider/model the API key was not allowed to use. The strategy should select before authorization, or filter/revalidate candidates against the same authorizer before dispatch.
Context Used: CLAUDE.md (source)
| // Single-provider setups never participate in intelligent routing (no | ||
| // candidates to choose from), so suppress per-client latency tracking. | ||
| if len(providerMap) < 2 { | ||
| llmclient.SetLatencyTrackingEnabled(false) | ||
| } |
There was a problem hiding this comment.
This only disables the package-level latency tracking flag for single-provider configs, but it never re-enables it for a later multi-provider initialization in the same process. If a process first initializes with one provider and later reloads with two providers, every new llmclient.Client is still created without a LatencyTracker, so balanced routing sees zero/unknown latency for all providers and silently loses the latency-aware part of the strategy. Make this a constructor/config option, or at least set the flag explicitly for both single-provider and multi-provider init paths before clients are created.
Context Used: CLAUDE.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
SantiagoDePolonia
left a comment
There was a problem hiding this comment.
The architecture is sound and the interface/registry seam is genuinely the right abstraction (and the maintainer agrees). To get to merge:
- Fix the auth bypass - revalidate the strategy-chosen selector against modelAuthorizer (or move selection before authorization). This is non-negotiable.
- Gate routing eligibility so audio/realtime/other-capability endpoints can't be rerouted to an incapable provider.
- Make latency-tracking enablement a per-router config, not a sticky package global.
1628899 to
f3be180
Compare
|
@SantiagoDePolonia All three points have been addressed: 1. Auth bypass — fixedAdded a 2. Audio/realtime gating — analysed, no action neededTraced the full execution path for audio, realtime, and embeddings — none of them go through 3. Latency tracking config — fixedRemoved the package-level All three shipped as a squashed amend onto the same commit. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/features/intelligent-routing.mdx`:
- Around line 18-20: The routing flow description in intelligent-routing.mdx is
too broad because it says every strategy “scores and ranks” candidates, which
conflicts with first_fit behavior. Update the step that describes strategy
handling, using the relevant routing overview text, so it says the configured
strategy evaluates/selects among candidates rather than always scoring and
ranking them; keep the wording consistent with first_fit and the strategy table
below.
In `@internal/providers/init.go`:
- Around line 151-153: The default router setup in init should also wire the
post-selection access validator, not just the strategy registry. Update the
NewRouter construction in the init path to pass the model access validator
alongside WithStrategyRegistry so resolveProvider() can revalidate any
strategy-selected provider against user-path and scope rules after rerouting.
Use the existing router initialization symbols (NewRouter, WithStrategyRegistry,
and the router/access validator fields on the init result) to locate the change.
In `@internal/router/scoring.go`:
- Around line 24-25: The circuit-breaker filter in isCircuitOpen is incorrectly
treating “half-open” as fully open, which prevents recovery probes from reaching
candidates that need them. Update isCircuitOpen in the scoring router so only
“open” is excluded, and keep “half-open” eligible for routing so the probe path
in client.go can send a request and allow the breaker to close on success. Make
the same adjustment anywhere else in the scoring logic that reuses this check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7a8099e6-6da1-43f8-8370-63ccc6838ecf
📒 Files selected for processing (27)
.env.templateCLAUDE.mdconfig/config.example.yamlconfig/config.goconfig/router.godocs/dev/api-examples.mddocs/features/intelligent-routing.mdxinternal/core/interfaces.gointernal/gateway/inference_prepare.gointernal/llmclient/client.gointernal/llmclient/latency.gointernal/llmclient/latency_test.gointernal/providers/init.gointernal/providers/registry.gointernal/providers/router.gointernal/providers/router_strategy_integration_test.gointernal/providers/strategy_init.gointernal/router/context.gointernal/router/registry.gointernal/router/registry_test.gointernal/router/scoring.gointernal/router/strategy.gointernal/router/strategy_balanced.gointernal/router/strategy_firstfit.gointernal/router/strategy_test.gointernal/server/http.gointernal/server/routing_strategy.go
| 2. GoModel collects all providers that serve this model | ||
| 3. If there are 2+ candidates, the configured **strategy** scores and ranks them | ||
| 4. The best-scoring provider handles the request |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Avoid saying every strategy “scores and ranks” candidates.
first_fit just returns the first acceptable provider, so this wording contradicts the table below. Rephrase this step to say the strategy evaluates/selects among candidates instead of always scoring them.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/features/intelligent-routing.mdx` around lines 18 - 20, The routing flow
description in intelligent-routing.mdx is too broad because it says every
strategy “scores and ranks” candidates, which conflicts with first_fit behavior.
Update the step that describes strategy handling, using the relevant routing
overview text, so it says the configured strategy evaluates/selects among
candidates rather than always scoring and ranking them; keep the wording
consistent with first_fit and the strategy table below.
| strategyRegistry := buildStrategyRegistry(result.Config.Router) | ||
|
|
||
| router, err := NewRouter(registry, WithStrategyRegistry(strategyRegistry)) |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift
Wire the post-selection access validator into the default router construction.
resolveProvider() only re-checks a strategy-chosen selector when r.modelAccessValidator is set, but this init path only passes WithStrategyRegistry. The default app wiring can therefore reroute after the original access check without revalidating the newly chosen provider against user-path / scope rules.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/providers/init.go` around lines 151 - 153, The default router setup
in init should also wire the post-selection access validator, not just the
strategy registry. Update the NewRouter construction in the init path to pass
the model access validator alongside WithStrategyRegistry so resolveProvider()
can revalidate any strategy-selected provider against user-path and scope rules
after rerouting. Use the existing router initialization symbols (NewRouter,
WithStrategyRegistry, and the router/access validator fields on the init result)
to locate the change.
0a9a1a3 to
74c9dc1
Compare
This feature enables the gateway to proactively select the best provider
when multiple providers serve the same model, combining cost and latency
signals into a weighted score. Previously the gateway used a deterministic
first-wins strategy: the first registered provider always handled the
request, regardless of cost or performance.
=== Architecture ===
- Pluggable RoutingStrategy interface (balanced, cost_only, latency_only,
first_fit) with extensible registry
- BalancedStrategy (default): weighted scoring (cost 0.6, latency 0.4),
lower score wins; missing data scores 1.0 (worst) to prefer
complete-data providers; ties broken by provider name lexicographic order
- CostOnlyStrategy / LatencyOnlyStrategy: single-factor ranking, degrade
to the other dimension when every candidate lacks their primary signal
(without the mutual-recursion bug that was identified and fixed)
- FirstFitStrategy: first acceptable candidate in registration order,
matching historical first-wins behaviour
=== Latency data collection ===
- EWMA-based LatencyTracker in llmclient.Client (in-memory, ~32 bytes per
provider, alpha 0.1 p50 / 0.05 p99 / 0.2 error-rate, never persisted)
- ProviderStats optional interface exposed by llmclient.Client so the
router can read P50Latency, P99Latency, ErrorRate, and CircuitState
via a single type assertion
- completeScope() is the single Record() call site
=== Candidate selection ===
- collectCandidates(modelID) enumerates providers serving modelID via the
new O(k) ListProvidersForModel registry method (registration-order aware)
- Candidates with circuit open or error rate at/above 0.5 are filtered
before scoring. When every candidate is filtered the caller falls back
to the first candidate with a WARN log
- Explicit provider-qualified requests bypass intelligent routing;
single-provider configs are unaffected (one candidate, no decision)
=== Per-request overrides ===
- X-GoModel-Routing-Strategy and X-GoModel-Routing-Weights headers;
invalid values silently fall back (WARN log)
- RoutingEligible context flag set during prepare for unqualified requests
=== Config ===
- MODEL_ROUTING_STRATEGY, MODEL_ROUTING_STRATEGY_WEIGHTS,
MODEL_ROUTING_MAX_ERROR_RATE env vars; router block in config.yaml
=== Bugs fixed during development ===
1. Mutual recursion: cost/latency strategies recursed when both
dimensions absent. Fixed by computing fallback scores inline
2. Free model mis-classification: normalizeCost treated cost=0 as
unknown. Fixed with -1 sentinel
3. Production path never executed strategy: PrepareChatRequest set
req.Provider, resolveProvider skipped due to providerHint != "".
Fixed via RoutingEligible context flag
4. FirstFit used alpha-sorted order instead of registration order.
Fixed via ListProvidersForModel with registration-order tracking
5. BalancedStrategy zero-weight default used modified variable instead
of original, causing LatencyWeight to stay 0 when both weights were 0
6. CostOnly degraded when free model present (max==0 sentinel).
Fixed with explicit hasPricingData() helper
7. ParseWeights accepted NaN/Inf/trailing-garbage. Fixed with math
validation and full-string consumption check
8. Model-syntax provider ("pricey/gpt-4o") overridden when req.Provider
was empty. Fixed with slash-in-model check in resolveProvider
=== Testing ===
- 38+ unit tests covering every strategy, edge cases, registry,
context helpers, and EWMA convergence
- 9 integration tests covering cheaper/faster selection, header
overrides, invalid override fallback, explicit provider bypass,
single-candidate no-op, weights override, FirstFit registration
order, and model-syntax provider pinning
- lint: 0 issues, full project build passes
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
internal/providers/init.go (1)
151-153: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winWire the post-selection access validator into the default router.
At Line 153 this init path still constructs the router with only
WithStrategyRegistry(...).resolveProvider()only re-checks a strategy-chosen selector whenr.modelAccessValidatoris non-nil, so the default app wiring can reroute after the original access check without validating the newly chosen provider. Pass the same access-validation callback used during prepare viaWithModelAccessValidator(...)here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/providers/init.go` around lines 151 - 153, The default router initialization in init should also receive the post-selection access validator, not just the strategy registry. Update the NewRouter wiring to pass the same model access validation callback used during prepare via WithModelAccessValidator(...) alongside WithStrategyRegistry(...), so resolveProvider() can re-check any provider selected by the strategy. Use the existing router construction in init and the router/model access validator symbols to locate the change.internal/providers/router.go (1)
332-338: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winDon’t disable intelligent routing for every slash-bearing model ID.
At Line 337,
!strings.Contains(model, "/")treats any slash as a provider pin.internal/providers/registry.go:GetProvider()already distinguishes realprovider/modelselectors from model IDs that legitimately contain/, so this gate forces valid unpinned requests likemeta-llama/Llama-3-70Bdown the old first-wins path even when the request was marked routing-eligible upstream. Gate on actual provider pinning instead of raw slash presence, and add a regression case for an unqualified slash-bearing model served by multiple providers.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/providers/router.go` around lines 332 - 338, The intelligent routing gate in router.go is too broad because strings.Contains(model, "/") treats every slash-bearing model ID as provider-pinned. Update the eligibility check around eligible and strategyRegistry to rely on the same provider-pinning semantics used by internal/providers/registry.go:GetProvider(), so routing-eligible requests with model IDs like meta-llama/Llama-3-70B can still use the strategy when not actually pinned. Add a regression test covering an unqualified slash-containing model served by multiple providers to verify the strategy path is chosen.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.env.template:
- Around line 126-131: This env template block only lists routing defaults and
should also guide operators on when to override them. Update the comments around
MODEL_ROUTING_STRATEGY, MODEL_ROUTING_STRATEGY_WEIGHTS, and
MODEL_ROUTING_MAX_ERROR_RATE to explain the practical cases for choosing
first_fit versus balanced, when to tune cost/latency weights, and when to
tighten or loosen the max error rate. Keep the guidance anchored to the routing
settings so users can safely choose defaults in env-only deployments.
In `@config/router.go`:
- Around line 56-125: Add table-driven tests for ValidateRouterConfig and its
helpers to cover the new normalization and validation paths. Exercise
RouterStrategyBalanced and RouterStrategyFirstFit normalization, empty strategy
defaulting, WeightsCSV parsing via parseWeightsCSV/parseFloatField,
finite/non-negative rejection for RouterWeights and MaxErrorRate, and the
balanced default weight fallback. Include assertions for both successful
normalization and expected error cases so the behavior in config/router.go is
locked down.
In `@internal/server/routing_strategy.go`:
- Around line 24-26: The routing strategy header handling currently skips empty
values, so an explicitly empty X-GoModel-Routing-Strategy cannot clear an
inherited override. Update the logic in the request handling path that uses
RoutingStrategyHeader to distinguish between an absent header and a
present-but-empty one, and call router.WithStrategyOverride(ctx, "") when the
header is explicitly set to empty so the contract in internal/router/context.go
is preserved.
---
Duplicate comments:
In `@internal/providers/init.go`:
- Around line 151-153: The default router initialization in init should also
receive the post-selection access validator, not just the strategy registry.
Update the NewRouter wiring to pass the same model access validation callback
used during prepare via WithModelAccessValidator(...) alongside
WithStrategyRegistry(...), so resolveProvider() can re-check any provider
selected by the strategy. Use the existing router construction in init and the
router/model access validator symbols to locate the change.
In `@internal/providers/router.go`:
- Around line 332-338: The intelligent routing gate in router.go is too broad
because strings.Contains(model, "/") treats every slash-bearing model ID as
provider-pinned. Update the eligibility check around eligible and
strategyRegistry to rely on the same provider-pinning semantics used by
internal/providers/registry.go:GetProvider(), so routing-eligible requests with
model IDs like meta-llama/Llama-3-70B can still use the strategy when not
actually pinned. Add a regression test covering an unqualified slash-containing
model served by multiple providers to verify the strategy path is chosen.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c44232c9-3be5-4799-9a8c-0e28e19656e9
📒 Files selected for processing (27)
.env.templateCLAUDE.mdconfig/config.example.yamlconfig/config.goconfig/router.godocs/dev/api-examples.mddocs/features/intelligent-routing.mdxinternal/core/interfaces.gointernal/gateway/inference_prepare.gointernal/llmclient/client.gointernal/llmclient/latency.gointernal/llmclient/latency_test.gointernal/providers/init.gointernal/providers/registry.gointernal/providers/router.gointernal/providers/router_strategy_integration_test.gointernal/providers/strategy_init.gointernal/router/context.gointernal/router/registry.gointernal/router/registry_test.gointernal/router/scoring.gointernal/router/strategy.gointernal/router/strategy_balanced.gointernal/router/strategy_firstfit.gointernal/router/strategy_test.gointernal/server/http.gointernal/server/routing_strategy.go
| # Strategy: balanced (default) or first_fit | ||
| # MODEL_ROUTING_STRATEGY=balanced | ||
| # Balanced weights as "cost_weight,latency_weight" (only affects balanced). | ||
| # MODEL_ROUTING_STRATEGY_WEIGHTS=0.6,0.4 | ||
| # Filter candidates at/above this smoothed error rate [0,1] (default: 0.5). | ||
| # MODEL_ROUTING_MAX_ERROR_RATE=0.5 |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Explain when operators should override these routing defaults.
This block lists the defaults, but it does not tell users when first_fit, custom weights, or a tighter/looser MODEL_ROUTING_MAX_ERROR_RATE are the better choice. That makes the new env knobs harder to use safely in env-only deployments.
As per coding guidelines, .env.template should document configuration defaults clearly, including safe practical defaults and when users should override them.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.env.template around lines 126 - 131, This env template block only lists
routing defaults and should also guide operators on when to override them.
Update the comments around MODEL_ROUTING_STRATEGY,
MODEL_ROUTING_STRATEGY_WEIGHTS, and MODEL_ROUTING_MAX_ERROR_RATE to explain the
practical cases for choosing first_fit versus balanced, when to tune
cost/latency weights, and when to tighten or loosen the max error rate. Keep the
guidance anchored to the routing settings so users can safely choose defaults in
env-only deployments.
Source: Coding guidelines
| // ValidateRouterConfig normalizes and validates the router config, applying | ||
| // defaults for empty/invalid fields. | ||
| func ValidateRouterConfig(cfg *RouterConfig) error { | ||
| if cfg.Strategy == "" { | ||
| cfg.Strategy = RouterStrategyBalanced | ||
| } | ||
| strategy := strings.ToLower(strings.TrimSpace(cfg.Strategy)) | ||
| switch strategy { | ||
| case RouterStrategyBalanced, RouterStrategyFirstFit: | ||
| default: | ||
| return fmt.Errorf("invalid router.strategy %q: must be one of balanced, first_fit", cfg.Strategy) | ||
| } | ||
| cfg.Strategy = strategy | ||
|
|
||
| // MODEL_ROUTING_STRATEGY_WEIGHTS overrides YAML weights when set. | ||
| if strings.TrimSpace(cfg.WeightsCSV) != "" { | ||
| cost, lat, err := parseWeightsCSV(cfg.WeightsCSV) | ||
| if err != nil { | ||
| return fmt.Errorf("invalid MODEL_ROUTING_STRATEGY_WEIGHTS: %w", err) | ||
| } | ||
| cfg.Weights = RouterWeights{Cost: cost, Latency: lat} | ||
| } | ||
|
|
||
| if math.IsNaN(cfg.Weights.Cost) || math.IsNaN(cfg.Weights.Latency) || | ||
| math.IsInf(cfg.Weights.Cost, 0) || math.IsInf(cfg.Weights.Latency, 0) { | ||
| return fmt.Errorf("router.weights must be finite numbers, got cost=%v latency=%v", cfg.Weights.Cost, cfg.Weights.Latency) | ||
| } | ||
| if cfg.Weights.Cost < 0 || cfg.Weights.Latency < 0 { | ||
| return fmt.Errorf("router.weights must be non-negative, got cost=%v latency=%v", cfg.Weights.Cost, cfg.Weights.Latency) | ||
| } | ||
| if cfg.Weights.Cost == 0 && cfg.Weights.Latency == 0 && strategy == RouterStrategyBalanced { | ||
| cfg.Weights = RouterWeights{Cost: 0.6, Latency: 0.4} | ||
| } | ||
| if math.IsNaN(cfg.MaxErrorRate) || math.IsInf(cfg.MaxErrorRate, 0) { | ||
| return fmt.Errorf("router.max_error_rate must be a finite number, got %v", cfg.MaxErrorRate) | ||
| } | ||
| if cfg.MaxErrorRate < 0 || cfg.MaxErrorRate > 1 { | ||
| return fmt.Errorf("router.max_error_rate must be in [0, 1], got %v", cfg.MaxErrorRate) | ||
| } | ||
| return nil | ||
| } | ||
|
|
||
| // parseWeightsCSV parses a "cost,latency" string into two non-negative floats. | ||
| func parseWeightsCSV(s string) (cost, latency float64, err error) { | ||
| parts := strings.Split(s, ",") | ||
| if len(parts) != 2 { | ||
| return 0, 0, fmt.Errorf("expected two comma-separated weights, got %q", s) | ||
| } | ||
| if cost, err = parseFloatField(parts[0], "cost"); err != nil { | ||
| return 0, 0, err | ||
| } | ||
| if latency, err = parseFloatField(parts[1], "latency"); err != nil { | ||
| return 0, 0, err | ||
| } | ||
| if cost < 0 || latency < 0 { | ||
| return 0, 0, fmt.Errorf("weights must be non-negative, got %v,%v", cost, latency) | ||
| } | ||
| return cost, latency, nil | ||
| } | ||
|
|
||
| func parseFloatField(s, name string) (float64, error) { | ||
| v, err := strconv.ParseFloat(strings.TrimSpace(s), 64) | ||
| if err != nil { | ||
| return 0, fmt.Errorf("invalid %s weight %q: %w", name, s, err) | ||
| } | ||
| if math.IsNaN(v) || math.IsInf(v, 0) { | ||
| return 0, fmt.Errorf("invalid %s weight %q: must be a finite number", name, s) | ||
| } | ||
| return v, nil | ||
| } |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Add table-driven tests for router config normalization.
This block now owns strategy normalization, CSV parsing, finite-number rejection, and balanced-weight defaulting, and the PR summary still calls out coverage gaps in config/router.go. A small table-driven test set here would lock down the branches that directly feed runtime routing decisions.
As per coding guidelines, **/*_test.go should add or update tests for behavior changes, covering default configuration and error handling.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@config/router.go` around lines 56 - 125, Add table-driven tests for
ValidateRouterConfig and its helpers to cover the new normalization and
validation paths. Exercise RouterStrategyBalanced and RouterStrategyFirstFit
normalization, empty strategy defaulting, WeightsCSV parsing via
parseWeightsCSV/parseFloatField, finite/non-negative rejection for RouterWeights
and MaxErrorRate, and the balanced default weight fallback. Include assertions
for both successful normalization and expected error cases so the behavior in
config/router.go is locked down.
Source: Coding guidelines
| if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" { | ||
| ctx = router.WithStrategyOverride(ctx, strings.ToLower(strategy)) | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Propagate an explicitly empty header so it can clear an inherited override.
Header.Get plus the != "" guard makes an empty X-GoModel-Routing-Strategy indistinguishable from an absent one, so router.WithStrategyOverride(ctx, "") is never called. That breaks the context contract in internal/router/context.go, where an empty value is the mechanism for clearing a prior override.
💡 Minimal fix
- if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" {
- ctx = router.WithStrategyOverride(ctx, strings.ToLower(strategy))
+ if values, ok := c.Request().Header[RoutingStrategyHeader]; ok {
+ strategy := ""
+ if len(values) > 0 {
+ strategy = strings.ToLower(strings.TrimSpace(values[0]))
+ }
+ ctx = router.WithStrategyOverride(ctx, strategy)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if strategy := strings.TrimSpace(c.Request().Header.Get(RoutingStrategyHeader)); strategy != "" { | |
| ctx = router.WithStrategyOverride(ctx, strings.ToLower(strategy)) | |
| } | |
| if values, ok := c.Request().Header[RoutingStrategyHeader]; ok { | |
| strategy := "" | |
| if len(values) > 0 { | |
| strategy = strings.ToLower(strings.TrimSpace(values[0])) | |
| } | |
| ctx = router.WithStrategyOverride(ctx, strategy) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/server/routing_strategy.go` around lines 24 - 26, The routing
strategy header handling currently skips empty values, so an explicitly empty
X-GoModel-Routing-Strategy cannot clear an inherited override. Update the logic
in the request handling path that uses RoutingStrategyHeader to distinguish
between an absent header and a present-but-empty one, and call
router.WithStrategyOverride(ctx, "") when the header is explicitly set to empty
so the contract in internal/router/context.go is preserved.
Summary
When multiple providers serve the same model ID, the gateway now proactively selects the best provider based on a weighted cost + latency score — instead of the previous deterministic first-wins strategy where the first registered provider always handled the request.
Architecture
New components (8 files)
Routing strategies under
internal/router/:strategy.gostrategy_balanced.gostrategy_cost.gostrategy_latency.gostrategy_firstfit.goscoring.goregistry.gocontext.goSupporting:
internal/llmclient/latency.gointernal/core/interfaces.go(+ProviderStats)internal/server/routing_strategy.goconfig/router.goModified components
llmclient/client.goproviders/registry.goproviders/router.goproviders/init.goproviders/strategy_init.gogateway/inference_prepare.goconfig/config.goconfig/config.example.yaml.env.templateCLAUDE.mdKey design decisions
X-GoModel-Routing-StrategyandX-GoModel-Routing-Weights; invalid values silently fall back (WARN log)openai-east/gpt-4oorreq.Providernon-empty leads to exact route, no strategy involvementBugs fixed during development
Testing
Summary by CodeRabbit
balanced(cost/latency weighted) andfirst_fit(registration-order), with configurablemax_error_rate.X-GoModel-Routing-Strategy.provider/model) bypass Intelligent Routing.