feat(hosted key): Add hosted key, exa as shared key #3221
TheodoreSpeaks wants to merge 28 commits into staging
Conversation
Greptile Summary
This PR introduces hosted API key support for Exa search tools, allowing Sim to provide its own Exa API keys on the hosted platform so users don't need to supply their own. It includes a token-bucket rate limiter per billing actor (workspace), least-loaded key selection across a configurable key pool, exponential backoff retry on external rate limits, and custom pricing calculation from Exa's cost response.
Confidence Score: 3/5
Sequence Diagram

sequenceDiagram
participant User as User/Workflow
participant ET as executeTool()
participant BYOK as getBYOKKey()
participant RL as HostedKeyRateLimiter
participant DB as DB Token Bucket
participant API as External API (Exa)
User->>ET: executeTool(toolId, params, ctx)
ET->>ET: injectHostedKeyIfNeeded()
alt Tool has hosting config AND isHosted=true
ET->>BYOK: check workspace BYOK key
alt BYOK key found
BYOK-->>ET: user key (no billing)
else No BYOK key
ET->>RL: acquireKey(provider, prefix, config, workspaceId)
RL->>DB: consumeTokens(actor bucket, 1 token)
DB-->>RL: allowed or denied
alt Actor rate limited
RL-->>ET: billingActorRateLimited=true
ET-->>User: Error 429 (rate limited)
else No hosted keys available
RL-->>ET: success=false
ET-->>User: Error 503 (no keys)
else Key acquired
RL-->>ET: hosted key value
ET->>ET: inject key into params
end
end
end
ET->>API: executeWithRetry(executeToolRequest)
loop Retry up to 3x on 429 or 503
API-->>ET: 429 Too Many Requests
ET->>ET: exponential backoff delay
end
API-->>ET: 200 OK with response data
alt isUsingHostedKey AND success
ET->>RL: reportUsage() for custom dimensions
ET->>ET: calculateToolCost() via pricing config
ET->>DB: logFixedUsage() for audit trail
ET->>ET: attach cost to output
end
ET->>ET: stripInternalFields() remove underscore fields
ET-->>User: ToolResult (cost attached if hosted key used)
PR Summary (High Risk)
Overview: Extends BYOK support to the new hosted-key flow. Refactors cost shaping: removes generic-handler output cost extraction and instead restructures cost/token/model fields in the knowledge tool transformation.
Written by Cursor Bugbot for commit 34cffdc.
throw error
}

const output = result.output
Moved this to the knowledge block transformation, so the generic handler doesn't need to handle special cases.
)::numeric
) - ${requestedTokens}::numeric
ELSE ${rateLimitBucket.tokens}::numeric
ELSE -1
DB rate limiting didn't work since tokens was always 0. Set the else case to -1 so rate limiting applies successfully.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for 2 of the 3 issues found in the latest run.
- ✅ Fixed: Key request counters grow unbounded, never decrement
  - Added key release/decrement logic in HostedKeyRateLimiter and wired executeTool to always release acquired hosted keys in a finally block so counters track in-flight load.
- ✅ Fixed: Hosted key injection ignores user-provided API key
  - injectHostedKeyIfNeeded now returns early when the configured API-key parameter is already present, preserving user-supplied keys in hosted mode.
Or push these changes by commenting:
@cursor push 80f7b70ba9
Preview (80f7b70ba9)
diff --git a/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts b/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts
--- a/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts
+++ b/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts
@@ -155,6 +155,41 @@
expect(r4.keyIndex).toBe(0) // Wraps back
})
+ it('should rebalance key selection after releasing an in-flight key', async () => {
+ const allowedResult: ConsumeResult = {
+ allowed: true,
+ tokensRemaining: 9,
+ resetAt: new Date(Date.now() + 60000),
+ }
+ mockAdapter.consumeTokens.mockResolvedValue(allowedResult)
+
+ const r1 = await rateLimiter.acquireKey(
+ testProvider,
+ envKeyPrefix,
+ perRequestRateLimit,
+ 'workspace-1'
+ )
+ const r2 = await rateLimiter.acquireKey(
+ testProvider,
+ envKeyPrefix,
+ perRequestRateLimit,
+ 'workspace-2'
+ )
+
+ expect(r1.keyIndex).toBe(0)
+ expect(r2.keyIndex).toBe(1)
+
+ rateLimiter.releaseKey(testProvider, 0)
+
+ const r3 = await rateLimiter.acquireKey(
+ testProvider,
+ envKeyPrefix,
+ perRequestRateLimit,
+ 'workspace-3'
+ )
+ expect(r3.keyIndex).toBe(0)
+ })
+
it('should handle partial key availability', async () => {
const allowedResult: ConsumeResult = {
allowed: true,
diff --git a/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts b/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts
--- a/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts
+++ b/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts
@@ -52,7 +52,7 @@
*/
export class HostedKeyRateLimiter {
private storage: RateLimitStorageAdapter
- /** In-memory request counters per key: "provider:keyIndex" -> count */
+ /** In-memory in-flight request counters per key: "provider:keyIndex" -> count */
private keyRequestCounts = new Map<string, number>()
constructor(storage?: RateLimitStorageAdapter) {
@@ -346,6 +346,21 @@
const key = `${provider}:${keyIndex}`
this.keyRequestCounts.set(key, (this.keyRequestCounts.get(key) ?? 0) + 1)
}
+
+ private decrementKeyCount(provider: string, keyIndex: number): void {
+ const key = `${provider}:${keyIndex}`
+ const current = this.keyRequestCounts.get(key)
+ if (!current) return
+ if (current <= 1) {
+ this.keyRequestCounts.delete(key)
+ return
+ }
+ this.keyRequestCounts.set(key, current - 1)
+ }
+
+ releaseKey(provider: string, keyIndex: number): void {
+ this.decrementKeyCount(provider, keyIndex)
+ }
}
let cachedInstance: HostedKeyRateLimiter | null = null
diff --git a/apps/sim/tools/index.test.ts b/apps/sim/tools/index.test.ts
--- a/apps/sim/tools/index.test.ts
+++ b/apps/sim/tools/index.test.ts
@@ -26,6 +26,7 @@
acquireKey: vi.fn(),
preConsumeCapacity: vi.fn(),
consumeCapacity: vi.fn(),
+ releaseKey: vi.fn(),
},
})
)
@@ -1338,6 +1339,73 @@
Object.assign(tools, originalTools)
})
+ it('should not overwrite user-provided API key in hosted mode', async () => {
+ mockIsHosted.value = true
+ const mockTool = {
+ id: 'test_user_key_hosted',
+ name: 'Test User Key Hosted',
+ description: 'A test tool with hosting config',
+ version: '1.0.0',
+ params: {
+ apiKey: { type: 'string', required: true, visibility: 'user-only' as const },
+ },
+ hosting: {
+ envKeyPrefix: 'TEST_API',
+ apiKeyParam: 'apiKey',
+ byokProviderId: 'exa',
+ pricing: {
+ type: 'per_request' as const,
+ cost: 0.005,
+ },
+ rateLimit: {
+ mode: 'per_request' as const,
+ requestsPerMinute: 100,
+ },
+ },
+ request: {
+ url: '/api/test/endpoint',
+ method: 'POST' as const,
+ headers: (params: any) => ({
+ 'Content-Type': 'application/json',
+ 'x-api-key': params.apiKey,
+ }),
+ },
+ transformResponse: vi.fn().mockResolvedValue({
+ success: true,
+ output: { result: 'success' },
+ }),
+ }
+
+ const originalTools = { ...tools }
+ ;(tools as any).test_user_key_hosted = mockTool
+
+ global.fetch = Object.assign(
+ vi.fn().mockImplementation(async () => ({
+ ok: true,
+ status: 200,
+ headers: new Headers(),
+ json: () => Promise.resolve({ success: true }),
+ })),
+ { preconnect: vi.fn() }
+ ) as typeof fetch
+
+ const mockContext = createToolExecutionContext()
+ const result = await executeTool(
+ 'test_user_key_hosted',
+ { apiKey: 'user-provided-key' },
+ false,
+ mockContext
+ )
+
+ expect(result.success).toBe(true)
+ expect(mockGetBYOKKey).not.toHaveBeenCalled()
+ expect(mockRateLimiterFns.acquireKey).not.toHaveBeenCalled()
+ expect(mockRateLimiterFns.releaseKey).not.toHaveBeenCalled()
+
+ Object.assign(tools, originalTools)
+ mockIsHosted.value = false
+ })
+
it('should use per_request pricing model correctly', async () => {
const mockTool = {
id: 'test_per_request_pricing',
diff --git a/apps/sim/tools/index.ts b/apps/sim/tools/index.ts
--- a/apps/sim/tools/index.ts
+++ b/apps/sim/tools/index.ts
@@ -40,6 +40,8 @@
interface HostedKeyInjectionResult {
isUsingHostedKey: boolean
envVarName?: string
+ provider?: string
+ keyIndex?: number
}
/**
@@ -57,6 +59,10 @@
if (!isHosted) return { isUsingHostedKey: false }
const { envKeyPrefix, apiKeyParam, byokProviderId, rateLimit } = tool.hosting
+ const existingApiKey = params[apiKeyParam]
+ if (existingApiKey !== undefined && existingApiKey !== null && existingApiKey !== '') {
+ return { isUsingHostedKey: false }
+ }
// Check BYOK workspace key first
if (byokProviderId && executionContext?.workspaceId) {
@@ -132,6 +138,8 @@
return {
isUsingHostedKey: true,
envVarName: acquireResult.envVarName,
+ provider,
+ keyIndex: acquireResult.keyIndex,
}
}
@@ -547,6 +555,7 @@
const startTime = new Date()
const startTimeISO = startTime.toISOString()
const requestId = generateRequestId()
+ let hostedKeyInfo: HostedKeyInjectionResult = { isUsingHostedKey: false }
try {
let tool: ToolConfig | undefined
@@ -615,7 +624,7 @@
}
// Inject hosted API key if tool supports it and user didn't provide one
- const hostedKeyInfo = await injectHostedKeyIfNeeded(
+ hostedKeyInfo = await injectHostedKeyIfNeeded(
tool,
contextParams,
executionContext,
@@ -930,6 +939,18 @@
duration,
},
}
+ } finally {
+ if (
+ hostedKeyInfo.isUsingHostedKey &&
+ hostedKeyInfo.provider &&
+ typeof hostedKeyInfo.keyIndex === 'number'
+ ) {
+ try {
+ getHostedKeyRateLimiter().releaseKey(hostedKeyInfo.provider, hostedKeyInfo.keyIndex)
+ } catch (error) {
+ logger.warn(`[${requestId}] Failed to release hosted key for ${toolId}:`, error)
+ }
+ }
}
}

contextParams,
executionContext,
requestId
)
Required param validation runs before hosted key injection
High Severity
validateRequiredParametersAfterMerge is called on line 610 before injectHostedKeyIfNeeded on line 618. In hosted mode, the UI hides the apiKey subblock via hideWhenHosted: true and the serializer skips it via isSubBlockHiddenByHostedKey, so contextParams won't contain apiKey. But all Exa tools declare apiKey as required: true in their params. Validation would reject the missing required param before the hosted key ever gets a chance to be injected. This likely breaks all Exa hosted-key tool executions.
Additional Locations (1)
The API key is passed via the hosted key. Confirmed the hosted key passes validation and executes locally. This is not an issue.
Bugbot Autofix determined this is a false positive.
validateRequiredParametersAfterMerge only enforces required parameters with visibility: 'user-or-llm', so Exa's required apiKey (visibility: 'user-only') is not rejected before hosted-key injection.
requestId: string
): Promise<HostedKeyInjectionResult> {
  if (!tool.hosting) return { isUsingHostedKey: false }
  if (!isHosted) return { isUsingHostedKey: false }
Hosted key injection ignores user-provided API key
Low Severity
injectHostedKeyIfNeeded is documented as injecting a key only "if user didn't provide one," but it never checks whether params[apiKeyParam] is already populated. In hosted mode, if a user provides their own API key (e.g., via direct API call), the function unconditionally overwrites it with either a BYOK key or a hosted key, and the user gets billed for hosted key usage despite supplying their own.
An API key cannot be provided in the UI when Sim is hosted, so this condition will never occur.
throw new Error(`Tool not found: ${toolId}`)
}

// Inject hosted API key if tool supports it and user didn't provide one
const hostedKeyInfo = await injectHostedKeyIfNeeded(
  tool,
  contextParams,
stripInternalFields is applied unconditionally to all tools
stripInternalFields removes all output keys starting with _ and is applied to every tool execution regardless of whether a hosted key was used. This is a behavior change that could silently break any existing tool whose output contains underscore-prefixed fields.
For example, if any other tool in the registry returns a field like _metadata, _debug, or _raw in its output, those fields would now be silently dropped without any warning or error.
The stripping should only occur when a hosted key was actually used (or at minimum scoped to Exa tools), since the underscore-prefix convention for internal fields is only established in this PR:
// Strip internal fields only when using hosted key
if (hostedKeyInfo.isUsingHostedKey) {
  const strippedOutput = stripInternalFields(finalResult.output || {})
  finalResult = { ...finalResult, output: strippedOutput }
}

maxRetries: 0,
delayMs: acquireResult.retryAfterMs ?? 0,
userId: executionContext?.userId,
workspaceId: executionContext?.workspaceId,
workflowId: executionContext?.workflowId,
503 used as a rate-limit retry signal — may cause excessive retries on genuine service failures
isRateLimitError treats HTTP 503 (Service Unavailable) as a retriable rate-limit error. This means if the Exa API is genuinely down (and returns 503), executeWithRetry will retry 3 times with exponential backoff before surfacing the error to the user. Each retry could incur up to an 8-second wait (1s + 2s + 4s), plus the actual API timeout, causing a noticeably slow failure.
HTTP 503 is sometimes used for rate limiting, but it can also indicate a real outage. Exa's actual rate-limit response is 429. Consider restricting retries to 429 only, unless there's a confirmed reason to expect 503 from Exa as a throttle signal:
Suggested change:

function isRateLimitError(error: unknown): boolean {
  if (error && typeof error === 'object') {
    const status = (error as { status?: number }).status
    if (status === 429) return true
  }
  return false
}
import { createLogger } from '@sim/logger'
import {
  createStorageAdapter,
  type RateLimitStorageAdapter,
  type TokenBucketConfig,
} from '@/lib/core/rate-limiter/storage'
import {
  type AcquireKeyResult,
  type CustomRateLimit,
  DEFAULT_BURST_MULTIPLIER,
In-memory keyRequestCounts singleton won't distribute load across serverless instances
The keyRequestCounts Map and cachedInstance singleton are in-process memory. In a serverless/edge deployment (e.g., Vercel), each function invocation may run in an isolated container, so the least-loaded key selection only tracks requests seen by that specific instance. Different instances will independently increment their counters starting from 0, giving no guarantee of even distribution across the key pool.
The PR description says keys are distributed "via round robin", but the implementation is "least loaded in memory." In practice, each cold-started instance will always pick key index 0 first, which concentrates traffic on the first key until that instance warms up.
Consider replacing the in-memory counter with a distributed counter (e.g., using the existing RateLimitStorageAdapter) so load is balanced globally, or document this limitation explicitly.
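A storage-backed counter is one way to make selection instance-agnostic. The sketch below is a minimal illustration under assumed names — `CounterStore`, `MapCounterStore`, and `acquireLeastLoadedKey` are hypothetical stand-ins, not the PR's actual `RateLimitStorageAdapter`:

```typescript
// Stand-in for a shared counter store. A Map works for illustration;
// a real deployment would back this with Redis INCR/DECR (or the
// existing storage adapter) so all instances see the same counts.
interface CounterStore {
  get(key: string): Promise<number>
  increment(key: string): Promise<number>
  decrement(key: string): Promise<number>
}

class MapCounterStore implements CounterStore {
  private counts = new Map<string, number>()
  async get(key: string): Promise<number> {
    return this.counts.get(key) ?? 0
  }
  async increment(key: string): Promise<number> {
    const next = (await this.get(key)) + 1
    this.counts.set(key, next)
    return next
  }
  async decrement(key: string): Promise<number> {
    const next = Math.max(0, (await this.get(key)) - 1)
    this.counts.set(key, next)
    return next
  }
}

// Least-loaded selection reads in-flight counts from the shared store,
// so a freshly cold-started instance no longer defaults to key index 0.
async function acquireLeastLoadedKey(
  store: CounterStore,
  provider: string,
  keyCount: number
): Promise<number> {
  let best = 0
  let bestCount = Number.POSITIVE_INFINITY
  for (let i = 0; i < keyCount; i++) {
    const count = await store.get(`${provider}:${i}`)
    if (count < bestCount) {
      best = i
      bestCount = count
    }
  }
  await store.increment(`${provider}:${best}`)
  return best
}
```

The read-then-increment above is not atomic; a real implementation would use an atomic increment-and-compare in the store, but the sketch shows the shape of the change.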
})

const error = new Error(acquireResult.error || `Rate limit exceeded for ${tool.id}`)
;(error as any).status = 429
;(error as any).retryAfterMs = acquireResult.retryAfterMs
throw error
}

// Handle no keys configured (503)
if (!acquireResult.success) {
  logger.error(`[${requestId}] No hosted keys configured for ${tool.id}: ${acquireResult.error}`)
  const error = new Error(acquireResult.error || `No hosted keys configured for ${tool.id}`)
  ;(error as any).status = 503
  throw error
}

params[apiKeyParam] = acquireResult.key
logger.info(`[${requestId}] Using hosted key for ${tool.id} (${acquireResult.envVarName})`, {
  keyIndex: acquireResult.keyIndex,
  provider,
})

return {
  isUsingHostedKey: true,
  envVarName: acquireResult.envVarName,
}
}

/**
 * Check if an error is a rate limit (throttling) error
 */
function isRateLimitError(error: unknown): boolean {
  if (error && typeof error === 'object') {
    const status = (error as { status?: number }).status
    // 429 = Too Many Requests, 503 = Service Unavailable (sometimes used for rate limiting)
    if (status === 429 || status === 503) return true
  }
  return false
}

/** Context for retry with rate limit tracking */
interface RetryContext {
  requestId: string
  toolId: string
  envVarName: string
Post-execution cost & usage block is duplicated across two code paths
The block that calls reportCustomDimensionUsage, processHostedKeyCost, and applies the cost to finalResult.output appears twice — once in the internal-route code path and once in the external-route code path. This adds ~40 lines of identical logic that must be kept in sync manually.
Consider extracting this into a shared helper (e.g., applyHostedKeyCostToResult) that both paths call after finalResult is available.
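A sketch of what that extraction could look like, with hypothetical parameter types standing in for the PR's actual reporting and pricing functions:

```typescript
// Hypothetical consolidation of the duplicated post-execution block:
// both the internal-route and external-route paths would call this once
// finalResult is available, instead of repeating the logic inline.
interface ToolRunResult {
  success: boolean
  output: Record<string, unknown>
}

async function applyHostedKeyCostToResult(
  result: ToolRunResult,
  reportUsage: (output: Record<string, unknown>) => Promise<void>, // e.g. custom-dimension usage reporting
  computeCost: (output: Record<string, unknown>) => number // e.g. pricing-config cost calculation
): Promise<ToolRunResult> {
  // Only bill successful hosted-key executions
  if (!result.success) return result
  await reportUsage(result.output)
  const cost = computeCost(result.output)
  // Attach cost to the output so callers see billing info on success
  return { ...result, output: { ...result.output, cost } }
}
```

Both call sites then reduce to a single awaited call, and any future change to the billing sequence happens in one place.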
)::numeric
) - ${requestedTokens}::numeric
ELSE ${rateLimitBucket.tokens}::numeric
ELSE -1
Token bucket denial discards actual remaining token balance
Low Severity
When a consume request is denied (insufficient tokens), the ELSE branch unconditionally sets stored tokens to -1, discarding the actual remaining balance. For example, if a bucket has 3 tokens and a request for 5 is denied, the balance drops to -1 instead of remaining at 3. On the next refill, available tokens become refillRate - 1 instead of 3 + refillRate, causing a permanent loss of those 3 tokens. The denial case ideally preserves the refilled balance without consuming.
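The preserving behavior can be sketched in plain TypeScript — a simplified in-memory model of the SQL bucket, not the PR's actual query:

```typescript
interface Bucket {
  tokens: number // last stored balance
  lastRefillMs: number
}

// Refill-then-consume: on denial, persist the refilled balance
// instead of a sentinel like -1, so unconsumed tokens are not lost.
function consumeTokens(
  bucket: Bucket,
  requested: number,
  capacity: number,
  refillPerMs: number,
  nowMs: number
): { allowed: boolean; tokensRemaining: number } {
  const refilled = Math.min(
    capacity,
    bucket.tokens + (nowMs - bucket.lastRefillMs) * refillPerMs
  )
  bucket.lastRefillMs = nowMs
  if (refilled >= requested) {
    bucket.tokens = refilled - requested
    return { allowed: true, tokensRemaining: bucket.tokens }
  }
  bucket.tokens = refilled // preserve balance on denial
  return { allowed: false, tokensRemaining: refilled }
}
```

With the `-1` sentinel, a bucket holding 3 tokens that denies a 5-token request loses those 3 tokens; with the sketch above, they remain available for a smaller follow-up request.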
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
const costDollars = output.__costDollars as { total?: number } | undefined
if (costDollars?.total) {
  return { cost: costDollars.total, metadata: { costDollars } }
}
Falsy check on cost causes incorrect fallback billing
Medium Severity
The getCost functions in all Exa tools use a truthiness check if (costDollars?.total) to decide whether to use the API-reported cost or a fallback estimate. Since 0 is falsy in JavaScript, a legitimate zero-cost response from Exa (e.g., cached results) would incorrectly fall through to the fallback pricing, billing the user a non-zero amount (e.g., $0.005) instead of $0. The check needs to be != null or !== undefined instead of a truthiness test.
Additional Locations (2)
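A minimal sketch of the corrected check (the helper name `resolveReportedCost` is illustrative, not the PR's code):

```typescript
interface CostDollars {
  total?: number
}

// Use a null/undefined check, not truthiness, so a reported $0 cost
// (e.g. a cached Exa response) is honored instead of falling back to
// the fallback estimate.
function resolveReportedCost(
  costDollars: CostDollars | undefined,
  fallbackCost: number
): number {
  if (costDollars?.total != null) return costDollars.total
  return fallbackCost
}
```

`!= null` catches both `null` and `undefined` while letting `0` through, which is exactly the distinction the truthiness check loses.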
if (status === 429 || status === 503) return true
}
return false
}
Retry treats 503 as rate-limit causing unnecessary retries
Low Severity
isRateLimitError classifies HTTP 503 (Service Unavailable) as a rate-limit error, causing executeWithRetry to retry with exponential backoff on genuine server outages. This delays error reporting by up to ~7 seconds (1s + 2s + 4s) for non-throttle 503s, and could mask persistent outages since the error message returned to the user would still be opaque. Only 429 reliably indicates rate limiting.



Summary
Added hosted API key integration for Exa search.
For hosted Sim, the API key sub-block no longer appears for Exa search. Users can provide their own key through the BYOK modal if needed. Non-hosted Sim behaves unchanged.
Added throttling per billing actor. Requests are throttled proactively, failing fast if they exceed limits (users can switch to BYOK if they need higher limits). The limit can be either requests-per-minute based or a custom limit derived from the API response. If custom, rate limiting is optimistic and only applies after execution completes.
Moved cost modification behavior to knowledge block directly instead of transforming each block separately.
Hosted key calls retry 3 times with exponential backoff to handle throttles from increased load, with alarms.
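The retry loop described above can be sketched as follows. This is an illustration of the approach, not the exact implementation; it follows the reviewer's suggestion of retrying only on 429, and the function and parameter names are assumptions:

```typescript
// Retry a request up to maxRetries times on HTTP 429, doubling the
// delay each attempt (1s, 2s, 4s by default). Other errors propagate
// immediately rather than being retried.
async function executeWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (error) {
      const status = (error as { status?: number }).status
      if (status !== 429 || attempt >= maxRetries) throw error
      const delay = baseDelayMs * 2 ** attempt // exponential backoff
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}
```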
Important: API keys need to be stored in prod and staging before merging:
- EXA_API_KEY_COUNT: the number of keys
- EXA_API_KEY_{number}: the API keys (traffic spread evenly via round robin)

Type of Change
Testing
Checklist
Screenshots/Videos