Skip to content

feat(hosted key): Add hosted key, exa as shared key#3221

Open
TheodoreSpeaks wants to merge 28 commits intostagingfrom
feat/sim-provided-key
Open

feat(hosted key): Add hosted key, exa as shared key#3221
TheodoreSpeaks wants to merge 28 commits intostagingfrom
feat/sim-provided-key

Conversation

@TheodoreSpeaks
Copy link
Collaborator

@TheodoreSpeaks TheodoreSpeaks commented Feb 14, 2026

Summary

Added hosted api key integration for exa search.
For hosted sim, api sub block no longer appears for exa search. Users can use the BYOK modal to pass an api key if needed. Non-hosted sim behaves unchanged.

Added throttling per billing actor. Requests are throttled proactively, failing fast if they exceed limits (users can switch to BYOK if they need higher limits). Can be either a limit based on requests per minute or a custom limit using the api response. If custom, rate limiting is optimistic and only applies after execution completes.

Moved cost modification behavior to knowledge block directly instead of transforming each block separately.

Hosted key calls retry 3 times with exponential backoff to handle throttles from increased load, with alarms.

Important: API keys need to be stored in prod and staging before merging:

  • EXA_API_KEY_COUNT number
  • EXA_API_KEY_{number} api keys (traffic spread evenly via round robin)

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

  • Tested non-host still has required api subblock.
  • Tested hosted sim does not have api subblock
  • Tested hosted key updates user stats and usage log tables correctly
  • Unit tested throttling handles correctly
  • Tested rate limiting works successfully for exa search with hosted key and does not apply when BYOK is applied.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

@vercel
Copy link

vercel bot commented Feb 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Mar 6, 2026 4:10am

Request Review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 14, 2026

Greptile Summary

This PR introduces hosted API key support for Exa search tools, allowing Sim to provide its own Exa API keys on the hosted platform so users don't need to supply their own. It includes a token-bucket rate limiter per billing actor (workspace), least-loaded key selection across a configurable key pool, exponential backoff retry on external rate limits, custom pricing calculation from Exa's costDollars response field, and a BYOK override path so users can always substitute their own key.

Key changes:

  • New ToolHostingConfig type on ToolConfig to declare hosted key behavior for any tool
  • HostedKeyRateLimiter with per-actor token buckets (DB-backed) and optional multi-dimension usage tracking
  • injectHostedKeyIfNeeded in executeTool handles BYOK check, rate limiting, and key selection before each request
  • Exa's API key subblock is now hidden in the block editor and serializer when running on hosted Sim (hideWhenHosted: true)
  • Cost and usage logging via logFixedUsage after each successful hosted-key execution
  • BYOKProviderId centralized to tools/types.ts and exa added as a BYOK provider

Issues found:

  • The ExaSearchResponse, ExaGetContentsResponse, ExaFindSimilarLinksResponse, and ExaAnswerResponse types declare costDollars? (no underscore) but the implementations set _costDollars (with underscore); the typed field is never populated
  • stripInternalFields is applied unconditionally to all tool outputs, not just hosted-key paths — this silently drops any underscore-prefixed field across the entire tool registry
  • isRateLimitError treats HTTP 503 as retriable, which can cause 3 unnecessary retry attempts (up to ~7 seconds total delay) when Exa returns a genuine service-unavailable response
  • The in-memory keyRequestCounts singleton provides no load distribution across serverless instances; each cold-started instance independently routes all traffic to key index 0 first
  • lib/api-key/byok.ts now imports from tools/types.ts, inverting the expected libtools dependency direction
  • The post-execution cost block is copy-pasted into two separate code paths rather than extracted into a shared helper

Confidence Score: 3/5

  • Merging is relatively safe but two correctness issues should be addressed before or shortly after merge: the type mismatch on Exa response types and the unconditional stripping of underscore-prefixed fields from all tool outputs.
  • The core rate-limiting, BYOK fallback, and cost-logging logic is sound and well-tested. However, the stripInternalFields function being applied to all tool executions (not just hosted-key paths) is a silent breaking change for any existing tool with underscore-prefixed output fields. The type mismatch between costDollars and _costDollars in the Exa response interfaces is also a correctness issue. The in-memory singleton and 503-retry concerns are operational risks rather than launch blockers.
  • apps/sim/tools/index.ts (unconditional field stripping + duplicated cost block), apps/sim/tools/exa/types.ts (costDollars vs _costDollars type mismatch)

Important Files Changed

Filename Overview
apps/sim/tools/index.ts Core hosted key injection logic added; contains two issues: stripInternalFields applied unconditionally to all tools (not just hosted-key paths), and the post-execution cost block is duplicated across internal/external code paths. isRateLimitError also retries on 503 which may cause excessive delays on genuine outages.
apps/sim/tools/exa/types.ts Type declarations add costDollars? (without underscore) to response interfaces, but all four Exa tool implementations set _costDollars (with underscore). The declared field is never populated; the actual internal field is untyped.
apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts Well-structured rate limiter with per-actor token buckets and multi-dimension support; the in-memory keyRequestCounts singleton will not distribute load correctly in a multi-instance serverless environment.
apps/sim/tools/types.ts New hosting config types (ToolHostingConfig, PerRequestPricing, CustomPricing) and BYOKProviderId are well-defined; moving BYOKProviderId here causes lib modules to import from tools, inverting the typical dependency direction.
apps/sim/lib/api-key/byok.ts Simple change to import BYOKProviderId from tools/types instead of defining it locally; this introduces a libtools dependency that inverts the expected module hierarchy.
apps/sim/blocks/blocks/exa.ts Correctly splits the apiKey subblock into two variants: one with hideWhenHosted: true for non-research operations, and one always-visible for exa_research (which has no hosted key support).
apps/sim/lib/workflows/subblocks/visibility.ts Clean addition of isSubBlockHiddenByHostedKey utility; correctly reads the compile-time isHosted flag.
apps/sim/lib/core/rate-limiter/storage/db-token-bucket.ts Changes ELSE clause in token-bucket SQL from returning the current token count (without consuming) to returning -1, fixing a silent bug where tokens weren't decremented when no refill was due.
apps/sim/tools/exa/search.ts Adds hosting config with custom pricing (using _costDollars from Exa API response) and per-request rate limiting; fallback pricing logic for when _costDollars is missing looks reasonable.
apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts Comprehensive test coverage for acquireKey and reportUsage including rate limiting, dimension exhaustion, partial key availability, and error handling edge cases.
apps/sim/executor/handlers/generic/generic-handler.ts Removes the knowledge-block-specific cost transformation from the generic handler; this logic has been moved into tools/knowledge/search.ts and upload_chunk.ts where it belongs.
apps/sim/lib/billing/core/usage-log.ts Widens FixedUsageMetadata from Record<string, never> to Record<string, unknown> and adds an optional metadata param to LogFixedUsageParams — straightforward extensibility change.

Sequence Diagram

sequenceDiagram
    participant User as User/Workflow
    participant ET as executeTool()
    participant BYOK as getBYOKKey()
    participant RL as HostedKeyRateLimiter
    participant DB as DB Token Bucket
    participant API as External API (Exa)

    User->>ET: executeTool(toolId, params, ctx)
    ET->>ET: injectHostedKeyIfNeeded()

    alt Tool has hosting config AND isHosted=true
        ET->>BYOK: check workspace BYOK key
        alt BYOK key found
            BYOK-->>ET: user key (no billing)
        else No BYOK key
            ET->>RL: acquireKey(provider, prefix, config, workspaceId)
            RL->>DB: consumeTokens(actor bucket, 1 token)
            DB-->>RL: allowed or denied
            alt Actor rate limited
                RL-->>ET: billingActorRateLimited=true
                ET-->>User: Error 429 (rate limited)
            else No hosted keys available
                RL-->>ET: success=false
                ET-->>User: Error 503 (no keys)
            else Key acquired
                RL-->>ET: hosted key value
                ET->>ET: inject key into params
            end
        end
    end

    ET->>API: executeWithRetry(executeToolRequest)
    loop Retry up to 3x on 429 or 503
        API-->>ET: 429 Too Many Requests
        ET->>ET: exponential backoff delay
    end
    API-->>ET: 200 OK with response data

    alt isUsingHostedKey AND success
        ET->>RL: reportUsage() for custom dimensions
        ET->>ET: calculateToolCost() via pricing config
        ET->>DB: logFixedUsage() for audit trail
        ET->>ET: attach cost to output
    end

    ET->>ET: stripInternalFields() remove underscore fields
    ET-->>User: ToolResult (cost attached if hosted key used)
Loading

Comments Outside Diff (2)

  1. apps/sim/tools/exa/types.ts, line 47-50 (link)

    Type mismatch: costDollars vs _costDollars

    The response type declarations add costDollars? (without underscore), but all four Exa tool implementations (search.ts, get_contents.ts, find_similar_links.ts, answer.ts) set _costDollars (with underscore) in their transformResponse output. The declared field is never actually set, while the real internal field is not typed.

    This mismatch means any TypeScript consumer trusting the type annotation for ExaSearchResponse.output.costDollars will find it always undefined, while the actual internal field (_costDollars) that drives billing is untyped.

    The costDollars? field in the type interface should be changed to _costDollars? to match the implementation, or omitted entirely since it's an internal field stripped before returning to callers.

    This same issue appears in ExaGetContentsResponse, ExaFindSimilarLinksResponse, and ExaAnswerResponse.

  2. apps/sim/lib/api-key/byok.ts, line 18 (link)

    lib importing from tools inverts the intended dependency direction

    lib/api-key/byok.ts now imports BYOKProviderId from @/tools/types. Library code in lib/ is typically consumed by tools/, not the other way around. This creates an inward dependency (libtools) that could complicate future refactoring and makes the lib/ module less reusable independently of the tools layer.

    Since BYOKProviderId is conceptually a cross-cutting type (used by both the BYOK mechanism and tool definitions), it would be better placed in a shared types module — e.g., lib/api-key/types.ts — and imported from there by both lib/api-key/byok.ts and tools/types.ts.

Last reviewed commit: a90777a

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

22 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@TheodoreSpeaks TheodoreSpeaks changed the title feat(hosted key): Add hosted key, exa as shared key4. feat(hosted key): Add hosted key, exa as shared key Feb 16, 2026
@TheodoreSpeaks TheodoreSpeaks marked this pull request as draft March 5, 2026 19:30
@TheodoreSpeaks TheodoreSpeaks marked this pull request as ready for review March 6, 2026 03:22
@cursor
Copy link

cursor bot commented Mar 6, 2026

PR Summary

High Risk
Touches core tool execution, billing usage logging, and rate-limiting storage semantics; mistakes could cause incorrect billing, throttling, or API key leakage/usage across workspaces.

Overview
Adds first-class hosted API key support to tool execution: tools can declare hosting config, and on hosted deployments executeTool now injects a workspace BYOK key if present or else selects a shared env key (round-robin) with per-workspace rate limiting, 429/503 handling, and exponential-backoff retries; hosted-key executions also log fixed-usage costs (with optional metadata) and strip internal __* fields from outputs.

Extends BYOK support to the new exa provider (API + settings UI), and introduces hideWhenHosted subblock gating so hosted deployments hide tool API-key fields in the editor, block preview, and serializer; the Exa block/tool configs are updated accordingly, including capturing Exa response cost breakdown for billing.

Refactors cost shaping: removes generic-handler output cost extraction and instead restructures cost/token/model fields in knowledge tool transformResponse; also broadens FixedUsageMetadata and adjusts DB token-bucket behavior to return a sentinel on insufficient tokens, plus adds telemetry + tests for the new hosted-key rate limiter and tool hosting behavior.

Written by Cursor Bugbot for commit 34cffdc. This will update automatically on new commits. Configure here.

throw error
}

const output = result.output
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to knowledge block transformation, so generic handler doesn't need to handle special cases

)::numeric
) - ${requestedTokens}::numeric
ELSE ${rateLimitBucket.tokens}::numeric
ELSE -1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DB rate limiting didn't work since tokens was always 0. Set the else case to -1 so rate limiting applies successfully.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for 2 of the 3 issues found in the latest run.

  • ✅ Fixed: Key request counters grow unbounded, never decrement
    • Added key release/decrement logic in HostedKeyRateLimiter and wired executeTool to always release acquired hosted keys in a finally block so counters track in-flight load.
  • ✅ Fixed: Hosted key injection ignores user-provided API key
    • injectHostedKeyIfNeeded now returns early when the configured API-key parameter is already present, preserving user-supplied keys in hosted mode.

Create PR

Or push these changes by commenting:

@cursor push 80f7b70ba9
Preview (80f7b70ba9)
diff --git a/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts b/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts
--- a/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts
+++ b/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts
@@ -155,6 +155,41 @@
       expect(r4.keyIndex).toBe(0) // Wraps back
     })
 
+    it('should rebalance key selection after releasing an in-flight key', async () => {
+      const allowedResult: ConsumeResult = {
+        allowed: true,
+        tokensRemaining: 9,
+        resetAt: new Date(Date.now() + 60000),
+      }
+      mockAdapter.consumeTokens.mockResolvedValue(allowedResult)
+
+      const r1 = await rateLimiter.acquireKey(
+        testProvider,
+        envKeyPrefix,
+        perRequestRateLimit,
+        'workspace-1'
+      )
+      const r2 = await rateLimiter.acquireKey(
+        testProvider,
+        envKeyPrefix,
+        perRequestRateLimit,
+        'workspace-2'
+      )
+
+      expect(r1.keyIndex).toBe(0)
+      expect(r2.keyIndex).toBe(1)
+
+      rateLimiter.releaseKey(testProvider, 0)
+
+      const r3 = await rateLimiter.acquireKey(
+        testProvider,
+        envKeyPrefix,
+        perRequestRateLimit,
+        'workspace-3'
+      )
+      expect(r3.keyIndex).toBe(0)
+    })
+
     it('should handle partial key availability', async () => {
       const allowedResult: ConsumeResult = {
         allowed: true,

diff --git a/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts b/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts
--- a/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts
+++ b/apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts
@@ -52,7 +52,7 @@
  */
 export class HostedKeyRateLimiter {
   private storage: RateLimitStorageAdapter
-  /** In-memory request counters per key: "provider:keyIndex" -> count */
+  /** In-memory in-flight request counters per key: "provider:keyIndex" -> count */
   private keyRequestCounts = new Map<string, number>()
 
   constructor(storage?: RateLimitStorageAdapter) {
@@ -346,6 +346,21 @@
     const key = `${provider}:${keyIndex}`
     this.keyRequestCounts.set(key, (this.keyRequestCounts.get(key) ?? 0) + 1)
   }
+
+  private decrementKeyCount(provider: string, keyIndex: number): void {
+    const key = `${provider}:${keyIndex}`
+    const current = this.keyRequestCounts.get(key)
+    if (!current) return
+    if (current <= 1) {
+      this.keyRequestCounts.delete(key)
+      return
+    }
+    this.keyRequestCounts.set(key, current - 1)
+  }
+
+  releaseKey(provider: string, keyIndex: number): void {
+    this.decrementKeyCount(provider, keyIndex)
+  }
 }
 
 let cachedInstance: HostedKeyRateLimiter | null = null

diff --git a/apps/sim/tools/index.test.ts b/apps/sim/tools/index.test.ts
--- a/apps/sim/tools/index.test.ts
+++ b/apps/sim/tools/index.test.ts
@@ -26,6 +26,7 @@
       acquireKey: vi.fn(),
       preConsumeCapacity: vi.fn(),
       consumeCapacity: vi.fn(),
+      releaseKey: vi.fn(),
     },
   })
 )
@@ -1338,6 +1339,73 @@
     Object.assign(tools, originalTools)
   })
 
+  it('should not overwrite user-provided API key in hosted mode', async () => {
+    mockIsHosted.value = true
+    const mockTool = {
+      id: 'test_user_key_hosted',
+      name: 'Test User Key Hosted',
+      description: 'A test tool with hosting config',
+      version: '1.0.0',
+      params: {
+        apiKey: { type: 'string', required: true, visibility: 'user-only' as const },
+      },
+      hosting: {
+        envKeyPrefix: 'TEST_API',
+        apiKeyParam: 'apiKey',
+        byokProviderId: 'exa',
+        pricing: {
+          type: 'per_request' as const,
+          cost: 0.005,
+        },
+        rateLimit: {
+          mode: 'per_request' as const,
+          requestsPerMinute: 100,
+        },
+      },
+      request: {
+        url: '/api/test/endpoint',
+        method: 'POST' as const,
+        headers: (params: any) => ({
+          'Content-Type': 'application/json',
+          'x-api-key': params.apiKey,
+        }),
+      },
+      transformResponse: vi.fn().mockResolvedValue({
+        success: true,
+        output: { result: 'success' },
+      }),
+    }
+
+    const originalTools = { ...tools }
+    ;(tools as any).test_user_key_hosted = mockTool
+
+    global.fetch = Object.assign(
+      vi.fn().mockImplementation(async () => ({
+        ok: true,
+        status: 200,
+        headers: new Headers(),
+        json: () => Promise.resolve({ success: true }),
+      })),
+      { preconnect: vi.fn() }
+    ) as typeof fetch
+
+    const mockContext = createToolExecutionContext()
+    const result = await executeTool(
+      'test_user_key_hosted',
+      { apiKey: 'user-provided-key' },
+      false,
+      mockContext
+    )
+
+    expect(result.success).toBe(true)
+    expect(mockGetBYOKKey).not.toHaveBeenCalled()
+    expect(mockRateLimiterFns.acquireKey).not.toHaveBeenCalled()
+    expect(mockRateLimiterFns.releaseKey).not.toHaveBeenCalled()
+
+    Object.assign(tools, originalTools)
+    mockIsHosted.value = false
+  })
+
   it('should use per_request pricing model correctly', async () => {
     const mockTool = {
       id: 'test_per_request_pricing',

diff --git a/apps/sim/tools/index.ts b/apps/sim/tools/index.ts
--- a/apps/sim/tools/index.ts
+++ b/apps/sim/tools/index.ts
@@ -40,6 +40,8 @@
 interface HostedKeyInjectionResult {
   isUsingHostedKey: boolean
   envVarName?: string
+  provider?: string
+  keyIndex?: number
 }
 
 /**
@@ -57,6 +59,10 @@
   if (!isHosted) return { isUsingHostedKey: false }
 
   const { envKeyPrefix, apiKeyParam, byokProviderId, rateLimit } = tool.hosting
+  const existingApiKey = params[apiKeyParam]
+  if (existingApiKey !== undefined && existingApiKey !== null && existingApiKey !== '') {
+    return { isUsingHostedKey: false }
+  }
 
   // Check BYOK workspace key first
   if (byokProviderId && executionContext?.workspaceId) {
@@ -132,6 +138,8 @@
   return {
     isUsingHostedKey: true,
     envVarName: acquireResult.envVarName,
+    provider,
+    keyIndex: acquireResult.keyIndex,
   }
 }
 
@@ -547,6 +555,7 @@
   const startTime = new Date()
   const startTimeISO = startTime.toISOString()
   const requestId = generateRequestId()
+  let hostedKeyInfo: HostedKeyInjectionResult = { isUsingHostedKey: false }
 
   try {
     let tool: ToolConfig | undefined
@@ -615,7 +624,7 @@
     }
 
     // Inject hosted API key if tool supports it and user didn't provide one
-    const hostedKeyInfo = await injectHostedKeyIfNeeded(
+    hostedKeyInfo = await injectHostedKeyIfNeeded(
       tool,
       contextParams,
       executionContext,
@@ -930,6 +939,18 @@
         duration,
       },
     }
+  } finally {
+    if (
+      hostedKeyInfo.isUsingHostedKey &&
+      hostedKeyInfo.provider &&
+      typeof hostedKeyInfo.keyIndex === 'number'
+    ) {
+      try {
+        getHostedKeyRateLimiter().releaseKey(hostedKeyInfo.provider, hostedKeyInfo.keyIndex)
+      } catch (error) {
+        logger.warn(`[${requestId}] Failed to release hosted key for ${toolId}:`, error)
+      }
+    }
   }
 }
This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

contextParams,
executionContext,
requestId
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required param validation runs before hosted key injection

High Severity

validateRequiredParametersAfterMerge is called on line 610 before injectHostedKeyIfNeeded on line 618. In hosted mode, the UI hides the apiKey subblock via hideWhenHosted: true and the serializer skips it via isSubBlockHiddenByHostedKey, so contextParams won't contain apiKey. But all Exa tools declare apiKey as required: true in their params. Validation would reject the missing required param before the hosted key ever gets a chance to be injected. This likely breaks all Exa hosted-key tool executions.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api key is passed via hosted key. Confirmed hosted key does pass validation and execute in local. This is not an issue.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bugbot Autofix determined this is a false positive.

validateRequiredParametersAfterMerge only enforces required parameters with visibility: 'user-or-llm', so Exa's required apiKey (visibility: 'user-only') is not rejected before hosted-key injection.

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

requestId: string
): Promise<HostedKeyInjectionResult> {
if (!tool.hosting) return { isUsingHostedKey: false }
if (!isHosted) return { isUsingHostedKey: false }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hosted key injection ignores user-provided API key

Low Severity

injectHostedKeyIfNeeded is documented as injecting a key only "if user didn't provide one," but it never checks whether params[apiKeyParam] is already populated. In hosted mode, if a user provides their own API key (e.g., via direct API call), the function unconditionally overwrites it with either a BYOK key or a hosted key, and the user gets billed for hosted key usage despite supplying their own.

Fix in Cursor Fix in Web

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api key cannot be provided in the ui if sim is hosted. Therefore, this condition will never occur.

Comment on lines 614 to +620
throw new Error(`Tool not found: ${toolId}`)
}

// Inject hosted API key if tool supports it and user didn't provide one
const hostedKeyInfo = await injectHostedKeyIfNeeded(
tool,
contextParams,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stripInternalFields is applied unconditionally to all tools

stripInternalFields removes all output keys starting with _ and is applied to every tool execution regardless of whether a hosted key was used. This is a behavior change that could silently break any existing tool whose output contains underscore-prefixed fields.

For example, if any other tool in the registry returns a field like _metadata, _debug, or _raw in its output, those fields would now be silently dropped without any warning or error.

The stripping should only occur when a hosted key was actually used (or at minimum scoped to Exa tools), since the underscore-prefix convention for internal fields is only established in this PR:

// Strip internal fields only when using hosted key
if (hostedKeyInfo.isUsingHostedKey) {
  const strippedOutput = stripInternalFields(finalResult.output || {})
  finalResult = { ...finalResult, output: strippedOutput }
}

Comment on lines +105 to +109
maxRetries: 0,
delayMs: acquireResult.retryAfterMs ?? 0,
userId: executionContext?.userId,
workspaceId: executionContext?.workspaceId,
workflowId: executionContext?.workflowId,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

503 used as a rate-limit retry signal — may cause excessive retries on genuine service failures

isRateLimitError treats HTTP 503 (Service Unavailable) as a retriable rate-limit error. This means if the Exa API is genuinely down (and returns 503), executeWithRetry will retry 3 times with exponential backoff before surfacing the error to the user. Each retry could incur up to an 8-second wait (1s + 2s + 4s), plus the actual API timeout, causing a noticeably slow failure.

HTTP 503 is sometimes used for rate limiting, but it can also indicate a real outage. Exa's actual rate-limit response is 429. Consider restricting retries to 429 only, unless there's a confirmed reason to expect 503 from Exa as a throttle signal:

Suggested change
maxRetries: 0,
delayMs: acquireResult.retryAfterMs ?? 0,
userId: executionContext?.userId,
workspaceId: executionContext?.workspaceId,
workflowId: executionContext?.workflowId,
function isRateLimitError(error: unknown): boolean {
if (error && typeof error === 'object') {
const status = (error as { status?: number }).status
if (status === 429) return true
}
return false
}

Comment on lines +1 to +10
import { createLogger } from '@sim/logger'
import {
createStorageAdapter,
type RateLimitStorageAdapter,
type TokenBucketConfig,
} from '@/lib/core/rate-limiter/storage'
import {
type AcquireKeyResult,
type CustomRateLimit,
DEFAULT_BURST_MULTIPLIER,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In-memory keyRequestCounts singleton won't distribute load across serverless instances

The keyRequestCounts Map and cachedInstance singleton are in-process memory. In a serverless/edge deployment (e.g., Vercel), each function invocation may run in an isolated container, so the least-loaded key selection only tracks requests seen by that specific instance. Different instances will independently increment their counters starting from 0, giving no guarantee of even distribution across the key pool.

The PR description says keys are distributed "via round robin", but the implementation is "least loaded in memory." In practice, each cold-started instance will always pick key index 0 first, which concentrates traffic on the first key until that instance warms up.

Consider replacing the in-memory counter with a distributed counter (e.g., using the existing RateLimitStorageAdapter) so load is balanced globally, or document this limitation explicitly.

Comment on lines +110 to +154
})

const error = new Error(acquireResult.error || `Rate limit exceeded for ${tool.id}`)
;(error as any).status = 429
;(error as any).retryAfterMs = acquireResult.retryAfterMs
throw error
}

// Handle no keys configured (503)
if (!acquireResult.success) {
logger.error(`[${requestId}] No hosted keys configured for ${tool.id}: ${acquireResult.error}`)
const error = new Error(acquireResult.error || `No hosted keys configured for ${tool.id}`)
;(error as any).status = 503
throw error
}

params[apiKeyParam] = acquireResult.key
logger.info(`[${requestId}] Using hosted key for ${tool.id} (${acquireResult.envVarName})`, {
keyIndex: acquireResult.keyIndex,
provider,
})

return {
isUsingHostedKey: true,
envVarName: acquireResult.envVarName,
}
}

/**
* Check if an error is a rate limit (throttling) error
*/
function isRateLimitError(error: unknown): boolean {
if (error && typeof error === 'object') {
const status = (error as { status?: number }).status
// 429 = Too Many Requests, 503 = Service Unavailable (sometimes used for rate limiting)
if (status === 429 || status === 503) return true
}
return false
}

/** Context for retry with rate limit tracking */
interface RetryContext {
requestId: string
toolId: string
envVarName: string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-execution cost & usage block is duplicated across two code paths

The block that calls reportCustomDimensionUsage, processHostedKeyCost, and applies the cost to finalResult.output appears twice — once in the internal-route code path and once in the external-route code path. This adds ~40 lines of identical logic that must be kept in sync manually.

Consider extracting this into a shared helper (e.g., applyHostedKeyCostToResult) that both paths call after finalResult is available.

)::numeric
) - ${requestedTokens}::numeric
ELSE ${rateLimitBucket.tokens}::numeric
ELSE -1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Token bucket denial discards actual remaining token balance

Low Severity

When a consume request is denied (insufficient tokens), the ELSE branch unconditionally sets stored tokens to -1, discarding the actual remaining balance. For example, if a bucket has 3 tokens and a request for 5 is denied, the balance drops to -1 instead of remaining at 3. On the next refill, available tokens become refillRate - 1 instead of 3 + refillRate, causing a permanent loss of those 3 tokens. The denial case ideally preserves the refilled balance without consuming.

Fix in Cursor Fix in Web

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

const costDollars = output.__costDollars as { total?: number } | undefined
if (costDollars?.total) {
return { cost: costDollars.total, metadata: { costDollars } }
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Falsy check on cost causes incorrect fallback billing

Medium Severity

The getCost functions in all Exa tools use a truthiness check if (costDollars?.total) to decide whether to use the API-reported cost or a fallback estimate. Since 0 is falsy in JavaScript, a legitimate zero-cost response from Exa (e.g., cached results) would incorrectly fall through to the fallback pricing, billing the user a non-zero amount (e.g., $0.005) instead of $0. The check needs to be != null or !== undefined instead of a truthiness test.

Additional Locations (2)

Fix in Cursor Fix in Web

if (status === 429 || status === 503) return true
}
return false
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retry treats 503 as rate-limit causing unnecessary retries

Low Severity

isRateLimitError classifies HTTP 503 (Service Unavailable) as a rate-limit error, causing executeWithRetry to retry with exponential backoff on genuine server outages. This delays error reporting by up to ~7 seconds (1s + 2s + 4s) for non-throttle 503s, and could mask persistent outages since the error message returned to the user would still be opaque. Only 429 reliably indicates rate limiting.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant