
Add support for LLM middleware #1872

Open

tkattkat wants to merge 6 commits into main from Add-support-for-LLM-middleware

Conversation

Collaborator

@tkattkat tkattkat commented Mar 23, 2026

why

  • In some scenarios, users need more granular control over, and access to, the LLM at the provider level

what changed

  • Added support for passing custom "middleware" in the model options

test plan

  • wrote tests
  • tested locally with own middleware

Summary by cubic

Adds first-class LLM middleware to intercept and wrap model calls for logging, usage tracking, or request transforms without changing call sites. Per-call middleware is used only for that call and is not cached.

  • New Features

    • Accepts middleware: LanguageModelV2Middleware in V3 model config; LLMProvider and getAISDKLanguageModel wrap models via wrapLanguageModel from ai.
    • Supports per-call overrides via model options; override middleware is passed through LLMProvider.getClient, applies only to that call, and is never cached.
    • Works across providers and preserves modelId, doGenerate, and doStream. Tests cover usage capture, streaming, errors, chaining, and edge cases.
  • Migration

    • To use: pass middleware in model config or a per-call override, e.g. { modelName: 'openai/gpt-4o', apiKey: '...', middleware }.
    • Middleware is only applied in direct/local execution and is not serialized over HTTP.
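The migration notes above can be sketched in code. The following is a minimal, self-contained illustration of the middleware shape and pass-through behavior; the real type is `LanguageModelV2Middleware` and the real wrapping is done by `wrapLanguageModel` from the AI SDK, while the tiny harness below only simulates that for clarity (all names here are illustrative, not the project's API):

```typescript
// Hypothetical minimal stand-ins for the AI SDK middleware shape.
type GenerateResult = { text: string; usage: { totalTokens: number } };
type GenerateFn = () => Promise<GenerateResult>;

interface MiddlewareSketch {
  wrapGenerate?: (opts: { doGenerate: GenerateFn }) => Promise<GenerateResult>;
}

// A logging middleware that records token usage without altering the result.
const usageLog: number[] = [];
const loggingMiddleware: MiddlewareSketch = {
  async wrapGenerate({ doGenerate }) {
    const result = await doGenerate();
    usageLog.push(result.usage.totalTokens); // side effect: track usage
    return result;                           // pass the result through unchanged
  },
};

// Simulates how a provider might apply the middleware around a model call.
async function callModel(mw?: MiddlewareSketch): Promise<string> {
  // Stand-in for the underlying provider call.
  const doGenerate: GenerateFn = async () => ({ text: "ok", usage: { totalTokens: 42 } });
  const result = mw?.wrapGenerate
    ? await mw.wrapGenerate({ doGenerate })
    : await doGenerate();
  return result.text;
}

callModel(loggingMiddleware).then((text) => {
  console.log(text, usageLog[0]); // prints "ok 42"
});
```

In the actual config this would be passed as the `middleware` field shown above, e.g. `{ modelName: 'openai/gpt-4o', apiKey: '...', middleware }`.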

Written for commit b09e42c. Summary will update on new commits. Review in cubic


changeset-bot bot commented Mar 23, 2026

🦋 Changeset detected

Latest commit: b09e42c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages
Name Type
@browserbasehq/stagehand Patch
@browserbasehq/browse-cli Patch
@browserbasehq/stagehand-evals Patch
@browserbasehq/stagehand-server-v3 Patch
@browserbasehq/stagehand-server-v4 Patch


@tkattkat tkattkat marked this pull request as draft March 23, 2026 18:25

@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 5 files

Confidence score: 3/5

  • In packages/core/lib/v3/v3.ts, per-request model.middleware passed via act/extract/observe overrides is being discarded, so middleware that callers expect to run will silently not execute.
  • The tests in packages/core/tests/unit/llm-middleware.test.ts validate JavaScript semantics (typeof, destructuring) and simulated internals more than app behavior, so they may give false confidence and miss this regression path.
  • Given the concrete runtime impact and high confidence on the middleware issue, this PR carries some merge risk until that path is corrected or covered by behavior-level tests.
  • Pay close attention to packages/core/lib/v3/v3.ts, packages/core/tests/unit/llm-middleware.test.ts - middleware override flow appears broken and current tests may not reliably detect it.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/tests/unit/llm-middleware.test.ts">

<violation number="1" location="packages/core/tests/unit/llm-middleware.test.ts:284">
P2: These tests verify JavaScript language semantics (destructuring and `typeof` operator) rather than application logic. Simulating internal functions provides a false sense of test coverage.</violation>
</file>
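The violation above is asking for a behavior-level test: one that asserts the middleware actually runs when a call goes through the production path, rather than checking `typeof` or destructuring semantics. A hedged sketch of that style of test, using only illustrative names (`makeClient` here is a stand-in, not the project's factory):

```typescript
// Observable side effect to assert on at the behavior level.
let middlewareRan = false;

const middleware = {
  async wrapGenerate({ doGenerate }: { doGenerate: () => Promise<string> }) {
    middlewareRan = true; // proves the middleware executed
    return doGenerate();
  },
};

// Hypothetical stand-in for the production factory that should apply
// the middleware; a real test would call the actual resolver instead.
function makeClient(mw?: typeof middleware) {
  const base = async () => "result";
  return {
    doGenerate: () => (mw ? mw.wrapGenerate({ doGenerate: base }) : base()),
  };
}

async function testMiddlewareRuns(): Promise<void> {
  middlewareRan = false;
  const client = makeClient(middleware);
  const out = await client.doGenerate();
  if (out !== "result") throw new Error("unexpected result");
  if (!middlewareRan) throw new Error("middleware did not run"); // behavior-level assertion
}

testMiddlewareRuns();
```

The key difference from the criticized tests is that the assertion fails if the wiring is broken, not just if JavaScript itself misbehaves.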

<file name="packages/core/lib/v3/v3.ts">

<violation number="1" location="packages/core/lib/v3/v3.ts:452">
P2: Per-request `model.middleware` is silently discarded, so middleware passed via `act/extract/observe` model overrides never runs.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant User
    participant SH as Stagehand (V3)
    participant LP as LLMProvider
    participant AISDK as AI SDK (Internal)
    participant MW as Middleware (User Provided)
    participant Provider as Model Provider (OpenAI/Anthropic/etc.)

    Note over User,Provider: Initialization Flow

    User->>SH: new Stagehand({ model: { modelName, middleware, ... } })
    SH->>SH: CHANGED: resolveModelConfiguration()
    Note right of SH: Extracts middleware from model object
    SH->>LP: NEW: new LLMProvider(logger, middleware)

    Note over User,Provider: Runtime Request Flow (e.g., .act(), .extract())

    SH->>LP: getClient(modelName)
    LP->>AISDK: CHANGED: getAISDKLanguageModel(..., middleware)
    
    AISDK->>AISDK: Create base provider model
    
    alt NEW: Middleware provided
        AISDK->>AISDK: wrapLanguageModel({ model, middleware })
        Note right of AISDK: Applies AI SDK v2 Middleware
    end
    
    AISDK-->>LP: Returns (Wrapped) LanguageModelV2
    LP-->>SH: Returns AISdkClient

    SH->>LP: client.doGenerate(prompt)
    
    alt NEW: Middleware Active
        LP->>MW: wrapGenerate({ doGenerate, model, ... })
        MW->>Provider: doGenerate()
        Provider-->>MW: Raw result + Usage
        MW-->>LP: Result (intercepted/transformed)
    else No Middleware
        LP->>Provider: doGenerate()
        Provider-->>LP: Raw result
    end

    LP-->>SH: Return inference result
    SH-->>User: Return action/data

    Note over MW,Provider: Middleware only applies to local execution.<br/>Not serialized over HTTP/API.

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@tkattkat tkattkat marked this pull request as ready for review March 23, 2026 20:18

@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 5 files

Confidence score: 3/5

  • There is a concrete regression risk in packages/core/lib/v3/v3.ts: per-call middleware cache key collisions can cause one middleware to reuse another middleware’s cached LLM client, leading to incorrect runtime behavior.
  • The issue is high-confidence and user-facing (severity 7/10, confidence 9/10), which moves this PR into moderate merge risk rather than a low-risk merge.
  • The test concern in packages/core/tests/unit/llm-middleware.test.ts is lower severity but important: a tautological test won’t validate the real resolver path, so this bug class may slip through CI.
  • Pay close attention to packages/core/lib/v3/v3.ts and packages/core/tests/unit/llm-middleware.test.ts - fix cache key isolation and add a resolver-driven test to prevent recurrence.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/v3.ts">

<violation number="1" location="packages/core/lib/v3/v3.ts:486">
P1: Per-call middleware cache keys collide, so different middleware functions can reuse the wrong cached LLM client.</violation>
</file>

<file name="packages/core/tests/unit/llm-middleware.test.ts">

<violation number="1" location="packages/core/tests/unit/llm-middleware.test.ts:296">
P2: This test is tautological: it only destructures a local object instead of exercising the actual model-configuration resolver, so it won’t catch regressions in production logic.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant User as User/Caller
    participant V3 as V3 Engine
    participant LP as LLMProvider
    participant SDK as AI SDK (External)
    participant MW as Middleware (User Provided)

    Note over User, MW: Initialization Flow (Global Middleware)

    User->>V3: new V3({ model: { ..., middleware } })
    V3->>V3: resolveModelConfiguration()
    V3->>LP: CHANGED: new LLMProvider(logger, middleware)
    V3->>LP: getClient(modelName, clientOptions)
    LP->>LP: getAISDKLanguageModel()
    LP->>SDK: NEW: wrapLanguageModel({ model, middleware })
    SDK-->>LP: Wrapped Model instance
    LP-->>V3: AISdkClient (using wrapped model)

    Note over User, MW: Execution Flow (Per-Call Override)

    User->>V3: act / extract / observe ({ model: { ..., middleware: overrideMW } })
    V3->>V3: NEW: Check overrideLlmClients cache (incl. middleware state)
    alt Cache Miss
        V3->>LP: getClient(..., { middleware: overrideMW })
        LP->>LP: NEW: effectiveMiddleware = overrideMW ?? globalMiddleware
        LP->>SDK: wrapLanguageModel({ model, effectiveMiddleware })
        SDK-->>LP: Wrapped Model instance
        LP-->>V3: AISdkClient
        V3->>V3: NEW: Cache client by model + options + middleware
    end

    Note over V3, MW: Runtime Interaction

    V3->>LP: llmClient.doGenerate(prompt)
    LP->>MW: NEW: wrapGenerate({ doGenerate, model })
    MW->>MW: Custom logic (Logging / Transformation)
    MW->>SDK: doGenerate()
    SDK-->>MW: Result + Usage Data
    MW-->>LP: Result
    LP-->>V3: Final Response


@miguelg719
Collaborator

@cubic-dev-ai

Contributor

cubic-dev-ai bot commented Mar 26, 2026


@miguelg719 I have started the AI code review. It will take a few minutes to complete.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 5 files

Confidence score: 3/5

  • There is a concrete regression risk in packages/core/lib/v3/v3.ts: cached clients in overrideLlmClients are keyed by model name + client options, so calls with different per-call middleware can incorrectly reuse an earlier client.
  • Because this issue is medium severity (6/10) with fairly solid confidence (7/10) and can change runtime behavior across requests, this carries some user-impacting risk if merged as-is.
  • This is likely straightforward to address by aligning cache keying with middleware-sensitive inputs; once fixed, the PR should be in safer shape to merge.
  • Pay close attention to packages/core/lib/v3/v3.ts - client cache reuse may ignore middleware differences and apply the wrong middleware path.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/v3.ts">

<violation number="1" location="packages/core/lib/v3/v3.ts:499">
P2: Per-call middleware is passed to getClient, but overrideLlmClients is still keyed only by modelName + clientOptions. That means a later call with a different middleware can reuse the cached client from the first call, so the override is ignored.</violation>
</file>
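The fix the review suggests, aligning cache keying with middleware-sensitive inputs, can be sketched as follows. Names like `getClient` mirror the review comment, but the implementation is purely illustrative, not the project's code; one plausible approach is to assign each middleware object a stable identity token, since functions and closures cannot be meaningfully serialized into a string key:

```typescript
type Middleware = object;

const clientCache = new Map<string, { id: number }>();
const middlewareIds = new WeakMap<Middleware, number>();
let nextId = 0;
let built = 0;

// Assign each middleware object a stable identity token for the cache key.
function middlewareKey(mw?: Middleware): string {
  if (!mw) return "none";
  if (!middlewareIds.has(mw)) middlewareIds.set(mw, nextId++);
  return `mw:${middlewareIds.get(mw)}`;
}

function getClient(modelName: string, options: Record<string, unknown>, mw?: Middleware) {
  // Include middleware identity in the key so clients built with
  // different middleware never collide in the cache.
  const key = `${modelName}|${JSON.stringify(options)}|${middlewareKey(mw)}`;
  let client = clientCache.get(key);
  if (!client) {
    client = { id: built++ }; // stand-in for building a wrapped client
    clientCache.set(key, client);
  }
  return client;
}

const mwA = {};
const mwB = {};
const a = getClient("openai/gpt-4o", {}, mwA);
const b = getClient("openai/gpt-4o", {}, mwB);
console.log(a === b); // prints false: distinct middleware get distinct clients
```

With this keying, repeated calls with the same middleware still hit the cache, while a later call with different per-call middleware builds its own client instead of reusing the wrong one.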
Architecture diagram
sequenceDiagram
    participant User as Caller/Client
    participant V3 as V3 Orchestrator
    participant Prov as LLMProvider
    participant AI as AI SDK (wrapLanguageModel)
    participant Model as LanguageModelV2

    Note over User, Model: Initialization with Global Middleware
    User->>V3: new Stagehand({ model: { middleware, ... } })
    V3->>V3: resolveModelConfiguration()
    V3->>Prov: NEW: new LLMProvider(logger, middleware)
    
    Note over User, Model: Execution with Per-Call Override
    User->>V3: act / extract / observe({ model: { middleware: perCallMw, ... } })
    
    V3->>V3: CHANGED: Determine effective middleware
    alt Per-call middleware provided
        V3->>V3: Use perCallMw
    else No per-call override
        V3->>V3: Use global middleware
    end

    V3->>Prov: getClient(modelName, options, { middleware })
    Prov->>Prov: getAISDKLanguageModel(..., middleware)
    
    alt Middleware exists
        Prov->>AI: NEW: wrapLanguageModel({ model, middleware })
        AI-->>Prov: Wrapped Model Instance
    else No middleware
        Prov-->>Prov: Raw Model Instance
    end
    
    Prov-->>V3: AISdkClient (with wrapped model)
    
    V3->>Model: doGenerate() / doStream()
    
    Note over Model: Middleware Interception Logic
    opt Middleware Active
        Model->>Model: NEW: wrapGenerate() / wrapStream()
        Note right of Model: Interceptor can log, track usage,<br/>or transform requests
    end

    Model-->>V3: Result (usage, content, etc.)
    V3-->>User: Response

