fix(core): declare CUA screenshot media type at capture boundary#2300

Open
yawbtng wants to merge 1 commit into browserbase:main from yawbtng:fix-screenshot-provider-mediatype

@yawbtng yawbtng commented Jul 2, 2026

why

Closes #2046. This is the reshaped version of #2159, following the approach @seanmcguire12 outlined when closing that PR.

setScreenshotProvider returned a bare base64 string, so every CUA client had to independently infer or hardcode the media type — all four assumed image/png. A non-PNG screenshot (e.g. a JPEG from a custom provider) was then mislabeled as PNG in the provider function-response payload, which is the root of #2046. Clients also stripped a hardcoded data:image/png;base64, prefix by regex, so any other prefix silently broke.
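To make the failure mode concrete, here is a minimal sketch of the pre-change behavior described above. `legacyToImagePart` is a hypothetical stand-in for the per-client handling, not the actual client code; only the hardcoded `image/png` label and the PNG-only prefix regex come from this PR.

```typescript
// Illustrative only: a stand-in for what each client did with the bare string.
const PNG_PREFIX = /^data:image\/png;base64,/;

function legacyToImagePart(screenshot: string): { mediaType: string; data: string } {
  return {
    mediaType: "image/png", // hardcoded, regardless of the real image format
    data: screenshot.replace(PNG_PREFIX, ""), // only a PNG prefix is ever stripped
  };
}

// A JPEG data URL from a custom provider (payload shortened for the example).
const jpeg = "data:image/jpeg;base64,/9j/4AAQSkZJRg==";
const part = legacyToImagePart(jpeg);
// The result is labeled PNG and its data still carries the JPEG data-URL prefix.
```

Both problems come from the same cause: the string alone carries no reliable format information, so each client had to guess.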

what changed

Move the media-type declaration to the capture boundary. setScreenshotProvider now returns an explicit payload:

export interface ScreenshotProviderResult {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
}

  • Default handler (v3CuaAgentHandler) captures PNG explicitly (type: "png") and returns { base64, mediaType: "image/png" }, so the default is unchanged.
  • Anthropic: media_type: screenshot.mediaType, data: screenshot.base64 (drops the .replace(/^data:image\/png;base64,/, "")).
  • Google: mimeType: screenshot.mediaType (drops the PNG-only prefix strip).
  • OpenAI / Microsoft: build data:${screenshot.mediaType};base64,${screenshot.base64}.
  • options.base64Image (caller-supplied) still defaults to image/png, preserving existing behavior.

ScreenshotProviderResult is exported from the public entrypoint.
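For reviewers, the per-client mapping listed above can be sketched roughly as follows. The helper names (`toAnthropicSource`, `toGoogleInlineData`, `toDataUrl`) are hypothetical and the surrounding payload shapes are simplified assumptions; only `media_type`, `mimeType`, and the data-URL template are taken from this PR.

```typescript
// Mirrors the interface from this PR description.
interface ScreenshotProviderResult {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
}

// Anthropic-style image source: media type and data pass through untouched.
function toAnthropicSource(s: ScreenshotProviderResult) {
  return { type: "base64" as const, media_type: s.mediaType, data: s.base64 };
}

// Google-style inline data: same idea, different field name.
function toGoogleInlineData(s: ScreenshotProviderResult) {
  return { mimeType: s.mediaType, data: s.base64 };
}

// OpenAI / Microsoft: a data URL built from the declared media type.
function toDataUrl(s: ScreenshotProviderResult): string {
  return `data:${s.mediaType};base64,${s.base64}`;
}

// Usage with a JPEG result (payload shortened for the example).
const shot: ScreenshotProviderResult = { base64: "/9j/4AAQ", mediaType: "image/jpeg" };
const url = toDataUrl(shot);
```

None of the helpers inspect or rewrite `base64`, which is the point of declaring the media type once at capture.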

testing

  • New cua-screenshot-mediatype.test.ts: asserts that a non-PNG (image/jpeg) media type is honored by all four clients' captureScreenshot(), and that the options.base64Image path still defaults to image/png.
  • Updated the public API type test for setScreenshotProvider(...) and the Anthropic/Microsoft CUA client tests to the new provider shape.
  • pnpm --filter @browserbasehq/stagehand run typecheck passes; the CUA + public-API unit suites are green (55 tests).
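The two behaviors the new test covers can be illustrated with a small standalone sketch. `resolveScreenshot` and `jpegProvider` are hypothetical names, and the async provider signature is an assumption; this is not the actual captureScreenshot() implementation in any client.

```typescript
interface ScreenshotProviderResult {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
}

// Assumed provider shape: an async function returning the new payload.
type ScreenshotProvider = () => Promise<ScreenshotProviderResult>;

// Hypothetical helper: a caller-supplied base64Image keeps the image/png
// default, otherwise the provider's declared media type is used as-is.
async function resolveScreenshot(
  options: { base64Image?: string } | undefined,
  provider: ScreenshotProvider,
): Promise<ScreenshotProviderResult> {
  if (options?.base64Image) {
    return { base64: options.base64Image, mediaType: "image/png" };
  }
  return provider();
}

// A fake provider that declares JPEG, as the new test does.
const jpegProvider: ScreenshotProvider = async () => ({
  base64: "/9j/4AAQ",
  mediaType: "image/jpeg",
});
```

Calling `resolveScreenshot(undefined, jpegProvider)` should resolve with `image/jpeg`, while passing `{ base64Image: "..." }` should resolve with `image/png` without consulting the provider.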

Summary by cubic

Declare the screenshot media type at the capture boundary and pass it through all CUA clients. Fixes non‑PNG screenshots being mislabeled as PNG and removes PNG-only prefix stripping.

  • Bug Fixes

    • setScreenshotProvider now returns { base64, mediaType } (ScreenshotProviderResult) instead of a string.
    • Default handler explicitly captures PNG and returns image/png.
    • Clients: Anthropic/Google pass mediaType through; OpenAI/Microsoft build data:${mediaType};base64,${base64}; removed PNG-only prefix regex.
    • options.base64Image still defaults to image/png.
    • Added tests validating JPEG flows through all clients; updated public API type tests.
  • Migration

    • If you provide a custom setScreenshotProvider, return { base64, mediaType: "image/png" | "image/jpeg" } instead of a base64 string.
    • No changes needed if you use the built-in handler.
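For custom providers that currently return a string, one possible migration path is a small adapter like the sketch below. `toProviderResult` is a hypothetical helper, not part of the package; it assumes the old provider returned either bare base64 or a data URL.

```typescript
type MediaType = "image/png" | "image/jpeg";

interface ScreenshotProviderResult {
  base64: string;
  mediaType: MediaType;
}

// Hypothetical adapter from the old string return value to the new payload.
function toProviderResult(
  value: string,
  fallback: MediaType = "image/png",
): ScreenshotProviderResult {
  const match = /^data:(image\/png|image\/jpeg);base64,/.exec(value);
  if (match) {
    // Data URL: take the media type from the prefix and drop the prefix.
    return { base64: value.slice(match[0].length), mediaType: match[1] as MediaType };
  }
  if (value.indexOf("data:") === 0) {
    // A data URL in a format the new union does not cover.
    throw new Error("Unsupported screenshot data URL");
  }
  // Bare base64: the format cannot be read from the string, so the caller states it.
  return { base64: value, mediaType: fallback };
}

const migrated = toProviderResult("data:image/jpeg;base64,/9j/4AAQ");
```

An existing provider could then return `toProviderResult(await oldCapture(), "image/jpeg")`, where `oldCapture` stands for whatever function produced the string before.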

Written for commit affd2ad. Summary will update on new commits.

Closes browserbase#2046. Reshapes the screenshot-provider contract per review on
browserbase#2159: setScreenshotProvider now returns { base64, mediaType }
(ScreenshotProviderResult) instead of a bare base64 string, so the media
type is declared once at capture and passed through by every CUA client
rather than hardcoded or inferred per-client.

- handler captures PNG explicitly and returns image/png
- Anthropic: media_type: screenshot.mediaType
- Google: mimeType: screenshot.mediaType (removes PNG-only prefix strip)
- OpenAI/Microsoft: build data:${mediaType};base64,${base64}
- options.base64Image path defaults to image/png (unchanged behavior)

Updates the public API type test and CUA client tests, and adds
cua-screenshot-mediatype coverage asserting a non-PNG media type is
honored across all four clients.

changeset-bot Bot commented Jul 2, 2026

🦋 Changeset detected

Latest commit: affd2ad

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |

github-actions Bot commented Jul 2, 2026

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

github-actions Bot added two labels on Jul 2, 2026:

  • external-contributor: Tracks PRs mirrored from external contributor forks.
  • external-contributor:awaiting-approval: Waiting for a stagehand team member to approve the latest external commit.

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 12 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Development

Successfully merging this pull request may close these issues.

core(cua): Google function-response image handling hardcodes PNG mimeType

1 participant