Skip to content

fix(core): parse screenshot MIME in GoogleCUAClient function responses#2159

Closed
yawbtng wants to merge 2 commits into
browserbase:mainfrom
yawbtng:fix/google-cua-mime-type
Closed

fix(core): parse screenshot MIME in GoogleCUAClient function responses#2159
yawbtng wants to merge 2 commits into
browserbase:mainfrom
yawbtng:fix/google-cua-mime-type

Conversation

@yawbtng

@yawbtng yawbtng commented May 22, 2026

Copy link
Copy Markdown
Contributor

why

Closes #2046. GoogleCUAClient builds function-response image parts with inlineData.mimeType = "image/png" and strips the data URL prefix using a PNG-only regex. If a screenshot is JPEG/WebP (or anything non-PNG), the metadata is wrong and the regex no-ops — the whole data URL ends up as the payload.

what changed

  • Added parseImageDataUrl(input) helper that extracts { mimeType, data } from an image/<type> data URL, falling back to image/png + the raw input for unrecognized formats (preserves prior behavior).
  • Replaced the hardcoded PNG strip + hardcoded mimeType at the function-response call site with the helper's parsed values.
  • No change to captureScreenshot() — it still normalizes raw provider input to PNG data URLs, matching the issue's stated current contract.

test plan

New packages/core/tests/unit/google-cua-client.test.ts covering:

  • PNG data URL → image/png + base64 payload
  • JPEG data URL → image/jpeg + base64 payload
  • WebP data URL → image/webp + base64 payload
  • Raw base64 → fallback to image/png with the input as data
  • Non-image data URL (e.g. text/plain) → fallback to image/png

Also ran pnpm run typecheck and pnpm run eslint in packages/core — both clean.


Summary by cubic

Fixes screenshot MIME parsing in GoogleCUAClient function responses by deriving the type and base64 payload from image data URLs, including newline-wrapped payloads. Addresses Linear issue #2046 and ensures JPEG/WebP screenshots set the correct inlineData.mimeType, with a PNG fallback for raw inputs.

  • Bug Fixes
    • Added parseImageDataUrl(input) to extract { mimeType, data } from data:image/*;base64,..., handling newline-wrapped base64.
    • Replaced PNG-only strip and hardcoded mimeType with parsed values.
    • Added unit tests for PNG, JPEG, WebP, newline-wrapped data, raw base64, and non-image data URLs.

Written for commit 246ef50. Summary will update on new commits. Review in cubic

@changeset-bot

changeset-bot Bot commented May 22, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: 246ef50

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
Name Type
@browserbasehq/stagehand Patch
@browserbasehq/browse-cli Patch
@browserbasehq/stagehand-evals Patch
@browserbasehq/stagehand-server-v3 Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions

Copy link
Copy Markdown
Contributor

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

@github-actions github-actions Bot added external-contributor Tracks PRs mirrored from external contributor forks. external-contributor:awaiting-approval Waiting for a stagehand team member to approve the latest external commit. labels May 22, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b7e1b1861

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

} from "../flowlogger/FlowLogger.js";
import { v7 as uuidv7 } from "uuid";

const IMAGE_DATA_URL_PATTERN = /^data:(image\/[a-zA-Z0-9.+-]+);base64,(.*)$/;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle newline-wrapped base64 data URLs

The new parser regex uses (.*) for the payload, but . does not match newlines in JavaScript regexes, so valid data:image/...;base64,... strings that contain line breaks will fail to match and fall back to returning the entire data URL as inlineData.data. In that case Google receives data:image/...;base64,... instead of raw base64 bytes, which can break function-response image decoding. This is a regression from the previous replace(/^data:image\/png;base64,/, "") behavior, which stripped the prefix regardless of payload newlines.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 246ef50. Switched (.*) to ([\s\S]*) so the payload group captures newline-wrapped (MIME-style) base64, and added a regression test (parses a PNG data URL with newline-wrapped base64 (MIME-style)) so this can't drift back. The new behavior is now a strict superset of the previous replace(/^data:image\/png;base64,/, "") — wrapped payloads, JPEG/WebP, raw base64, and unrecognized URLs all handled. 6/6 tests pass locally.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Re-trigger cubic

@yawbtng

yawbtng commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

Hi @miguelg719 — gentle bump on this one. Small fix for #2046 (GoogleCUAClient hardcoded PNG mimetype). Both automated reviewers are green, and I addressed Codex's P2 finding in 246ef50. Would love a maintainer ack to unblock the external-contributor CI pipeline when you have a moment.

One direction question while I have your attention: there's a sibling bug for Anthropic (#2035) with the identical fix shape. Happy to either land this as-is and mirror it for Anthropic, or refactor parseImageDataUrl into a shared util in this PR — whichever you'd prefer.

@seanmcguire12

Copy link
Copy Markdown
Member

hey @yawbtng thanks for opening this! i think the parser approach you have here is reasonable, but i think a better approach would be to change the contract of the screenshot provider API such that it is more explicit, & returns the MIME type along with the base64 data.

right now setScreenshotProvider returns a bare base64 string, so every CUA client has to infer or hardcode the media type. instead, can you update the contract to return an explicit image payload?

something like:

type ScreenshotProviderResult = {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
};

& then setScreenshotProvider would accept:

provider: () => Promise<ScreenshotProviderResult>

the default CUA handler should still capture PNG explicitly:

const screenshot = await page.screenshot({
  fullPage: false,
  type: "png",
});

return {
  base64: screenshot.toString("base64"),
  mediaType: "image/png",
};

& then the clients should pass that through:

  • Anthropic: media_type: screenshot.mediaType
  • Google: mimeType: screenshot.mediaType
  • OpenAI/Microsoft: build data:${screenshot.mediaType};base64,${screenshot.base64}

this would keep PNG the default, but moves the MIME declaration to the actual screenshot capture boundary instead of hardcoding it (or introspecting it) independently in each CUA client

you'll also need to update relevant CUA client tests & the public API type test for AgentProvider.getClient().setScreenshotProvider(...) in the PR

@yawbtng

yawbtng commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the detailed direction @seanmcguire12! I think contract change is much cleaner than the per-client parsing. Opened #2300 following the approach: setScreenshotProvider now returns { base64, mediaType }, the handler declares image/png at capture, and each client passes the media type through (Anthropic media_type, Google mimeType, OpenAI/Microsoft data URL). Updated the public API type test + CUA client tests and added coverage asserting a non-PNG media type is honored across all four clients.

seanmcguire12 pushed a commit that referenced this pull request Jul 2, 2026
## why

Closes #2046. This is the reshaped version of #2159, following the
approach @seanmcguire12 outlined when closing that PR.

`setScreenshotProvider` returned a bare base64 string, so every CUA
client had to independently infer or hardcode the media type — all four
assumed `image/png`. A non-PNG screenshot (e.g. a JPEG from a custom
provider) was then mislabeled as PNG in the provider function-response
payload, which is the root of #2046. Clients also stripped a hardcoded
`data:image/png;base64,` prefix by regex, so any other prefix silently
broke.

## what changed

Move the media-type declaration to the capture boundary.
`setScreenshotProvider` now returns an explicit payload:

```ts
export interface ScreenshotProviderResult {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
}
```

- **Default handler** (`v3CuaAgentHandler`) captures PNG explicitly
(`type: "png"`) and returns `{ base64, mediaType: "image/png" }`, so the
default is unchanged.
- **Anthropic**: `media_type: screenshot.mediaType`, `data:
screenshot.base64` (drops the `.replace(/^data:image\/png;base64,/,
"")`).
- **Google**: `mimeType: screenshot.mediaType` (drops the PNG-only
prefix strip).
- **OpenAI / Microsoft**: build
`data:${screenshot.mediaType};base64,${screenshot.base64}`.
- `options.base64Image` (caller-supplied) still defaults to `image/png`,
preserving existing behavior.

`ScreenshotProviderResult` is exported from the public entrypoint.

## testing

- New `cua-screenshot-mediatype.test.ts`: asserts a non-PNG
(`image/jpeg`) media type is honored by all four clients'
`captureScreenshot()`, and that the `options.base64Image` path still
defaults to png.
- Updated the public API type test for `setScreenshotProvider(...)` and
the Anthropic/Microsoft CUA client tests to the new provider shape.
- `pnpm --filter @browserbasehq/stagehand run typecheck` passes; the CUA
+ public-API unit suites are green (55 tests).


<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Declare the screenshot media type at the capture boundary and pass it
through all CUA clients. Fixes non‑PNG screenshots being mislabeled as
PNG and removes PNG-only prefix stripping.

- **Bug Fixes**
- `setScreenshotProvider` now returns `{ base64, mediaType }`
(`ScreenshotProviderResult`) instead of a string.
  - Default handler explicitly captures PNG and returns `image/png`.
- Clients: Anthropic/Google pass `mediaType` through; OpenAI/Microsoft
build `data:${mediaType};base64,${base64}`; removed PNG-only prefix
regex.
  - `options.base64Image` still defaults to `image/png`.
- Added tests validating JPEG flows through all clients; updated public
API type tests.

- **Migration**
- If you provide a custom `setScreenshotProvider`, return `{ base64,
mediaType: "image/png" | "image/jpeg" }` instead of a base64 string.
  - No changes needed if you use the built-in handler.

<sup>Written for commit affd2ad.
Summary will update on new commits.</sup>

<a
href="https://cubic.dev/pr/browserbase/stagehand/pull/2300?utm_source=github"
target="_blank" rel="noopener noreferrer"
data-no-image-dialog="true"><picture><source
media="(prefers-color-scheme: dark)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source
media="(prefers-color-scheme: light)"
srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img
alt="Review in cubic"
src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a>

<!-- End of auto-generated description by cubic. -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contributor:awaiting-approval Waiting for a stagehand team member to approve the latest external commit. external-contributor Tracks PRs mirrored from external contributor forks.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

core(cua): Google function-response image handling hardcodes PNG mimeType

2 participants