fix(core): parse screenshot MIME in GoogleCUAClient function responses#2159
fix(core): parse screenshot MIME in GoogleCUAClient function responses#2159yawbtng wants to merge 2 commits into
Conversation
🦋 Changeset detectedLatest commit: 246ef50 The changes in this PR will be included in the next version bump. This PR includes changesets to release 4 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9b7e1b1861
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } from "../flowlogger/FlowLogger.js"; | ||
| import { v7 as uuidv7 } from "uuid"; | ||
|
|
||
| const IMAGE_DATA_URL_PATTERN = /^data:(image\/[a-zA-Z0-9.+-]+);base64,(.*)$/; |
There was a problem hiding this comment.
Handle newline-wrapped base64 data URLs
The new parser regex uses (.*) for the payload, but . does not match newlines in JavaScript regexes, so valid data:image/...;base64,... strings that contain line breaks will fail to match and fall back to returning the entire data URL as inlineData.data. In that case Google receives data:image/...;base64,... instead of raw base64 bytes, which can break function-response image decoding. This is a regression from the previous replace(/^data:image\/png;base64,/, "") behavior, which stripped the prefix regardless of payload newlines.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — fixed in 246ef50. Switched (.*) to ([\s\S]*) so the payload group captures newline-wrapped (MIME-style) base64, and added a regression test (parses a PNG data URL with newline-wrapped base64 (MIME-style)) so this can't drift back. The new behavior is now a strict superset of the previous replace(/^data:image\/png;base64,/, "") — wrapped payloads, JPEG/WebP, raw base64, and unrecognized URLs all handled. 6/6 tests pass locally.
|
Hi @miguelg719 — gentle bump on this one. Small fix for #2046 ( One direction question while I have your attention: there's a sibling bug for Anthropic (#2035) with the identical fix shape. Happy to either land this as-is and mirror it for Anthropic, or refactor |
|
hey @yawbtng thanks for opening this! i think the parser approach you have here is reasonable, but i think a better approach would be to change the contract of the screenshot provider API such that it is more explicit, & returns the MIME type along with the base64 data. right now something like: type ScreenshotProviderResult = {
base64: string;
mediaType: "image/png" | "image/jpeg";
};& then provider: () => Promise<ScreenshotProviderResult>the default CUA handler should still capture PNG explicitly: const screenshot = await page.screenshot({
fullPage: false,
type: "png",
});
return {
base64: screenshot.toString("base64"),
mediaType: "image/png",
};& then the clients should pass that through:
this would keep PNG the default, but moves the MIME declaration to the actual screenshot capture boundary instead of hardcoding it (or introspecting it) independently in each CUA client you'll also need to update relevant CUA client tests & the public API type test for |
|
Thanks for the detailed direction @seanmcguire12! I think contract change is much cleaner than the per-client parsing. Opened #2300 following the approach: |
## why Closes #2046. This is the reshaped version of #2159, following the approach @seanmcguire12 outlined when closing that PR. `setScreenshotProvider` returned a bare base64 string, so every CUA client had to independently infer or hardcode the media type — all four assumed `image/png`. A non-PNG screenshot (e.g. a JPEG from a custom provider) was then mislabeled as PNG in the provider function-response payload, which is the root of #2046. Clients also stripped a hardcoded `data:image/png;base64,` prefix by regex, so any other prefix silently broke. ## what changed Move the media-type declaration to the capture boundary. `setScreenshotProvider` now returns an explicit payload: ```ts export interface ScreenshotProviderResult { base64: string; mediaType: "image/png" | "image/jpeg"; } ``` - **Default handler** (`v3CuaAgentHandler`) captures PNG explicitly (`type: "png"`) and returns `{ base64, mediaType: "image/png" }`, so the default is unchanged. - **Anthropic**: `media_type: screenshot.mediaType`, `data: screenshot.base64` (drops the `.replace(/^data:image\/png;base64,/, "")`). - **Google**: `mimeType: screenshot.mediaType` (drops the PNG-only prefix strip). - **OpenAI / Microsoft**: build `data:${screenshot.mediaType};base64,${screenshot.base64}`. - `options.base64Image` (caller-supplied) still defaults to `image/png`, preserving existing behavior. `ScreenshotProviderResult` is exported from the public entrypoint. ## testing - New `cua-screenshot-mediatype.test.ts`: asserts a non-PNG (`image/jpeg`) media type is honored by all four clients' `captureScreenshot()`, and that the `options.base64Image` path still defaults to png. - Updated the public API type test for `setScreenshotProvider(...)` and the Anthropic/Microsoft CUA client tests to the new provider shape. - `pnpm --filter @browserbasehq/stagehand run typecheck` passes; the CUA + public-API unit suites are green (55 tests). <!-- This is an auto-generated description by cubic. --> --- ## Summary by cubic Declare the screenshot media type at the capture boundary and pass it through all CUA clients. Fixes non‑PNG screenshots being mislabeled as PNG and removes PNG-only prefix stripping. - **Bug Fixes** - `setScreenshotProvider` now returns `{ base64, mediaType }` (`ScreenshotProviderResult`) instead of a string. - Default handler explicitly captures PNG and returns `image/png`. - Clients: Anthropic/Google pass `mediaType` through; OpenAI/Microsoft build `data:${mediaType};base64,${base64}`; removed PNG-only prefix regex. - `options.base64Image` still defaults to `image/png`. - Added tests validating JPEG flows through all clients; updated public API type tests. - **Migration** - If you provide a custom `setScreenshotProvider`, return `{ base64, mediaType: "image/png" | "image/jpeg" }` instead of a base64 string. - No changes needed if you use the built-in handler. <sup>Written for commit affd2ad. Summary will update on new commits.</sup> <a href="https://cubic.dev/pr/browserbase/stagehand/pull/2300?utm_source=github" target="_blank" rel="noopener noreferrer" data-no-image-dialog="true"><picture><source media="(prefers-color-scheme: dark)" srcset="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"><source media="(prefers-color-scheme: light)" srcset="https://www.cubic.dev/buttons/review-in-cubic-light.svg"><img alt="Review in cubic" src="https://www.cubic.dev/buttons/review-in-cubic-dark.svg"></picture></a> <!-- End of auto-generated description by cubic. -->
why
Closes #2046.
GoogleCUAClientbuilds function-response image parts withinlineData.mimeType = "image/png"and strips the data URL prefix using a PNG-only regex. If a screenshot is JPEG/WebP (or anything non-PNG), the metadata is wrong and the regex no-ops — the whole data URL ends up as the payload.what changed
parseImageDataUrl(input)helper that extracts{ mimeType, data }from animage/<type>data URL, falling back toimage/png+ the raw input for unrecognized formats (preserves prior behavior).mimeTypeat the function-response call site with the helper's parsed values.captureScreenshot()— it still normalizes raw provider input to PNG data URLs, matching the issue's stated current contract.test plan
New
packages/core/tests/unit/google-cua-client.test.tscovering:image/png+ base64 payloadimage/jpeg+ base64 payloadimage/webp+ base64 payloadimage/pngwith the input as datatext/plain) → fallback toimage/pngAlso ran
pnpm run typecheckandpnpm run eslintinpackages/core— both clean.Summary by cubic
Fixes screenshot MIME parsing in
GoogleCUAClientfunction responses by deriving the type and base64 payload from image data URLs, including newline-wrapped payloads. Addresses Linear issue #2046 and ensures JPEG/WebP screenshots set the correctinlineData.mimeType, with a PNG fallback for raw inputs.parseImageDataUrl(input)to extract{ mimeType, data }fromdata:image/*;base64,..., handling newline-wrapped base64.mimeTypewith parsed values.Written for commit 246ef50. Summary will update on new commits. Review in cubic