fix(core): declare CUA screenshot media type at capture boundary (#2300) #2306
Open
seanmcguire12 wants to merge 5 commits into
## why

Closes #2046. This is the reshaped version of #2159, following the approach @seanmcguire12 outlined when closing that PR.

`setScreenshotProvider` returned a bare base64 string, so every CUA client had to independently infer or hardcode the media type; all four assumed `image/png`. A non-PNG screenshot (e.g. a JPEG from a custom provider) was then mislabeled as PNG in the provider function-response payload, which is the root of #2046. Clients also stripped a hardcoded `data:image/png;base64,` prefix by regex, so any other prefix silently broke.

## what changed

Move the media-type declaration to the capture boundary. `setScreenshotProvider` now returns an explicit payload:

```ts
export interface ScreenshotProviderResult {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
}
```

- **Default handler** (`v3CuaAgentHandler`) captures PNG explicitly (`type: "png"`) and returns `{ base64, mediaType: "image/png" }`, so the default is unchanged.
- **Anthropic**: `media_type: screenshot.mediaType`, `data: screenshot.base64` (drops the `.replace(/^data:image\/png;base64,/, "")`).
- **Google**: `mimeType: screenshot.mediaType` (drops the PNG-only prefix strip).
- **OpenAI / Microsoft**: build `data:${screenshot.mediaType};base64,${screenshot.base64}`.
- `options.base64Image` (caller-supplied) still defaults to `image/png`, preserving existing behavior.

`ScreenshotProviderResult` is exported from the public entrypoint.

## testing

- New `cua-screenshot-mediatype.test.ts`: asserts a non-PNG (`image/jpeg`) media type is honored by all four clients' `captureScreenshot()`, and that the `options.base64Image` path still defaults to png.
- Updated the public API type test for `setScreenshotProvider(...)` and the Anthropic/Microsoft CUA client tests to the new provider shape.
- `pnpm --filter @browserbasehq/stagehand run typecheck` passes; the CUA + public-API unit suites are green (55 tests).

---

## Summary by cubic

Declare the screenshot media type at the capture boundary and pass it through all CUA clients. Fixes non‑PNG screenshots being mislabeled as PNG and removes PNG-only prefix stripping.

- **Bug Fixes**
  - `setScreenshotProvider` now returns `{ base64, mediaType }` (`ScreenshotProviderResult`) instead of a string.
  - Default handler explicitly captures PNG and returns `image/png`.
  - Clients: Anthropic/Google pass `mediaType` through; OpenAI/Microsoft build `data:${mediaType};base64,${base64}`; removed PNG-only prefix regex.
  - `options.base64Image` still defaults to `image/png`.
  - Added tests validating JPEG flows through all clients; updated public API type tests.
- **Migration**
  - If you provide a custom `setScreenshotProvider`, return `{ base64, mediaType: "image/png" | "image/jpeg" }` instead of a base64 string.
  - No changes needed if you use the built-in handler.

<sup>Written for commit affd2ad. Summary will update on new commits.</sup>
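The per-client handling described above can be sketched roughly as follows. This is an illustration only, not the code in the PR: the helper names (`toAnthropicFields`, `toGoogleFields`, `toDataUrl`) are hypothetical, the `data` key on the Google-style object is an assumption, and the base64 value is a placeholder rather than a real image.

```typescript
// Shape as given in the PR description.
interface ScreenshotProviderResult {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
}

// Hypothetical helper: the Anthropic-style fields named in the PR.
function toAnthropicFields(shot: ScreenshotProviderResult) {
  return { media_type: shot.mediaType, data: shot.base64 };
}

// Hypothetical helper: the Google-style field named in the PR.
// The `data` key is an assumption made for illustration.
function toGoogleFields(shot: ScreenshotProviderResult) {
  return { mimeType: shot.mediaType, data: shot.base64 };
}

// Hypothetical helper: the data URL built for OpenAI / Microsoft.
function toDataUrl(shot: ScreenshotProviderResult): string {
  return `data:${shot.mediaType};base64,${shot.base64}`;
}

// Placeholder payload: "AAAA" is not a real image.
const jpegShot: ScreenshotProviderResult = {
  base64: "AAAA",
  mediaType: "image/jpeg",
};
```

Because the media type travels with the payload, none of these helpers needs to guess the format or strip a format-specific prefix.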
🦋 Changeset detected

Latest commit: 42fb085

The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Contributor
3 issues found across 12 files

Confidence score: 2/5

- The riskiest issue is a versioning mismatch in `.changeset/fix-screenshot-provider-mediatype.md`: this appears to introduce a breaking `setScreenshotProvider` return-shape change but is labeled like a patch, which could surprise downstream consumers with unplanned breakage. Mark the changeset as `major` (or at least `minor`) and add migration guidance before merging.
- In `packages/core/lib/v3/agent/AnthropicCUAClient.ts`, `captureScreenshot()` now assumes `{ base64, mediaType }`, so legacy providers returning a string can fail at runtime and break screenshot-dependent flows. Add legacy string normalization (or a clear shape validation error) before merging.
- In `packages/core/lib/v3/agent/OpenAICUAClient.ts`, legacy string screenshot results can become malformed data URLs, creating silent bad screenshot payloads instead of actionable failures. Normalize old string outputs to a PNG object shape or throw an explicit error in `captureScreenshot()` before merging.
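One way the normalization suggested in this review could look is sketched below. It is not part of the PR: `normalizeScreenshot` is a hypothetical helper, and two behaviors are assumptions. A bare legacy string is treated as PNG, matching what the clients assumed before this change, and a `data:` prefix on a legacy string is parsed for its media type.

```typescript
type MediaType = "image/png" | "image/jpeg";

// Shape as given in the PR description.
interface ScreenshotProviderResult {
  base64: string;
  mediaType: MediaType;
}

// Hypothetical guard: accept the new object shape, tolerate the legacy
// string shape, and fail loudly on anything else.
function normalizeScreenshot(result: unknown): ScreenshotProviderResult {
  if (typeof result === "string") {
    // Legacy shape: a base64 string, possibly carrying a data URL prefix.
    const prefix = /^data:(image\/png|image\/jpeg);base64,/.exec(result);
    if (prefix) {
      return {
        base64: result.slice(prefix[0].length),
        mediaType: prefix[1] as MediaType,
      };
    }
    // Assumption: a bare legacy string is PNG, as the clients used to assume.
    return { base64: result, mediaType: "image/png" };
  }
  if (typeof result === "object" && result !== null) {
    const { base64, mediaType } = result as Partial<ScreenshotProviderResult>;
    if (
      typeof base64 === "string" &&
      (mediaType === "image/png" || mediaType === "image/jpeg")
    ) {
      return { base64, mediaType };
    }
  }
  throw new TypeError(
    "screenshot provider must return { base64, mediaType } or a base64 string",
  );
}
```

Throwing on an unrecognized shape turns a silently malformed payload into an actionable error, which is the failure mode the review asks for.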
Member
Author
@cubic-dev-ai review. keep in mind that the versioning decision is intentional based on this comment
Contributor
@seanmcguire12 I have started the AI code review. It will take a few minutes to complete.
thanks @yawbtng for the contribution here!
## Summary by cubic

Declare the screenshot media type at the capture boundary and thread it through all CUA clients to fix mislabeled images and remove PNG-only prefix handling. Non‑PNG screenshots (e.g. JPEG) now work end-to-end.

- **Bug Fixes**
  - `setScreenshotProvider` now returns `ScreenshotProviderResult` (`{ base64, mediaType }`) instead of a string.
  - Default handler explicitly captures PNG (`type: "png"`) and returns `image/png`.
  - Clients: Anthropic/Google pass `mediaType` through; OpenAI/Microsoft build `data:${mediaType};base64,${base64}`; removed PNG-only prefix stripping.
  - `captureScreenshot({ base64Image })` accepts optional `mediaType`; defaults to `image/png`.
- **Migration**
  - If you provide a custom `setScreenshotProvider`, return `{ base64, mediaType: "image/png" | "image/jpeg" }` instead of a base64 string. No changes needed with the built-in handler.

Written for commit 42fb085. Summary will update on new commits.
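For a custom provider, the migration might look like the sketch below. `FakeCuaClient` is a stand-in written for this example and models only the provider hook; it is not a Stagehand class. `captureJpegBase64` is likewise a hypothetical capture function that returns a placeholder string.

```typescript
// Shape as given in the PR description.
interface ScreenshotProviderResult {
  base64: string;
  mediaType: "image/png" | "image/jpeg";
}

type ScreenshotProvider = () => Promise<ScreenshotProviderResult>;

// Stand-in for a CUA client; only the provider hook is modeled here.
class FakeCuaClient {
  private provider?: ScreenshotProvider;

  setScreenshotProvider(provider: ScreenshotProvider): void {
    this.provider = provider;
  }

  async captureScreenshot(): Promise<ScreenshotProviderResult> {
    if (!this.provider) throw new Error("no screenshot provider set");
    return this.provider();
  }
}

// Hypothetical capture function; the return value is a placeholder,
// not a complete JPEG.
async function captureJpegBase64(): Promise<string> {
  return "/9j/4AAQ";
}

const client = new FakeCuaClient();

// Before this change, a custom provider returned the bare string:
//   client.setScreenshotProvider(async () => captureJpegBase64());
// After it, the provider declares the media type alongside the data:
client.setScreenshotProvider(async () => ({
  base64: await captureJpegBase64(),
  mediaType: "image/jpeg",
}));
```

Callers that rely on the built-in handler have nothing to change, since it already returns `image/png` in the new shape.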