autobrowse: add --target stagehand to export + loop by ziruihao · Pull Request #117 · browserbase/skills

ziruihao · 2026-05-22T23:33:31Z

Summary

Folds Stagehand codegen into the existing export.mjs / loop.mjs pipeline behind a new --target stagehand flag, instead of adding a standalone /stagehand-export skill. SKILL.md referenced that skill, but it never actually existed.

The op walker (command-mapping.mjs) is target-agnostic and stays shared. Only the emitter differs.

Design — stagehand-native

Every interaction op (click_*, fill_*, select_*, unhandled) collapses into await page.act("…") so the script self-heals across DOM drift.
Deterministic ops (goto, wait_*, press, scroll, eval, page_nav, type_focused) stay as raw page.* calls — there's no element to find, so no LLM cost.
Ref-based ops still resolve the snapshot just enough to extract role + name for a richer act() instruction (e.g. click the button "Continue" — turn 5: confirm form).
Final extract uses page.extract({ instruction, schema }) with a one-sentence instruction generated by a tiny Haiku call at export time (~$0.001), or a generic fallback if ANTHROPIC_API_KEY is missing.
Selector-resolver (Playwright's locator-ranking machinery) is unused on this path — Stagehand is itself the self-healing layer, so selectors.cache.json and the forceCheck/forceClickRadio/selectWithFallback/reactFill helpers don't carry over.
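The split between act() steps and raw page.* calls described above can be sketched roughly as follows. This is a minimal illustration, not the code in codegen-stagehand.mjs: the op shape (kind, turn, note, role, name, value, url, key, ms), the wait_timeout op name, the parenthesised context suffix, and the helper names are all assumptions made for the example.

```javascript
// Hypothetical op shape: { kind, turn, note?, role?, name?, value?, url?, key?, ms? }
const ACT_PREFIXES = ["click_", "fill_", "select_"];

// Interaction ops (and anything the walker could not classify) go through act().
function isInteraction(op) {
  return op.kind === "unhandled" || ACT_PREFIXES.some((p) => op.kind.startsWith(p));
}

// Build a natural-language instruction, using role + name when the snapshot
// ref resolved far enough to provide them.
function instructionFor(op) {
  if (op.kind === "unhandled") return op.note || "perform the recorded step";
  const verb = op.kind.split("_")[0]; // "click", "fill", or "select"
  const target = op.role && op.name
    ? `the ${op.role} ${JSON.stringify(op.name)}`
    : "the target element";
  const value = op.value !== undefined ? ` with ${JSON.stringify(op.value)}` : "";
  const context = op.note ? ` (turn ${op.turn}: ${op.note})` : "";
  return `${verb} ${target}${value}${context}`;
}

// Emit one line of script source for a single op.
function emitStep(op) {
  if (isInteraction(op)) {
    return `await page.act(${JSON.stringify(instructionFor(op))});`;
  }
  switch (op.kind) {
    case "goto":
      return `await page.goto(${JSON.stringify(op.url)});`;
    case "press":
      return `await page.keyboard.press(${JSON.stringify(op.key)});`;
    case "wait_timeout": // assumed member of the wait_* family
      return `await page.waitForTimeout(${Number(op.ms)});`;
    default:
      throw new Error(`no raw emitter for op kind: ${op.kind}`);
  }
}
```

Only the first branch costs an LLM call at replay time; the deterministic branch emits plain page calls, which is why those ops stay cheap.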

Changes

New: skills/autobrowse/scripts/lib/codegen-stagehand.mjs — emitter, script wrapper, package.json/tsconfig scaffolds.
scripts/export.mjs — dispatches on --target playwright|stagehand; target-aware stats line.
scripts/loop.mjs — --target flag flows through; output dir, report path, log labels are target-scoped.
scripts/lib/distill-failure.mjs — target-aware prompt (Stagehand failures usually mean a vague act() instruction / timing / extract issue; Playwright failures usually mean a broken locator / actionability / timing issue) and target-scoped section header (## Recent Stagehand Failures vs ## Recent Playwright Failures).
scripts/lib/pick-run.mjs — removed stale stagehand-export comment.
SKILL.md — documents --target stagehand for both export.mjs and loop.mjs; removes the /stagehand-export reference.
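For illustration, the target-aware pieces of distill-failure.mjs could be driven by a small lookup table like the one below. The section headers and the usual failure causes are taken from the list above; the table layout and the function names (failureSectionHeader, failureHint) are hypothetical.

```javascript
// Per-target metadata; the causes mirror the ones named in the PR description.
const TARGETS = {
  playwright: {
    label: "Playwright",
    usualCauses: ["broken locator", "actionability", "timing"],
  },
  stagehand: {
    label: "Stagehand",
    usualCauses: ["vague act() instruction", "timing", "extract issue"],
  },
};

function targetInfo(target) {
  const info = TARGETS[target];
  if (!info) throw new Error(`unknown target: ${target}`);
  return info;
}

// Header of the strategy.md section that collects distilled failures.
function failureSectionHeader(target) {
  return `## Recent ${targetInfo(target).label} Failures`;
}

// One-line hint that could be spliced into the distillation prompt.
function failureHint(target) {
  const { label, usualCauses } = targetInfo(target);
  return `${label} failures usually mean: ${usualCauses.join(" / ")}.`;
}
```

Keeping both targets in one table means a new target only needs one new entry rather than another branch in the prompt builder.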

Runtime contract

The emitted Stagehand script:

Honors BROWSERBASE_API_KEY / BROWSERBASE_PROJECT_ID (uses env: "BROWSERBASE"); falls back to env: "LOCAL" when absent.
Honors BROWSERBASE_CONTEXT_ID for pre-authed sessions via browserSettings.context.
Model selectable via STAGEHAND_MODEL env var; defaults to a current Claude Sonnet.
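A rough sketch of how the emitted script might resolve this contract from the environment. The function name resolveRuntimeConfig and the returned object shape are hypothetical, and requiring both Browserbase variables before choosing "BROWSERBASE" is an assumption. The default model is left as a caller-supplied parameter because the description does not name it, and how the result is passed to the Stagehand constructor depends on the Stagehand version in use.

```javascript
// Pure function over an env-like object so it can be tested without process.env.
function resolveRuntimeConfig(env, defaultModel) {
  // Assumption: both credentials must be present to run remotely.
  const hasCreds = Boolean(env.BROWSERBASE_API_KEY && env.BROWSERBASE_PROJECT_ID);

  const config = {
    env: hasCreds ? "BROWSERBASE" : "LOCAL",
    model: env.STAGEHAND_MODEL || defaultModel,
  };

  // A pre-authed context only makes sense for a remote session.
  if (hasCreds && env.BROWSERBASE_CONTEXT_ID) {
    config.browserSettings = { context: { id: env.BROWSERBASE_CONTEXT_ID } };
  }
  return config;
}
```

In a real script the first argument would be process.env; passing it in explicitly keeps the fallback logic easy to check in isolation.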

Test plan

node --check on all modified files
node scripts/export.mjs --help renders the new target line
node scripts/loop.mjs --help renders the new --target flag
--target bogus is rejected with a clear error on both scripts
End-to-end: node scripts/loop.mjs --task <existing-task> --target stagehand --env remote against a task that already has passing traces
Confirm the Stagehand script's package.json deps install cleanly and npx tsx <task>.ts runs
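The --target bogus check in the plan above could be satisfied by validation along these lines. This is a sketch only: parseTarget is a hypothetical helper, the error wording is invented, and defaulting to playwright when the flag is absent is an assumption rather than something the description states.

```javascript
const VALID_TARGETS = ["playwright", "stagehand"];

// Read --target from an argv-style array and reject anything unsupported.
function parseTarget(argv, fallback = "playwright") {
  const i = argv.indexOf("--target");
  if (i === -1) return fallback; // flag not given: assumed default

  const value = argv[i + 1];
  if (!VALID_TARGETS.includes(value)) {
    throw new Error(
      `invalid --target ${JSON.stringify(value)}; expected one of: ${VALID_TARGETS.join(", ")}`
    );
  }
  return value;
}
```

Both export.mjs and loop.mjs could share one helper like this, so the two scripts reject an unknown target with the same message.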

🤖 Generated with Claude Code

Note

Medium Risk
Adds new codegen/export/loop tooling that provisions Browserbase sessions, runs generated scripts, and optionally makes LLM calls for extraction and failure distillation; failures could affect automation reliability and incur unexpected external/API costs.

Overview
Adds a new deterministic replay pipeline for autobrowse tasks via scripts/export.mjs and scripts/loop.mjs, generating runnable TypeScript artifacts from the latest passing trace.json and verifying them by running npm install + tsx.

export.mjs now supports --target playwright|stagehand: Playwright resolves snapshot refs into ranked locator candidates and can generate a final extract block via Haiku; Stagehand emits self-healing page.act(...) steps plus a page.extract(...) instruction. evaluate.mjs gains --max-turns and optional pre-attached Browserbase session support via BROWSERBASE_CONTEXT_ID (rewriting agent browse commands to --connect and no-oping session lifecycle commands).

Documentation in SKILL.md is expanded to cover export/loop usage, deterministic outputs, and persistent-context sessions, and new helper libs are added for trace→op mapping, task output→Zod schema parsing, run selection, selector resolution, verification, and replay-failure distillation into target-scoped strategy.md sections.

Reviewed by Cursor Bugbot for commit a7180dd. Bugbot is set up for automated code reviews on this repo.

Folds Stagehand codegen into the existing export/loop pipeline instead of maintaining a separate /stagehand-export skill. The op walker stays shared; only the emitter differs.

Stagehand-native: every interaction op (click_*, fill_*, select_*) collapses into page.act("…") so the script self-heals across DOM drift. Deterministic ops (goto, waits, keyboard, scroll, eval, page_nav) stay as raw page.* calls since there's no element to locate. The extract step uses page.extract({ instruction, schema }) with a one-sentence instruction generated at export time (Haiku) or a generic fallback.

- New: scripts/lib/codegen-stagehand.mjs
- export.mjs: dispatches on --target playwright|stagehand
- loop.mjs: --target flag, target-scoped output dir / report / logs
- distill-failure.mjs: target-aware prompt + section header (Recent Playwright Failures vs Recent Stagehand Failures)
- SKILL.md: documents the integrated --target stagehand flow, removes stale /stagehand-export references

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ziruihao changed the base branch from main to aq/autobrowse-iterative-playwright-loop May 22, 2026 23:34

ziruihao wants to merge 1 commit into aq/autobrowse-iterative-playwright-loop from claude/frosty-shtern-227c6d
