autobrowse: add --target stagehand to export + loop #117
Open
ziruihao wants to merge 1 commit into
Folds Stagehand codegen into the existing export/loop pipeline instead of
maintaining a separate /stagehand-export skill. The op walker stays shared;
only the emitter differs.
Stagehand-native: every interaction op (click_*, fill_*, select_*) collapses
into page.act("…") so the script self-heals across DOM drift. Deterministic
ops (goto, waits, keyboard, scroll, eval, page_nav) stay as raw page.* calls
since there's no element to locate. The extract step uses
page.extract({ instruction, schema }) with a one-sentence instruction
generated at export time (Haiku) or a generic fallback.
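The split described above (interaction ops go through page.act, deterministic ops stay raw) could look roughly like the sketch below. The `Op` shape, the `description` and `value` fields, and the instruction wording are assumptions made for illustration. The real op format lives in the shared op walker and is not shown in this PR.

```typescript
// Hypothetical op shape; the real trace/op format is not shown in this PR.
type Op = {
  kind: string;         // e.g. "click_ref", "fill_ref", "goto", "press"
  description?: string; // human-readable target, e.g. 'the button "Continue"'
  value?: string;       // URL, text to fill, or key name
};

// Ops that need an element located go through page.act().
const INTERACTION_PREFIXES = ["click_", "fill_", "select_"];

function isInteraction(op: Op): boolean {
  return INTERACTION_PREFIXES.some((prefix) => op.kind.startsWith(prefix));
}

// Emits one line of generated script source for a single op.
function emitStagehandLine(op: Op): string {
  if (isInteraction(op)) {
    const verb = op.kind.split("_")[0]; // "click" | "fill" | "select"
    const target = op.description ?? "the element";
    const instruction =
      verb === "click"
        ? `click ${target}`
        : `${verb} ${target} with "${op.value ?? ""}"`;
    return `await page.act(${JSON.stringify(instruction)});`;
  }
  switch (op.kind) {
    case "goto":
      return `await page.goto(${JSON.stringify(op.value ?? "")});`;
    case "press":
      return `await page.keyboard.press(${JSON.stringify(op.value ?? "")});`;
    default:
      // Waits, scroll, eval, and page_nav are left out of this sketch.
      return `// not covered by this sketch: ${op.kind}`;
  }
}
```

Keeping the emitter a pure function from op to source line means it can be unit-tested without a browser or an LLM key.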
- New: scripts/lib/codegen-stagehand.mjs
- export.mjs: dispatches on --target playwright|stagehand
- loop.mjs: --target flag, target-scoped output dir / report / logs
- distill-failure.mjs: target-aware prompt + section header
(Recent Playwright Failures vs Recent Stagehand Failures)
- SKILL.md: documents the integrated --target stagehand flow,
removes stale /stagehand-export references
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
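A minimal sketch of the --target dispatch and target-scoped naming listed above. The playwright default, the `outDir` layout, and the function names are assumptions for illustration; only the two target names and the two section headers come from this PR.

```typescript
const TARGETS = ["playwright", "stagehand"] as const;
type Target = (typeof TARGETS)[number];

// Reads --target from an argv-style array and rejects unknown values.
// Defaulting to "playwright" is an assumption, not confirmed by the PR.
function parseTarget(argv: string[], fallback: Target = "playwright"): Target {
  const i = argv.indexOf("--target");
  if (i === -1) return fallback;
  const value = argv[i + 1];
  if (value === undefined || !(TARGETS as readonly string[]).includes(value)) {
    throw new Error(
      `--target must be one of ${TARGETS.join("|")}, got: ${value ?? "(missing)"}`
    );
  }
  return value as Target;
}

// Target-scoped names so the two pipelines never overwrite each other.
// The directory layout here is hypothetical.
function scopedNames(task: string, target: Target) {
  const label = target === "stagehand" ? "Stagehand" : "Playwright";
  return {
    outDir: `${task}/${target}`,
    failureHeader: `## Recent ${label} Failures`,
  };
}
```

For example, `parseTarget(["--task", "demo", "--target", "stagehand"])` returns `"stagehand"`, while `--target bogus` throws with a message listing the accepted values.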
Summary
Folds Stagehand codegen into the existing export.mjs / loop.mjs pipeline behind a new --target stagehand flag, instead of maintaining the standalone /stagehand-export skill that was referenced in SKILL.md but never actually existed.

The op walker (command-mapping.mjs) is target-agnostic and stays shared. Only the emitter differs.

Design: stagehand-native
- Every interaction op (click_*, fill_*, select_*, unhandled) collapses into await page.act("…") so the script self-heals across DOM drift.
- Deterministic ops (goto, wait_*, press, scroll, eval, page_nav, type_focused) stay as raw page.* calls: there's no element to find, so no LLM cost.
- Example act instruction: click the button "Continue" — turn 5: confirm form.
- The extract step uses page.extract({ instruction, schema }) with a one-sentence instruction generated by a tiny Haiku call at export time (~$0.001), or a generic fallback if ANTHROPIC_API_KEY is missing.
- selectors.cache.json and the forceCheck / forceClickRadio / selectWithFallback / reactFill helpers don't carry over.

Changes
- skills/autobrowse/scripts/lib/codegen-stagehand.mjs: emitter, script wrapper, package.json/tsconfig scaffolds.
- scripts/export.mjs: dispatches on --target playwright|stagehand; target-aware stats line.
- scripts/loop.mjs: --target flag flows through; output dir, report path, and log labels are target-scoped.
- scripts/lib/distill-failure.mjs: target-aware prompt (Stagehand failures usually mean a vague act() instruction, timing, or extract issue; Playwright failures usually mean a broken locator, actionability, or timing issue) and target-scoped section header (## Recent Stagehand Failures vs ## Recent Playwright Failures).
- scripts/lib/pick-run.mjs: removed stale stagehand-export comment.
- SKILL.md: documents --target stagehand for both export.mjs and loop.mjs; removes the /stagehand-export reference.

Runtime contract
The emitted Stagehand script:
- Uses BROWSERBASE_API_KEY / BROWSERBASE_PROJECT_ID when set (env: "BROWSERBASE"); falls back to env: "LOCAL" when absent.
- Supports BROWSERBASE_CONTEXT_ID for pre-authed sessions via browserSettings.context.
- Takes its model from the STAGEHAND_MODEL env var; defaults to a current Claude Sonnet.

Test plan
- node --check on all modified files
- node scripts/export.mjs --help renders the new target line
- node scripts/loop.mjs --help renders the new --target flag
- --target bogus is rejected with a clear error on both scripts
- node scripts/loop.mjs --task <existing-task> --target stagehand --env remote against a task that already has passing traces
- package.json deps install cleanly and npx tsx <task>.ts runs

🤖 Generated with Claude Code
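The runtime contract above can be sketched as a pure function from environment variables to constructor options. The option field names follow my understanding of Stagehand's constructor (`env`, `apiKey`, `projectId`, `modelName`, `browserbaseSessionCreateParams`) and should be treated as assumptions. `persist: true` and the `DEFAULT_MODEL` placeholder are also assumptions, since the PR only says "a current Claude Sonnet".

```typescript
type Env = Record<string, string | undefined>;

// Assumed subset of Stagehand constructor options; verify against the installed version.
interface StagehandOptions {
  env: "BROWSERBASE" | "LOCAL";
  modelName: string;
  apiKey?: string;
  projectId?: string;
  browserbaseSessionCreateParams?: {
    projectId: string;
    browserSettings: { context: { id: string; persist: boolean } };
  };
}

// Placeholder only: the PR does not name the default model id.
const DEFAULT_MODEL = "<claude-sonnet-model-id>";

function resolveOptions(env: Env): StagehandOptions {
  const modelName = env.STAGEHAND_MODEL || DEFAULT_MODEL;
  const apiKey = env.BROWSERBASE_API_KEY;
  const projectId = env.BROWSERBASE_PROJECT_ID;

  // Without both Browserbase credentials, run against a local browser.
  if (!apiKey || !projectId) {
    return { env: "LOCAL", modelName };
  }

  const contextId = env.BROWSERBASE_CONTEXT_ID;
  return {
    env: "BROWSERBASE",
    modelName,
    apiKey,
    projectId,
    // Attach a pre-authed context only when one is supplied.
    ...(contextId
      ? {
          browserbaseSessionCreateParams: {
            projectId,
            browserSettings: { context: { id: contextId, persist: true } },
          },
        }
      : {}),
  };
}
```

In a generated script the result would presumably be passed to the Stagehand constructor, for example `resolveOptions(process.env)`. Taking the environment as a parameter keeps the fallback logic testable without touching real credentials.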
Note
Medium Risk
Adds new codegen/export/loop tooling that provisions Browserbase sessions, runs generated scripts, and optionally makes LLM calls for extraction and failure distillation; failures could affect automation reliability and incur unexpected external/API costs.
Overview
Adds a new deterministic replay pipeline for autobrowse tasks via scripts/export.mjs and scripts/loop.mjs, generating runnable TypeScript artifacts from the latest passing trace.json and verifying them by running npm install + tsx.

export.mjs now supports --target playwright|stagehand: Playwright resolves snapshot refs into ranked locator candidates and can generate a final extract block via Haiku; Stagehand emits self-healing page.act(...) steps plus a page.extract(...) instruction.

evaluate.mjs gains --max-turns and optional pre-attached Browserbase session support via BROWSERBASE_CONTEXT_ID (rewriting agent browse commands to --connect and no-oping session lifecycle commands).

Documentation in SKILL.md is expanded to cover export/loop usage, deterministic outputs, and persistent-context sessions, and new helper libs are added for trace→op mapping, task output→Zod schema parsing, run selection, selector resolution, verification, and replay-failure distillation into target-scoped strategy.md sections.

Reviewed by Cursor Bugbot for commit a7180dd. Bugbot is set up for automated code reviews on this repo.
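One way to keep strategy.md sections target-scoped, as the overview describes, is to replace only the section under the matching header and leave the other target's section untouched. The sketch below is hypothetical: `upsertSection` is an invented name, and whether the real distill-failure.mjs replaces or appends is not stated in this PR.

```typescript
// Replaces the body under `header` (up to the next "## " heading), or appends
// the section when the header is not present yet.
function upsertSection(markdown: string, header: string, body: string): string {
  const lines = markdown.split("\n");
  const start = lines.indexOf(header);
  const section = [header, "", body.trim(), ""];

  if (start === -1) {
    const base = markdown.trimEnd();
    return (base ? base + "\n\n" : "") + section.join("\n");
  }

  // The section ends at the next level-2 heading or at end of file.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].startsWith("## ")) {
      end = i;
      break;
    }
  }
  return [...lines.slice(0, start), ...section, ...lines.slice(end)].join("\n");
}
```

Called with the header "## Recent Stagehand Failures", this rewrites only that section, so a Playwright run's notes survive a later Stagehand run and vice versa.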