
autobrowse: add --target stagehand to export + loop #117

Open: ziruihao wants to merge 1 commit into aq/autobrowse-iterative-playwright-loop from claude/frosty-shtern-227c6d

ziruihao commented May 22, 2026

Summary

Folds Stagehand codegen into the existing export.mjs / loop.mjs pipeline behind a new --target stagehand flag, instead of maintaining the standalone /stagehand-export skill that was referenced in SKILL.md but never actually existed.

The op walker (command-mapping.mjs) is target-agnostic and stays shared. Only the emitter differs.
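To make the split concrete, here is a minimal sketch of a shared walker with a pluggable emitter. The `Op` and `Emitter` shapes, the `walk` function, and the `click_ref` op name are hypothetical illustrations, not the actual command-mapping.mjs or codegen-stagehand.mjs API. The sketch only builds code strings, so it runs without Playwright or Stagehand installed.

```typescript
// Hypothetical sketch: a target-agnostic walker plus per-target emitters.
// Op/Emitter shapes and op names are illustrative, not command-mapping.mjs.

type Op = { kind: string; args: Record<string, unknown> };

interface Emitter {
  // Returns one line of generated TypeScript for a single op.
  emit(op: Op): string;
}

// The walker owns ordering and indentation; the emitter owns target syntax.
function walk(ops: Op[], emitter: Emitter): string {
  return ops.map((op) => "  " + emitter.emit(op)).join("\n");
}

// Both targets emit deterministic ops like goto the same way...
const emitGoto = (op: Op) => `await page.goto(${JSON.stringify(op.args.url)});`;

// ...but differ on interaction ops.
const playwrightEmitter: Emitter = {
  emit: (op) => (op.kind === "goto" ? emitGoto(op) : `// locator-based ${op.kind} call goes here`),
};

const stagehandEmitter: Emitter = {
  emit: (op) =>
    op.kind === "goto" ? emitGoto(op) : `await page.act(${JSON.stringify(`${op.kind} step`)});`,
};

const ops: Op[] = [
  { kind: "goto", args: { url: "https://example.com" } },
  { kind: "click_ref", args: { ref: "e12" } },
];

console.log(walk(ops, stagehandEmitter));
```

Switching targets only swaps the emitter; the op list and the walker are reused unchanged.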

Design: Stagehand-native

  • Every interaction op (click_*, fill_*, select_*, unhandled) collapses into await page.act("…") so the script self-heals across DOM drift.
  • Deterministic ops (goto, wait_*, press, scroll, eval, page_nav, type_focused) stay as raw page.* calls — there's no element to find, so no LLM cost.
  • Ref-based ops still resolve the snapshot just enough to extract role + name for a richer act() instruction (e.g. click the button "Continue" — turn 5: confirm form).
  • Final extract uses page.extract({ instruction, schema }) with a one-sentence instruction generated by a tiny Haiku call at export time (~$0.001), or a generic fallback if ANTHROPIC_API_KEY is missing.
  • The selector resolver (Playwright's locator-ranking machinery) is unused on this path. Stagehand is itself the self-healing layer, so selectors.cache.json and the forceCheck/forceClickRadio/selectWithFallback/reactFill helpers don't carry over.
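The emission rules above can be sketched as a small function that returns code strings. Everything here is a hypothetical illustration under stated assumptions: the op kinds (`click_ref`, `fill_ref`, `wait_ms`), the `emitOp`/`describe`/`emitExtract` helpers, the instruction wording, and the fallback extract text are not taken from codegen-stagehand.mjs. The emitted `page.act(...)` and `page.extract({ instruction, schema })` forms follow this PR, and `page.keyboard.press` / `page.waitForTimeout` are standard Playwright page methods assumed to be available on the Stagehand page.

```typescript
// Hypothetical sketch of Stagehand-native emission; op kinds, helper names,
// and instruction wording are assumptions, not codegen-stagehand.mjs.

type InteractionOp = {
  kind: "click_ref" | "fill_ref";
  turn: number;
  role?: string; // resolved from the snapshot ref, when available
  name?: string; // accessible name, e.g. "Continue"
  value?: string; // text to type for fill_ref
};

type Op =
  | { kind: "goto"; url: string }
  | { kind: "press"; key: string }
  | { kind: "wait_ms"; ms: number }
  | InteractionOp
  | { kind: "unhandled"; raw: string; turn: number };

// Build a natural-language act() instruction, using role + name when known.
function describe(op: InteractionOp): string {
  const target = op.role && op.name ? `the ${op.role} "${op.name}"` : "the target element";
  return op.kind === "click_ref" ? `click ${target}` : `fill ${target} with "${op.value ?? ""}"`;
}

function emitOp(op: Op): string {
  switch (op.kind) {
    // Deterministic ops: nothing to locate, so raw page calls and no LLM cost.
    case "goto":
      return `await page.goto(${JSON.stringify(op.url)});`;
    case "press":
      return `await page.keyboard.press(${JSON.stringify(op.key)});`;
    case "wait_ms":
      return `await page.waitForTimeout(${op.ms});`;
    // Interaction ops: collapse into act() so Stagehand re-resolves the element at run time.
    case "click_ref":
    case "fill_ref":
      return `await page.act(${JSON.stringify(describe(op))}); // turn ${op.turn}`;
    case "unhandled":
      return `await page.act(${JSON.stringify(op.raw)}); // turn ${op.turn} (unhandled)`;
  }
}

// Final extract: use the export-time instruction if one was generated, else a generic fallback.
function emitExtract(instruction: string | undefined, schemaSource: string): string {
  const text = instruction ?? "Extract the information the task asks for from the current page.";
  return `const result = await page.extract({ instruction: ${JSON.stringify(text)}, schema: ${schemaSource} });`;
}

console.log(
  [
    emitOp({ kind: "goto", url: "https://example.com/login" }),
    emitOp({ kind: "click_ref", turn: 5, role: "button", name: "Continue" }),
    emitExtract(undefined, "OutputSchema"),
  ].join("\n"),
);
```

The `undefined` instruction stands in for the case where ANTHROPIC_API_KEY is missing at export time and no instruction could be generated.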

Changes

  • New: skills/autobrowse/scripts/lib/codegen-stagehand.mjs — emitter, script wrapper, package.json/tsconfig scaffolds.
  • scripts/export.mjs — dispatches on --target playwright|stagehand; target-aware stats line.
  • scripts/loop.mjs--target flag flows through; output dir, report path, log labels are target-scoped.
  • scripts/lib/distill-failure.mjs — target-aware prompt (Stagehand failures usually mean a vague act() instruction / timing / extract issue; Playwright failures usually mean a broken locator / actionability / timing issue) and target-scoped section header (## Recent Stagehand Failures vs ## Recent Playwright Failures).
  • scripts/lib/pick-run.mjs — removed stale stagehand-export comment.
  • SKILL.md — documents --target stagehand for both export.mjs and loop.mjs; removes the /stagehand-export reference.
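A minimal sketch of how a shared `--target` flag could be parsed and used to scope artifacts, assuming a hand-rolled argv scan. The helper names (`parseTarget`, `outputDir`, `failureSectionHeader`), the `playwright` default, the `tasks/<task>/<target>` layout, and the error wording are hypothetical; only the two accepted values and the two section headers come from this PR.

```typescript
// Hypothetical sketch of --target handling shared by export.mjs and loop.mjs.
// Helper names, default, directory layout, and error text are assumptions.

const TARGETS = ["playwright", "stagehand"] as const;
type Target = (typeof TARGETS)[number];

function parseTarget(argv: string[]): Target {
  const i = argv.indexOf("--target");
  if (i === -1) return "playwright"; // assumed default when the flag is omitted
  const value = argv[i + 1];
  if (!TARGETS.includes(value as Target)) {
    // Reject unknown targets (e.g. --target bogus) with a clear message.
    throw new Error(`--target must be one of ${TARGETS.join("|")}, got ${JSON.stringify(value)}`);
  }
  return value as Target;
}

// Target-scoped output so Playwright and Stagehand runs don't overwrite each other.
function outputDir(task: string, target: Target): string {
  return `tasks/${task}/${target}`; // hypothetical layout
}

// strategy.md section header written by the failure distiller (header text from the PR).
function failureSectionHeader(target: Target): string {
  return target === "stagehand" ? "## Recent Stagehand Failures" : "## Recent Playwright Failures";
}
```

Keeping the accepted values in one `as const` tuple means the type, the validation, and the error message cannot drift apart.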

Runtime contract

The emitted Stagehand script:

  • Honors BROWSERBASE_API_KEY / BROWSERBASE_PROJECT_ID (uses env: "BROWSERBASE"); falls back to env: "LOCAL" when absent.
  • Honors BROWSERBASE_CONTEXT_ID for pre-authed sessions via browserSettings.context.
  • Model selectable via STAGEHAND_MODEL env var; defaults to a current Claude Sonnet.
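The runtime contract above can be sketched as a pure function from environment variables to a config object. This is an illustration under assumptions: `resolveConfig` and `StagehandConfigSketch` are hypothetical names, both Browserbase variables are assumed to be required for remote mode, and the field layout is flattened for readability. Check the Stagehand constructor docs for your installed version to see where `browserSettings` actually belongs (it may sit under a session-create parameter object rather than at the top level).

```typescript
// Hypothetical sketch of env-driven config selection for an emitted script.
// Field layout is flattened for illustration; not the exact Stagehand options shape.

type Env = Record<string, string | undefined>;

interface StagehandConfigSketch {
  env: "BROWSERBASE" | "LOCAL";
  apiKey?: string;
  projectId?: string;
  browserSettings?: { context: { id: string } };
  modelName: string;
}

function resolveConfig(env: Env, defaultModel: string): StagehandConfigSketch {
  const modelName = env.STAGEHAND_MODEL || defaultModel; // STAGEHAND_MODEL overrides the default
  const apiKey = env.BROWSERBASE_API_KEY;
  const projectId = env.BROWSERBASE_PROJECT_ID;
  if (!apiKey || !projectId) {
    return { env: "LOCAL", modelName }; // no Browserbase credentials: use a local browser
  }
  const contextId = env.BROWSERBASE_CONTEXT_ID;
  return {
    env: "BROWSERBASE",
    apiKey,
    projectId,
    // Pre-authed session: attach the persisted Browserbase context when one is given.
    ...(contextId ? { browserSettings: { context: { id: contextId } } } : {}),
    modelName,
  };
}
```

In a generated script this would be called as `resolveConfig(process.env, <default model id>)`; the default model id is left as a placeholder here rather than guessing the one the emitter uses.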

Test plan

  • node --check on all modified files
  • node scripts/export.mjs --help renders the new target line
  • node scripts/loop.mjs --help renders the new --target flag
  • --target bogus is rejected with a clear error on both scripts
  • End-to-end: node scripts/loop.mjs --task <existing-task> --target stagehand --env remote against a task that already has passing traces
  • Confirm the Stagehand script's package.json deps install cleanly and npx tsx <task>.ts runs

🤖 Generated with Claude Code


Note

Medium Risk
Adds new codegen/export/loop tooling that provisions Browserbase sessions, runs generated scripts, and optionally makes LLM calls for extraction and failure distillation; failures could affect automation reliability and incur unexpected external/API costs.

Overview
Adds a new deterministic replay pipeline for autobrowse tasks via scripts/export.mjs and scripts/loop.mjs, generating runnable TypeScript artifacts from the latest passing trace.json and verifying them by running npm install + tsx.
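The verify step described above (npm install, then tsx) amounts to a fail-fast command sequence. Here is a minimal sketch using Node's `spawnSync`; the `buildVerifyPlan`/`runPlan` names and the assumption that both commands run in the generated package's directory are hypothetical, not taken from loop.mjs.

```typescript
// Hypothetical sketch of the verify step: install deps, then run the script
// with tsx, stopping at the first failing command. Names are illustrative.
import { spawnSync } from "node:child_process";

type Step = { cmd: string; args: string[] };

function buildVerifyPlan(scriptFile: string): Step[] {
  return [
    { cmd: "npm", args: ["install"] },
    { cmd: "npx", args: ["tsx", scriptFile] },
  ];
}

// Runs each step in the generated package's directory; returns false on the
// first non-zero exit (status is null if the process was killed by a signal).
function runPlan(plan: Step[], cwd: string): boolean {
  for (const { cmd, args } of plan) {
    const res = spawnSync(cmd, args, { cwd, stdio: "inherit" });
    if (res.status !== 0) return false;
  }
  return true;
}
```

Separating plan construction from execution keeps the command sequence testable without actually installing packages.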

export.mjs now supports --target playwright|stagehand: Playwright resolves snapshot refs into ranked locator candidates and can generate a final extract block via Haiku; Stagehand emits self-healing page.act(...) steps plus a page.extract(...) instruction. evaluate.mjs gains --max-turns and optional support for a pre-attached Browserbase session via BROWSERBASE_CONTEXT_ID (agent browse commands are rewritten to use --connect, and session lifecycle commands become no-ops).
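The pre-attached-session rewrite can be sketched as a per-command transform. This is a hypothetical illustration: the `rewriteBrowseCommand` name, the value passed after `--connect`, and which subcommands count as "session lifecycle" are all assumptions (the lifecycle set is passed in rather than hard-coded), and evaluate.mjs's real rules may differ.

```typescript
// Hypothetical sketch: route browse commands to a pre-attached session via
// --connect and turn lifecycle subcommands into no-ops (returned as null).
// The --connect argument and lifecycle names are assumptions.

function rewriteBrowseCommand(
  command: string,
  connectTarget: string,
  lifecycle: ReadonlySet<string>,
): string | null {
  const match = /^\s*browse\s+(\S+)/.exec(command);
  if (!match) return command; // not a browse subcommand: pass through untouched
  if (lifecycle.has(match[1])) return null; // session lifecycle: skip, the session already exists
  if (/\s--connect(\s|=|$)/.test(command)) return command; // already attached
  // Insert --connect right after "browse", preserving the rest of the command verbatim.
  return command.replace(/^\s*browse\b/, `browse --connect ${connectTarget}`);
}
```

Returning `null` for lifecycle commands lets the caller skip them explicitly instead of silently running a different command.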

Documentation in SKILL.md is expanded to cover export/loop usage, deterministic outputs, and persistent-context sessions, and new helper libs are added for trace→op mapping, task output→Zod schema parsing, run selection, selector resolution, verification, and replay-failure distillation into target-scoped strategy.md sections.
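As an illustration of the "task output→Zod schema" step, here is a minimal sketch that derives Zod schema source text from a sample output value, suitable for embedding in a generated script. The input format is an assumption (the PR's parser may read a different description of the task output), and `zodSource` is a hypothetical name; the emitted `z.object`/`z.array`/`z.string`/`z.number`/`z.boolean`/`z.null`/`z.unknown` calls are standard Zod.

```typescript
// Hypothetical sketch: infer Zod schema *source text* from a sample output value.
// The real parser in this PR may work from a different input format.

function zodSource(value: unknown): string {
  if (value === null) return "z.null()";
  if (Array.isArray(value)) {
    // Infer the element type from the first item; empty arrays fall back to z.unknown().
    return `z.array(${value.length > 0 ? zodSource(value[0]) : "z.unknown()"})`;
  }
  switch (typeof value) {
    case "string":
      return "z.string()";
    case "number":
      return "z.number()";
    case "boolean":
      return "z.boolean()";
    case "object": {
      const fields = Object.entries(value as Record<string, unknown>).map(
        ([key, v]) => `${JSON.stringify(key)}: ${zodSource(v)}`,
      );
      return fields.length > 0 ? `z.object({ ${fields.join(", ")} })` : "z.object({})";
    }
    default:
      return "z.unknown()"; // undefined, functions, bigint, symbol
  }
}

console.log(zodSource({ title: "Example", price: 9.99, tags: ["a"] }));
// z.object({ "title": z.string(), "price": z.number(), "tags": z.array(z.string()) })
```

Emitting source text rather than a live schema object keeps the exporter free of a runtime Zod dependency; only the generated script imports `zod`.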

Reviewed by Cursor Bugbot for commit a7180dd. Bugbot is set up for automated code reviews on this repo.

Folds Stagehand codegen into the existing export/loop pipeline instead of
maintaining a separate /stagehand-export skill. The op walker stays shared;
only the emitter differs.

Stagehand-native: every interaction op (click_*, fill_*, select_*) collapses
into page.act("…") so the script self-heals across DOM drift. Deterministic
ops (goto, waits, keyboard, scroll, eval, page_nav) stay as raw page.* calls
since there's no element to locate. The extract step uses
page.extract({ instruction, schema }) with a one-sentence instruction
generated at export time (Haiku) or a generic fallback.

- New: scripts/lib/codegen-stagehand.mjs
- export.mjs: dispatches on --target playwright|stagehand
- loop.mjs: --target flag, target-scoped output dir / report / logs
- distill-failure.mjs: target-aware prompt + section header
  (Recent Playwright Failures vs Recent Stagehand Failures)
- SKILL.md: documents the integrated --target stagehand flow,
  removes stale /stagehand-export references

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ziruihao changed the base branch from main to aq/autobrowse-iterative-playwright-loop on May 22, 2026, 23:34
