Add browsability skill — score how usable a site is for a browser agent by shubh24 · Pull Request #122 · browserbase/skills

shubh24 · 2026-05-30T01:55:00Z

What

Adds the browsability skill: an operational rubric + tooling for scoring how well an AI browser agent can drive a website's UI.

It's the sibling of agent-experience (which audits docs/SDK onboarding DX). This one is about operability — perceiving and driving the live DOM — not discoverability (no SEO/AEO/llms.txt).

The opinion

Browsability = how little help an agent needs to succeed, and how much harder the site is for an agent than for a human. Step-count is scored as the delta over a human baseline, so a genuinely long workflow isn't penalized — only the agent-specific tax (e.g. a custom dropdown that costs two steps where a native <select> costs one).
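The delta framing above can be sketched as follows. This is a minimal illustration, not the rubric's actual formula: the type, the function names, and the linear scale with a `worstDelta` cutoff are all assumptions; only the 20-point maximum for the agent-tax axis comes from the score table.

```typescript
// Hypothetical sketch: names and the linear scale are assumptions, not the rubric's formula.
type TaskRun = { agentSteps: number; humanBaselineSteps: number };

/** Steps the agent spent beyond what a human needs; never negative. */
function agentTaxDelta(run: TaskRun): number {
  return Math.max(0, run.agentSteps - run.humanBaselineSteps);
}

/** Map the delta to 0..maxPts, losing points linearly up to `worstDelta` extra steps. */
function agentTaxPoints(run: TaskRun, maxPts = 20, worstDelta = 10): number {
  const delta = Math.min(agentTaxDelta(run), worstDelta);
  return maxPts * (1 - delta / worstDelta);
}

// A long workflow with no agent-specific overhead keeps full marks.
const longButClean = agentTaxPoints({ agentSteps: 40, humanBaselineSteps: 40 });
// A short workflow with five extra agent steps loses half the points.
const shortButTaxed = agentTaxPoints({ agentSteps: 8, humanBaselineSteps: 3 });
```

Scoring the difference rather than the absolute count is what keeps a 40-step checkout from ranking below a 3-step form that an agent can only finish in 8.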

Score (0–100)

Axis	Pts	What
A · Access Resistance	30	lowest assistance rung (stealth / proxy / captcha ladder) a task needs to complete
B1 · Reachability	25	% of controls that survive the accessibility-tree prune
B3 · Structural traps	15	cross-origin iframes, shadow DOM, DOM depth/size
C · Agent tax	20	agent steps over the human baseline
D · Recoverability	10	self-heal / site errors / blocking overlays / step ceiling
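One way to read the table is as a set of per-axis caps that sum to 100. The sketch below assumes each axis is clamped to its cap before summing; the axis maxima come from the table, while `AXIS_MAX`, `composite`, and the clamping behaviour are illustrative assumptions rather than what scripts/score.ts necessarily does.

```typescript
// Axis maxima copied from the table; everything else is an illustrative assumption.
const AXIS_MAX = { A: 30, B1: 25, B3: 15, C: 20, D: 10 } as const;
type Axis = keyof typeof AXIS_MAX;
type AxisScores = Record<Axis, number>;

/** Clamp each axis to [0, max] and sum to a 0..100 composite. */
function composite(scores: AxisScores): number {
  let total = 0;
  for (const axis of Object.keys(AXIS_MAX) as Axis[]) {
    total += Math.min(AXIS_MAX[axis], Math.max(0, scores[axis]));
  }
  return Math.round(total);
}

const maxTotal = composite({ A: 30, B1: 25, B3: 15, C: 20, D: 10 });
// B3 is over its cap here, so it contributes 15, not 40.
const sample = composite({ A: 20, B1: 18, B3: 40, C: 12, D: 7 });
```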

Agent-native affordance (an API/deep-link path) is noted as a ceiling badge, not scored — this rubric measures UI operability.

Contents

SKILL.md — the workflow (probe → agent ladder → composite score → report).
references/rubric.md — the full code-grounded rubric, the assistance ladder, the delta framing, and a remediation table.
scripts/friction.ts — deterministic Drivability probe (B1 + B3) from a single page load via the browse CLI; no model needed.
scripts/score.ts — composite 0–100 scorer combining the probe with agent-run results.
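As a rough illustration of the B1 part of the probe, reachability can be thought of as the share of interactive controls still present after the accessibility-tree prune, scaled to the axis maximum. The `Control` shape, the `inA11yTree` flag, and the zero score for a page with no controls are assumptions made for this sketch; the real signals live in scripts/friction.ts.

```typescript
// Hypothetical shapes: the real probe's signals live in scripts/friction.ts.
type Control = { selector: string; inA11yTree: boolean };

/** Share of controls that survive the accessibility-tree prune, scaled to maxPts. */
function reachabilityPoints(controls: Control[], maxPts = 25): number {
  if (controls.length === 0) return 0; // assumption: nothing to drive scores zero
  const reachable = controls.filter((c) => c.inA11yTree).length;
  return (reachable / controls.length) * maxPts;
}

const controls: Control[] = [
  { selector: "button#submit", inA11yTree: true },
  { selector: "input[name=email]", inA11yTree: true },
  { selector: "select#country", inA11yTree: true },
  { selector: "div.fake-dropdown", inA11yTree: false }, // unlabeled custom widget
];
const b1 = reachabilityPoints(controls);
```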

Status

The Drivability slice (B1 + B3, 40 pts) runs today, deterministically.
The agent ladder (A/C/D, 60 pts) is scaffolded; it needs a model-driven reference agent (use the browser skill as the driver) to climb the rungs and record tasks.json. Without it, the scorer reports a Drivability-only score and marks A/C/D pending.

Notes

Grounded in what the open-source Stagehand framework treats as hard; uses only public Browserbase session settings.
solveCaptchas defaults to on, so an honest rung-0 baseline explicitly disables it (documented in the skill).
Verified: drivability probe + scorer run end-to-end on real sites.
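The rung-0 note above might translate into session settings along these lines. The field names (`proxies`, `browserSettings.solveCaptchas`) are assumptions about the session-create payload and should be checked against the Browserbase documentation; the local `SessionSettings` type and the helper are purely illustrative and call no SDK.

```typescript
// Field names are assumptions about the session-create payload; confirm against the Browserbase docs.
type SessionSettings = {
  proxies?: boolean;
  browserSettings?: { solveCaptchas?: boolean };
};

// Rung 0 ("L0 vanilla"): no assistance. solveCaptchas is switched off explicitly,
// because leaving it unset would inherit the "on" default.
const rung0: SessionSettings = {
  proxies: false,
  browserSettings: { solveCaptchas: false },
};

// Rung 1 ("L1 default-assist"): platform defaults, so captcha solving stays on.
const rung1: SessionSettings = {};

/** Mirrors the default-on behaviour described in the note. */
function captchaSolvingActive(s: SessionSettings): boolean {
  return s.browserSettings?.solveCaptchas ?? true;
}
```

Without the explicit `false`, a rung-0 run would quietly receive rung-1 help and Access Resistance would be overstated.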

🤖 Generated with Claude Code

Note

Low Risk
Documentation and local scoring scripts only; no changes to auth, production services, or existing skill runtime behavior.

Overview
Adds a new browsability skill that scores how well an AI browser agent can operate a site’s UI (not SEO/docs DX), and lists it in the root README skills table.

The skill documents a 0–100 rubric (Access Resistance ladder, reachability, structural traps, agent tax vs human baseline, recoverability) in SKILL.md and references/rubric.md, with a workflow: deterministic Drivability probe → optional agent ladder in tasks.json → composite report.

New Bun scripts drive the flow: friction.ts loads a URL via the browse CLI, evaluates DOM signals, and writes friction.json; score.ts merges that with optional task runs into browsability.json (full score or Drivability-only when ladder data is missing). Output goes under browsability-out/ (gitignored); MIT LICENSE is included.

Reviewed by Cursor Bugbot for commit 9c30e80.

Adds the `browsability` skill: an operational rubric for how well an AI *browser* agent can drive a website's UI (the sibling of agent-experience, which covers docs/SDK onboarding DX).

Scores 0–100 across:
A Access Resistance: lowest assistance rung (stealth/proxy/captcha ladder) a task needs to complete
B1 Reachability: % of controls that survive the accessibility-tree prune
B3 Structural traps: cross-origin iframes, shadow DOM, DOM depth/size
C Agent tax: agent steps OVER the human baseline (delta, not absolute)
D Recoverability: self-heal / site errors / blocking overlays / step ceiling

The Drivability slice (B1+B3) runs deterministically from one page load via scripts/friction.ts (no model). The full score adds an agent run across the assistance ladder via scripts/score.ts.

Rubric grounded in what the open-source Stagehand framework treats as hard; uses only public Browserbase session settings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


cursor · 2026-05-30T02:05:10Z

+type Run = { rung: number; success: boolean; steps?: number; model?: string; note?: string };
+type Task = { name: string; type?: string; humanBaselineSteps?: number; runs?: Run[] };
+
+let a = 0, c = taxProxy, d = 10, tasks: Task[] = [], haveRuns = false;


Recoverability defaults to full marks when unassessed

Medium Severity

When no tasks.json is provided (drivability-only mode), d is initialized to 10 — giving full recoverability marks for an axis that was never assessed. Meanwhile a (Access Resistance) correctly defaults to 0 when unknown. This asymmetry inflates the reported score by 10 points. The SKILL.md documentation explicitly states the drivability-only score is "B1 + B3, 40 max" with A/C/D pending, but the code can produce up to 70. d needs to default to 0 to match the pessimistic stance used for a and the documented behavior.

Additional Locations (1)

skills/browsability/SKILL.md#L87-L89
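One possible shape for the fix is sketched below. The review asks for `d` to default to 0; this variation goes a step further and models unassessed axes as `null`, so they are reported as pending and cannot contribute points. It is an assumption-laden illustration, not the patch applied in score.ts: `AxisValue` and `summarize` are hypothetical names, and treating `c` as pending follows the SKILL.md wording rather than the `taxProxy` default in the quoted code.

```typescript
// Hypothetical sketch: null marks an axis that was never assessed.
type AxisValue = number | null;

/** Sum only assessed axes and list the rest as pending. */
function summarize(axes: Record<string, AxisValue>): { total: number; pending: string[] } {
  let total = 0;
  const pending: string[] = [];
  for (const [name, value] of Object.entries(axes)) {
    if (value === null) pending.push(name);
    else total += value;
  }
  return { total: Math.round(total), pending };
}

// Drivability-only mode: no tasks.json, so a, c and d stay pending.
const drivabilityOnly = summarize({ a: null, b1: 20, b3: 12, c: null, d: null });
```

With this shape the drivability-only total cannot exceed the documented 40, because only `b1` and `b3` hold numbers.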


cursor · 2026-05-30T02:05:10Z

+
+const total = Math.round(a + b1 + b3 + c + d);
+const grade = (t: number) => t >= 90 ? "A" : t >= 80 ? "B+" : t >= 70 ? "B" : t >= 60 ? "C+" : t >= 50 ? "C" : t >= 35 ? "D" : "F";
+const rungName = ["L0 vanilla", "L1 default-assist", "L2 proxy+fingerprint", "L3 advanced-stealth", "L4 verified"];


Unused rungName variable in scorer

Low Severity

rungName is defined but never referenced anywhere in score.ts or the rest of the codebase. It's dead code — likely intended for the report output but never wired in.
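If the array was meant for the report, wiring it in could look like the sketch below. `rungName` and `grade` are copied from the quoted diff; `lowestSuccessfulRung`, `describeRung`, and the fallback strings are hypothetical helpers added for illustration, and the actual report format in score.ts may differ.

```typescript
// rungName and grade are copied from the diff above; the helpers are hypothetical.
const rungName = ["L0 vanilla", "L1 default-assist", "L2 proxy+fingerprint", "L3 advanced-stealth", "L4 verified"];
const grade = (t: number) => t >= 90 ? "A" : t >= 80 ? "B+" : t >= 70 ? "B" : t >= 60 ? "C+" : t >= 50 ? "C" : t >= 35 ? "D" : "F";

type Run = { rung: number; success: boolean };

/** Lowest rung at which any run succeeded, or null if none did. */
function lowestSuccessfulRung(runs: Run[]): number | null {
  const ok = runs.filter((r) => r.success).map((r) => r.rung);
  return ok.length > 0 ? Math.min(...ok) : null;
}

/** Human-readable label for the report line. */
function describeRung(runs: Run[]): string {
  const rung = lowestSuccessfulRung(runs);
  if (rung === null) return "no rung succeeded";
  return rungName[rung] ?? `L${rung} (unnamed)`;
}

const label = describeRung([
  { rung: 0, success: false },
  { rung: 2, success: true },
  { rung: 3, success: true },
]);
const reportLine = `Access: ${label} · grade ${grade(73)}`;
```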


shubh24 wants to merge 1 commit into main from shubh24/browsability-skill
