Add browsability skill — score how usable a site is for a browser agent#122
Add browsability skill — score how usable a site is for a browser agent#122shubh24 wants to merge 1 commit into
Conversation
Adds the `browsability` skill — an operational rubric for how well an AI
*browser* agent can drive a website's UI (the sibling of agent-experience,
which covers docs/SDK onboarding DX).
Scores 0–100 across:
A Access Resistance — lowest assistance rung (stealth/proxy/captcha ladder)
a task needs to complete
B1 Reachability — % of controls that survive the accessibility-tree prune
B3 Structural traps — cross-origin iframes, shadow DOM, DOM depth/size
C Agent tax — agent steps OVER the human baseline (delta, not absolute)
D Recoverability — self-heal / site errors / blocking overlays / step ceiling
The Drivability slice (B1+B3) runs deterministically from one page load via
scripts/friction.ts (no model). The full score adds an agent run across the
assistance ladder via scripts/score.ts. Rubric grounded in what the open-source
Stagehand framework treats as hard; uses only public Browserbase session settings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 9c30e80. Configure here.
| type Run = { rung: number; success: boolean; steps?: number; model?: string; note?: string }; | ||
| type Task = { name: string; type?: string; humanBaselineSteps?: number; runs?: Run[] }; | ||
|
|
||
| let a = 0, c = taxProxy, d = 10, tasks: Task[] = [], haveRuns = false; |
There was a problem hiding this comment.
Recoverability defaults to full marks when unassessed
Medium Severity
When no tasks.json is provided (drivability-only mode), d is initialized to 10 — giving full recoverability marks for an axis that was never assessed. Meanwhile a (Access Resistance) correctly defaults to 0 when unknown. This asymmetry inflates the reported score by 10 points. The SKILL.md documentation explicitly states the drivability-only score is "B1 + B3, 40 max" with A/C/D pending, but the code can produce up to 70. d needs to default to 0 to match the pessimistic stance used for a and the documented behavior.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 9c30e80. Configure here.
|
|
||
| const total = Math.round(a + b1 + b3 + c + d); | ||
| const grade = (t: number) => t >= 90 ? "A" : t >= 80 ? "B+" : t >= 70 ? "B" : t >= 60 ? "C+" : t >= 50 ? "C" : t >= 35 ? "D" : "F"; | ||
| const rungName = ["L0 vanilla", "L1 default-assist", "L2 proxy+fingerprint", "L3 advanced-stealth", "L4 verified"]; |
There was a problem hiding this comment.
Unused rungName variable in scorer
Low Severity
rungName is defined but never referenced anywhere in score.ts or the rest of the codebase. It's dead code — likely intended for the report output but never wired in.
Reviewed by Cursor Bugbot for commit 9c30e80. Configure here.


What
Adds the
browsabilityskill: an operational rubric + tooling for scoring how well an AI browser agent can drive a website's UI.It's the sibling of
agent-experience(which audits docs/SDK onboarding DX). This one is about operability — perceiving and driving the live DOM — not discoverability (no SEO/AEO/llms.txt).The opinion
Browsability = how little help an agent needs to succeed, and how much harder the site is for an agent than for a human. Step-count is scored as the delta over a human baseline, so a genuinely long workflow isn't penalized — only the agent-specific tax (e.g. a custom dropdown that costs two steps where a native
<select>costs one).Score (0–100)
Agent-native affordance (an API/deep-link path) is noted as a ceiling badge, not scored — this rubric measures UI operability.
Contents
SKILL.md— the workflow (probe → agent ladder → composite score → report).references/rubric.md— the full code-grounded rubric, the assistance ladder, the delta framing, and a remediation table.scripts/friction.ts— deterministic Drivability probe (B1 + B3) from a single page load via the browse CLI; no model needed.scripts/score.ts— composite 0–100 scorer combining the probe with agent-run results.Status
browserskill as the driver) to climb the rungs and recordtasks.json. Without it, the scorer reports a Drivability-only score and marks A/C/D pending.Notes
solveCaptchasdefaults to on, so an honest rung-0 baseline explicitly disables it (documented in the skill).🤖 Generated with Claude Code
Note
Low Risk
Documentation and local scoring scripts only; no changes to auth, production services, or existing skill runtime behavior.
Overview
Adds a new
browsabilityskill that scores how well an AI browser agent can operate a site’s UI (not SEO/docs DX), and lists it in the root README skills table.The skill documents a 0–100 rubric (Access Resistance ladder, reachability, structural traps, agent tax vs human baseline, recoverability) in
SKILL.mdandreferences/rubric.md, with a workflow: deterministic Drivability probe → optional agent ladder intasks.json→ composite report.New Bun scripts drive the flow:
friction.tsloads a URL via thebrowseCLI, evaluates DOM signals, and writesfriction.json;score.tsmerges that with optional task runs intobrowsability.json(full score or Drivability-only when ladder data is missing). Output goes underbrowsability-out/(gitignored); MIT LICENSE is included.Reviewed by Cursor Bugbot for commit 9c30e80. Bugbot is set up for automated code reviews on this repo. Configure here.