kernel · dprevoznik · Jun 21, 2026
diff --git a/README.md b/README.md
@@ -72,6 +72,7 @@ Command-line interface skills for using Kernel CLI commands.
 | **kernel-cli** | Complete guide to Kernel CLI - cloud browser platform with automation, deployment, and management |
 | **kernel-agent-browser** | Best practices for `agent-browser -p kernel` automation, bot detection handling, iframes, login persistence |
 | **kernel-auth** | Setup and manage Kernel authentication connections for any website with safety checks and reauthentication support |
+| **cua-cli** | Drive a Kernel browser from shell via the `cua` binary: one-shot subcommands, named sessions, TUI, profile persistence, transcripts, live-view handoff |
 | **profile-website-bot-detection** | Profile a website for bot detection vendors using stealth vs non-stealth Kernel browsers; compare effectiveness and identify vendor products |
 
 ### kernel-sdks
@@ -82,6 +83,7 @@ SDK skills for building browser automation with TypeScript and Python.
 |-------|-------------|
 | **typescript-sdk** | Build automation with Kernel's Typescript SDK |
 | **python-sdk** | Build automation with kernel's Python SDK |
+| **cua-agent** | Build TypeScript apps that embed Kernel cua's loop with `CuaAgent` / `CuaAgentHarness`: provider switching, custom tools, session repos, event-stream debugging |
 
 ### generate-video
 

diff --git a/plugins/kernel-cli/skills/cua-cli/SKILL.md b/plugins/kernel-cli/skills/cua-cli/SKILL.md
@@ -0,0 +1,217 @@
+---
+name: cua-cli
+description: Drive a Kernel cloud browser from the shell using the `cua` CLI. Use this skill when you need to open URLs, click elements, type into fields, take screenshots, or chain multi-step browser tasks across shell calls. Supports named sessions for stateful workflows, profile persistence for logins, transcript-based debugging, and live-view handoff when stealth fails. For building your own TS agent on top of cua, see `cua-agent`.
+---
+
+# cua-cli
+
+`cua` is a single-binary CLI that drives a real Chrome session running in Kernel. It's designed for agentic use: each subcommand returns a one-line result on stdout and a deterministic exit code, so you can chain calls together and parse the output. An LLM picks targets semantically from screenshots — there are no CSS selectors.
+
+## When to use this skill
+
+- **Use this skill** when you need shell-callable computer-use steps (`cua open`, `cua click`, `cua do …`), an interactive TUI, or a way to chain browser actions in a shell pipeline.
+- **Reach for the `cua-agent` skill** (in the `kernel-sdks` plugin) when you're writing a TypeScript app that needs to embed cua's prompt → screenshot → tool-call loop programmatically.
+- **Reach for `kernel-agent-browser`** when you need deterministic browser scripting (semantic selectors, `find role`, `wait --text`, accessibility-tree snapshots).
+- **Reach for `kernel-cli`** for raw Kernel browser management (`kernel browsers create`, `kernel browsers exec`, profile / proxy CRUD).
+
+## Prerequisites
+
+- A Kernel account and `KERNEL_API_KEY`. See `kernel-cli` for install + auth.
+- At least one model-provider API key, matched to the model you pick (table below).
+- Node 20+ for the npm install.
+
+## Install
+
+```bash
+# Global install — puts `cua` on $PATH
+npm i -g @onkernel/cua-cli
+
+# Or zero-install one-shot
+npx -y -p @onkernel/cua-cli cua --help
+```
+
+## Environment variables
+
+| Env | Used for |
+| --- | --- |
+| `KERNEL_API_KEY` | Kernel API key (always required) |
+| `OPENAI_API_KEY` | OpenAI models (`-m openai:…`) |
+| `ANTHROPIC_API_KEY` | Anthropic models (`-m anthropic:…`) |
+| `GOOGLE_API_KEY` / `GEMINI_API_KEY` | Google / Gemini models (`-m google:…`) |
+| `YUTORI_API_KEY` | Yutori Navigator (`-m yutori:…`) |
+| `TZAFON_API_KEY` | Tzafon (`-m tzafon:…`) |
+| `KERNEL_BASE_URL` | Override Kernel base URL |
+| `XDG_DATA_HOME` | Sessions / transcripts dir (defaults to `~/.local/share`) |
+| `CUA_IMAGE_PROTOCOL` | Force inline image protocol (`kitty` / `iterm2` / `none` / `auto`) |
+
+## One-shot subcommands
+
+Each call provisions a fresh Kernel browser by default, runs the action, prints a one-line result, and tears the browser down. Chain via `-s <name>` (next section) to keep state.
+
+| Subcommand | What it does | Stdout | Exit code |
+| --- | --- | --- | --- |
+| `cua open <url>` | Navigate to a URL. | `ok` | 0 ok, 2 error |
+| `cua click "<desc>"` | Find element matching natural-language description and click it. | `ok clicked (x, y)` or `not_found <reason>` | 0 ok, 1 not_found, 2 error |
+| `cua type "<field>" "<text>"` | Focus a field by description and type. | `ok typed` or `not_found <reason>` | 0 ok, 1 not_found, 2 error |
+| `cua press <key> [<key>...]` | Send a key combo (`cua press ctrl l`, `cua press Return`). | `ok pressed` | 0 ok, 2 error |
+| `cua url` | Print the current URL. | the URL | 0 ok, 2 error |
+| `cua observe ["<question>"]` | Describe the page; optionally answer a question. | the description | 0 ok, 2 error |
+| `cua screenshot --out <file\|->` | Save a PNG. `--out -` writes bytes to stdout. | the path or `(stdout)` | 0 ok, 2 error |
+| `cua do "<instruction>"` | Open-ended: the agent plans and acts, bounded by `--max-steps` (default 3). | the assistant's final text | 0 ok, 2 error |
+
+Useful flags:
+
+- `-m <model>` — pick the LLM (default `openai:gpt-5.5`). `cua models` to list.
+- `--max-steps <n>` — bound the loop on `cua do`.
+- `--profile <id-or-name>` — load a Kernel browser profile for persisted cookies / storage. Existing ids or names are reused; a non-id name is created if missing. Pass `--profile-no-save-changes` for read-only.
+- `-v` — verbose progress on stderr (provisioning, tool calls, transcript path).
+
+`click` and `type` match **semantically**, not by selector — use natural-language descriptions of what's visible on screen.
+
+The cua CLI always provisions **stealth-on** browsers. If you need non-stealth or a custom viewport / proxy, pre-create the browser via `kernel browsers create` and attach the cua session to it.
+
+## Named sessions
+
+Without `-s`, each subcommand provisions a brand-new browser. To keep state across calls, allocate a named session first:
+
+```bash
+cua --profile github session start login    # provisions a Kernel browser, prints `name=login`
+cua -s login open https://github.com/login
+cua -s login type "email field"    "$EMAIL"
+cua -s login type "password field" "$PASSWORD"
+cua -s login click "Sign in"
+cua -s login url                            # prints post-login URL
+cua session stop login                      # tears down the Kernel browser
+```
+
+Inspect:
+
+```bash
+cua session list                            # NAME / KERNEL_ID / AGE / LIVE_URL
+cua session show login                      # full JSON metadata
+```
+
+Pass `--profile` when starting the named session; later `cua -s <name> …` calls attach to the same browser, so they don't need the profile flag again.
+
+**Liveness**: Kernel browsers time out from inactivity. If you see `error session "<name>" is no longer alive on Kernel …`, run `cua session stop <name> && cua --profile <same-profile-as-before> session start <name>` to re-provision with the same persisted profile.
+
+Named-session metadata lives in `$XDG_DATA_HOME/cua/named-sessions/<name>.json`.
+
+## Free-form mode
+
+```bash
+cua --print "open hn and tell me the top story"   # one-shot, streams text
+cua --print -o jsonl "..."                        # one-shot, streams JSONL events
+cua "..."                                         # interactive TUI (real terminal)
+```
+
+`--print` exits when the agent finishes; the TUI runs until Ctrl+C. With `-o jsonl`, add `--jsonl-include-deltas` for token deltas and `--jsonl-include-images` for base64 screenshots in `tool_result` events.
+
+## Model selection
+
+Run `cua models` for the current catalog. Pick with `-m <ref>` (default `openai:gpt-5.5`). Switch per call or per named session.
+
+| Model ref | Provider |
+| --- | --- |
+| `openai:gpt-5.5` | OpenAI (default) |
+| `anthropic:claude-opus-4-7` | Anthropic (supports `--thinking off\|minimal\|low\|medium\|high\|xhigh`) |
+| `google:gemini-3-flash-preview` | Google / Gemini |
+| `yutori:n1.5-latest` | Yutori Navigator |
+
+Not every provider's native action vocabulary includes navigation. If a model can click and type but cannot navigate (`goto`, `back`, `forward`, `url`), pick a different model.
+
+## Live view URL and manual login fallback
+
+Stealth-on doesn't always beat bot detection. When automation gets stuck on a login, hand off to a human via the live view URL.
+
+```bash
+cua --profile mysite session start login
+cua session show login | jq -r .live_url   # share this URL with the user
+# user logs in manually in the live view
+cua -s login url                           # confirm post-login URL
+cua session stop login                     # profile state saves on teardown
+```
+
+If you only have a session id (e.g. from `cua session list`), the `kernel` CLI can also surface the live view URL:
+
+```bash
+kernel browsers view <session-id>
+```
+
+## Cross-origin iframes / Playwright escape hatch
+
+cua drives by clicking pixels, so cross-origin iframes (payment forms, embedded vendor widgets) work in the screenshot flow without special handling — the model just clicks them. When you need a deterministic Playwright action against the underlying browser (e.g. fill a card form via a fixed selector), break out to Kernel's exec endpoint with the session id:
+
+```bash
+# Find the session id
+cua session show login | jq -r .kernel_session_id
+
+# Run a Playwright snippet against the same browser
+kernel browsers exec <session-id> --code "
+  const frame = page.frameLocator('#payment-iframe');
+  await frame.locator('#card-number').fill('4111111111111111');
+  await frame.locator('#submit').click();
+"
+```
+
+## Debugging
+
+- **Verbose stderr**: `cua -v --print "…"` writes provisioning info, tool calls, and the transcript path to stderr.
+- **Live event stream**: `cua --print -o jsonl "…"` emits one event per line (`tool_call`, `tool_result`, `assistant_text_done`, etc.). Add `--jsonl-include-images` to inline screenshots in `tool_result`.
+- **Persisted transcript**: every `--print`, TUI, and `-s <name>` invocation appends to `$XDG_DATA_HOME/cua/sessions/<cwd-hash>/<id>.jsonl`. Find the exact path:
+  ```bash
+  cua -v --print "..."                       # stderr includes: [cua] session=<path>
+  cua session show login | jq -r .transcript_path
+  ```
+  Roles: `user`, `assistant`, `toolResult`. There's also a custom `cua-browser` entry written once per session with `kernel_session_id` / `live_url` / `profile_id`.
+- **Screenshots**: `cua screenshot --out shot.png` or inspect `image` blocks in `toolResult` transcript entries.
+- **Page URL**: `cua url` to confirm post-action navigation.
+
+A few `jq` starters against a transcript path:
+
+```bash
+# Every tool call the agent made, in order
+jq -c 'select(.role == "assistant") | .content[]?
+       | select(.type == "tool_use") | {name, input}' "$TRANSCRIPT"
+
+# Final assistant text (the answer)
+jq -r 'select(.role == "assistant") | .content[]?
+       | select(.type == "text") | .text' "$TRANSCRIPT" | tail -1
+```
+
+## Gotchas
+
+- **Element descriptions are semantic, not selectors.** `cua click "Sign in button"` looks at the screenshot — describe what the user sees, not a CSS selector.
+- **Viewport defaults to 1920x1080.** Pre-create the browser with `kernel browsers create` if you need something else.
+- **Prefer keyboard navigation over mouse-wheel scrolling.** `cua press Page_Down`, `Home`, and the arrow keys are more reliable than having the LLM drive the scroll wheel.
+- **Multi-step state requires `-s <name>`.** A second one-shot subcommand can't see what the first one did.
+- **Profile saves on close, not continuously.** Tear down cleanly with `cua session stop` or you'll lose recent state.
+- **`--max-steps` defaults to 3 on `cua do`.** Bump it for non-trivial tasks.
+
+## Quick reference
+
+```bash
+# One-shot, fresh browser
+cua --print "open hn and tell me the top story"
+
+# Named session for multi-step
+cua --profile mysite session start work
+cua -s work open https://example.com
+cua -s work click "Log in"
+cua -s work type "email field" "$EMAIL"
+cua -s work click "Submit"
+cua -s work url
+cua session stop work
+
+# List models, switch model per call
+cua models
+cua --print -m anthropic:claude-opus-4-7 "..."
+
+# Get the live view URL
+cua session show work | jq -r .live_url
+kernel browsers view <session-id>   # alternative
+
+# Drop to Playwright for deterministic actions
+cua session show work | jq -r .kernel_session_id
+kernel browsers exec <session-id> --code "..."
+```
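
The one-shot subcommand table in the SKILL.md above documents three exit codes for `cua click`: 0 for ok, 1 for `not_found`, and 2 for error. A minimal sketch of how a script might branch on them follows. It assumes those codes behave as documented; the helper name `click_with_retry`, the retry count, and the delay are hypothetical and not part of the cua CLI.

```bash
#!/usr/bin/env bash
# Sketch only: retry `cua click` on exit code 1 (not_found) and stop
# immediately on any other failure (the table documents 2 = error).
# `click_with_retry` and its defaults are illustrative, not a cua feature.

# Usage: click_with_retry <session> <description> [attempts] [delay_seconds]
click_with_retry() {
  local session=$1 desc=$2 attempts=${3:-3} delay=${4:-2}
  local i out rc

  for ((i = 1; i <= attempts; i++)); do
    rc=0
    # Capture the one-line stdout result and the exit code separately.
    out=$(cua -s "$session" click "$desc") || rc=$?

    case $rc in
      0)
        # Success: pass the `ok clicked (x, y)` line through to the caller.
        printf '%s\n' "$out"
        return 0
        ;;
      1)
        # not_found: the page may still be loading, so wait and try again.
        printf 'attempt %d/%d: %s\n' "$i" "$attempts" "$out" >&2
        if ((i < attempts)); then
          sleep "$delay"
        fi
        ;;
      *)
        # Any other code is treated as a hard error; retrying is unlikely to help.
        printf 'giving up: cua exited with %d\n' "$rc" >&2
        return "$rc"
        ;;
    esac
  done

  # Every attempt reported not_found.
  return 1
}
```

Called as `click_with_retry login "Sign in" 5 1`, the helper prints the click result on success and returns 1 if every attempt reported `not_found`. A caller could use that return code to fall back to the live-view handoff described in the SKILL.md.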