Natural-language device control for your coding agent — across iOS, Android, and web. Every session is a replayable trail you can run as a test.
Trailblaze gives your coding agent a single, typed, replayable way to drive any device.
Built-in primitives plus your own typed tools, with a natural-language source of truth
that travels across platforms. Whether you're exploring an app, automating a workflow,
or building a test suite, the artifact is the same: a portable .trail.yaml your CI can
replay deterministically from recorded steps with no LLM at replay time.
-
Composable, replayable device control for your agent. Your coding agent drives the device through Trailblaze's primitives — plus first-class commands of your own, like
loginoraddToCart, defined in a typed language with type-safe bindings. Each step records its objective. The resulting.trail.yamlis both a natural-language source of truth — what you're testing or doing — and a deterministic execution artifact — how it runs. One trail can describe a flow across iOS, Android, and web — same natural-language steps, platform-specific actions or tools captured for each. Agents read the what; the framework handles the how. -
A cross-platform Trace Viewer. Open any session — yours or your agent's — and inspect view hierarchy, screenshots, video, and platform logs at every step. When you want a different selector than the one Trailblaze auto-picked for a step, the viewer lets you choose from generated alternatives computed against the same captured hierarchy — human judgment, no re-recording. Same viewer for iOS, Android, and web.
Most multi-platform testing tools expose the intersection of what iOS, Android, and
web can do — a generic API that maps onto all three, at the cost of losing
platform-specific capability. Trailblaze takes the opposite approach: full-fidelity
native semantic surfaces, OS-native primitives, and typed tool sets composed per
(target, platform). Your agent works against each platform's native UI tree —
the accessibility tree on Android, native UI semantics on iOS, the DOM on web —
rather than a flattened cross-platform abstraction.
The agent picks elements semantically — "the Sign in button" — from the native hierarchy. Trailblaze computes the platform-specific selectors behind the scenes. The natural-language test stays the same; the execution uses each platform's full power.
This only works because an agent is driving. Exposing full native fidelity to a human is overwhelming — twenty platform-specific selector strategies per element is no one's idea of a good testing SDK. Exposing it to an LLM is the point. The agent handles the complexity; you get the expressive power of native automation with the consistency of a single natural-language test.
You can stop anywhere. Tools you add to your agent's surface become first-class commands the next time your agent drives a device.
-
Drive a device. Point your coding agent (Claude Code, Cursor, Codex CLI) at the Trailblaze CLI. Natural-language device control across iOS, Android, and web — through built-in primitives plus any custom tools your team has shipped.
-
Save and replay. Any session becomes a
.trail.yaml— replay it ad-hoc, commit it to your repo as a CI regression test, or open it in the Trace Viewer to see exactly what happened. Same artifact, three uses, no LLM at replay. -
Compose your own agent surface. Give your agent first-class commands like
loginoraddToCart, named waypoints for your screens, and trailmaps from other teams. Curate exactly what your agent sees: surface yourloginand hide the low-level taps; pick four of twenty primitives if that's what your tests need. Custom commands are typed and type-safe, with IDE support, replayable capture, and LLM-facing descriptions you write for the tool and each parameter — your agent reads those descriptions to decide when and how to call them. Every call — yours, theirs, or built-in — is a first-class command, recordable and replayable. Your agent gets dramatically more capable on every task that uses your composition.¹
Recorded trails replay deterministically by default — no LLM in the loop, no flake. When a recorded step genuinely doesn't match the screen anymore, there are two repair paths:
- Built-in self-heal handles small drift — text changes, an unexpected popup, a
minor reorder within an established flow. Opt in with
--self-healand Trailblaze's built-in agent patches the failing step against the live screen. - Your coding agent handles the larger cases — anything that needs project context, log inspection, or judgment calls about intent. The trace session (view hierarchies, screenshots, video, platform logs, the trail YAML itself) is the diagnosis surface. Claude Code, Cursor, or Codex read the trace, compare what the step intended (its natural-language objective) to what the app now does, and propose a fix. You review and commit.
The natural-language objectives are what make this work. They tell the agent what the step was trying to do, so repair is a matter of updating the how against the current app — not re-deriving intent from a broken selector.
The AI agent ecosystem is moving fast. Whatever it looks like in a year — or five, or ten — your natural-language tests will come with you. Trailblaze captures what you're testing as portable prose; the how (selectors, recordings, agent harness, framework version) adapts as the landscape changes.
curl -fsSL https://raw.githubusercontent.com/block/trailblaze/main/install.sh | bashRequired: curl, java 17+.
Optional (Homebrew users: brew install bun esbuild ffmpeg):
bun+esbuild— needed only when authoring or running trailmap-defined scripted tools (on disk undertrails/config/trailmaps/<trailmap>/tools/*.ts). Without these,./trailblaze run …will fail at dispatch withUnsupported tool type for RPC executionfor any trail that uses atarget:trailmap with.tstool descriptors.ffmpeg— needed only for trail video capture / sprite extraction (visual playback artifacts under~/.trailblaze/logs/<session>/). Trails still run without it; only the rendered video and sprite-strip outputs are missing.
# List connected devices (Android emulator, iOS simulator, or web browser)
trailblaze device list
# Pin this shell to a device + target so subsequent calls inherit both from the env
eval $(trailblaze device connect android --target default)
# Read the screen — returns a view hierarchy with refs (e.g. ab42) the agent can target
trailblaze snapshot
# Act on a referenced element — your agent picks the element; Trailblaze picks the
# platform-specific selector strategy. Every action takes a --step for self-heal.
trailblaze tool tap ref=ab42 -s "Tap sign in"
# Done — release the device + clear TRAILBLAZE_DEVICE from this shell
eval $(trailblaze device disconnect)Paste those into Claude Code, Codex, Cursor, Goose, or anything that can run bash and
you're already authoring trails. For CI / scripts that prefer determinism over shell
state, every device-acting command (snapshot, tool, step, ask, verify,
session start/stop, run) also accepts -d <platform> (and --target <app> where
supported — tool, step, session start, mcp) as a per-call override.
(blaze remains accepted as a deprecated alias of step.)
Runnable, standalone workspaces under examples/ — each a complete template
you can copy: typed trailblaze.tool<Input, Output>() tools auto-discovered from bare
.ts files (no .yaml manifests), a trail suite, and tool unit tests.
| Example | Platform | What it teaches |
|---|---|---|
examples/ios-contacts |
iOS | Canonical mobile reference — 9 typed scripted tools, 18 trails, and tool unit tests (*.test.ts). |
examples/wikipedia |
Web | Canonical web reference — 9 typed scripted tools driving live en.wikipedia.org, 28 trails. |
examples/playwright-native |
Web | Smallest end-to-end scripted-tool setup, with a bundled sample app. |
./trailblaze check --workspace examples/ios-contacts # materialize SDK + typed bindings (once)See examples/README.md for the full index, including sample target
apps.
Trailblaze is moving fast. These are landing now and are worth knowing about even if they're not stable yet:
- Trailmaps — reusable target-aware capability bundles (tools + waypoints + routes
- recorded trails) shipped per app, consumed by humans and agents alike. Designed to distribute via npm so other teams (and eventually the broader community) can install an agent-ready vocabulary for your app. (devlog)
- Scripted Tools — write custom tools with the
@trailblaze/scriptingSDK in a typed language with type-safe bindings. Tools execute in a QuickJS sandbox on-device or in a host subprocess. After clone, run./trailblaze check --workspace examples/playwright-nativeonce to materialize the workspace SDK + per-trailmap typed bindings. (devlog) - Waypoints — named, assertable app locations defined structurally (element identity, stable labels), never by content. Agents can ask "am I on the Inbox?", land on a waypoint after a step, or use waypoints as trail checkpoints. (devlog)
- Trail-as-Tool — expose a saved trail as a tool so other trails (and agents) can
call it. A
loginAsTestUsertrail becomes a one-line setup step inside any other test. (devlog)
trailblaze app # Launch the desktop app for visual trail authoring and report browsingThe desktop app, HTML reports, and session browser all work the same regardless of how the trail was authored.
Full docs at block.github.io/trailblaze.
- CLI Reference — Every command and flag
- Tool Authoring — Add your own tools
- Configuration — Providers, devices, target apps
- Examples — Runnable, copy-me workspaces (start with
ios-contactsorwikipedia)
(A longer Getting Started walkthrough is being rewritten to match the new direction. Until that lands, the README above is the canonical starting point.)
Trailblaze is not a full coding agent. It ships a functioning agent focused on the natural-language step it's currently executing, with vision into the current screen and past steps — fine for many flows. What it doesn't try to be is a Claude Code, Cursor, or Codex: those have your entire codebase in context, can query logcat at runtime, and bring far more project context to bear. For serious authoring work, you want one of them driving Trailblaze. And Trailblaze is not a SaaS test platform — the trail YAML lives in your repo, you own it, you can read and edit it.
¹ Cross-team and cross-repo trailmap sharing today; npm-based distribution for community-published trailmaps is in active development.