diff --git a/COMMAND_OWNERSHIP.md b/COMMAND_OWNERSHIP.md new file mode 100644 index 00000000..6d0d39b5 --- /dev/null +++ b/COMMAND_OWNERSHIP.md @@ -0,0 +1,141 @@ +# Command Ownership Inventory + +This inventory keeps the public boundary stable while command semantics move into +the runtime layer. New integrations should prefer the runtime, backend, and IO +interfaces over helper subpaths. + +## Portable Command Runtime + +These commands describe device, app, capture, selector, or interaction behavior. +Their semantics should live in `agent-device/commands` as they migrate. + +- `alert` +- `app-switcher` +- `apps` +- `appstate` +- `back` +- `click` +- `clipboard` +- `close` +- `diff` +- `fill` +- `find` +- `focus` +- `get` +- `home` +- `is` +- `keyboard` +- `longpress` +- `open` +- `pinch` +- `press` +- `push` +- `rotate` +- `screenshot` +- `scroll` +- `settings` +- `snapshot` +- `swipe` +- `trigger-app-event` +- `type` +- `wait` + +## Runtime Migration Status + +- `screenshot`: runtime command implemented; daemon screenshot dispatch calls the runtime. +- `diff screenshot`: runtime command implemented; CLI screenshot diff dispatch calls the runtime. +- `snapshot`: runtime command implemented; daemon snapshot dispatch calls the runtime. +- `diff snapshot`: runtime command implemented; daemon snapshot diff dispatch calls the runtime. +- `find`: read-only runtime actions implemented for `exists`, `wait`, `get text`, + and `get attrs`; mutating find actions remain on the existing interaction path. +- `get`: runtime command implemented; daemon get dispatch calls the runtime. +- `is`: runtime command implemented; daemon is dispatch calls the runtime. +- `wait`: runtime command implemented for sleep, text, ref, and selector waits; + daemon wait dispatch calls the runtime. +- `click`: runtime command implemented for point, ref, and selector targets; the + daemon click dispatch calls the runtime. 
+- `press`: runtime command implemented for point, ref, and selector targets; the + daemon press dispatch calls the runtime. +- `fill`: runtime command implemented for point, ref, and selector targets; the + daemon fill dispatch calls the runtime. +- `type`: runtime command implemented; daemon type dispatch calls the runtime. + +## Boundary Requirements + +- Public command APIs expose only implemented commands. Planned commands belong + in `commandCatalog`, not as methods that throw at runtime. +- Runtime services default to `restrictedCommandPolicy()`. Local input and + output paths require an explicit local policy or adapter decision. +- File inputs and outputs cross the runtime boundary through `agent-device/io` + refs and artifact descriptors; command implementations should not accept + ad-hoc path strings for new file contracts. +- Image-producing or image-reading commands must preserve `maxImagePixels` + enforcement before decoding or comparing untrusted images. +- Backend escape hatches must be named capabilities with a policy gate. Do not + add a freeform backend command bag. +- Command options should carry `session`, `requestId`, `signal`, and `metadata` + through `CommandContext` so hosted adapters can enforce request scope, + cancellation, and audit correlation consistently. +- Runtime command modules should depend on shared `src/utils/*` helpers, not + daemon-only modules. Keep daemon paths as compatibility shims when older + handlers still import them. +- New backend adapters should run `agent-device/testing/conformance` suites for + the command families they claim to support. + +## Backend And Admin Capabilities + +These commands manage devices or app installation. Keep them explicit backend +capabilities so hosted adapters can decide what is supported. + +- `boot` +- `devices` +- `ensure-simulator` +- `install` +- `install-from-source` +- `reinstall` + +## Transport And Session Orchestration + +These are daemon, CLI, or transport concerns. 
They can construct or call the +runtime, but they are not portable command semantics. + +- `session` +- lease allocation, heartbeat, and release daemon commands + +## Environment Preparation + +These prepare local or remote development environment state. Keep them outside +the portable command runtime. + +- `connect` +- `connection` +- `disconnect` +- `metro` + +## Later Capability-Gated Runtime Commands + +These commands should migrate only after the runtime, backend capability, and IO +contracts are established for their behavior. + +- `batch` +- `logs` +- `network` +- `perf` +- `record` +- `replay` +- `test` +- `trace` + +## Compatibility Helper Subpaths + +These subpaths remain available during migration, but they should not be the +primary boundary for new command behavior: + +- `agent-device/contracts` +- `agent-device/selectors` +- `agent-device/finders` +- `agent-device/install-source` +- `agent-device/android-apps` +- `agent-device/artifacts` +- `agent-device/metro` +- `agent-device/remote-config` diff --git a/package.json b/package.json index a5cd7b86..09f7a182 100644 --- a/package.json +++ b/package.json @@ -12,6 +12,22 @@ "import": "./dist/src/index.js", "types": "./dist/src/index.d.ts" }, + "./commands": { + "import": "./dist/src/commands/index.js", + "types": "./dist/src/commands/index.d.ts" + }, + "./backend": { + "import": "./dist/src/backend.js", + "types": "./dist/src/backend.d.ts" + }, + "./io": { + "import": "./dist/src/io.js", + "types": "./dist/src/io.d.ts" + }, + "./testing/conformance": { + "import": "./dist/src/testing/conformance.js", + "types": "./dist/src/testing/conformance.d.ts" + }, "./artifacts": { "import": "./dist/src/artifacts.js", "types": "./dist/src/artifacts.d.ts" diff --git a/rslib.config.ts b/rslib.config.ts index d2d887d0..4e0d7f42 100644 --- a/rslib.config.ts +++ b/rslib.config.ts @@ -17,6 +17,10 @@ export default defineConfig({ source: { entry: { index: 'src/index.ts', + 'commands/index': 'src/commands/index.ts', + 
backend: 'src/backend.ts', + io: 'src/io.ts', + 'testing/conformance': 'src/testing/conformance.ts', artifacts: 'src/artifacts.ts', metro: 'src/metro.ts', 'remote-config': 'src/remote-config.ts', diff --git a/src/__tests__/cli-batch.test.ts b/src/__tests__/cli-batch.test.ts index d1653a02..1abc33d6 100644 --- a/src/__tests__/cli-batch.test.ts +++ b/src/__tests__/cli-batch.test.ts @@ -34,6 +34,9 @@ async function runCliCapture( const originalExit = process.exit; const originalStdoutWrite = process.stdout.write.bind(process.stdout); const originalStderrWrite = process.stderr.write.bind(process.stderr); + const originalStateDir = process.env.AGENT_DEVICE_STATE_DIR; + const stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-device-cli-batch-')); + process.env.AGENT_DEVICE_STATE_DIR = stateDir; (process as any).exit = ((nextCode?: number) => { throw new ExitSignal(nextCode ?? 0); @@ -61,6 +64,9 @@ async function runCliCapture( if (error instanceof ExitSignal) code = error.code; else throw error; } finally { + if (originalStateDir === undefined) delete process.env.AGENT_DEVICE_STATE_DIR; + else process.env.AGENT_DEVICE_STATE_DIR = originalStateDir; + fs.rmSync(stateDir, { recursive: true, force: true }); process.exit = originalExit; process.stdout.write = originalStdoutWrite; process.stderr.write = originalStderrWrite; diff --git a/src/__tests__/cli-client-commands.test.ts b/src/__tests__/cli-client-commands.test.ts index 2d4ca338..2e9e8004 100644 --- a/src/__tests__/cli-client-commands.test.ts +++ b/src/__tests__/cli-client-commands.test.ts @@ -3,6 +3,7 @@ import os from 'node:os'; import path from 'node:path'; import { test } from 'vitest'; import assert from 'node:assert/strict'; +import { PNG } from 'pngjs'; import { tryRunClientBackedCommand } from '../cli/commands/router.ts'; import type { AgentDeviceClient, @@ -204,6 +205,59 @@ test('screenshot forwards --overlay-refs to the client capture API', async () => }); }); +test('diff screenshot forwards --surface to 
live client screenshot capture', async () => { + const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-device-cli-diff-surface-')); + const baseline = path.join(dir, 'baseline.png'); + const out = path.join(dir, 'diff.png'); + fs.writeFileSync(baseline, solidPngBuffer(4, 4, { r: 0, g: 0, b: 0 })); + let observed: Parameters[0] | undefined; + + try { + const client = createStubClient({ + installFromSource: async () => { + throw new Error('unexpected install call'); + }, + screenshot: async (options) => { + if (!options?.path) { + throw new Error('expected runtime to request a live screenshot path'); + } + observed = options; + fs.writeFileSync(options.path, solidPngBuffer(4, 4, { r: 255, g: 255, b: 255 })); + return { + path: options.path, + identifiers: { session: options.session ?? 'default' }, + }; + }, + }); + + await captureStdout(async () => { + const handled = await tryRunClientBackedCommand({ + command: 'diff', + positionals: ['screenshot'], + flags: { + json: true, + help: false, + version: false, + baseline, + out, + platform: 'macos', + session: 'surface-session', + surface: 'menubar', + threshold: '0', + }, + client, + }); + assert.equal(handled, true); + }); + + assert.equal(observed?.session, 'surface-session'); + assert.equal(observed?.surface, 'menubar'); + assert.equal(fs.existsSync(out), true); + } finally { + fs.rmSync(dir, { recursive: true, force: true }); + } +}); + test('open forwards macOS surface to the client apps API', async () => { let observed: AppOpenOptions | undefined; const client = createStubClient({ @@ -630,3 +684,18 @@ function createThrowingMethodGroup(): T { get: (target, property) => target[property as keyof T] ?? 
unexpectedCommandCall, }) as T; } + +function solidPngBuffer( + width: number, + height: number, + color: { r: number; g: number; b: number }, +): Buffer { + const png = new PNG({ width, height }); + for (let i = 0; i < png.data.length; i += 4) { + png.data[i] = color.r; + png.data[i + 1] = color.g; + png.data[i + 2] = color.b; + png.data[i + 3] = 255; + } + return PNG.sync.write(png); +} diff --git a/src/__tests__/cli-diagnostics.test.ts b/src/__tests__/cli-diagnostics.test.ts index 72a5889b..fb4fc745 100644 --- a/src/__tests__/cli-diagnostics.test.ts +++ b/src/__tests__/cli-diagnostics.test.ts @@ -35,6 +35,9 @@ async function runCliCapture( const originalExit = process.exit; const originalStdoutWrite = process.stdout.write.bind(process.stdout); const originalStderrWrite = process.stderr.write.bind(process.stderr); + const originalStateDir = process.env.AGENT_DEVICE_STATE_DIR; + const stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-device-cli-diagnostics-')); + process.env.AGENT_DEVICE_STATE_DIR = stateDir; (process as any).exit = ((nextCode?: number) => { throw new ExitSignal(nextCode ?? 
0); @@ -59,6 +62,9 @@ async function runCliCapture( if (error instanceof ExitSignal) code = error.code; else throw error; } finally { + if (originalStateDir === undefined) delete process.env.AGENT_DEVICE_STATE_DIR; + else process.env.AGENT_DEVICE_STATE_DIR = originalStateDir; + fs.rmSync(stateDir, { recursive: true, force: true }); process.exit = originalExit; process.stdout.write = originalStdoutWrite; process.stderr.write = originalStderrWrite; diff --git a/src/__tests__/runtime-conformance.test.ts b/src/__tests__/runtime-conformance.test.ts new file mode 100644 index 00000000..a784e5df --- /dev/null +++ b/src/__tests__/runtime-conformance.test.ts @@ -0,0 +1,98 @@ +import assert from 'node:assert/strict'; +import { test } from 'vitest'; +import type { AgentDeviceBackend } from '../backend.ts'; +import { createLocalArtifactAdapter } from '../io.ts'; +import { createAgentDevice, createMemorySessionStore, localCommandPolicy } from '../runtime.ts'; +import { + assertCommandConformance, + commandConformanceSuites, + runCommandConformance, +} from '../testing/conformance.ts'; +import type { SnapshotState } from '../utils/snapshot.ts'; +import { makeSnapshotState } from './test-utils/index.ts'; + +test('command conformance suites run against a fixture backend', async () => { + const calls: string[] = []; + const report = await runCommandConformance({ + name: 'fixture', + createRuntime: () => + createAgentDevice({ + backend: createFixtureBackend(calls), + artifacts: createLocalArtifactAdapter(), + sessions: createMemorySessionStore([{ name: 'default', snapshot: fixtureSnapshot() }]), + policy: localCommandPolicy(), + }), + }); + + assert.equal(report.target, 'fixture'); + assert.equal(report.failed, 0); + assert.equal(report.passed, commandConformanceSuites.flatMap((suite) => suite.cases).length); + assert.equal(calls.includes('screenshot'), true); + assert.equal(calls.includes('tap'), true); + assert.equal(calls.includes('fill'), true); + 
assert.equal(calls.includes('typeText'), true); +}); + +test('assertCommandConformance throws when a suite fails', async () => { + await assert.rejects( + () => + assertCommandConformance({ + name: 'missing-screenshot', + createRuntime: () => + createAgentDevice({ + backend: { + ...createFixtureBackend([]), + captureScreenshot: undefined, + }, + artifacts: createLocalArtifactAdapter(), + sessions: createMemorySessionStore([{ name: 'default', snapshot: fixtureSnapshot() }]), + policy: localCommandPolicy(), + }), + }), + /failed/, + ); +}); + +function createFixtureBackend(calls: string[]): AgentDeviceBackend { + return { + platform: 'ios', + captureScreenshot: async () => { + calls.push('screenshot'); + }, + captureSnapshot: async () => { + calls.push('snapshot'); + return { snapshot: fixtureSnapshot() }; + }, + tap: async () => { + calls.push('tap'); + }, + fill: async () => { + calls.push('fill'); + }, + typeText: async () => { + calls.push('typeText'); + }, + }; +} + +function fixtureSnapshot(): SnapshotState { + return makeSnapshotState([ + { + index: 0, + depth: 0, + type: 'Button', + label: 'Continue', + value: 'Continue', + rect: { x: 10, y: 20, width: 100, height: 40 }, + hittable: true, + }, + { + index: 1, + depth: 0, + type: 'XCUIElementTypeTextField', + label: 'Email', + rect: { x: 20, y: 80, width: 180, height: 40 }, + hittable: true, + }, + ]); +} diff --git a/src/__tests__/runtime-diff-screenshot.test.ts b/src/__tests__/runtime-diff-screenshot.test.ts new file mode 100644 index 00000000..af37d7a4 --- /dev/null +++ b/src/__tests__/runtime-diff-screenshot.test.ts @@ -0,0 +1,219 @@ +import assert from 'node:assert/strict'; +import fs from 'node:fs'; +import os from 'node:os'; +import path from 'node:path'; +import { PNG } from 'pngjs'; +import { test } from 'vitest'; +import type { + AgentDeviceBackend, + BackendScreenshotOptions, + BackendScreenshotResult, +} from '../backend.ts'; +import { createLocalArtifactAdapter } from '../io.ts'; +import { 
createAgentDevice, localCommandPolicy, type CommandSessionStore } from '../runtime.ts'; + +const sessions = { + get: () => undefined, + set: () => {}, +} satisfies CommandSessionStore; + +test('runtime diff screenshot captures live current image and cleans temporary capture', async () => { + const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'runtime-diff-screenshot-')); + const baseline = path.join(dir, 'baseline.png'); + const diffOut = path.join(dir, 'diff.png'); + let capturedCurrentPath: string | undefined; + let capturedOptions: BackendScreenshotOptions | undefined; + + fs.writeFileSync(baseline, solidPngBuffer(10, 10, { r: 0, g: 0, b: 0 })); + + try { + const device = createAgentDevice({ + backend: createScreenshotBackend((outPath, options) => { + capturedCurrentPath = outPath; + capturedOptions = options; + fs.writeFileSync(outPath, solidPngBuffer(10, 10, { r: 255, g: 255, b: 255 })); + return { path: outPath }; + }), + artifacts: createLocalArtifactAdapter(), + sessions, + policy: localCommandPolicy(), + }); + + const result = await device.capture.diffScreenshot({ + baseline: { kind: 'path', path: baseline }, + current: { kind: 'live' }, + out: { kind: 'path', path: diffOut }, + threshold: 0, + surface: 'menubar', + }); + + assert.equal(result.match, false); + assert.equal(result.differentPixels, 100); + assert.equal(result.diffPath, diffOut); + assert.equal(fs.existsSync(diffOut), true); + assert.equal(typeof capturedCurrentPath, 'string'); + assert.equal(fs.existsSync(capturedCurrentPath!), false); + assert.equal(capturedOptions?.surface, 'menubar'); + } finally { + fs.rmSync(dir, { recursive: true, force: true }); + } +}); + +test('runtime diff screenshot compares supplied current image without backend capture', async () => { + const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'runtime-diff-screenshot-')); + const baseline = path.join(dir, 'baseline.png'); + const current = path.join(dir, 'current.png'); + fs.writeFileSync(baseline, solidPngBuffer(10, 
10, { r: 0, g: 0, b: 0 })); + fs.writeFileSync(current, solidPngBuffer(10, 10, { r: 255, g: 255, b: 255 })); + + try { + const device = createAgentDevice({ + backend: createScreenshotBackend(() => { + throw new Error('capture should not be called'); + }), + artifacts: createLocalArtifactAdapter(), + sessions, + policy: localCommandPolicy(), + }); + + const result = await device.capture.diffScreenshot({ + baseline: { kind: 'path', path: baseline }, + current: { kind: 'path', path: current }, + threshold: 0, + }); + + assert.equal(result.match, false); + assert.equal(result.differentPixels, 100); + } finally { + fs.rmSync(dir, { recursive: true, force: true }); + } +}); + +test('runtime diff screenshot rejects overlay refs with supplied current image', async () => { + const device = createAgentDevice({ + backend: createScreenshotBackend(() => { + throw new Error('capture should not be called'); + }), + artifacts: createLocalArtifactAdapter(), + sessions, + policy: localCommandPolicy(), + }); + + await assert.rejects( + () => + device.capture.diffScreenshot({ + baseline: { kind: 'path', path: '/tmp/baseline.png' }, + current: { kind: 'path', path: '/tmp/current.png' }, + overlayRefs: true, + }), + /saved-image comparisons have no live accessibility refs/, + ); +}); + +test('runtime diff screenshot enforces max image pixels policy', async () => { + const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'runtime-diff-screenshot-')); + const baseline = path.join(dir, 'baseline.png'); + const current = path.join(dir, 'current.png'); + fs.writeFileSync(baseline, solidPngBuffer(10, 10, { r: 0, g: 0, b: 0 })); + fs.writeFileSync(current, solidPngBuffer(10, 10, { r: 255, g: 255, b: 255 })); + + try { + const device = createAgentDevice({ + backend: createScreenshotBackend(() => { + throw new Error('capture should not be called'); + }), + artifacts: createLocalArtifactAdapter(), + sessions, + policy: localCommandPolicy({ maxImagePixels: 50 }), + }); + + await assert.rejects( + () => 
+ device.capture.diffScreenshot({ + baseline: { kind: 'path', path: baseline }, + current: { kind: 'path', path: current }, + }), + /maxImagePixels/, + ); + } finally { + fs.rmSync(dir, { recursive: true, force: true }); + } +}); + +test('runtime diff screenshot attaches overlay refs to live mismatch regions', async () => { + const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'runtime-diff-screenshot-')); + const baseline = path.join(dir, 'baseline.png'); + const diffOut = path.join(dir, 'diff.png'); + const overlayOut = path.join(dir, 'diff.current-overlay.png'); + fs.writeFileSync(baseline, solidPngBuffer(10, 10, { r: 0, g: 0, b: 0 })); + + try { + const device = createAgentDevice({ + backend: createScreenshotBackend((outPath, options) => { + fs.writeFileSync(outPath, solidPngBuffer(10, 10, { r: 255, g: 255, b: 255 })); + return { + path: outPath, + ...(options?.overlayRefs + ? { + overlayRefs: [ + { + ref: 'e1', + label: 'Continue', + rect: { x: 1, y: 2, width: 3, height: 4 }, + overlayRect: { x: 1, y: 2, width: 3, height: 4 }, + center: { x: 3, y: 4 }, + }, + ], + } + : {}), + }; + }), + artifacts: createLocalArtifactAdapter(), + sessions, + policy: localCommandPolicy(), + }); + + const result = await device.capture.diffScreenshot({ + baseline: { kind: 'path', path: baseline }, + current: { kind: 'live' }, + out: { kind: 'path', path: diffOut }, + threshold: 0, + overlayRefs: true, + }); + + assert.equal(result.currentOverlayPath, overlayOut); + assert.equal(result.currentOverlayRefCount, 1); + assert.equal(fs.existsSync(overlayOut), true); + assert.equal(result.regions?.[0]?.currentOverlayMatches?.[0]?.ref, 'e1'); + } finally { + fs.rmSync(dir, { recursive: true, force: true }); + } +}); + +function createScreenshotBackend( + captureScreenshot: ( + outPath: string, + options?: BackendScreenshotOptions, + ) => BackendScreenshotResult | void | Promise, +): AgentDeviceBackend { + return { + platform: 'ios', + captureScreenshot: async (_context, outPath, options) 
=> + await captureScreenshot(outPath, options), + }; +} + +function solidPngBuffer( + width: number, + height: number, + color: { r: number; g: number; b: number }, +): Buffer { + const png = new PNG({ width, height }); + for (let i = 0; i < png.data.length; i += 4) { + png.data[i] = color.r; + png.data[i + 1] = color.g; + png.data[i + 2] = color.b; + png.data[i + 3] = 255; + } + return PNG.sync.write(png); +} diff --git a/src/__tests__/runtime-interactions.test.ts b/src/__tests__/runtime-interactions.test.ts new file mode 100644 index 00000000..de502188 --- /dev/null +++ b/src/__tests__/runtime-interactions.test.ts @@ -0,0 +1,258 @@ +import assert from 'node:assert/strict'; +import { test } from 'vitest'; +import type { AgentDeviceBackend } from '../backend.ts'; +import { commands, ref, selector } from '../commands/index.ts'; +import { createLocalArtifactAdapter } from '../io.ts'; +import { createAgentDevice, createMemorySessionStore, localCommandPolicy } from '../runtime.ts'; +import type { Point, SnapshotState } from '../utils/snapshot.ts'; +import { makeSnapshotState } from './test-utils/index.ts'; + +test('runtime click taps an explicit point without requiring a snapshot', async () => { + const calls: Array<{ point: Point; count?: number }> = []; + const device = createInteractionDevice(selectorSnapshot(), { + tap: async (_context, point, options) => { + calls.push({ point, count: options?.count }); + }, + }); + + const result = await device.interactions.click({ kind: 'point', x: 10, y: 20 }, { count: 2 }); + + assert.deepEqual(calls, [{ point: { x: 10, y: 20 }, count: 2 }]); + assert.deepEqual(result, { kind: 'point', point: { x: 10, y: 20 } }); +}); + +test('runtime interactions pass runtime signal to backend primitives', async () => { + const controller = new AbortController(); + let signal: AbortSignal | undefined; + const device = createAgentDevice({ + backend: { + platform: 'ios', + tap: async (context) => { + signal = context.signal; + }, + typeText: 
async () => {}, + } satisfies AgentDeviceBackend, + artifacts: createLocalArtifactAdapter(), + policy: localCommandPolicy(), + signal: controller.signal, + }); + + await device.interactions.click({ kind: 'point', x: 1, y: 2 }); + + assert.equal(signal, controller.signal); +}); + +test('runtime press resolves selector targets to the actionable node center', async () => { + const calls: Point[] = []; + const device = createInteractionDevice(selectorSnapshot(), { + tap: async (_context, point) => { + calls.push(point); + return { ok: true }; + }, + }); + + const result = await device.interactions.press(selector('label=Continue'), { + session: 'default', + }); + + assert.deepEqual(calls, [{ x: 60, y: 40 }]); + assert.equal(result.kind, 'selector'); + assert.deepEqual(result.target, { kind: 'selector', selector: 'label=Continue' }); + assert.equal(result.node?.label, 'Continue'); + assert.deepEqual(result.selectorChain, [ + 'role="button" label="Continue"', + 'label="Continue"', + 'value="Continue"', + ]); + assert.deepEqual(result.backendResult, { ok: true }); +}); + +test('runtime fill resolves refs and forwards text to the backend primitive', async () => { + const calls: Array<{ point: Point; text: string; delayMs?: number }> = []; + const device = createInteractionDevice(fillableSnapshot(), { + captureSnapshot: async () => { + throw new Error('ref fill should use the stored session snapshot'); + }, + fill: async (_context, point, text, options) => { + calls.push({ point, text, delayMs: options?.delayMs }); + }, + }); + + const result = await device.interactions.fill(ref('@e1'), 'hello', { + session: 'default', + delayMs: 25, + }); + + assert.deepEqual(calls, [{ point: { x: 50, y: 30 }, text: 'hello', delayMs: 25 }]); + assert.equal(result.kind, 'ref'); + assert.deepEqual(result.target, { kind: 'ref', ref: '@e1' }); + assert.equal(result.text, 'hello'); + assert.equal(result.warning, undefined); +}); + +test('runtime interactions reject unsupported macOS desktop and 
menubar surfaces', async () => { + const desktop = createInteractionDevice(selectorSnapshot(), { + platform: 'macos', + sessionMetadata: { surface: 'desktop' }, + tap: async () => { + throw new Error('desktop click should be rejected before backend tap'); + }, + }); + await assert.rejects( + () => desktop.interactions.click({ kind: 'point', x: 1, y: 2 }, { session: 'default' }), + /click is not supported on macOS desktop sessions yet/, + ); + await assert.rejects( + () => + desktop.interactions.click( + { kind: 'point', x: 1, y: 2 }, + { session: 'default', metadata: { surface: 'app' } }, + ), + /click is not supported on macOS desktop sessions yet/, + ); + + const menubar = createInteractionDevice(fillableSnapshot(), { + platform: 'macos', + sessionMetadata: { surface: 'menubar' }, + fill: async () => { + throw new Error('menubar fill should be rejected before backend fill'); + }, + }); + await assert.rejects( + () => menubar.interactions.fill(ref('@e1'), 'hello', { session: 'default' }), + /fill is not supported on macOS menubar sessions yet/, + ); + + let pressed = false; + const menubarPress = createInteractionDevice(fillableSnapshot(), { + platform: 'macos', + sessionMetadata: { surface: 'menubar' }, + tap: async () => { + pressed = true; + }, + }); + + await menubarPress.interactions.press(ref('@e1'), { session: 'default' }); + + assert.equal(pressed, true); +}); + +test('runtime ref interactions refresh the snapshot when a stored ref has no usable rect', async () => { + const staleSnapshot = makeSnapshotState([ + { + index: 0, + depth: 0, + type: 'Button', + label: 'Continue', + hittable: true, + }, + ]); + const freshSnapshot = selectorSnapshot(); + const calls: Point[] = []; + let captures = 0; + const device = createInteractionDevice(staleSnapshot, { + captureSnapshot: async () => { + captures += 1; + return { snapshot: freshSnapshot }; + }, + tap: async (_context, point) => { + calls.push(point); + }, + }); + + const result = await 
device.interactions.click(ref('@e1'), { session: 'default' }); + + assert.equal(captures, 1); + assert.deepEqual(calls, [{ x: 60, y: 40 }]); + assert.equal(result.kind, 'ref'); + assert.equal(result.node?.rect?.width, 100); +}); + +test('runtime typeText validates refs and forwards text to the backend primitive', async () => { + const calls: Array<{ text: string; delayMs?: number }> = []; + const device = createInteractionDevice(selectorSnapshot(), { + typeText: async (_context, text, options) => { + calls.push({ text, delayMs: options?.delayMs }); + }, + }); + + const result = await device.interactions.typeText('hello', { + session: 'default', + delayMs: 25, + }); + + assert.deepEqual(calls, [{ text: 'hello', delayMs: 25 }]); + assert.equal(result.kind, 'text'); + assert.equal(result.text, 'hello'); + assert.equal(result.delayMs, 25); + assert.equal(result.message, 'Typed 5 chars'); + + await assert.rejects( + () => device.interactions.typeText('@e1 hello', { session: 'default' }), + /type does not accept a target ref/, + ); +}); + +test('runtime interaction commands are available from the command namespace', async () => { + const device = createInteractionDevice(selectorSnapshot(), { + tap: async () => {}, + }); + + const result = await commands.interactions.click(device, { + session: 'default', + target: selector('label=Continue'), + }); + + assert.equal(result.kind, 'selector'); +}); + +function selectorSnapshot(): SnapshotState { + return makeSnapshotState([ + { + index: 0, + depth: 0, + type: 'Button', + label: 'Continue', + value: 'Continue', + rect: { x: 10, y: 20, width: 100, height: 40 }, + hittable: true, + }, + ]); +} + +function fillableSnapshot(): SnapshotState { + return makeSnapshotState([ + { + index: 0, + depth: 0, + type: 'XCUIElementTypeTextField', + label: 'Email', + rect: { x: 20, y: 10, width: 60, height: 40 }, + hittable: true, + }, + ]); +} + +function createInteractionDevice( + snapshot: SnapshotState, + overrides: Partial> & { + 
platform?: AgentDeviceBackend['platform']; + sessionMetadata?: Record; + } = {}, +) { + return createAgentDevice({ + backend: { + platform: overrides.platform ?? 'ios', + captureSnapshot: async (...args) => + overrides.captureSnapshot ? await overrides.captureSnapshot(...args) : { snapshot }, + tap: async (...args) => await overrides.tap?.(...args), + fill: async (...args) => await overrides.fill?.(...args), + typeText: async (...args) => await overrides.typeText?.(...args), + } satisfies AgentDeviceBackend, + artifacts: createLocalArtifactAdapter(), + sessions: createMemorySessionStore([ + { name: 'default', snapshot, metadata: overrides.sessionMetadata }, + ]), + policy: localCommandPolicy(), + }); +} diff --git a/src/__tests__/runtime-public.test.ts b/src/__tests__/runtime-public.test.ts new file mode 100644 index 00000000..b8fb8a89 --- /dev/null +++ b/src/__tests__/runtime-public.test.ts @@ -0,0 +1,424 @@ +import assert from 'node:assert/strict'; +import fs from 'node:fs'; +import os from 'node:os'; +import path from 'node:path'; +import { test } from 'vitest'; +import { + createAgentDevice, + createMemorySessionStore, + createLocalArtifactAdapter, + commands as rootCommands, + assertBackendCapabilityAllowed, + localCommandPolicy, + restrictedCommandPolicy, + selector as rootSelector, + type AgentDevice, + type CommandSessionStore, +} from '../index.ts'; +import { + BACKEND_CAPABILITY_NAMES, + hasBackendCapability, + type AgentDeviceBackend, +} from '../backend.ts'; +import { + commandCatalog, + commands, + createCommandRouter, + type ScreenshotCommandOptions, +} from '../commands/index.ts'; +import type { ArtifactAdapter, FileInputRef, FileOutputRef } from '../io.ts'; +import { + commandConformanceSuites, + runCommandConformance, + type CommandConformanceTarget, +} from '../testing/conformance.ts'; + +const backend = { + platform: 'ios', + captureScreenshot: async () => {}, + typeText: async () => {}, +} satisfies AgentDeviceBackend; + +const artifacts = { + 
resolveInput: async (ref: FileInputRef) => ({ + path: ref.kind === 'path' ? ref.path : `/tmp/upload-${ref.id}`, + }), + reserveOutput: async (ref: FileOutputRef | undefined, options) => ({ + path: ref?.kind === 'path' ? ref.path : `/tmp/${options.field}${options.ext}`, + visibility: options.visibility ?? 'client-visible', + publish: async () => undefined, + }), + createTempFile: async (options) => ({ + path: `/tmp/${options.prefix}${options.ext}`, + visibility: 'internal', + cleanup: async () => {}, + }), +} satisfies ArtifactAdapter; + +const sessions = { + get: () => undefined, + set: () => {}, +} satisfies CommandSessionStore; + +test('package root exposes command runtime skeleton', async () => { + const device: AgentDevice = createAgentDevice({ + backend, + artifacts, + }); + + assert.equal(device.backend.platform, 'ios'); + assert.equal(device.policy.allowLocalInputPaths, false); + assert.equal(typeof device.capture.screenshot, 'function'); + assert.equal(typeof device.interactions.click, 'function'); + assert.equal('apps' in device, false); + const result = await device.capture.screenshot({}); + assert.equal(result.path, '/tmp/path.png'); +}); + +test('runtime screenshot command reserves output and calls backend primitive', async () => { + let captured: + | { + path: string; + fullscreen?: boolean; + surface?: string; + } + | undefined; + const device = createAgentDevice({ + backend: { + ...backend, + captureScreenshot: async (_context, path, options) => { + captured = { + path, + fullscreen: options?.fullscreen, + surface: options?.surface, + }; + }, + }, + artifacts, + sessions, + policy: localCommandPolicy(), + }); + + const result = await device.capture.screenshot({ + out: { kind: 'path', path: '/tmp/screen.png' }, + fullscreen: true, + surface: 'menubar', + }); + + assert.deepEqual(captured, { + path: '/tmp/screen.png', + fullscreen: true, + surface: 'menubar', + }); + assert.deepEqual(result, { + path: '/tmp/screen.png', + message: 'Saved screenshot: 
/tmp/screen.png', + }); +}); + +test('runtime screenshot command cleans reserved output when publish fails', async () => { + let cleanupCalled = false; + const device = createAgentDevice({ + backend, + artifacts: { + ...artifacts, + reserveOutput: async (ref: FileOutputRef | undefined, options) => ({ + path: ref?.kind === 'path' ? ref.path : `/tmp/${options.field}${options.ext}`, + visibility: options.visibility ?? 'client-visible', + publish: async () => { + throw new Error('publish failed'); + }, + cleanup: async () => { + cleanupCalled = true; + }, + }), + }, + sessions, + policy: localCommandPolicy(), + }); + + await assert.rejects( + () => device.capture.screenshot({ out: { kind: 'path', path: '/tmp/screen.png' } }), + /publish failed/, + ); + + assert.equal(cleanupCalled, true); +}); + +test('public runtime policy helpers expose local and restricted defaults', async () => { + assert.equal(typeof createLocalArtifactAdapter, 'function'); + assert.equal(rootCommands.capture.screenshot, commands.capture.screenshot); + assert.deepEqual(rootSelector('label=Continue'), { + kind: 'selector', + selector: 'label=Continue', + }); + assert.equal(localCommandPolicy().allowLocalInputPaths, true); + assert.equal(localCommandPolicy().allowLocalOutputPaths, true); + assert.equal(restrictedCommandPolicy().allowLocalInputPaths, false); + assert.equal(restrictedCommandPolicy({ allowLocalInputPaths: true }).allowLocalInputPaths, true); + const store = createMemorySessionStore([{ name: 'default' }]); + assert.equal((await store.get('default'))?.name, 'default'); +}); + +test('local artifact adapter marks command outputs and temp files by visibility', async () => { + const adapter = createLocalArtifactAdapter(); + const output = await adapter.reserveOutput(undefined, { + field: 'path', + ext: '.png', + visibility: 'client-visible', + }); + const temp = await adapter.createTempFile({ + prefix: 'agent-device-test', + ext: '.txt', + }); + + assert.equal(output.visibility, 
'client-visible'); + assert.equal(temp.visibility, 'internal'); + + await output.cleanup?.(); + await temp.cleanup(); +}); + +test('local artifact adapter can constrain explicit local paths to a root', async () => { + const root = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-device-local-root-')); + try { + const adapter = createLocalArtifactAdapter({ cwd: root, rootDir: root }); + + assert.deepEqual( + await adapter.resolveInput({ kind: 'path', path: 'input.png' }, { usage: 'test' }), + { + path: path.join(root, 'input.png'), + }, + ); + await assert.rejects( + () => adapter.resolveInput({ kind: 'path', path: '../outside.png' }, { usage: 'test' }), + /outside the artifact adapter root/, + ); + await assert.rejects( + () => + adapter.reserveOutput( + { kind: 'path', path: path.join(path.dirname(root), 'outside.png') }, + { field: 'path', ext: '.png' }, + ), + /outside the artifact adapter root/, + ); + } finally { + fs.rmSync(root, { recursive: true, force: true }); + } +}); + +test('named backend capabilities require backend support and policy allowance', () => { + const supportedRuntime = createAgentDevice({ + backend: { + platform: 'android', + capabilities: ['android.shell'], + escapeHatches: { + androidShell: async () => ({ exitCode: 0, stdout: '', stderr: '' }), + }, + }, + artifacts, + policy: restrictedCommandPolicy({ allowNamedBackendCapabilities: ['android.shell'] }), + }); + + assert.doesNotThrow(() => assertBackendCapabilityAllowed(supportedRuntime, 'android.shell')); + + const policyBlockedRuntime = createAgentDevice({ + backend: { + platform: 'android', + capabilities: ['android.shell'], + escapeHatches: { + androidShell: async () => ({ exitCode: 0, stdout: '', stderr: '' }), + }, + }, + artifacts, + }); + + assert.throws( + () => assertBackendCapabilityAllowed(policyBlockedRuntime, 'android.shell'), + /not allowed by command policy/, + ); + + assert.throws( + () => assertBackendCapabilityAllowed(supportedRuntime, 'ios.runnerCommand'), + /not 
supported by this backend/, + ); + + const missingMethodRuntime = createAgentDevice({ + backend: { platform: 'android', capabilities: ['android.shell'] }, + artifacts, + policy: restrictedCommandPolicy({ allowNamedBackendCapabilities: ['android.shell'] }), + }); + + assert.throws( + () => assertBackendCapabilityAllowed(missingMethodRuntime, 'android.shell'), + /does not implement its escape hatch method/, + ); +}); + +test('memory session store does not expose mutable record references', async () => { + const store = createMemorySessionStore([ + { + name: 'default', + appName: 'Demo', + snapshot: { + nodes: [{ ref: 'e1', index: 0, depth: 0, label: 'Initial' }], + createdAt: 1, + }, + }, + ]); + const record = await store.get('default'); + assert.equal(record?.appName, 'Demo'); + + if (record) { + record.appName = 'Mutated'; + if (record.snapshot) record.snapshot.nodes[0]!.label = 'Mutated'; + } + + assert.equal((await store.get('default'))?.appName, 'Demo'); + assert.equal((await store.get('default'))?.snapshot?.nodes[0]?.label, 'Initial'); + + const next = { + name: 'default', + snapshot: { + nodes: [{ ref: 'e1', index: 0, depth: 0, label: 'Stored' }], + createdAt: 2, + }, + }; + await store.set(next); + next.snapshot.nodes[0]!.label = 'Mutated after set'; + + assert.equal((await store.get('default'))?.snapshot?.nodes[0]?.label, 'Stored'); + const list = await store.list?.(); + if (list?.[0]?.snapshot) list[0].snapshot.nodes[0]!.label = 'Mutated from list'; + assert.equal((await store.get('default'))?.snapshot?.nodes[0]?.label, 'Stored'); +}); + +test('runtime commands work with async command session stores', async () => { + const records = new Map>>(); + records.set('default', { name: 'default' }); + const asyncStore = { + get: async (name) => records.get(name), + set: async (record) => { + records.set(record.name, record); + }, + } satisfies CommandSessionStore; + const device = createAgentDevice({ + backend: { + platform: 'ios', + captureSnapshot: async () => 
({ + snapshot: { + nodes: [{ ref: 'e1', index: 0, depth: 0, label: 'Ready' }], + createdAt: 1, + }, + }), + }, + artifacts, + sessions: asyncStore, + policy: localCommandPolicy(), + }); + + const result = await device.capture.snapshot({ session: 'default' }); + + assert.equal(result.nodes[0]?.label, 'Ready'); + assert.equal(records.get('default')?.snapshot?.nodes[0]?.label, 'Ready'); +}); + +test('public backend, commands, io, and conformance subpaths are importable', () => { + const options = { + out: { kind: 'path', path: '/tmp/screen.png' }, + } satisfies ScreenshotCommandOptions; + const target = { + name: 'fake', + createRuntime: () => + createAgentDevice({ + backend, + artifacts, + sessions, + }), + } satisfies CommandConformanceTarget; + + assert.equal(BACKEND_CAPABILITY_NAMES.includes('android.shell'), true); + assert.equal(hasBackendCapability(backend, 'android.shell'), false); + assert.equal( + hasBackendCapability({ platform: 'android', capabilities: ['android.shell'] }, 'android.shell'), + true, + ); + assert.equal(options.out.kind, 'path'); + assert.equal(typeof commands.capture.screenshot, 'function'); + assert.equal(typeof commands.capture.diffScreenshot, 'function'); + assert.equal(typeof commands.capture.snapshot, 'function'); + assert.equal(typeof commands.capture.diffSnapshot, 'function'); + assert.equal(typeof commands.selectors.find, 'function'); + assert.equal(typeof commands.selectors.get, 'function'); + assert.equal(typeof commands.selectors.getText, 'function'); + assert.equal(typeof commands.selectors.is, 'function'); + assert.equal(typeof commands.selectors.isVisible, 'function'); + assert.equal(typeof commands.selectors.wait, 'function'); + assert.equal(typeof commands.selectors.waitForText, 'function'); + assert.equal(typeof commands.interactions.click, 'function'); + assert.equal(typeof commands.interactions.press, 'function'); + assert.equal(typeof commands.interactions.fill, 'function'); + assert.equal(typeof 
commands.interactions.typeText, 'function'); + assert.equal( + commandCatalog.some((entry) => entry.command === 'click' && entry.status === 'implemented'), + true, + ); + assert.equal(commandConformanceSuites.length, 3); + assert.equal(typeof runCommandConformance, 'function'); + assert.equal(target.name, 'fake'); +}); + +test('command router dispatches implemented runtime commands and normalizes errors', async () => { + const router = createCommandRouter({ + createRuntime: () => + createAgentDevice({ + backend, + artifacts, + sessions, + }), + }); + + const ok = await router.dispatch({ + command: 'capture.screenshot', + options: {}, + }); + assert.equal(ok.ok, true); + assert.equal(ok.ok && 'path' in ok.data ? ok.data.path : undefined, '/tmp/path.png'); + + const failure = await router.dispatch({ + command: 'capture.diffScreenshot', + options: { + baseline: { kind: 'path', path: '/tmp/baseline.png' }, + }, + }); + assert.equal(failure.ok, false); + assert.equal(failure.ok ? undefined : failure.error.code, 'INVALID_ARGS'); + + const unsupportedInteraction = await router.dispatch({ + command: 'interactions.click', + options: { + target: { kind: 'point', x: 1, y: 2 }, + }, + }); + assert.equal(unsupportedInteraction.ok, false); + assert.equal( + unsupportedInteraction.ok ? undefined : unsupportedInteraction.error.code, + 'UNSUPPORTED_OPERATION', + ); + + const typed = await router.dispatch({ + command: 'interactions.typeText', + options: { + text: 'hello', + }, + }); + assert.equal(typed.ok, true); + assert.equal(typed.ok && 'text' in typed.data ? typed.data.text : undefined, 'hello'); + + const planned = await router.dispatch({ + command: 'alert', + options: {}, + } as never); + assert.equal(planned.ok, false); + assert.equal(planned.ok ? 
undefined : planned.error.code, 'NOT_IMPLEMENTED'); +}); diff --git a/src/__tests__/runtime-selector-read.test.ts b/src/__tests__/runtime-selector-read.test.ts new file mode 100644 index 00000000..fe1fc76a --- /dev/null +++ b/src/__tests__/runtime-selector-read.test.ts @@ -0,0 +1,269 @@ +import assert from 'node:assert/strict'; +import { test } from 'vitest'; +import type { + AgentDeviceBackend, + BackendSnapshotOptions, + BackendSnapshotResult, +} from '../backend.ts'; +import { createLocalArtifactAdapter } from '../io.ts'; +import { + createAgentDevice, + createMemorySessionStore, + localCommandPolicy, + type CommandSessionStore, +} from '../runtime.ts'; +import { ref, selector } from '../commands/index.ts'; +import type { SnapshotState } from '../utils/snapshot.ts'; +import { makeSnapshotState } from './test-utils/index.ts'; + +test('runtime get reads text from a selector target', async () => { + const snapshot = selectorSnapshot(); + const device = createSelectorDevice(snapshot, { + readText: 'Backend expanded text', + }); + + const result = await device.selectors.get({ + session: 'default', + property: 'text', + target: { kind: 'selector', selector: 'label=Continue' }, + }); + + assert.equal(result.kind, 'text'); + assert.deepEqual(result.target, { kind: 'selector', selector: 'label=Continue' }); + assert.equal(result.text, 'Backend expanded text'); + assert.equal(result.node.label, 'Continue'); + assert.deepEqual(result.selectorChain, [ + 'role="button" label="Continue"', + 'label="Continue"', + 'value="Continue"', + ]); +}); + +test('runtime get selector target captures fresh snapshot without a stored session snapshot', async () => { + const snapshot = selectorSnapshot(); + const sessions = createMemorySessionStore([{ name: 'default' }]); + let captures = 0; + const device = createAgentDevice({ + backend: { + platform: 'ios', + captureSnapshot: async () => { + captures += 1; + return { snapshot }; + }, + readText: async () => ({ text: 'Fresh text' }), + } 
satisfies AgentDeviceBackend, + artifacts: createLocalArtifactAdapter(), + sessions, + policy: localCommandPolicy(), + }); + + const result = await device.selectors.getText(selector('label=Continue'), { + session: 'default', + }); + + assert.equal(result.kind, 'text'); + assert.equal(result.text, 'Fresh text'); + assert.equal(captures, 1); + assert.equal((await sessions.get('default'))?.snapshot?.nodes[0]?.label, 'Continue'); +}); + +test('runtime get returns attrs for a ref target without recapturing', async () => { + const snapshot = selectorSnapshot(); + let captures = 0; + const device = createSelectorDevice(snapshot, { + captureSnapshot: () => { + captures += 1; + return { snapshot }; + }, + }); + + const result = await device.selectors.get({ + session: 'default', + property: 'attrs', + target: { kind: 'ref', ref: '@e1' }, + }); + + assert.equal(result.kind, 'attrs'); + assert.deepEqual(result.target, { kind: 'ref', ref: '@e1' }); + assert.equal(result.node.label, 'Continue'); + assert.equal(captures, 0); +}); + +test('runtime selectors pass runtime signal to backend snapshot capture', async () => { + const snapshot = selectorSnapshot(); + const controller = new AbortController(); + let signal: AbortSignal | undefined; + const device = createAgentDevice({ + backend: { + platform: 'ios', + captureSnapshot: async (context) => { + signal = context.signal; + return { snapshot }; + }, + } satisfies AgentDeviceBackend, + artifacts: createLocalArtifactAdapter(), + sessions: createMemorySessionStore([{ name: 'default', snapshot }]), + policy: localCommandPolicy(), + signal: controller.signal, + }); + + const result = await device.selectors.getAttrs(selector('label=Continue'), { + session: 'default', + }); + + assert.equal(result.kind, 'attrs'); + assert.equal(signal, controller.signal); +}); + +test('runtime selectors forward public snapshot options to backend capture', async () => { + const snapshot = selectorSnapshot(); + let captureOptions: BackendSnapshotOptions | 
undefined; + const device = createAgentDevice({ + backend: { + platform: 'ios', + captureSnapshot: async (_context, options) => { + captureOptions = options; + return { snapshot }; + }, + } satisfies AgentDeviceBackend, + artifacts: createLocalArtifactAdapter(), + sessions: createMemorySessionStore([{ name: 'default', snapshot }]), + policy: localCommandPolicy(), + }); + + await device.selectors.is({ + session: 'default', + predicate: 'exists', + selector: 'label=Continue', + depth: 2, + scope: 'Login', + raw: true, + }); + + assert.deepEqual(captureOptions, { + interactiveOnly: false, + compact: false, + depth: 2, + scope: 'Login', + raw: true, + }); +}); + +test('runtime is validates selector predicates', async () => { + const device = createSelectorDevice(selectorSnapshot()); + + const result = await device.selectors.is({ + session: 'default', + predicate: 'exists', + selector: 'label=Continue', + }); + + assert.deepEqual(result, { + predicate: 'exists', + pass: true, + selector: 'label=Continue', + matches: 1, + selectorChain: ['label=Continue'], + }); +}); + +test('runtime find get_text reads the matched node', async () => { + const device = createSelectorDevice(selectorSnapshot(), { + readText: 'Continue', + }); + + const result = await device.selectors.find({ + session: 'default', + locator: 'text', + query: 'Continue', + action: 'get_text', + }); + + assert.equal(result.kind, 'text'); + assert.equal(result.ref, '@e1'); + assert.equal(result.text, 'Continue'); + assert.equal(result.node.label, 'Continue'); +}); + +test('runtime wait can use backend text search', async () => { + const device = createSelectorDevice(selectorSnapshot(), { + findText: true, + now: 10, + }); + + const result = await device.selectors.wait({ + session: 'default', + target: { kind: 'text', text: 'Ready', timeoutMs: 100 }, + }); + + assert.deepEqual(result, { kind: 'text', text: 'Ready', waitedMs: 0 }); +}); + +test('runtime selector convenience methods use explicit target helpers', 
async () => { + const device = createSelectorDevice(selectorSnapshot(), { + readText: 'Continue', + findText: true, + }); + + const text = await device.selectors.getText(selector('label=Continue'), { session: 'default' }); + const attrs = await device.selectors.getAttrs(ref('@e1'), { session: 'default' }); + const visible = await device.selectors.isVisible(selector('label=Continue'), { + session: 'default', + }); + const waited = await device.selectors.waitForText('Ready', { + session: 'default', + timeoutMs: 100, + }); + + assert.equal(text.kind, 'text'); + assert.equal(attrs.kind, 'attrs'); + assert.equal(visible.pass, true); + assert.deepEqual(waited, { kind: 'text', text: 'Ready', waitedMs: 0 }); +}); + +function selectorSnapshot(): SnapshotState { + return makeSnapshotState([ + { + index: 0, + depth: 0, + type: 'Button', + label: 'Continue', + value: 'Continue', + rect: { x: 10, y: 20, width: 100, height: 40 }, + }, + ]); +} + +function createSelectorDevice( + snapshot: SnapshotState, + options: { + readText?: string; + findText?: boolean; + now?: number; + captureSnapshot?: () => BackendSnapshotResult | Promise; + } = {}, +) { + const session = { name: 'default', snapshot }; + const sessions = { + get: () => session, + set: (record) => { + session.snapshot = record.snapshot ?? session.snapshot; + }, + } satisfies CommandSessionStore; + return createAgentDevice({ + backend: { + platform: 'ios', + captureSnapshot: async () => + options.captureSnapshot ? await options.captureSnapshot() : { snapshot }, + readText: async () => ({ text: options.readText ?? '' }), + findText: async () => ({ found: options.findText ?? false }), + } satisfies AgentDeviceBackend, + artifacts: createLocalArtifactAdapter(), + sessions, + policy: localCommandPolicy(), + clock: { + now: () => options.now ?? 
0, + sleep: async () => {}, + }, + }); +} diff --git a/src/__tests__/runtime-snapshot.test.ts b/src/__tests__/runtime-snapshot.test.ts new file mode 100644 index 00000000..86e10d40 --- /dev/null +++ b/src/__tests__/runtime-snapshot.test.ts @@ -0,0 +1,259 @@ +import assert from 'node:assert/strict'; +import { test } from 'vitest'; +import type { AgentDeviceBackend, BackendSnapshotResult } from '../backend.ts'; +import { createLocalArtifactAdapter } from '../io.ts'; +import { createAgentDevice, localCommandPolicy, type CommandSessionStore } from '../runtime.ts'; +import { makeSnapshotState } from './test-utils/index.ts'; + +test('runtime snapshot captures nodes and updates the session baseline', async () => { + let stored: Parameters[0] | undefined; + const device = createAgentDevice({ + backend: createSnapshotBackend(() => ({ + snapshot: makeSnapshotState([{ index: 0, depth: 0, type: 'Window', label: 'Home' }], { + backend: 'xctest', + }), + appName: 'Demo', + appBundleId: 'com.example.demo', + })), + artifacts: createLocalArtifactAdapter(), + sessions: { + get: () => undefined, + set: (record) => { + stored = record; + }, + }, + policy: localCommandPolicy(), + }); + + const result = await device.capture.snapshot({ session: 'default' }); + + assert.equal(result.nodes[0]?.label, 'Home'); + assert.equal(result.truncated, false); + assert.equal(result.appName, 'Demo'); + assert.equal(result.appBundleId, 'com.example.demo'); + assert.equal(stored?.snapshot?.nodes[0]?.label, 'Home'); +}); + +test('runtime diff snapshot initializes and then compares against session baseline', async () => { + const session = { + name: 'default', + snapshot: makeSnapshotState([{ index: 0, depth: 0, type: 'Window', label: 'Before' }]), + }; + const device = createAgentDevice({ + backend: createSnapshotBackend(() => ({ + snapshot: makeSnapshotState([{ index: 0, depth: 0, type: 'Window', label: 'After' }]), + })), + artifacts: createLocalArtifactAdapter(), + sessions: { + get: () => session, + 
set: (record) => { + session.snapshot = record.snapshot!; + }, + }, + policy: localCommandPolicy(), + }); + + const result = await device.capture.diffSnapshot({ session: 'default' }); + + assert.equal(result.baselineInitialized, false); + assert.equal(result.summary.additions, 1); + assert.equal(result.summary.removals, 1); + assert.equal(session.snapshot.nodes[0]?.label, 'After'); +}); + +test('runtime diff snapshot initializes baseline when no previous snapshot exists', async () => { + let stored: Parameters[0] | undefined; + const device = createAgentDevice({ + backend: createSnapshotBackend(() => ({ + snapshot: makeSnapshotState([{ index: 0, depth: 0, type: 'Window', label: 'Initial' }]), + })), + artifacts: createLocalArtifactAdapter(), + sessions: { + get: () => undefined, + set: (record) => { + stored = record; + }, + }, + policy: localCommandPolicy(), + }); + + const result = await device.capture.diffSnapshot({ session: 'default' }); + + assert.equal(result.baselineInitialized, true); + assert.deepEqual(result.summary, { additions: 0, removals: 0, unchanged: 1 }); + assert.deepEqual(result.lines, []); + assert.equal(stored?.snapshot?.nodes[0]?.label, 'Initial'); +}); + +test('runtime snapshot emits filtered Android guidance from backend analysis', async () => { + const device = createAgentDevice({ + backend: createSnapshotBackend(() => ({ + nodes: [], + truncated: false, + backend: 'android', + analysis: { + rawNodeCount: 42, + maxDepth: 6, + }, + })), + artifacts: createLocalArtifactAdapter(), + sessions: { + get: () => undefined, + set: () => {}, + }, + policy: localCommandPolicy(), + }); + + const result = await device.capture.snapshot({ + session: 'default', + interactiveOnly: true, + depth: 3, + }); + + assert.deepEqual(result.warnings, [ + 'Interactive snapshot is empty after filtering 42 raw Android nodes. 
Likely causes: depth too low, transient route change, or collector filtering.', + 'Interactive output is empty at depth 3; retry without -d.', + ]); +}); + +test('runtime snapshot stale-drop warning uses the runtime clock', async () => { + const session = { + name: 'default', + snapshot: makeSnapshotState( + Array.from({ length: 20 }, (_, index) => ({ + index, + depth: 0, + type: 'Text', + label: `Before ${index}`, + })), + { backend: 'android' }, + ), + }; + session.snapshot.createdAt = 1_000; + const device = createAgentDevice({ + backend: createSnapshotBackend(() => ({ + nodes: [{ ref: 'e1', index: 0, depth: 0, type: 'Text', label: 'After' }], + truncated: false, + backend: 'android', + })), + artifacts: createLocalArtifactAdapter(), + sessions: { + get: () => session, + set: (record) => { + session.snapshot = record.snapshot!; + }, + }, + policy: localCommandPolicy(), + clock: { + now: () => 1_500, + sleep: async () => {}, + }, + }); + + const result = await device.capture.snapshot({ session: 'default' }); + + assert.deepEqual(result.warnings, [ + 'Recent snapshots dropped sharply in node count, which suggests stale or mid-transition UI. 
Use screenshot as visual truth, wait briefly, then re-snapshot once.', + ]); +}); + +test('runtime snapshot stale-drop warning uses backend snapshot timestamps when supplied', async () => { + const session = { + name: 'default', + snapshot: makeSnapshotState( + Array.from({ length: 20 }, (_, index) => ({ + index, + depth: 0, + type: 'Text', + label: `Before ${index}`, + })), + { backend: 'android' }, + ), + }; + session.snapshot.createdAt = 10_000; + const currentSnapshot = makeSnapshotState( + [{ index: 0, depth: 0, type: 'Text', label: 'After' }], + { + backend: 'android', + }, + ); + currentSnapshot.createdAt = 11_500; + const device = createAgentDevice({ + backend: createSnapshotBackend(() => ({ + snapshot: currentSnapshot, + })), + artifacts: createLocalArtifactAdapter(), + sessions: { + get: () => session, + set: (record) => { + session.snapshot = record.snapshot!; + }, + }, + policy: localCommandPolicy(), + clock: { + now: () => 1_000_000, + sleep: async () => {}, + }, + }); + + const result = await device.capture.snapshot({ session: 'default' }); + + assert.deepEqual(result.warnings, [ + 'Recent snapshots dropped sharply in node count, which suggests stale or mid-transition UI. 
Use screenshot as visual truth, wait briefly, then re-snapshot once.', + ]); +}); + +test('runtime snapshot stale-drop warning falls back to runtime clock on backend clock skew', async () => { + const session = { + name: 'default', + snapshot: makeSnapshotState( + Array.from({ length: 20 }, (_, index) => ({ + index, + depth: 0, + type: 'Text', + label: `Before ${index}`, + })), + { backend: 'android' }, + ), + }; + session.snapshot.createdAt = 10_000; + const currentSnapshot = makeSnapshotState( + [{ index: 0, depth: 0, type: 'Text', label: 'After' }], + { + backend: 'android', + }, + ); + currentSnapshot.createdAt = 8_500; + const device = createAgentDevice({ + backend: createSnapshotBackend(() => ({ + snapshot: currentSnapshot, + })), + artifacts: createLocalArtifactAdapter(), + sessions: { + get: () => session, + set: (record) => { + session.snapshot = record.snapshot!; + }, + }, + policy: localCommandPolicy(), + clock: { + now: () => 11_500, + sleep: async () => {}, + }, + }); + + const result = await device.capture.snapshot({ session: 'default' }); + + assert.deepEqual(result.warnings, [ + 'Recent snapshots dropped sharply in node count, which suggests stale or mid-transition UI. 
Use screenshot as visual truth, wait briefly, then re-snapshot once.', + ]); +}); + +function createSnapshotBackend( + captureSnapshot: () => BackendSnapshotResult | Promise, +): AgentDeviceBackend { + return { + platform: 'ios', + captureSnapshot: async () => await captureSnapshot(), + }; +} diff --git a/src/backend.ts b/src/backend.ts new file mode 100644 index 00000000..784659e4 --- /dev/null +++ b/src/backend.ts @@ -0,0 +1,194 @@ +import type { + Point, + ScreenshotOverlayRef, + SnapshotNode, + SnapshotOptions, + SnapshotState, +} from './utils/snapshot.ts'; + +export type AgentDeviceBackendPlatform = 'ios' | 'android' | 'macos' | 'linux'; + +export const BACKEND_CAPABILITY_NAMES = [ + 'android.shell', + 'ios.runnerCommand', + 'macos.desktopScreenshot', +] as const; + +export type BackendCapabilityName = (typeof BACKEND_CAPABILITY_NAMES)[number]; + +export type BackendCapabilitySet = readonly BackendCapabilityName[]; + +export type BackendCommandContext = { + session?: string; + requestId?: string; + appId?: string; + appBundleId?: string; + signal?: AbortSignal; + metadata?: Record; +}; + +export type BackendSnapshotResult = { + nodes?: SnapshotNode[]; + truncated?: boolean; + backend?: string; + snapshot?: SnapshotState; + analysis?: BackendSnapshotAnalysis; + freshness?: BackendSnapshotFreshness; + warnings?: string[]; + appName?: string; + appBundleId?: string; +}; + +export type BackendSnapshotOptions = SnapshotOptions & { + outPath?: string; +}; + +export type BackendSnapshotAnalysis = { + rawNodeCount?: number; + maxDepth?: number; +}; + +export type BackendSnapshotFreshness = { + action: string; + retryCount: number; + staleAfterRetries: boolean; + reason?: 'empty-interactive' | 'sharp-drop' | 'stuck-route'; +}; + +export type BackendReadTextResult = { + text: string; +}; + +export type BackendFindTextResult = { + found: boolean; +}; + +export type BackendScreenshotOptions = { + fullscreen?: boolean; + overlayRefs?: boolean; + surface?: 'app' | 
'frontmost-app' | 'desktop' | 'menubar'; +}; + +export type BackendScreenshotResult = { + path?: string; + overlayRefs?: ScreenshotOverlayRef[]; +}; + +export type BackendActionResult = Record | void; + +export type BackendTapOptions = { + button?: 'primary' | 'secondary' | 'middle'; + count?: number; + intervalMs?: number; + holdMs?: number; + jitterPx?: number; + doubleTap?: boolean; +}; + +export type BackendFillOptions = { + delayMs?: number; +}; + +export type BackendOpenTarget = { + app?: string; + url?: string; + activity?: string; +}; + +export type BackendInstallTarget = { + app: string; + artifactPath: string; +}; + +export type BackendShellResult = { + exitCode: number; + stdout: string; + stderr: string; +}; + +export type BackendRunnerCommand = { + command: string; + args?: readonly string[]; + payload?: Record; +}; + +export type BackendEscapeHatches = { + androidShell?( + context: BackendCommandContext, + args: readonly string[], + ): Promise; + iosRunnerCommand?( + context: BackendCommandContext, + command: BackendRunnerCommand, + ): Promise; + macosDesktopScreenshot?( + context: BackendCommandContext, + outPath: string, + options?: BackendScreenshotOptions, + ): Promise; +}; + +export const BACKEND_CAPABILITY_ESCAPE_HATCH_METHODS = { + 'android.shell': 'androidShell', + 'ios.runnerCommand': 'iosRunnerCommand', + 'macos.desktopScreenshot': 'macosDesktopScreenshot', +} as const satisfies Record; + +export type AgentDeviceBackend = { + platform: AgentDeviceBackendPlatform; + capabilities?: BackendCapabilitySet; + escapeHatches?: BackendEscapeHatches; + captureSnapshot?( + context: BackendCommandContext, + options?: BackendSnapshotOptions, + ): Promise; + captureScreenshot?( + context: BackendCommandContext, + outPath: string, + options?: BackendScreenshotOptions, + ): Promise; + readText?(context: BackendCommandContext, node: SnapshotNode): Promise; + findText?(context: BackendCommandContext, text: string): Promise; + tap?( + context: 
BackendCommandContext, + point: Point, + options?: BackendTapOptions, + ): Promise; + fill?( + context: BackendCommandContext, + point: Point, + text: string, + options?: BackendFillOptions, + ): Promise; + typeText?( + context: BackendCommandContext, + text: string, + options?: { delayMs?: number }, + ): Promise; + pressKey?( + context: BackendCommandContext, + key: string, + options?: { modifiers?: string[] }, + ): Promise; + openApp?(context: BackendCommandContext, target: BackendOpenTarget): Promise; + closeApp?(context: BackendCommandContext, app?: string): Promise; + installApp?( + context: BackendCommandContext, + target: BackendInstallTarget, + ): Promise; +}; + +export function hasBackendCapability( + backend: Pick, + capability: BackendCapabilityName, +): boolean { + return backend.capabilities?.includes(capability) ?? false; +} + +export function hasBackendEscapeHatch( + backend: Pick, + capability: BackendCapabilityName, +): boolean { + const method = BACKEND_CAPABILITY_ESCAPE_HATCH_METHODS[capability]; + return typeof backend.escapeHatches?.[method] === 'function'; +} diff --git a/src/cli/commands/screenshot.ts b/src/cli/commands/screenshot.ts index 0f6077f1..acf8d85a 100644 --- a/src/cli/commands/screenshot.ts +++ b/src/cli/commands/screenshot.ts @@ -1,11 +1,11 @@ -import fs from 'node:fs'; -import os from 'node:os'; -import path from 'node:path'; import { formatScreenshotDiffText, formatSnapshotDiffText } from '../../utils/output.ts'; import { AppError } from '../../utils/errors.ts'; -import { compareScreenshots, type ScreenshotDiffResult } from '../../utils/screenshot-diff.ts'; -import { attachCurrentOverlayMatches } from '../../utils/screenshot-diff-overlay-matches.ts'; import { resolveUserPath } from '../../utils/path-resolution.ts'; +import type { AgentDeviceBackend } from '../../backend.ts'; +import type { AgentDeviceClient } from '../../client.ts'; +import { createLocalArtifactAdapter } from '../../io.ts'; +import { createAgentDevice, 
localCommandPolicy } from '../../runtime.ts'; +import type { CliFlags } from '../../utils/command-schema.ts'; import { buildSelectionOptions, writeCommandOutput } from './shared.ts'; import type { ClientCommandHandler } from './router.ts'; @@ -60,87 +60,66 @@ export const diffCommand: ClientCommandHandler = async ({ positionals, flags, cl ); } - let thresholdNum = 0.1; - if (flags.threshold != null && flags.threshold !== '') { - thresholdNum = Number(flags.threshold); - if (Number.isNaN(thresholdNum) || thresholdNum < 0 || thresholdNum > 1) { - throw new AppError('INVALID_ARGS', '--threshold must be a number between 0 and 1'); - } - } - - if (currentRaw) { - if (flags.overlayRefs) { - throw new AppError( - 'INVALID_ARGS', - 'diff screenshot cannot use --overlay-refs because saved-image comparisons have no live accessibility refs', - ); - } - const result = await compareScreenshots(baselinePath, resolveUserPath(currentRaw), { - threshold: thresholdNum, - outputPath, - }); - writeCommandOutput(flags, result, () => formatScreenshotDiffText(result)); - return true; - } - - const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-device-diff-current-')); - const tmpScreenshotPath = path.join(tmpDir, `current-${Date.now()}.png`); - const screenshotResult = await client.capture.screenshot({ path: tmpScreenshotPath }); - const currentPath = screenshotResult.path; + const runtime = createAgentDevice({ + backend: createClientScreenshotBackend(client, flags), + artifacts: createLocalArtifactAdapter(), + sessions: { + get: (name) => ({ name }), + set: () => {}, + }, + policy: localCommandPolicy(), + }); - let result: ScreenshotDiffResult; - try { - result = await compareScreenshots(baselinePath, currentPath, { - threshold: thresholdNum, - outputPath, - }); - if (flags.overlayRefs && !result.match && !result.dimensionMismatch) { - const overlayResult = await client.capture.screenshot({ - path: outputPath ? 
deriveCurrentOverlayPath(outputPath) : undefined, - overlayRefs: true, - }); - result = { - ...result, - currentOverlayPath: overlayResult.path, - ...(overlayResult.overlayRefs - ? { currentOverlayRefCount: overlayResult.overlayRefs.length } - : {}), - ...(result.regions && overlayResult.overlayRefs - ? { - regions: attachCurrentOverlayMatches(result.regions, overlayResult.overlayRefs), - } - : {}), - }; - } else if (flags.overlayRefs && outputPath) { - removeStaleCurrentOverlay(outputPath); - } - } finally { - try { - fs.unlinkSync(currentPath); - } catch {} - try { - fs.rmSync(tmpDir, { recursive: true, force: true }); - } catch {} - } + const result = await runtime.capture.diffScreenshot({ + session: flags.session, + baseline: { kind: 'path', path: baselinePath }, + current: currentRaw ? { kind: 'path', path: resolveUserPath(currentRaw) } : { kind: 'live' }, + ...(outputPath ? { out: { kind: 'path', path: outputPath } } : {}), + threshold: parseCliThreshold(flags.threshold), + overlayRefs: flags.overlayRefs, + surface: flags.surface, + }); writeCommandOutput(flags, result, () => formatScreenshotDiffText(result)); return true; }; -function deriveCurrentOverlayPath(outputPath: string): string { - const extension = path.extname(outputPath); - const base = extension ? outputPath.slice(0, -extension.length) : outputPath; - return `${base}.current-overlay${extension || '.png'}`; +function createClientScreenshotBackend( + client: AgentDeviceClient, + flags: CliFlags, +): AgentDeviceBackend { + return { + platform: resolveClientBackendPlatform(flags), + captureScreenshot: async (context, outPath, options) => { + const result = await client.capture.screenshot({ + path: outPath, + session: context.session, + overlayRefs: options?.overlayRefs, + fullscreen: options?.fullscreen, + surface: options?.surface, + }); + return { + path: result.path, + ...(result.overlayRefs ? 
{ overlayRefs: result.overlayRefs } : {}), + }; + }, + }; } -function removeStaleCurrentOverlay(outputPath: string): void { - try { - fs.unlinkSync(deriveCurrentOverlayPath(outputPath)); - } catch (error) { - if (!isFsError(error, 'ENOENT')) throw error; +function resolveClientBackendPlatform(flags: CliFlags): AgentDeviceBackend['platform'] { + switch (flags.platform) { + case 'android': + case 'linux': + case 'macos': + return flags.platform; + case 'ios': + case 'apple': + default: + return 'ios'; } } -function isFsError(error: unknown, code: string): error is NodeJS.ErrnoException { - return typeof error === 'object' && error !== null && 'code' in error && error.code === code; +function parseCliThreshold(threshold: string | undefined): number | undefined { + if (threshold == null || threshold === '') return undefined; + return Number(threshold); } diff --git a/src/client-types.ts b/src/client-types.ts index 3c3cc050..3d1e3102 100644 --- a/src/client-types.ts +++ b/src/client-types.ts @@ -311,6 +311,7 @@ export type CaptureScreenshotOptions = AgentDeviceRequestOverrides & { path?: string; overlayRefs?: boolean; fullscreen?: boolean; + surface?: 'app' | 'frontmost-app' | 'desktop' | 'menubar'; }; export type CaptureScreenshotResult = { diff --git a/src/commands/capture-diff-screenshot.ts b/src/commands/capture-diff-screenshot.ts new file mode 100644 index 00000000..810719fd --- /dev/null +++ b/src/commands/capture-diff-screenshot.ts @@ -0,0 +1,257 @@ +import { promises as fs } from 'node:fs'; +import path from 'node:path'; +import type { BackendScreenshotOptions, BackendScreenshotResult } from '../backend.ts'; +import type { + ArtifactDescriptor, + FileInputRef, + FileOutputRef, + ReservedOutputFile, + ResolvedInputFile, +} from '../io.ts'; +import type { AgentDeviceRuntime, CommandContext } from '../runtime.ts'; +import { AppError } from '../utils/errors.ts'; +import { compareScreenshots, type ScreenshotDiffResult } from '../utils/screenshot-diff.ts'; +import { 
attachCurrentOverlayMatches } from '../utils/screenshot-diff-overlay-matches.ts'; +import type { RuntimeCommand } from './index.ts'; +import { createCommandTempFile, reserveCommandOutput, resolveCommandInput } from './io-policy.ts'; + +export type LiveScreenshotInputRef = { + kind: 'live'; +}; + +export type DiffScreenshotCommandOptions = CommandContext & { + baseline: FileInputRef; + current?: FileInputRef | LiveScreenshotInputRef; + out?: FileOutputRef; + currentOverlayOut?: FileOutputRef; + threshold?: number; + overlayRefs?: boolean; + surface?: BackendScreenshotOptions['surface']; +}; + +export type DiffScreenshotCommandResult = ScreenshotDiffResult & { + artifacts?: ArtifactDescriptor[]; +}; + +const DEFAULT_SCREENSHOT_DIFF_THRESHOLD = 0.1; + +export const diffScreenshotCommand: RuntimeCommand< + DiffScreenshotCommandOptions, + DiffScreenshotCommandResult +> = async (runtime, options): Promise => { + if (!options.baseline) { + throw new AppError('INVALID_ARGS', 'diff screenshot requires a baseline image'); + } + + const threshold = normalizeThreshold(options.threshold); + const currentRef = options.current ?? 
{ kind: 'live' }; + if (options.overlayRefs && !isLiveCurrentRef(currentRef)) { + throw new AppError( + 'INVALID_ARGS', + 'diff screenshot cannot use --overlay-refs because saved-image comparisons have no live accessibility refs', + ); + } + + const baseline = await resolveCommandInput(runtime, options.baseline, { + usage: 'diff screenshot baseline', + field: 'baseline', + }); + let current: ResolvedInputFile | undefined; + let liveCurrent: ResolvedInputFile | undefined; + let output: ReservedOutputFile | undefined; + const artifacts: ArtifactDescriptor[] = []; + + try { + let currentPath: string; + if (isLiveCurrentRef(currentRef)) { + liveCurrent = await captureLiveCurrentScreenshot(runtime, options); + currentPath = liveCurrent.path; + } else { + current = await resolveCommandInput(runtime, currentRef, { + usage: 'diff screenshot current', + field: 'current', + }); + currentPath = current.path; + } + + output = options.out + ? await reserveCommandOutput(runtime, options.out, { + field: 'diffPath', + ext: '.png', + }) + : undefined; + + let result: ScreenshotDiffResult = await compareScreenshots(baseline.path, currentPath, { + threshold, + outputPath: output?.path, + maxPixels: runtime.policy.maxImagePixels, + }); + + if (isLiveCurrentRef(currentRef)) { + result = await maybeAttachCurrentOverlay(runtime, options, output?.path, result, artifacts); + } + + const diffArtifact = result.diffPath ? await output?.publish() : undefined; + if (diffArtifact) artifacts.push(diffArtifact); + if (!result.diffPath) await output?.cleanup?.(); + + return { + ...result, + ...(artifacts.length > 0 ? 
{ artifacts } : {}), + }; + } catch (error) { + await output?.cleanup?.(); + throw error; + } finally { + await baseline.cleanup?.(); + await current?.cleanup?.(); + await liveCurrent?.cleanup?.(); + } +}; + +function normalizeThreshold(threshold: unknown): number { + if (threshold == null || threshold === '') return DEFAULT_SCREENSHOT_DIFF_THRESHOLD; + const value = Number(threshold); + if (Number.isNaN(value) || value < 0 || value > 1) { + throw new AppError('INVALID_ARGS', '--threshold must be a number between 0 and 1'); + } + return value; +} + +async function captureLiveCurrentScreenshot( + runtime: AgentDeviceRuntime, + options: DiffScreenshotCommandOptions, +): Promise { + const temp = await createCommandTempFile(runtime, { + prefix: 'agent-device-diff-current', + ext: '.png', + }); + try { + await captureScreenshot(runtime, options, temp.path, screenshotSurfaceOptions(options)); + } catch (error) { + await temp.cleanup(); + throw error; + } + return temp; +} + +async function maybeAttachCurrentOverlay( + runtime: AgentDeviceRuntime, + options: DiffScreenshotCommandOptions, + diffOutputPath: string | undefined, + result: ScreenshotDiffResult, + artifacts: ArtifactDescriptor[], +): Promise { + if (!options.overlayRefs) return result; + if (result.match || result.dimensionMismatch) { + if (diffOutputPath) await removeStaleCurrentOverlay(diffOutputPath); + return result; + } + + const overlayOutputRef = resolveCurrentOverlayOutputRef(options, diffOutputPath); + const overlayOutput = await reserveCommandOutput(runtime, overlayOutputRef, { + field: 'currentOverlayPath', + ext: '.png', + }); + + try { + const overlayResult = await captureScreenshot(runtime, options, overlayOutput.path, { + overlayRefs: true, + ...screenshotSurfaceOptions(options), + }); + const overlayArtifact = await overlayOutput.publish(); + if (overlayArtifact) artifacts.push(overlayArtifact); + + return { + ...result, + currentOverlayPath: overlayResult.path ?? 
overlayOutput.path, + ...(overlayResult.overlayRefs + ? { currentOverlayRefCount: overlayResult.overlayRefs.length } + : {}), + ...(result.regions && overlayResult.overlayRefs + ? { + regions: attachCurrentOverlayMatches(result.regions, overlayResult.overlayRefs), + } + : {}), + }; + } catch (error) { + await overlayOutput.cleanup?.(); + throw error; + } +} + +async function captureScreenshot( + runtime: AgentDeviceRuntime, + options: CommandContext, + outPath: string, + screenshotOptions: BackendScreenshotOptions = {}, +): Promise { + if (!runtime.backend.captureScreenshot) { + throw new AppError('UNSUPPORTED_OPERATION', 'screenshot is not supported by this backend'); + } + return ( + (await runtime.backend.captureScreenshot( + { + session: options.session, + requestId: options.requestId, + signal: options.signal ?? runtime.signal, + metadata: options.metadata, + }, + outPath, + screenshotOptions, + )) ?? {} + ); +} + +function screenshotSurfaceOptions( + options: Pick, +): BackendScreenshotOptions { + return options.surface ? { surface: options.surface } : {}; +} + +function resolveCurrentOverlayOutputRef( + options: DiffScreenshotCommandOptions, + diffOutputPath: string | undefined, +): FileOutputRef | undefined { + if (options.currentOverlayOut) return options.currentOverlayOut; + if (options.out?.kind === 'path') { + return { + kind: 'path', + path: deriveCurrentOverlayPath(diffOutputPath ?? options.out.path), + }; + } + if (options.out?.kind === 'downloadableArtifact') { + return { + kind: 'downloadableArtifact', + ...(options.out.clientPath + ? { clientPath: deriveCurrentOverlayPath(options.out.clientPath) } + : {}), + ...(options.out.fileName ? { fileName: deriveCurrentOverlayPath(options.out.fileName) } : {}), + }; + } + return undefined; +} + +function deriveCurrentOverlayPath(outputPath: string): string { + const extension = path.extname(outputPath); + const base = extension ? 
outputPath.slice(0, -extension.length) : outputPath; + return `${base}.current-overlay${extension || '.png'}`; +} + +async function removeStaleCurrentOverlay(outputPath: string): Promise { + try { + await fs.unlink(deriveCurrentOverlayPath(outputPath)); + } catch (error) { + if (!isFsError(error, 'ENOENT')) throw error; + } +} + +function isLiveCurrentRef( + inputRef: FileInputRef | LiveScreenshotInputRef, +): inputRef is LiveScreenshotInputRef { + return inputRef.kind === 'live'; +} + +function isFsError(error: unknown, code: string): error is NodeJS.ErrnoException { + return typeof error === 'object' && error !== null && 'code' in error && error.code === code; +} diff --git a/src/commands/capture-screenshot.ts b/src/commands/capture-screenshot.ts new file mode 100644 index 00000000..882f2c7a --- /dev/null +++ b/src/commands/capture-screenshot.ts @@ -0,0 +1,55 @@ +import { AppError } from '../utils/errors.ts'; +import { successText } from '../utils/success-text.ts'; +import type { ArtifactDescriptor } from '../io.ts'; +import type { RuntimeCommand, ScreenshotCommandOptions } from './index.ts'; +import { reserveCommandOutput } from './io-policy.ts'; + +export type ScreenshotCommandResult = { + path: string; + artifacts?: ArtifactDescriptor[]; + message?: string; +}; + +export const screenshotCommand: RuntimeCommand< + ScreenshotCommandOptions, + ScreenshotCommandResult +> = async (runtime, options): Promise => { + if (!runtime.backend.captureScreenshot) { + throw new AppError('UNSUPPORTED_OPERATION', 'screenshot is not supported by this backend'); + } + + const reserved = await reserveCommandOutput(runtime, options.out, { + field: 'path', + ext: '.png', + }); + + let artifact: ArtifactDescriptor | undefined; + try { + await runtime.backend.captureScreenshot( + { + session: options.session, + requestId: options.requestId, + appId: options.appId, + appBundleId: options.appBundleId, + signal: options.signal ?? 
runtime.signal, + metadata: options.metadata, + }, + reserved.path, + { + fullscreen: options.fullscreen, + overlayRefs: options.overlayRefs, + surface: options.surface, + }, + ); + artifact = await reserved.publish(); + } catch (error) { + await reserved.cleanup?.(); + throw error; + } + + return { + path: reserved.path, + ...(artifact ? { artifacts: [artifact] } : {}), + ...successText(`Saved screenshot: ${reserved.path}`), + }; +}; diff --git a/src/commands/capture-snapshot.ts b/src/commands/capture-snapshot.ts new file mode 100644 index 00000000..8767b8f4 --- /dev/null +++ b/src/commands/capture-snapshot.ts @@ -0,0 +1,268 @@ +import type { BackendSnapshotResult } from '../backend.ts'; +import type { AgentDeviceRuntime, CommandSessionRecord } from '../runtime.ts'; +import { AppError } from '../utils/errors.ts'; +import { buildSnapshotDiff, countSnapshotComparableLines } from '../utils/snapshot-diff.ts'; +import type { SnapshotNode, SnapshotState, SnapshotVisibility } from '../utils/snapshot.ts'; +import { buildSnapshotVisibility } from '../utils/snapshot-visibility.ts'; +import type { + DiffSnapshotCommandOptions, + RuntimeCommand, + SnapshotCommandOptions, +} from './index.ts'; + +export type SnapshotCommandResult = { + nodes: SnapshotNode[]; + truncated: boolean; + appName?: string; + appBundleId?: string; + visibility?: SnapshotVisibility; + warnings?: string[]; +}; + +export type SnapshotDiffLine = { + kind: 'added' | 'removed' | 'unchanged'; + text: string; +}; + +export type SnapshotDiffSummary = { + additions: number; + removals: number; + unchanged: number; +}; + +export type DiffSnapshotCommandResult = { + mode: 'snapshot'; + baselineInitialized: boolean; + summary: SnapshotDiffSummary; + lines: SnapshotDiffLine[]; + warnings?: string[]; +}; + +type SnapshotCapture = { + snapshot: SnapshotState; + result: BackendSnapshotResult; + session: CommandSessionRecord | undefined; + warnings: string[]; +}; + +export const snapshotCommand: RuntimeCommand< + 
SnapshotCommandOptions, + SnapshotCommandResult +> = async (runtime, options): Promise => { + const capture = await captureRuntimeSnapshot(runtime, options); + await runtime.sessions.set(nextSnapshotSession(options.session, capture)); + return { + nodes: capture.snapshot.nodes, + truncated: capture.snapshot.truncated ?? false, + visibility: buildSnapshotVisibility({ + nodes: capture.snapshot.nodes, + backend: capture.snapshot.backend, + snapshotRaw: options.raw, + }), + ...(capture.warnings.length > 0 ? { warnings: capture.warnings } : {}), + ...snapshotAppFields(capture), + }; +}; + +export const diffSnapshotCommand: RuntimeCommand< + DiffSnapshotCommandOptions, + DiffSnapshotCommandResult +> = async (runtime, options): Promise => { + const capture = await captureRuntimeSnapshot(runtime, options); + const flattenForDiff = options.interactiveOnly === true; + const previousSnapshot = capture.session?.snapshot; + const nextSession = nextSnapshotSession(options.session, capture); + + if (!previousSnapshot) { + const unchanged = countSnapshotComparableLines(capture.snapshot.nodes, { + flatten: flattenForDiff, + }); + await runtime.sessions.set(nextSession); + return { + mode: 'snapshot', + baselineInitialized: true, + summary: { + additions: 0, + removals: 0, + unchanged, + }, + lines: [], + ...(capture.warnings.length > 0 ? { warnings: capture.warnings } : {}), + }; + } + + const diff = buildSnapshotDiff(previousSnapshot.nodes, capture.snapshot.nodes, { + flatten: flattenForDiff, + }); + await runtime.sessions.set(nextSession); + return { + mode: 'snapshot', + baselineInitialized: false, + summary: diff.summary, + lines: diff.lines, + ...(capture.warnings.length > 0 ? 
{ warnings: capture.warnings } : {}), + }; +}; + +async function captureRuntimeSnapshot( + runtime: AgentDeviceRuntime, + options: SnapshotCommandOptions, +): Promise { + if (!runtime.backend.captureSnapshot) { + throw new AppError('UNSUPPORTED_OPERATION', 'snapshot is not supported by this backend'); + } + + const sessionName = options.session ?? 'default'; + const session = await runtime.sessions.get(sessionName); + const result = await runtime.backend.captureSnapshot( + { + session: sessionName, + requestId: options.requestId, + appId: session?.appId, + appBundleId: session?.appBundleId, + signal: options.signal ?? runtime.signal, + metadata: options.metadata, + }, + { + interactiveOnly: options.interactiveOnly, + compact: options.compact, + depth: options.depth, + scope: options.scope, + raw: options.raw, + }, + ); + const snapshot = normalizeBackendSnapshot(result, runtime); + const warningTime = now(runtime); + return { + snapshot, + result, + session, + warnings: buildSnapshotWarnings({ + result, + snapshot, + options, + session, + capturedAt: snapshot.createdAt ?? warningTime, + runtimeNow: warningTime, + }), + }; +} + +function normalizeBackendSnapshot( + result: BackendSnapshotResult, + runtime: AgentDeviceRuntime, +): SnapshotState { + if (result.snapshot) return result.snapshot; + return { + nodes: result.nodes ?? [], + truncated: result.truncated, + backend: result.backend as SnapshotState['backend'], + createdAt: now(runtime), + }; +} + +function nextSnapshotSession( + requestedName: string | undefined, + capture: SnapshotCapture, +): CommandSessionRecord { + const name = capture.session?.name ?? requestedName ?? 'default'; + return { + ...(capture.session ?? { name }), + name, + snapshot: capture.snapshot, + appName: capture.result.appName ?? capture.session?.appName, + appBundleId: capture.result.appBundleId ?? 
capture.session?.appBundleId, + }; +} + +function snapshotAppFields(capture: SnapshotCapture): { + appName?: string; + appBundleId?: string; +} { + const appName = capture.result.appName ?? capture.session?.appName; + const appBundleId = capture.result.appBundleId ?? capture.session?.appBundleId; + return { + ...(appName || appBundleId ? { appName: appName ?? appBundleId } : {}), + ...(appBundleId ? { appBundleId } : {}), + }; +} + +function buildSnapshotWarnings(params: { + result: BackendSnapshotResult; + snapshot: SnapshotState; + options: SnapshotCommandOptions; + session: CommandSessionRecord | undefined; + capturedAt: number; + runtimeNow: number; +}): string[] { + const warnings = [...(params.result.warnings ?? [])]; + const interactiveOnly = params.options.interactiveOnly === true; + const analysis = params.result.analysis; + + if ( + params.snapshot.backend === 'android' && + interactiveOnly && + params.snapshot.nodes.length === 0 && + analysis && + (analysis.rawNodeCount ?? 0) >= 12 + ) { + warnings.push( + `Interactive snapshot is empty after filtering ${analysis.rawNodeCount} raw Android nodes. Likely causes: depth too low, transient route change, or collector filtering.`, + ); + if ( + typeof params.options.depth === 'number' && + typeof analysis.maxDepth === 'number' && + analysis.maxDepth >= params.options.depth + 2 + ) { + warnings.push( + `Interactive output is empty at depth ${params.options.depth}; retry without -d.`, + ); + } + } + + const previousSnapshot = params.session?.snapshot; + const isRecentSnapshot = previousSnapshot + ? 
[params.capturedAt, params.runtimeNow].some((timestamp) => { + const elapsed = timestamp - previousSnapshot.createdAt; + return elapsed >= 0 && elapsed <= 2_000; + }) + : false; + if ( + !params.result.freshness && + previousSnapshot && + isRecentSnapshot && + isLikelyStaleSnapshotDrop(previousSnapshot.nodes.length, params.snapshot.nodes.length) + ) { + warnings.push( + 'Recent snapshots dropped sharply in node count, which suggests stale or mid-transition UI. Use screenshot as visual truth, wait briefly, then re-snapshot once.', + ); + } + + const freshness = params.result.freshness; + if (freshness?.staleAfterRetries && params.snapshot.backend === 'android') { + if (freshness.reason === 'stuck-route') { + warnings.push( + `Recent ${freshness.action} was followed by a nearly identical snapshot after ${freshness.retryCount} automatic retr${freshness.retryCount === 1 ? 'y' : 'ies'}. If you expected navigation or submit, the tree may still be stale. Use screenshot as visual truth, wait briefly, then re-snapshot once.`, + ); + } else if (freshness.reason === 'sharp-drop') { + warnings.push( + 'Recent snapshots dropped sharply in node count, which suggests stale or mid-transition UI. Use screenshot as visual truth, wait briefly, then re-snapshot once.', + ); + } + } + + return uniqueStrings(warnings); +} + +function isLikelyStaleSnapshotDrop(previousCount: number, currentCount: number): boolean { + if (previousCount < 12) return false; + return currentCount <= Math.floor(previousCount * 0.2); +} + +function now(runtime: AgentDeviceRuntime): number { + return runtime.clock?.now() ?? 
Date.now(); +} + +function uniqueStrings(values: string[]): string[] { + return Array.from(new Set(values)); +} diff --git a/src/commands/catalog.ts b/src/commands/catalog.ts new file mode 100644 index 00000000..2b1230d7 --- /dev/null +++ b/src/commands/catalog.ts @@ -0,0 +1,63 @@ +export type CommandCatalogEntry = { + command: string; + category: + | 'portable-runtime' + | 'backend-admin' + | 'transport-session' + | 'environment' + | 'capability-gated'; + status: 'implemented' | 'planned'; +}; + +export const commandCatalog: readonly CommandCatalogEntry[] = [ + { command: 'screenshot', category: 'portable-runtime', status: 'implemented' }, + { command: 'diff screenshot', category: 'portable-runtime', status: 'implemented' }, + { command: 'snapshot', category: 'portable-runtime', status: 'implemented' }, + { command: 'diff snapshot', category: 'portable-runtime', status: 'implemented' }, + { command: 'find read-only', category: 'portable-runtime', status: 'implemented' }, + { command: 'get', category: 'portable-runtime', status: 'implemented' }, + { command: 'is', category: 'portable-runtime', status: 'implemented' }, + { command: 'wait', category: 'portable-runtime', status: 'implemented' }, + { command: 'alert', category: 'portable-runtime', status: 'planned' }, + { command: 'click', category: 'portable-runtime', status: 'implemented' }, + { command: 'press', category: 'portable-runtime', status: 'implemented' }, + { command: 'fill', category: 'portable-runtime', status: 'implemented' }, + { command: 'longpress', category: 'portable-runtime', status: 'planned' }, + { command: 'swipe', category: 'portable-runtime', status: 'planned' }, + { command: 'focus', category: 'portable-runtime', status: 'planned' }, + { command: 'type', category: 'portable-runtime', status: 'implemented' }, + { command: 'scroll', category: 'portable-runtime', status: 'planned' }, + { command: 'pinch', category: 'portable-runtime', status: 'planned' }, + { command: 'open', category: 
'portable-runtime', status: 'planned' }, + { command: 'close', category: 'portable-runtime', status: 'planned' }, + { command: 'apps', category: 'portable-runtime', status: 'planned' }, + { command: 'appstate', category: 'portable-runtime', status: 'planned' }, + { command: 'back', category: 'portable-runtime', status: 'planned' }, + { command: 'home', category: 'portable-runtime', status: 'planned' }, + { command: 'rotate', category: 'portable-runtime', status: 'planned' }, + { command: 'app-switcher', category: 'portable-runtime', status: 'planned' }, + { command: 'keyboard', category: 'portable-runtime', status: 'planned' }, + { command: 'clipboard', category: 'portable-runtime', status: 'planned' }, + { command: 'settings', category: 'portable-runtime', status: 'planned' }, + { command: 'push', category: 'portable-runtime', status: 'planned' }, + { command: 'trigger-app-event', category: 'portable-runtime', status: 'planned' }, + { command: 'devices', category: 'backend-admin', status: 'planned' }, + { command: 'boot', category: 'backend-admin', status: 'planned' }, + { command: 'ensure-simulator', category: 'backend-admin', status: 'planned' }, + { command: 'install', category: 'backend-admin', status: 'planned' }, + { command: 'reinstall', category: 'backend-admin', status: 'planned' }, + { command: 'install-from-source', category: 'backend-admin', status: 'planned' }, + { command: 'session', category: 'transport-session', status: 'planned' }, + { command: 'connect', category: 'environment', status: 'planned' }, + { command: 'disconnect', category: 'environment', status: 'planned' }, + { command: 'connection', category: 'environment', status: 'planned' }, + { command: 'metro', category: 'environment', status: 'planned' }, + { command: 'record', category: 'capability-gated', status: 'planned' }, + { command: 'trace', category: 'capability-gated', status: 'planned' }, + { command: 'replay', category: 'capability-gated', status: 'planned' }, + { command: 'test', 
category: 'capability-gated', status: 'planned' }, + { command: 'batch', category: 'capability-gated', status: 'planned' }, + { command: 'logs', category: 'capability-gated', status: 'planned' }, + { command: 'network', category: 'capability-gated', status: 'planned' }, + { command: 'perf', category: 'capability-gated', status: 'planned' }, +]; diff --git a/src/commands/index.ts b/src/commands/index.ts new file mode 100644 index 00000000..8a66aa7c --- /dev/null +++ b/src/commands/index.ts @@ -0,0 +1,279 @@ +import type { FileOutputRef } from '../io.ts'; +import type { AgentDeviceRuntime, CommandContext } from '../runtime.ts'; +import { screenshotCommand, type ScreenshotCommandResult } from './capture-screenshot.ts'; +import { + diffScreenshotCommand, + type DiffScreenshotCommandOptions, + type DiffScreenshotCommandResult, +} from './capture-diff-screenshot.ts'; +import { + diffSnapshotCommand, + snapshotCommand, + type DiffSnapshotCommandResult, + type SnapshotCommandResult, +} from './capture-snapshot.ts'; +import { + findCommand, + getAttrsCommand, + getCommand, + getTextCommand, + isHiddenCommand, + isVisibleCommand, + isCommand, + waitCommand, + waitForTextCommand, + type ElementTarget, + type FindReadCommandOptions, + type FindReadCommandResult, + type GetAttrsCommandOptions, + type GetCommandOptions, + type GetCommandResult, + type GetTextCommandOptions, + type IsCommandOptions, + type IsCommandResult, + type IsSelectorCommandOptions, + type SelectorTarget, + type WaitCommandOptions, + type WaitCommandResult, + type WaitForTextCommandOptions, +} from './selector-read.ts'; +import { + clickCommand, + fillCommand, + pressCommand, + typeTextCommand, + type ClickCommandOptions, + type FillCommandOptions, + type FillCommandResult, + type InteractionTarget, + type PressCommandOptions, + type PressCommandResult, + type TypeTextCommandOptions, + type TypeTextCommandResult, +} from './interactions.ts'; + +export type { ScreenshotCommandResult } from 
'./capture-screenshot.ts'; +export type { + DiffScreenshotCommandOptions, + DiffScreenshotCommandResult, + LiveScreenshotInputRef, +} from './capture-diff-screenshot.ts'; +export type { + DiffSnapshotCommandResult, + SnapshotCommandResult, + SnapshotDiffLine, + SnapshotDiffSummary, +} from './capture-snapshot.ts'; +export type { + FindReadCommandOptions, + FindReadCommandResult, + GetAttrsCommandOptions, + GetCommandOptions, + GetCommandResult, + GetTextCommandOptions, + IsCommandOptions, + IsCommandResult, + IsSelectorCommandOptions, + ElementTarget, + RefTarget, + ResolvedTarget, + SelectorTarget, + SelectorSnapshotOptions, + WaitCommandOptions, + WaitCommandResult, + WaitForTextCommandOptions, +} from './selector-read.ts'; +export type { + ClickCommandOptions, + FillCommandOptions, + FillCommandResult, + InteractionTarget, + PointTarget, + PressCommandOptions, + PressCommandResult, + TypeTextCommandOptions, + TypeTextCommandResult, +} from './interactions.ts'; +export { ref, selector } from './selector-read.ts'; +export { commandCatalog } from './catalog.ts'; +export type { CommandCatalogEntry } from './catalog.ts'; +export { createCommandRouter } from './router.ts'; +export type { + CommandRouter, + CommandRouterConfig, + CommandRouterRequest, + CommandRouterResponse, + CommandRouterResult, +} from './router.ts'; + +export type CommandResult = Record; +export type RuntimeCommand, TResult = CommandResult> = ( + runtime: AgentDeviceRuntime, + options: TOptions, +) => Promise; +export type BoundRuntimeCommand, TResult = CommandResult> = ( + options: TOptions, +) => Promise; + +export type ScreenshotCommandOptions = CommandContext & { + out?: FileOutputRef; + fullscreen?: boolean; + overlayRefs?: boolean; + appId?: string; + appBundleId?: string; + surface?: 'app' | 'frontmost-app' | 'desktop' | 'menubar'; +}; + +export type SnapshotCommandOptions = CommandContext & { + interactiveOnly?: boolean; + compact?: boolean; + depth?: number; + scope?: string; + raw?: 
boolean; +}; + +export type DiffSnapshotCommandOptions = SnapshotCommandOptions; + +export type AgentDeviceCommands = { + capture: { + screenshot: RuntimeCommand; + diffScreenshot: RuntimeCommand; + snapshot: RuntimeCommand; + diffSnapshot: RuntimeCommand; + }; + selectors: { + find: RuntimeCommand; + get: RuntimeCommand; + getText: RuntimeCommand>; + getAttrs: RuntimeCommand>; + is: RuntimeCommand; + isVisible: RuntimeCommand; + isHidden: RuntimeCommand; + wait: RuntimeCommand; + waitForText: RuntimeCommand< + WaitForTextCommandOptions, + Extract + >; + }; + interactions: { + click: RuntimeCommand; + press: RuntimeCommand; + fill: RuntimeCommand; + typeText: RuntimeCommand; + }; +}; + +export type BoundAgentDeviceCommands = { + capture: { + screenshot: BoundRuntimeCommand; + diffScreenshot: BoundRuntimeCommand; + snapshot: BoundRuntimeCommand; + diffSnapshot: BoundRuntimeCommand; + }; + selectors: { + find: BoundRuntimeCommand; + get: BoundRuntimeCommand; + getText: ( + target: ElementTarget, + options?: Omit, + ) => Promise>; + getAttrs: ( + target: ElementTarget, + options?: Omit, + ) => Promise>; + is: BoundRuntimeCommand; + isVisible: ( + target: SelectorTarget, + options?: Omit, + ) => Promise; + isHidden: ( + target: SelectorTarget, + options?: Omit, + ) => Promise; + wait: BoundRuntimeCommand; + waitForText: ( + text: string, + options?: Omit, + ) => Promise>; + }; + interactions: { + click: ( + target: InteractionTarget, + options?: Omit, + ) => Promise; + press: ( + target: InteractionTarget, + options?: Omit, + ) => Promise; + fill: ( + target: InteractionTarget, + text: string, + options?: Omit, + ) => Promise; + typeText: ( + text: string, + options?: Omit, + ) => Promise; + }; +}; + +export const commands: AgentDeviceCommands = { + capture: { + screenshot: screenshotCommand, + diffScreenshot: diffScreenshotCommand, + snapshot: snapshotCommand, + diffSnapshot: diffSnapshotCommand, + }, + selectors: { + find: findCommand, + get: getCommand, + getText: 
getTextCommand, + getAttrs: getAttrsCommand, + is: isCommand, + isVisible: isVisibleCommand, + isHidden: isHiddenCommand, + wait: waitCommand, + waitForText: waitForTextCommand, + }, + interactions: { + click: clickCommand, + press: pressCommand, + fill: fillCommand, + typeText: typeTextCommand, + }, +}; + +export function bindCommands(runtime: AgentDeviceRuntime): BoundAgentDeviceCommands { + return { + capture: { + screenshot: (options) => commands.capture.screenshot(runtime, options), + diffScreenshot: (options) => commands.capture.diffScreenshot(runtime, options), + snapshot: (options) => commands.capture.snapshot(runtime, options), + diffSnapshot: (options) => commands.capture.diffSnapshot(runtime, options), + }, + selectors: { + find: (options) => commands.selectors.find(runtime, options), + get: (options) => commands.selectors.get(runtime, options), + getText: (target, options = {}) => + commands.selectors.getText(runtime, { ...options, target }), + getAttrs: (target, options = {}) => + commands.selectors.getAttrs(runtime, { ...options, target }), + is: (options) => commands.selectors.is(runtime, options), + isVisible: (target, options = {}) => + commands.selectors.isVisible(runtime, { ...options, target }), + isHidden: (target, options = {}) => + commands.selectors.isHidden(runtime, { ...options, target }), + wait: (options) => commands.selectors.wait(runtime, options), + waitForText: (text, options = {}) => + commands.selectors.waitForText(runtime, { ...options, text }), + }, + interactions: { + click: (target, options = {}) => commands.interactions.click(runtime, { ...options, target }), + press: (target, options = {}) => commands.interactions.press(runtime, { ...options, target }), + fill: (target, text, options = {}) => + commands.interactions.fill(runtime, { ...options, target, text }), + typeText: (text, options = {}) => + commands.interactions.typeText(runtime, { ...options, text }), + }, + }; +} diff --git a/src/commands/interaction-targeting.ts 
b/src/commands/interaction-targeting.ts new file mode 100644 index 00000000..039256c4 --- /dev/null +++ b/src/commands/interaction-targeting.ts @@ -0,0 +1,139 @@ +import type { Rect, SnapshotNode } from '../utils/snapshot.ts'; +import { centerOfRect } from '../utils/snapshot.ts'; +import { containsPoint, pickLargestRect } from '../utils/rect-visibility.ts'; +import { findNearestHittableAncestor } from '../utils/snapshot-processing.ts'; + +export function resolveActionableTouchNode( + nodes: SnapshotNode[], + node: SnapshotNode, +): SnapshotNode { + const descendant = findPreferredActionableDescendant(nodes, node); + if (descendant?.rect && resolveRectCenter(descendant.rect)) { + return descendant; + } + const ancestor = findNearestHittableAncestor(nodes, node); + if (ancestor?.rect && resolveRectCenter(ancestor.rect)) { + if (isOverlyBroadAncestor(node, ancestor, nodes)) { + return node; + } + return ancestor; + } + return node; +} + +function resolveRectCenter(rect: Rect | undefined): { x: number; y: number } | null { + const normalized = normalizeRect(rect); + if (!normalized) return null; + const center = centerOfRect(normalized); + if (!Number.isFinite(center.x) || !Number.isFinite(center.y)) return null; + return center; +} + +function normalizeRect(rect: Rect | undefined): Rect | null { + if (!rect) return null; + const x = Number(rect.x); + const y = Number(rect.y); + const width = Number(rect.width); + const height = Number(rect.height); + if ( + !Number.isFinite(x) || + !Number.isFinite(y) || + !Number.isFinite(width) || + !Number.isFinite(height) + ) { + return null; + } + if (width < 0 || height < 0) return null; + return { x, y, width, height }; +} + +function findPreferredActionableDescendant( + nodes: SnapshotNode[], + node: SnapshotNode, +): SnapshotNode | null { + const targetRect = normalizeRect(node.rect); + if (!targetRect) return null; + + let current = node; + const visited = new Set(); + while (!visited.has(current.ref)) { + 
visited.add(current.ref); + const sameRectChildren = nodes.filter((candidate) => { + if (candidate.parentIndex !== current.index || !candidate.hittable) { + return false; + } + const candidateRect = normalizeRect(candidate.rect); + return candidateRect ? areRectsApproximatelyEqual(candidateRect, targetRect) : false; + }); + if (sameRectChildren.length !== 1) { + break; + } + current = sameRectChildren[0]; + } + + return current === node ? null : current; +} + +function areRectsApproximatelyEqual(left: Rect, right: Rect): boolean { + const tolerance = 0.5; + return ( + Math.abs(left.x - right.x) <= tolerance && + Math.abs(left.y - right.y) <= tolerance && + Math.abs(left.width - right.width) <= tolerance && + Math.abs(left.height - right.height) <= tolerance + ); +} + +function isOverlyBroadAncestor( + node: SnapshotNode, + ancestor: SnapshotNode, + nodes: SnapshotNode[], +): boolean { + const nodeRect = normalizeRect(node.rect); + const ancestorRect = normalizeRect(ancestor.rect); + if (!nodeRect || !ancestorRect) return false; + const rootViewportRect = resolveRootViewportRect(nodes, nodeRect); + if (!rootViewportRect) return false; + if (!isRectViewportSized(ancestorRect, rootViewportRect)) return false; + return !areRectsApproximatelyEqual(nodeRect, ancestorRect); +} + +function resolveRootViewportRect(nodes: SnapshotNode[], targetRect: Rect): Rect | null { + const targetCenter = centerOfRect(targetRect); + const viewportRects = nodes + .filter((node) => { + const type = (node.type ?? '').toLowerCase(); + return type.includes('application') || type.includes('window'); + }) + .map((node) => normalizeRect(node.rect)) + .filter((rect): rect is Rect => rect !== null); + if (viewportRects.length === 0) return null; + + const containingRects = viewportRects.filter((rect) => + containsPoint(rect, targetCenter.x, targetCenter.y), + ); + return pickLargestRect(containingRects.length > 0 ? 
containingRects : viewportRects); +} + +function isRectViewportSized(rect: Rect, viewportRect: Rect): boolean { + const overlapArea = intersectionArea(rect, viewportRect); + const rectArea = rect.width * rect.height; + const viewportArea = viewportRect.width * viewportRect.height; + if (overlapArea <= 0 || rectArea <= 0 || viewportArea <= 0) return false; + + const viewportCoverage = overlapArea / viewportArea; + const rectCoverage = overlapArea / rectArea; + return viewportCoverage >= 0.9 && rectCoverage >= 0.8; +} + +function intersectionArea(left: Rect, right: Rect): number { + const xOverlap = Math.max( + 0, + Math.min(left.x + left.width, right.x + right.width) - Math.max(left.x, right.x), + ); + const yOverlap = Math.max( + 0, + Math.min(left.y + left.height, right.y + right.height) - Math.max(left.y, right.y), + ); + return xOverlap * yOverlap; +} diff --git a/src/commands/interactions.ts b/src/commands/interactions.ts new file mode 100644 index 00000000..1d3c789f --- /dev/null +++ b/src/commands/interactions.ts @@ -0,0 +1,452 @@ +import { AppError } from '../utils/errors.ts'; +import type { Point, SnapshotNode, SnapshotState } from '../utils/snapshot.ts'; +import { centerOfRect, findNodeByRef, normalizeRef } from '../utils/snapshot.ts'; +import type { AgentDeviceRuntime, CommandContext } from '../runtime.ts'; +import { formatSelectorFailure, parseSelectorChain, resolveSelectorChain } from '../selectors.ts'; +import { buildSelectorChainForNode } from '../utils/selector-build.ts'; +import { findNodeByLabel, isFillableType, resolveRefLabel } from '../utils/snapshot-processing.ts'; +import { requireIntInRange } from '../utils/validation.ts'; +import { + isNodeVisibleInEffectiveViewport, + resolveEffectiveViewportRect, +} from '../utils/mobile-snapshot-semantics.ts'; +import { successText } from '../utils/success-text.ts'; +import { resolveActionableTouchNode } from './interaction-targeting.ts'; +import type { ElementTarget, ResolvedTarget } from 
'./selector-read.ts'; +import { now, toBackendContext } from './selector-read-utils.ts'; +import type { RuntimeCommand } from './index.ts'; + +export type PointTarget = { + kind: 'point'; + x: number; + y: number; +}; + +export type InteractionTarget = ElementTarget | PointTarget; + +export type PressCommandOptions = CommandContext & { + target: InteractionTarget; + button?: 'primary' | 'secondary' | 'middle'; + count?: number; + intervalMs?: number; + holdMs?: number; + jitterPx?: number; + doubleTap?: boolean; +}; + +export type ClickCommandOptions = PressCommandOptions; + +type ResolvedInteractionTarget = + | { + kind: 'point'; + point: Point; + } + | { + kind: 'ref'; + point: Point; + target: Extract; + node: SnapshotNode; + selectorChain: string[]; + refLabel?: string; + } + | { + kind: 'selector'; + point: Point; + target: Extract; + node: SnapshotNode; + selectorChain: string[]; + refLabel?: string; + }; + +export type PressCommandResult = ResolvedInteractionTarget & { + backendResult?: Record; +}; + +export type FillCommandOptions = CommandContext & { + target: InteractionTarget; + text: string; + delayMs?: number; +}; + +export type FillCommandResult = ResolvedInteractionTarget & { + text: string; + warning?: string; + backendResult?: Record; +}; + +export type TypeTextCommandOptions = CommandContext & { + text: string; + delayMs?: number; +}; + +export type TypeTextCommandResult = { + kind: 'text'; + text: string; + delayMs: number; + backendResult?: Record; + message?: string; +}; + +type CapturedSnapshot = { + snapshot: SnapshotState; +}; + +export const pressCommand: RuntimeCommand = async ( + runtime, + options, +): Promise => await tapCommand(runtime, options, 'press'); + +export const clickCommand: RuntimeCommand = async ( + runtime, + options, +): Promise => await tapCommand(runtime, options, 'click'); + +async function tapCommand( + runtime: AgentDeviceRuntime, + options: PressCommandOptions, + action: 'click' | 'press', +): Promise { + const 
resolved = await resolveInteractionTarget(runtime, options, { + action, + requireInteractive: true, + promoteToHittableAncestor: true, + }); + if (!runtime.backend.tap) { + throw new AppError('UNSUPPORTED_OPERATION', 'tap is not supported by this backend'); + } + const backendResult = await runtime.backend.tap( + toBackendContext(runtime, options), + resolved.point, + { + button: options.button, + count: options.count, + intervalMs: options.intervalMs, + holdMs: options.holdMs, + jitterPx: options.jitterPx, + doubleTap: options.doubleTap, + }, + ); + return { + ...resolved, + ...(toBackendResult(backendResult) ? { backendResult: toBackendResult(backendResult) } : {}), + }; +} + +export const fillCommand: RuntimeCommand = async ( + runtime, + options, +): Promise => { + if (!options.text) throw new AppError('INVALID_ARGS', 'fill requires text'); + const resolved = await resolveInteractionTarget(runtime, options, { + action: 'fill', + requireInteractive: true, + promoteToHittableAncestor: false, + }); + if (!runtime.backend.fill) { + throw new AppError('UNSUPPORTED_OPERATION', 'fill is not supported by this backend'); + } + const backendResult = await runtime.backend.fill( + toBackendContext(runtime, options), + resolved.point, + options.text, + { delayMs: options.delayMs }, + ); + const nodeType = 'node' in resolved ? (resolved.node.type ?? '') : ''; + const warning = + nodeType && !isFillableType(nodeType, runtime.backend.platform) + ? `fill target ${formatTargetForWarning(resolved)} resolved to "${nodeType}", attempting fill anyway.` + : undefined; + return { + ...resolved, + text: options.text, + ...(warning ? { warning } : {}), + ...(toBackendResult(backendResult) ? 
{ backendResult: toBackendResult(backendResult) } : {}), + }; +}; + +export const typeTextCommand: RuntimeCommand< + TypeTextCommandOptions, + TypeTextCommandResult +> = async (runtime, options): Promise => { + const text = options.text; + if (!text) throw new AppError('INVALID_ARGS', 'type requires text'); + const mistargetedRef = findMistargetedTypeRef(text); + if (mistargetedRef) { + throw new AppError( + 'INVALID_ARGS', + `type does not accept a target ref like "${mistargetedRef}"`, + { + hint: `Use fill ${mistargetedRef} "text" to target that field, or press ${mistargetedRef} then type "text" to append.`, + }, + ); + } + if (!runtime.backend.typeText) { + throw new AppError('UNSUPPORTED_OPERATION', 'type is not supported by this backend'); + } + const delayMs = requireIntInRange(options.delayMs ?? 0, 'delay-ms', 0, 10_000); + const backendResult = await runtime.backend.typeText(toBackendContext(runtime, options), text, { + delayMs, + }); + const message = formatTextLengthMessage('Typed', text); + return { + kind: 'text', + text, + delayMs, + ...(toBackendResult(backendResult) ? { backendResult: toBackendResult(backendResult) } : {}), + ...successText(message), + }; +}; + +async function resolveInteractionTarget( + runtime: AgentDeviceRuntime, + options: CommandContext & { target: InteractionTarget }, + params: { + action: 'click' | 'press' | 'fill'; + requireInteractive: boolean; + promoteToHittableAncestor: boolean; + }, +): Promise { + await assertSupportedInteractionSurface(runtime, options, params.action); + + if (options.target.kind === 'point') { + return { + kind: 'point', + point: { x: options.target.x, y: options.target.y }, + }; + } + + if (options.target.kind === 'ref') { + const capture = await resolveSnapshotForRef(runtime, options, options.target); + const resolved = capture.resolved; + const node = params.promoteToHittableAncestor + ? 
resolveActionableTouchNode(capture.snapshot.nodes, resolved.node) + : resolved.node; + assertVisibleRefTarget(node, capture.snapshot.nodes, options.target.ref, params.action); + const point = resolveNodeCenter( + node, + `Ref ${options.target.ref} not found or has invalid bounds`, + ); + return { + kind: 'ref', + point, + target: { kind: 'ref', ref: `@${resolved.ref}` }, + node, + selectorChain: buildSelectorChainForNode(node, runtime.backend.platform, { + action: params.action === 'fill' ? 'fill' : 'click', + }), + refLabel: resolveRefLabel(node, capture.snapshot.nodes), + }; + } + + const capture = await captureInteractionSnapshot(runtime, options, params.requireInteractive); + const chain = parseSelectorChain(options.target.selector); + const resolved = resolveSelectorChain(capture.snapshot.nodes, chain, { + platform: runtime.backend.platform, + requireRect: true, + requireUnique: true, + disambiguateAmbiguous: true, + }); + if (!resolved || !resolved.node.rect) { + throw new AppError( + 'COMMAND_FAILED', + formatSelectorFailure(chain, resolved?.diagnostics ?? [], { unique: true }), + ); + } + const node = params.promoteToHittableAncestor + ? resolveActionableTouchNode(capture.snapshot.nodes, resolved.node) + : resolved.node; + const point = resolveNodeCenter( + node, + `Selector ${resolved.selector.raw} resolved to invalid bounds`, + ); + return { + kind: 'selector', + point, + target: { kind: 'selector', selector: resolved.selector.raw }, + node, + selectorChain: buildSelectorChainForNode(node, runtime.backend.platform, { + action: params.action === 'fill' ? 
'fill' : 'click', + }), + refLabel: resolveRefLabel(node, capture.snapshot.nodes), + }; +} + +async function assertSupportedInteractionSurface( + runtime: AgentDeviceRuntime, + options: CommandContext, + action: 'click' | 'press' | 'fill', +): Promise { + if (runtime.backend.platform !== 'macos') return; + const surface = await resolveInteractionSurface(runtime, options); + if (surface !== 'desktop' && surface !== 'menubar') return; + // Menu bar button activation is supported by the existing daemon path; text entry is not. + if (surface === 'menubar' && (action === 'click' || action === 'press')) return; + throw new AppError( + 'UNSUPPORTED_OPERATION', + `${action} is not supported on macOS ${surface} sessions yet. Open an app session to act, or use the ${surface} surface to inspect.`, + ); +} + +async function resolveInteractionSurface( + runtime: AgentDeviceRuntime, + options: CommandContext, +): Promise { + const session = await runtime.sessions.get(options.session ?? 'default'); + return session?.metadata?.surface; +} + +async function captureInteractionSnapshot( + runtime: AgentDeviceRuntime, + options: CommandContext, + interactiveOnly: boolean, +): Promise { + if (!runtime.backend.captureSnapshot) { + throw new AppError('UNSUPPORTED_OPERATION', 'snapshot is not supported by this backend'); + } + const sessionName = options.session ?? 'default'; + const session = await runtime.sessions.get(sessionName); + if (!session) throw new AppError('SESSION_NOT_FOUND', 'No active session. Run open first.'); + const result = await runtime.backend.captureSnapshot(toBackendContext(runtime, options), { + interactiveOnly, + compact: interactiveOnly, + }); + const snapshot = + result.snapshot ?? + ({ + nodes: result.nodes ?? 
[], + truncated: result.truncated, + backend: result.backend as SnapshotState['backend'], + createdAt: now(runtime), + } satisfies SnapshotState); + await runtime.sessions.set({ ...session, snapshot }); + return { snapshot }; +} + +async function resolveSnapshotForRef( + runtime: AgentDeviceRuntime, + options: CommandContext, + target: Extract, +): Promise { + const sessionName = options.session ?? 'default'; + const session = await runtime.sessions.get(sessionName); + if (!session) throw new AppError('SESSION_NOT_FOUND', 'No active session. Run open first.'); + if (!session.snapshot) { + throw new AppError('INVALID_ARGS', 'No snapshot in session. Run snapshot first.'); + } + + const fallbackLabel = target.fallbackLabel ?? ''; + const stored = tryResolveRefNode(session.snapshot.nodes, target.ref, { + fallbackLabel, + requireRect: true, + }); + if (stored) { + return { snapshot: session.snapshot, resolved: stored }; + } + + const capture = await captureInteractionSnapshot(runtime, options, true); + const refreshed = tryResolveRefNode(capture.snapshot.nodes, target.ref, { + fallbackLabel, + requireRect: true, + }); + if (!refreshed) { + throw new AppError('COMMAND_FAILED', `Ref ${target.ref} not found or has no bounds`); + } + return { ...capture, resolved: refreshed }; +} + +function tryResolveRefNode( + nodes: SnapshotState['nodes'], + refInput: string, + options: { + fallbackLabel: string; + requireRect: boolean; + }, +): { ref: string; node: SnapshotNode } | null { + const ref = normalizeRef(refInput); + if (!ref) throw new AppError('INVALID_ARGS', `Invalid ref: ${refInput}`); + const refNode = findNodeByRef(nodes, ref); + if (isUsableResolvedNode(refNode, options.requireRect)) return { ref, node: refNode }; + const fallbackNode = + options.fallbackLabel.length > 0 ? 
findNodeByLabel(nodes, options.fallbackLabel) : null; + if (isUsableResolvedNode(fallbackNode, options.requireRect)) { + return { ref, node: fallbackNode }; + } + return null; +} + +function resolveNodeCenter(node: SnapshotNode, message: string): Point { + if (!node.rect) throw new AppError('COMMAND_FAILED', message); + const point = centerOfRect(node.rect); + if (!Number.isFinite(point.x) || !Number.isFinite(point.y)) { + throw new AppError('COMMAND_FAILED', message); + } + return point; +} + +function isUsableResolvedNode( + node: SnapshotNode | null | undefined, + requireRect: boolean, +): node is SnapshotNode { + if (!node) return false; + if (!requireRect) return true; + if (!node.rect) return false; + const { x, y, width, height } = node.rect; + if ( + !Number.isFinite(Number(x)) || + !Number.isFinite(Number(y)) || + !Number.isFinite(Number(width)) || + !Number.isFinite(Number(height)) || + Number(width) < 0 || + Number(height) < 0 + ) { + return false; + } + const point = centerOfRect(node.rect); + return Number.isFinite(point.x) && Number.isFinite(point.y); +} + +function assertVisibleRefTarget( + node: SnapshotNode, + nodes: SnapshotState['nodes'], + refInput: string, + action: 'click' | 'press' | 'fill', +): void { + const viewport = node.rect ? resolveEffectiveViewportRect(node, nodes) : null; + if (!node.rect || !viewport || isNodeVisibleInEffectiveViewport(node, nodes)) return; + throw new AppError('COMMAND_FAILED', `Ref ${refInput} is off-screen and not safe to ${action}`, { + reason: 'offscreen_ref', + ref: normalizeRef(refInput), + rect: node.rect, + viewport, + hint: `Use scroll with the direction from the off-screen summary, take a fresh snapshot, then retry ${action} with the new ref or a selector.`, + }); +} + +function toBackendResult(result: unknown): Record | undefined { + return result && typeof result === 'object' ? 
(result as Record) : undefined; +} + +function formatTargetForWarning(result: { + kind: FillCommandResult['kind']; + target?: ResolvedTarget; +}): string { + if (result.target?.kind === 'ref') return result.target.ref; + if (result.target?.kind === 'selector') return result.target.selector; + return 'point'; +} + +function findMistargetedTypeRef(text: string): string | null { + const first = text.trim().split(/\s+/, 1)[0]; + if (!first || !first.startsWith('@') || first.length < 3) { + return null; + } + const body = first.slice(1); + if (/^[A-Za-z_-]*\d[\w-]*$/i.test(body) || /^(?:ref|node|element|el)[\w-]*$/i.test(body)) { + return first; + } + return null; +} + +function formatTextLengthMessage(action: 'Typed' | 'Filled', text: string): string { + return `${action} ${Array.from(text).length} chars`; +} diff --git a/src/commands/io-policy.ts b/src/commands/io-policy.ts new file mode 100644 index 00000000..e08e79bb --- /dev/null +++ b/src/commands/io-policy.ts @@ -0,0 +1,63 @@ +import type { + CreateTempFileOptions, + FileInputRef, + FileOutputRef, + ReserveOutputOptions, + ReservedOutputFile, + ResolvedInputFile, + ResolveInputOptions, + TemporaryFile, +} from '../io.ts'; +import type { AgentDeviceRuntime } from '../runtime.ts'; +import { AppError, asAppError } from '../utils/errors.ts'; + +export async function resolveCommandInput( + runtime: AgentDeviceRuntime, + ref: FileInputRef, + options: ResolveInputOptions, +): Promise { + if (ref.kind === 'path' && !runtime.policy.allowLocalInputPaths) { + throw new AppError( + 'INVALID_ARGS', + `Local ${options.field ?? 
'input'} paths are not allowed by command policy`, + ); + } + try { + return await runtime.artifacts.resolveInput(ref, options); + } catch (error) { + throw asAppError(error); + } +} + +export async function reserveCommandOutput( + runtime: AgentDeviceRuntime, + ref: FileOutputRef | undefined, + options: ReserveOutputOptions, +): Promise { + if (ref?.kind === 'path' && !runtime.policy.allowLocalOutputPaths) { + throw new AppError('INVALID_ARGS', 'Local output paths are not allowed by command policy'); + } + try { + return await runtime.artifacts.reserveOutput(ref, { + ...options, + visibility: options.visibility ?? 'client-visible', + requestedClientPath: + ref?.kind === 'downloadableArtifact' + ? (ref.clientPath ?? options.requestedClientPath) + : options.requestedClientPath, + }); + } catch (error) { + throw asAppError(error); + } +} + +export async function createCommandTempFile( + runtime: AgentDeviceRuntime, + options: CreateTempFileOptions, +): Promise { + try { + return await runtime.artifacts.createTempFile(options); + } catch (error) { + throw asAppError(error); + } +} diff --git a/src/commands/router.ts b/src/commands/router.ts new file mode 100644 index 00000000..1d6ffa1b --- /dev/null +++ b/src/commands/router.ts @@ -0,0 +1,226 @@ +import type { AgentDeviceRuntime } from '../runtime.ts'; +import { AppError, normalizeAgentDeviceError, type NormalizedError } from '../utils/errors.ts'; +import { screenshotCommand, type ScreenshotCommandResult } from './capture-screenshot.ts'; +import { + diffScreenshotCommand, + type DiffScreenshotCommandOptions, + type DiffScreenshotCommandResult, +} from './capture-diff-screenshot.ts'; +import { + diffSnapshotCommand, + snapshotCommand, + type DiffSnapshotCommandResult, + type SnapshotCommandResult, +} from './capture-snapshot.ts'; +import { + findCommand, + getCommand, + isCommand, + waitCommand, + type FindReadCommandOptions, + type FindReadCommandResult, + type GetCommandOptions, + type GetCommandResult, + type 
IsCommandOptions, + type IsCommandResult, + type WaitCommandOptions, + type WaitCommandResult, +} from './selector-read.ts'; +import { + clickCommand, + fillCommand, + pressCommand, + typeTextCommand, + type ClickCommandOptions, + type FillCommandOptions, + type FillCommandResult, + type PressCommandOptions, + type PressCommandResult, + type TypeTextCommandOptions, + type TypeTextCommandResult, +} from './interactions.ts'; +import type { + DiffSnapshotCommandOptions, + ScreenshotCommandOptions, + SnapshotCommandOptions, +} from './index.ts'; +import { commandCatalog } from './catalog.ts'; + +export type CommandRouterRequest = + | { + command: 'capture.screenshot'; + options: ScreenshotCommandOptions; + context?: TContext; + } + | { + command: 'capture.diffScreenshot'; + options: DiffScreenshotCommandOptions; + context?: TContext; + } + | { + command: 'capture.snapshot'; + options: SnapshotCommandOptions; + context?: TContext; + } + | { + command: 'capture.diffSnapshot'; + options: DiffSnapshotCommandOptions; + context?: TContext; + } + | { + command: 'selectors.find'; + options: FindReadCommandOptions; + context?: TContext; + } + | { + command: 'selectors.get'; + options: GetCommandOptions; + context?: TContext; + } + | { + command: 'selectors.is'; + options: IsCommandOptions; + context?: TContext; + } + | { + command: 'selectors.wait'; + options: WaitCommandOptions; + context?: TContext; + } + | { + command: 'interactions.click'; + options: ClickCommandOptions; + context?: TContext; + } + | { + command: 'interactions.press'; + options: PressCommandOptions; + context?: TContext; + } + | { + command: 'interactions.fill'; + options: FillCommandOptions; + context?: TContext; + } + | { + command: 'interactions.typeText'; + options: TypeTextCommandOptions; + context?: TContext; + }; + +export type CommandRouterResult = + | ScreenshotCommandResult + | DiffScreenshotCommandResult + | SnapshotCommandResult + | DiffSnapshotCommandResult + | FindReadCommandResult + | 
GetCommandResult + | IsCommandResult + | WaitCommandResult + | PressCommandResult + | FillCommandResult + | TypeTextCommandResult; + +export type CommandRouterResponse = + | { + ok: true; + data: CommandRouterResult; + } + | { + ok: false; + error: NormalizedError; + }; + +export type CommandRouter = { + dispatch(request: CommandRouterRequest): Promise; +}; + +export type CommandRouterConfig = { + createRuntime( + request: CommandRouterRequest, + ): AgentDeviceRuntime | Promise; + beforeDispatch?(request: CommandRouterRequest): void | Promise; + formatError?(error: unknown, request: CommandRouterRequest): NormalizedError; +}; + +export function createCommandRouter( + config: CommandRouterConfig, +): CommandRouter { + return { + dispatch: async (request) => { + try { + assertRouterCommandImplemented(request); + await config.beforeDispatch?.(request); + const runtime = await config.createRuntime(request); + return { ok: true, data: await dispatchRuntimeCommand(runtime, request) }; + } catch (error) { + return { + ok: false, + error: config.formatError?.(error, request) ?? 
normalizeAgentDeviceError(error), + }; + } + }, + }; +} + +const implementedRouterCommands = new Set([ + 'capture.screenshot', + 'capture.diffScreenshot', + 'capture.snapshot', + 'capture.diffSnapshot', + 'selectors.find', + 'selectors.get', + 'selectors.is', + 'selectors.wait', + 'interactions.click', + 'interactions.press', + 'interactions.fill', + 'interactions.typeText', +]); + +function assertRouterCommandImplemented(request: { command: string }): void { + if (implementedRouterCommands.has(request.command)) return; + const catalogEntry = commandCatalog.find((entry) => entry.command === request.command); + if (catalogEntry?.status === 'planned') { + throw new AppError( + 'NOT_IMPLEMENTED', + `Command ${request.command} is planned but not implemented in the runtime router yet`, + { command: request.command }, + ); + } + throw new AppError('UNSUPPORTED_OPERATION', `Unknown runtime command: ${request.command}`, { + command: request.command, + }); +} + +async function dispatchRuntimeCommand( + runtime: AgentDeviceRuntime, + request: CommandRouterRequest, +): Promise { + switch (request.command) { + case 'capture.screenshot': + return await screenshotCommand(runtime, request.options); + case 'capture.diffScreenshot': + return await diffScreenshotCommand(runtime, request.options); + case 'capture.snapshot': + return await snapshotCommand(runtime, request.options); + case 'capture.diffSnapshot': + return await diffSnapshotCommand(runtime, request.options); + case 'selectors.find': + return await findCommand(runtime, request.options); + case 'selectors.get': + return await getCommand(runtime, request.options); + case 'selectors.is': + return await isCommand(runtime, request.options); + case 'selectors.wait': + return await waitCommand(runtime, request.options); + case 'interactions.click': + return await clickCommand(runtime, request.options); + case 'interactions.press': + return await pressCommand(runtime, request.options); + case 'interactions.fill': + return await 
fillCommand(runtime, request.options); + case 'interactions.typeText': + return await typeTextCommand(runtime, request.options); + } +} diff --git a/src/commands/selector-read-shared.ts b/src/commands/selector-read-shared.ts new file mode 100644 index 00000000..e9b10f66 --- /dev/null +++ b/src/commands/selector-read-shared.ts @@ -0,0 +1,97 @@ +import type { AgentDeviceRuntime, CommandContext, CommandSessionRecord } from '../runtime.ts'; +import { AppError } from '../utils/errors.ts'; +import type { SnapshotNode, SnapshotState } from '../utils/snapshot.ts'; +import { findNodeByRef, normalizeRef } from '../utils/snapshot.ts'; +import { extractReadableText } from '../utils/text-surface.ts'; +import { findNodeByLabel, now, toBackendContext } from './selector-read-utils.ts'; + +export type CapturedSnapshot = { + sessionName: string; + session?: CommandSessionRecord; + snapshot: SnapshotState; +}; + +export type SelectorSnapshotOptions = { + depth?: number; + scope?: string; + raw?: boolean; +}; + +export async function requireSnapshotSession( + runtime: AgentDeviceRuntime, + requestedName: string | undefined, +): Promise { + const sessionName = requestedName ?? 'default'; + const session = await runtime.sessions.get(sessionName); + if (!session) throw new AppError('SESSION_NOT_FOUND', 'No active session. Run open first.'); + if (!session.snapshot) { + throw new AppError('INVALID_ARGS', 'No snapshot in session. Run snapshot first.'); + } + return { sessionName, session, snapshot: session.snapshot }; +} + +export async function captureSelectorSnapshot( + runtime: AgentDeviceRuntime, + options: CommandContext & SelectorSnapshotOptions, + captureOptions: { updateSession: boolean; scope?: string } = { updateSession: true }, +): Promise { + if (!runtime.backend.captureSnapshot) { + throw new AppError('UNSUPPORTED_OPERATION', 'snapshot is not supported by this backend'); + } + const sessionName = options.session ?? 
'default'; + const session = await runtime.sessions.get(sessionName); + const result = await runtime.backend.captureSnapshot(toBackendContext(runtime, options), { + interactiveOnly: false, + compact: false, + depth: options.depth, + scope: captureOptions.scope ?? options.scope, + raw: options.raw, + }); + const snapshot = + result.snapshot ?? + ({ + nodes: result.nodes ?? [], + truncated: result.truncated, + backend: result.backend as SnapshotState['backend'], + createdAt: now(runtime), + } satisfies SnapshotState); + if (captureOptions.updateSession && session) { + await runtime.sessions.set({ ...session, snapshot }); + } + return { sessionName, session, snapshot }; +} + +export async function readText( + runtime: AgentDeviceRuntime, + capture: CapturedSnapshot, + node: SnapshotNode, +): Promise { + if (runtime.backend.readText) { + const result = await runtime.backend.readText( + toBackendContext(runtime, { + session: capture.sessionName, + }), + node, + ); + if (result.text.trim()) return result.text; + } + return extractReadableText(node); +} + +export function resolveRefNode( + nodes: SnapshotState['nodes'], + refInput: string, + options: { + fallbackLabel: string; + invalidRefMessage: string; + notFoundMessage: string; + }, +): { ref: string; node: SnapshotNode } { + const ref = normalizeRef(refInput); + if (!ref) throw new AppError('INVALID_ARGS', options.invalidRefMessage); + const node = + findNodeByRef(nodes, ref) ?? + (options.fallbackLabel.length > 0 ? 
findNodeByLabel(nodes, options.fallbackLabel) : null); + if (!node) throw new AppError('COMMAND_FAILED', options.notFoundMessage); + return { ref, node }; +} diff --git a/src/commands/selector-read-utils.ts b/src/commands/selector-read-utils.ts new file mode 100644 index 00000000..254bb54b --- /dev/null +++ b/src/commands/selector-read-utils.ts @@ -0,0 +1,29 @@ +import type { BackendCommandContext } from '../backend.ts'; +import type { AgentDeviceRuntime, CommandContext } from '../runtime.ts'; + +export { findNodeByLabel, resolveRefLabel } from '../utils/snapshot-processing.ts'; + +export function shouldScopeFind(locator: string): boolean { + return locator === 'text' || locator === 'label' || locator === 'any'; +} + +export function toBackendContext( + runtime: Pick, + options: CommandContext, +): BackendCommandContext { + return { + session: options.session, + requestId: options.requestId, + signal: options.signal ?? runtime.signal, + metadata: options.metadata, + }; +} + +export function now(runtime: AgentDeviceRuntime): number { + return runtime.clock?.now() ?? 
Date.now(); +} + +export async function sleep(runtime: AgentDeviceRuntime, ms: number): Promise { + if (runtime.clock) await runtime.clock.sleep(ms); + else await new Promise((resolve) => setTimeout(resolve, ms)); +} diff --git a/src/commands/selector-read.ts b/src/commands/selector-read.ts new file mode 100644 index 00000000..8c29056b --- /dev/null +++ b/src/commands/selector-read.ts @@ -0,0 +1,493 @@ +import type { FindAction, FindLocator } from '../utils/finders.ts'; +import { findBestMatchesByLocator } from '../utils/finders.ts'; +import type { SnapshotNode } from '../utils/snapshot.ts'; +import { findNodeByRef, normalizeRef } from '../utils/snapshot.ts'; +import type { AgentDeviceRuntime, CommandContext } from '../runtime.ts'; +import { AppError } from '../utils/errors.ts'; +import { + findSelectorChainMatch, + formatSelectorFailure, + parseSelectorChain, + resolveSelectorChain, +} from '../selectors.ts'; +import { buildSelectorChainForNode } from '../utils/selector-build.ts'; +import { evaluateIsPredicate, isSupportedPredicate } from '../utils/selector-is-predicates.ts'; +import type { RuntimeCommand } from './index.ts'; +import { + type CapturedSnapshot, + type SelectorSnapshotOptions, + captureSelectorSnapshot, + readText, + requireSnapshotSession, + resolveRefNode, +} from './selector-read-shared.ts'; +import { + findNodeByLabel, + now, + resolveRefLabel, + shouldScopeFind, + sleep, + toBackendContext, +} from './selector-read-utils.ts'; + +export type { SelectorSnapshotOptions } from './selector-read-shared.ts'; + +export type FindReadCommandOptions = CommandContext & { + locator?: FindLocator; + query: string; + action: Extract; + timeoutMs?: number; +} & SelectorSnapshotOptions; + +export type FindReadCommandResult = + | { kind: 'found'; found: true; waitedMs?: number } + | { kind: 'text'; ref: string; text: string; node: SnapshotNode } + | { kind: 'attrs'; ref: string; node: SnapshotNode }; + +export type SelectorTarget = { + kind: 'selector'; + 
selector: string; +}; + +export type RefTarget = { + kind: 'ref'; + ref: string; + fallbackLabel?: string; +}; + +export type ElementTarget = SelectorTarget | RefTarget; + +export type ResolvedTarget = + | { + kind: 'selector'; + selector: string; + } + | { + kind: 'ref'; + ref: string; + }; + +export type GetCommandOptions = CommandContext & + SelectorSnapshotOptions & { + property: 'text' | 'attrs'; + target: ElementTarget; + }; + +export type GetCommandResult = + | { + kind: 'text'; + target: ResolvedTarget; + text: string; + node: SnapshotNode; + selectorChain?: string[]; + } + | { + kind: 'attrs'; + target: ResolvedTarget; + node: SnapshotNode; + selectorChain?: string[]; + }; + +export type GetTextCommandOptions = CommandContext & + SelectorSnapshotOptions & { + target: ElementTarget; + }; + +export type GetAttrsCommandOptions = CommandContext & + SelectorSnapshotOptions & { + target: ElementTarget; + }; + +export type IsCommandOptions = CommandContext & + SelectorSnapshotOptions & { + predicate: 'visible' | 'hidden' | 'exists' | 'editable' | 'selected' | 'text'; + selector: string; + expectedText?: string; + }; + +export type IsCommandResult = { + predicate: IsCommandOptions['predicate']; + pass: true; + selector: string; + matches?: number; + text?: string; + selectorChain?: string[]; +}; + +export type WaitCommandOptions = CommandContext & + SelectorSnapshotOptions & { + target: + | { kind: 'sleep'; durationMs: number } + | { kind: 'text'; text: string; timeoutMs?: number | null } + | { kind: 'ref'; ref: string; timeoutMs?: number | null } + | { kind: 'selector'; selector: string; timeoutMs?: number | null }; + }; + +export type WaitCommandResult = + | { kind: 'sleep'; waitedMs: number } + | { kind: 'text'; waitedMs: number; text: string } + | { kind: 'selector'; waitedMs: number; selector: string }; + +export type WaitForTextCommandOptions = CommandContext & + SelectorSnapshotOptions & { + text: string; + timeoutMs?: number | null; + }; + +export type 
IsSelectorCommandOptions = CommandContext & + SelectorSnapshotOptions & { + target: SelectorTarget; + }; + +export function selector(expression: string): SelectorTarget { + return { kind: 'selector', selector: expression }; +} + +export function ref(refInput: string, options: { fallbackLabel?: string } = {}): RefTarget { + return { + kind: 'ref', + ref: refInput, + ...(options.fallbackLabel ? { fallbackLabel: options.fallbackLabel } : {}), + }; +} + +const DEFAULT_TIMEOUT_MS = 10_000; +const POLL_INTERVAL_MS = 300; + +export const findCommand: RuntimeCommand = async ( + runtime, + options, +): Promise => { + const locator = options.locator ?? 'any'; + if (!options.query) { + throw new AppError('INVALID_ARGS', 'find requires a value'); + } + if (options.action === 'wait') { + return await waitForFindMatch(runtime, options, locator); + } + + const capture = await captureSelectorSnapshot(runtime, options, { + updateSession: true, + scope: shouldScopeFind(locator) ? options.query : undefined, + }); + const match = findBestMatchesByLocator(capture.snapshot.nodes, locator, options.query, { + requireRect: false, + }).matches[0]; + if (!match) { + throw new AppError('COMMAND_FAILED', 'find did not match any element'); + } + + if (options.action === 'exists') return { kind: 'found', found: true }; + const ref = `@${match.ref}`; + if (options.action === 'get_attrs') return { kind: 'attrs', ref, node: match }; + const text = await readText(runtime, capture, match); + return { kind: 'text', ref, text, node: match }; +}; + +export const getCommand: RuntimeCommand = async ( + runtime, + options, +): Promise => { + if (options.target.kind === 'ref') { + const capture = await requireSnapshotSession(runtime, options.session); + const resolved = resolveRefNode(capture.snapshot.nodes, options.target.ref, { + fallbackLabel: options.target.fallbackLabel ?? 
'', + invalidRefMessage: 'get text requires a ref like @e2', + notFoundMessage: `Ref ${options.target.ref} not found`, + }); + const selectorChain = buildSelectorChainForNode(resolved.node, runtime.backend.platform, { + action: 'get', + }); + const target = { kind: 'ref' as const, ref: `@${resolved.ref}` }; + if (options.property === 'attrs') { + return { kind: 'attrs', target, node: resolved.node, selectorChain }; + } + const text = await readText(runtime, capture, resolved.node); + return { kind: 'text', target, text, node: resolved.node, selectorChain }; + } + + const resolved = await resolveSelectorNode(runtime, options, options.session ?? 'default', { + selector: options.target.selector, + disambiguateAmbiguous: options.property === 'text', + }); + + const selectorChain = buildSelectorChainForNode(resolved.node, runtime.backend.platform, { + action: 'get', + }); + + if (options.property === 'attrs') { + return { + kind: 'attrs', + target: { kind: 'selector', selector: resolved.selector }, + node: resolved.node, + selectorChain, + }; + } + + const text = await readText(runtime, resolved.capture, resolved.node); + return { + kind: 'text', + target: { kind: 'selector', selector: resolved.selector }, + text, + node: resolved.node, + selectorChain, + }; +}; + +export const getTextCommand: RuntimeCommand< + GetTextCommandOptions, + Extract +> = async (runtime, options): Promise> => { + const result = await getCommand(runtime, { + ...options, + property: 'text', + target: options.target, + }); + if (result.kind !== 'text') { + throw new AppError('COMMAND_FAILED', 'getText returned non-text result'); + } + return result; +}; + +export const getAttrsCommand: RuntimeCommand< + GetAttrsCommandOptions, + Extract +> = async (runtime, options): Promise> => { + const result = await getCommand(runtime, { + ...options, + property: 'attrs', + target: options.target, + }); + if (result.kind !== 'attrs') { + throw new AppError('COMMAND_FAILED', 'getAttrs returned non-attrs 
result'); + } + return result; +}; + +export const isCommand: RuntimeCommand = async ( + runtime, + options, +): Promise => { + if (!isSupportedPredicate(options.predicate)) { + throw new AppError( + 'INVALID_ARGS', + 'is requires predicate: visible|hidden|exists|editable|selected|text', + ); + } + if (options.predicate === 'text' && !options.expectedText) { + throw new AppError('INVALID_ARGS', 'is text requires expected text value'); + } + const capture = await captureSelectorSnapshot(runtime, options, { updateSession: true }); + const chain = parseSelectorChain(options.selector); + + if (options.predicate === 'exists') { + const matched = findSelectorChainMatch(capture.snapshot.nodes, chain, { + platform: runtime.backend.platform, + }); + if (!matched) { + throw new AppError('COMMAND_FAILED', formatSelectorFailure(chain, [], { unique: false })); + } + return { + predicate: options.predicate, + pass: true, + selector: matched.selector.raw, + matches: matched.matches, + selectorChain: chain.selectors.map((entry) => entry.raw), + }; + } + + const resolved = resolveSelectorChain(capture.snapshot.nodes, chain, { + platform: runtime.backend.platform, + requireRect: false, + requireUnique: true, + disambiguateAmbiguous: false, + }); + if (!resolved) { + throw new AppError('COMMAND_FAILED', formatSelectorFailure(chain, [], { unique: true })); + } + const result = evaluateIsPredicate({ + predicate: options.predicate, + node: resolved.node, + nodes: capture.snapshot.nodes, + expectedText: options.expectedText, + platform: runtime.backend.platform, + }); + if (!result.pass) { + throw new AppError( + 'COMMAND_FAILED', + `is ${options.predicate} failed for selector ${resolved.selector.raw}: ${result.details}`, + ); + } + return { + predicate: options.predicate, + pass: true, + selector: resolved.selector.raw, + ...(options.predicate === 'text' ? 
{ text: result.actualText } : {}), + selectorChain: chain.selectors.map((entry) => entry.raw), + }; +}; + +export const isVisibleCommand: RuntimeCommand = async ( + runtime, + options, +): Promise => + await isCommand(runtime, { + ...options, + predicate: 'visible', + selector: options.target.selector, + }); + +export const isHiddenCommand: RuntimeCommand = async ( + runtime, + options, +): Promise => + await isCommand(runtime, { + ...options, + predicate: 'hidden', + selector: options.target.selector, + }); + +export const waitCommand: RuntimeCommand = async ( + runtime, + options, +): Promise => { + if (options.target.kind === 'sleep') { + await sleep(runtime, options.target.durationMs); + return { kind: 'sleep', waitedMs: options.target.durationMs }; + } + if (options.target.kind === 'ref') { + const capture = await requireSnapshotSession(runtime, options.session); + const ref = normalizeRef(options.target.ref); + if (!ref) throw new AppError('INVALID_ARGS', `Invalid ref: ${options.target.ref}`); + const node = findNodeByRef(capture.snapshot.nodes, ref); + const text = node ? 
resolveRefLabel(node, capture.snapshot.nodes) : undefined; + if (!text) { + throw new AppError('COMMAND_FAILED', `Ref ${options.target.ref} not found or has no label`); + } + return await waitForText(runtime, options, text, options.target.timeoutMs); + } + if (options.target.kind === 'selector') { + return await waitForSelector( + runtime, + options, + options.target.selector, + options.target.timeoutMs, + ); + } + if (!options.target.text) throw new AppError('INVALID_ARGS', 'wait requires text'); + return await waitForText(runtime, options, options.target.text, options.target.timeoutMs); +}; + +export const waitForTextCommand: RuntimeCommand< + WaitForTextCommandOptions, + Extract +> = async (runtime, options): Promise> => { + const result = await waitCommand(runtime, { + ...options, + target: { kind: 'text', text: options.text, timeoutMs: options.timeoutMs }, + }); + if (result.kind !== 'text') { + throw new AppError('COMMAND_FAILED', 'waitForText returned non-text result'); + } + return result; +}; + +async function waitForFindMatch( + runtime: AgentDeviceRuntime, + options: FindReadCommandOptions, + locator: FindLocator, +): Promise { + const timeout = options.timeoutMs ?? DEFAULT_TIMEOUT_MS; + const start = now(runtime); + while (now(runtime) - start < timeout) { + const capture = await captureSelectorSnapshot(runtime, options, { + updateSession: true, + scope: shouldScopeFind(locator) ? options.query : undefined, + }); + const match = findBestMatchesByLocator(capture.snapshot.nodes, locator, options.query, { + requireRect: false, + }).matches[0]; + if (match) return { kind: 'found', found: true, waitedMs: now(runtime) - start }; + await sleep(runtime, POLL_INTERVAL_MS); + } + throw new AppError('COMMAND_FAILED', 'find wait timed out'); +} + +async function waitForSelector( + runtime: AgentDeviceRuntime, + options: WaitCommandOptions, + selectorExpression: string, + timeoutMs: number | null | undefined, +): Promise { + const timeout = timeoutMs ?? 
DEFAULT_TIMEOUT_MS; + const start = now(runtime); + const chain = parseSelectorChain(selectorExpression); + while (now(runtime) - start < timeout) { + const capture = await captureSelectorSnapshot(runtime, options, { updateSession: true }); + const match = findSelectorChainMatch(capture.snapshot.nodes, chain, { + platform: runtime.backend.platform, + }); + if (match) + return { kind: 'selector', selector: match.selector.raw, waitedMs: now(runtime) - start }; + await sleep(runtime, POLL_INTERVAL_MS); + } + throw new AppError('COMMAND_FAILED', `wait timed out for selector: ${selectorExpression}`); +} + +async function waitForText( + runtime: AgentDeviceRuntime, + options: WaitCommandOptions, + text: string, + timeoutMs: number | null | undefined, +): Promise { + const timeout = timeoutMs ?? DEFAULT_TIMEOUT_MS; + const start = now(runtime); + while (now(runtime) - start < timeout) { + const found = runtime.backend.findText + ? (await runtime.backend.findText(toBackendContext(runtime, options), text)).found + : await snapshotContainsText(runtime, options, text); + if (found) return { kind: 'text', text, waitedMs: now(runtime) - start }; + await sleep(runtime, POLL_INTERVAL_MS); + } + throw new AppError('COMMAND_FAILED', `wait timed out for text: ${text}`); +} + +async function snapshotContainsText( + runtime: AgentDeviceRuntime, + options: WaitCommandOptions, + text: string, +): Promise { + const capture = await captureSelectorSnapshot(runtime, options, { updateSession: true }); + return Boolean(findNodeByLabel(capture.snapshot.nodes, text)); +} + +async function resolveSelectorNode( + runtime: AgentDeviceRuntime, + options: GetCommandOptions, + sessionName: string, + params: { selector: string; disambiguateAmbiguous: boolean }, +): Promise<{ capture: CapturedSnapshot; node: SnapshotNode; selector: string; ref: string }> { + const capture = await captureSelectorSnapshot( + runtime, + { ...options, session: sessionName }, + { + updateSession: true, + }, + ); + const 
chain = parseSelectorChain(params.selector); + const resolved = resolveSelectorChain(capture.snapshot.nodes, chain, { + platform: runtime.backend.platform, + requireRect: false, + requireUnique: true, + disambiguateAmbiguous: params.disambiguateAmbiguous, + }); + if (!resolved) { + throw new AppError('COMMAND_FAILED', formatSelectorFailure(chain, [], { unique: true })); + } + return { + capture, + node: resolved.node, + selector: resolved.selector.raw, + ref: `@${resolved.node.ref}`, + }; +} diff --git a/src/core/dispatch-series.ts b/src/core/dispatch-series.ts index 45c639aa..92b77a39 100644 --- a/src/core/dispatch-series.ts +++ b/src/core/dispatch-series.ts @@ -1,5 +1,5 @@ -import { AppError } from '../utils/errors.ts'; import type { DeviceInfo } from '../utils/device.ts'; +export { requireIntInRange } from '../utils/validation.ts'; const DETERMINISTIC_JITTER_PATTERN: ReadonlyArray = [ [0, 0], @@ -13,13 +13,6 @@ const DETERMINISTIC_JITTER_PATTERN: ReadonlyArray = [ [-1, -1], ]; -export function requireIntInRange(value: number, name: string, min: number, max: number): number { - if (!Number.isFinite(value) || !Number.isInteger(value) || value < min || value > max) { - throw new AppError('INVALID_ARGS', `${name} must be an integer between ${min} and ${max}`); - } - return value; -} - export function clampIosSwipeDuration(durationMs: number): number { // Keep iOS swipes stable while allowing explicit fast durations for scroll-heavy flows. 
return Math.min(60, Math.max(16, Math.round(durationMs))); diff --git a/src/daemon/__tests__/request-router-screenshot.test.ts b/src/daemon/__tests__/request-router-screenshot.test.ts index 553e51ad..548abae2 100644 --- a/src/daemon/__tests__/request-router-screenshot.test.ts +++ b/src/daemon/__tests__/request-router-screenshot.test.ts @@ -10,6 +10,7 @@ vi.mock('../../core/dispatch.ts', async (importOriginal) => { import { dispatchCommand } from '../../core/dispatch.ts'; import { createRequestHandler } from '../request-router.ts'; +import { dispatchScreenshotViaRuntime } from '../screenshot-runtime.ts'; import type { SessionState } from '../types.ts'; import { LeaseRegistry } from '../lease-registry.ts'; import { attachRefs } from '../../utils/snapshot.ts'; @@ -94,6 +95,28 @@ test('screenshot resolves relative positional path against request cwd', async ( expect(recordedAction?.positionals).toEqual([path.join(callerCwd, 'evidence/test.png')]); }); +test('default screenshot temp directory is cleaned when capture fails', async () => { + const session = makeSession('default'); + let capturedPath: string | undefined; + mockDispatch.mockImplementation(async (_device, command, positionals) => { + if (command === 'screenshot') capturedPath = positionals[0]; + throw new Error('capture failed'); + }); + + await expect( + dispatchScreenshotViaRuntime({ + session, + sessionName: session.name, + outputPlacement: 'default', + dispatchContext: {}, + }), + ).rejects.toThrow(/capture failed/); + + expect(capturedPath).toBeTruthy(); + expect(path.basename(capturedPath!)).toBe('screenshot.png'); + expect(fs.existsSync(path.dirname(capturedPath!))).toBe(false); +}); + test('router serializes concurrent commands for the same device across sessions', async () => { const sessionStore = makeSessionStore('agent-device-router-screenshot-'); sessionStore.set('session-a', makeSession('session-a')); @@ -260,6 +283,42 @@ test('screenshot keeps absolute positional path unchanged', async () => { 
expect(recordedAction?.positionals).toEqual([absolutePath]); }); +test('screenshot runtime supplies default output path when none is requested', async () => { + const sessionStore = makeSessionStore('agent-device-router-screenshot-'); + sessionStore.set('default', makeSession('default')); + + let capturedPath: string | undefined; + mockDispatch.mockImplementation(async (_device, command, positionals) => { + if (command === 'screenshot') { + capturedPath = positionals[0]; + } + return {}; + }); + + const handler = createRequestHandler({ + logPath: path.join(os.tmpdir(), 'daemon.log'), + token: 'test-token', + sessionStore, + leaseRegistry: new LeaseRegistry(), + trackDownloadableArtifact: () => 'artifact-id', + }); + + const response = await handler({ + token: 'test-token', + session: 'default', + command: 'screenshot', + positionals: [], + meta: { requestId: 'req-default-screenshot' }, + }); + + expect(response.ok).toBe(true); + expect(capturedPath).toContain('agent-device-screenshot-'); + expect(path.basename(capturedPath ?? 
'')).toBe('screenshot.png'); + if (response.ok) { + expect(response.data?.path).toBe(capturedPath); + } +}); + test('screenshot resolves --out flag path against request cwd', async () => { const callerCwd = fs.mkdtempSync(path.join(os.tmpdir(), 'agent-device-screenshot-out-cwd-')); const sessionStore = makeSessionStore('agent-device-router-screenshot-'); diff --git a/src/daemon/handlers/__tests__/find.test.ts b/src/daemon/handlers/__tests__/find.test.ts index 29d81f53..373aaa46 100644 --- a/src/daemon/handlers/__tests__/find.test.ts +++ b/src/daemon/handlers/__tests__/find.test.ts @@ -334,6 +334,19 @@ test('handleFindCommands wait bypasses snapshot cache while Android freshness re expect(mockDispatch).toHaveBeenCalledTimes(2); }); +test('handleFindCommands wait reuses rapid selector snapshots', async () => { + const { response } = await runFindClickScenario({ + positionals: ['text', 'Never appears', 'wait', '350'], + nodes: [{ index: 0, depth: 0, type: 'StaticText', label: 'Other text' }], + }); + + expect(response.ok).toBe(false); + if (!response.ok) { + expect(response.error.message).toContain('find wait timed out'); + } + expect(mockDispatch).toHaveBeenCalledTimes(1); +}); + test('handleFindCommands uses helper-backed snapshots for macOS desktop sessions', async () => { await withMockedMacOsHelper( [ diff --git a/src/daemon/handlers/__tests__/interaction.test.ts b/src/daemon/handlers/__tests__/interaction.test.ts index 1fde8e56..5d840bb0 100644 --- a/src/daemon/handlers/__tests__/interaction.test.ts +++ b/src/daemon/handlers/__tests__/interaction.test.ts @@ -266,6 +266,37 @@ test('press coordinates dispatches press and records as press', async () => { expect(session?.actions[0]?.positionals).toEqual(['100', '200']); }); +test('type dispatches through runtime and records as type', async () => { + const sessionStore = makeSessionStore(); + const sessionName = 'default'; + sessionStore.set(sessionName, makeSession(sessionName)); + + mockDispatch.mockResolvedValue({ 
ok: true, message: 'Typed 5 chars' }); + + const response = await handleInteractionCommands({ + req: { + token: 't', + session: sessionName, + command: 'type', + positionals: ['hello'], + flags: { delayMs: 3 }, + }, + sessionName, + sessionStore, + contextFromFlags, + }); + + expect(response?.ok).toBe(true); + expect(mockDispatch).toHaveBeenCalledTimes(1); + expect(mockDispatch.mock.calls[0]?.[1]).toBe('type'); + expect(mockDispatch.mock.calls[0]?.[2]).toEqual(['hello']); + const context = mockDispatch.mock.calls[0]?.[4] as Record | undefined; + expect(context?.delayMs).toBe(3); + const session = sessionStore.get(sessionName); + expect(session?.actions.at(-1)?.command).toBe('type'); + expect(session?.actions.at(-1)?.positionals).toEqual(['hello']); +}); + test('click rejects macOS desktop surface interactions until helper routing exists', async () => { const sessionStore = makeSessionStore(); const sessionName = 'macos-desktop-click'; @@ -430,7 +461,11 @@ test('press coordinates appends touch-visualization events while recording', asy }; sessionStore.set(sessionName, session); - mockDispatch.mockResolvedValue({ ok: true }); + mockDispatch.mockResolvedValue({ + ok: true, + videoPath: '/tmp/demo.mp4', + artifactUri: 'agent-device://artifacts/demo.mp4', + }); const response = await handleInteractionCommands({ req: { @@ -454,6 +489,13 @@ test('press coordinates appends touch-visualization events while recording', asy expect(recorded?.gestureEvents[0]?.y).toBe(200); expect(recorded?.gestureEvents[0]?.referenceWidth).toBe(402); expect(recorded?.gestureEvents[0]?.referenceHeight).toBe(874); + const actionResult = sessionStore.get(sessionName)?.actions[0]?.result; + expect(actionResult?.videoPath).toBe('/tmp/demo.mp4'); + expect(actionResult?.artifactUri).toBe('agent-device://artifacts/demo.mp4'); + if (response?.ok) { + expect(response.data?.videoPath).toBe('/tmp/demo.mp4'); + expect(response.data?.artifactUri).toBe('agent-device://artifacts/demo.mp4'); + } }); 
test('press coordinates on Android recording uses physical screen size when no snapshot exists', async () => { @@ -740,6 +782,71 @@ test('press @ref resolves snapshot node and records press action', async () => { expect(Array.isArray(result.selectorChain)).toBe(true); }); +test('press @ref refreshes stale stored refs and syncs the daemon session snapshot', async () => { + const sessionStore = makeSessionStore(); + const sessionName = 'stale-ref-refresh'; + const session = makeSession(sessionName); + session.snapshot = { + nodes: attachRefs([ + { + index: 0, + type: 'XCUIElementTypeButton', + label: 'Continue', + enabled: true, + hittable: true, + }, + ]), + createdAt: Date.now(), + backend: 'xctest', + }; + sessionStore.set(sessionName, session); + + mockDispatch.mockImplementation(async (_device, command) => { + if (command === 'snapshot') { + return { + nodes: [ + { + index: 0, + type: 'XCUIElementTypeButton', + label: 'Continue', + rect: { x: 10, y: 20, width: 100, height: 40 }, + enabled: true, + hittable: true, + }, + ], + backend: 'xctest', + }; + } + return { pressed: true }; + }); + + const response = await handleInteractionCommands({ + req: { + token: 't', + session: sessionName, + command: 'press', + positionals: ['@e1'], + flags: {}, + }, + sessionName, + sessionStore, + contextFromFlags, + }); + + expect(response?.ok).toBe(true); + if (response?.ok) { + expect(response.data?.x).toBe(60); + expect(response.data?.y).toBe(40); + } + expect(mockDispatch.mock.calls.map((call) => call[1])).toEqual(['snapshot', 'press']); + expect(sessionStore.get(sessionName)?.snapshot?.nodes[0]?.rect).toEqual({ + x: 10, + y: 20, + width: 100, + height: 40, + }); +}); + test('press @ref fails when Android tap escapes to launcher', async () => { const sessionStore = makeSessionStore(); const sessionName = 'android-escape'; @@ -783,6 +890,7 @@ test('press @ref fails when Android tap escapes to launcher', async () => { code: 'COMMAND_FAILED', message: 
expect.stringContaining('tap likely escaped the app'), }); + expect(sessionStore.get(sessionName)?.actions).toEqual([]); }); test('press @ref fails when Android tap escapes to Settings', async () => { @@ -1013,6 +1121,43 @@ test('fill @ref preserves fallback coordinates for recording when platform resul expect(event?.y).toBe(40); }); +test('fill coordinates dispatches point fill and records the action', async () => { + const sessionStore = makeSessionStore(); + const sessionName = 'default'; + sessionStore.set(sessionName, makeSession(sessionName)); + + mockDispatch.mockResolvedValue({ filled: true }); + + const response = await handleInteractionCommands({ + req: { + token: 't', + session: sessionName, + command: 'fill', + positionals: ['25', '75', 'hello world'], + flags: { delayMs: 40 }, + }, + sessionName, + sessionStore, + contextFromFlags, + }); + + expect(response).toBeTruthy(); + expect(response?.ok).toBe(true); + if (response?.ok) { + expect(response.data?.filled).toBe(true); + expect(response.data?.x).toBe(25); + expect(response.data?.y).toBe(75); + expect(response.data?.text).toBe('hello world'); + } + expect(mockDispatch).toHaveBeenCalledTimes(1); + expect(mockDispatch.mock.calls[0]?.[1]).toBe('fill'); + expect(mockDispatch.mock.calls[0]?.[2]).toEqual(['25', '75', 'hello world']); + expect((mockDispatch.mock.calls[0]?.[4] as Record | undefined)?.delayMs).toBe( + 40, + ); + expect(sessionStore.get(sessionName)?.actions.length).toBe(1); +}); + test('fill @ref keeps the original editable node when its parent is the hittable ancestor', async () => { const sessionStore = makeSessionStore(); const sessionName = 'default'; @@ -1574,6 +1719,62 @@ test('is visible captures one snapshot before evaluating selector predicate', as } }); +test('is visible preserves CLI snapshot flags during runtime snapshot capture', async () => { + const sessionStore = makeSessionStore(); + const sessionName = 'snapshot-flags'; + sessionStore.set(sessionName, 
makeSession(sessionName)); + + mockDispatch.mockImplementation(async (_device, command) => { + if (command !== 'snapshot') throw new Error(`unexpected command: ${command}`); + return { + nodes: [ + { + index: 0, + depth: 0, + type: 'XCUIElementTypeWindow', + label: 'Login', + rect: { x: 0, y: 0, width: 390, height: 844 }, + }, + { + index: 1, + depth: 1, + parentIndex: 0, + type: 'XCUIElementTypeButton', + label: 'Continue', + identifier: 'auth_continue', + rect: { x: 10, y: 20, width: 100, height: 40 }, + enabled: true, + hittable: true, + visible: true, + }, + ], + backend: 'xctest', + }; + }); + + const response = await handleInteractionCommands({ + req: { + token: 't', + session: sessionName, + command: 'is', + positionals: ['visible', 'id=auth_continue'], + flags: { snapshotDepth: 2, snapshotScope: 'Login', snapshotRaw: true }, + }, + sessionName, + sessionStore, + contextFromFlags, + }); + + expect(response?.ok).toBe(true); + expect(mockDispatch.mock.calls[0]?.[4]).toMatchObject({ + snapshotDepth: 2, + snapshotScope: 'Login', + snapshotRaw: true, + snapshotInteractiveOnly: false, + snapshotCompact: false, + }); +}); + test('is visible passes for list text that inherits viewport visibility from an ancestor', async () => { const sessionStore = makeSessionStore(); const sessionName = 'visible-list-item'; diff --git a/src/daemon/handlers/find.ts b/src/daemon/handlers/find.ts index c81dd1ea..aafbdceb 100644 --- a/src/daemon/handlers/find.ts +++ b/src/daemon/handlers/find.ts @@ -10,6 +10,7 @@ import { readTextForNode } from './interaction-read.ts'; import { captureSnapshot } from './snapshot-capture.ts'; import { errorResponse } from './response.ts'; import { getActiveAndroidSnapshotFreshness } from '../android-snapshot-freshness.ts'; +import { dispatchFindReadOnlyViaRuntime } from '../selector-runtime.ts'; export { parseFindArgs } from '../../utils/finders.ts'; @@ -56,6 +57,13 @@ export async function handleFindCommands(params: { if (req.flags?.findFirst && 
req.flags?.findLast) { return errorResponse('INVALID_ARGS', 'find accepts only one of --first or --last'); } + const runtimeResponse = await dispatchFindReadOnlyViaRuntime({ + req, + sessionName, + logPath, + sessionStore, + }); + if (runtimeResponse) return runtimeResponse; const session = sessionStore.get(sessionName); const isReadOnly = action === 'exists' || action === 'wait' || action === 'get_text' || action === 'get_attrs'; diff --git a/src/daemon/handlers/interaction-android-escape.ts b/src/daemon/handlers/interaction-android-escape.ts new file mode 100644 index 00000000..14d73ff6 --- /dev/null +++ b/src/daemon/handlers/interaction-android-escape.ts @@ -0,0 +1,43 @@ +import { getAndroidAppState } from '../../platforms/android/index.ts'; +import { AppError } from '../../utils/errors.ts'; +import type { SessionState } from '../types.ts'; + +export async function assertAndroidPressStayedInApp( + session: SessionState, + targetLabel: string, +): Promise { + if (session.device.platform !== 'android' || !session.appBundleId) return; + + const foreground = await getAndroidAppState(session.device); + const foregroundPackage = foreground.package?.trim(); + if (!foregroundPackage || foregroundPackage === session.appBundleId) return; + if (!looksLikeAndroidEscapeSurface(foregroundPackage)) return; + + throw new AppError( + 'COMMAND_FAILED', + `press ${targetLabel} left ${session.appBundleId} and foregrounded ${foregroundPackage}. 
The tap likely escaped the app.`, + { + expectedPackage: session.appBundleId, + foregroundPackage, + activity: foreground.activity, + hint: 'Use screenshot as visual truth, then take a fresh snapshot -i before retrying.', + }, + ); +} + +export function isAndroidEscapeError(error: AppError): boolean { + return ( + error.code === 'COMMAND_FAILED' && + typeof error.details?.expectedPackage === 'string' && + typeof error.details?.foregroundPackage === 'string' + ); +} + +function looksLikeAndroidEscapeSurface(packageName: string): boolean { + return ( + packageName === 'com.android.settings' || + packageName === 'com.android.systemui' || + packageName === 'com.google.android.permissioncontroller' || + packageName.includes('launcher') + ); +} diff --git a/src/daemon/handlers/interaction-common.ts b/src/daemon/handlers/interaction-common.ts index fe5f7235..428a4057 100644 --- a/src/daemon/handlers/interaction-common.ts +++ b/src/daemon/handlers/interaction-common.ts @@ -18,6 +18,7 @@ export type ContextFromFlags = ( export type InteractionHandlerParams = { req: DaemonRequest; sessionName: string; + logPath?: string; sessionStore: SessionStore; contextFromFlags: ContextFromFlags; }; @@ -148,7 +149,7 @@ async function dispatchInteractionCommand(params: { return { data, actionStartedAt, actionFinishedAt }; } -function finalizeTouchInteraction(params: { +export function finalizeTouchInteraction(params: { session: SessionState; sessionStore: SessionStore; command: string; diff --git a/src/daemon/handlers/interaction-fill.ts b/src/daemon/handlers/interaction-fill.ts deleted file mode 100644 index 170af4a8..00000000 --- a/src/daemon/handlers/interaction-fill.ts +++ /dev/null @@ -1,219 +0,0 @@ -import { isCommandSupportedOnDevice } from '../../core/capabilities.ts'; -import { centerOfRect } from '../../utils/snapshot.ts'; -import type { DaemonRequest, DaemonResponse } from '../types.ts'; -import { - buildSelectorChainForNode, - formatSelectorFailure, - parseSelectorChain, - 
resolveSelectorChain, - splitSelectorFromArgs, -} from '../selectors.ts'; -import { withDiagnosticTimer } from '../../utils/diagnostics.ts'; -import type { SessionStore } from '../session-store.ts'; -import { isFillableType, resolveRefLabel } from '../snapshot-processing.ts'; -import { - buildTouchVisualizationResult, - dispatchRecordedTouchInteraction, - type ContextFromFlags, -} from './interaction-common.ts'; -import { type CaptureSnapshotForSession } from './interaction-snapshot.ts'; -import { readSnapshotNodesReferenceFrame } from './interaction-touch-reference-frame.ts'; -import { resolveRefTargetWithRectRefresh, type ResolveRefTarget } from './interaction-targeting.ts'; -import { unsupportedMacOsDesktopSurfaceInteraction } from './interaction-touch-policy.ts'; -import type { RefSnapshotFlagGuardResponse } from './interaction-flags.ts'; -import { errorResponse } from './response.ts'; - -export async function handleFillCommand(params: { - req: DaemonRequest; - sessionName: string; - sessionStore: SessionStore; - contextFromFlags: ContextFromFlags; - captureSnapshotForSession: CaptureSnapshotForSession; - resolveRefTarget: ResolveRefTarget; - refSnapshotFlagGuardResponse: RefSnapshotFlagGuardResponse; -}): Promise { - const { - req, - sessionName, - sessionStore, - contextFromFlags, - captureSnapshotForSession, - resolveRefTarget, - refSnapshotFlagGuardResponse, - } = params; - const session = sessionStore.get(sessionName); - - if (session) { - const unsupportedSurfaceResponse = unsupportedMacOsDesktopSurfaceInteraction(session, 'fill'); - if (unsupportedSurfaceResponse) { - return unsupportedSurfaceResponse; - } - } - if (session && !isCommandSupportedOnDevice('fill', session.device)) { - return errorResponse('UNSUPPORTED_OPERATION', 'fill is not supported on this device'); - } - if (req.positionals?.[0]?.startsWith('@')) { - if (!session) { - return errorResponse('SESSION_NOT_FOUND', 'No active session. 
Run open first.'); - } - const invalidRefFlagsResponse = refSnapshotFlagGuardResponse('fill', req.flags); - if (invalidRefFlagsResponse) return invalidRefFlagsResponse; - - const labelCandidate = req.positionals.length >= 3 ? req.positionals[1] : ''; - const text = - req.positionals.length >= 3 - ? req.positionals.slice(2).join(' ') - : req.positionals.slice(1).join(' '); - if (!text) { - return errorResponse('INVALID_ARGS', 'fill requires text after ref'); - } - - const resolvedRefFillTarget = await resolveRefTargetWithRectRefresh({ - session, - refInput: req.positionals[0], - fallbackLabel: labelCandidate, - commandLabel: 'fill', - promoteToHittableAncestor: false, - invalidRefMessage: 'fill requires a ref like @e2', - missingBoundsMessage: `Ref ${req.positionals[0]} not found or has no bounds`, - invalidBoundsMessage: `Ref ${req.positionals[0]} not found or has invalid bounds`, - reqFlags: req.flags, - sessionStore, - contextFromFlags, - captureSnapshotForSession, - resolveRefTarget, - }); - if (!resolvedRefFillTarget.ok) return resolvedRefFillTarget; - - const { ref, node, snapshotNodes, point } = resolvedRefFillTarget.target; - const nodeType = node.type ?? ''; - const fillWarning = - nodeType && !isFillableType(nodeType, session.device.platform) - ? `fill target ${req.positionals[0]} resolved to "${nodeType}", attempting fill anyway.` - : undefined; - const refLabel = resolveRefLabel(node, snapshotNodes); - const selectorChain = buildSelectorChainForNode(node, session.device.platform, { - action: 'fill', - }); - const { x, y } = point; - return dispatchRecordedTouchInteraction({ - session, - sessionStore, - requestCommand: req.command, - requestPositionals: req.positionals ?? 
[], - flags: req.flags, - contextFromFlags, - interactionCommand: 'fill', - interactionPositionals: [String(x), String(y), text], - outPath: req.flags?.out, - buildPayloads: (data) => { - const result = buildTouchVisualizationResult({ - data, - fallbackX: x, - fallbackY: y, - referenceFrame: readSnapshotNodesReferenceFrame(snapshotNodes), - extra: { - ref, - refLabel, - selectorChain, - text, - }, - }); - const responseData: Record = { - ...(data ?? { ref, x, y }), - }; - if (fillWarning) { - result.warning = fillWarning; - responseData.warning = fillWarning; - } - return { result, responseData }; - }, - }); - } - - if (!session) { - return errorResponse('SESSION_NOT_FOUND', 'No active session. Run open first.'); - } - - const selectorArgs = splitSelectorFromArgs(req.positionals ?? [], { - preferTrailingValue: true, - }); - if (selectorArgs) { - if (selectorArgs.rest.length === 0) { - return errorResponse('INVALID_ARGS', 'fill requires text after selector'); - } - const text = selectorArgs.rest.join(' ').trim(); - if (!text) { - return errorResponse('INVALID_ARGS', 'fill requires text after selector'); - } - - const chain = parseSelectorChain(selectorArgs.selectorExpression); - const snapshot = await captureSnapshotForSession( - session, - req.flags, - sessionStore, - contextFromFlags, - { interactiveOnly: true }, - ); - const resolved = await withDiagnosticTimer( - 'selector_resolve', - () => - resolveSelectorChain(snapshot.nodes, chain, { - platform: session.device.platform, - requireRect: true, - requireUnique: true, - disambiguateAmbiguous: true, - }), - { command: req.command }, - ); - if (!resolved || !resolved.node.rect) { - return errorResponse( - 'COMMAND_FAILED', - formatSelectorFailure(chain, resolved?.diagnostics ?? [], { unique: true }), - ); - } - - const node = resolved.node; - const rect = resolved.node.rect; - const nodeType = node.type ?? ''; - const fillWarning = - nodeType && !isFillableType(nodeType, session.device.platform) - ? 
`fill target ${resolved.selector.raw} resolved to "${nodeType}", attempting fill anyway.` - : undefined; - const { x, y } = centerOfRect(rect); - const selectorChain = buildSelectorChainForNode(node, session.device.platform, { - action: 'fill', - }); - return dispatchRecordedTouchInteraction({ - session, - sessionStore, - requestCommand: req.command, - requestPositionals: req.positionals ?? [], - flags: req.flags, - contextFromFlags, - interactionCommand: 'fill', - interactionPositionals: [String(x), String(y), text], - outPath: req.flags?.out, - buildPayloads: (data) => { - const result = buildTouchVisualizationResult({ - data, - fallbackX: x, - fallbackY: y, - referenceFrame: readSnapshotNodesReferenceFrame(snapshot.nodes), - extra: { - text, - selector: resolved.selector.raw, - selectorChain, - refLabel: resolveRefLabel(node, snapshot.nodes), - }, - }); - if (fillWarning) { - result.warning = fillWarning; - } - return { result, responseData: result }; - }, - }); - } - - return errorResponse('INVALID_ARGS', 'fill requires x y text, @ref text, or selector text'); -} diff --git a/src/daemon/handlers/interaction-press.ts b/src/daemon/handlers/interaction-press.ts deleted file mode 100644 index e40c64ce..00000000 --- a/src/daemon/handlers/interaction-press.ts +++ /dev/null @@ -1,308 +0,0 @@ -import { isCommandSupportedOnDevice } from '../../core/capabilities.ts'; -import { - buttonTag, - getClickButtonValidationError, - resolveClickButton, -} from '../../core/click-button.ts'; -import type { DaemonRequest, DaemonResponse } from '../types.ts'; -import { - buildSelectorChainForNode, - formatSelectorFailure, - parseSelectorChain, - resolveSelectorChain, -} from '../selectors.ts'; -import { withDiagnosticTimer } from '../../utils/diagnostics.ts'; -import { - buildTouchVisualizationResult, - dispatchRecordedTouchInteraction, - type ContextFromFlags, -} from './interaction-common.ts'; -import type { SessionStore } from '../session-store.ts'; -import { - 
parseCoordinateTarget, - resolveActionableTouchNode, - resolveRectCenter, - resolveRefTargetWithRectRefresh, - type ResolveRefTarget, -} from './interaction-targeting.ts'; -import { type CaptureSnapshotForSession } from './interaction-snapshot.ts'; -import { - readSnapshotNodesReferenceFrame, - resolveDirectTouchReferenceFrameSafely, -} from './interaction-touch-reference-frame.ts'; -import { unsupportedMacOsDesktopSurfaceInteraction } from './interaction-touch-policy.ts'; -import type { RefSnapshotFlagGuardResponse } from './interaction-flags.ts'; -import { resolveRefLabel } from '../snapshot-processing.ts'; -import { errorResponse } from './response.ts'; -import { AppError } from '../../utils/errors.ts'; -import { getAndroidAppState } from '../../platforms/android/index.ts'; - -export async function handlePressCommand(params: { - req: DaemonRequest; - sessionName: string; - sessionStore: SessionStore; - contextFromFlags: ContextFromFlags; - captureSnapshotForSession: CaptureSnapshotForSession; - resolveRefTarget: ResolveRefTarget; - refSnapshotFlagGuardResponse: RefSnapshotFlagGuardResponse; -}): Promise { - const { - req, - sessionName, - sessionStore, - contextFromFlags, - captureSnapshotForSession, - resolveRefTarget, - refSnapshotFlagGuardResponse, - } = params; - const session = sessionStore.get(sessionName); - const command = req.command; - const commandLabel = command === 'click' ? 'click' : 'press'; - if (!session) { - return errorResponse('SESSION_NOT_FOUND', 'No active session. 
Run open first.'); - } - - const unsupportedSurfaceResponse = unsupportedMacOsDesktopSurfaceInteraction( - session, - commandLabel, - ); - if (unsupportedSurfaceResponse) { - return unsupportedSurfaceResponse; - } - if (!isCommandSupportedOnDevice('press', session.device)) { - return errorResponse('UNSUPPORTED_OPERATION', 'press is not supported on this device'); - } - - const clickButton = resolveClickButton(req.flags); - const resultButtonTag = buttonTag(clickButton); - if (clickButton !== 'primary') { - const validationError = getClickButtonValidationError({ - commandLabel, - platform: session.device.platform, - button: clickButton, - count: req.flags?.count, - intervalMs: req.flags?.intervalMs, - holdMs: req.flags?.holdMs, - jitterPx: req.flags?.jitterPx, - doubleTap: req.flags?.doubleTap, - }); - if (validationError) { - return errorResponse(validationError.code, validationError.message, validationError.details); - } - } - - const directCoordinates = parseCoordinateTarget(req.positionals ?? []); - if (directCoordinates) { - return dispatchRecordedTouchInteraction({ - session, - sessionStore, - requestCommand: command, - requestPositionals: req.positionals ?? 
[ - String(directCoordinates.x), - String(directCoordinates.y), - ], - flags: req.flags, - contextFromFlags, - interactionCommand: 'press', - interactionPositionals: [String(directCoordinates.x), String(directCoordinates.y)], - outPath: req.flags?.out, - afterDispatch: async () => { - await assertAndroidPressStayedInApp(session, 'coordinate tap'); - }, - buildPayloads: async (data) => { - const visualizationFrame = await resolveDirectTouchReferenceFrameSafely({ - session, - flags: req.flags, - sessionStore, - contextFromFlags, - captureSnapshotForSession, - }); - const result = buildTouchVisualizationResult({ - data, - fallbackX: directCoordinates.x, - fallbackY: directCoordinates.y, - referenceFrame: visualizationFrame, - extra: resultButtonTag, - }); - return { result, responseData: result }; - }, - }); - } - - const selectorAction = 'click'; - const refInput = req.positionals?.[0] ?? ''; - if (refInput.startsWith('@')) { - const invalidRefFlagsResponse = refSnapshotFlagGuardResponse('press', req.flags); - if (invalidRefFlagsResponse) return invalidRefFlagsResponse; - const fallbackLabel = - req.positionals.length > 1 ? 
req.positionals.slice(1).join(' ').trim() : ''; - const resolvedRefPressTarget = await resolveRefTargetWithRectRefresh({ - session, - refInput, - fallbackLabel, - commandLabel, - promoteToHittableAncestor: true, - invalidRefMessage: `${commandLabel} requires a ref like @e2`, - missingBoundsMessage: `Ref ${refInput} not found or has no bounds`, - invalidBoundsMessage: `Ref ${refInput} not found or has invalid bounds`, - reqFlags: req.flags, - sessionStore, - contextFromFlags, - captureSnapshotForSession, - resolveRefTarget, - }); - if (!resolvedRefPressTarget.ok) return resolvedRefPressTarget; - - const { ref, node, snapshotNodes, point: pressPoint } = resolvedRefPressTarget.target; - const refLabel = resolveRefLabel(node, snapshotNodes); - const selectorChain = buildSelectorChainForNode(node, session.device.platform, { - action: selectorAction, - }); - const { x, y } = pressPoint; - return dispatchRecordedTouchInteraction({ - session, - sessionStore, - requestCommand: command, - requestPositionals: req.positionals ?? [], - flags: req.flags, - contextFromFlags, - interactionCommand: 'press', - interactionPositionals: [String(x), String(y)], - outPath: req.flags?.out, - afterDispatch: async () => { - await assertAndroidPressStayedInApp(session, `@${ref}`); - }, - buildPayloads: (data) => { - const result = buildTouchVisualizationResult({ - data, - fallbackX: x, - fallbackY: y, - referenceFrame: readSnapshotNodesReferenceFrame(snapshotNodes), - extra: { - ref, - refLabel, - selectorChain, - ...resultButtonTag, - }, - }); - return { result, responseData: result }; - }, - }); - } - - const selectorExpression = (req.positionals ?? 
[]).join(' ').trim(); - if (!selectorExpression) { - return errorResponse( - 'INVALID_ARGS', - `${commandLabel} requires @ref, selector expression, or x y coordinates`, - ); - } - - const chain = parseSelectorChain(selectorExpression); - const snapshot = await captureSnapshotForSession( - session, - req.flags, - sessionStore, - contextFromFlags, - { interactiveOnly: true }, - ); - const resolved = await withDiagnosticTimer( - 'selector_resolve', - () => - resolveSelectorChain(snapshot.nodes, chain, { - platform: session.device.platform, - requireRect: true, - requireUnique: true, - disambiguateAmbiguous: true, - }), - { command }, - ); - if (!resolved || !resolved.node.rect) { - return errorResponse( - 'COMMAND_FAILED', - formatSelectorFailure(chain, resolved?.diagnostics ?? [], { unique: true }), - ); - } - - const actionableNode = resolveActionableTouchNode(snapshot.nodes, resolved.node); - const pressPoint = resolveRectCenter(actionableNode.rect); - if (!pressPoint) { - return errorResponse( - 'COMMAND_FAILED', - `Selector ${resolved.selector.raw} resolved to invalid bounds`, - ); - } - - const { x, y } = pressPoint; - const selectorChain = buildSelectorChainForNode(actionableNode, session.device.platform, { - action: selectorAction, - }); - const refLabel = resolveRefLabel(actionableNode, snapshot.nodes); - return dispatchRecordedTouchInteraction({ - session, - sessionStore, - requestCommand: command, - requestPositionals: req.positionals ?? 
[], - flags: req.flags, - contextFromFlags, - interactionCommand: 'press', - interactionPositionals: [String(x), String(y)], - outPath: req.flags?.out, - afterDispatch: async () => { - await assertAndroidPressStayedInApp(session, resolved.selector.raw); - }, - buildPayloads: (data) => { - const result = buildTouchVisualizationResult({ - data, - fallbackX: x, - fallbackY: y, - referenceFrame: readSnapshotNodesReferenceFrame(snapshot.nodes), - extra: { - selector: resolved.selector.raw, - selectorChain, - refLabel, - ...resultButtonTag, - }, - }); - return { result, responseData: result }; - }, - }); -} - -async function assertAndroidPressStayedInApp( - session: Parameters[0]['session'], - targetLabel: string, -): Promise { - if (session.device.platform !== 'android' || !session.appBundleId) { - return; - } - - const foreground = await getAndroidAppState(session.device); - const foregroundPackage = foreground.package?.trim(); - if (!foregroundPackage || foregroundPackage === session.appBundleId) { - return; - } - if (!looksLikeAndroidEscapeSurface(foregroundPackage)) { - return; - } - - throw new AppError( - 'COMMAND_FAILED', - `press ${targetLabel} left ${session.appBundleId} and foregrounded ${foregroundPackage}. 
The tap likely escaped the app.`, - { - expectedPackage: session.appBundleId, - foregroundPackage, - activity: foreground.activity, - hint: 'Use screenshot as visual truth, then take a fresh snapshot -i before retrying.', - }, - ); -} - -function looksLikeAndroidEscapeSurface(packageName: string): boolean { - return ( - packageName === 'com.android.settings' || - packageName === 'com.android.systemui' || - packageName === 'com.google.android.permissioncontroller' || - packageName.includes('launcher') - ); -} diff --git a/src/daemon/handlers/interaction-runtime.ts b/src/daemon/handlers/interaction-runtime.ts new file mode 100644 index 00000000..40c8eaac --- /dev/null +++ b/src/daemon/handlers/interaction-runtime.ts @@ -0,0 +1,115 @@ +import { dispatchCommand } from '../../core/dispatch.ts'; +import type { + AgentDeviceBackend, + BackendActionResult, + BackendSnapshotResult, +} from '../../backend.ts'; +import { createAgentDevice, localCommandPolicy } from '../../runtime.ts'; +import { AppError } from '../../utils/errors.ts'; +import type { SessionState } from '../types.ts'; +import type { InteractionHandlerParams } from './interaction-common.ts'; +import type { CaptureSnapshotForSession } from './interaction-snapshot.ts'; + +export function createInteractionRuntime( + params: InteractionHandlerParams & { + captureSnapshotForSession: CaptureSnapshotForSession; + }, +) { + const session = params.sessionStore.get(params.sessionName); + if (!session) throw new AppError('SESSION_NOT_FOUND', 'No active session. 
Run open first.'); + return createAgentDevice({ + backend: createInteractionBackend({ ...params, session }), + artifacts: { + resolveInput: async () => { + throw new AppError( + 'UNSUPPORTED_OPERATION', + 'interaction commands do not resolve input artifacts', + ); + }, + reserveOutput: async () => { + throw new AppError( + 'UNSUPPORTED_OPERATION', + 'interaction commands do not reserve output artifacts', + ); + }, + createTempFile: async () => { + throw new AppError( + 'UNSUPPORTED_OPERATION', + 'interaction commands do not create temporary files', + ); + }, + }, + sessions: { + get: (name) => + name === params.sessionName + ? { + name: params.sessionName, + appBundleId: session.appBundleId, + appName: session.appName, + snapshot: session.snapshot, + metadata: { surface: session.surface }, + } + : undefined, + set: (record) => { + if (!record.snapshot) return; + session.snapshot = record.snapshot; + params.sessionStore.set(params.sessionName, session); + }, + }, + policy: localCommandPolicy(), + }); +} + +function createInteractionBackend( + params: InteractionHandlerParams & { session: SessionState } & { + captureSnapshotForSession: CaptureSnapshotForSession; + }, +): AgentDeviceBackend { + const { req, session } = params; + return { + platform: session.device.platform, + captureSnapshot: async (_context, options): Promise<BackendSnapshotResult> => ({ + snapshot: await params.captureSnapshotForSession( + session, + req.flags, + params.sessionStore, + params.contextFromFlags, + { interactiveOnly: options?.interactiveOnly === true }, + ), + }), + tap: async (_context, point): Promise<BackendActionResult> => + toBackendActionResult( + await dispatchCommand( + session.device, + 'press', + [String(point.x), String(point.y)], + req.flags?.out, + params.contextFromFlags(req.flags, session.appBundleId, session.trace?.outPath), + ), + ), + fill: async (_context, point, text): Promise<BackendActionResult> => + toBackendActionResult( + await dispatchCommand( + session.device, + 'fill', + [String(point.x), String(point.y), text], + 
req.flags?.out, + params.contextFromFlags(req.flags, session.appBundleId, session.trace?.outPath), + ), + ), + typeText: async (_context, text): Promise<BackendActionResult> => + toBackendActionResult( + await dispatchCommand( + session.device, + 'type', + [text], + req.flags?.out, + params.contextFromFlags(req.flags, session.appBundleId, session.trace?.outPath), + ), + ), + }; +} + +function toBackendActionResult(data: unknown): BackendActionResult { + return data && typeof data === 'object' ? (data as Record) : undefined; +} diff --git a/src/daemon/handlers/interaction-targeting.ts b/src/daemon/handlers/interaction-targeting.ts index 28a2de73..04cd9f13 100644 --- a/src/daemon/handlers/interaction-targeting.ts +++ b/src/daemon/handlers/interaction-targeting.ts @@ -1,4 +1,3 @@ -import type { CommandFlags } from '../../core/dispatch.ts'; import { centerOfRect, findNodeByRef, @@ -6,19 +5,9 @@ import { type Rect, type SnapshotNode, } from '../../utils/snapshot.ts'; -import { findNearestHittableAncestor, findNodeByLabel } from '../snapshot-processing.ts'; -import type { SessionStore } from '../session-store.ts'; +import { findNodeByLabel } from '../snapshot-processing.ts'; import type { SessionState } from '../types.ts'; import { errorResponse, type DaemonFailureResponse } from './response.ts'; -import type { CaptureSnapshotForSession } from './interaction-snapshot.ts'; -import type { ContextFromFlags } from './interaction-common.ts'; -import { - isNodeVisibleInEffectiveViewport, - resolveEffectiveViewportRect, -} from '../../utils/mobile-snapshot-semantics.ts'; -import { containsPoint, pickLargestRect } from '../../utils/rect-visibility.ts'; - -export type ResolveRefTarget = typeof resolveRefTarget; export function parseCoordinateTarget(positionals: string[]): { x: number; y: number } | null { if (positionals.length < 2) return null; @@ -82,238 +71,3 @@ function normalizeRect(rect: Rect | undefined): Rect | null { if (width < 0 || height < 0) return null; return { x, y, width, height }; } 
- -export async function resolveRefTargetWithRectRefresh(params: { - session: SessionState; - refInput: string; - fallbackLabel: string; - commandLabel: string; - promoteToHittableAncestor: boolean; - invalidRefMessage: string; - missingBoundsMessage: string; - invalidBoundsMessage: string; - reqFlags: CommandFlags | undefined; - sessionStore: SessionStore; - contextFromFlags: ContextFromFlags; - captureSnapshotForSession: CaptureSnapshotForSession; - resolveRefTarget: ResolveRefTarget; -}): Promise< - | { - ok: true; - target: { - ref: string; - node: SnapshotNode; - snapshotNodes: SnapshotNode[]; - point: { x: number; y: number }; - }; - } - | DaemonFailureResponse -> { - const { - session, - refInput, - fallbackLabel, - commandLabel, - promoteToHittableAncestor, - invalidRefMessage, - missingBoundsMessage, - invalidBoundsMessage, - reqFlags, - sessionStore, - contextFromFlags, - captureSnapshotForSession, - resolveRefTarget, - } = params; - const resolvedRefTarget = resolveRefTarget({ - session, - refInput, - fallbackLabel, - requireRect: true, - invalidRefMessage, - notFoundMessage: missingBoundsMessage, - }); - if (!resolvedRefTarget.ok) return resolvedRefTarget; - - const { ref } = resolvedRefTarget.target; - let node = promoteToHittableAncestor - ? resolveActionableTouchNode( - resolvedRefTarget.target.snapshotNodes, - resolvedRefTarget.target.node, - ) - : resolvedRefTarget.target.node; - let snapshotNodes = resolvedRefTarget.target.snapshotNodes; - let point = resolveRectCenter(node.rect); - - if (!point) { - const refreshed = await captureSnapshotForSession( - session, - reqFlags, - sessionStore, - contextFromFlags, - { interactiveOnly: true }, - ); - const refNode = findNodeByRef(refreshed.nodes, ref); - const fallbackNode = - fallbackLabel.length > 0 ? findNodeByLabel(refreshed.nodes, fallbackLabel) : null; - const resolvedRefNode = - refNode && promoteToHittableAncestor - ? 
resolveActionableTouchNode(refreshed.nodes, refNode) - : refNode; - const resolvedFallbackNode = - fallbackNode && promoteToHittableAncestor - ? resolveActionableTouchNode(refreshed.nodes, fallbackNode) - : fallbackNode; - const fallbackNodePoint = resolveRectCenter(resolvedFallbackNode?.rect); - const refNodePoint = resolveRectCenter(resolvedRefNode?.rect); - const refreshedNode = refNodePoint - ? resolvedRefNode - : fallbackNodePoint - ? resolvedFallbackNode - : (resolvedRefNode ?? resolvedFallbackNode); - const refreshedPoint = resolveRectCenter(refreshedNode?.rect); - if (refreshedNode && refreshedPoint) { - node = refreshedNode; - snapshotNodes = refreshed.nodes; - point = refreshedPoint; - } - } - - if (!point) { - return errorResponse('COMMAND_FAILED', invalidBoundsMessage); - } - - const viewport = node.rect ? resolveEffectiveViewportRect(node, snapshotNodes) : null; - if (node.rect && viewport && !isNodeVisibleInEffectiveViewport(node, snapshotNodes)) { - return { - ok: false, - error: { - code: 'COMMAND_FAILED', - message: `Ref ${refInput} is off-screen and not safe to ${commandLabel}`, - hint: `Use scroll with the direction from the off-screen summary, take a fresh snapshot, then retry ${commandLabel} with the new ref or a selector.`, - details: { - reason: 'offscreen_ref', - ref, - rect: node.rect, - viewport, - }, - }, - }; - } - - return { ok: true, target: { ref, node, snapshotNodes, point } }; -} - -export function resolveActionableTouchNode( - nodes: SnapshotNode[], - node: SnapshotNode, -): SnapshotNode { - const descendant = findPreferredActionableDescendant(nodes, node); - if (descendant?.rect && resolveRectCenter(descendant.rect)) { - return descendant; - } - const ancestor = findNearestHittableAncestor(nodes, node); - if (ancestor?.rect && resolveRectCenter(ancestor.rect)) { - if (isOverlyBroadAncestor(node, ancestor, nodes)) { - return node; - } - return ancestor; - } - return node; -} - -function findPreferredActionableDescendant( - nodes: 
SnapshotNode[], - node: SnapshotNode, -): SnapshotNode | null { - const targetRect = normalizeRect(node.rect); - if (!targetRect) return null; - - let current = node; - const visited = new Set(); - while (!visited.has(current.ref)) { - visited.add(current.ref); - const sameRectChildren = nodes.filter((candidate) => { - if (candidate.parentIndex !== current.index || !candidate.hittable) { - return false; - } - const candidateRect = normalizeRect(candidate.rect); - return candidateRect ? areRectsApproximatelyEqual(candidateRect, targetRect) : false; - }); - if (sameRectChildren.length !== 1) { - break; - } - current = sameRectChildren[0]; - } - - return current === node ? null : current; -} - -function areRectsApproximatelyEqual(left: Rect, right: Rect): boolean { - // 0.5 px tolerance absorbs sub-pixel rounding differences that are common in - // accessibility tree coordinates across iOS and Android DPI scales. - const tolerance = 0.5; - return ( - Math.abs(left.x - right.x) <= tolerance && - Math.abs(left.y - right.y) <= tolerance && - Math.abs(left.width - right.width) <= tolerance && - Math.abs(left.height - right.height) <= tolerance - ); -} - -function isOverlyBroadAncestor( - node: SnapshotNode, - ancestor: SnapshotNode, - nodes: SnapshotNode[], -): boolean { - const nodeRect = normalizeRect(node.rect); - const ancestorRect = normalizeRect(ancestor.rect); - if (!nodeRect || !ancestorRect) return false; - const rootViewportRect = resolveRootViewportRect(nodes, nodeRect); - if (!rootViewportRect) return false; - if (!isRectViewportSized(ancestorRect, rootViewportRect)) return false; - return !areRectsApproximatelyEqual(nodeRect, ancestorRect); -} - -function resolveRootViewportRect(nodes: SnapshotNode[], targetRect: Rect): Rect | null { - const targetCenter = centerOfRect(targetRect); - const viewportRects = nodes - .filter((node) => { - const type = (node.type ?? 
'').toLowerCase(); - return type.includes('application') || type.includes('window'); - }) - .map((node) => normalizeRect(node.rect)) - .filter((rect): rect is Rect => rect !== null); - if (viewportRects.length === 0) return null; - - const containingRects = viewportRects.filter((rect) => - containsPoint(rect, targetCenter.x, targetCenter.y), - ); - return pickLargestRect(containingRects.length > 0 ? containingRects : viewportRects); -} - -// An ancestor is "viewport-sized" when it covers ≥90% of the viewport area and -// at least 80% of its own area overlaps. This catches full-screen containers -// (navigation bars, root views) that are technically hittable but would produce -// imprecise taps if used as the touch target. -function isRectViewportSized(rect: Rect, viewportRect: Rect): boolean { - const overlapArea = intersectionArea(rect, viewportRect); - const rectArea = rect.width * rect.height; - const viewportArea = viewportRect.width * viewportRect.height; - if (overlapArea <= 0 || rectArea <= 0 || viewportArea <= 0) return false; - - const viewportCoverage = overlapArea / viewportArea; - const rectCoverage = overlapArea / rectArea; - return viewportCoverage >= 0.9 && rectCoverage >= 0.8; -} - -function intersectionArea(left: Rect, right: Rect): number { - const xOverlap = Math.max( - 0, - Math.min(left.x + left.width, right.x + right.width) - Math.max(left.x, right.x), - ); - const yOverlap = Math.max( - 0, - Math.min(left.y + left.height, right.y + right.height) - Math.max(left.y, right.y), - ); - return xOverlap * yOverlap; -} diff --git a/src/daemon/handlers/interaction-touch-targets.ts b/src/daemon/handlers/interaction-touch-targets.ts new file mode 100644 index 00000000..3e6b3e99 --- /dev/null +++ b/src/daemon/handlers/interaction-touch-targets.ts @@ -0,0 +1,151 @@ +import type { + FillCommandResult, + InteractionTarget, + PressCommandResult, +} from '../../commands/index.ts'; +import type { DaemonResponse } from '../types.ts'; +import { 
splitSelectorFromArgs } from '../selectors.ts'; +import { parseCoordinateTarget } from './interaction-targeting.ts'; +import { errorResponse } from './response.ts'; + +export type ParsedPressTarget = + | { ok: true; target: InteractionTarget } + | { ok: false; response: DaemonResponse }; + +export function parsePressTarget(positionals: string[], commandLabel: string): ParsedPressTarget { + const coordinates = parseCoordinateTarget(positionals); + if (coordinates) { + return { ok: true, target: { kind: 'point', x: coordinates.x, y: coordinates.y } }; + } + const first = positionals[0] ?? ''; + if (first.startsWith('@')) { + return { + ok: true, + target: { + kind: 'ref', + ref: first, + fallbackLabel: positionals.length > 1 ? positionals.slice(1).join(' ').trim() : '', + }, + }; + } + const selector = positionals.join(' ').trim(); + if (!selector) { + return { + ok: false, + response: errorResponse( + 'INVALID_ARGS', + `${commandLabel} requires @ref, selector expression, or x y coordinates`, + ), + }; + } + return { ok: true, target: { kind: 'selector', selector } }; +} + +export type ParsedFillTarget = + | { ok: true; target: InteractionTarget; text: string } + | { ok: false; response: DaemonResponse }; + +export function parseFillTarget(positionals: string[]): ParsedFillTarget { + const first = positionals[0] ?? ''; + if (first.startsWith('@')) { + const labelCandidate = positionals.length >= 3 ? positionals[1] : ''; + const text = + positionals.length >= 3 ? 
positionals.slice(2).join(' ') : positionals.slice(1).join(' '); + if (!text) + return { ok: false, response: errorResponse('INVALID_ARGS', 'fill requires text after ref') }; + return { + ok: true, + target: { + kind: 'ref', + ref: first, + fallbackLabel: labelCandidate, + }, + text, + }; + } + + const coordinates = parseCoordinateTarget(positionals); + if (coordinates) { + const text = positionals.slice(2).join(' '); + if (!text) + return { + ok: false, + response: errorResponse('INVALID_ARGS', 'fill requires text after coordinates'), + }; + return { ok: true, target: { kind: 'point', x: coordinates.x, y: coordinates.y }, text }; + } + + const selectorArgs = splitSelectorFromArgs(positionals, { preferTrailingValue: true }); + if (!selectorArgs) { + return { + ok: false, + response: errorResponse( + 'INVALID_ARGS', + 'fill requires x y text, @ref text, or selector text', + ), + }; + } + const text = selectorArgs.rest.join(' ').trim(); + if (!text) { + return { + ok: false, + response: errorResponse('INVALID_ARGS', 'fill requires text after selector'), + }; + } + return { + ok: true, + target: { kind: 'selector', selector: selectorArgs.selectorExpression }, + text, + }; +} + +export function pressResultExtra(result: PressCommandResult): Record { + if (result.kind === 'ref') { + return { + ref: stripAtPrefix(result.target?.kind === 'ref' ? result.target.ref : undefined), + refLabel: result.refLabel, + selectorChain: result.selectorChain, + }; + } + if (result.kind === 'selector') { + return { + selector: result.target?.kind === 'selector' ? result.target.selector : undefined, + selectorChain: result.selectorChain, + refLabel: result.refLabel, + }; + } + return {}; +} + +export function fillResultExtra(result: FillCommandResult): Record { + if (result.kind === 'ref') { + return { + ref: stripAtPrefix(result.target?.kind === 'ref' ? 
result.target.ref : undefined), + refLabel: result.refLabel, + selectorChain: result.selectorChain, + }; + } + if (result.kind === 'selector') { + return { + selector: result.target?.kind === 'selector' ? result.target.selector : undefined, + selectorChain: result.selectorChain, + refLabel: result.refLabel, + }; + } + return {}; +} + +export function formatPressTargetLabel( + target: InteractionTarget, + result: PressCommandResult, +): string { + if (target.kind === 'point') return 'coordinate tap'; + if (result.kind === 'ref' && result.target?.kind === 'ref') return result.target.ref; + if (result.kind === 'selector' && result.target?.kind === 'selector') + return result.target.selector; + return 'target'; +} + +export function stripAtPrefix(ref: string | undefined): string | undefined { + return ref?.startsWith('@') ? ref.slice(1) : ref; +} diff --git a/src/daemon/handlers/interaction-touch.ts b/src/daemon/handlers/interaction-touch.ts index d2cf9447..40e30f08 100644 --- a/src/daemon/handlers/interaction-touch.ts +++ b/src/daemon/handlers/interaction-touch.ts @@ -1,25 +1,254 @@ +import { isCommandSupportedOnDevice } from '../../core/capabilities.ts'; +import { + buttonTag, + getClickButtonValidationError, + resolveClickButton, +} from '../../core/click-button.ts'; +import type { FillCommandResult, PressCommandResult } from '../../commands/index.ts'; +import { asAppError, normalizeError } from '../../utils/errors.ts'; import type { DaemonResponse } from '../types.ts'; -import type { InteractionHandlerParams } from './interaction-common.ts'; +import { + buildTouchVisualizationResult, + finalizeTouchInteraction, + type InteractionHandlerParams, +} from './interaction-common.ts'; import type { CaptureSnapshotForSession } from './interaction-snapshot.ts'; -import type { ResolveRefTarget } from './interaction-targeting.ts'; import type { RefSnapshotFlagGuardResponse } from './interaction-flags.ts'; -import { handlePressCommand } from './interaction-press.ts'; -import { 
handleFillCommand } from './interaction-fill.ts'; +import { + readSnapshotNodesReferenceFrame, + resolveDirectTouchReferenceFrameSafely, +} from './interaction-touch-reference-frame.ts'; +import { unsupportedMacOsDesktopSurfaceInteraction } from './interaction-touch-policy.ts'; +import { errorResponse } from './response.ts'; +import { + assertAndroidPressStayedInApp, + isAndroidEscapeError, +} from './interaction-android-escape.ts'; +import { createInteractionRuntime } from './interaction-runtime.ts'; +import { + fillResultExtra, + formatPressTargetLabel, + parseFillTarget, + parsePressTarget, + pressResultExtra, + stripAtPrefix, +} from './interaction-touch-targets.ts'; export async function handleTouchInteractionCommands( params: InteractionHandlerParams & { captureSnapshotForSession: CaptureSnapshotForSession; - resolveRefTarget: ResolveRefTarget; refSnapshotFlagGuardResponse: RefSnapshotFlagGuardResponse; }, ): Promise { switch (params.req.command) { case 'press': case 'click': - return await handlePressCommand(params); + return await dispatchPressViaRuntime(params); case 'fill': - return await handleFillCommand(params); + return await dispatchFillViaRuntime(params); default: return null; } } + +async function dispatchPressViaRuntime( + params: InteractionHandlerParams & { + captureSnapshotForSession: CaptureSnapshotForSession; + refSnapshotFlagGuardResponse: RefSnapshotFlagGuardResponse; + }, +): Promise { + const { req, sessionName, sessionStore } = params; + const session = sessionStore.get(sessionName); + const commandLabel = req.command === 'click' ? 'click' : 'press'; + if (!session) return errorResponse('SESSION_NOT_FOUND', 'No active session. 
Run open first.'); + + const unsupportedSurfaceResponse = unsupportedMacOsDesktopSurfaceInteraction( + session, + commandLabel, + ); + if (unsupportedSurfaceResponse) return unsupportedSurfaceResponse; + if (!isCommandSupportedOnDevice('press', session.device)) { + return errorResponse('UNSUPPORTED_OPERATION', 'press is not supported on this device'); + } + + const clickButton = resolveClickButton(req.flags); + const resultButtonTag = buttonTag(clickButton); + if (clickButton !== 'primary') { + const validationError = getClickButtonValidationError({ + commandLabel, + platform: session.device.platform, + button: clickButton, + count: req.flags?.count, + intervalMs: req.flags?.intervalMs, + holdMs: req.flags?.holdMs, + jitterPx: req.flags?.jitterPx, + doubleTap: req.flags?.doubleTap, + }); + if (validationError) { + return errorResponse(validationError.code, validationError.message, validationError.details); + } + } + + const parsedTarget = parsePressTarget(req.positionals ?? [], commandLabel); + if (!parsedTarget.ok) return parsedTarget.response; + if (parsedTarget.target.kind === 'ref') { + const invalidRefFlagsResponse = params.refSnapshotFlagGuardResponse('press', req.flags); + if (invalidRefFlagsResponse) return invalidRefFlagsResponse; + } + + return await dispatchRuntimeInteraction(params, { + run: async (runtime) => { + const options = { + session: sessionName, + requestId: req.meta?.requestId, + button: clickButton, + count: req.flags?.count, + intervalMs: req.flags?.intervalMs, + holdMs: req.flags?.holdMs, + jitterPx: req.flags?.jitterPx, + doubleTap: req.flags?.doubleTap, + }; + return commandLabel === 'click' + ? 
await runtime.interactions.click(parsedTarget.target, options) + : await runtime.interactions.press(parsedTarget.target, options); + }, + afterRun: async (result) => { + await assertAndroidPressStayedInApp( + session, + formatPressTargetLabel(parsedTarget.target, result), + ); + }, + buildPayloads: async (result) => { + const referenceFrame = + result.kind === 'point' + ? await resolveDirectTouchReferenceFrameSafely({ + session, + flags: req.flags, + sessionStore, + contextFromFlags: params.contextFromFlags, + captureSnapshotForSession: params.captureSnapshotForSession, + }) + : readSnapshotNodesReferenceFrame(session.snapshot?.nodes ?? []); + const responseData = buildTouchVisualizationResult({ + data: result.backendResult, + fallbackX: result.point.x, + fallbackY: result.point.y, + referenceFrame, + extra: { + ...pressResultExtra(result), + ...resultButtonTag, + }, + }); + return { result: responseData, responseData }; + }, + }); +} + +async function dispatchFillViaRuntime( + params: InteractionHandlerParams & { + captureSnapshotForSession: CaptureSnapshotForSession; + refSnapshotFlagGuardResponse: RefSnapshotFlagGuardResponse; + }, +): Promise { + const { req, sessionName, sessionStore } = params; + const session = sessionStore.get(sessionName); + if (session) { + const unsupportedSurfaceResponse = unsupportedMacOsDesktopSurfaceInteraction(session, 'fill'); + if (unsupportedSurfaceResponse) return unsupportedSurfaceResponse; + } + if (session && !isCommandSupportedOnDevice('fill', session.device)) { + return errorResponse('UNSUPPORTED_OPERATION', 'fill is not supported on this device'); + } + if (!session) return errorResponse('SESSION_NOT_FOUND', 'No active session. Run open first.'); + + const parsedTarget = parseFillTarget(req.positionals ?? 
[]); + if (!parsedTarget.ok) return parsedTarget.response; + if (parsedTarget.target.kind === 'ref') { + const invalidRefFlagsResponse = params.refSnapshotFlagGuardResponse('fill', req.flags); + if (invalidRefFlagsResponse) return invalidRefFlagsResponse; + } + + return await dispatchRuntimeInteraction(params, { + run: async (runtime) => + await runtime.interactions.fill(parsedTarget.target, parsedTarget.text, { + session: sessionName, + requestId: req.meta?.requestId, + delayMs: req.flags?.delayMs, + }), + buildPayloads: (result) => { + const referenceFrame = + result.kind === 'point' + ? undefined + : readSnapshotNodesReferenceFrame(session.snapshot?.nodes ?? []); + const recordedResult = buildTouchVisualizationResult({ + data: result.backendResult, + fallbackX: result.point.x, + fallbackY: result.point.y, + referenceFrame, + extra: { + ...fillResultExtra(result), + text: parsedTarget.text, + }, + }); + if (result.warning) recordedResult.warning = result.warning; + + const responseData = + result.kind === 'ref' + ? { + ...(result.backendResult ?? { + ref: stripAtPrefix(result.target?.kind === 'ref' ? result.target.ref : undefined), + x: result.point.x, + y: result.point.y, + }), + } + : recordedResult; + if (result.warning) responseData.warning = result.warning; + return { result: recordedResult, responseData }; + }, + }); +} + +async function dispatchRuntimeInteraction( + params: InteractionHandlerParams & { + captureSnapshotForSession: CaptureSnapshotForSession; + }, + options: { + run(runtime: ReturnType): Promise; + afterRun?(result: TResult): Promise; + buildPayloads( + result: TResult, + ): + | { result: Record; responseData: Record } + | Promise<{ result: Record; responseData: Record }>; + }, +): Promise { + const session = params.sessionStore.get(params.sessionName); + if (!session) return errorResponse('SESSION_NOT_FOUND', 'No active session. 
Run open first.'); + const runtime = createInteractionRuntime(params); + const actionStartedAt = Date.now(); + try { + const runtimeResult = await options.run(runtime); + await options.afterRun?.(runtimeResult); + const actionFinishedAt = Date.now(); + const { result, responseData } = await options.buildPayloads(runtimeResult); + return finalizeTouchInteraction({ + session, + sessionStore: params.sessionStore, + command: params.req.command, + positionals: params.req.positionals ?? [], + flags: params.req.flags, + result, + responseData, + actionStartedAt, + actionFinishedAt, + }); + } catch (error) { + const appError = asAppError(error); + if (isAndroidEscapeError(appError)) throw appError; + return appErrorResponse(error); + } +} + +function appErrorResponse(error: unknown): DaemonResponse { + return { ok: false, error: normalizeError(error) }; +} diff --git a/src/daemon/handlers/interaction.ts b/src/daemon/handlers/interaction.ts index 972cf576..3b443f2b 100644 --- a/src/daemon/handlers/interaction.ts +++ b/src/daemon/handlers/interaction.ts @@ -1,11 +1,16 @@ import type { DaemonResponse } from '../types.ts'; import type { InteractionHandlerParams } from './interaction-common.ts'; import { handleTouchInteractionCommands } from './interaction-touch.ts'; -import { handleGetCommand } from './interaction-get.ts'; -import { handleIsCommand } from './interaction-is.ts'; import { captureSnapshotForSession } from './interaction-snapshot.ts'; -import { resolveRefTarget } from './interaction-targeting.ts'; import { refSnapshotFlagGuardResponse } from './interaction-flags.ts'; +import { dispatchGetViaRuntime, dispatchIsViaRuntime } from '../selector-runtime.ts'; +import { createInteractionRuntime } from './interaction-runtime.ts'; +import { finalizeTouchInteraction } from './interaction-common.ts'; +import { errorResponse } from './response.ts'; +import { isCommandSupportedOnDevice } from '../../core/capabilities.ts'; +import { normalizeError } from '../../utils/errors.ts'; 
+import { successText } from '../../utils/success-text.ts'; +import { recoverAndroidBlockingSystemDialog } from '../android-system-dialog.ts'; export { unsupportedRefSnapshotFlags } from './interaction-flags.ts'; @@ -15,7 +20,6 @@ export async function handleInteractionCommands( const touchResponse = await handleTouchInteractionCommands({ ...params, captureSnapshotForSession, - resolveRefTarget, refSnapshotFlagGuardResponse, }); if (touchResponse) { @@ -23,11 +27,66 @@ export async function handleInteractionCommands( } switch (params.req.command) { + case 'type': + return await dispatchTypeViaRuntime({ + ...params, + captureSnapshotForSession, + }); case 'get': - return await handleGetCommand(params); + return await dispatchGetViaRuntime(params); case 'is': - return await handleIsCommand(params); + return await dispatchIsViaRuntime(params); default: return null; } } + +async function dispatchTypeViaRuntime( + params: InteractionHandlerParams & { + captureSnapshotForSession: typeof captureSnapshotForSession; + }, +): Promise { + const { req, sessionName, sessionStore } = params; + const session = sessionStore.get(sessionName); + if (!session) return errorResponse('SESSION_NOT_FOUND', 'No active session. Run open first.'); + if (!isCommandSupportedOnDevice('type', session.device)) { + return errorResponse('UNSUPPORTED_OPERATION', 'type is not supported on this device'); + } + if (session.device.platform === 'android' && session.recording) { + const androidRecoveryResult = await recoverAndroidBlockingSystemDialog({ session }); + if (androidRecoveryResult === 'failed') { + return errorResponse('COMMAND_FAILED', 'Android system dialog blocked the recording session'); + } + } + + const text = (req.positionals ?? 
[]).join(' '); + const runtime = createInteractionRuntime(params); + const actionStartedAt = Date.now(); + try { + const result = await runtime.interactions.typeText(text, { + session: sessionName, + requestId: req.meta?.requestId, + delayMs: req.flags?.delayMs, + }); + const actionFinishedAt = Date.now(); + const responseData = { + ...(result.backendResult ?? {}), + text: result.text, + delayMs: result.delayMs, + ...successText(result.message ?? `Typed ${Array.from(result.text).length} chars`), + }; + return finalizeTouchInteraction({ + session, + sessionStore, + command: req.command, + positionals: req.positionals ?? [], + flags: req.flags, + result: responseData, + responseData, + actionStartedAt, + actionFinishedAt, + }); + } catch (error) { + return { ok: false, error: normalizeError(error) }; + } +} diff --git a/src/daemon/handlers/snapshot-capture.ts b/src/daemon/handlers/snapshot-capture.ts index f115e535..0611348d 100644 --- a/src/daemon/handlers/snapshot-capture.ts +++ b/src/daemon/handlers/snapshot-capture.ts @@ -9,10 +9,9 @@ import { type RawSnapshotNode, type SnapshotBackend, type SnapshotState, - type SnapshotVisibility, } from '../../utils/snapshot.ts'; import { normalizeSnapshotTree } from '../../utils/snapshot-tree.ts'; -import { buildMobileSnapshotPresentation } from '../../utils/mobile-snapshot-semantics.ts'; +export { buildSnapshotVisibility } from '../../utils/snapshot-visibility.ts'; import type { SessionState } from '../types.ts'; import { ANDROID_FRESHNESS_RETRY_DELAYS_MS, @@ -27,10 +26,6 @@ import { contextFromFlags } from '../context.ts'; import { findNodeByLabel, pruneGroupNodes, resolveRefLabel } from '../snapshot-processing.ts'; import { errorResponse, type DaemonFailureResponse } from './response.ts'; -function isDesktopBackend(backend: SnapshotBackend | undefined): boolean { - return backend === 'macos-helper' || backend === 'linux-atspi'; -} - type CaptureSnapshotParams = { device: SessionState['device']; session: SessionState | 
undefined; @@ -228,41 +223,6 @@ export function buildSnapshotState( }; } -export function buildSnapshotVisibility(params: { - nodes: SnapshotState['nodes']; - backend?: SnapshotState['backend']; - snapshotRaw?: boolean; -}): SnapshotVisibility { - const { nodes, backend, snapshotRaw } = params; - if (snapshotRaw || isDesktopBackend(backend)) { - return { - partial: false, - visibleNodeCount: nodes.length, - totalNodeCount: nodes.length, - reasons: [], - }; - } - - const presentation = buildMobileSnapshotPresentation(nodes); - const reasons = new Set(); - if (presentation.hiddenCount > 0) { - reasons.add('offscreen-nodes'); - } - if (presentation.nodes.some((node) => node.hiddenContentAbove)) { - reasons.add('scroll-hidden-above'); - } - if (presentation.nodes.some((node) => node.hiddenContentBelow)) { - reasons.add('scroll-hidden-below'); - } - - return { - partial: reasons.size > 0, - visibleNodeCount: presentation.nodes.length, - totalNodeCount: nodes.length, - reasons: [...reasons], - }; -} - function shapeDesktopSurfaceSnapshot( data: SnapshotData, options: { diff --git a/src/daemon/handlers/snapshot.ts b/src/daemon/handlers/snapshot.ts index 95d3769d..edfd6d8e 100644 --- a/src/daemon/handlers/snapshot.ts +++ b/src/daemon/handlers/snapshot.ts @@ -1,24 +1,12 @@ -import { isCommandSupportedOnDevice } from '../../core/capabilities.ts'; -import type { DaemonRequest, DaemonResponse, SessionState } from '../types.ts'; +import type { DaemonRequest, DaemonResponse } from '../types.ts'; import { SessionStore } from '../session-store.ts'; -import { buildSnapshotDiff, countSnapshotComparableLines } from '../snapshot-diff.ts'; import { errorResponse } from './response.ts'; -import { - buildSnapshotVisibility, - captureSnapshot, - resolveSnapshotScope, -} from './snapshot-capture.ts'; -import { - buildSnapshotSession, - recordIfSession, - resolveSessionDevice, - withSessionlessRunnerCleanup, -} from './snapshot-session.ts'; -import { handleWaitCommand, parseWaitArgs, 
waitNeedsRunnerCleanup } from './snapshot-wait.ts'; +import { parseWaitArgs } from './snapshot-wait.ts'; import { handleAlertCommand } from './snapshot-alert.ts'; import { handleSettingsCommand, parseSettingsArgs } from './snapshot-settings.ts'; -import { uniqueStrings } from '../action-utils.ts'; -import { isLikelyStaleSnapshotDrop } from '../android-snapshot-freshness.ts'; +import { dispatchSnapshotDiffViaRuntime, dispatchSnapshotViaRuntime } from '../snapshot-runtime.ts'; +import { dispatchWaitViaRuntime } from '../selector-runtime.ts'; +import { resolveSessionDevice, withSessionlessRunnerCleanup } from './snapshot-session.ts'; const SNAPSHOT_COMMANDS = new Set(['snapshot', 'diff', 'wait', 'alert', 'settings']); @@ -38,57 +26,11 @@ export async function handleSnapshotCommands(params: { } if (command === 'snapshot') { - const { session, device } = await resolveSessionDevice(sessionStore, sessionName, req.flags); - if (!isCommandSupportedOnDevice('snapshot', device)) { - return errorResponse('UNSUPPORTED_OPERATION', 'snapshot is not supported on this device'); - } - const resolvedScope = resolveSnapshotScope(req.flags?.snapshotScope, session); - if (!resolvedScope.ok) return resolvedScope; - - return await withSessionlessRunnerCleanup(session, device, async () => { - const capture = await captureSnapshot({ - device, - session, - flags: req.flags, - outPath: req.flags?.out, - logPath, - snapshotScope: resolvedScope.scope, - }); - const warnings = buildSnapshotWarnings({ - capture, - flags: req.flags, - session, - }); - const visibility = buildSnapshotVisibility({ - nodes: capture.snapshot.nodes, - backend: capture.snapshot.backend, - snapshotRaw: req.flags?.snapshotRaw, - }); - const nextSession = buildSnapshotSession({ - session, - sessionName, - device, - snapshot: capture.snapshot, - appBundleId: session?.appBundleId, - }); - recordIfSession(sessionStore, nextSession, req, { - nodes: capture.snapshot.nodes.length, - truncated: capture.snapshot.truncated ?? 
false, - }); - sessionStore.set(sessionName, nextSession); - return { - ok: true, - data: { - nodes: capture.snapshot.nodes, - truncated: capture.snapshot.truncated ?? false, - visibility, - ...(warnings.length > 0 ? { warnings } : {}), - appName: nextSession.appBundleId - ? (nextSession.appName ?? nextSession.appBundleId) - : undefined, - appBundleId: nextSession.appBundleId, - }, - }; + return await dispatchSnapshotViaRuntime({ + req, + sessionName, + logPath, + sessionStore, }); } @@ -96,29 +38,11 @@ export async function handleSnapshotCommands(params: { if (req.positionals?.[0] !== 'snapshot') { return errorResponse('INVALID_ARGS', 'diff currently supports only: diff snapshot'); } - return await handleSnapshotDiffRequest({ req, sessionName, logPath, sessionStore }); + return await dispatchSnapshotDiffViaRuntime({ req, sessionName, logPath, sessionStore }); } if (command === 'wait') { - const { session, device } = await resolveSessionDevice(sessionStore, sessionName, req.flags); - const parsed = parseWaitArgs(req.positionals ?? 
[]); - if (!parsed) { - return errorResponse('INVALID_ARGS', 'wait requires a duration or text'); - } - const executeWait = () => - handleWaitCommand({ - parsed, - req, - sessionName, - logPath, - sessionStore, - session, - device, - }); - if (!waitNeedsRunnerCleanup(parsed)) { - return await executeWait(); - } - return await withSessionlessRunnerCleanup(session, device, executeWait); + return await dispatchWaitViaRuntime({ req, sessionName, logPath, sessionStore }); } if (command === 'alert') { @@ -152,154 +76,3 @@ export async function handleSnapshotCommands(params: { return null; } - -function buildSnapshotWarnings(params: { - capture: Awaited>; - flags: DaemonRequest['flags']; - session: SessionState | undefined; -}): string[] { - const { capture, flags, session } = params; - const warnings: string[] = []; - const analysis = capture.analysis; - const interactiveOnly = flags?.snapshotInteractiveOnly === true; - - if ( - capture.snapshot.backend === 'android' && - interactiveOnly && - capture.snapshot.nodes.length === 0 && - analysis && - analysis.rawNodeCount >= 12 - ) { - warnings.push( - `Interactive snapshot is empty after filtering ${analysis.rawNodeCount} raw Android nodes. Likely causes: depth too low, transient route change, or collector filtering.`, - ); - if (typeof flags?.snapshotDepth === 'number' && analysis.maxDepth >= flags.snapshotDepth + 2) { - warnings.push( - `Interactive output is empty at depth ${flags.snapshotDepth}; retry without -d.`, - ); - } - } - - // When a snapshot was captured very recently (within 2 s) and the node count dropped - // sharply, the new dump likely hit a mid-transition frame. The 2 s window limits - // this check to rapid successive snapshots where the UI had no time to settle. 
- const previousSnapshot = session?.snapshot; - if ( - !capture.freshness && - previousSnapshot && - Date.now() - previousSnapshot.createdAt <= 2_000 && - isLikelyStaleSnapshotDrop(previousSnapshot.nodes.length, capture.snapshot.nodes.length) - ) { - warnings.push( - 'Recent snapshots dropped sharply in node count, which suggests stale or mid-transition UI. Use screenshot as visual truth, wait briefly, then re-snapshot once.', - ); - } - - if (capture.freshness?.staleAfterRetries && capture.snapshot.backend === 'android') { - // `empty-interactive` intentionally relies on the generic empty-interactive warning above. - // Freshness recovery may resolve a transient filtered-zero tree, but if retries still end - // empty we want one final warning, not a second freshness-specific variant of the same issue. - if (capture.freshness.reason === 'stuck-route') { - warnings.push( - `Recent ${capture.freshness.action} was followed by a nearly identical snapshot after ${capture.freshness.retryCount} automatic retr${capture.freshness.retryCount === 1 ? 'y' : 'ies'}. If you expected navigation or submit, the tree may still be stale. Use screenshot as visual truth, wait briefly, then re-snapshot once.`, - ); - } else if (capture.freshness.reason === 'sharp-drop') { - warnings.push( - 'Recent snapshots dropped sharply in node count, which suggests stale or mid-transition UI. 
Use screenshot as visual truth, wait briefly, then re-snapshot once.', - ); - } - } - - return uniqueStrings(warnings); -} - -async function handleSnapshotDiffRequest(params: { - req: DaemonRequest; - sessionName: string; - logPath: string; - sessionStore: SessionStore; -}): Promise { - const { req, sessionName, logPath, sessionStore } = params; - const { session, device } = await resolveSessionDevice(sessionStore, sessionName, req.flags); - if (!isCommandSupportedOnDevice('diff', device)) { - return errorResponse('UNSUPPORTED_OPERATION', 'diff is not supported on this device'); - } - const resolvedScope = resolveSnapshotScope(req.flags?.snapshotScope, session); - if (!resolvedScope.ok) return resolvedScope; - const flattenForDiff = req.flags?.snapshotInteractiveOnly === true; - - return await withSessionlessRunnerCleanup(session, device, async () => { - const capture = await captureSnapshot({ - device, - session, - flags: req.flags, - outPath: req.flags?.out, - logPath, - snapshotScope: resolvedScope.scope, - }); - const currentSnapshot = capture.snapshot; - const warnings = buildSnapshotWarnings({ - capture, - flags: req.flags, - session, - }); - - if (!session?.snapshot) { - const unchanged = countSnapshotComparableLines(currentSnapshot.nodes, { - flatten: flattenForDiff, - }); - const nextSession = buildSnapshotSession({ - session, - sessionName, - device, - snapshot: currentSnapshot, - appBundleId: session?.appBundleId, - }); - recordIfSession(sessionStore, nextSession, req, { - mode: 'snapshot', - baselineInitialized: true, - summary: { - additions: 0, - removals: 0, - unchanged, - }, - }); - sessionStore.set(sessionName, nextSession); - return { - ok: true, - data: { - mode: 'snapshot', - baselineInitialized: true, - summary: { - additions: 0, - removals: 0, - unchanged, - }, - lines: [], - ...(warnings.length > 0 ? 
{ warnings } : {}), - }, - }; - } - - const diff = buildSnapshotDiff(session.snapshot.nodes, currentSnapshot.nodes, { - flatten: flattenForDiff, - }); - const nextSession: SessionState = { ...session, snapshot: currentSnapshot }; - recordIfSession(sessionStore, nextSession, req, { - mode: 'snapshot', - baselineInitialized: false, - summary: diff.summary, - }); - sessionStore.set(sessionName, nextSession); - return { - ok: true, - data: { - mode: 'snapshot', - baselineInitialized: false, - summary: diff.summary, - lines: diff.lines, - ...(warnings.length > 0 ? { warnings } : {}), - }, - }; - }); -} diff --git a/src/daemon/is-predicates.ts b/src/daemon/is-predicates.ts index ea00bb0f..36e9633d 100644 --- a/src/daemon/is-predicates.ts +++ b/src/daemon/is-predicates.ts @@ -1,122 +1 @@ -import type { Platform } from '../utils/device.ts'; -import type { SnapshotState } from '../utils/snapshot.ts'; -import { isNodeVisibleInEffectiveViewport } from '../utils/mobile-snapshot-semantics.ts'; -import { extractNodeText, normalizeType } from './snapshot-processing.ts'; -import { isNodeEditable, isNodeVisible } from './selectors.ts'; - -type IsPredicate = 'visible' | 'hidden' | 'exists' | 'editable' | 'selected' | 'text'; - -export function isSupportedPredicate(input: string): input is IsPredicate { - return ['visible', 'hidden', 'exists', 'editable', 'selected', 'text'].includes(input); -} - -export function evaluateIsPredicate(params: { - predicate: Exclude; - node: SnapshotState['nodes'][number]; - nodes: SnapshotState['nodes']; - expectedText?: string; - platform: Platform; -}): { pass: boolean; actualText: string; details: string } { - const { predicate, node, nodes, expectedText, platform } = params; - const actualText = extractNodeText(node); - const editable = isNodeEditable(node, platform); - const selected = node.selected === true; - const visible = predicate === 'text' ? 
isNodeVisible(node) : isAssertionVisible(node, nodes); - let pass = false; - switch (predicate) { - case 'visible': - pass = visible; - break; - case 'hidden': - pass = !visible; - break; - case 'editable': - pass = editable; - break; - case 'selected': - pass = selected; - break; - case 'text': - pass = actualText === (expectedText ?? ''); - break; - } - const details = - predicate === 'text' - ? `expected="${expectedText ?? ''}" actual="${actualText}"` - : `actual=${JSON.stringify({ - visible, - editable, - selected, - })}`; - return { pass, actualText, details }; -} - -function isAssertionVisible( - node: SnapshotState['nodes'][number], - nodes: SnapshotState['nodes'], -): boolean { - if (node.hittable === true) return true; - if (hasPositiveRect(node.rect)) return isRectVisibleInViewport(node, nodes); - if (node.rect) return false; - const anchor = resolveVisibilityAnchor(node, nodes); - if (!anchor) return false; - if (anchor.hittable === true) return true; - if (!hasPositiveRect(anchor.rect)) return false; - return isRectVisibleInViewport(anchor, nodes); -} - -function isRectVisibleInViewport( - node: SnapshotState['nodes'][number], - nodes: SnapshotState['nodes'], -): boolean { - return isNodeVisibleInEffectiveViewport(node, nodes); -} - -function resolveVisibilityAnchor( - node: SnapshotState['nodes'][number], - nodes: SnapshotState['nodes'], -): SnapshotState['nodes'][number] | null { - const nodesByIndex = new Map(nodes.map((entry) => [entry.index, entry])); - let current = node; - const visited = new Set(); - while (typeof current.parentIndex === 'number' && !visited.has(current.index)) { - visited.add(current.index); - const parent = nodesByIndex.get(current.parentIndex); - if (!parent) break; - if (isUsefulVisibilityAnchor(parent)) return parent; - current = parent; - } - return null; -} - -function isUsefulVisibilityAnchor(node: SnapshotState['nodes'][number]): boolean { - const type = normalizeType(node.type ?? 
''); - // These containers often report the full content frame, not the clipped on-screen geometry. - if ( - type.includes('application') || - type.includes('window') || - type.includes('scrollview') || - type.includes('tableview') || - type.includes('collectionview') || - type === 'table' || - type === 'list' || - type === 'listview' - ) { - return false; - } - return node.hittable === true || hasPositiveRect(node.rect); -} - -function hasPositiveRect( - rect: SnapshotState['nodes'][number]['rect'], -): rect is NonNullable { - return Boolean( - rect && - Number.isFinite(rect.x) && - Number.isFinite(rect.y) && - Number.isFinite(rect.width) && - Number.isFinite(rect.height) && - rect.width > 0 && - rect.height > 0, - ); -} +export * from '../utils/selector-is-predicates.ts'; diff --git a/src/daemon/request-router.ts b/src/daemon/request-router.ts index ebebd4ec..a17fbf85 100644 --- a/src/daemon/request-router.ts +++ b/src/daemon/request-router.ts @@ -38,6 +38,10 @@ import { augmentScrollVisualizationResult, recordTouchVisualizationEvent, } from './recording-gestures.ts'; +import { + dispatchScreenshotViaRuntime, + type ScreenshotOutputPlacement, +} from './screenshot-runtime.ts'; import { recoverAndroidBlockingSystemDialog } from './android-system-dialog.ts'; import { getRunnerSessionSnapshot } from '../platforms/ios/runner-client.ts'; import { annotateScreenshotWithRefs } from './screenshot-overlay.ts'; @@ -340,6 +344,7 @@ async function runHandlerChain(params: { const interactionResponse = await handleInteractionCommands({ req, sessionName, + logPath, sessionStore, contextFromFlags, }); @@ -393,9 +398,18 @@ async function dispatchGenericCommand(params: { ...contextFromFlags(logPath, req.flags, session.appBundleId, session.trace?.outPath), surface: session.surface, }; - const data = await dispatchCommand(session.device, command, resolvedPositionals, resolvedOut, { - ...dispatchContext, - }); + const data = + command === 'screenshot' + ? 
await dispatchScreenshotViaRuntime({ + session, + sessionName: params.sessionName, + outPath: resolvedPositionals[0] ?? resolvedOut, + outputPlacement: resolveScreenshotOutputPlacement(req), + dispatchContext, + }) + : await dispatchCommand(session.device, command, resolvedPositionals, resolvedOut, { + ...dispatchContext, + }); if (command === 'screenshot' && req.flags?.overlayRefs && typeof data?.path === 'string') { await applyScreenshotOverlay(session, data, logPath); @@ -422,6 +436,13 @@ async function dispatchGenericCommand(params: { return { ok: true, data: data ?? {} }; } +function resolveScreenshotOutputPlacement(req: DaemonRequest): ScreenshotOutputPlacement { + if (req.command !== 'screenshot') return 'default'; + if ((req.positionals ?? [])[0]) return 'positional'; + if (req.flags?.out) return 'out'; + return 'default'; +} + function resolveCommandPositionals(req: DaemonRequest): { resolvedPositionals: string[]; resolvedOut: string | undefined; diff --git a/src/daemon/screenshot-runtime.ts b/src/daemon/screenshot-runtime.ts new file mode 100644 index 00000000..ea5c547e --- /dev/null +++ b/src/daemon/screenshot-runtime.ts @@ -0,0 +1,126 @@ +import { promises as fs } from 'node:fs'; +import os from 'node:os'; +import path from 'node:path'; +import type { AgentDeviceBackend, BackendScreenshotResult } from '../backend.ts'; +import type { ArtifactAdapter } from '../io.ts'; +import { createAgentDevice, localCommandPolicy } from '../runtime.ts'; +import { dispatchCommand } from '../core/dispatch.ts'; +import { AppError } from '../utils/errors.ts'; +import type { DaemonCommandContext } from './context.ts'; +import type { SessionState } from './types.ts'; + +export type ScreenshotOutputPlacement = 'positional' | 'out' | 'default'; + +export async function dispatchScreenshotViaRuntime(params: { + session: SessionState; + sessionName: string; + outPath?: string; + outputPlacement: ScreenshotOutputPlacement; + dispatchContext: DaemonCommandContext; +}): Promise> { + 
const { session, sessionName, outPath, outputPlacement, dispatchContext } = params; + const runtime = createAgentDevice({ + backend: createDispatchScreenshotBackend({ session, outputPlacement, dispatchContext }), + artifacts: createDaemonScreenshotArtifactAdapter(), + sessions: { + get: (name) => + name === sessionName + ? { + name: sessionName, + appBundleId: session.appBundleId, + metadata: { surface: session.surface }, + } + : undefined, + set: () => {}, + }, + policy: localCommandPolicy(), + }); + + return await runtime.capture.screenshot({ + session: sessionName, + requestId: dispatchContext.requestId, + appBundleId: session.appBundleId, + fullscreen: dispatchContext.screenshotFullscreen, + surface: session.surface, + ...(outPath ? { out: { kind: 'path', path: outPath } } : {}), + }); +} + +function createDispatchScreenshotBackend(params: { + session: SessionState; + outputPlacement: ScreenshotOutputPlacement; + dispatchContext: DaemonCommandContext; +}): AgentDeviceBackend { + const { session, outputPlacement, dispatchContext } = params; + return { + platform: session.device.platform, + captureScreenshot: async (_context, outPath, options) => { + const context = { + ...dispatchContext, + screenshotFullscreen: options?.fullscreen, + overlayRefs: options?.overlayRefs, + surface: options?.surface, + }; + if (outputPlacement === 'out') { + return toBackendScreenshotResult( + await dispatchCommand(session.device, 'screenshot', [], outPath, context), + ); + } + return toBackendScreenshotResult( + await dispatchCommand(session.device, 'screenshot', [outPath], undefined, context), + ); + }, + }; +} + +function toBackendScreenshotResult(data: unknown): BackendScreenshotResult | void { + if (typeof data !== 'object' || data === null) return; + const record = data as Record; + return { + ...(typeof record.path === 'string' ? { path: record.path } : {}), + ...(Array.isArray(record.overlayRefs) + ? 
{ overlayRefs: record.overlayRefs as NonNullable } + : {}), + }; +} + +function createDaemonScreenshotArtifactAdapter(): ArtifactAdapter { + return { + resolveInput: async () => { + throw new AppError('UNSUPPORTED_OPERATION', 'screenshot does not resolve input artifacts'); + }, + reserveOutput: async (ref) => { + let tempRoot: string | undefined; + let outputPath: string; + if (ref?.kind === 'path') { + outputPath = ref.path; + } else { + tempRoot = await fs.mkdtemp(path.join(os.tmpdir(), 'agent-device-screenshot-')); + outputPath = path.join(tempRoot, 'screenshot.png'); + } + await fs.mkdir(path.dirname(outputPath), { recursive: true }); + return { + path: outputPath, + visibility: 'client-visible', + publish: async () => undefined, + ...(tempRoot + ? { + cleanup: async () => { + await fs.rm(tempRoot, { recursive: true, force: true }); + }, + } + : {}), + }; + }, + createTempFile: async (options) => { + const root = await fs.mkdtemp(path.join(os.tmpdir(), `${options.prefix}-`)); + return { + path: path.join(root, `file${options.ext}`), + visibility: 'internal', + cleanup: async () => { + await fs.rm(root, { recursive: true, force: true }); + }, + }; + }, + }; +} diff --git a/src/daemon/selector-recording.ts b/src/daemon/selector-recording.ts new file mode 100644 index 00000000..4ad06fed --- /dev/null +++ b/src/daemon/selector-recording.ts @@ -0,0 +1,120 @@ +import type { DaemonRequest } from './types.ts'; +import { SessionStore } from './session-store.ts'; + +export function buildFindRecordResult( + result: Record, + action: 'exists' | 'wait' | 'get_text' | 'get_attrs', +): Record { + if (action === 'exists') return { found: true }; + if (action === 'wait') { + return { found: true, waitedMs: result.waitedMs }; + } + const ref = typeof result.ref === 'string' ? result.ref : undefined; + if (action === 'get_attrs') return { ref, action: 'get attrs' }; + return { + ref, + action: 'get text', + text: typeof result.text === 'string' ? 
result.text : '', + }; +} + +export function toDaemonFindData(result: Record): Record { + if (result.kind === 'found') { + return { + found: true, + ...(typeof result.waitedMs === 'number' ? { waitedMs: result.waitedMs } : {}), + }; + } + return { + ...(typeof result.ref === 'string' ? { ref: result.ref } : {}), + ...(typeof result.text === 'string' ? { text: result.text } : {}), + ...(result.node && typeof result.node === 'object' ? { node: result.node } : {}), + }; +} + +export function buildGetRecordResult( + result: Record, + property: 'text' | 'attrs', +): Record { + const selectorChain = Array.isArray(result.selectorChain) ? result.selectorChain : undefined; + const resolvedTarget = getResolvedTarget(result); + const ref = resolvedTarget?.kind === 'ref' ? normalizeDaemonRef(resolvedTarget.ref) : undefined; + const selector = resolvedTarget?.kind === 'selector' ? resolvedTarget.selector : undefined; + const recordedTarget = { + ...(ref ? { ref } : {}), + ...(selector ? { selector } : {}), + ...(selectorChain ? { selectorChain } : {}), + }; + if (property === 'attrs') return recordedTarget; + + const text = typeof result.text === 'string' ? result.text : ''; + return { + ...recordedTarget, + text, + refLabel: compactRecordedGetRefLabel(text), + }; +} + +export function toDaemonGetData(result: Record): Record { + const target = getResolvedTarget(result); + return { + ...(target?.kind === 'ref' ? { ref: normalizeDaemonRef(target.ref) } : {}), + ...(target?.kind === 'selector' ? { selector: target.selector } : {}), + ...(typeof result.text === 'string' ? { text: result.text } : {}), + ...(result.node && typeof result.node === 'object' ? { node: result.node } : {}), + }; +} + +export function toDaemonWaitData(result: Record): Record { + return { + waitedMs: result.waitedMs, + ...(typeof result.text === 'string' ? { text: result.text } : {}), + ...(typeof result.selector === 'string' ? 
{ selector: result.selector } : {}), + }; +} + +export function stripSelectorChain>(result: T): T { + const { selectorChain: _selectorChain, ...publicResult } = result; + return publicResult as T; +} + +export function recordIfSession( + sessionStore: SessionStore, + sessionName: string, + req: DaemonRequest, + result: Record, +): void { + const session = sessionStore.get(sessionName); + if (!session) return; + sessionStore.recordAction(session, { + command: req.command, + positionals: req.positionals ?? [], + flags: req.flags ?? {}, + result, + }); +} + +function compactRecordedGetRefLabel(text: string): string | undefined { + const trimmed = text.trim(); + if (!trimmed || trimmed.length > 80 || /[\r\n]/.test(trimmed)) return undefined; + return trimmed; +} + +function getResolvedTarget( + result: Record, +): { kind: 'ref'; ref: string } | { kind: 'selector'; selector: string } | undefined { + const target = result.target; + if (!target || typeof target !== 'object') return undefined; + const record = target as Record; + if (record.kind === 'ref' && typeof record.ref === 'string') { + return { kind: 'ref', ref: record.ref }; + } + if (record.kind === 'selector' && typeof record.selector === 'string') { + return { kind: 'selector', selector: record.selector }; + } + return undefined; +} + +function normalizeDaemonRef(ref: string): string { + return ref.startsWith('@') ? 
ref.slice(1) : ref; +} diff --git a/src/daemon/selector-runtime.ts b/src/daemon/selector-runtime.ts new file mode 100644 index 00000000..9eb9b967 --- /dev/null +++ b/src/daemon/selector-runtime.ts @@ -0,0 +1,502 @@ +import type { + AgentDeviceBackend, + BackendSnapshotOptions, + BackendSnapshotResult, +} from '../backend.ts'; +import { createAgentDevice, localCommandPolicy } from '../runtime.ts'; +import { isCommandSupportedOnDevice } from '../core/capabilities.ts'; +import { resolveTargetDevice, type CommandFlags } from '../core/dispatch.ts'; +import { isApplePlatform } from '../utils/device.ts'; +import { AppError, asAppError } from '../utils/errors.ts'; +import type { SnapshotNode } from '../utils/snapshot.ts'; +import { runIosRunnerCommand } from '../platforms/ios/runner-client.ts'; +import type { DaemonRequest, DaemonResponse, SessionState } from './types.ts'; +import { SessionStore } from './session-store.ts'; +import { contextFromFlags } from './context.ts'; +import { ensureDeviceReady } from './device-ready.ts'; +import { captureSnapshot } from './handlers/snapshot-capture.ts'; +import { readTextForNode } from './handlers/interaction-read.ts'; +import { + parseWaitArgs, + waitNeedsRunnerCleanup, + type WaitParsed, +} from './handlers/snapshot-wait.ts'; +import { errorResponse } from './handlers/response.ts'; +import { findNodeByLabel } from './snapshot-processing.ts'; +import { resolveSessionDevice, withSessionlessRunnerCleanup } from './handlers/snapshot-session.ts'; +import { parseFindArgs, type FindAction } from '../utils/finders.ts'; +import { splitIsSelectorArgs } from './selectors.ts'; +import { refSnapshotFlagGuardResponse } from './handlers/interaction-flags.ts'; +import type { IsCommandOptions } from '../commands/selector-read.ts'; +import { isSupportedPredicate } from './is-predicates.ts'; +import type { ContextFromFlags } from './handlers/interaction-common.ts'; +import { getActiveAndroidSnapshotFreshness } from './android-snapshot-freshness.ts'; 
+import { + buildFindRecordResult, + buildGetRecordResult, + recordIfSession, + stripSelectorChain, + toDaemonFindData, + toDaemonGetData, + toDaemonWaitData, +} from './selector-recording.ts'; + +type SelectorRuntimeParams = { + req: DaemonRequest; + sessionName: string; + logPath?: string; + sessionStore: SessionStore; + contextFromFlags?: ContextFromFlags; +}; + +type SnapshotFlagOverrides = Partial< + Pick< + CommandFlags, + | 'snapshotInteractiveOnly' + | 'snapshotCompact' + | 'snapshotScope' + | 'snapshotDepth' + | 'snapshotRaw' + > +>; + +export async function dispatchFindReadOnlyViaRuntime( + params: SelectorRuntimeParams, +): Promise { + const { req } = params; + if (req.command !== 'find') return null; + const args = req.positionals ?? []; + if (args.length === 0) return errorResponse('INVALID_ARGS', 'find requires a locator or text'); + const parsed = parseFindArgs(args); + if (!parsed.query) return errorResponse('INVALID_ARGS', 'find requires a value'); + if (req.flags?.findFirst && req.flags?.findLast) { + return errorResponse('INVALID_ARGS', 'find accepts only one of --first or --last'); + } + const action = parsed.action; + if (!isReadOnlyFindAction(action)) return null; + + const resolvedRuntime = await createSelectorRuntime(params, { + requireSession: false, + capability: 'find', + }); + if (!resolvedRuntime.ok) return resolvedRuntime.response; + + return await toDaemonResponse(async () => { + const result = await resolvedRuntime.runtime.selectors.find({ + session: params.sessionName, + requestId: req.meta?.requestId, + locator: parsed.locator, + query: parsed.query, + action, + timeoutMs: parsed.timeoutMs, + }); + recordIfSession( + params.sessionStore, + params.sessionName, + req, + buildFindRecordResult(result, action), + ); + return toDaemonFindData(result); + }); +} + +export async function dispatchGetViaRuntime( + params: SelectorRuntimeParams, +): Promise { + const { req } = params; + if (req.command !== 'get') return null; + const sub = 
req.positionals?.[0]; + if (sub !== 'text' && sub !== 'attrs') { + return errorResponse('INVALID_ARGS', 'get only supports text or attrs'); + } + const resolvedRuntime = await createSelectorRuntime(params, { + requireSession: true, + capability: 'get', + }); + if (!resolvedRuntime.ok) return resolvedRuntime.response; + + const target = parseGetTarget(req); + if (!target.ok) return target.response; + if (target.target.kind === 'ref') { + const invalidRefFlagsResponse = refSnapshotFlagGuardResponse('get', req.flags); + if (invalidRefFlagsResponse) return invalidRefFlagsResponse; + } + + return await toDaemonResponse(async () => { + const result = await resolvedRuntime.runtime.selectors.get({ + session: params.sessionName, + requestId: req.meta?.requestId, + property: sub, + target: target.target, + }); + recordIfSession( + params.sessionStore, + params.sessionName, + req, + buildGetRecordResult(result, sub), + ); + return toDaemonGetData(result); + }); +} + +export async function dispatchIsViaRuntime( + params: SelectorRuntimeParams, +): Promise { + const { req } = params; + if (req.command !== 'is') return null; + const predicate = (req.positionals?.[0] ?? '').toLowerCase(); + if (!isSupportedPredicate(predicate)) { + return errorResponse( + 'INVALID_ARGS', + 'is requires predicate: visible|hidden|exists|editable|selected|text', + ); + } + const { split } = splitIsSelectorArgs(req.positionals ?? 
[]); + if (!split) return errorResponse('INVALID_ARGS', 'is requires a selector expression'); + const expectedText = split.rest.join(' ').trim(); + if (predicate === 'text' && !expectedText) { + return errorResponse('INVALID_ARGS', 'is text requires expected text value'); + } + if (predicate !== 'text' && split.rest.length > 0) { + return errorResponse('INVALID_ARGS', `is ${predicate} does not accept trailing values`); + } + const resolvedRuntime = await createSelectorRuntime(params, { + requireSession: true, + capability: 'is', + }); + if (!resolvedRuntime.ok) return resolvedRuntime.response; + + return await toDaemonResponse(async () => { + const result = await resolvedRuntime.runtime.selectors.is({ + session: params.sessionName, + requestId: req.meta?.requestId, + predicate: predicate as IsCommandOptions['predicate'], + selector: split.selectorExpression, + expectedText, + }); + recordIfSession(params.sessionStore, params.sessionName, req, result); + return stripSelectorChain(result); + }); +} + +export async function dispatchWaitViaRuntime( + params: SelectorRuntimeParams, +): Promise { + const { req, sessionName, sessionStore } = params; + const parsed = parseWaitArgs(req.positionals ?? 
[]); + if (!parsed) return errorResponse('INVALID_ARGS', 'wait requires a duration or text'); + const { session, device } = await resolveSessionDevice(sessionStore, sessionName, req.flags); + if (parsed.kind !== 'sleep' && !isCommandSupportedOnDevice('wait', device)) { + return errorResponse('UNSUPPORTED_OPERATION', 'wait is not supported on this device'); + } + const execute = async () => { + const runtime = createSelectorRuntimeForDevice({ + ...params, + session, + device, + }); + return await toDaemonResponse(async () => { + const result = await runtime.selectors.wait({ + session: sessionName, + requestId: req.meta?.requestId, + target: toWaitTarget(parsed, session), + }); + recordIfSession(sessionStore, sessionName, req, result); + return toDaemonWaitData(result); + }); + }; + if (!waitNeedsRunnerCleanup(parsed)) return await execute(); + return await withSessionlessRunnerCleanup(session, device, execute); +} + +function createSelectorRuntimeForDevice(params: { + req: DaemonRequest; + sessionName: string; + logPath?: string; + sessionStore: SessionStore; + contextFromFlags?: ContextFromFlags; + session: SessionState | undefined; + device: SessionState['device']; +}) { + return createAgentDevice({ + backend: createSelectorBackend(params), + artifacts: { + resolveInput: async () => { + throw new AppError( + 'UNSUPPORTED_OPERATION', + 'selector commands do not resolve input artifacts', + ); + }, + reserveOutput: async () => { + throw new AppError( + 'UNSUPPORTED_OPERATION', + 'selector commands do not reserve output artifacts', + ); + }, + createTempFile: async () => { + throw new AppError( + 'UNSUPPORTED_OPERATION', + 'selector commands do not create temporary files', + ); + }, + }, + sessions: { + get: (name) => (name === params.sessionName ? 
toCommandSession(params.session) : undefined), + set: (record) => { + if (!params.session || !record.snapshot) return; + params.session.snapshot = record.snapshot; + params.sessionStore.set(params.sessionName, params.session); + }, + }, + policy: localCommandPolicy(), + }); +} + +async function createSelectorRuntime( + params: SelectorRuntimeParams, + options: { requireSession: boolean; capability: 'find' | 'get' | 'is' }, +): Promise< + | { ok: true; runtime: ReturnType } + | { ok: false; response: DaemonResponse } +> { + const session = params.sessionStore.get(params.sessionName); + if (!session && options.requireSession) { + return { + ok: false, + response: errorResponse('SESSION_NOT_FOUND', 'No active session. Run open first.'), + }; + } + const device = session?.device ?? (await resolveTargetDevice(params.req.flags ?? {})); + if (!session) await ensureDeviceReady(device); + if (!isCommandSupportedOnDevice(options.capability, device)) { + return { + ok: false, + response: errorResponse( + 'UNSUPPORTED_OPERATION', + `${options.capability} is not supported on this device`, + ), + }; + } + return { + ok: true, + runtime: createSelectorRuntimeForDevice({ + ...params, + session, + device, + }), + }; +} + +function createSelectorBackend(params: { + req: DaemonRequest; + sessionName: string; + logPath?: string; + sessionStore: SessionStore; + contextFromFlags?: ContextFromFlags; + session: SessionState | undefined; + device: SessionState['device']; +}): AgentDeviceBackend { + const { req, session, device, logPath, sessionName, sessionStore } = params; + let lastSnapshotAt = 0; + let lastSnapshotResult: BackendSnapshotResult | undefined; + return { + platform: device.platform, + captureSnapshot: async (_context, options): Promise => { + const flags = { + ...req.flags, + ...snapshotFlagOverrides(options), + }; + const snapshotScope = options?.scope ?? 
req.flags?.snapshotScope; + const timestamp = Date.now(); + if ( + lastSnapshotResult && + timestamp - lastSnapshotAt < 750 && + !getActiveAndroidSnapshotFreshness(session) + ) { + return lastSnapshotResult; + } + const capture = await captureSnapshot({ + device, + session, + flags, + outPath: req.flags?.out, + logPath: logPath ?? '', + snapshotScope, + }); + if (session) { + session.snapshot = capture.snapshot; + sessionStore.set(sessionName, session); + } + lastSnapshotAt = timestamp; + lastSnapshotResult = { snapshot: capture.snapshot }; + return lastSnapshotResult; + }, + readText: async (_context, node: SnapshotNode) => ({ + text: await readTextForNode({ + device, + node, + flags: req.flags, + appBundleId: session?.appBundleId, + traceOutPath: session?.trace?.outPath, + surface: session?.surface, + contextFromFlags: + params.contextFromFlags ?? + ((flags, appBundleId, traceLogPath) => + contextFromFlags(logPath ?? '', flags, appBundleId, traceLogPath)), + }), + }), + findText: async (_context, text) => ({ + found: await findText(params, text), + }), + }; +} + +function snapshotFlagOverrides(options: BackendSnapshotOptions | undefined): SnapshotFlagOverrides { + const flags: SnapshotFlagOverrides = {}; + if (options?.interactiveOnly !== undefined) + flags.snapshotInteractiveOnly = options.interactiveOnly; + if (options?.compact !== undefined) flags.snapshotCompact = options.compact; + if (options?.scope !== undefined) flags.snapshotScope = options.scope; + if (options?.depth !== undefined) flags.snapshotDepth = options.depth; + if (options?.raw !== undefined) flags.snapshotRaw = options.raw; + return flags; +} + +async function findText( + params: { + req: DaemonRequest; + sessionName: string; + logPath?: string; + sessionStore: SessionStore; + contextFromFlags?: ContextFromFlags; + session: SessionState | undefined; + device: SessionState['device']; + }, + text: string, +): Promise { + const { device, session, req, logPath } = params; + if (device.platform === 
'macos' && session?.surface && session.surface !== 'app') { + const snapshot = await captureWaitSnapshot(params); + return Boolean(findNodeByLabel(snapshot.nodes, text)); + } + if (isApplePlatform(device.platform)) { + const result = (await runIosRunnerCommand( + device, + { command: 'findText', text, appBundleId: session?.appBundleId }, + { + verbose: req.flags?.verbose, + logPath, + traceLogPath: session?.trace?.outPath, + requestId: req.meta?.requestId, + }, + )) as { found?: boolean }; + return result?.found === true; + } + const snapshot = await captureWaitSnapshot(params); + return Boolean(findNodeByLabel(snapshot.nodes, text)); +} + +async function captureWaitSnapshot(params: { + req: DaemonRequest; + sessionName: string; + logPath?: string; + sessionStore: SessionStore; + contextFromFlags?: ContextFromFlags; + session: SessionState | undefined; + device: SessionState['device']; +}) { + const capture = await captureSnapshot({ + device: params.device, + session: params.session, + flags: { + ...params.req.flags, + snapshotInteractiveOnly: false, + snapshotCompact: false, + }, + outPath: params.req.flags?.out, + logPath: params.logPath ?? '', + }); + if (params.session) { + params.session.snapshot = capture.snapshot; + params.sessionStore.set(params.sessionName, params.session); + } + return capture.snapshot; +} + +function parseGetTarget(req: DaemonRequest): + | { + ok: true; + target: + | { kind: 'ref'; ref: string; fallbackLabel?: string } + | { kind: 'selector'; selector: string }; + } + | { ok: false; response: DaemonResponse } { + const refInput = req.positionals?.[1] ?? ''; + if (refInput.startsWith('@')) { + return { + ok: true, + target: { + kind: 'ref', + ref: refInput, + fallbackLabel: req.positionals.length > 2 ? req.positionals.slice(2).join(' ').trim() : '', + }, + }; + } + const selector = req.positionals?.slice(1).join(' ').trim() ?? 
''; + if (!selector) { + return { + ok: false, + response: errorResponse('INVALID_ARGS', 'get requires @ref or selector expression'), + }; + } + return { ok: true, target: { kind: 'selector', selector } }; +} + +function toWaitTarget(parsed: WaitParsed, session: SessionState | undefined) { + if (parsed.kind === 'sleep') return { kind: 'sleep' as const, durationMs: parsed.durationMs }; + if (parsed.kind === 'selector') { + return { + kind: 'selector' as const, + selector: parsed.selectorExpression, + timeoutMs: parsed.timeoutMs, + }; + } + if (parsed.kind === 'ref') { + if (!session?.snapshot) { + throw new AppError('INVALID_ARGS', 'Ref wait requires an existing snapshot in session.'); + } + return { kind: 'ref' as const, ref: parsed.rawRef, timeoutMs: parsed.timeoutMs }; + } + if (!parsed.text) throw new AppError('INVALID_ARGS', 'wait requires text'); + return { kind: 'text' as const, text: parsed.text, timeoutMs: parsed.timeoutMs }; +} + +async function toDaemonResponse( + task: () => Promise>, +): Promise { + try { + return { ok: true, data: await task() }; + } catch (error) { + const appError = asAppError(error); + return errorResponse(appError.code, appError.message, appError.details); + } +} + +function toCommandSession(session: SessionState | undefined) { + if (!session) return undefined; + return { + name: session.name, + appName: session.appName, + appBundleId: session.appBundleId, + snapshot: session.snapshot, + metadata: { surface: session.surface }, + }; +} + +function isReadOnlyFindAction( + action: FindAction['kind'], +): action is 'exists' | 'wait' | 'get_text' | 'get_attrs' { + return ( + action === 'exists' || action === 'wait' || action === 'get_text' || action === 'get_attrs' + ); +} diff --git a/src/daemon/selectors-build.ts b/src/daemon/selectors-build.ts index abbcf847..0c645db1 100644 --- a/src/daemon/selectors-build.ts +++ b/src/daemon/selectors-build.ts @@ -1,79 +1 @@ -import type { Platform } from '../utils/device.ts'; -import type { 
SnapshotNode } from '../utils/snapshot.ts'; -import { extractNodeText, normalizeType } from './snapshot-processing.ts'; -import { uniqueStrings } from './action-utils.ts'; -import { isNodeVisible } from './selectors-match.ts'; - -export function buildSelectorChainForNode( - node: SnapshotNode, - _platform: Platform, - options: { action?: 'click' | 'fill' | 'get' } = {}, -): string[] { - const chain: string[] = []; - const role = normalizeType(node.type ?? ''); - const id = normalizeSelectorText(node.identifier); - const label = normalizeSelectorText(node.label); - const value = normalizeSelectorText(node.value); - const text = normalizeSelectorText(extractNodeText(node)); - const requireEditable = options.action === 'fill'; - - if (id) { - chain.push(`id=${quoteSelectorValue(id)}`); - } - if (role && label) { - chain.push( - requireEditable - ? `role=${quoteSelectorValue(role)} label=${quoteSelectorValue(label)} editable=true` - : `role=${quoteSelectorValue(role)} label=${quoteSelectorValue(label)}`, - ); - } - if (label) { - chain.push( - requireEditable - ? `label=${quoteSelectorValue(label)} editable=true` - : `label=${quoteSelectorValue(label)}`, - ); - } - if (value) { - chain.push( - requireEditable - ? `value=${quoteSelectorValue(value)} editable=true` - : `value=${quoteSelectorValue(value)}`, - ); - } - if (text && text !== label && text !== value) { - chain.push( - requireEditable - ? `text=${quoteSelectorValue(text)} editable=true` - : `text=${quoteSelectorValue(text)}`, - ); - } - if (role && requireEditable && !chain.some((entry) => entry.includes('editable=true'))) { - chain.push(`role=${quoteSelectorValue(role)} editable=true`); - } - - const deduped = uniqueStrings(chain); - if (deduped.length === 0 && role) { - deduped.push( - requireEditable - ? 
`role=${quoteSelectorValue(role)} editable=true` - : `role=${quoteSelectorValue(role)}`, - ); - } - if (deduped.length === 0) { - const visible = isNodeVisible(node); - if (visible) deduped.push('visible=true'); - } - return deduped; -} - -function quoteSelectorValue(value: string): string { - return JSON.stringify(value); -} - -function normalizeSelectorText(value: string | undefined): string | null { - if (!value) return null; - const trimmed = value.trim(); - if (!trimmed) return null; - return trimmed; -} +export * from '../utils/selector-build.ts'; diff --git a/src/daemon/selectors-match.ts b/src/daemon/selectors-match.ts index 41c6a1c4..fc12cd15 100644 --- a/src/daemon/selectors-match.ts +++ b/src/daemon/selectors-match.ts @@ -1,9 +1,12 @@ import type { Platform } from '../utils/device.ts'; import type { SnapshotNode } from '../utils/snapshot.ts'; -import { extractNodeText, isFillableType, normalizeType } from './snapshot-processing.ts'; +import { isNodeEditable, isNodeVisible } from '../utils/selector-node.ts'; +import { extractNodeText, normalizeType } from '../utils/snapshot-processing.ts'; import { normalizeText } from '../utils/finders.ts'; import type { Selector, SelectorTerm } from './selectors-parse.ts'; +export { isNodeEditable, isNodeVisible } from '../utils/selector-node.ts'; + export function matchesSelector( node: SnapshotNode, selector: Selector, @@ -12,16 +15,6 @@ export function matchesSelector( return selector.terms.every((term) => matchesTerm(node, term, platform)); } -export function isNodeVisible(node: SnapshotNode): boolean { - if (node.hittable === true) return true; - if (!node.rect) return false; - return node.rect.width > 0 && node.rect.height > 0; -} - -export function isNodeEditable(node: SnapshotNode, platform: Platform): boolean { - return isFillableType(node.type ?? 
'', platform) && node.enabled !== false; -} - function matchesTerm(node: SnapshotNode, term: SelectorTerm, platform: Platform): boolean { switch (term.key) { case 'id': diff --git a/src/daemon/snapshot-diff.ts b/src/daemon/snapshot-diff.ts index 0d9cfdf3..15c4fd9f 100644 --- a/src/daemon/snapshot-diff.ts +++ b/src/daemon/snapshot-diff.ts @@ -1,158 +1 @@ -import type { SnapshotNode } from '../utils/snapshot.ts'; -import { - buildSnapshotDisplayLines, - displayLabel, - formatRole, - formatSnapshotLine, -} from '../utils/snapshot-lines.ts'; - -type SnapshotDiffLine = { - kind: 'added' | 'removed' | 'unchanged'; - text: string; -}; - -type SnapshotDiffSummary = { - additions: number; - removals: number; - unchanged: number; -}; - -type SnapshotDiffResult = { - summary: SnapshotDiffSummary; - lines: SnapshotDiffLine[]; -}; - -type SnapshotDiffOptions = { - flatten?: boolean; -}; - -type SnapshotComparableLine = { - text: string; - comparable: string; -}; - -function snapshotNodeToComparableLine(node: SnapshotNode, depthOverride?: number): string { - const role = formatRole(node.type ?? 'Element'); - const textPart = displayLabel(node, role); - const enabledPart = node.enabled === false ? 'disabled' : 'enabled'; - const selectedPart = node.selected === true ? 'selected' : 'unselected'; - const hittablePart = node.hittable === true ? 'hittable' : 'not-hittable'; - const depthPart = String(depthOverride ?? node.depth ?? 
0); - return [depthPart, role, textPart, enabledPart, selectedPart, hittablePart].join('|'); -} - -export function buildSnapshotDiff( - previousNodes: SnapshotNode[], - currentNodes: SnapshotNode[], - options: SnapshotDiffOptions = {}, -): SnapshotDiffResult { - const previous = snapshotNodesToLines(previousNodes, options); - const current = snapshotNodesToLines(currentNodes, options); - const lines = diffComparableLinesMyers(previous, current); - const summary: SnapshotDiffSummary = { additions: 0, removals: 0, unchanged: 0 }; - for (const line of lines) { - if (line.kind === 'added') summary.additions += 1; - if (line.kind === 'removed') summary.removals += 1; - if (line.kind === 'unchanged') summary.unchanged += 1; - } - return { summary, lines }; -} - -export function countSnapshotComparableLines( - nodes: SnapshotNode[], - options: SnapshotDiffOptions = {}, -): number { - return snapshotNodesToLines(nodes, options).length; -} - -function snapshotNodesToLines( - nodes: SnapshotNode[], - options: SnapshotDiffOptions, -): SnapshotComparableLine[] { - if (options.flatten) { - return nodes.map((node) => ({ - text: formatSnapshotLine(node, 0, false), - comparable: snapshotNodeToComparableLine(node, 0), - })); - } - return buildSnapshotDisplayLines(nodes).map((line) => ({ - text: line.text, - comparable: snapshotNodeToComparableLine(line.node, line.depth), - })); -} - -function diffComparableLinesMyers( - previous: SnapshotComparableLine[], - current: SnapshotComparableLine[], -): SnapshotDiffLine[] { - // Myers diff is efficient for normal UI snapshots; very large trees may still be expensive. - const n = previous.length; - const m = current.length; - const max = n + m; - const v = new Map(); - const trace: Array> = []; - v.set(1, 0); - - for (let d = 0; d <= max; d += 1) { - trace.push(new Map(v)); - for (let k = -d; k <= d; k += 2) { - const goDown = k === -d || (k !== d && getV(v, k - 1) < getV(v, k + 1)); - let x = goDown ? 
getV(v, k + 1) : getV(v, k - 1) + 1; - let y = x - k; - while (x < n && y < m && previous[x].comparable === current[y].comparable) { - x += 1; - y += 1; - } - v.set(k, x); - if (x >= n && y >= m) { - return backtrackMyers(trace, previous, current, n, m); - } - } - } - - return []; -} - -function backtrackMyers( - trace: Array>, - previous: SnapshotComparableLine[], - current: SnapshotComparableLine[], - n: number, - m: number, -): SnapshotDiffLine[] { - const lines: SnapshotDiffLine[] = []; - let x = n; - let y = m; - - for (let d = trace.length - 1; d >= 0; d -= 1) { - const v = trace[d]; - const k = x - y; - const goDown = k === -d || (k !== d && getV(v, k - 1) < getV(v, k + 1)); - const prevK = goDown ? k + 1 : k - 1; - const prevX = getV(v, prevK); - const prevY = prevX - prevK; - - while (x > prevX && y > prevY) { - lines.push({ kind: 'unchanged', text: current[y - 1].text }); - x -= 1; - y -= 1; - } - - if (d === 0) break; - - if (x === prevX) { - lines.push({ kind: 'added', text: current[prevY].text }); - y = prevY; - } else { - lines.push({ kind: 'removed', text: previous[prevX].text }); - x = prevX; - } - } - - lines.reverse(); - return lines; -} - -function getV(v: Map, k: number): number { - return v.get(k) ?? 0; -} +export { buildSnapshotDiff, countSnapshotComparableLines } from '../utils/snapshot-diff.ts'; diff --git a/src/daemon/snapshot-processing.ts b/src/daemon/snapshot-processing.ts index f5381542..abb5b7cb 100644 --- a/src/daemon/snapshot-processing.ts +++ b/src/daemon/snapshot-processing.ts @@ -1,138 +1 @@ -import type { Platform } from '../utils/device.ts'; -import type { RawSnapshotNode, SnapshotState } from '../utils/snapshot.ts'; -import { extractReadableText } from '../utils/text-surface.ts'; - -export function findNodeByLabel(nodes: SnapshotState['nodes'], label: string) { - const query = label.toLowerCase(); - return ( - nodes.find((node) => { - const labelValue = (node.label ?? '').toLowerCase(); - const valueValue = (node.value ?? 
'').toLowerCase(); - const idValue = (node.identifier ?? '').toLowerCase(); - return labelValue.includes(query) || valueValue.includes(query) || idValue.includes(query); - }) ?? null - ); -} - -export function resolveRefLabel( - node: SnapshotState['nodes'][number], - nodes: SnapshotState['nodes'], -): string | undefined { - const primary = [node.label, node.value, node.identifier] - .map((value) => (typeof value === 'string' ? value.trim() : '')) - .find((value) => value && value.length > 0); - if (primary && isMeaningfulLabel(primary)) return primary; - const fallback = findNearestMeaningfulLabel(node, nodes); - return fallback ?? (primary && isMeaningfulLabel(primary) ? primary : undefined); -} - -function isMeaningfulLabel(value: string): boolean { - const trimmed = value.trim(); - if (!trimmed) return false; - if (/^(true|false)$/i.test(trimmed)) return false; - if (/^\d+$/.test(trimmed)) return false; - return true; -} - -function findNearestMeaningfulLabel( - target: SnapshotState['nodes'][number], - nodes: SnapshotState['nodes'], -): string | undefined { - if (!target.rect) return undefined; - const targetY = target.rect.y + target.rect.height / 2; - let best: { label: string; distance: number } | null = null; - for (const node of nodes) { - if (!node.rect) continue; - const label = [node.label, node.value, node.identifier] - .map((value) => (typeof value === 'string' ? value.trim() : '')) - .find((value) => value && value.length > 0); - if (!label || !isMeaningfulLabel(label)) continue; - const nodeY = node.rect.y + node.rect.height / 2; - const distance = Math.abs(nodeY - targetY); - if (!best || distance < best.distance) { - best = { label, distance }; - } - } - return best?.label; -} - -export function pruneGroupNodes(nodes: RawSnapshotNode[]): RawSnapshotNode[] { - const skippedDepths: number[] = []; - const result: RawSnapshotNode[] = []; - for (const node of nodes) { - const depth = node.depth ?? 
0; - while (skippedDepths.length > 0 && depth <= skippedDepths[skippedDepths.length - 1]) { - skippedDepths.pop(); - } - const type = normalizeType(node.type ?? ''); - const labelCandidate = [node.label, node.value, node.identifier] - .map((value) => (typeof value === 'string' ? value.trim() : '')) - .find((value) => value && value.length > 0); - const hasMeaningfulLabel = labelCandidate ? isMeaningfulLabel(labelCandidate) : false; - if ((type === 'group' || type === 'ioscontentgroup') && !hasMeaningfulLabel) { - skippedDepths.push(depth); - continue; - } - const adjustedDepth = Math.max(0, depth - skippedDepths.length); - result.push({ ...node, depth: adjustedDepth }); - } - return result; -} - -export function normalizeType(type: string): string { - let value = type.trim().replace(/XCUIElementType/gi, ''); - if (value.startsWith('AX')) { - value = value.slice(2); - } - value = value.toLowerCase(); - const lastSeparator = Math.max(value.lastIndexOf('.'), value.lastIndexOf('/')); - if (lastSeparator !== -1) { - value = value.slice(lastSeparator + 1); - } - return value; -} - -export function isFillableType(type: string, platform: Platform): boolean { - const normalized = normalizeType(type); - if (!normalized) return true; - if (platform === 'android') { - return normalized.includes('edittext') || normalized.includes('autocompletetextview'); - } - return ( - normalized.includes('textfield') || - normalized.includes('securetextfield') || - normalized.includes('searchfield') || - normalized.includes('textview') || - normalized.includes('textarea') || - normalized === 'search' - ); -} - -export function findNearestHittableAncestor( - nodes: SnapshotState['nodes'], - node: SnapshotState['nodes'][number], -): SnapshotState['nodes'][number] | null { - if (node.hittable) return node; - let current = node; - const visited = new Set(); - while (current.parentIndex !== undefined) { - if (visited.has(current.ref)) break; - visited.add(current.ref); - const parent = 
nodes[current.parentIndex]; - if (!parent) break; - if (parent.hittable) return parent; - current = parent; - } - return null; -} - -export function extractNodeText(node: SnapshotState['nodes'][number]): string { - const candidates = [node.label, node.value, node.identifier] - .map((value) => (typeof value === 'string' ? value.trim() : '')) - .filter((value) => value.length > 0); - return candidates[0] ?? ''; -} - -export function extractNodeReadText(node: SnapshotState['nodes'][number]): string { - return extractReadableText(node); -} +export * from '../utils/snapshot-processing.ts'; diff --git a/src/daemon/snapshot-runtime.ts b/src/daemon/snapshot-runtime.ts new file mode 100644 index 00000000..689fd862 --- /dev/null +++ b/src/daemon/snapshot-runtime.ts @@ -0,0 +1,224 @@ +import type { AgentDeviceBackend, BackendSnapshotResult } from '../backend.ts'; +import type { CommandSessionRecord } from '../runtime.ts'; +import { createAgentDevice, localCommandPolicy } from '../runtime.ts'; +import { isCommandSupportedOnDevice } from '../core/capabilities.ts'; +import { AppError } from '../utils/errors.ts'; +import type { DaemonRequest, DaemonResponse, SessionState } from './types.ts'; +import { SessionStore } from './session-store.ts'; +import { errorResponse } from './handlers/response.ts'; +import { captureSnapshot, resolveSnapshotScope } from './handlers/snapshot-capture.ts'; +import { + buildSnapshotSession, + resolveSessionDevice, + withSessionlessRunnerCleanup, +} from './handlers/snapshot-session.ts'; + +export async function dispatchSnapshotViaRuntime(params: { + req: DaemonRequest; + sessionName: string; + logPath: string; + sessionStore: SessionStore; +}): Promise { + const { req, sessionName, logPath, sessionStore } = params; + const { session, device } = await resolveSessionDevice(sessionStore, sessionName, req.flags); + if (!isCommandSupportedOnDevice('snapshot', device)) { + return errorResponse('UNSUPPORTED_OPERATION', 'snapshot is not supported on this 
device'); + } + const resolvedScope = resolveSnapshotScope(req.flags?.snapshotScope, session); + if (!resolvedScope.ok) return resolvedScope; + + return await withSessionlessRunnerCleanup(session, device, async () => { + const runtime = createSnapshotRuntime({ + req, + sessionName, + logPath, + sessionStore, + session, + device, + snapshotScope: resolvedScope.scope, + }); + const result = await runtime.capture.snapshot({ + session: sessionName, + interactiveOnly: req.flags?.snapshotInteractiveOnly, + compact: req.flags?.snapshotCompact, + depth: req.flags?.snapshotDepth, + scope: resolvedScope.scope, + raw: req.flags?.snapshotRaw, + }); + recordSnapshotRuntimeAction({ + req, + sessionName, + sessionStore, + result: { + nodes: result.nodes.length, + truncated: result.truncated, + }, + }); + return { + ok: true, + data: result, + }; + }); +} + +export async function dispatchSnapshotDiffViaRuntime(params: { + req: DaemonRequest; + sessionName: string; + logPath: string; + sessionStore: SessionStore; +}): Promise { + const { req, sessionName, logPath, sessionStore } = params; + const { session, device } = await resolveSessionDevice(sessionStore, sessionName, req.flags); + if (!isCommandSupportedOnDevice('diff', device)) { + return errorResponse('UNSUPPORTED_OPERATION', 'diff is not supported on this device'); + } + const resolvedScope = resolveSnapshotScope(req.flags?.snapshotScope, session); + if (!resolvedScope.ok) return resolvedScope; + + return await withSessionlessRunnerCleanup(session, device, async () => { + const runtime = createSnapshotRuntime({ + req, + sessionName, + logPath, + sessionStore, + session, + device, + snapshotScope: resolvedScope.scope, + }); + const result = await runtime.capture.diffSnapshot({ + session: sessionName, + interactiveOnly: req.flags?.snapshotInteractiveOnly, + compact: req.flags?.snapshotCompact, + depth: req.flags?.snapshotDepth, + scope: resolvedScope.scope, + raw: req.flags?.snapshotRaw, + }); + recordSnapshotRuntimeAction({ + 
req, + sessionName, + sessionStore, + result: { + mode: 'snapshot', + baselineInitialized: result.baselineInitialized, + summary: result.summary, + }, + }); + return { + ok: true, + data: result, + }; + }); +} + +function createSnapshotRuntime(params: { + req: DaemonRequest; + sessionName: string; + logPath: string; + sessionStore: SessionStore; + session: SessionState | undefined; + device: SessionState['device']; + snapshotScope: string | undefined; +}) { + const { req, sessionName, logPath, sessionStore, session, device, snapshotScope } = params; + return createAgentDevice({ + backend: createDaemonSnapshotBackend({ + req, + logPath, + session, + device, + snapshotScope, + }), + artifacts: { + resolveInput: async () => { + throw new AppError('UNSUPPORTED_OPERATION', 'snapshot does not resolve input artifacts'); + }, + reserveOutput: async () => { + throw new AppError('UNSUPPORTED_OPERATION', 'snapshot does not reserve output artifacts'); + }, + createTempFile: async () => { + throw new AppError('UNSUPPORTED_OPERATION', 'snapshot does not create temporary files'); + }, + }, + sessions: { + get: (name) => + name === sessionName ? 
toCommandSessionRecord(sessionStore.get(sessionName)) : undefined, + set: (record) => { + if (!record.snapshot) { + throw new AppError('UNKNOWN', 'snapshot runtime did not produce session state'); + } + const current = sessionStore.get(sessionName); + const nextSession = buildSnapshotSession({ + session: current, + sessionName, + device, + snapshot: record.snapshot, + appBundleId: record.appBundleId, + }); + if (record.appName) nextSession.appName = record.appName; + sessionStore.set(sessionName, nextSession); + }, + }, + policy: localCommandPolicy(), + }); +} + +function createDaemonSnapshotBackend(params: { + req: DaemonRequest; + logPath: string; + session: SessionState | undefined; + device: SessionState['device']; + snapshotScope: string | undefined; +}): AgentDeviceBackend { + const { req, logPath, session, device, snapshotScope } = params; + return { + platform: device.platform, + captureSnapshot: async (_context, options): Promise => { + const capture = await captureSnapshot({ + device, + session, + flags: req.flags, + outPath: options?.outPath ?? req.flags?.out, + logPath, + snapshotScope, + }); + return { + snapshot: capture.snapshot, + analysis: capture.analysis, + freshness: capture.freshness, + appName: session?.appBundleId ? (session.appName ?? 
session.appBundleId) : undefined, + appBundleId: session?.appBundleId, + }; + }, + }; +} + +function toCommandSessionRecord( + session: SessionState | undefined, +): CommandSessionRecord | undefined { + if (!session) return undefined; + return { + name: session.name, + appBundleId: session.appBundleId, + appName: session.appName, + snapshot: session.snapshot, + metadata: { + surface: session.surface, + }, + }; +} + +function recordSnapshotRuntimeAction(params: { + req: DaemonRequest; + sessionName: string; + sessionStore: SessionStore; + result: Record; +}): void { + const session = params.sessionStore.get(params.sessionName); + if (!session) return; + params.sessionStore.recordAction(session, { + command: params.req.command, + positionals: params.req.positionals ?? [], + flags: params.req.flags ?? {}, + result: params.result, + }); +} diff --git a/src/index.ts b/src/index.ts index 60258b5f..e1594fed 100644 --- a/src/index.ts +++ b/src/index.ts @@ -1,7 +1,85 @@ +export { + assertBackendCapabilityAllowed, + createAgentDevice, + createMemorySessionStore, + localCommandPolicy, + restrictedCommandPolicy, +} from './runtime.ts'; export { createAgentDeviceClient } from './client.ts'; -export { AppError } from './utils/errors.ts'; +export { createLocalArtifactAdapter } from './io.ts'; +export { commandCatalog, commands, createCommandRouter, ref, selector } from './commands/index.ts'; +export { AppError, isAgentDeviceError, normalizeAgentDeviceError } from './utils/errors.ts'; export { centerOfRect } from './utils/snapshot.ts'; +export type { + AgentDevice, + AgentDeviceRuntime, + AgentDeviceRuntimeConfig, + CommandClock, + CommandContext, + CommandPolicy, + CommandSessionRecord, + CommandSessionStore, + DiagnosticsSink, +} from './runtime.ts'; + +export type { + AgentDeviceBackend, + AgentDeviceBackendPlatform, + BackendActionResult, + BackendCapabilityName, + BackendCapabilitySet, + BackendCommandContext, + BackendEscapeHatches, + BackendFillOptions, + 
BackendInstallTarget, + BackendFindTextResult, + BackendOpenTarget, + BackendReadTextResult, + BackendRunnerCommand, + BackendSnapshotAnalysis, + BackendSnapshotFreshness, + BackendSnapshotOptions, + BackendScreenshotOptions, + BackendScreenshotResult, + BackendShellResult, + BackendSnapshotResult, + BackendTapOptions, +} from './backend.ts'; + +export type { + ArtifactAdapter, + ArtifactDescriptor, + CreateTempFileOptions, + FileInputRef, + FileOutputRef, + LocalArtifactAdapterOptions, + OutputVisibility, + ReserveOutputOptions, + ReservedOutputFile, + ResolveInputOptions, + ResolvedInputFile, + TemporaryFile, +} from './io.ts'; + +export type { + BoundAgentDeviceCommands, + BoundRuntimeCommand, + CommandCatalogEntry, + CommandResult, + CommandRouter, + CommandRouterConfig, + CommandRouterRequest, + CommandRouterResponse, + CommandRouterResult, + RuntimeCommand, + SelectorSnapshotOptions, + TypeTextCommandOptions, + TypeTextCommandResult, +} from './commands/index.ts'; + +export type { AppErrorCode, NormalizedError } from './utils/errors.ts'; + export type { AgentDeviceClient, AgentDeviceClientConfig, diff --git a/src/io.ts b/src/io.ts new file mode 100644 index 00000000..98924b9c --- /dev/null +++ b/src/io.ts @@ -0,0 +1,175 @@ +import { promises as fs } from 'node:fs'; +import os from 'node:os'; +import path from 'node:path'; +import { AppError } from './utils/errors.ts'; + +export type FileInputRef = + | { + kind: 'path'; + path: string; + } + | { + kind: 'uploadedArtifact'; + id: string; + }; + +export type FileOutputRef = + | { + kind: 'path'; + path: string; + } + | { + kind: 'downloadableArtifact'; + clientPath?: string; + fileName?: string; + }; + +export type ArtifactDescriptor = + | { + kind: 'localPath'; + field: string; + path: string; + fileName?: string; + metadata?: Record; + } + | { + kind: 'artifact'; + field: string; + artifactId: string; + fileName?: string; + url?: string; + clientPath?: string; + metadata?: Record; + }; + +export type 
OutputVisibility = 'client-visible' | 'internal'; + +export type ResolvedInputFile = { + path: string; + cleanup?: () => Promise; +}; + +export type ReservedOutputFile = { + path: string; + visibility: OutputVisibility; + publish: () => Promise; + cleanup?: () => Promise; +}; + +export type TemporaryFile = { + path: string; + visibility: 'internal'; + cleanup: () => Promise; +}; + +export type ResolveInputOptions = { + usage: string; + field?: string; +}; + +export type ReserveOutputOptions = { + field: string; + ext: string; + requestedClientPath?: string; + visibility?: OutputVisibility; +}; + +export type CreateTempFileOptions = { + prefix: string; + ext: string; +}; + +export type ArtifactAdapter = { + resolveInput(ref: FileInputRef, options: ResolveInputOptions): Promise; + reserveOutput( + ref: FileOutputRef | undefined, + options: ReserveOutputOptions, + ): Promise; + createTempFile(options: CreateTempFileOptions): Promise; +}; + +export type LocalArtifactAdapterOptions = { + cwd?: string; + tempDir?: string; + rootDir?: string; +}; + +export function createLocalArtifactAdapter( + options: LocalArtifactAdapterOptions = {}, +): ArtifactAdapter { + const cwd = options.cwd ?? process.cwd(); + const tempDir = options.tempDir ?? os.tmpdir(); + const rootDir = options.rootDir ? resolveLocalPath(options.rootDir, cwd) : undefined; + + return { + resolveInput: async (ref) => { + if (ref.kind === 'uploadedArtifact') { + throw new AppError( + 'UNSUPPORTED_OPERATION', + 'Uploaded artifact inputs require a custom artifact adapter', + ); + } + return { path: resolveLocalPath(ref.path, cwd, rootDir) }; + }, + reserveOutput: async (ref, outputOptions) => { + let tempRoot: string | undefined; + const visibility = outputOptions.visibility ?? 'client-visible'; + const outputPath = + ref?.kind === 'path' + ? 
resolveLocalPath(ref.path, cwd, rootDir) + : path.join( + (tempRoot = await fs.mkdtemp( + path.join(tempDir, `agent-device-${outputOptions.field}-`), + )), + `${outputOptions.field}${outputOptions.ext}`, + ); + await fs.mkdir(path.dirname(outputPath), { recursive: true }); + return { + path: outputPath, + visibility, + ...(tempRoot + ? { + cleanup: async () => { + await fs.rm(tempRoot, { recursive: true, force: true }); + }, + } + : {}), + publish: async () => + ref?.kind === 'downloadableArtifact' + ? { + kind: 'localPath', + field: outputOptions.field, + path: outputPath, + fileName: ref.fileName ?? path.basename(ref.clientPath ?? outputPath), + } + : undefined, + }; + }, + createTempFile: async (tempOptions) => { + const root = await fs.mkdtemp(path.join(tempDir, `${tempOptions.prefix}-`)); + return { + path: path.join(root, `file${tempOptions.ext}`), + visibility: 'internal', + cleanup: async () => { + await fs.rm(root, { recursive: true, force: true }); + }, + }; + }, + }; +} + +function resolveLocalPath(filePath: string, cwd: string, rootDir?: string): string { + const resolvedPath = path.isAbsolute(filePath) ? 
filePath : path.resolve(cwd, filePath); + if (rootDir && !isPathInside(resolvedPath, rootDir)) { + throw new AppError('INVALID_ARGS', 'Local path is outside the artifact adapter root', { + path: resolvedPath, + rootDir, + }); + } + return resolvedPath; +} + +function isPathInside(filePath: string, rootDir: string): boolean { + const relative = path.relative(rootDir, filePath); + return relative === '' || (!relative.startsWith('..') && !path.isAbsolute(relative)); +} diff --git a/src/runtime.ts b/src/runtime.ts new file mode 100644 index 00000000..3e62e643 --- /dev/null +++ b/src/runtime.ts @@ -0,0 +1,189 @@ +import { + hasBackendEscapeHatch, + hasBackendCapability, + type AgentDeviceBackend, + type BackendCapabilityName, +} from './backend.ts'; +import type { ArtifactAdapter } from './io.ts'; +import type { SnapshotState } from './utils/snapshot.ts'; +import { AppError } from './utils/errors.ts'; +import { bindCommands, type BoundAgentDeviceCommands } from './commands/index.ts'; + +export type CommandPolicy = { + allowLocalInputPaths: boolean; + allowLocalOutputPaths: boolean; + maxImagePixels: number; + allowNamedBackendCapabilities: readonly BackendCapabilityName[]; +}; + +export type CommandSessionRecord = { + name: string; + appId?: string; + appBundleId?: string; + appName?: string; + backendSessionId?: string; + snapshot?: SnapshotState; + metadata?: Record; +}; + +// Runtime commands can read and then write the same session. CommandSessionStore +// implementations that are shared across concurrent callers should serialize +// per-session updates, or route commands through a transport that already does. 
+export type CommandSessionStore = { + get(name: string): CommandSessionRecord | undefined | Promise; + set(record: CommandSessionRecord): void | Promise; + delete?(name: string): void | Promise; + list?(): readonly CommandSessionRecord[] | Promise; +}; + +export type CommandContext = { + session?: string; + requestId?: string; + signal?: AbortSignal; + metadata?: Record; +}; + +export type DiagnosticsSink = { + emit(event: { + level: 'debug' | 'info' | 'warn' | 'error'; + message: string; + data?: unknown; + }): void; +}; + +export type CommandClock = { + now(): number; + sleep(ms: number): Promise; +}; + +export type AgentDeviceRuntime = { + backend: AgentDeviceBackend; + artifacts: ArtifactAdapter; + sessions: CommandSessionStore; + policy: CommandPolicy; + diagnostics?: DiagnosticsSink; + clock?: CommandClock; + signal?: AbortSignal; +}; + +export type AgentDeviceRuntimeConfig = { + backend: AgentDeviceBackend; + artifacts: ArtifactAdapter; + sessions?: CommandSessionStore; + policy?: CommandPolicy; + diagnostics?: DiagnosticsSink; + clock?: CommandClock; + signal?: AbortSignal; +}; + +export type AgentDevice = AgentDeviceRuntime & BoundAgentDeviceCommands; + +export function createAgentDevice(config: AgentDeviceRuntimeConfig): AgentDevice { + const runtime: AgentDeviceRuntime = { + backend: config.backend, + artifacts: config.artifacts, + sessions: config.sessions ?? createMemorySessionStore(), + policy: config.policy ?? 
restrictedCommandPolicy(), + diagnostics: config.diagnostics, + clock: config.clock, + signal: config.signal, + }; + + return { + ...runtime, + ...bindCommands(runtime), + }; +} + +export function createMemorySessionStore( + records: readonly CommandSessionRecord[] = [], +): CommandSessionStore { + const sessions = new Map( + records.map((record) => [record.name, cloneDefinedSessionRecord(record)]), + ); + return { + get: (name) => cloneSessionRecord(sessions.get(name)), + set: (record) => { + sessions.set(record.name, cloneSessionRecord(record)); + }, + delete: (name) => { + sessions.delete(name); + }, + list: () => Array.from(sessions.values(), cloneDefinedSessionRecord), + }; +} + +function cloneDefinedSessionRecord(record: CommandSessionRecord): CommandSessionRecord { + return cloneSessionRecord(record); +} + +function cloneSessionRecord(record: CommandSessionRecord): CommandSessionRecord; +function cloneSessionRecord(record: undefined): undefined; +function cloneSessionRecord( + record: CommandSessionRecord | undefined, +): CommandSessionRecord | undefined; +function cloneSessionRecord( + record: CommandSessionRecord | undefined, +): CommandSessionRecord | undefined { + if (!record) return undefined; + return { + ...record, + ...(record.snapshot ? { snapshot: structuredClone(record.snapshot) } : {}), + ...(record.metadata ? 
{ metadata: cloneMetadata(record.metadata) } : {}), + }; +} + +function cloneMetadata(metadata: Record): Record { + try { + return structuredClone(metadata) as Record; + } catch { + return { ...metadata }; + } +} + +export function localCommandPolicy(overrides: Partial = {}): CommandPolicy { + return { + allowLocalInputPaths: true, + allowLocalOutputPaths: true, + maxImagePixels: 20_000_000, + allowNamedBackendCapabilities: [], + ...overrides, + }; +} + +export function restrictedCommandPolicy(overrides: Partial = {}): CommandPolicy { + return { + allowLocalInputPaths: false, + allowLocalOutputPaths: false, + maxImagePixels: 20_000_000, + allowNamedBackendCapabilities: [], + ...overrides, + }; +} + +export function assertBackendCapabilityAllowed( + runtime: Pick, + capability: BackendCapabilityName, +): void { + if (!hasBackendCapability(runtime.backend, capability)) { + throw new AppError( + 'UNSUPPORTED_OPERATION', + `Backend capability ${capability} is not supported by this backend`, + { capability }, + ); + } + if (!runtime.policy.allowNamedBackendCapabilities.includes(capability)) { + throw new AppError( + 'UNSUPPORTED_OPERATION', + `Backend capability ${capability} is not allowed by command policy`, + { capability }, + ); + } + if (!hasBackendEscapeHatch(runtime.backend, capability)) { + throw new AppError( + 'UNSUPPORTED_OPERATION', + `Backend capability ${capability} does not implement its escape hatch method`, + { capability }, + ); + } +} diff --git a/src/testing/conformance.ts b/src/testing/conformance.ts new file mode 100644 index 00000000..66ab587f --- /dev/null +++ b/src/testing/conformance.ts @@ -0,0 +1,289 @@ +import assert from 'node:assert/strict'; +import type { Point } from '../utils/snapshot.ts'; +import type { AgentDeviceRuntime } from '../runtime.ts'; +import { commands, selector, type InteractionTarget } from '../commands/index.ts'; + +export type ConformanceRuntimeFactory = () => AgentDeviceRuntime | Promise; + +export type 
CommandConformanceFixtures = { + session: string; + visibleSelector: string; + visibleText: string; + editableTarget: InteractionTarget; + fillText: string; + point: Point; +}; + +export type CommandConformanceTarget = { + name: string; + createRuntime: ConformanceRuntimeFactory; + fixtures?: Partial; + beforeEach?(context: CommandConformanceCaseContext): void | Promise; + afterEach?(context: CommandConformanceCaseContext): void | Promise; +}; + +export type CommandConformanceCaseContext = { + suite: string; + caseName: string; + fixtures: CommandConformanceFixtures; +}; + +export type CommandConformanceCase = { + name: string; + command: string; + run(runtime: AgentDeviceRuntime, fixtures: CommandConformanceFixtures): Promise; +}; + +export type CommandConformanceSuiteResult = { + suite: string; + passed: number; + failed: number; + failures: CommandConformanceFailure[]; +}; + +export type CommandConformanceFailure = { + suite: string; + caseName: string; + command: string; + error: unknown; +}; + +export type CommandConformanceReport = { + target: string; + passed: number; + failed: number; + failures: CommandConformanceFailure[]; + suites: CommandConformanceSuiteResult[]; +}; + +export type CommandConformanceSuite = { + name: string; + cases: readonly CommandConformanceCase[]; + run(target: CommandConformanceTarget): Promise; +}; + +export const defaultCommandConformanceFixtures: CommandConformanceFixtures = { + session: 'default', + visibleSelector: 'label=Continue', + visibleText: 'Continue', + editableTarget: selector('label=Email'), + fillText: 'hello@example.com', + point: { x: 4, y: 8 }, +}; + +export const captureConformanceSuite = createCommandConformanceSuite({ + name: 'capture', + cases: [ + { + name: 'captures screenshots through the backend primitive', + command: 'capture.screenshot', + run: async (runtime, fixtures) => { + const result = await commands.capture.screenshot(runtime, { + session: fixtures.session, + }); + assert.equal(typeof 
result.path, 'string'); + assert.ok(result.path.length > 0); + }, + }, + { + name: 'captures snapshots with nodes', + command: 'capture.snapshot', + run: async (runtime, fixtures) => { + const result = await commands.capture.snapshot(runtime, { + session: fixtures.session, + }); + assert.ok(Array.isArray(result.nodes)); + }, + }, + ], +}); + +export const selectorConformanceSuite = createCommandConformanceSuite({ + name: 'selectors', + cases: [ + { + name: 'finds visible text', + command: 'selectors.find', + run: async (runtime, fixtures) => { + const result = await commands.selectors.find(runtime, { + session: fixtures.session, + query: fixtures.visibleText, + action: 'exists', + }); + assert.equal(result.kind, 'found'); + assert.equal(result.found, true); + }, + }, + { + name: 'reads text from a selector', + command: 'selectors.getText', + run: async (runtime, fixtures) => { + const result = await commands.selectors.getText(runtime, { + session: fixtures.session, + target: selector(fixtures.visibleSelector), + }); + assert.equal(result.kind, 'text'); + assert.equal(result.text, fixtures.visibleText); + }, + }, + { + name: 'checks selector visibility', + command: 'selectors.isVisible', + run: async (runtime, fixtures) => { + const result = await commands.selectors.isVisible(runtime, { + session: fixtures.session, + target: selector(fixtures.visibleSelector), + }); + assert.equal(result.pass, true); + }, + }, + { + name: 'waits for visible text', + command: 'selectors.waitForText', + run: async (runtime, fixtures) => { + const result = await commands.selectors.waitForText(runtime, { + session: fixtures.session, + text: fixtures.visibleText, + timeoutMs: 1, + }); + assert.equal(result.kind, 'text'); + assert.equal(result.text, fixtures.visibleText); + }, + }, + ], +}); + +export const interactionConformanceSuite = createCommandConformanceSuite({ + name: 'interactions', + cases: [ + { + name: 'clicks selector targets', + command: 'interactions.click', + run: async 
(runtime, fixtures) => { + const result = await commands.interactions.click(runtime, { + session: fixtures.session, + target: selector(fixtures.visibleSelector), + }); + assert.equal(result.kind, 'selector'); + }, + }, + { + name: 'presses explicit points', + command: 'interactions.press', + run: async (runtime, fixtures) => { + const result = await commands.interactions.press(runtime, { + session: fixtures.session, + target: { kind: 'point', ...fixtures.point }, + }); + assert.deepEqual(result.point, fixtures.point); + }, + }, + { + name: 'fills editable targets', + command: 'interactions.fill', + run: async (runtime, fixtures) => { + const result = await commands.interactions.fill(runtime, { + session: fixtures.session, + target: fixtures.editableTarget, + text: fixtures.fillText, + }); + assert.equal(result.text, fixtures.fillText); + }, + }, + { + name: 'types text without a target', + command: 'interactions.typeText', + run: async (runtime, fixtures) => { + const result = await commands.interactions.typeText(runtime, { + session: fixtures.session, + text: fixtures.fillText, + }); + assert.equal(result.text, fixtures.fillText); + }, + }, + ], +}); + +export const commandConformanceSuites: readonly CommandConformanceSuite[] = [ + captureConformanceSuite, + selectorConformanceSuite, + interactionConformanceSuite, +]; + +export async function runCommandConformance( + target: CommandConformanceTarget, + options: { suites?: readonly CommandConformanceSuite[] } = {}, +): Promise { + const suites = options.suites ?? 
commandConformanceSuites; + const results: CommandConformanceSuiteResult[] = []; + for (const suite of suites) { + results.push(await suite.run(target)); + } + const failures = results.flatMap((result) => result.failures); + return { + target: target.name, + passed: results.reduce((sum, result) => sum + result.passed, 0), + failed: results.reduce((sum, result) => sum + result.failed, 0), + failures, + suites: results, + }; +} + +export async function assertCommandConformance( + target: CommandConformanceTarget, + options: { suites?: readonly CommandConformanceSuite[] } = {}, +): Promise { + const report = await runCommandConformance(target, options); + if (report.failed > 0) { + throw new AggregateError( + report.failures.map((failure) => failure.error), + `${target.name} failed ${report.failed} agent-device conformance case${ + report.failed === 1 ? '' : 's' + }`, + ); + } + return report; +} + +function createCommandConformanceSuite(params: { + name: string; + cases: readonly CommandConformanceCase[]; +}): CommandConformanceSuite { + return { + name: params.name, + cases: params.cases, + run: async (target) => { + const fixtures = { ...defaultCommandConformanceFixtures, ...target.fixtures }; + const failures: CommandConformanceFailure[] = []; + let passed = 0; + for (const testCase of params.cases) { + const context = { + suite: params.name, + caseName: testCase.name, + fixtures, + }; + try { + await target.beforeEach?.(context); + const runtime = await target.createRuntime(); + await testCase.run(runtime, fixtures); + passed += 1; + } catch (error) { + failures.push({ + suite: params.name, + caseName: testCase.name, + command: testCase.command, + error, + }); + } finally { + await target.afterEach?.(context); + } + } + return { + suite: params.name, + passed, + failed: failures.length, + failures, + }; + }, + }; +} diff --git a/src/utils/errors.ts b/src/utils/errors.ts index 5f02455e..ec4ca471 100644 --- a/src/utils/errors.ts +++ b/src/utils/errors.ts @@ -7,6 
+7,7 @@ export type AppErrorCode = | 'APP_NOT_INSTALLED' | 'UNSUPPORTED_PLATFORM' | 'UNSUPPORTED_OPERATION' + | 'NOT_IMPLEMENTED' | 'COMMAND_FAILED' | 'SESSION_NOT_FOUND' | 'UNAUTHORIZED' @@ -48,6 +49,17 @@ export function asAppError(err: unknown): AppError { return new AppError('UNKNOWN', 'Unknown error', { err }); } +export function isAgentDeviceError(err: unknown): err is AppError { + return err instanceof AppError; +} + +export function normalizeAgentDeviceError( + err: unknown, + context: { diagnosticId?: string; logPath?: string } = {}, +): NormalizedError { + return normalizeError(err, context); +} + export function normalizeError( err: unknown, context: { diagnosticId?: string; logPath?: string } = {}, @@ -129,6 +141,8 @@ export function defaultHintForCode(code: string): string | undefined { return 'Run apps to discover the exact installed package or bundle id, or install the app before open.'; case 'UNSUPPORTED_OPERATION': return 'This command is not available for the selected platform/device.'; + case 'NOT_IMPLEMENTED': + return 'This command is part of the planned API but is not implemented yet.'; case 'COMMAND_FAILED': return 'Retry with --debug and inspect diagnostics log for details.'; case 'UNAUTHORIZED': diff --git a/src/utils/screenshot-diff.ts b/src/utils/screenshot-diff.ts index b15fca23..1097c194 100644 --- a/src/utils/screenshot-diff.ts +++ b/src/utils/screenshot-diff.ts @@ -34,6 +34,7 @@ export type ScreenshotDiffOptions = { threshold?: number; outputPath?: string; maxRegions?: number; + maxPixels?: number; }; // Each pixel is a point in 3D RGB space (R, G, B each 0–255). 
@@ -63,6 +64,8 @@ export async function compareScreenshots( const baseline = decodePng(baselineBuffer, 'baseline screenshot'); const current = decodePng(currentBuffer, 'current screenshot'); + validateMaxPixels(baseline.width, baseline.height, 'baseline screenshot', options.maxPixels); + validateMaxPixels(current.width, current.height, 'current screenshot', options.maxPixels); const threshold = options.threshold ?? 0.1; @@ -192,6 +195,21 @@ async function validateFileExists(filePath: string, errorMessage: string): Promi } } +function validateMaxPixels( + width: number, + height: number, + label: string, + maxPixels: number | undefined, +): void { + if (maxPixels == null || maxPixels <= 0) return; + const totalPixels = width * height; + if (totalPixels <= maxPixels) return; + throw new AppError( + 'INVALID_ARGS', + `${label} is ${totalPixels} pixels, which exceeds the configured maxImagePixels limit of ${maxPixels}`, + ); +} + async function removeStaleDiffOutput(outputPath: string | undefined): Promise { if (!outputPath) return; try { diff --git a/src/utils/selector-build.ts b/src/utils/selector-build.ts new file mode 100644 index 00000000..98a8adea --- /dev/null +++ b/src/utils/selector-build.ts @@ -0,0 +1,82 @@ +import type { Platform } from './device.ts'; +import type { SnapshotNode } from './snapshot.ts'; +import { isNodeVisible } from './selector-node.ts'; +import { extractNodeText, normalizeType } from './snapshot-processing.ts'; + +export function buildSelectorChainForNode( + node: SnapshotNode, + _platform: Platform, + options: { action?: 'click' | 'fill' | 'get' } = {}, +): string[] { + const chain: string[] = []; + const role = normalizeType(node.type ?? 
''); + const id = normalizeSelectorText(node.identifier); + const label = normalizeSelectorText(node.label); + const value = normalizeSelectorText(node.value); + const text = normalizeSelectorText(extractNodeText(node)); + const requireEditable = options.action === 'fill'; + + if (id) { + chain.push(`id=${quoteSelectorValue(id)}`); + } + if (role && label) { + chain.push( + requireEditable + ? `role=${quoteSelectorValue(role)} label=${quoteSelectorValue(label)} editable=true` + : `role=${quoteSelectorValue(role)} label=${quoteSelectorValue(label)}`, + ); + } + if (label) { + chain.push( + requireEditable + ? `label=${quoteSelectorValue(label)} editable=true` + : `label=${quoteSelectorValue(label)}`, + ); + } + if (value) { + chain.push( + requireEditable + ? `value=${quoteSelectorValue(value)} editable=true` + : `value=${quoteSelectorValue(value)}`, + ); + } + if (text && text !== label && text !== value) { + chain.push( + requireEditable + ? `text=${quoteSelectorValue(text)} editable=true` + : `text=${quoteSelectorValue(text)}`, + ); + } + if (role && requireEditable && !chain.some((entry) => entry.includes('editable=true'))) { + chain.push(`role=${quoteSelectorValue(role)} editable=true`); + } + + const deduped = uniqueStrings(chain); + if (deduped.length === 0 && role) { + deduped.push( + requireEditable + ? 
`role=${quoteSelectorValue(role)} editable=true` + : `role=${quoteSelectorValue(role)}`, + ); + } + if (deduped.length === 0) { + const visible = isNodeVisible(node); + if (visible) deduped.push('visible=true'); + } + return deduped; +} + +function uniqueStrings(values: readonly string[]): string[] { + return Array.from(new Set(values)); +} + +function quoteSelectorValue(value: string): string { + return JSON.stringify(value); +} + +function normalizeSelectorText(value: string | undefined): string | null { + if (!value) return null; + const trimmed = value.trim(); + if (!trimmed) return null; + return trimmed; +} diff --git a/src/utils/selector-is-predicates.ts b/src/utils/selector-is-predicates.ts new file mode 100644 index 00000000..9c369f54 --- /dev/null +++ b/src/utils/selector-is-predicates.ts @@ -0,0 +1,122 @@ +import type { Platform } from './device.ts'; +import type { SnapshotState } from './snapshot.ts'; +import { isNodeVisibleInEffectiveViewport } from './mobile-snapshot-semantics.ts'; +import { isNodeEditable, isNodeVisible } from './selector-node.ts'; +import { extractNodeText, normalizeType } from './snapshot-processing.ts'; + +type IsPredicate = 'visible' | 'hidden' | 'exists' | 'editable' | 'selected' | 'text'; + +export function isSupportedPredicate(input: string): input is IsPredicate { + return ['visible', 'hidden', 'exists', 'editable', 'selected', 'text'].includes(input); +} + +export function evaluateIsPredicate(params: { + predicate: Exclude<IsPredicate, 'exists'>; + node: SnapshotState['nodes'][number]; + nodes: SnapshotState['nodes']; + expectedText?: string; + platform: Platform; +}): { pass: boolean; actualText: string; details: string } { + const { predicate, node, nodes, expectedText, platform } = params; + const actualText = extractNodeText(node); + const editable = isNodeEditable(node, platform); + const selected = node.selected === true; + const visible = predicate === 'text' ? 
isNodeVisible(node) : isAssertionVisible(node, nodes); + let pass = false; + switch (predicate) { + case 'visible': + pass = visible; + break; + case 'hidden': + pass = !visible; + break; + case 'editable': + pass = editable; + break; + case 'selected': + pass = selected; + break; + case 'text': + pass = actualText === (expectedText ?? ''); + break; + } + const details = + predicate === 'text' + ? `expected="${expectedText ?? ''}" actual="${actualText}"` + : `actual=${JSON.stringify({ + visible, + editable, + selected, + })}`; + return { pass, actualText, details }; +} + +function isAssertionVisible( + node: SnapshotState['nodes'][number], + nodes: SnapshotState['nodes'], +): boolean { + if (node.hittable === true) return true; + if (hasPositiveRect(node.rect)) return isRectVisibleInViewport(node, nodes); + if (node.rect) return false; + const anchor = resolveVisibilityAnchor(node, nodes); + if (!anchor) return false; + if (anchor.hittable === true) return true; + if (!hasPositiveRect(anchor.rect)) return false; + return isRectVisibleInViewport(anchor, nodes); +} + +function isRectVisibleInViewport( + node: SnapshotState['nodes'][number], + nodes: SnapshotState['nodes'], +): boolean { + return isNodeVisibleInEffectiveViewport(node, nodes); +} + +function resolveVisibilityAnchor( + node: SnapshotState['nodes'][number], + nodes: SnapshotState['nodes'], +): SnapshotState['nodes'][number] | null { + const nodesByIndex = new Map(nodes.map((entry) => [entry.index, entry])); + let current = node; + const visited = new Set(); + while (typeof current.parentIndex === 'number' && !visited.has(current.index)) { + visited.add(current.index); + const parent = nodesByIndex.get(current.parentIndex); + if (!parent) break; + if (isUsefulVisibilityAnchor(parent)) return parent; + current = parent; + } + return null; +} + +function isUsefulVisibilityAnchor(node: SnapshotState['nodes'][number]): boolean { + const type = normalizeType(node.type ?? 
''); + // These containers often report the full content frame, not the clipped on-screen geometry. + if ( + type.includes('application') || + type.includes('window') || + type.includes('scrollview') || + type.includes('tableview') || + type.includes('collectionview') || + type === 'table' || + type === 'list' || + type === 'listview' + ) { + return false; + } + return node.hittable === true || hasPositiveRect(node.rect); +} + +function hasPositiveRect( + rect: SnapshotState['nodes'][number]['rect'], +): rect is NonNullable<SnapshotState['nodes'][number]['rect']> { + return Boolean( + rect && + Number.isFinite(rect.x) && + Number.isFinite(rect.y) && + Number.isFinite(rect.width) && + Number.isFinite(rect.height) && + rect.width > 0 && + rect.height > 0, + ); +} diff --git a/src/utils/selector-node.ts b/src/utils/selector-node.ts new file mode 100644 index 00000000..5759de54 --- /dev/null +++ b/src/utils/selector-node.ts @@ -0,0 +1,13 @@ +import type { Platform } from './device.ts'; +import type { SnapshotNode } from './snapshot.ts'; +import { isFillableType } from './snapshot-processing.ts'; + +export function isNodeVisible(node: SnapshotNode): boolean { + if (node.hittable === true) return true; + if (!node.rect) return false; + return node.rect.width > 0 && node.rect.height > 0; +} + +export function isNodeEditable(node: SnapshotNode, platform: Platform): boolean { + return isFillableType(node.type ?? 
'', platform) && node.enabled !== false; +} diff --git a/src/utils/snapshot-diff.ts b/src/utils/snapshot-diff.ts new file mode 100644 index 00000000..6e3ff948 --- /dev/null +++ b/src/utils/snapshot-diff.ts @@ -0,0 +1,158 @@ +import type { SnapshotNode } from './snapshot.ts'; +import { + buildSnapshotDisplayLines, + displayLabel, + formatRole, + formatSnapshotLine, +} from './snapshot-lines.ts'; + +type SnapshotDiffLine = { + kind: 'added' | 'removed' | 'unchanged'; + text: string; +}; + +type SnapshotDiffSummary = { + additions: number; + removals: number; + unchanged: number; +}; + +type SnapshotDiffResult = { + summary: SnapshotDiffSummary; + lines: SnapshotDiffLine[]; +}; + +type SnapshotDiffOptions = { + flatten?: boolean; +}; + +type SnapshotComparableLine = { + text: string; + comparable: string; +}; + +function snapshotNodeToComparableLine(node: SnapshotNode, depthOverride?: number): string { + const role = formatRole(node.type ?? 'Element'); + const textPart = displayLabel(node, role); + const enabledPart = node.enabled === false ? 'disabled' : 'enabled'; + const selectedPart = node.selected === true ? 'selected' : 'unselected'; + const hittablePart = node.hittable === true ? 'hittable' : 'not-hittable'; + const depthPart = String(depthOverride ?? node.depth ?? 
0); + return [depthPart, role, textPart, enabledPart, selectedPart, hittablePart].join('|'); +} + +export function buildSnapshotDiff( + previousNodes: SnapshotNode[], + currentNodes: SnapshotNode[], + options: SnapshotDiffOptions = {}, +): SnapshotDiffResult { + const previous = snapshotNodesToLines(previousNodes, options); + const current = snapshotNodesToLines(currentNodes, options); + const lines = diffComparableLinesMyers(previous, current); + const summary: SnapshotDiffSummary = { additions: 0, removals: 0, unchanged: 0 }; + for (const line of lines) { + if (line.kind === 'added') summary.additions += 1; + if (line.kind === 'removed') summary.removals += 1; + if (line.kind === 'unchanged') summary.unchanged += 1; + } + return { summary, lines }; +} + +export function countSnapshotComparableLines( + nodes: SnapshotNode[], + options: SnapshotDiffOptions = {}, +): number { + return snapshotNodesToLines(nodes, options).length; +} + +function snapshotNodesToLines( + nodes: SnapshotNode[], + options: SnapshotDiffOptions, +): SnapshotComparableLine[] { + if (options.flatten) { + return nodes.map((node) => ({ + text: formatSnapshotLine(node, 0, false), + comparable: snapshotNodeToComparableLine(node, 0), + })); + } + return buildSnapshotDisplayLines(nodes).map((line) => ({ + text: line.text, + comparable: snapshotNodeToComparableLine(line.node, line.depth), + })); +} + +function diffComparableLinesMyers( + previous: SnapshotComparableLine[], + current: SnapshotComparableLine[], +): SnapshotDiffLine[] { + // Myers diff is efficient for normal UI snapshots; very large trees may still be expensive. + const n = previous.length; + const m = current.length; + const max = n + m; + const v = new Map<number, number>(); + const trace: Array<Map<number, number>> = []; + v.set(1, 0); + + for (let d = 0; d <= max; d += 1) { + trace.push(new Map(v)); + for (let k = -d; k <= d; k += 2) { + const goDown = k === -d || (k !== d && getV(v, k - 1) < getV(v, k + 1)); + let x = goDown ? 
getV(v, k + 1) : getV(v, k - 1) + 1; + let y = x - k; + while (x < n && y < m && previous[x].comparable === current[y].comparable) { + x += 1; + y += 1; + } + v.set(k, x); + if (x >= n && y >= m) { + return backtrackMyers(trace, previous, current, n, m); + } + } + } + + return []; +} + +function backtrackMyers( + trace: Array<Map<number, number>>, + previous: SnapshotComparableLine[], + current: SnapshotComparableLine[], + n: number, + m: number, +): SnapshotDiffLine[] { + const lines: SnapshotDiffLine[] = []; + let x = n; + let y = m; + + for (let d = trace.length - 1; d >= 0; d -= 1) { + const v = trace[d]; + const k = x - y; + const goDown = k === -d || (k !== d && getV(v, k - 1) < getV(v, k + 1)); + const prevK = goDown ? k + 1 : k - 1; + const prevX = getV(v, prevK); + const prevY = prevX - prevK; + + while (x > prevX && y > prevY) { + lines.push({ kind: 'unchanged', text: current[y - 1].text }); + x -= 1; + y -= 1; + } + + if (d === 0) break; + + if (x === prevX) { + lines.push({ kind: 'added', text: current[prevY].text }); + y = prevY; + } else { + lines.push({ kind: 'removed', text: previous[prevX].text }); + x = prevX; + } + } + + lines.reverse(); + return lines; +} + +function getV(v: Map<number, number>, k: number): number { + return v.get(k) ?? 0; +} diff --git a/src/utils/snapshot-processing.ts b/src/utils/snapshot-processing.ts new file mode 100644 index 00000000..2f16c10d --- /dev/null +++ b/src/utils/snapshot-processing.ts @@ -0,0 +1,138 @@ +import type { Platform } from './device.ts'; +import type { RawSnapshotNode, SnapshotState } from './snapshot.ts'; +import { extractReadableText } from './text-surface.ts'; + +export function findNodeByLabel(nodes: SnapshotState['nodes'], label: string) { + const query = label.toLowerCase(); + return ( + nodes.find((node) => { + const labelValue = (node.label ?? '').toLowerCase(); + const valueValue = (node.value ?? '').toLowerCase(); + const idValue = (node.identifier ?? 
'').toLowerCase(); + return labelValue.includes(query) || valueValue.includes(query) || idValue.includes(query); + }) ?? null + ); +} + +export function resolveRefLabel( + node: SnapshotState['nodes'][number], + nodes: SnapshotState['nodes'], +): string | undefined { + const primary = [node.label, node.value, node.identifier] + .map((value) => (typeof value === 'string' ? value.trim() : '')) + .find((value) => value && value.length > 0); + if (primary && isMeaningfulLabel(primary)) return primary; + const fallback = findNearestMeaningfulLabel(node, nodes); + return fallback ?? (primary && isMeaningfulLabel(primary) ? primary : undefined); +} + +function isMeaningfulLabel(value: string): boolean { + const trimmed = value.trim(); + if (!trimmed) return false; + if (/^(true|false)$/i.test(trimmed)) return false; + if (/^\d+$/.test(trimmed)) return false; + return true; +} + +function findNearestMeaningfulLabel( + target: SnapshotState['nodes'][number], + nodes: SnapshotState['nodes'], +): string | undefined { + if (!target.rect) return undefined; + const targetY = target.rect.y + target.rect.height / 2; + let best: { label: string; distance: number } | null = null; + for (const node of nodes) { + if (!node.rect) continue; + const label = [node.label, node.value, node.identifier] + .map((value) => (typeof value === 'string' ? value.trim() : '')) + .find((value) => value && value.length > 0); + if (!label || !isMeaningfulLabel(label)) continue; + const nodeY = node.rect.y + node.rect.height / 2; + const distance = Math.abs(nodeY - targetY); + if (!best || distance < best.distance) { + best = { label, distance }; + } + } + return best?.label; +} + +export function pruneGroupNodes(nodes: RawSnapshotNode[]): RawSnapshotNode[] { + const skippedDepths: number[] = []; + const result: RawSnapshotNode[] = []; + for (const node of nodes) { + const depth = node.depth ?? 
0; + while (skippedDepths.length > 0 && depth <= skippedDepths[skippedDepths.length - 1]) { + skippedDepths.pop(); + } + const type = normalizeType(node.type ?? ''); + const labelCandidate = [node.label, node.value, node.identifier] + .map((value) => (typeof value === 'string' ? value.trim() : '')) + .find((value) => value && value.length > 0); + const hasMeaningfulLabel = labelCandidate ? isMeaningfulLabel(labelCandidate) : false; + if ((type === 'group' || type === 'ioscontentgroup') && !hasMeaningfulLabel) { + skippedDepths.push(depth); + continue; + } + const adjustedDepth = Math.max(0, depth - skippedDepths.length); + result.push({ ...node, depth: adjustedDepth }); + } + return result; +} + +export function normalizeType(type: string): string { + let value = type.trim().replace(/XCUIElementType/gi, ''); + if (value.startsWith('AX')) { + value = value.slice(2); + } + value = value.toLowerCase(); + const lastSeparator = Math.max(value.lastIndexOf('.'), value.lastIndexOf('/')); + if (lastSeparator !== -1) { + value = value.slice(lastSeparator + 1); + } + return value; +} + +export function isFillableType(type: string, platform: Platform): boolean { + const normalized = normalizeType(type); + if (!normalized) return true; + if (platform === 'android') { + return normalized.includes('edittext') || normalized.includes('autocompletetextview'); + } + return ( + normalized.includes('textfield') || + normalized.includes('securetextfield') || + normalized.includes('searchfield') || + normalized.includes('textview') || + normalized.includes('textarea') || + normalized === 'search' + ); +} + +export function findNearestHittableAncestor( + nodes: SnapshotState['nodes'], + node: SnapshotState['nodes'][number], +): SnapshotState['nodes'][number] | null { + if (node.hittable) return node; + let current = node; + const visited = new Set(); + while (current.parentIndex !== undefined) { + if (visited.has(current.ref)) break; + visited.add(current.ref); + const parent = 
nodes[current.parentIndex]; + if (!parent) break; + if (parent.hittable) return parent; + current = parent; + } + return null; +} + +export function extractNodeText(node: SnapshotState['nodes'][number]): string { + const candidates = [node.label, node.value, node.identifier] + .map((value) => (typeof value === 'string' ? value.trim() : '')) + .filter((value) => value.length > 0); + return candidates[0] ?? ''; +} + +export function extractNodeReadText(node: SnapshotState['nodes'][number]): string { + return extractReadableText(node); +} diff --git a/src/utils/snapshot-visibility.ts b/src/utils/snapshot-visibility.ts new file mode 100644 index 00000000..4b028936 --- /dev/null +++ b/src/utils/snapshot-visibility.ts @@ -0,0 +1,41 @@ +import { buildMobileSnapshotPresentation } from './mobile-snapshot-semantics.ts'; +import type { SnapshotBackend, SnapshotState, SnapshotVisibility } from './snapshot.ts'; + +function isDesktopBackend(backend: SnapshotBackend | undefined): boolean { + return backend === 'macos-helper' || backend === 'linux-atspi'; +} + +export function buildSnapshotVisibility(params: { + nodes: SnapshotState['nodes']; + backend?: SnapshotState['backend']; + snapshotRaw?: boolean; +}): SnapshotVisibility { + const { nodes, backend, snapshotRaw } = params; + if (snapshotRaw || isDesktopBackend(backend)) { + return { + partial: false, + visibleNodeCount: nodes.length, + totalNodeCount: nodes.length, + reasons: [], + }; + } + + const presentation = buildMobileSnapshotPresentation(nodes); + const reasons = new Set(); + if (presentation.hiddenCount > 0) { + reasons.add('offscreen-nodes'); + } + if (presentation.nodes.some((node) => node.hiddenContentAbove)) { + reasons.add('scroll-hidden-above'); + } + if (presentation.nodes.some((node) => node.hiddenContentBelow)) { + reasons.add('scroll-hidden-below'); + } + + return { + partial: reasons.size > 0, + visibleNodeCount: presentation.nodes.length, + totalNodeCount: nodes.length, + reasons: [...reasons], + }; +} 
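As a reading aid for `buildSnapshotVisibility` above, here is a minimal, self-contained sketch of how the `partial` flag and `reasons` list are derived. The `Presentation` and `VisibilitySummary` types and the `summarizeVisibility` name are simplified, hypothetical stand-ins rather than the module's real types; only the three reason strings and the counting logic are taken from the diff.

```typescript
// Hypothetical, simplified stand-ins for the types used by buildSnapshotVisibility.
type VisibilityReason = 'offscreen-nodes' | 'scroll-hidden-above' | 'scroll-hidden-below';

type PresentedNode = { hiddenContentAbove?: boolean; hiddenContentBelow?: boolean };

type Presentation = { nodes: PresentedNode[]; hiddenCount: number };

type VisibilitySummary = {
  partial: boolean;
  visibleNodeCount: number;
  totalNodeCount: number;
  reasons: VisibilityReason[];
};

// Collect each reason at most once; the snapshot is "partial" when any reason applies.
function summarizeVisibility(totalNodeCount: number, presentation: Presentation): VisibilitySummary {
  const reasons = new Set<VisibilityReason>();
  if (presentation.hiddenCount > 0) reasons.add('offscreen-nodes');
  if (presentation.nodes.some((node) => node.hiddenContentAbove)) reasons.add('scroll-hidden-above');
  if (presentation.nodes.some((node) => node.hiddenContentBelow)) reasons.add('scroll-hidden-below');
  return {
    partial: reasons.size > 0,
    visibleNodeCount: presentation.nodes.length,
    totalNodeCount,
    reasons: Array.from(reasons),
  };
}

// Five raw nodes, three presented, one of which reports hidden content below it.
const example = summarizeVisibility(5, {
  hiddenCount: 2,
  nodes: [{}, { hiddenContentBelow: true }, {}],
});
```

Using a `Set` keeps each reason unique even when several nodes report the same condition, so consumers can treat `reasons` as a small list of flags.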
diff --git a/src/utils/validation.ts b/src/utils/validation.ts new file mode 100644 index 00000000..58c5f891 --- /dev/null +++ b/src/utils/validation.ts @@ -0,0 +1,8 @@ +import { AppError } from './errors.ts'; + +export function requireIntInRange(value: number, name: string, min: number, max: number): number { + if (!Number.isFinite(value) || !Number.isInteger(value) || value < min || value > max) { + throw new AppError('INVALID_ARGS', `${name} must be an integer between ${min} and ${max}`); + } + return value; +} diff --git a/test/integration/installed-package-metro.test.ts b/test/integration/installed-package-metro.test.ts index b6180c6f..59e17956 100644 --- a/test/integration/installed-package-metro.test.ts +++ b/test/integration/installed-package-metro.test.ts @@ -49,9 +49,16 @@ async function execFileText( options: { cwd: string }, ): Promise<string> { return await new Promise((resolve, reject) => { - execFile(file, args, { ...options, encoding: 'utf8' }, (error, stdout) => { + execFile(file, args, { ...options, encoding: 'utf8' }, (error, stdout, stderr) => { if (error) { - reject(error); + reject( + new Error( + [error.message, stdout ? `stdout:\n${stdout}` : '', stderr ? 
`stderr:\n${stderr}` : ''] + .filter(Boolean) + .join('\n'), + { cause: error }, + ), + ); return; } resolve(stdout); @@ -274,6 +281,9 @@ test('installed package exposes Node APIs and packaged metro companion entrypoin metroPublicBaseUrl: 'https://public.example.test', metroProxyBaseUrl: `http://127.0.0.1:${bridgePort}`, metroBearerToken: bridgeToken, + tenant: 'tenant-1', + runId: 'run-1', + leaseId: 'lease-1', metroPreparePort: metroPort, metroStatusHost: '127.0.0.1', }), @@ -284,13 +294,30 @@ test('installed package exposes Node APIs and packaged metro companion entrypoin consumerRoot, ['--input-type=module', '-e'], ` + import { createAgentDevice, createMemorySessionStore, restrictedCommandPolicy } from 'agent-device'; import 'agent-device/contracts'; import { daemonCommandRequestSchema } from 'agent-device/contracts'; + import { BACKEND_CAPABILITY_NAMES } from 'agent-device/backend'; + import { commandCatalog, commands, selector } from 'agent-device/commands'; + import { createLocalArtifactAdapter } from 'agent-device/io'; import { buildBundleUrl, buildIosRuntimeHints, normalizeBaseUrl } from 'agent-device/metro'; import { resolveRemoteConfigProfile } from 'agent-device/remote-config'; + import { commandConformanceSuites } from 'agent-device/testing/conformance'; const loaded = resolveRemoteConfigProfile({ configPath: ${JSON.stringify(remoteConfigPath)}, cwd: process.cwd() }); + const device = createAgentDevice({ + backend: { platform: 'ios' }, + artifacts: createLocalArtifactAdapter({ cwd: process.cwd() }), + sessions: createMemorySessionStore(), + policy: restrictedCommandPolicy() + }); console.log(JSON.stringify({ + backendCapabilityCount: BACKEND_CAPABILITY_NAMES.length, bundleUrl: buildIosRuntimeHints('https://public.example.test').bundleUrl, + catalogEntries: commandCatalog.length, + conformanceSuites: commandConformanceSuites.length, + hasBoundScreenshot: typeof device.capture.screenshot, + hasCommandsScreenshot: typeof commands.capture.screenshot, + 
selectorKind: selector('label=Continue').kind, normalizedBaseUrl: normalizeBaseUrl('https://public.example.test///'), protocolBundleUrl: buildBundleUrl('https://public.example.test', 'android'), parsedCommand: daemonCommandRequestSchema.parse({ @@ -306,6 +333,12 @@ test('installed package exposes Node APIs and packaged metro companion entrypoin imports.bundleUrl, 'https://public.example.test/index.bundle?platform=ios&dev=true&minify=false', ); + assert.equal(imports.backendCapabilityCount > 0, true); + assert.equal(imports.catalogEntries > 0, true); + assert.equal(imports.conformanceSuites > 0, true); + assert.equal(imports.hasBoundScreenshot, 'function'); + assert.equal(imports.hasCommandsScreenshot, 'function'); + assert.equal(imports.selectorKind, 'selector'); assert.equal(imports.normalizedBaseUrl, 'https://public.example.test'); assert.equal( imports.protocolBundleUrl, diff --git a/test/integration/replays/android/02-deep-navigation.ad b/test/integration/replays/android/02-deep-navigation.ad index 483bccac..2072595e 100644 --- a/test/integration/replays/android/02-deep-navigation.ad +++ b/test/integration/replays/android/02-deep-navigation.ad @@ -6,7 +6,7 @@ appstate snapshot -i screenshot "./test/screenshots/replays/android-deep-nav-root.png" -click @e16 +click "label=Notifications" wait 1000 snapshot -i find text "Notification history" exists diff --git a/website/docs/docs/client-api.md b/website/docs/docs/client-api.md index 780f2214..657bae21 100644 --- a/website/docs/docs/client-api.md +++ b/website/docs/docs/client-api.md @@ -10,6 +10,14 @@ For remote Metro-backed flows, import the reusable Node APIs instead of spawning Public subpath API exposed for Node consumers: +- `agent-device/commands` + - runtime command namespaces for command semantics as they migrate out of the daemon and CLI layers +- `agent-device/backend` + - backend primitive and policy-gated capability types for local and hosted adapters +- `agent-device/io` + - artifact adapter types, 
file input refs, and file output refs +- `agent-device/testing/conformance` + - conformance suites for backend/runtime parity across capture, selectors, and interactions - `agent-device/metro` - `prepareRemoteMetro(options)` - `ensureMetroTunnel(options)` @@ -50,6 +58,8 @@ Public subpath API exposed for Node consumers: - `agent-device/artifacts` - `resolveAndroidArchivePackageName(archivePath)` +The `contracts`, `selectors`, `finders`, `install-source`, `android-apps`, `artifacts`, `metro`, and `remote-config` subpaths remain available for compatibility. New command-level integrations should prefer the runtime boundary: `agent-device/commands`, `agent-device/backend`, and `agent-device/io`. + ## Basic usage ```ts @@ -83,6 +93,51 @@ const snapshot = await client.capture.snapshot({ interactiveOnly: true }); await client.sessions.close(); ``` +## Runtime command API + +Use `createAgentDevice()` when you want command semantics without the daemon RPC +client. The runtime takes an explicit backend and IO adapter, so service code can +avoid accidental local filesystem or local device access. If no session store is +provided, the runtime uses an isolated in-memory store. 
+ +```ts +import { + createAgentDevice, + createLocalArtifactAdapter, + createMemorySessionStore, + localCommandPolicy, + selector, +} from 'agent-device'; + +const device = createAgentDevice({ + backend, + artifacts: createLocalArtifactAdapter(), + sessions: createMemorySessionStore([{ name: 'default' }]), + policy: localCommandPolicy(), +}); + +await device.capture.screenshot({ + session: 'default', + out: { kind: 'path', path: './screen.png' }, +}); + +await device.selectors.waitForText('Ready', { session: 'default', timeoutMs: 5_000 }); +await device.interactions.click(selector('label=Continue'), { session: 'default' }); +``` + +Implemented runtime namespaces are currently: + +- `capture`: `screenshot`, `diffScreenshot`, `snapshot`, `diffSnapshot` +- `selectors`: `find`, `get`, `getText`, `getAttrs`, `is`, `isVisible`, `isHidden`, `wait`, `waitForText` +- `interactions`: `click`, `press`, `fill`, `typeText` + +Commands that have not migrated are tracked in `commandCatalog` instead of being +exposed as throwing methods. + +Backend authors can use `runCommandConformance()` or `assertCommandConformance()` from +`agent-device/testing/conformance` to verify capture, selector, and interaction +semantics against a prepared fixture app or test backend. + ## Command methods Use `client.command.<method>()` for command-level device actions. It uses the same daemon transport path as the higher-level client methods, including session metadata, tenant/run/lease fields, normalized daemon errors, and remote artifact handling.
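The runtime command API section above notes that unmigrated commands are tracked in `commandCatalog` rather than exposed as throwing methods. The sketch below illustrates that design choice in isolation. Every name in it (`CatalogEntry`, `exampleCatalog`, `buildPublicCommands`, `plannedCommands`) is hypothetical and does not reflect the actual shape of `commandCatalog` in `agent-device`; the `swipe` entry is marked planned purely for illustration.

```typescript
// Hypothetical sketch: these names are illustrative, not the agent-device API.
type CommandStatus = 'implemented' | 'planned';

type CatalogEntry = { name: string; status: CommandStatus };

type Handler = (args: string[]) => string;

// The catalog records every known command, including ones not yet migrated.
const exampleCatalog: readonly CatalogEntry[] = [
  { name: 'screenshot', status: 'implemented' },
  { name: 'click', status: 'implemented' },
  { name: 'swipe', status: 'planned' },
];

// Handlers exist only for implemented commands.
const handlers: Record<string, Handler> = {
  screenshot: () => 'captured',
  click: (args) => `clicked ${args.join(' ')}`,
};

// Build the public surface from the catalog. Planned entries are skipped, so
// callers never receive a method that throws "not implemented" at runtime.
function buildPublicCommands(
  catalog: readonly CatalogEntry[],
  available: Record<string, Handler>,
): Record<string, Handler> {
  const api: Record<string, Handler> = {};
  for (const entry of catalog) {
    if (entry.status !== 'implemented') continue;
    const handler = available[entry.name];
    if (handler) api[entry.name] = handler;
  }
  return api;
}

// Planned commands stay discoverable through the catalog instead of the API object.
function plannedCommands(catalog: readonly CatalogEntry[]): string[] {
  return catalog.filter((entry) => entry.status === 'planned').map((entry) => entry.name);
}

const publicCommands = buildPublicCommands(exampleCatalog, handlers);
```

With this split, a feature check is a plain property lookup on the public object, and tooling that wants to show upcoming commands reads the catalog instead of probing for methods that throw.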