[PM-34487] llm: Add Android device interaction MCP server with ADB tooling#6747
SaintPatrck wants to merge 10 commits into main
Bitwarden Claude Code Review
Overall Assessment: APPROVE
Reviewed 26 new files (2661 additions) adding a TypeScript MCP server for Android device interaction via ADB. The server provides 6 tools (capture, find_element, tap_element, tap_at, navigate, input_text) with structured XML parsing, dumpsys window parsing, and two-layer obstruction detection. Security posture is sound:
Code Review Details
No findings met the confidence threshold for inline comments. Potential concerns evaluated and dismissed:
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #6747 +/- ##
==========================================
+ Coverage 84.75% 85.58% +0.83%
==========================================
Files 944 820 -124
Lines 60715 58162 -2553
Branches 8569 8504 -65
==========================================
- Hits 51456 49776 -1680
+ Misses 6293 5436 -857
+ Partials 2966 2950 -16
Flags with carried forward coverage won't be shown.
…ripts
Provides Claude with structured guidance for Android device interaction via ADB (capturing UI state, tapping elements, navigating, and verifying screens), with scoped allowed-tools and helper scripts that reduce token overhead on repetitive operations.
Branch updated from d527819 to 483c8e7.
.claude/skills/interacting-with-android-device/scripts/adb-find-element.sh (outdated, resolved)
.claude/skills/interacting-with-android-device/scripts/adb-tap-and-capture.sh (outdated, resolved)
.claude/skills/interacting-with-android-device/scripts/adb-navigate.sh (outdated, resolved)
Checkmarx reported 129 new issues in this pull request.
- `adb-capture.sh [--xml] [--screenshot] [--all]` - Capture current device state. Default (no flags): both screenshot and XML hierarchy.
- `adb-find-element.sh <text>` - Find element by `text` or `content-desc`, return center coordinates (`X Y`). Dumps UI hierarchy, parses XML, calculates center from bounds.
- `adb-tap-and-capture.sh <x> <y> [wait_seconds=2]` - Tap at coordinates, wait, capture and pull screenshot.
- `adb-tap-element.sh <text> [wait_seconds=2]` - Find, tap, and capture in one command (recommended). Combines `adb-find-element.sh` + `adb-tap-and-capture.sh`.
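The find-by-text step described for `adb-find-element.sh` can be sketched in TypeScript as below. This is a minimal sketch, not the script or the server code: it assumes the hierarchy has already been flattened into hypothetical `UiNode` records, that matching is exact rather than substring, and that `bounds` uses UIAutomator's `[left,top][right,bottom]` string form.

```typescript
/** Screen rectangle in pixels (assumed convention). */
interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

/** Hypothetical flattened node; the real server's node type may differ. */
interface UiNode {
  text: string;
  contentDesc: string;
  bounds: string; // UIAutomator format, e.g. "[42,100][242,160]"
}

/** Parses a UIAutomator `bounds` attribute; returns null if malformed. */
function parseBounds(bounds: string): Rect | null {
  const m = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds.trim());
  if (m === null) return null;
  return { left: Number(m[1]), top: Number(m[2]), right: Number(m[3]), bottom: Number(m[4]) };
}

/** Integer center point, suitable for `adb shell input tap X Y`. */
function centerOf(r: Rect): { x: number; y: number } {
  return { x: Math.floor((r.left + r.right) / 2), y: Math.floor((r.top + r.bottom) / 2) };
}

/** Center of the first node whose text or content-desc exactly equals `query`. */
function findElementCenter(nodes: UiNode[], query: string): { x: number; y: number } | null {
  for (const node of nodes) {
    if (node.text === query || node.contentDesc === query) {
      const rect = parseBounds(node.bounds);
      if (rect !== null) return centerOf(rect);
    }
  }
  return null;
}
```

For example, a node with `text="Log in"` and bounds `[42,100][242,160]` yields the tap point (142, 130).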
Maybe this is getting too far into the weeds, but are we at all concerned about z-order issues?
I know we have seen that in our own tests, where the FAB is on top of another button, so when we click those coordinates, we expect the button to be clicked but we actually click the FAB.
Yes. I just ran into that. Claude figured out how to manually adjust the touch coordinates pretty quickly given the other instructions and commands that are present. I'll see if there's a relatively easy way to mitigate it.
Pivoted to an MCP server because the bash was getting crrraaaazy. This made handling z-order easier, and Claude is more likely to use them without direct instruction thanks to built-in MCP discovery. Converted the skill into an "instruction manual" for how to use the MCP server and perform common interactions. 🚀
Improve the reliability of automated UI interactions by verifying that the target element is the topmost clickable component at the calculated tap coordinates. This prevents "clickjacking" or failed interactions when the desired element is obscured by overlays, dialogs, or other UI containers.
- Implement a search within `view.xml` to identify the topmost clickable element at the target (X, Y) coordinate, following UIAutomator document order, where the last element drawn is the one that receives the touch event.
- Add validation logic to determine whether the topmost element matches the intended target based on text, content description, or identical boundary coordinates (supporting Compose parent wrappers).
- Introduce a new exit code `3` to signal that an element is obstructed, allowing callers to handle obscured states gracefully.
- Provide detailed diagnostic information on `stderr` when an obstruction is detected, including the text, resource ID, or package name of the obstructing element.
- Maintain backward compatibility by continuing to output the target center coordinates to `stdout`.
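The topmost-clickable check in this commit could look roughly like the following TypeScript sketch. The `ViewNode` shape, the half-open containment test, and the helper names are assumptions for illustration; only the rules themselves (last clickable node in document order wins, match on text, content description, or identical bounds, exit code `3` on obstruction) come from the commit message.

```typescript
/** Screen rectangle in pixels; right/bottom treated as exclusive (assumption). */
interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

/** Hypothetical flattened view of one UIAutomator <node>, kept in document order. */
interface ViewNode {
  text: string;
  contentDesc: string;
  resourceId: string;
  clickable: boolean;
  rect: Rect;
}

const EXIT_OK = 0;
const EXIT_OBSTRUCTED = 3; // exit code named in the commit message

function contains(r: Rect, x: number, y: number): boolean {
  return x >= r.left && x < r.right && y >= r.top && y < r.bottom;
}

function sameRect(a: Rect, b: Rect): boolean {
  return a.left === b.left && a.top === b.top && a.right === b.right && a.bottom === b.bottom;
}

/** Last clickable node containing (x, y); later nodes are assumed to be drawn on top. */
function topmostClickableAt(nodes: ViewNode[], x: number, y: number): ViewNode | null {
  let hit: ViewNode | null = null;
  for (const node of nodes) {
    if (node.clickable && contains(node.rect, x, y)) hit = node;
  }
  return hit;
}

/** The hit counts as the target on matching text, content-desc, or identical bounds. */
function isTarget(target: ViewNode, hit: ViewNode): boolean {
  return (
    (target.text !== "" && hit.text === target.text) ||
    (target.contentDesc !== "" && hit.contentDesc === target.contentDesc) ||
    sameRect(target.rect, hit.rect) // Compose: clickable parent wrapping a text child
  );
}

/** Returns the exit code; prints the obstructing element to stderr when blocked. */
function checkObstruction(nodes: ViewNode[], target: ViewNode, x: number, y: number): number {
  const hit = topmostClickableAt(nodes, x, y);
  if (hit === null || isTarget(target, hit)) return EXIT_OK;
  console.error(`Obstructed by: ${hit.text || hit.contentDesc || hit.resourceId || "unlabeled element"}`);
  return EXIT_OBSTRUCTED;
}
```

With a Save button at `[0,2000][1080,2200]` and a FAB at `[900,1950][1040,2150]` listed after it, tapping (950, 2100) returns `3`, while tapping (540, 2100) returns `0`.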
.claude/skills/interacting-with-android-device/scripts/adb-navigate.sh (outdated, resolved)
.claude/skills/interacting-with-android-device/scripts/adb-find-element.sh (outdated, resolved)
.claude/skills/interacting-with-android-device/scripts/adb-tap-element.sh (outdated, resolved)
Replace 5 bash scripts with a TypeScript MCP server for Android device interaction.
The shell scripts hit limitations parsing dumpsys window output and XML hierarchies with grep/awk (false positives on window type matching, wallpaper frame detection, etc.).
The MCP server uses:
- fast-xml-parser for UIAutomator XML hierarchy traversal
- Structured line-by-line dumpsys window parser with typed WindowInfo
- Native geometry types (Rect, Point) for obstruction detection
- Two-layer obstruction detection: system overlays via dumpsys touchable regions + in-app elements via XML topmost-clickable-at-point
- Visible region computation with 4-strip candidate selection
5 tools: capture, find_element, tap_at, tap_element, navigate
57 unit tests across geometry, parsers, and ADB discovery.
Designed for future promotion to ai-marketplace plugin.
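A line-by-line `dumpsys window windows` parser of the kind this commit describes might start like this. It is a sketch under assumptions: the trimmed `WindowInfo` fields are hypothetical, and the exact line shapes (`Window #N Window{hash u0 name}:` headers and `touchable region=SkRegion((l,t,r,b)...)` lines) are assumed and should be checked against real output, which varies across Android versions.

```typescript
interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

/** Hypothetical, trimmed-down window record (the PR's WindowInfo likely has more fields). */
interface WindowInfo {
  name: string;
  touchable: Rect[];
}

// Assumed line shapes; verify against real `dumpsys window windows` output per API level.
const WINDOW_HEADER = /^\s*Window #\d+ Window\{\S+ \S+ (.+)\}:\s*$/;
const TOUCHABLE_REGION = /touchable region=SkRegion\((.*)\)/;
const RECT_TUPLE = /\((-?\d+),(-?\d+),(-?\d+),(-?\d+)\)/g;

/** Walks the dump line by line, attaching each touchable region to the window above it. */
function parseWindows(dump: string): WindowInfo[] {
  const windows: WindowInfo[] = [];
  let current: WindowInfo | null = null;
  for (const line of dump.split(/\r?\n/)) {
    const header = WINDOW_HEADER.exec(line);
    if (header !== null) {
      current = { name: header[1] ?? "", touchable: [] };
      windows.push(current);
      continue;
    }
    if (current === null) continue; // preamble before the first window header
    const region = TOUCHABLE_REGION.exec(line);
    if (region === null) continue;
    // An SkRegion can hold several rectangles, e.g. SkRegion((0,0,10,10)(0,10,5,20)).
    for (const m of (region[1] ?? "").matchAll(RECT_TUPLE)) {
      current.touchable.push({
        left: Number(m[1]),
        top: Number(m[2]),
        right: Number(m[3]),
        bottom: Number(m[4]),
      });
    }
  }
  return windows;
}
```

Tracking the "current window" explicitly, rather than grepping for a window type and a region independently, is what keeps a region from being attributed to the wrong window.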
Addresses code review feedback: find_element and tap_element shared ~30 lines of identical hierarchy dump + parse + obstruction detection logic.
New tool for typing text into focused fields via adb shell input text.
Escape $, backtick, and backslash in addition to double quotes before passing text to adb shell. Without this, input like p@ss$WORD silently expands the empty $WORD variable. Also fix stale .claude/mcp.json reference in SKILL.md — file lives at .mcp.json (project root).
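The escaping rule in this commit can be expressed as a single regex pass. In this sketch, `escapeForDoubleQuotes` and `inputTextArgs` are hypothetical names, and it assumes the text is wrapped in double quotes and passed to `adb shell` as one argument, so the device's shell is the only shell that parses it.

```typescript
/**
 * Backslash-escapes the four characters that remain special inside a POSIX
 * double-quoted string: backslash, double quote, dollar sign, and backtick.
 * A single regex pass avoids re-escaping backslashes added for other characters.
 */
function escapeForDoubleQuotes(text: string): string {
  return text.replace(/[\\"$`]/g, (ch) => `\\${ch}`);
}

/**
 * Hypothetical argv builder for execFile("adb", args). The device-side command
 * is a single argument; the device shell strips the quotes and escapes before
 * `input text` receives the literal string.
 */
function inputTextArgs(text: string, serial?: string): string[] {
  const device = serial !== undefined ? ["-s", serial] : [];
  return [...device, "shell", `input text "${escapeForDoubleQuotes(text)}"`];
}
```

With this, `p@ss$WORD` is sent as `input text "p@ss\$WORD"`, so the device shell no longer expands `$WORD` to an empty string.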


🎟️ Tracking
https://bitwarden.atlassian.net/browse/PM-34487
📔 Objective
Add a TypeScript MCP server and companion skill for structured Android device interaction via ADB. Replaces the initial shell script approach with proper XML parsing, structured `dumpsys window` parsing, and native geometry for obstruction detection.
🏗️ Architecture
🔧 MCP Tools
- `capture`
- `find_element`
- `tap_element`
- `tap_at`
- `navigate`
- `input_text`
🛡️ Obstruction Detection
Two-layer system for detecting when UI elements are blocked:
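- System overlays: `dumpsys window windows` for TalkBack, PiP, and accessibility overlays, using `touchable region` bounds.
- In-app elements: the topmost clickable element at the tap point in the UIAutomator XML hierarchy.
When obstructed, computes the largest visible strip (top/bottom/left/right of the obstructor) and returns adjusted tap coordinates.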
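The strip selection described above can be sketched as follows, assuming a single rectangular obstructor and choosing the candidate by area. `Rect`, `Point`, and the function names are illustrative; how the server handles multiple obstructors or ties is not described here.

```typescript
interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

interface Point {
  x: number;
  y: number;
}

function area(r: Rect): number {
  return Math.max(0, r.right - r.left) * Math.max(0, r.bottom - r.top);
}

/**
 * Clips the target to each side of the obstructor (above, below, left, right)
 * and keeps the candidate with the largest area. Null means fully covered.
 */
function largestVisibleStrip(target: Rect, obstructor: Rect): Rect | null {
  const candidates: Rect[] = [
    { ...target, bottom: Math.min(target.bottom, obstructor.top) }, // above
    { ...target, top: Math.max(target.top, obstructor.bottom) }, // below
    { ...target, right: Math.min(target.right, obstructor.left) }, // left
    { ...target, left: Math.max(target.left, obstructor.right) }, // right
  ];
  let best: Rect | null = null;
  for (const c of candidates) {
    if (area(c) > 0 && (best === null || area(c) > area(best))) best = c;
  }
  return best;
}

/** Center of the largest visible strip, or null when no part of the target is tappable. */
function adjustedTapPoint(target: Rect, obstructor: Rect): Point | null {
  const strip = largestVisibleStrip(target, obstructor);
  if (strip === null) return null;
  return {
    x: Math.floor((strip.left + strip.right) / 2),
    y: Math.floor((strip.top + strip.bottom) / 2),
  };
}
```

For a Save button at `[0,2000][1080,2200]` covered by a FAB at `[900,1950][1040,2150]`, the left strip `[0,2000][900,2200]` has the largest area, so the adjusted tap point is (450, 2100).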
📁 Key Files
- `.mcp.json`: MCP server config (stdio, auto-build)
- `.claude/mcp/android-device-server/`: TypeScript MCP server (self-contained for future plugin extraction)
- `.claude/skills/interacting-with-android-device/SKILL.md`: Companion skill with tool docs and `allowed-tools`
✅ Testing
- `fast-xml-parser` for UIAutomator XML → typed tree
- `execFile` (not `exec`) prevents host-side command injection
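The `execFile` point can be illustrated with Node's `child_process` API. `tapArgs` and `adb` are hypothetical helpers, and the sketch assumes `adb` is on `PATH`. Because arguments are passed as an array with no host shell, a value such as a device serial is never interpreted by a host shell; text forwarded to the device's own shell still needs device-side escaping, as the `input_text` fix describes.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

/** Argument vector for a tap; each value is its own argv entry, no host shell involved. */
function tapArgs(x: number, y: number, serial?: string): string[] {
  if (!Number.isInteger(x) || !Number.isInteger(y)) {
    throw new Error(`tap coordinates must be integers, got ${x},${y}`);
  }
  const device = serial !== undefined ? ["-s", serial] : [];
  return [...device, "shell", "input", "tap", String(x), String(y)];
}

/** Runs adb directly (no /bin/sh on the host), so shell metacharacters stay literal. */
async function adb(args: string[]): Promise<string> {
  const { stdout } = await execFileAsync("adb", args);
  return stdout;
}
```

Usage would be along the lines of `await adb(tapArgs(450, 2100, "emulator-5554"))`; with `exec`, the same call would be a single string handed to the host shell.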