Commit d527819

[PM-34487] llm: Add interacting-with-android-device skill with ADB scripts
Provides Claude with structured guidance for Android device interaction via ADB — capturing UI state, tapping elements, navigating, and verifying screens — with scoped allowed-tools and helper scripts that reduce token overhead on repetitive operations.
1 parent 5d7ea8f commit d527819

6 files changed

Lines changed: 397 additions & 0 deletions

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
---
name: interacting-with-android-device
description: Instructions for capturing UI state, comparing with mocks, and interacting with an Android device using universal ADB commands.
allowed-tools:
  - Bash(adb *)

  - Bash(./.claude/skills/interacting-with-android-device/scripts/adb-*)
  - Bash(sleep *)
  - Bash(./gradlew install*)
  - Read
  - Glob
---

# Interacting with Android Device

## Quick Start: Using Helper Scripts

Helper scripts in the `.claude/skills/interacting-with-android-device/scripts/` directory automate repetitive UI testing tasks and reduce token overhead.

**Available scripts:**
- `adb-capture.sh [--xml] [--screenshot] [--all]` - Capture current device state. Default (no flags): both screenshot and XML hierarchy.
- `adb-find-element.sh <text>` - Find the first element whose text contains `<text>` and print its center coordinates (`X Y`). Dumps UI hierarchy, parses XML, calculates center from bounds.
- `adb-tap-and-capture.sh <x> <y> [wait_seconds=2]` - Tap at coordinates, wait, capture and pull screenshot.
- `adb-tap-element.sh <text> [wait_seconds=2]` - Find, tap, and capture in one command (recommended). Combines `adb-find-element.sh` + `adb-tap-and-capture.sh`.
- `adb-navigate.sh <home|back|app-drawer> [wait_seconds=1]` - Navigation actions via keyevent or swipe, then capture screenshot.

**Use these scripts instead of inlining commands** to save tokens and reduce mechanical steps.

To use `adb-find-element.sh` for manual coordinate extraction:
```bash
# The script prints "X Y" on stdout; split that into two variables
COORDS=$(./.claude/skills/interacting-with-android-device/scripts/adb-find-element.sh "Check for update")
read -r X Y <<< "$COORDS"
adb shell input tap "$X" "$Y"
```

## 1. Capturing Current State
To understand what is currently on the device:
```bash
# Capture both screenshot and UI hierarchy XML
./.claude/skills/interacting-with-android-device/scripts/adb-capture.sh

# Or capture only one
./.claude/skills/interacting-with-android-device/scripts/adb-capture.sh --xml  # UI hierarchy only
./.claude/skills/interacting-with-android-device/scripts/adb-capture.sh --screenshot  # Screenshot only
```
* Read `view.xml` to find coordinates (`bounds`) and properties (like `text` or `resource-id`) of UI elements.
* Use `screen.png` for visual verification against design mocks.

## 2. Interacting with the Device

### Using Scripts (Recommended)
Use helper scripts to reduce token overhead and automate mechanical steps:

* **Find and tap an element by text**:
  ```bash
  ./.claude/skills/interacting-with-android-device/scripts/adb-tap-element.sh "System"
  ```
  This finds the element, taps it, and captures a screenshot, all in one command.

* **Tap at specific coordinates**:
  ```bash
  ./.claude/skills/interacting-with-android-device/scripts/adb-tap-and-capture.sh 332 1367 2
  ```
  Parameters: `<x> <y> [wait_seconds]`

* **Navigate (home, back, app-drawer)**:
  ```bash
  ./.claude/skills/interacting-with-android-device/scripts/adb-navigate.sh home
  ./.claude/skills/interacting-with-android-device/scripts/adb-navigate.sh back
  ./.claude/skills/interacting-with-android-device/scripts/adb-navigate.sh app-drawer
  ```

### Raw Commands (When Scripts Aren't Sufficient)

* **Finding Coordinates**: From the dumped XML, find the `bounds` attribute of the element you want to interact with. The bounds are in `[left,top][right,bottom]` format. Use the center point for a tap: `x = (left + right) / 2`, `y = (top + bottom) / 2`.
* **Inputting Text**: First tap the text field, then run `adb shell input text "<your_text>"`. Note: the argument is parsed again by the device shell, so host-side quotes alone do not preserve spaces. Encode each space as `%s` (for example `hello%sworld`) and escape any shell metacharacters.
* **Key Events** (if not using navigate script):
  * Back: `adb shell input keyevent 4`
  * Home: `adb shell input keyevent 3`
  * Enter: `adb shell input keyevent 66`
* **Scrolling/Swiping**: Use `adb shell input swipe <x1> <y1> <x2> <y2> <duration_ms>` where:
  * `(x1, y1)` = starting point
  * `(x2, y2)` = ending point
  * `duration_ms` = duration in milliseconds (1000ms is typical; adjust for speed/distance)
  * **Note**: For expanding containers/drawers, use large distances (e.g., 2400→300 for a 2992px tall screen)

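The space handling for `adb shell input text` could be wrapped in a small helper. This is a minimal sketch rather than part of the helper scripts: `encode_input_text` is a hypothetical name, and it only rewrites spaces as `%s`. Quotes, `&`, `;` and other shell metacharacters would still need escaping before the text reaches the device shell.

```bash
# Hypothetical helper (not in this commit): encode spaces for `input text`
encode_input_text() {
  # `input text` treats the two-character sequence %s as a space
  printf '%s\n' "$1" | sed 's/ /%s/g'
}

# Example (requires a connected device and a focused text field):
#   adb shell input text "$(encode_input_text 'hello world')"
```

The helper prints the encoded string on stdout, so it can be used inside a command substitution as shown in the comment.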
## 3. Verification Workflow
Follow these steps for a complete UI test:
1. **Build and Install**: Ensure the latest version of the app is running: `./gradlew installDebug`.
2. **Inspect**: Run `adb-capture.sh` to dump the UI hierarchy and take a screenshot.
3. **Compare**: Check the current UI against any mock image files in the project.
4. **Interact**: Perform an action (like a button click) using the calculated coordinates and `adb shell input tap`.
5. **Wait**: Sleep for a second (`sleep 1`) to allow for animations or network transitions.
6. **Verify**: Dump the UI hierarchy again to confirm the UI has updated as expected (e.g., a new screen is shown, or a success message appeared in the XML).

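Where a fixed `sleep 1` is too short or too long, the Wait and Verify steps can be combined into a polling loop. The following is a sketch under assumptions: `wait_until` and `ui_contains` are hypothetical names that are not part of the helper scripts, and the timeout and interval must be whole seconds.

```bash
# Hypothetical helper: retry a command until it succeeds or the timeout elapses.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout_s="$1"
  local interval_s="$2"
  shift 2
  local waited=0
  until "$@"; do
    # Give up once the accumulated wait reaches the timeout
    [ "$waited" -ge "$timeout_s" ] && return 1
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  return 0
}

# Hypothetical check: dump the hierarchy and look for a literal string in it
ui_contains() {
  adb shell uiautomator dump /sdcard/view.xml > /dev/null 2>&1 &&
    adb pull /sdcard/view.xml . > /dev/null 2>&1 &&
    grep -qF -- "$1" view.xml
}

# Example (requires a connected device):
#   wait_until 10 1 ui_contains "Check for update" || echo "UI did not update" >&2
```

Polling returns as soon as the expected text appears, so a generous timeout does not slow down the common case.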
## 4. Examples

### Example: Navigate to Settings and Check for Updates
**Using scripts:**
```bash
# Go to home screen
./.claude/skills/interacting-with-android-device/scripts/adb-navigate.sh home

# Open app drawer
./.claude/skills/interacting-with-android-device/scripts/adb-navigate.sh app-drawer

# Find and tap "Settings" app
./.claude/skills/interacting-with-android-device/scripts/adb-tap-element.sh "Settings" 2

# Find and tap "System" option
./.claude/skills/interacting-with-android-device/scripts/adb-tap-element.sh "System" 2

# Find and tap "Software updates"
./.claude/skills/interacting-with-android-device/scripts/adb-tap-element.sh "Software updates" 2

# Find and tap "Check for update" button
./.claude/skills/interacting-with-android-device/scripts/adb-tap-element.sh "Check for update" 5
```

### Example: Swiping
For swipe gestures not covered by the navigation script:
```bash
adb shell input swipe 672 2800 672 500 1000 && sleep 1 && adb shell screencap -p /sdcard/screen.png && adb pull /sdcard/screen.png .
```

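The coordinates in the swipe above appear to assume a screen about 1344 px wide and 2992 px tall (an inference from `672` and the 2992 px figure mentioned earlier). On other devices the swipe points could be derived from the reported screen size instead. This is a rough sketch: `swipe_up_coords` is a hypothetical helper, and the 90% and 15% fractions are illustrative choices, not values taken from the scripts.

```bash
# Hypothetical helper: turn "<width>x<height>" into "x1 y1 x2 y2" for an upward swipe
swipe_up_coords() {
  local w="${1%x*}"
  local h="${1#*x}"
  # Start at 90% of the height, end at 15%, along the vertical center line
  echo "$((w / 2)) $((h * 90 / 100)) $((w / 2)) $((h * 15 / 100))"
}

# Example (requires a connected device):
#   size=$(adb shell wm size | awk '/Physical size/ { print $3 }' | tr -d '\r')
#   adb shell input swipe $(swipe_up_coords "$size") 1000
```

The unquoted `$(swipe_up_coords "$size")` in the example is intentional: the four numbers need to be split into separate arguments.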
## 5. Best Practices

### Coordinate Calculation
* Always calculate coordinates from the `bounds` attribute in the XML dump, as layouts can vary across different screen sizes.
* Parse bounds format `[left,top][right,bottom]` and compute center: `x = (left + right) / 2`, `y = (top + bottom) / 2`
* Use shell tools to programmatically extract coordinates rather than estimating from screenshots
* When multiple instances of an element exist (e.g., in prediction row and full list), verify you're using the correct one by checking the context

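The center calculation can be done in the shell without estimating from a screenshot. A minimal sketch, assuming a `bounds` value copied out of `view.xml`; `bounds_center` is a hypothetical helper and the sample bounds are made up for illustration.

```bash
# Hypothetical helper: print the integer center "X Y" of a "[l,t][r,b]" bounds string
bounds_center() {
  # Replace every character that is not a digit or minus sign with a space,
  # leaving four numbers: left top right bottom
  set -- $(printf '%s\n' "$1" | sed 's/[^0-9-]/ /g')
  [ "$#" -eq 4 ] || return 1
  echo "$(( ($1 + $3) / 2 )) $(( ($2 + $4) / 2 ))"
}

bounds_center '[84,1200][580,1534]'   # prints "332 1367"
```

The function returns a non-zero status when the input does not contain exactly four numbers, so a malformed `bounds` value is not silently turned into a tap.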
### Command Chaining and Efficiency
* For custom operations not covered by scripts, combine tap + wait + capture:
  ```bash
  adb shell input tap X Y && sleep N && adb shell screencap -p /sdcard/screen.png && adb pull /sdcard/screen.png .
  ```
* Always include a sleep duration (typically 1-5 seconds) between tap and capture to allow animations and transitions to complete
* Pull the screenshot immediately after capture to avoid losing transient UI states

### Navigation and State Evaluation
* **Dump XML before interaction**: Always extract the UI hierarchy before tapping to find precise element locations
* **Verify after each interaction**: Don't assume an action succeeded; capture a screenshot after every tap to confirm the correct element was activated and the UI changed as expected
* **Check both visual and structural state**: Use screenshot for visual verification, XML dump for structural confirmation (element presence, text content, state changes)
* **Identify navigation failures early**: If a tap opened the wrong screen, use back button (`adb shell input keyevent 4`) to recover immediately rather than continuing with an incorrect state

### Interaction Patterns
* **Scrolling before interaction**: When looking for an element, check if it's visible on screen first. If not, scroll using swipe gestures to reveal it
* **Use consistent scroll direction**: For vertical scrolling in lists/settings, swipe upward (higher Y → lower Y) to scroll the content down
* **Handle app crashes gracefully**: Some apps may fail to launch. Don't retry the same action; use back button and try an alternative approach
* **Sanitize Input**: When using `adb shell input text`, be mindful of special characters that might need escaping in a terminal shell
* **Check Accessibility**: Use the `content-desc` and `text` properties in the XML hierarchy to ensure the UI is accessible for screen readers

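The accessibility check can be approximated with a quick scan of the dump for clickable nodes that have neither `text` nor `content-desc`. This is a rough sketch, not a full audit: `unlabeled_clickables` is a hypothetical helper, it relies on `grep -o` (available in GNU and BSD grep), and it treats the XML as plain text rather than parsing it.

```bash
# Hypothetical helper: list clickable nodes with empty text and empty content-desc
# Usage: unlabeled_clickables [path-to-dump]   (defaults to view.xml)
unlabeled_clickables() {
  # One <node ...> tag per output line, then keep only the unlabeled clickable ones.
  # The leading space avoids matching attributes such as long-clickable.
  grep -o '<node [^>]*>' "${1:-view.xml}" |
    grep ' clickable="true"' |
    grep ' text=""' |
    grep ' content-desc=""'
}
```

Each printed tag still carries its `bounds` and `resource-id`, which helps locate the element in the screenshot. A node flagged here may still be labeled through a child view, so treat the output as candidates to review.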
## 6. Troubleshooting

### Device Not Connected
If `adb devices` returns an empty list:
* Check USB connection or emulator status
* Enable USB debugging on the device (Settings > Developer Options > USB Debugging)
* Accept the RSA key prompt on the device if asked
* Restart the device or disconnect/reconnect the USB cable
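
A device can also appear in the list without being usable, for example in the `unauthorized` or `offline` state. A minimal sketch for counting only usable entries, assuming the plain `adb devices` output format (serial, tab, state); `count_ready_devices` is a hypothetical helper.

```bash
# Hypothetical helper: count devices in the "device" state from `adb devices` output on stdin
count_ready_devices() {
  # Device lines have exactly two fields: <serial> <state>
  awk 'NF == 2 && $2 == "device" { n++ } END { print n + 0 }'
}

# Example (requires adb on PATH):
#   adb devices | count_ready_devices
```

A result of `0` alongside a non-empty list usually points to the RSA prompt or an offline emulator rather than a cabling problem.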
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
#!/bin/bash
# Capture current device state (screenshot and/or UI hierarchy)
# Usage: ./adb-capture.sh [--xml] [--screenshot] [--all]
# Default (no flags): captures both screenshot and XML hierarchy

CAPTURE_XML=false
CAPTURE_SCREENSHOT=false

# Parse flags
if [ $# -eq 0 ]; then
    CAPTURE_XML=true
    CAPTURE_SCREENSHOT=true
else
    for arg in "$@"; do
        case $arg in
            --xml)
                CAPTURE_XML=true
                ;;
            --screenshot)
                CAPTURE_SCREENSHOT=true
                ;;
            --all)
                CAPTURE_XML=true
                CAPTURE_SCREENSHOT=true
                ;;
            *)
                echo "Usage: $0 [--xml] [--screenshot] [--all]"
                echo "Default (no flags): captures both screenshot and XML hierarchy"
                exit 1
                ;;
        esac
    done
fi

# Check if adb is in PATH
if ! command -v adb &> /dev/null; then
    if [ -x ~/Library/Android/sdk/platform-tools/adb ]; then
        ADB=~/Library/Android/sdk/platform-tools/adb
    elif [ -x /usr/local/bin/adb ]; then
        ADB=/usr/local/bin/adb
    else
        echo "Error: adb not found. Install Android SDK or add platform-tools to PATH."
        exit 1
    fi
else
    ADB=adb
fi

if [ "$CAPTURE_XML" = true ]; then
    echo "Dumping UI hierarchy..."
    "$ADB" shell uiautomator dump /sdcard/view.xml && "$ADB" pull /sdcard/view.xml . || exit 1  # stop before reporting success if either step fails
    echo "UI hierarchy saved to: $(pwd)/view.xml"
fi

if [ "$CAPTURE_SCREENSHOT" = true ]; then
    echo "Capturing screenshot..."
    "$ADB" shell screencap -p /sdcard/screen.png && "$ADB" pull /sdcard/screen.png . || exit 1  # stop before reporting success if either step fails
    echo "Screenshot saved to: $(pwd)/screen.png"
fi
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
#!/bin/bash
# Find element by text and return center coordinates
# Usage: ./adb-find-element.sh "Search Text"

if [ -z "$1" ]; then
    echo "Usage: $0 \"Element Text\"" >&2
    exit 1
fi

SEARCH_TEXT="$1"

# Check if adb is in PATH
if ! command -v adb &> /dev/null; then
    # Try common default paths
    if [ -x ~/Library/Android/sdk/platform-tools/adb ]; then
        ADB=~/Library/Android/sdk/platform-tools/adb
    elif [ -x /usr/local/bin/adb ]; then
        ADB=/usr/local/bin/adb
    else
        echo "Error: adb not found. Install Android SDK or add platform-tools to PATH." >&2
        exit 1
    fi
else
    ADB=adb
fi

# Dump UI hierarchy; stop if the dump or the pull fails
"$ADB" shell uiautomator dump /sdcard/view.xml > /dev/null 2>&1 && "$ADB" pull /sdcard/view.xml . > /dev/null 2>&1 || { echo "ERROR: UI dump failed" >&2; exit 1; }
echo "UI hierarchy saved to: $(pwd)/view.xml" >&2

# Extract coordinates using Python; the search text is passed via the environment so quotes in it cannot break the script
SEARCH_TEXT="$SEARCH_TEXT" python3 << 'EOF'
import os
import sys
import xml.etree.ElementTree as ET
try:
    root = ET.parse('view.xml').getroot()
    found = False

    for node in root.iter():
        text = node.get('text', '')
        if os.environ['SEARCH_TEXT'] in text:
            bounds = node.get('bounds')
            if bounds:
                bounds_str = bounds.strip('[]')
                parts = bounds_str.split('][')
                left_top = parts[0].strip('[]').split(',')
                right_bottom = parts[1].strip('[]').split(',')
                left, top = int(left_top[0]), int(left_top[1])
                right, bottom = int(right_bottom[0]), int(right_bottom[1])
                center_x = (left + right) // 2
                center_y = (top + bottom) // 2
                print(f"{center_x} {center_y}")
                found = True
                break

    if not found:
        print("ERROR: Element not found", file=sys.stderr)
        sys.exit(1)
except Exception as e:
    print(f"ERROR: {e}", file=sys.stderr)
    sys.exit(1)
EOF
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
#!/bin/bash
# Common navigation actions
# Usage: ./adb-navigate.sh <action> [wait_seconds]
# Actions: home, back, app-drawer

if [ -z "$1" ]; then
    echo "Usage: $0 <action> [wait_seconds]"
    echo "Actions:"
    echo " home - Go to home screen"
    echo " back - Press back button"
    echo " app-drawer - Open app drawer"
    exit 1
fi

ACTION=$1
WAIT=${2:-1}

# Check if adb is in PATH
if ! command -v adb &> /dev/null; then
    if [ -x ~/Library/Android/sdk/platform-tools/adb ]; then
        ADB=~/Library/Android/sdk/platform-tools/adb
    elif [ -x /usr/local/bin/adb ]; then
        ADB=/usr/local/bin/adb
    else
        echo "Error: adb not found. Install Android SDK or add platform-tools to PATH."
        exit 1
    fi
else
    ADB=adb
fi

case $ACTION in
    home)
        echo "Going to home screen..."
        "$ADB" shell input keyevent 3
        ;;
    back)
        echo "Pressing back button..."
        "$ADB" shell input keyevent 4
        ;;
    app-drawer)
        echo "Opening app drawer..."
        "$ADB" shell input swipe 672 2800 672 500 1000  # fixed coordinates; adjust for other screen sizes
        ;;
    *)
        echo "Unknown action: $ACTION"
        exit 1
        ;;
esac

sleep "$WAIT"
"$ADB" shell screencap -p /sdcard/screen.png && "$ADB" pull /sdcard/screen.png . || exit 1  # stop before reporting success if capture fails
echo "Screenshot saved to: $(pwd)/screen.png"
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
#!/bin/bash
# Tap at coordinates and capture screenshot
# Usage: ./adb-tap-and-capture.sh <x> <y> [wait_seconds]

if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: $0 <x> <y> [wait_seconds]"
    exit 1
fi

X=$1
Y=$2
WAIT=${3:-2}

# Check if adb is in PATH
if ! command -v adb &> /dev/null; then
    if [ -x ~/Library/Android/sdk/platform-tools/adb ]; then
        ADB=~/Library/Android/sdk/platform-tools/adb
    elif [ -x /usr/local/bin/adb ]; then
        ADB=/usr/local/bin/adb
    else
        echo "Error: adb not found. Install Android SDK or add platform-tools to PATH."
        exit 1
    fi
else
    ADB=adb
fi

# Tap and capture; exit non-zero if any step fails so success is not reported by mistake
"$ADB" shell input tap "$X" "$Y" && sleep "$WAIT" && "$ADB" shell screencap -p /sdcard/screen.png && "$ADB" pull /sdcard/screen.png . || exit 1
echo "Screenshot saved to: $(pwd)/screen.png"
