This document is the source of truth for how iOS tests are organized, run, and interpreted in this repo.
The test setup must work cleanly in four places:
- Xcode while iterating locally
- ./scripts/test-ios.sh for deterministic CLI verification
- CI for the default merge gate
- AI workflows that need an explicit command matrix and machine-readable diagnostics
Use the native framework that matches the surface under test:
- Swift Testing for non-UI unit, integration, and hosted tests
- XCTest for UI tests
- XCTest for performance tests that use measure
Current toolchain assumptions:
- Xcode 26.x
- Apple Swift 6.2 toolchain
- project source currently builds in Swift 5 language mode
Location:
Modules/VoiceCore/Tests/VoiceCoreTests/
Use for:
- coordinators
- audio session policy
- capture and playback behavior
- route state transitions
- deterministic async behavior
Current framework split:
- Swift Testing for logic tests
- XCTest for VoiceCorePerformanceTests
Location:
heardTests/
Use for:
- app-host boot sanity
- test-mode sanity
- lightweight hosted configuration and wiring checks
- hosted performance checks that stay out of the stable lane
Stable hosted coverage currently includes:
- AppLaunchSmokeTests
- GeminiServiceSetupTests
Hosted configuration coverage explicitly validates multiple GeminiService audio setup payload variants without changing the runtime default profile.
Experimental hosted coverage currently includes:
AppStartupPerformanceTests
Location:
heardUITests/
Use for:
- simulator-driven CRUD flows
- navigation regressions
- search and filtering regressions
- destructive confirmation flows
- experimental gesture regressions
Stable UI coverage currently includes:
- EditorFlowUITests
- InventoryFlowUITests
- RecipeFlowUITests
- NavigationUITests
- SearchFilteringUITests
Experimental UI coverage currently includes:
KeyboardDismissUITests
Commands:
./scripts/test-ios.sh voicecore
./scripts/test-ios.sh app-build
./scripts/test-ios.sh app-smoke
./scripts/test-ios.sh app-ui
./scripts/test-ios.sh stable
./scripts/test-ios.sh all
Meaning:
- voicecore: non-performance VoiceCoreTests
- app-build: shared hosted build-for-testing path
- app-smoke: stable hosted lane for heardTests
- app-ui: stable heardUITests classes only
- stable and all: default full merge gate
Commands:
./scripts/test-ios.sh app-ui-gestures
./scripts/test-ios.sh app-ui-gestures-repeat 10
./scripts/test-ios.sh experimental
Meaning:
- app-ui-gestures: gesture-only UI suite
- app-ui-gestures-repeat 10: repeated gesture reliability run
- experimental: VoiceCore perf plus the hosted experimental plan
Performance tests remain experimental until the repo has enough repeated-run evidence to treat them as budgets rather than instrumentation.
Use:
- VoiceCore scheme for module logic and VoiceCore perf
- heard scheme with heard-stable for default hosted and stable UI work
- heard scheme with heard-experimental for gesture and hosted perf work
Shared plans:
- app/TestPlans/heard-stable.xctestplan
- app/TestPlans/heard-experimental.xctestplan
The Xcode-native default path is the shared heard-stable plan plus the standalone VoiceCore scheme.
Preferred default target:
- device: iPhone 17 Pro
- runtime: iOS 26.2
scripts/test-ios.sh resolves the simulator in this order:
- IOS_SIMULATOR_DESTINATION
- IOS_SIMULATOR_ID
- exact iPhone 17 Pro on iOS 26.2
- iPhone 17 Pro on the newest installed iOS runtime
- newest available iPhone simulator
The script prints the destination it selected before running tests.
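The fallback order above can be sketched as a small shell function. This is only an illustration of the documented order, not the actual contents of scripts/test-ios.sh: the function name resolve_simulator and the "name|runtime|udid" input format are assumptions, and the input is assumed to be sorted with the newest runtime first.

```shell
# Hypothetical sketch of the documented resolution order.
# $1: newline-separated "name|runtime|udid" records, newest runtime first
#     (an assumed format; the real script may query simctl differently).
resolve_simulator() {
  available="$1"

  # 1. An explicit destination string wins outright.
  if [ -n "${IOS_SIMULATOR_DESTINATION:-}" ]; then
    printf '%s\n' "$IOS_SIMULATOR_DESTINATION"
    return 0
  fi

  # 2. Then an explicit simulator identifier.
  if [ -n "${IOS_SIMULATOR_ID:-}" ]; then
    printf 'platform=iOS Simulator,id=%s\n' "$IOS_SIMULATOR_ID"
    return 0
  fi

  # 3. Exact preferred device on the preferred runtime.
  udid=$(printf '%s\n' "$available" |
    awk -F'|' '$1 == "iPhone 17 Pro" && $2 == "iOS 26.2" { print $3; exit }')

  # 4. Preferred device on the newest installed runtime.
  [ -n "$udid" ] || udid=$(printf '%s\n' "$available" |
    awk -F'|' '$1 == "iPhone 17 Pro" { print $3; exit }')

  # 5. Newest available iPhone of any model.
  [ -n "$udid" ] || udid=$(printf '%s\n' "$available" |
    awk -F'|' '$1 ~ /^iPhone/ { print $3; exit }')

  # Nothing usable is installed.
  [ -n "$udid" ] || return 1
  printf 'platform=iOS Simulator,id=%s\n' "$udid"
}
```

Keeping the two environment overrides ahead of any device lookup means a CI job or an agent can pin a simulator without the function inspecting the installed list at all.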
- UITEST_SCENARIO
- HEARD_SKIP_WARMUP
- HEARD_ENABLE_GESTURE_UI_TESTS
- IOS_SIMULATOR_ID
- IOS_SIMULATOR_DESTINATION
- DERIVED_DATA_PATH
Do not introduce one-off test flags without documenting them here.
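One way to keep that rule honest is a small check that compares candidate flag names against the documented list. The sketch below is an assumption about how such a check might look, not an existing repo script; the names DOCUMENTED_TEST_FLAGS and undocumented_flags are hypothetical, as is HEARD_FAST_MODE in the usage line.

```shell
# The six flags documented in this guide, space-separated.
DOCUMENTED_TEST_FLAGS="UITEST_SCENARIO HEARD_SKIP_WARMUP HEARD_ENABLE_GESTURE_UI_TESTS IOS_SIMULATOR_ID IOS_SIMULATOR_DESTINATION DERIVED_DATA_PATH"

# Hypothetical helper: print every name passed in that is not documented.
undocumented_flags() {
  for name in "$@"; do
    case " $DOCUMENTED_TEST_FLAGS " in
      *" $name "*) ;;                   # documented, nothing to report
      *) printf '%s\n' "$name" ;;       # not in the list above
    esac
  done
}

# Usage: HEARD_FAST_MODE is a made-up flag used only to show a miss.
undocumented_flags UITEST_SCENARIO HEARD_FAST_MODE
```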
Every UI test should launch through UIHarness.launchApp(scenario:).
Current scenario names:
- editor_flows
- search_filtering
- keyboard_dismiss
- empty_state
- attachments_basic
Rules:
- each class requests the scenario it needs explicitly
- scenario data stays deterministic and in-memory only
- scenario data is reset before each app launch
- new UI coverage should extend scenario fixtures rather than ad hoc launch data
Preferred summary flow after any run:
- identify the logical run with --latest-run or --run <id>
- read the grouped .xcresult summary
- only then fall back to a single bundle with --latest or --path
- use --all only when you intentionally want historical directory aggregation
Commands:
./scripts/xcresult-summary.sh --latest-run
./scripts/xcresult-summary.sh --latest-run --json
./scripts/xcresult-summary.sh --run <run-id>
./scripts/xcresult-summary.sh --run <run-id> --json
./scripts/xcresult-summary.sh --latest
./scripts/xcresult-summary.sh --latest --json
./scripts/xcresult-summary.sh --latest --markdown
./scripts/xcresult-summary.sh --path <bundle>
./scripts/xcresult-summary.sh --all
./scripts/xcresult-summary.sh --all --json
Use --latest-run --json for automation and AI triage by default. Use markdown for CI or PR summaries. Use --all only for historical directory-level inspection.
AI agents should follow this order:
- run the smallest relevant command
- inspect ./scripts/xcresult-summary.sh --latest-run --json
- classify the failure
- decide the next command before rerunning
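The first two steps of that order can be wrapped so the summary is always produced, even when the lane fails. This is a minimal sketch under stated assumptions: the function name run_and_summarize and the override variables TEST_IOS and XCRESULT_SUMMARY are hypothetical and exist only so the sketch can be exercised against stand-in scripts.

```shell
# Hypothetical wrapper: run one lane, then always print the grouped JSON summary.
# TEST_IOS and XCRESULT_SUMMARY are assumed override variables; by default the
# documented script paths are used.
run_and_summarize() {
  lane="$1"
  status=0

  # Run the smallest relevant lane and remember its exit status.
  "${TEST_IOS:-./scripts/test-ios.sh}" "$lane" || status=$?

  # Summarize regardless of the result; a failed run is when triage needs it.
  "${XCRESULT_SUMMARY:-./scripts/xcresult-summary.sh}" --latest-run --json

  # Report the lane's own status so callers still see the failure.
  return "$status"
}
```

Returning the lane's status rather than the summary's keeps the wrapper usable as a gate while still leaving the JSON available for classification.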
Failure classes:
- compile/build failure
- module logic failure
- app-host failure
- stable UI regression
- experimental gesture instability
- performance regression
Expected next action by class:
- compile/build failure: fix project or compile issues first
- module logic failure: stay in VoiceCoreTests
- app-host failure: inspect heardTests, HeardChefApp, and hosted wiring
- stable UI regression: inspect identifiers, scenario seeding, and navigation assumptions
- experimental gesture instability: use repeated runs and .xcresult attachments before changing coverage
- performance regression: rerun the focused perf class before changing any budget language
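For automation, the class-to-action list above could be reduced to a lookup from failure class to the lane worth rerunning. The sketch below is an assumption, not part of the repo: the function name rerun_lane_for and the class keys are hypothetical labels, and the lane chosen for each class is one plausible reading of the list rather than a documented rule.

```shell
# Hypothetical lookup: failure class key -> test-ios.sh lane to rerun.
# Class keys are made-up short names for the failure classes listed above.
rerun_lane_for() {
  case "$1" in
    compile-build)       echo app-build ;;   # assumes the hosted build path is what broke
    module-logic)        echo voicecore ;;   # stay in VoiceCoreTests
    app-host)            echo app-smoke ;;   # hosted wiring lives in heardTests
    stable-ui)           echo app-ui ;;
    gesture-instability) echo "app-ui-gestures-repeat 10" ;;
    performance)         echo experimental ;; # a focused -only-testing run is narrower
    *)
      echo "unknown failure class: $1" >&2
      return 1
      ;;
  esac
}

# Usage: left unquoted on purpose so "app-ui-gestures-repeat 10" splits into two arguments.
# ./scripts/test-ios.sh $(rerun_lane_for gesture-instability)
```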
Only promote an experimental test into the stable lane when:
- it passes repeated local runs
- it passes repeated CI runs
- it needs no undocumented simulator setup
- failures are diagnosable from .xcresult
- adding it keeps the stable path trustworthy and fast enough
This currently applies most directly to KeyboardDismissUITests.
Current note:
- the inventory add/edit sheets still allow two valid experimental swipe-down outcomes:
- the focused field blurs
- the sheet dismisses entirely
- this remains an owned experimental behavior overlap, not stable-lane semantics
- ./scripts/test-ios.sh voicecore
- if app integration changed, ./scripts/test-ios.sh app-smoke

- ./scripts/test-ios.sh app-build
- ./scripts/test-ios.sh app-smoke
- ./scripts/test-ios.sh app-ui
- ./scripts/xcresult-summary.sh --latest

- ./scripts/test-ios.sh app-ui-gestures
- ./scripts/test-ios.sh app-ui-gestures-repeat 10
- ./scripts/xcresult-summary.sh --path <failing bundle>

- xcodebuild ... -only-testing:VoiceCoreTests/VoiceCorePerformanceTests
- xcodebuild ... -testPlan heard-experimental -only-testing:heardTests/AppStartupPerformanceTests
- compare repeated-run spread before treating a value like a budget
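Comparing repeated-run spread can be as simple as looking at the minimum, maximum, and mean of the collected timings. The helper below is a sketch under assumptions: the name run_spread is hypothetical, the timings are passed by hand as arguments, and how they are extracted from a perf run is left open.

```shell
# Hypothetical helper: summarize repeated-run timings (one number per argument).
# Prints min, max, mean, and the max-min range as a percentage of the mean.
run_spread() {
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | awk '
    NR == 1 { min = $1 + 0; max = $1 + 0 }
    {
      v = $1 + 0                      # force numeric comparison
      if (v < min) min = v
      if (v > max) max = v
      sum += v
    }
    END {
      mean = sum / NR
      pct = (mean == 0) ? 0 : (max - min) / mean * 100
      printf "min=%.3f max=%.3f mean=%.3f spread_pct=%.1f\n", min, max, mean, pct
    }'
}

# Usage with made-up timings in seconds:
run_spread 0.41 0.44 0.39 0.47
```

A wide spread relative to the mean suggests the measurement is still instrumentation rather than something stable enough to hold a budget against.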
Still use physical devices for:
- Bluetooth and route truth
- receiver and speaker truth
- CallKit activation and interruption truth
- camera capture fidelity
- richer attachment and media flows