MUTON-Android is the Android client for MUTON, a real-time multimodal dialogue assistance system for hearing-impaired users. The app captures camera frames and microphone audio, streams them to the MUTON backend, and displays subtitles, visual emotion cues, and multimodal summaries in a mobile interface.
- Overview
- App Features
- Installation
- Backend Connection
- Running On Device
- Screens
- Related Repository
- Project Structure
- License
The Android app is the user-facing part of MUTON. While the backend handles STT, face/audio processing, and Qwen2.5-Omni-based summary generation, this client focuses on real-time capture, request synchronization, and presenting the results in a practical conversation flow.
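One way to read "request synchronization" is that a new request should not start while the previous one is still waiting on the backend. The sketch below shows that idea with a hypothetical SingleFlight helper built on AtomicBoolean. It illustrates the pattern under that assumption and is not taken from the MUTON-Android source.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

/**
 * Hypothetical helper (not from the MUTON-Android source): runs at most one
 * request at a time and skips new requests while one is still in flight,
 * e.g. camera frames arriving faster than the backend can answer.
 */
class SingleFlight {
    private val busy = AtomicBoolean(false)

    /** Runs [block] if nothing else is running; returns false if the call was skipped. */
    fun tryRun(block: () -> Unit): Boolean {
        // Atomically claim the slot; give up immediately if another request holds it.
        if (!busy.compareAndSet(false, true)) return false
        try {
            block()
        } finally {
            // Release the slot even if the request throws.
            busy.set(false)
        }
        return true
    }
}
```

For example, a camera callback could call frameGate.tryRun { sendFrame(bytes) } on a background executor, where frameGate and sendFrame are hypothetical names. Frames that arrive while a request is in flight are dropped, which keeps latency bounded at the cost of analyzing fewer frames.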
In Graduation Project 2, the app was refined for a more stable live demo. It now discovers the active backend address dynamically, keeps the UI compatible with backend model changes, and routes conversation record summaries through the server so API keys are not stored inside the APK.
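Routing record summaries through the server means the client only sends conversation text and never holds model credentials. A minimal sketch of such a request body follows, assuming a JSON payload with an utterances array. The field name and both functions are hypothetical; the real schema is defined by the MUTON backend.

```kotlin
/** Encodes [s] as a JSON string literal, escaping quotes, backslashes, and control characters. */
fun jsonString(s: String): String = buildString {
    append('"')
    for (c in s) {
        when {
            c == '"' -> append("\\\"")
            c == '\\' -> append("\\\\")
            c == '\n' -> append("\\n")
            c == '\r' -> append("\\r")
            c == '\t' -> append("\\t")
            c < ' ' -> append("\\u%04x".format(c.code)) // other control characters
            else -> append(c)
        }
    }
    append('"')
}

/**
 * Hypothetical body for /summarize_conversation_record.
 * ASSUMPTION: the server accepts {"utterances": ["...", ...]}.
 * No API key is included: the server holds the summary model credentials.
 */
fun recordSummaryPayload(utterances: List<String>): String =
    utterances.joinToString(separator = ",", prefix = "{\"utterances\":[", postfix = "]}") {
        jsonString(it)
    }
```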
- Camera frame capture for visual context
- Microphone audio capture for streaming STT
- Real-time requests to the MUTON backend
- Subtitle display from finalized speech utterances
- Visual emotion display from face-frame analysis
- Multimodal summary display from backend fusion analysis
- Conversation record screens and server-side record summary requests
- Dynamic backend URL loading from the main MUTON repository
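Subtitles are built from finalized speech utterances only. A minimal sketch of that rule follows, assuming streaming STT results arrive as (text, isFinal) pairs. SttResult, SubtitleBuffer, and the three-line window are hypothetical names and values, not taken from the app.

```kotlin
/** One STT result; partial results may still change, final ones will not. (Hypothetical type.) */
data class SttResult(val text: String, val isFinal: Boolean)

/**
 * Hypothetical subtitle model: only finalized utterances are shown, and only
 * the most recent [maxLines] of them stay on screen.
 */
class SubtitleBuffer(private val maxLines: Int = 3) {
    private val lines = ArrayDeque<String>()

    fun onResult(result: SttResult) {
        // Ignore partial hypotheses so the subtitle text does not flicker.
        if (!result.isFinal || result.text.isBlank()) return
        lines.addLast(result.text.trim())
        // Drop the oldest lines once the visible window is full.
        while (lines.size > maxLines) lines.removeFirst()
    }

    fun visibleLines(): List<String> = lines.toList()
}
```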
Open the project in Android Studio.
Recommended project configuration:
- compileSdk: 35
- minSdk: 28
- targetSdk: 35
- package namespace: com.example.myapplication
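In the Gradle Kotlin DSL, these values would appear roughly as in the fragment below. The file path app/build.gradle.kts is an assumption (the project tree only lists the root build.gradle.kts), and everything other than these four values (plugins, applicationId, dependencies) is omitted.

```kotlin
// Fragment of the app module's build script (assumed path: app/build.gradle.kts).
// Only the four values from the configuration list are shown.
android {
    namespace = "com.example.myapplication"
    compileSdk = 35

    defaultConfig {
        minSdk = 28
        targetSdk = 35
    }
}
```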
Build the app module from Android Studio, or use Gradle from the project root:
```shell
./gradlew :app:assembleDebug
```

On Windows PowerShell:

```powershell
.\gradlew.bat :app:assembleDebug
```

The app reads the active backend URL from the main MUTON repository:
https://raw.githubusercontent.com/Ai-pre/MUTON/server_main/backend_url.json
The backend URL file points to the current Cloudflare Tunnel address. Because the app resolves this address at runtime instead of hardcoding it in the APK, it keeps working when the server tunnel address changes between demos.
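Loading the address at runtime can be sketched as follows. This is a minimal sketch, not the app's implementation: it assumes backend_url.json stores the address under a key named url, and the fallback to a previously cached address (lastKnownUrl) is an added assumption. The function names are hypothetical.

```kotlin
import java.net.HttpURLConnection
import java.net.URI

// Published location of the current backend address (from this README).
val BACKEND_URL_JSON = "https://raw.githubusercontent.com/Ai-pre/MUTON/server_main/backend_url.json"

// ASSUMPTION: the file stores the tunnel address under a key named "url".
// Check backend_url.json for the real key; a JSON library is more robust than this regex.
val URL_FIELD = Regex("\"url\"\\s*:\\s*\"([^\"]+)\"")

/** Returns the base URL found in [json] without a trailing slash, or null if absent. */
fun parseBackendUrl(json: String): String? =
    URL_FIELD.find(json)?.groupValues?.get(1)?.trimEnd('/')

/**
 * Downloads backend_url.json, falling back to [lastKnownUrl] on any failure.
 * Blocking: call it from a background thread, never the Android main thread.
 */
fun resolveBackendUrl(lastKnownUrl: String?, timeoutMs: Int = 5_000): String? {
    return try {
        val conn = URI(BACKEND_URL_JSON).toURL().openConnection() as HttpURLConnection
        try {
            conn.connectTimeout = timeoutMs
            conn.readTimeout = timeoutMs
            if (conn.responseCode != HttpURLConnection.HTTP_OK) {
                lastKnownUrl
            } else {
                parseBackendUrl(conn.inputStream.bufferedReader().use { it.readText() }) ?: lastKnownUrl
            }
        } finally {
            conn.disconnect()
        }
    } catch (e: java.io.IOException) {
        // Network or HTTP errors: keep using the previously cached address.
        lastKnownUrl
    }
}
```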
Runtime relationship:
- Android sends audio chunks to /process_audio_chunk.
- Android sends camera frames to /process_video_chunk.
- Android requests multimodal summaries from /get_fusion_analysis.
- Android requests conversation record summaries from /summarize_conversation_record.
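The four routes can be kept in one place and joined onto whatever base URL was loaded at startup. The enum below only uses the paths listed above; the postChunk helper is hypothetical and assumes each endpoint accepts the raw chunk bytes as the request body, which the real MUTON API may not do (it could expect multipart form data or JSON instead).

```kotlin
import java.net.HttpURLConnection
import java.net.URI

/** Backend routes named in this README. */
enum class MutonEndpoint(val path: String) {
    AUDIO_CHUNK("/process_audio_chunk"),
    VIDEO_CHUNK("/process_video_chunk"),
    FUSION_ANALYSIS("/get_fusion_analysis"),
    RECORD_SUMMARY("/summarize_conversation_record");

    /** Appends this route to the tunnel base URL, tolerating a trailing slash on the base. */
    fun resolve(baseUrl: String): URI = URI(baseUrl.trimEnd('/') + path)
}

/**
 * Hypothetical POST helper. ASSUMPTION: the endpoint accepts raw bytes as the body.
 * Blocking: run it off the Android main thread.
 */
fun postChunk(baseUrl: String, endpoint: MutonEndpoint, body: ByteArray, contentType: String): String {
    val conn = endpoint.resolve(baseUrl).toURL().openConnection() as HttpURLConnection
    try {
        conn.requestMethod = "POST"
        conn.doOutput = true
        conn.setRequestProperty("Content-Type", contentType)
        conn.outputStream.use { it.write(body) }
        // getInputStream throws an IOException when the server returns an error status.
        return conn.inputStream.bufferedReader().use { it.readText() }
    } finally {
        conn.disconnect()
    }
}
```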
Before launching the app, make sure the backend server is running and the current Cloudflare Tunnel URL has been published to backend_url.json.
The device or emulator must allow:
- Camera access
- Microphone access
- Network access
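On Android 6.0 and later (and therefore on every device allowed by minSdk 28), camera and microphone access are runtime permissions that the user must grant, while network access only needs a manifest declaration. The sketch below separates the two; the list names and missingRuntimePermissions are hypothetical helpers, written with plain strings so they compile without the Android SDK.

```kotlin
// Permission names match android.Manifest.permission constants; they are plain
// strings here so the sketch compiles without the Android SDK.
val RUNTIME_PERMISSIONS = listOf(
    "android.permission.CAMERA",
    "android.permission.RECORD_AUDIO",
)

// INTERNET is a normal (install-time) permission: declaring it in
// AndroidManifest.xml is enough, and the user is never prompted for it.
val MANIFEST_ONLY_PERMISSIONS = listOf("android.permission.INTERNET")

/** Returns the runtime permissions that still need to be requested from the user. */
fun missingRuntimePermissions(isGranted: (String) -> Boolean): List<String> =
    RUNTIME_PERMISSIONS.filterNot(isGranted)
```

On a device, isGranted would typically be { ContextCompat.checkSelfPermission(context, it) == PackageManager.PERMISSION_GRANTED }, and a non-empty result can be passed to a launcher registered with ActivityResultContracts.RequestMultiplePermissions().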
For the live demo, a physical Android device is recommended because camera, microphone, and network timing are closer to the intended use case.
Backend runtime, API implementation, model training scripts, dataset processing, and wiki documentation are maintained in the main MUTON repository:
https://github.com/Ai-pre/MUTON
Project structure:

```
MUTON-Android/
  app/
    src/main/java/com/example/myapplication/
      MainActivity.kt
      OpenAiSummaryService.kt
      ConversationRecordStore.kt
      RecordDetailActivity.kt
      HomeActivity.kt
      SettingsActivity.kt
    src/main/res/
  gradle/
  build.gradle.kts
  settings.gradle.kts
```

