Add first-pass Windows Voice Mode #120
…lient.cs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ovider route kinds Agent-Logs-Url: https://github.com/NichUK/openclaw-windows-node/sessions/f2ae3d04-4f08-49c2-8095-9e801a4ccf6d Co-authored-by: NichUK <346792+NichUK@users.noreply.github.com>
…ws-node into feature/voice-mode
…aming provider route kinds" Reverts CoPilot fix This reverts commit 78d0a3d.
Move voice-mode test-targeted logic out of the WinUI app and into a dedicated shared project so tray tests no longer need to reference OpenClaw.Tray.WinUI directly. This restores the original CI assumption that the tray test project can be built on its own, without transitively building a Windows App SDK application with an implicit architecture. It also keeps the voice/chat extraction scoped away from the broader OpenClaw.Shared library, which remains general-purpose and non-tray-specific.

The new OpenClaw.Tray.Shared project now contains the shared voice/chat surface used by both the tray app and tray tests, including voice transport helpers, provider catalog loading, cloud TTS support, chat coordination, and the web chat DOM bridge. The WinUI app retains the UI shell pieces, including DispatcherQueueAdapter and the app-level icon path helper.

As a follow-up cleanup during the extraction, the previous IconHelper was split into AppIconHelper in the WinUI project and VoiceTrayIconHelper in the shared tray project, so the new shared library stays focused on voice-related behavior rather than wider tray infrastructure.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Refactor tray voice code into OpenClaw.Tray.Shared
I introduced a new `OpenClaw.Tray.Shared` project. Before this change, the tray test project had to reference `OpenClaw.Tray.WinUI` directly. I kept `OpenClaw.Shared` general-purpose and non-tray-specific. If this doesn't work for you, and you'd prefer that stuff to go into OpenClaw.Shared with the TargetFramework changed instead, then let me know and I'll refactor. I'm also addressing the points raised by Repo Assist above. This stuff is f**kin' magic! :)
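As a rough sketch of the resulting project layout (the file paths here are illustrative, not copied from the PR), the tray test project would reference only the new shared library:

```xml
<!-- Illustrative fragment of the tray test project's .csproj;
     actual paths and project names in the repo may differ. -->
<ItemGroup>
  <!-- Tests depend only on the shared voice/chat surface... -->
  <ProjectReference Include="..\..\src\OpenClaw.Tray.Shared\OpenClaw.Tray.Shared.csproj" />
  <!-- ...and no longer on OpenClaw.Tray.WinUI, so CI does not transitively
       build the Windows App SDK app with its implicit architecture. -->
</ItemGroup>
```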
Cover the pure shared logic in VoiceProviderConfigurationStoreExtensions with focused unit tests for case-insensitive provider lookup, case-insensitive setting lookup, SetValue creation/update behavior, and removal of blank or null values.
Increased test coverage as suggested.
Add tests for voice provider configuration helpers
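The helper semantics described in that commit can be sketched roughly as follows. This is a pseudocode-style C# sketch against hypothetical types: `VoiceProviderConfigurationStore`, `GetValue`, and the exact `SetValue` signature are guesses, not the real API from the PR.

```csharp
// Hypothetical sketch of the behavior covered by the new tests;
// real type names and signatures in the repo may differ.
var store = new VoiceProviderConfigurationStore();

// SetValue creates the provider entry and setting on first write.
store.SetValue("ElevenLabs", "ApiKey", "abc123");

// Provider and setting lookup are case-insensitive, so a second
// write with different casing updates the same entry in place.
store.SetValue("elevenlabs", "apikey", "def456");
Assert.Equal("def456", store.GetValue("ELEVENLABS", "APIKEY"));

// Writing a blank or null value removes the setting
// rather than storing an empty string.
store.SetValue("ElevenLabs", "ApiKey", "   ");
Assert.Null(store.GetValue("ElevenLabs", "ApiKey"));
```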
# Conflicts:
#	tests/OpenClaw.Shared.Tests/OpenClawGatewayClientTests.cs
Hi @NichUK — Copilot here replying on Scott's behalf. Thanks again for the big push on this. I spent some time comparing the Windows approach here to the current Apple-side voice stack, and the high-level direction feels reasonable: keep OS/audio concerns local to the app, and keep OpenClaw/gateway responsible for normal chat/session routing.
Overall, I think Voice Mode is a good idea for this repo — especially as a thin, optional local voice layer over the existing chat architecture. The main thing that gives me pause isn't the philosophy; it's the size/scope of the merge and how much it asks reviewers to reason about at once. So my friendly suggestion would be:
Promising direction overall; just worth keeping the architecture tight so it stays maintainable.
Apologies for the delay.
🤖 This is an automated response from Repo Assist.
If it helps, the test patterns added in the previous iteration (…). Looking forward to the new PR! 👍
Thanks again for the huge amount of work here, @NichUK. This PR sketches out a much broader Voice Mode vision: STT, TTS, repeater UI, provider plumbing, and WebChat integration.

We're going to land the focused TTS capability slice via #253 first because it is smaller, tested, and easier to review safely. I see that as a foundation for this work rather than a rejection of it.

After #253 lands, could you rebase this branch and split the remaining Voice Mode pieces into smaller follow-up PRs? The most useful next slices would probably be:
Looping in @RBrid as well since #253 overlaps with the TTS portion. It would be great if the two of you can coordinate so the next PRs build cleanly on the shared TTS foundation.
Adds a focused Windows node text-to-speech capability as the first stable voice-support primitive.

- adds the shared `tts.speak` capability and MCP/gateway documentation
- wires Windows and ElevenLabs TTS behind opt-in tray settings
- protects the ElevenLabs API key with DPAPI
- adds shared and tray tests for capability behavior, settings, and ElevenLabs requests

This lands the focused TTS foundation from the broader Voice Mode discussion in #120 so remaining voice UX/STT/repeater work can build on top in smaller follow-up PRs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
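For illustration, an invocation of a `tts.speak` capability over MCP might look something like the fragment below. The argument names (`text`, `provider`, `voice`) are guesses for illustration only, not the actual schema documented in #253.

```json
{
  "method": "tools/call",
  "params": {
    "name": "tts.speak",
    "arguments": {
      "text": "Build finished successfully.",
      "provider": "elevenlabs",
      "voice": "default"
    }
  }
}
```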
#253 has landed. The best next step for this broader Voice Mode work is to rebase on the current default branch.
Summary
This PR adds the first-pass Windows Voice Mode implementation to the tray app. It's by no means finished, but the first feature set is working. I apologise for the hugeness... There was also quite a lot of experimentation and reversion, so it's not quite as bad as it looks...
What works now
What didn't work
I tried to fully integrate with the WebChat UI, but couldn't achieve it without nasty local DOM writes, which felt too hacky. Also, the Windows STT (Windows.Media.SpeechRecognizer) works pretty well, but it has to control the entire audio pipeline, and we can't select an input device without changing the system default devices.
Coming Next
Notes
I kept the architecture intentionally close to the existing tray/node model, and documented the current and planned states, as well as the architecture, in docs/VOICE-MODE.md. I also made as few touch points to the existing app as possible to minimise change risk. Happy to receive notes/change requests before merging, etc., and I'll attempt to deal with issues if anyone actually uses it! :)