I would like Kokoro TTS to support SSML (Speech Synthesis Markup Language) tags for fine-grained control over speech output, such as pitch, rate, emphasis, and pauses. Additionally, enabling eSpeak to process the SSML tags would add flexibility for voice synthesis.
Kokoro’s TTS engine is already fully functional, with latency around 250–300 ms, so SSML support layered on top of the existing model would make it more powerful and versatile across applications.
### Describe alternatives you've considered
I haven’t found a way to use SSML with Kokoro today. A PR could add basic tag parsing, but native SSML support or eSpeak integration would be a more seamless and reliable solution.
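
To make the "basic parsing" alternative concrete, here is a minimal sketch of what such a pre-processor might look like. It is not Kokoro's API: `Segment`, `parse_ssml`, and the keyword-to-multiplier table are hypothetical names and values chosen for illustration, and it assumes only `<break time="...">` and `<prosody rate="...">` are handled, with every other tag treated as plain text. The output is a list of text chunks and pauses that a caller could then feed to the synthesizer one at a time.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical mapping from SSML rate keywords to a speed multiplier.
RATE_KEYWORDS = {"x-slow": 0.5, "slow": 0.75, "medium": 1.0, "fast": 1.25, "x-fast": 1.5}


@dataclass
class Segment:
    text: str = ""     # text to synthesize; empty for a pure pause
    rate: float = 1.0  # speed multiplier in effect for this text
    pause_ms: int = 0  # silence to insert instead of speech


def parse_rate(value, current):
    """Return a multiplier for a keyword or unsigned percentage, else keep the current rate."""
    if value in RATE_KEYWORDS:
        return RATE_KEYWORDS[value]
    if re.fullmatch(r"\d+(?:\.\d+)?%", value):
        return float(value[:-1]) / 100.0
    return current


def parse_break(value):
    """Convert a break time such as '500ms' or '1.5s' to milliseconds; 0 if unrecognized."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s)", value)
    if not match:
        return 0
    amount = float(match.group(1))
    return int(amount * 1000) if match.group(2) == "s" else int(amount)


def _walk(elem, rate, out):
    tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace prefix if present
    if tag == "break":
        out.append(Segment(pause_ms=parse_break(elem.get("time", ""))))
        return
    if tag == "prosody":
        rate = parse_rate(elem.get("rate", ""), rate)
    if elem.text and elem.text.strip():
        out.append(Segment(text=elem.text.strip(), rate=rate))
    for child in elem:
        _walk(child, rate, out)
        # Text following a child element belongs to this element's rate.
        if child.tail and child.tail.strip():
            out.append(Segment(text=child.tail.strip(), rate=rate))


def parse_ssml(ssml):
    """Flatten an SSML string into an ordered list of Segment objects."""
    segments = []
    _walk(ET.fromstring(ssml), 1.0, segments)
    return segments
```

For example, `parse_ssml('<speak>Hello <break time="500ms"/> <prosody rate="slow">world</prosody>!</speak>')` yields four segments: the text "Hello", a 500 ms pause, "world" at the slower rate, and "!". Pitch and emphasis are left out because they likely cannot be emulated by chunking text alone, which is part of why native support would be preferable.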
### Additional context
This feature would allow developers to produce more expressive and natural-sounding speech, useful for accessibility tools, virtual assistants, and interactive applications.