Support for SSML Tags and eSpeak Integration in Kokoro TTS #396

@muhammad-asif78

I would like Kokoro TTS to support SSML (Speech Synthesis Markup Language) tags to enable fine-grained control over speech output, such as pitch, rate, emphasis, and pauses. Additionally, enabling eSpeak support for processing SSML tags would enhance flexibility for voice synthesis.

Kokoro’s TTS engine is already fully functional, with latency around 250–300 ms, and adding SSML support on top of the existing model will make it more powerful and versatile for various applications.
### Describe alternatives you've considered:

So far, I haven’t found a way to use SSML with Kokoro. A PR could potentially add basic parsing, but native SSML support or eSpeak integration would provide a more seamless and reliable solution.
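
To illustrate what basic parsing might involve, here is a minimal sketch in Python that uses only the standard library. It flattens a tiny SSML subset (`<break time="...">` and `<prosody rate="...">`) into a list of text and pause segments. The function names `parse_ssml` and `parse_time_ms` and the segment layout are hypothetical; nothing here is part of Kokoro's API, and the sketch assumes the SSML has no XML namespace.

```python
import xml.etree.ElementTree as ET


def parse_time_ms(value):
    """Convert an SSML time value such as '500ms' or '1.5s' to milliseconds."""
    value = value.strip()
    if value.endswith("ms"):
        return int(float(value[:-2]))
    if value.endswith("s"):
        return int(float(value[:-1]) * 1000)
    raise ValueError(f"unsupported time value: {value!r}")


def parse_ssml(ssml):
    """Flatten a small SSML subset into an ordered list of segments.

    Each segment is either {"text": str, "rate": str or None}
    or {"pause_ms": int}. Tags other than <break> and <prosody>
    are treated as transparent containers (an assumption of this sketch).
    """
    root = ET.fromstring(ssml)
    segments = []

    def add_text(text, rate):
        # Skip whitespace-only runs and collapse internal whitespace.
        if text and text.strip():
            segments.append({"text": " ".join(text.split()), "rate": rate})

    def walk(node, rate):
        if node.tag == "break":
            # Only the `time` attribute is handled; a missing one counts as 0 ms here.
            segments.append({"pause_ms": parse_time_ms(node.get("time", "0ms"))})
            return
        if node.tag == "prosody":
            rate = node.get("rate", rate)
        add_text(node.text, rate)
        for child in node:
            walk(child, rate)
            # Text after a child's closing tag belongs to the current node.
            add_text(child.tail, rate)

    walk(root, None)
    return segments


example = (
    '<speak>Hello <break time="500ms"/> '
    '<prosody rate="slow">world</prosody>!</speak>'
)
segments = parse_ssml(example)
for segment in segments:
    print(segment)
```

Each text segment could then be synthesized on its own, with silence inserted for the pause segments. How rate, pitch, or emphasis would map onto the model is left open here.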
### Additional context:

This feature would allow developers to produce more expressive and natural-sounding speech, useful for accessibility tools, virtual assistants, and interactive applications.
