This fork allows using the German MiraTTS variant by Sebastian Bodza: https://huggingface.co/SebastianBodza/MiraToffel_miraTTS_german

With Sebastian's help in this thread https://huggingface.co/SebastianBodza/MiraToffel_miraTTS_german/discussions/1, the sentence-splitting logic was reworked for German. Sentences are now split on the punctuation marks that typically end German sentences; previously the split was done on capital letters, which fails for German because every noun is capitalized. The code was adjusted with the help of GPT 5.2 and now works for German with improved chunking.
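The punctuation-based split described above can be sketched roughly as follows. This is a simplified illustration, not the fork's exact code; the real implementation may additionally handle abbreviations such as "z. B." or "Dr.", which a naive punctuation split would break on:

```python
import re

def split_sentences_german(text: str) -> list[str]:
    # Split after sentence-ending punctuation (. ! ? ...) followed by
    # whitespace, keeping the punctuation attached to the sentence.
    # German capitalizes all nouns, so splitting on capital letters
    # would cut sentences apart; end punctuation is the reliable signal.
    parts = re.split(r'(?<=[.!?…])\s+', text.strip())
    return [p for p in parts if p]
```

For example, `split_sentences_german("Hallo Welt! Wie geht es dir? Gut.")` yields three chunks, each ending on its own punctuation mark.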
MiraTTS is a finetune of the excellent Spark-TTS model for enhanced realism and stability, performing on par with closed-source models. This repository also heavily optimizes Mira with LMDeploy and boosts quality by using FlashSR, generating high-quality audio at over 100x realtime!
- Incredibly fast: over 100x realtime using LMDeploy and batching.
- High quality: generates clear, crisp 48 kHz audio, much higher quality than most models.
- Memory efficient: runs within 6 GB of VRAM.
- Low latency: latency can be as low as 100 ms.
Simple one-line installation:

```shell
uv pip install git+https://github.com/ysharma3501/MiraTTS.git
```
Running the model (bs=1):

```python
from mira.model import MiraTTS
from IPython.display import Audio

mira_tts = MiraTTS('YatharthS/MiraTTS')  ## downloads model from huggingface
file = "reference_file.wav"  ## can be mp3/wav/ogg or anything that librosa supports
text = "Alright, so have you ever heard of a little thing named text to speech? Well, it allows you to convert text into speech! I know, that's super cool, isn't it?"

context_tokens = mira_tts.encode_audio(file)
audio = mira_tts.generate(text, context_tokens)
Audio(audio, rate=48000)
```

Running the model using batching:

```python
file = "reference_file.wav"  ## can be mp3/wav/ogg or anything that librosa supports
text = ["Hey, what's up! I am feeling SO happy!", "Honestly, this is really interesting, isn't it?"]

context_tokens = [mira_tts.encode_audio(file)]
audio = mira_tts.batch_generate(text, context_tokens)
Audio(audio, rate=48000)
```

Examples can be seen in the Hugging Face model repository.
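Outside a notebook, the generated audio can be written to disk instead of played inline. A minimal sketch using only the standard-library `wave` module; `save_wav` is a helper introduced here for illustration, and it assumes `generate` returns mono float samples in [-1, 1] at 48 kHz (check the repository for the exact return format):

```python
import wave

import numpy as np

def save_wav(path: str, audio: np.ndarray, rate: int = 48000) -> None:
    # Convert float audio in [-1, 1] to 16-bit PCM and write a mono WAV file.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 16-bit samples
        f.setframerate(rate)
        f.writeframes(pcm.tobytes())
```

Usage would then be `save_wav("output.wav", audio)` in place of the `Audio(...)` call.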
I recommend reading these two blog posts to better understand LLM-based TTS models and how I optimize them:
- How they work: https://huggingface.co/blog/YatharthS/llm-tts-models
- How to optimize them: https://huggingface.co/blog/YatharthS/making-neutts-200x-realtime
Roadmap:
- Release code and model
- Support low latency streaming
- Release native 48 kHz BiCodec
- Support multilingual models
This fork is based on the project by https://github.com/Si-ris-B.