ComfyUI_srt2speech is my first custom node, built with my very basic knowledge of coding.
I tested srt2speech together with the ComfyUI-MegaTTS custom node on Runpod, using a native ComfyUI install.
I made this custom node to read an `.srt` subtitle file and send the text to the MegaTTS node to generate a dub (English or Chinese for now).
It should also work with other TTS nodes for ComfyUI; eventually I plan to combine it with Wan2.1 or another lip-sync model.
My aim is a one-click workflow that does the whole job.
These instructions were written for the Runpod.com GPU cloud service. If you use your own local computer, please adjust the file paths accordingly.
- Install the ComfyUI-MegaTTS custom node (by AIlab) via the ComfyUI Manager.
- Go into the custom node's folder and run:
pip install -r requirements.txt
- Install ffmpeg and ffprobe:
apt update && apt install -y ffmpeg
- Restart ComfyUI.
- Create your own 10-15 second `.wav` file (mono!, 16-24 kHz), or take any `.wav` file from this repo's assets folder.
- Use the Voice Maker node to download all the models the first time.
- It will create a new folder at /workspace/ComfyUI/models/TTS containing all the MegaTTS3 models.
- Copy the example `.wav` and `.npy` files from the assets folder into /workspace/ComfyUI/custom_nodes/ComfyUI-MegaTTS/voices.
- Now it is ready for text-to-speech.
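The mono / 16-24 kHz requirement is easy to get wrong when exporting audio. Here is a small sanity-check helper I wrote (it is not part of MegaTTS or this node, just a convenience sketch) that uses only Python's standard-library `wave` module:

```python
import wave

def check_reference_wav(path):
    """Return (ok, message) for the mono / 16-24 kHz reference requirement."""
    with wave.open(path, 'rb') as w:
        channels = w.getnchannels()
        rate = w.getframerate()
        seconds = w.getnframes() / rate
    if channels != 1:
        return False, f'expected mono, got {channels} channels'
    if not 16000 <= rate <= 24000:
        return False, f'sample rate {rate} Hz is outside 16-24 kHz'
    return True, f'OK: mono, {rate} Hz, {seconds:.1f} s'
```

If the check fails, re-export the file from your editor with the right settings before uploading it to the voices folder.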
===== To clone your own voice model with the MegaTTS custom node =====
- I use Audacity to record myself for about 10-15 seconds, reading some English clearly and avoiding background noise.
- Export it as a `.wav` file (pick a file name you can remember easily), mono, 16-24 kHz, and upload it to the "wav_queue" folder of the MegaTTS3 Google Drive.
**PS: only English and Chinese are supported.
** For security reasons, they do not release the encoder.
- Wait a few days; the developer will return a `.wav` plus a `.npy` file trained on your voice in the "user_batch_1" folder.
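If your recording runs longer than 15 seconds, you can trim it in Audacity before exporting, or with a small sketch like the one below that uses only Python's standard-library `wave` module (`trim_wav` is my own hypothetical helper, not part of any of these nodes):

```python
import wave

def trim_wav(src, dst, max_seconds=15):
    """Copy src to dst, keeping at most max_seconds of audio."""
    with wave.open(src, 'rb') as r:
        params = r.getparams()
        # Read only the first max_seconds worth of frames.
        frames = r.readframes(int(max_seconds * r.getframerate()))
    with wave.open(dst, 'wb') as w:
        w.setparams(params)  # nframes is corrected automatically on close
        w.writeframes(frames)
```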
To install my repo:
- git clone this repository, then activate your virtual environment:
source myvenv/bin/activate
- pip install the requirements:
pip install -r requirements.txt
To download the files from my assets folder to the Runpod workspace, activate the venv first. If you use your local computer, just copy/paste the files manually and ignore the code below.
These files are provided for convenience, so you can use this demo voice with my code; they are the same files as in the original MegaTTS3 Google Drive mentioned above. If you are using Runpod, you can use the code below to copy them into ComfyUI-MegaTTS/voices.
source myvenv/bin/activate
To download all `.wav` and `.npy` files from my assets folder, copy and paste the code below into the Runpod terminal.
The MegaTTS node requires the `.wav` and `.npy` files to be placed inside its own folder: ComfyUI/custom_nodes/ComfyUI-MegaTTS/voices.
python3 -c "
import os, requests
out = '/workspace/ComfyUI/custom_nodes/ComfyUI-MegaTTS/voices'
os.makedirs(out, exist_ok=True)
api_url = 'https://api.github.com/repos/gordon123/ComfyUI_srt2speech/contents/assets/wav-npy'
r = requests.get(api_url, timeout=30)
r.raise_for_status()
for f in r.json():
    if f['name'].endswith(('.wav', '.npy')):
        print('Downloading', f['name'])
        data = requests.get(f['download_url'], timeout=60).content
        with open(os.path.join(out, f['name']), 'wb') as out_file:
            out_file.write(data)
"
To use my custom node: if you are using ComfyUI-MegaTTS like me, you can follow the instructions here; otherwise, other TTS nodes should also work.
- Install MegaTTS by AIlab via the ComfyUI Manager.
- Go into the ComfyUI-MegaTTS folder and run:
pip install -r requirements.txt
- If you use a local computer, check that you have ffmpeg. I use Runpod GPU cloud (Linux), so I run this command:
apt update && apt install -y ffmpeg
- Restart the ComfyUI server and reload the web UI page.
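On a local machine, you can check whether ffmpeg and ffprobe are already on your PATH before installing anything. This is a small convenience helper of my own, not something the nodes require:

```python
import shutil

def have_tool(name):
    """Return True if an executable called name is found on PATH."""
    return shutil.which(name) is not None

# Report which of the two tools are available.
for tool in ('ffmpeg', 'ffprobe'):
    status = 'found' if have_tool(tool) else 'missing -- install it'
    print(f'{tool}: {status}')
```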
- Go to my ComfyUI_srt2speech assets and download any voice model pair; the `.wav` and `.npy` must share the same name. Upload the `.wav` into the Voice Maker node and click Run the first time. **For more examples of how to clone your own voice, see the ComfyUI-MegaTTS GitHub page.
- Run the MegaTTS Voice Maker once. It will download some models into the models/TTS/MegaTTS3 folder.
- Restart ComfyUI and refresh the web UI.
- Upload all `.wav` and `.npy` files into the ComfyUI-MegaTTS/voices/ folder (small voice files).
- Refresh the web UI; now you can use reference_voice in MegaTTS3.
- Set Run to "run instant" so the whole subtitle is auto-generated as text-to-speech.
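For context, the core job of this node is turning an `.srt` file into timed text for the TTS node. The sketch below is a simplified illustration of SRT parsing in plain Python; it is not the node's actual implementation, just an example of the idea:

```python
import re

# Matches an SRT timing line like "00:00:01,000 --> 00:00:03,500".
SRT_TIME = re.compile(
    r'(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)')

def parse_srt(text):
    """Parse SRT text into a list of (start_sec, end_sec, subtitle) tuples."""
    entries = []
    for block in re.split(r'\n\s*\n', text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = SRT_TIME.search(line)
            if m:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timing line is the subtitle text.
                entries.append((start, end, ' '.join(lines[i + 1:])))
                break
    return entries
```

Each tuple gives the cue timing in seconds plus the text, which is what a TTS node needs to speak each subtitle in order.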

