feat: add ModelsLab as TTS and Video generation provider #3288
adhikjoshi wants to merge 2 commits into simstudioai:main
- Add tools/tts/modelslab.ts: TTS tool using ModelsLab voice API
- Add tools/video/modelslab.ts: Video generation tool (text2video/img2video)
- Update tools/tts/index.ts: export modelsLabTtsTool
- Update tools/video/index.ts: export modelsLabVideoTool
- Update tools/tts/types.ts: add 'modelslab' to TtsProvider union
- Update tools/video/types.ts: add 'modelslab' to VideoParams provider + ModelsLab fields
- Update blocks/blocks/tts.ts: add ModelsLab provider option + voice/language/speed sub-blocks
- Update blocks/blocks/video_generator.ts: add ModelsLab provider + mode/imageUrl/width/height sub-blocks
- Update app/api/tools/tts/unified/route.ts: add synthesizeWithModelsLab() with async polling
- Update app/api/tools/video/route.ts: add generateWithModelsLab() with async polling
- Update tools/registry.ts: register tts_modelslab and video_modelslab tools

ModelsLab API: https://modelslab.com/api/v6/
- TTS: POST /voice/text_to_speech with async polling via /voice/fetch
- Video: POST /video/text2video or /video/img2video with async polling via /video/fetch/{id}
- Auth: 'key' field in JSON body
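A minimal sketch of how the endpoints listed above could map onto request URLs and bodies. The base URL, the paths, and the `key` body field come from this commit message; the helper names, the `payload` shape, and the `id` field sent to `/voice/fetch` are assumptions for illustration, not the PR's actual code or a confirmed ModelsLab contract.

```typescript
// Base URL and paths as listed in the commit message.
const MODELSLAB_BASE = 'https://modelslab.com/api/v6'

type ModelsLabJob = 'tts' | 'text2video' | 'img2video'

interface ModelsLabRequest {
  url: string
  body: Record<string, unknown>
}

// Initial job submission: the API key travels in the JSON body, not a header.
function submitRequest(job: ModelsLabJob, apiKey: string, payload: Record<string, unknown>): ModelsLabRequest {
  const path = job === 'tts' ? '/voice/text_to_speech' : `/video/${job}`
  // `key` is written last so a stray `key` in payload cannot replace the real API key.
  return { url: `${MODELSLAB_BASE}${path}`, body: { ...payload, key: apiKey } }
}

// Status check: TTS uses a fixed /voice/fetch path, video puts the job id in the path.
// ASSUMPTION: the TTS job id is sent as an `id` body field.
function fetchRequest(job: ModelsLabJob, apiKey: string, id: string | number): ModelsLabRequest {
  if (job === 'tts') {
    return { url: `${MODELSLAB_BASE}/voice/fetch`, body: { key: apiKey, id } }
  }
  return { url: `${MODELSLAB_BASE}/video/fetch/${id}`, body: { key: apiKey } }
}
```

Each returned pair would then be sent with `fetch(url, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(body) })`.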
@adhikjoshi is attempting to deploy a commit to the Sim Team on Vercel. A member of the Team first needs to authorize it.
Greptile Summary

Added ModelsLab as a new provider for both TTS and video generation. The integration follows established patterns, using async polling for job completion.

Issue found: img2video mode is missing required validation. When `mode` is `img2video`, `imageUrl` is not checked before the API call.

Confidence Score: 4/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Block as TTS/Video Block
    participant Tool as ModelsLab Tool
    participant API as API Route
    participant ModelsLab as ModelsLab API
    User->>Block: Configure provider=modelslab
    Block->>Tool: Execute with params
    Tool->>API: POST /api/tools/tts/unified or /api/tools/video
    API->>ModelsLab: POST text_to_speech or text2video/img2video
    ModelsLab-->>API: {status: processing, id: xxx}
    loop Poll until complete (30-60 attempts)
        API->>ModelsLab: POST /voice/fetch or /video/fetch/{id}
        alt Success
            ModelsLab-->>API: {status: success, output: url}
            API->>ModelsLab: Download audio/video from URL
            ModelsLab-->>API: Binary data
        else Still Processing
            ModelsLab-->>API: {status: processing}
        else Error
            ModelsLab-->>API: {status: error/failed}
        end
    end
    API-->>Tool: audioUrl/videoUrl
    Tool-->>Block: Response with file
    Block-->>User: Generated audio/video
```
Last reviewed commit: 30de635
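The polling loop in the diagram can be sketched as follows. The response shape (`status`, `id`, `output`) is read off the diagram; treating `output` as either a single URL or an array of URLs is an assumption, and `pollModelsLab` is a hypothetical name. The status fetch and the sleep are injected so the loop can be exercised without network calls or real delays.

```typescript
// Response shape inferred from the sequence diagram; field types are assumptions.
interface ModelsLabStatus {
  status: string
  id?: number | string
  output?: string[] | string
}

type FetchStatus = () => Promise<ModelsLabStatus>

// Polls until the job succeeds, fails, or the attempt budget runs out.
async function pollModelsLab(
  fetchStatus: FetchStatus,
  maxAttempts: number,
  intervalMs: number,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchStatus()
    if (res.status === 'success') {
      // Accept either a single URL or the first entry of a URL array.
      const url = Array.isArray(res.output) ? res.output[0] : res.output
      if (!url) throw new Error('ModelsLab reported success without an output URL')
      return url
    }
    if (res.status === 'error' || res.status === 'failed') {
      throw new Error(`ModelsLab job failed with status "${res.status}"`)
    }
    // Any other status (e.g. "processing") means: wait, then poll again.
    await sleep(intervalMs)
  }
  throw new Error(`ModelsLab job did not finish after ${maxAttempts} attempts`)
}
```

In a route, the worst case of roughly `maxAttempts × intervalMs` (plus request latency) has to fit inside the route's `maxDuration`.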
```typescript
if (isImg2Video && imageUrl) {
  requestBody.init_image = imageUrl
}
```
Missing validation for img2video mode: `imageUrl` is required when `mode === 'img2video'`, but it is not validated before the API call.
Suggested change:

```typescript
if (isImg2Video && !imageUrl) {
  throw new Error('imageUrl is required for img2video mode')
}
if (isImg2Video && imageUrl) {
  requestBody.init_image = imageUrl
}
```
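A self-contained sketch of the suggested check in context. `imageUrl`, `width`, `height`, and `num_frames` mirror the fields this PR adds to `VideoParams`, and `init_image` and `key` appear in the diff and commit message; `prompt`, `numFrames`, `ModelsLabVideoParams`, and `buildVideoRequestBody` are hypothetical names. Trimming `imageUrl` so that a whitespace-only value also fails is an extra choice, not part of the suggestion.

```typescript
// Hypothetical parameter shape loosely mirroring the PR's VideoParams additions.
interface ModelsLabVideoParams {
  prompt: string
  mode: 'text2video' | 'img2video'
  imageUrl?: string
  width?: number
  height?: number
  numFrames?: number
}

// Builds the JSON body for a ModelsLab video request, failing fast when
// img2video is requested without a source image.
function buildVideoRequestBody(apiKey: string, params: ModelsLabVideoParams): Record<string, unknown> {
  const isImg2Video = params.mode === 'img2video'
  const imageUrl = params.imageUrl?.trim()
  if (isImg2Video && !imageUrl) {
    throw new Error('imageUrl is required for img2video mode')
  }
  const requestBody: Record<string, unknown> = { key: apiKey, prompt: params.prompt }
  // Optional dimensions are only sent when the caller set them.
  if (params.width !== undefined) requestBody.width = params.width
  if (params.height !== undefined) requestBody.height = params.height
  if (params.numFrames !== undefined) requestBody.num_frames = params.numFrames
  if (isImg2Video && imageUrl) {
    requestBody.init_image = imageUrl
  }
  return requestBody
}
```

Validating before any network call means a misconfigured block fails immediately with a clear message instead of surfacing as an opaque ModelsLab error after polling starts.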
Thanks for the review! I will add the img2video validation fix shortly.
I created a fix PR for the img2video validation: adhikjoshi#2

The fix adds:

```typescript
if (isImg2Video && !imageUrl) {
  throw new Error("imageUrl is required for img2video mode")
}
```

Would you like me to close that PR so you can cherry-pick or merge the commit, or would you prefer another approach?
Fix Applied

I have pushed the fix for the img2video validation to my fork: adhikjoshi#2

The fix adds this validation before the API call:

```typescript
if (isImg2Video && !imageUrl) {
  throw new Error('imageUrl is required for img2video mode')
}
```

You can either:

Sorry for the delay; I had to work around GitHub branch restrictions.
When `mode` is set to 'img2video', the `imageUrl` parameter is required. This validation was flagged in the Greptile review and is now fixed.
PR Summary

Medium Risk

Overview

The unified TTS and video API routes add

Tooling/UI is extended to expose ModelsLab in the TTS and Video Generator blocks (voice/language/speed for TTS; text2video/img2video mode plus image URL and dimensions for video), and the new

Written by Cursor Bugbot for commit 3f9cda1.
Fixed! I've pushed the img2video validation fix to the branch (commit 3f9cda1). The validation now throws a clear error through a simple check before the API call:

```typescript
if (isImg2Video && !imageUrl) {
  throw new Error('imageUrl is required for img2video mode')
}
```

Let me know if any other changes are needed!
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```typescript
  }

  attempts++
}
```
TTS polling timeout exceeds route max duration
High Severity
The synthesizeWithModelsLab polling loop can run for up to 90 seconds (30 attempts × 3-second intervals), but the TTS unified route has maxDuration = 60 seconds. The serverless function will be terminated before the polling loop completes, causing a silent failure or platform timeout error for any ModelsLab TTS request that goes async. The video route correctly uses maxDuration = 600 for its 5-minute polling window.
Additional Locations (1)
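The arithmetic behind this finding, expressed as a small helper: with a 3-second interval, 30 attempts need 90 seconds, which exceeds `maxDuration = 60`. `maxPollAttempts` and the 10-second reserve for the initial request and the final download are assumptions for illustration, not values from the PR.

```typescript
// Largest number of polling attempts that fits inside a route's maxDuration,
// keeping `reservedSeconds` free for the initial request and the final download.
function maxPollAttempts(maxDurationSeconds: number, intervalSeconds: number, reservedSeconds: number): number {
  if (intervalSeconds <= 0) throw new Error('intervalSeconds must be positive')
  const budget = maxDurationSeconds - reservedSeconds
  return budget > 0 ? Math.floor(budget / intervalSeconds) : 0
}

// Figures from the finding: 30 attempts at 3 s each against maxDuration = 60.
const currentWorstCase = 30 * 3 // 90 seconds, over the 60-second limit
const safeAttempts = maxPollAttempts(60, 3, 10) // floor(50 / 3) = 16
```

This budget ignores the latency of each status request, so checking a wall-clock deadline inside the loop is more robust; raising the TTS route's `maxDuration`, as the video route does with 600, is the other option.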
```typescript
    format: 'mp3',
    mimeType: 'audio/mpeg',
  }
}
```
Duplicate code in TTS and video success handlers
Low Severity
The audio/video download-and-return logic is fully duplicated in both the polling-success and immediate-success branches of synthesizeWithModelsLab (TTS) and generateWithModelsLab (video). Each function contains two identical blocks that fetch the output URL, convert to a Buffer, and return the result. Extracting a small helper within each function would eliminate this duplication.
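One way to do the extraction described here, sketched under assumptions: `downloadModelsLabOutput`, `Fetcher`, `MinimalResponse`, and `DownloadedMedia` are hypothetical names, and the fetcher is injected (the route would pass the global `fetch`, whose `Response` satisfies `MinimalResponse`). The helper returns a `Uint8Array` to stay independent of Node typings; `Buffer.from(media.bytes)` gives the `Buffer` the route builds today. The `'mp3'` and `'audio/mpeg'` values come from the diff context.

```typescript
// Only the parts of a fetch Response this helper needs.
interface MinimalResponse {
  ok: boolean
  status: number
  arrayBuffer(): Promise<ArrayBuffer>
}
type Fetcher = (url: string) => Promise<MinimalResponse>

interface DownloadedMedia {
  bytes: Uint8Array
  format: string
  mimeType: string
}

// Shared download step for both the immediate-success and polling-success branches.
async function downloadModelsLabOutput(
  fetcher: Fetcher,
  url: string,
  format: string,
  mimeType: string,
): Promise<DownloadedMedia> {
  const res = await fetcher(url)
  if (!res.ok) {
    throw new Error(`Failed to download ModelsLab output (HTTP ${res.status})`)
  }
  return { bytes: new Uint8Array(await res.arrayBuffer()), format, mimeType }
}
```

Both TTS branches could then end with something like `return await downloadModelsLabOutput(fetch, outputUrl, 'mp3', 'audio/mpeg')`, where `outputUrl` stands for whichever URL that branch obtained; the video function would use its own format and MIME type.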


Summary
This PR adds ModelsLab as a new provider for both Text-to-Speech (TTS) and Video generation in Sim Studio.
Changes
New Files
- `apps/sim/tools/tts/modelslab.ts`: TTS tool routing to `/api/tools/tts/unified`
- `apps/sim/tools/video/modelslab.ts`: Video tool routing to `/api/tools/video`

Modified Files
- `apps/sim/tools/tts/index.ts`: Export `modelsLabTtsTool`
- `apps/sim/tools/tts/types.ts`: Add `'modelslab'` to `TtsProvider` union type
- `apps/sim/tools/video/index.ts`: Export `modelsLabVideoTool`
- `apps/sim/tools/video/types.ts`: Add `'modelslab'` to `VideoParams.provider` + new fields (`imageUrl`, `width`, `height`, `num_frames`)
- `apps/sim/blocks/blocks/tts.ts`: Add ModelsLab to provider dropdown with voice, language, and speed sub-blocks
- `apps/sim/blocks/blocks/video_generator.ts`: Add ModelsLab to provider dropdown with mode, imageUrl, width, height sub-blocks (both V1 and V2 blocks)
- `apps/sim/app/api/tools/tts/unified/route.ts`: Add `synthesizeWithModelsLab()` with async polling
- `apps/sim/app/api/tools/video/route.ts`: Add `generateWithModelsLab()` with async polling
- `apps/sim/tools/registry.ts`: Register `tts_modelslab` and `video_modelslab`

ModelsLab API
- Base URL: `https://modelslab.com/api/v6/`
- Auth: `{ "key": "API_KEY" }` in the JSON request body
- TTS: `POST /voice/text_to_speech` → async polling via `POST /voice/fetch`
- Text-to-video: `POST /video/text2video` → async polling via `POST /video/fetch/{id}`
- Image-to-video: `POST /video/img2video` → async polling via `POST /video/fetch/{id}`

TTS Features
Video Features