
feat: add ModelsLab as TTS and Video generation provider #3288

Open
adhikjoshi wants to merge 2 commits into simstudioai:main from adhikjoshi:feat/modelslab-provider

@adhikjoshi adhikjoshi commented Feb 21, 2026

Summary

This PR adds ModelsLab as a new provider for both Text-to-Speech (TTS) and Video generation in Sim Studio.

Changes

New Files

  • apps/sim/tools/tts/modelslab.ts — TTS tool routing to /api/tools/tts/unified
  • apps/sim/tools/video/modelslab.ts — Video tool routing to /api/tools/video

Modified Files

  • apps/sim/tools/tts/index.ts — Export modelsLabTtsTool
  • apps/sim/tools/tts/types.ts — Add 'modelslab' to TtsProvider union type
  • apps/sim/tools/video/index.ts — Export modelsLabVideoTool
  • apps/sim/tools/video/types.ts — Add 'modelslab' to VideoParams.provider + new fields (imageUrl, width, height, num_frames)
  • apps/sim/blocks/blocks/tts.ts — Add ModelsLab to provider dropdown with voice, language, and speed sub-blocks
  • apps/sim/blocks/blocks/video_generator.ts — Add ModelsLab to provider dropdown with mode, imageUrl, width, height sub-blocks (both V1 and V2 blocks)
  • apps/sim/app/api/tools/tts/unified/route.ts — Add synthesizeWithModelsLab() with async polling
  • apps/sim/app/api/tools/video/route.ts — Add generateWithModelsLab() with async polling
  • apps/sim/tools/registry.ts — Register tts_modelslab and video_modelslab

ModelsLab API

  • Base URL: https://modelslab.com/api/v6/
  • Auth: { "key": "API_KEY" } in JSON request body
  • TTS endpoint: POST /voice/text_to_speech → async polling via POST /voice/fetch
  • Video (text2video): POST /video/text2video → async polling via POST /video/fetch/{id}
  • Video (img2video): POST /video/img2video → async polling via POST /video/fetch/{id}
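The endpoints above suggest a submit-then-poll flow. A minimal TypeScript sketch of that flow follows. It is not the PR's code: the `prompt` request field and the `status`, `id`, and `output` response fields are assumptions, and `runVideoJob`, `PostJson`, and `firstUrl` are hypothetical names. The transport is injected so the sketch can run without network access; in a route handler it would presumably wrap `fetch`.

```typescript
// Hypothetical sketch; field names and helper names are assumptions.
const BASE_URL = 'https://modelslab.com/api/v6'

// Assumed response shape: a status plus an optional job id and output URL(s).
interface JobResponse {
  status: string
  id?: string | number
  output?: string | string[]
}

// Injected transport: POSTs a JSON body and resolves with the parsed response.
type PostJson = (url: string, body: Record<string, unknown>) => Promise<JobResponse>

// Accept either a single URL or a list of URLs and return the first one.
function firstUrl(output: string | string[] | undefined): string | undefined {
  if (typeof output === 'string') return output
  return output && output.length > 0 ? output[0] : undefined
}

async function runVideoJob(
  postJson: PostJson,
  apiKey: string,
  prompt: string,
  opts: { maxAttempts: number; intervalMs: number; sleep: (ms: number) => Promise<void> }
): Promise<string> {
  // The API key travels in the JSON body under "key", not in a header.
  const first = await postJson(`${BASE_URL}/video/text2video`, { key: apiKey, prompt })
  const ready = first.status === 'success' ? firstUrl(first.output) : undefined
  if (ready) return ready
  if (first.status !== 'processing' || first.id === undefined) {
    throw new Error(`Unexpected initial status: ${first.status}`)
  }

  // Poll the fetch endpoint until the job succeeds, fails, or attempts run out.
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    await opts.sleep(opts.intervalMs)
    const res = await postJson(`${BASE_URL}/video/fetch/${first.id}`, { key: apiKey })
    const url = res.status === 'success' ? firstUrl(res.output) : undefined
    if (url) return url
    if (res.status === 'error' || res.status === 'failed') {
      throw new Error('ModelsLab video job failed')
    }
  }
  throw new Error('ModelsLab video job timed out')
}
```

Passing `sleep` and `postJson` as parameters keeps the loop deterministic under test; a real caller would supply a timer-based sleep and a `fetch` wrapper.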

TTS Features

  • Voice selection (Madison, Joanna, Matthew, Salli, etc.)
  • Language support (English, Spanish, French, German, Italian, Portuguese, Hindi, Japanese, Chinese)
  • Speed control (0.5–2.0)

Video Features

  • Text-to-Video mode
  • Image-to-Video mode (with image URL input)
  • Configurable width and height (256/512/768/1024px)
  • Async job polling with timeout
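The numeric limits listed above (speed 0.5 to 2.0, dimensions of 256/512/768/1024 px) could be enforced before a request is sent. The sketch below shows one way to do that; `validateTtsSpeed` and `validateDimension` are hypothetical helpers and do not appear in the PR, which may rely on the block dropdowns alone.

```typescript
// Hypothetical validators for the limits listed in the PR description.
const ALLOWED_SIZES: readonly number[] = [256, 512, 768, 1024]

// Reject speeds outside the documented 0.5 to 2.0 range (inclusive).
function validateTtsSpeed(speed: number): number {
  if (!isFinite(speed) || speed < 0.5 || speed > 2.0) {
    throw new Error(`speed must be between 0.5 and 2.0, got ${speed}`)
  }
  return speed
}

// Reject widths or heights that are not one of the listed pixel sizes.
function validateDimension(name: 'width' | 'height', value: number): number {
  if (ALLOWED_SIZES.indexOf(value) === -1) {
    throw new Error(`${name} must be one of ${ALLOWED_SIZES.join(', ')}, got ${value}`)
  }
  return value
}
```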

- Add tools/tts/modelslab.ts: TTS tool using ModelsLab voice API
- Add tools/video/modelslab.ts: Video generation tool (text2video/img2video)
- Update tools/tts/index.ts: export modelsLabTtsTool
- Update tools/video/index.ts: export modelsLabVideoTool
- Update tools/tts/types.ts: add 'modelslab' to TtsProvider union
- Update tools/video/types.ts: add 'modelslab' to VideoParams provider + ModelsLab fields
- Update blocks/blocks/tts.ts: add ModelsLab provider option + voice/language/speed sub-blocks
- Update blocks/blocks/video_generator.ts: add ModelsLab provider + mode/imageUrl/width/height sub-blocks
- Update app/api/tools/tts/unified/route.ts: add synthesizeWithModelsLab() with async polling
- Update app/api/tools/video/route.ts: add generateWithModelsLab() with async polling
- Update tools/registry.ts: register tts_modelslab and video_modelslab tools

ModelsLab API: https://modelslab.com/api/v6/
- TTS: POST /voice/text_to_speech with async polling via /voice/fetch
- Video: POST /video/text2video or /video/img2video with async polling via /video/fetch/{id}
- Auth: 'key' field in JSON body
@vercel

vercel bot commented Feb 21, 2026

@adhikjoshi is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.

@adhikjoshi adhikjoshi changed the title from "feat: Add ModelsLab as TTS and Video generation provider" to "feat: add ModelsLab as TTS and Video generation provider" Feb 21, 2026
@greptile-apps

greptile-apps bot commented Feb 21, 2026

Greptile Summary

Added ModelsLab as a new provider for both TTS and video generation. The integration follows established patterns with async polling for job completion.

  • TTS: supports voice selection, language, and speed control via /api/v6/voice/text_to_speech endpoint
  • Video: supports text-to-video and image-to-video modes via /api/v6/video/text2video and /api/v6/video/img2video
  • Both implementations use async polling with timeouts (30 attempts for TTS, 60 for video)
  • API keys correctly use user-only visibility per custom rule 2851870a
  • Properly registered in tool registry and exported from index files

Issue found: img2video mode missing required validation - when mode === 'img2video', imageUrl parameter is required but not validated before making the API call, which could result in API errors.

Confidence Score: 4/5

  • Safe to merge with one validation fix needed
  • Implementation follows existing patterns and adheres to style guides. API keys use correct visibility. One logical issue exists: missing validation for required imageUrl in img2video mode could cause runtime errors. Otherwise, code is well-structured with proper error handling and async polling.
  • Pay close attention to apps/sim/app/api/tools/video/route.ts - add validation for img2video mode

Important Files Changed

| Filename | Overview |
|---|---|
| apps/sim/tools/tts/modelslab.ts | new ModelsLab TTS tool with correct API key visibility and proper error handling |
| apps/sim/tools/video/modelslab.ts | new ModelsLab video tool with proper structure and API key visibility |
| apps/sim/app/api/tools/tts/unified/route.ts | added ModelsLab TTS synthesis with async polling, proper error handling |
| apps/sim/app/api/tools/video/route.ts | added ModelsLab video generation but missing required validation for img2video mode |
| apps/sim/blocks/blocks/tts.ts | added ModelsLab provider option with voice, language, and speed configuration |
| apps/sim/blocks/blocks/video_generator.ts | added ModelsLab provider to both V1 and V2 blocks with mode, dimensions, and imageUrl config |

Sequence Diagram

sequenceDiagram
    participant User
    participant Block as TTS/Video Block
    participant Tool as ModelsLab Tool
    participant API as API Route
    participant ModelsLab as ModelsLab API

    User->>Block: Configure provider=modelslab
    Block->>Tool: Execute with params
    Tool->>API: POST /api/tools/tts/unified or /api/tools/video
    API->>ModelsLab: POST text_to_speech or text2video/img2video
    ModelsLab-->>API: {status: processing, id: xxx}
    
    loop Poll until complete (30-60 attempts)
        API->>ModelsLab: POST /voice/fetch or /video/fetch/{id}
        alt Success
            ModelsLab-->>API: {status: success, output: url}
            API->>ModelsLab: Download audio/video from URL
            ModelsLab-->>API: Binary data
        else Still Processing
            ModelsLab-->>API: {status: processing}
        else Error
            ModelsLab-->>API: {status: error/failed}
        end
    end
    
    API-->>Tool: audioUrl/videoUrl
    Tool-->>Block: Response with file
    Block-->>User: Generated audio/video

Last reviewed commit: 30de635

@greptile-apps greptile-apps bot left a comment

11 files reviewed, 1 comment

Comment on lines +992 to +994
if (isImg2Video && imageUrl) {
  requestBody.init_image = imageUrl
}

missing validation for img2video mode - imageUrl is required when mode === 'img2video' but not validated before API call

Suggested change
- if (isImg2Video && imageUrl) {
-   requestBody.init_image = imageUrl
- }
+ if (isImg2Video && !imageUrl) {
+   throw new Error('imageUrl is required for img2video mode')
+ }
+ if (isImg2Video && imageUrl) {
+   requestBody.init_image = imageUrl
+ }

@adhikjoshi

Thanks for the review! I will add the img2video validation fix shortly.

@adhikjoshi

I created a fix PR for the img2video validation: adhikjoshi#2

The fix adds:

if (isImg2Video && !imageUrl) {
  throw new Error("imageUrl is required for img2video mode")
}

Would you like me to close that PR so you can cherry-pick or merge the commit, or would you prefer another approach?

@adhikjoshi

Fix Applied

I have pushed the fix for the img2video validation to my fork: adhikjoshi#2

The fix adds this validation before the API call:

if (isImg2Video && !imageUrl) {
  throw new Error('imageUrl is required for img2video mode')
}

You can either:

  1. Pull commit 77b846b from adhikjoshi/sim@fix/modelslab-img2video-validation
  2. Or let me know if you'd like me to update the PR directly

Sorry for the delay - I had to work around GitHub branch restrictions.

When mode is set to 'img2video', the imageUrl parameter is required.
This missing validation was flagged in the Greptile review and is now fixed.
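To show where the check sits relative to the request, here is a hedged sketch of a request builder that validates before touching the body. Only `init_image`, `key`, the endpoint paths, and the error message come from this thread; `buildVideoRequest`, `VideoRequestInput`, and the `prompt` field are hypothetical.

```typescript
// Hypothetical request builder; not the PR's generateWithModelsLab().
interface VideoRequestInput {
  apiKey: string
  prompt: string
  mode: 'text2video' | 'img2video'
  imageUrl?: string
}

function buildVideoRequest(input: VideoRequestInput): {
  endpoint: string
  body: Record<string, unknown>
} {
  const isImg2Video = input.mode === 'img2video'

  // Fail fast: img2video cannot work without a source image.
  if (isImg2Video && !input.imageUrl) {
    throw new Error('imageUrl is required for img2video mode')
  }

  // "prompt" is an assumed field name; "key" carries the API key in the body.
  const body: Record<string, unknown> = { key: input.apiKey, prompt: input.prompt }
  if (isImg2Video) {
    body.init_image = input.imageUrl
  }

  const path = isImg2Video ? '/video/img2video' : '/video/text2video'
  return { endpoint: `https://modelslab.com/api/v6${path}`, body }
}
```

Validating first means a misconfigured block fails locally with a clear message instead of surfacing as an opaque upstream API error.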
@cursor

cursor bot commented Mar 6, 2026

PR Summary

Medium Risk
Adds new third-party ModelsLab integrations in the TTS and video proxy APIs, including async job polling and downloading remote media, which can affect reliability/timeouts and error handling paths.

Overview
ModelsLab is now supported as a new provider for both Text-to-Speech and video generation.

The unified TTS and video API routes add modelslab branches that call ModelsLab endpoints, poll async jobs to completion, and download the resulting media before storing it like other providers.

Tooling/UI is extended to expose ModelsLab in the TTS and Video Generator blocks (voice/language/speed for TTS; text2video/img2video mode plus image URL and dimensions for video), and the new tts_modelslab/video_modelslab tools are added to exports and registered in the tool registry/types.

Written by Cursor Bugbot for commit 3f9cda1. This will update automatically on new commits. Configure here.

@vercel

vercel bot commented Mar 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Mar 6, 2026 5:37am


@adhikjoshi

Fixed! I've pushed the img2video validation fix to the branch (commit 3f9cda1).

The validation now throws a clear error, 'imageUrl is required for img2video mode', when mode is set to 'img2video' but no imageUrl is provided.

The fix is a simple validation check before making the API call:

if (isImg2Video && !imageUrl) {
  throw new Error('imageUrl is required for img2video mode')
}

Let me know if any other changes are needed!

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

}

attempts++
}

TTS polling timeout exceeds route max duration

High Severity

The synthesizeWithModelsLab polling loop can run for up to 90 seconds (30 attempts × 3-second intervals), but the TTS unified route has maxDuration = 60 seconds. The serverless function will be terminated before the polling loop completes, causing a silent failure or platform timeout error for any ModelsLab TTS request that goes async. The video route correctly uses maxDuration = 600 for its 5-minute polling window.
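The arithmetic behind this finding can be checked with a short sketch. The helper names and the 15-second reserve are hypothetical; the 5-second video interval is inferred from 60 attempts over a 5-minute window rather than stated in the thread. The worst case counts only sleep time, so real requests would take somewhat longer.

```typescript
// Worst-case time spent sleeping between polls, ignoring request latency.
function worstCasePollSeconds(maxAttempts: number, intervalSeconds: number): number {
  return maxAttempts * intervalSeconds
}

// Largest attempt count that fits in the route budget after reserving time
// for the initial request and the final download (reserve is an assumption).
function maxAttemptsWithinBudget(
  maxDurationSeconds: number,
  intervalSeconds: number,
  reserveSeconds: number
): number {
  return Math.max(0, Math.floor((maxDurationSeconds - reserveSeconds) / intervalSeconds))
}

// TTS: 30 attempts at 3 s each is 90 s, which exceeds maxDuration = 60.
const ttsWorstCase = worstCasePollSeconds(30, 3)
const ttsFits = ttsWorstCase <= 60

// Video: 60 attempts at an inferred 5 s each is 300 s, within maxDuration = 600.
const videoWorstCase = worstCasePollSeconds(60, 5)
const videoFits = videoWorstCase <= 600

// With a 15 s reserve, a 60 s budget leaves room for 15 polls at 3 s.
const safeTtsAttempts = maxAttemptsWithinBudget(60, 3, 15)
```

Either lowering the attempt count to fit the budget or raising the route's `maxDuration` would close the gap; which one is appropriate depends on how long ModelsLab TTS jobs actually take.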

format: 'mp3',
mimeType: 'audio/mpeg',
}
}

Duplicate code in TTS and video success handlers

Low Severity

The audio/video download-and-return logic is fully duplicated in both the polling-success and immediate-success branches of synthesizeWithModelsLab (TTS) and generateWithModelsLab (video). Each function contains two identical blocks that fetch the output URL, convert to a Buffer, and return the result. Extracting a small helper within each function would eliminate this duplication.
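The suggested extraction might look like the sketch below for the TTS case. It is an illustration only: `downloadAudio`, `synthesizeSketch`, the injected `submit`/`poll`/`download` functions, and the `audioBuffer` field name are hypothetical, and the sleep between polls is omitted for brevity. The `format` and `mimeType` values match the snippet quoted above.

```typescript
// Hypothetical shapes; the PR's actual types may differ.
interface AudioResult {
  audioBuffer: Uint8Array
  format: string
  mimeType: string
}
interface TtsJob {
  status: string
  output?: string
}
type Download = (url: string) => Promise<Uint8Array>

// Single place that turns an output URL into the returned result.
async function downloadAudio(download: Download, outputUrl: string): Promise<AudioResult> {
  const audioBuffer = await download(outputUrl)
  return { audioBuffer, format: 'mp3', mimeType: 'audio/mpeg' }
}

async function synthesizeSketch(
  submit: () => Promise<TtsJob>,
  poll: () => Promise<TtsJob>,
  download: Download,
  maxAttempts: number
): Promise<AudioResult> {
  // Immediate-success branch reuses the helper.
  const first = await submit()
  if (first.status === 'success' && first.output) {
    return downloadAudio(download, first.output)
  }

  // Polling-success branch reuses the same helper instead of a copied block.
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await poll()
    if (res.status === 'success' && res.output) {
      return downloadAudio(download, res.output)
    }
    if (res.status === 'error' || res.status === 'failed') {
      throw new Error('ModelsLab TTS job failed')
    }
  }
  throw new Error('ModelsLab TTS job timed out')
}
```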
