---
title: Multimodal Model Serving
subtitle: Deploy multimodal models with image, video, and audio support in Dynamo
---

Dynamo supports multimodal inference across multiple LLM backends, enabling models to process images, video, and audio alongside text.
**Security Requirement**: Multimodal processing must be explicitly enabled at startup. See the relevant backend documentation ([vLLM](multimodal-vllm.md), [SGLang](multimodal-sglang.md), [TRT-LLM](multimodal-trtllm.md)) for the necessary flags. This prevents unintended processing of multimodal data from untrusted sources.
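
Once multimodal processing is enabled, requests use the standard OpenAI-compatible chat completions format, with images passed as `image_url` content parts. A minimal client sketch; the base URL, model name, and image URL are placeholder assumptions, not fixed by Dynamo:

```python
# Minimal multimodal request against an OpenAI-compatible Dynamo frontend.
# The base_url, model name, and image URL below are placeholder assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # placeholder VLM name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```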

```mermaid
---
title: Sample flow for an aggregated VLM serving scenario
---
flowchart TD
    A[Request] --> B{KV cache hit?}
    B -->|Yes| C[Use KV]
    B -->|No| D{Embedding cache hit?}
    D -->|Yes| E[Load embedding]
    D -->|No| F[Run encoder]
    F --> G[Save to cache]
    G --> H["PREFILL (image tokens + text tokens → KV cache)"]
    E --> H
    C --> I[DECODE]
    H --> I
    I --> J[Response]
```
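
In this flow, the KV cache check belongs to the serving engine; the embedding cache adds the middle branch, which decides whether the vision encoder runs at all. A minimal sketch of that branch, assuming a hypothetical `run_encoder` and a simple `OrderedDict`-based LRU keyed by the image's content hash:

```python
import hashlib
from collections import OrderedDict

MAX_ENTRIES = 1024               # illustrative CPU-side capacity
embedding_cache = OrderedDict()  # image content hash -> embedding

def run_encoder(image_bytes: bytes):
    """Placeholder for the real vision encoder forward pass."""
    raise NotImplementedError

def get_image_embedding(image_bytes: bytes):
    """Return the image's embedding, running the encoder only on a miss."""
    key = hashlib.sha256(image_bytes).hexdigest()
    if key in embedding_cache:               # embedding cache hit
        embedding_cache.move_to_end(key)     # refresh LRU recency
        return embedding_cache[key]
    embedding = run_encoder(image_bytes)     # miss: run the encoder
    embedding_cache[key] = embedding         # save to cache
    if len(embedding_cache) > MAX_ENTRIES:
        embedding_cache.popitem(last=False)  # evict least-recently used
    return embedding
```

Keying on a content hash rather than a URL means re-uploaded or re-hosted copies of the same image still hit the cache.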

Dynamo improves latency and throughput for vision-and-language workloads through the following features, which can be used together or independently depending on your workload characteristics:

| Feature | Description |
|---|---|
| Embedding Cache | CPU-side LRU cache that skips re-encoding repeated images |
| Encoder Disaggregation | Separate vision encoder worker for independent scaling |
| Multimodal KV Routing | MM-aware KV cache routing for optimal worker selection (sketched below) |
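
KV routing needs multimodal awareness because two requests with identical prompt text but different images must not be matched to the same cached KV blocks. A conceptual sketch of the idea, not Dynamo's actual hashing scheme, in which per-image content hashes seed the block hash chain the router matches on:

```python
import hashlib

def mm_block_hashes(token_blocks, image_hashes):
    """Chain per-block hashes, seeded by the request's image hashes.

    token_blocks: token-ID tuples, one per fixed-size KV block.
    image_hashes: content hashes of the request's images. Identical
    text paired with different images yields a different chain, so
    the router never treats their cached blocks as interchangeable.
    """
    parent = hashlib.sha256(repr(sorted(image_hashes)).encode()).hexdigest()
    chain = []
    for block in token_blocks:
        parent = hashlib.sha256((parent + repr(block)).encode()).hexdigest()
        chain.append(parent)
    return chain
```

With text-only hashing, the router could match a request to a worker holding KV blocks computed from a different image and silently reuse the wrong context.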

Modality support by backend:

| Stack | Image | Video | Audio |
|---|---|---|---|
| vLLM | ✅ | 🧪 | 🧪 |
| TRT-LLM | ✅ | ❌ | ❌ |
| SGLang | ✅ | 🧪 | ❌ |

Status: ✅ Supported | 🧪 Experimental | ❌ Not supported

For reference implementations, detailed deployment guides, configuration, and examples for each backend, see [vLLM](multimodal-vllm.md), [SGLang](multimodal-sglang.md), and [TRT-LLM](multimodal-trtllm.md).