Description
Problem Statement
Currently, the ADK LiteLLM adapter does not support passing image URLs directly to vision models (like OpenAI's GPT-4 Vision, Qwen-VL, etc.) when using file_data.file_uri. This forces developers to download images, convert them to bytes, and use inline_data instead, which introduces unnecessary network overhead and CPU-intensive base64 encoding.
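For illustration, a minimal sketch of the two ways an image can be attached with the google-genai types (the URL is a placeholder); only the inline_data path currently works with vision models through the LiteLLM adapter:

```python
import httpx
from google.genai import types

IMAGE_URL = "https://example.com/photos/cat.png"  # placeholder URL

# What developers would like to pass through: just the URL, no download.
url_part = types.Part(
    file_data=types.FileData(file_uri=IMAGE_URL, mime_type="image/png")
)

# What they must pass today: the raw bytes, downloaded up front and then
# base64-encoded by ADK before being handed to LiteLLM.
image_bytes = httpx.get(IMAGE_URL).content
inline_part = types.Part(
    inline_data=types.Blob(mime_type="image/png", data=image_bytes)
)
```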
Current Behavior
When a Part contains file_data.file_uri with an image URL, the LiteLLM adapter converts it to "type": "file" format instead of "type": "image_url":
Current code (lite_llm.py:551-558):

```python
elif part.file_data and part.file_data.file_uri:
    file_object: ChatCompletionFileUrlObject = {
        "file_id": part.file_data.file_uri,
    }
    content_objects.append({
        "type": "file",
        "file": file_object,
    })
```

This causes errors like:

```
litellm.exceptions.BadRequestError: OpenAIException - Failed to deserialize the JSON body into the target type: messages[1]: data did not match any variant of untagged enum ChatMessageContent
```
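For comparison, the content object the adapter emits today versus the shape OpenAI-compatible vision endpoints expect (the URL is a placeholder):

```python
# Emitted today for an image file_uri: rejected by vision endpoints.
current = {
    "type": "file",
    "file": {"file_id": "https://example.com/photos/cat.png"},
}

# Shape accepted by OpenAI-compatible vision endpoints for image URLs.
expected = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/photos/cat.png"},
}
```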
Expected Behavior
For image MIME types, file_data.file_uri should be converted to OpenAI's Vision API format:
```python
elif part.file_data and part.file_data.file_uri:
    if part.file_data.mime_type.startswith("image/"):
        # For image URLs, use image_url format
        content_objects.append({
            "type": "image_url",
            "image_url": {"url": part.file_data.file_uri}
        })
    else:
        # For other file types, use existing file format
        file_object: ChatCompletionFileUrlObject = {
            "file_id": part.file_data.file_uri,
        }
        content_objects.append({
            "type": "file",
            "file": file_object,
        })
```

Use Case
This is particularly important for applications that:
- Store images in cloud storage (S3, GCS, etc.) with presigned URLs (see the sketch after this list)
- Process user-uploaded images through multimodal AI agents
- Need to minimize latency and bandwidth usage
- Want to avoid redundant downloads and base64 encoding
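For example, a rough sketch of the intended usage with a presigned URL; the model name, URL, and agent wiring below are illustrative assumptions, not tested code:

```python
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.genai import types

# Hypothetical presigned URL returned by S3/GCS.
PRESIGNED_URL = "https://my-bucket.s3.amazonaws.com/uploads/receipt.png?X-Amz-Signature=abc123"

vision_agent = LlmAgent(
    name="vision_agent",
    model=LiteLlm(model="openai/gpt-4o"),  # any LiteLLM-routed vision model
    instruction="Describe the attached image.",
)

# The user message carries the URL directly; no download, no base64.
user_message = types.Content(
    role="user",
    parts=[
        types.Part(text="What is shown in this image?"),
        types.Part(
            file_data=types.FileData(
                file_uri=PRESIGNED_URL,
                mime_type="image/png",
            )
        ),
    ],
)
# user_message would then be sent through the Runner as usual.
```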
Current Workaround
Developers must implement custom callbacks to download images and convert to inline_data:
```python
import httpx
from google.genai import types


def vision_model_callback(callback_context, llm_request):
    for content in llm_request.contents:
        if content.role == 'user':
            new_parts = []
            for part in content.parts:
                if part.file_data and is_image_url(part.file_data.file_uri):
                    # is_image_url() is an application-defined helper.
                    # Must download the image from its URL.
                    response = httpx.get(part.file_data.file_uri)
                    # Convert to inline_data
                    new_parts.append(types.Part(
                        inline_data=types.Blob(
                            mime_type='image/png',
                            data=response.content,  # ADK will base64 encode
                        )
                    ))
                else:
                    # Keep every other part unchanged
                    new_parts.append(part)
            content.parts = new_parts
```

This workaround:
- Adds network latency (download image from cloud storage)
- Wastes CPU on unnecessary base64 encoding
- Increases memory usage (storing image bytes)
- Complicates application code
Benefits of This Feature
- Performance: Eliminates redundant image downloads
- Simplicity: Developers can use file_data.file_uri directly
- Consistency: Matches OpenAI Vision API's native URL support
- Cost efficiency: Reduces bandwidth and compute costs
Suggested Implementation
Modify _convert_content_parts_to_litellm in lite_llm.py to check MIME type and use image_url format for images:
```python
elif part.file_data and part.file_data.file_uri:
    mime_type = part.file_data.mime_type or ""
    # Handle image URLs specially
    if mime_type.startswith("image/"):
        content_objects.append({
            "type": "image_url",
            "image_url": {
                "url": part.file_data.file_uri,
                # Optional: support detail parameter
                # "detail": "auto"
            }
        })
    # Handle video URLs
    elif mime_type.startswith("video/"):
        content_objects.append({
            "type": "video_url",
            "video_url": {"url": part.file_data.file_uri}
        })
    # Keep existing file handling for other types
    else:
        file_object: ChatCompletionFileUrlObject = {
            "file_id": part.file_data.file_uri,
        }
        content_objects.append({
            "type": "file",
            "file": file_object,
        })
```
Additional Context
- OpenAI Vision API documentation: https://platform.openai.com/docs/guides/vision
- LiteLLM multimodal support: https://docs.litellm.ai/docs/providers/openai#multimodal-models
- Many vision models (Qwen-VL, Claude, Gemini, etc.) support direct URL input via LiteLLM (see the sketch below)
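As a sanity check, a minimal sketch of the target payload sent through LiteLLM directly (model name and URL are placeholders):

```python
import litellm

response = litellm.completion(
    model="openai/gpt-4o",  # placeholder vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "https://example.com/photos/cat.png"},
            },
        ],
    }],
)
print(response.choices[0].message.content)
```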
Environment
- ADK Version: 1.21.0
- Python Version: 3.12