Skip to content

Support direct image URL passing in file_data.file_uri for LiteLLM vision models #4112

@Victory-7291

Description

@Victory-7291

Description

Problem Statement

Currently, the ADK LiteLLM adapter does not support passing image URLs directly to vision models (like OpenAI's GPT-4 Vision, Qwen-VL, etc.) when using file_data.file_uri. This forces developers to download images, convert them to bytes, and use inline_data instead, which introduces unnecessary network overhead and CPU-intensive base64 encoding.

Current Behavior

When a Part contains file_data.file_uri with an image URL, the LiteLLM adapter converts it to "type": "file" format instead of "type": "image_url":

Current code (lite_llm.py:551-558):

elif part.file_data and part.file_data.file_uri:
    file_object: ChatCompletionFileUrlObject = {
        "file_id": part.file_data.file_uri,
    }
    content_objects.append({
        "type": "file",
        "file": file_object,
    })

This causes errors like:

litellm.exceptions.BadRequestError: OpenAIException - Failed to deserialize the JSON body into the target type: messages[1]: data did not match any variant of untagged enum ChatMessageContent

Expected Behavior

For image MIME types, file_data.file_uri should be converted to OpenAI's Vision API format:

elif part.file_data and part.file_data.file_uri:
    if part.file_data.mime_type.startswith("image/"):
        # For image URLs, use image_url format
        content_objects.append({
            "type": "image_url",
            "image_url": {"url": part.file_data.file_uri}
        })
    else:
        # For other file types, use existing file format
        file_object: ChatCompletionFileUrlObject = {
            "file_id": part.file_data.file_uri,
        }
        content_objects.append({
            "type": "file",
            "file": file_object,
        })

Use Case

This is particularly important for applications that:

  1. Store images in cloud storage (S3, GCS, etc.) with presigned URLs
  2. Process user-uploaded images through multimodal AI agents
  3. Need to minimize latency and bandwidth usage
  4. Want to avoid redundant downloads and base64 encoding

Current Workaround

Developers must implement custom callbacks to download images and convert to inline_data:

def vision_model_callback(callback_context, llm_request):
    for content in llm_request.contents:
        if content.role == 'user':
            new_parts = []
            for part in content.parts:
                if hasattr(part, 'file_data') and part.file_data:
                    file_uri = part.file_data.file_uri
                    if is_image_url(file_uri):
                        # Must download the image
                        response = httpx.get(file_uri)
                        image_data = response.content
                        
                        # Convert to inline_data
                        new_parts.append(types.Part(
                            inline_data=types.Blob(
                                mime_type='image/png',
                                data=image_data  # ADK will base64 encode
                            )
                        ))
            content.parts = new_parts

This workaround:

  • Adds network latency (download image from cloud storage)
  • Wastes CPU on unnecessary base64 encoding
  • Increases memory usage (storing image bytes)
  • Complicates application code

Benefits of This Feature

  1. Performance: Eliminates redundant image downloads
  2. Simplicity: Developers can use file_data.file_uri directly
  3. Consistency: Matches OpenAI Vision API's native URL support
  4. Cost efficiency: Reduces bandwidth and compute costs

Suggested Implementation

Modify _convert_content_parts_to_litellm in lite_llm.py to check MIME type and use image_url format for images:

elif part.file_data and part.file_data.file_uri:
    mime_type = part.file_data.mime_type or ""
    
    # Handle image URLs specially
    if mime_type.startswith("image/"):
        content_objects.append({
            "type": "image_url",
            "image_url": {
                "url": part.file_data.file_uri,
                # Optional: support detail parameter
                # "detail": "auto"  
            }
        })
    # Handle video URLs
    elif mime_type.startswith("video/"):
        content_objects.append({
            "type": "video_url",
            "video_url": {"url": part.file_data.file_uri}
        })
    # Keep existing file handling for other types
    else:
        file_object: ChatCompletionFileUrlObject = {
            "file_id": part.file_data.file_uri,
        }
        content_objects.append({
            "type": "file",
            "file": file_object,
        })

Additional Context

Environment

  • ADK Version: 1.21.0
  • Python Version: 3.12

Metadata

Metadata

Assignees

Labels

models[Component] Issues related to model support

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions