Support direct image URL passing in file_data.file_uri for LiteLLM vision models

## Description

### Problem Statement

Currently, the ADK LiteLLM adapter does not support passing image URLs directly to vision models (like OpenAI's GPT-4 Vision, Qwen-VL, etc.) when using `file_data.file_uri`. This forces developers to download images, convert them to bytes, and use `inline_data` instead, which introduces unnecessary network overhead and CPU-intensive base64 encoding.

### Current Behavior

When a `Part` contains `file_data.file_uri` with an image URL, the LiteLLM adapter converts it to `"type": "file"` format instead of `"type": "image_url"`:

**Current code** (`lite_llm.py:551-558`):

```python
elif part.file_data and part.file_data.file_uri:
    file_object: ChatCompletionFileUrlObject = {
        "file_id": part.file_data.file_uri,
    }
    content_objects.append({
        "type": "file",
        "file": file_object,
    })
```

This causes errors like:

```
litellm.exceptions.BadRequestError: OpenAIException - Failed to deserialize the JSON body into the target type: messages[1]: data did not match any variant of untagged enum ChatMessageContent
```

### Expected Behavior

For image MIME types, `file_data.file_uri` should be converted to OpenAI's Vision API format:

```python
elif part.file_data and part.file_data.file_uri:
    if part.file_data.mime_type.startswith("image/"):
        # For image URLs, use image_url format
        content_objects.append({
            "type": "image_url",
            "image_url": {"url": part.file_data.file_uri}
        })
    else:
        # For other file types, use existing file format
        file_object: ChatCompletionFileUrlObject = {
            "file_id": part.file_data.file_uri,
        }
        content_objects.append({
            "type": "file",
            "file": file_object,
        })
```

### Use Case

This is particularly important for applications that:

1. Store images in cloud storage (S3, GCS, etc.) with presigned URLs
2. Process user-uploaded images through multimodal AI agents
3. Need to minimize latency and bandwidth usage
4. Want to avoid redundant downloads and base64 encoding

### Current Workaround

Developers must implement custom callbacks to download images and convert to `inline_data`:

```python
def vision_model_callback(callback_context, llm_request):
    for content in llm_request.contents:
        if content.role == 'user':
            new_parts = []
            for part in content.parts:
                if hasattr(part, 'file_data') and part.file_data:
                    file_uri = part.file_data.file_uri
                    if is_image_url(file_uri):
                        # Must download the image
                        response = httpx.get(file_uri)
                        image_data = response.content
                        
                        # Convert to inline_data
                        new_parts.append(types.Part(
                            inline_data=types.Blob(
                                mime_type='image/png',
                                data=image_data  # ADK will base64 encode
                            )
                        ))
            content.parts = new_parts
```

This workaround:

- Adds network latency (download image from cloud storage)
- Wastes CPU on unnecessary base64 encoding
- Increases memory usage (storing image bytes)
- Complicates application code

### Benefits of This Feature

1. **Performance**: Eliminates redundant image downloads
2. **Simplicity**: Developers can use `file_data.file_uri` directly
3. **Consistency**: Matches OpenAI Vision API's native URL support
4. **Cost efficiency**: Reduces bandwidth and compute costs

### Suggested Implementation

Modify `_convert_content_parts_to_litellm` in `lite_llm.py` to check MIME type and use `image_url` format for images:

```python
elif part.file_data and part.file_data.file_uri:
    mime_type = part.file_data.mime_type or ""
    
    # Handle image URLs specially
    if mime_type.startswith("image/"):
        content_objects.append({
            "type": "image_url",
            "image_url": {
                "url": part.file_data.file_uri,
                # Optional: support detail parameter
                # "detail": "auto"  
            }
        })
    # Handle video URLs
    elif mime_type.startswith("video/"):
        content_objects.append({
            "type": "video_url",
            "video_url": {"url": part.file_data.file_uri}
        })
    # Keep existing file handling for other types
    else:
        file_object: ChatCompletionFileUrlObject = {
            "file_id": part.file_data.file_uri,
        }
        content_objects.append({
            "type": "file",
            "file": file_object,
        })
```

### Additional Context

- OpenAI Vision API documentation: [https://platform.openai.com/docs/guides/vision](https://platform.openai.com/docs/guides/vision)
- LiteLLM multimodal support: [https://docs.litellm.ai/docs/providers/openai#multimodal-models](https://docs.litellm.ai/docs/providers/openai#multimodal-models)
- Many vision models (Qwen-VL, Claude, Gemini, etc.) support direct URL input via LiteLLM

### Environment

- ADK Version: 1.21.0
- Python Version: 3.12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support direct image URL passing in file_data.file_uri for LiteLLM vision models #4112

Description

Problem Statement

Current Behavior

Expected Behavior

Use Case

Current Workaround

Benefits of This Feature

Suggested Implementation

Additional Context

Environment

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Support direct image URL passing in file_data.file_uri for LiteLLM vision models #4112

Description

Description

Problem Statement

Current Behavior

Expected Behavior

Use Case

Current Workaround

Benefits of This Feature

Suggested Implementation

Additional Context

Environment

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions