Consistent output shape from get_image_features #46405

Open
zucchini-nlp wants to merge 3 commits into huggingface:main from zucchini-nlp:image-output-shapes

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Jun 4, 2026

What does this PR do?

As per the title. Branches off from #45783.

The pooled image (or video, but not audio yet) feature output will now always satisfy three conditions, whether the image features are returned as a list or as a 3D tensor:

  1. len(image_features) == len(input_images)  # NOTE: or num_videos, not the total number of video frames
  2. image_features[0].ndim == 2
  3. image_features[0].shape == (actual seq length of this image, LM hidden size)
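
The three conditions above can be sketched as a small checker. This is only an illustration, not code from the PR: `check_image_features` and `FakeFeature` are hypothetical names, the checker reads nothing but `len()`, `.ndim` and `.shape` (which `torch.Tensor` also exposes), and it assumes that conditions 2 and 3 are meant to hold for every entry, not only for `image_features[0]`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FakeFeature:
    """Hypothetical stand-in for one feature tensor; exposes only .shape and .ndim."""
    shape: Tuple[int, ...]

    @property
    def ndim(self) -> int:
        return len(self.shape)


def check_image_features(
    image_features: Sequence,
    num_inputs: int,
    seq_lengths: Sequence[int],
    hidden_size: int,
) -> List[str]:
    """Return the violated conditions; an empty list means all three hold."""
    problems: List[str] = []

    # Condition 1: one entry per input image (or per video, not per frame).
    if len(image_features) != num_inputs:
        problems.append(f"expected {num_inputs} entries, got {len(image_features)}")
        return problems  # per-entry checks are meaningless if the count is off

    for i, feats in enumerate(image_features):
        # Condition 2: each entry is 2D.
        if feats.ndim != 2:
            problems.append(f"entry {i}: expected ndim 2, got {feats.ndim}")
        # Condition 3: (sequence length of this input, LM hidden size).
        elif tuple(feats.shape) != (seq_lengths[i], hidden_size):
            problems.append(
                f"entry {i}: expected shape {(seq_lengths[i], hidden_size)}, "
                f"got {tuple(feats.shape)}"
            )
    return problems


# Two images with different sequence lengths, returned as a list.
ok = check_image_features(
    [FakeFeature((196, 4096)), FakeFeature((64, 4096))],
    num_inputs=2, seq_lengths=[196, 64], hidden_size=4096,
)

# All patches concatenated into a single entry: violates condition 1.
bad_count = check_image_features(
    [FakeFeature((260, 4096))],
    num_inputs=2, seq_lengths=[196, 64], hidden_size=4096,
)

# An entry that kept an extra leading dimension: violates condition 2.
bad_ndim = check_image_features(
    [FakeFeature((1, 196, 4096))],
    num_inputs=1, seq_lengths=[196], hidden_size=4096,
)
```

Because iterating over a 3D tensor yields 2D slices, the same check should apply whether a model returns a list or a single stacked tensor, provided `seq_lengths` reflects the per-input lengths.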

We could make this fully backward compatible (BC) and return the "correct" output only when a certain flag is passed, but for now I decided to update the smaller utility function directly, in a breaking way.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Member Author

run-slow: aya_vision, cohere2_vision, deepseek_ocr2, gemma4, paddleocr_vl, qwen2_5_omni, qwen3_omni_moe

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/aya_vision", "models/cohere2_vision", "models/deepseek_ocr2", "models/gemma4", "models/paddleocr_vl", "models/qwen2_5_omni", "models/qwen3_omni_moe"]
quantizations: []

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

CI Results

Workflow Run ⚙️

Commit Info

Context  Commit    Description
RUN      5e5f7bbe  workflow commit (merge commit)
PR       52788433  branch commit (from PR)
main     b07d99be  base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: aya_vision, cohere2_vision, deepseek_ocr2, gemma4, paddleocr_vl, qwen2_5_omni, qwen3_omni_moe

@zucchini-nlp zucchini-nlp requested a review from vasqu June 4, 2026 12:42
@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46405&sha=80f4bb

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

CI Dashboard: View test results in Grafana
