Consistent output shape from get_image_features#46405
run-slow: aya_vision, cohere2_vision, deepseek_ocr2, gemma4, paddleocr_vl, qwen2_5_omni, qwen3_omni_moe
What does this PR do?
As per the title. This branches off from #45783.
The pooled image (or video, but not audio yet) feature output will now always satisfy three conditions, where the image features can be either a list or a 3D tensor:
This could be made fully backward compatible (BC) by returning the "correct" output only when a certain flag is passed, but for now I decided to update the smaller utility functions directly, in a breaking way.
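
For illustration only, here is a minimal sketch of how downstream code might handle both forms mentioned above (a list of per-image features or a single 3D tensor). The helper name `split_per_image` and the `(num_images, num_tokens, hidden_dim)` layout are assumptions made for this example, not part of this PR or of the library API. NumPy arrays stand in for tensors here; any object exposing `ndim` and `shape` should behave the same way.

```python
import numpy as np


def split_per_image(features):
    """Return a list with one 2D feature array per image.

    Hypothetical helper: accepts either a list/tuple of per-image arrays
    (possibly with different token counts) or one 3D array assumed to be
    laid out as (num_images, num_tokens, hidden_dim).
    """
    if isinstance(features, (list, tuple)):
        # Already one entry per image; return a plain list copy.
        return list(features)
    if getattr(features, "ndim", None) == 3:
        # Split the batched array along its first (image) axis.
        return [features[i] for i in range(features.shape[0])]
    raise ValueError("expected a list of arrays or a 3D array")


# Batched case: 2 images, 4 tokens each, hidden size 8.
batched = np.zeros((2, 4, 8))
per_image = split_per_image(batched)

# Ragged case: images with different token counts arrive as a list.
ragged = [np.zeros((3, 8)), np.zeros((5, 8))]
per_image_ragged = split_per_image(ragged)
```

Normalizing to a list like this lets callers iterate over images without caring which of the two forms a given model returned.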