From d2d18a69d0a595c567cc8fca8f84d8ed8787ccad Mon Sep 17 00:00:00 2001
From: Abukhoyer Shaik
Date: Mon, 25 May 2026 04:49:39 +0000
Subject: [PATCH 1/2] release/v1.21.6 doc update

Signed-off-by: Abukhoyer Shaik
---
 docs/source/release_docs.md | 60 +++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/docs/source/release_docs.md b/docs/source/release_docs.md
index 880c3a4e4c..afc48a95a1 100644
--- a/docs/source/release_docs.md
+++ b/docs/source/release_docs.md
@@ -1,3 +1,63 @@
+# Efficient Transformer Library - 1.21.6 Release Notes
+
+Welcome to the official release of **Efficient Transformer Library v1.21.6**! This targeted release builds on the v1.21 line with multi-resolution Vision Language Model workflows, Qwen3-VL stability fixes, on-device sampling enablement, and compatibility updates for newer model and framework APIs.
+
+> ✅ The exact release content is available on the [`release/v1.21.6`](https://github.com/quic/efficient-transformers/tree/release/v1.21.6) branch. The package version for this branch is `1.21.6.0`.
+
+---
+
+## Branch Summary
+
+- **Release branch**: [`release/v1.21.6`](https://github.com/quic/efficient-transformers/tree/release/v1.21.6)
+- **Release head**: `25e7c53` (`Updated release version to 1.21.6.0`)
+- **Mainline comparison**: Reviewed against `upstream/main`; the release branch contains 11 release commits from merge base `d02f717`.
+
+---
+
+## Key Features & Enhancements
+
+- **Multi-specialization vision compilation for Qwen VLMs**
+  - Qwen2.5-VL, Qwen3-VL Dense, and Qwen3-VL-MoE can compile multiple vision resolution and frame configurations in one pass.
+  - `height`, `width`, and `num_frames` can be supplied as lists when building specializations.
+  - Runtime generation can select the matching specialization through the multi-frame generation path.
+  - New example scripts are available for [Qwen2.5-VL](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen2_5_vl), [Qwen3-VL Dense](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen3vl), and [Qwen3-VL-MoE](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen3_vl_moe).
+
+- **Qwen3-VL Dense on-device sampling**
+  - Registers Qwen3-VL Dense with the sampler transform path.
+  - Handles Qwen3-VL Dense deepstack feature inputs and outputs for on-device sampling.
+  - Adds sampler coverage to validate the new transform behavior.
+
+- **Large embedding export robustness**
+  - Adds `SplitTensorsTransform` to `QEFFAutoModel` ONNX transforms so large initializers are emitted as `*.onnx.data` sidecar files.
+  - Prevents ONNX ModelProto parser failures when exports exceed the 2 GB protobuf limit.
+  - Adds regression coverage for large embedding and reranker model export flows.
+
+- **Qwen VLM runtime stability**
+  - Fixes RoPE handling for Qwen3-VL-MoE disaggregated mode.
+  - Fixes Qwen3-VL Dense continuous batching with multi-image, multi-prompt inputs by preserving the complete hidden-state tensor during broadcast.
+  - Handles multi-resolution `vision_embeds` edge cases for Qwen2.5-VL, Qwen3-VL Dense, and Qwen3-VL-MoE.
+  - Moves Qwen2.5-VL examples into a dedicated `qwen2_5_vl` example directory.
+
+- **Gemma3 configuration compatibility**
+  - Updates Gemma3 cache handling for the newer `_sliding_window_pattern` config field.
+  - Preserves sliding-window behavior for Gemma3 models using updated Transformers configs.
+
+- **Llama4 compatibility with Transformers `4.57.3`**
+  - Adds `**kwargs` support to `QEffLlama4VisionModel.forward()`.
+  - Accepts `vision_feature_layer` and `vision_feature_select_strategy` forwarded by newer Transformers Llama4 APIs.
+  - Fixes ONNX export failures for Llama4 vision models while remaining backward compatible.
+
+---
+
+## Validation & Quality Updates
+
+- Added tests for Qwen3-VL Dense on-device sampling transformations.
+- Added regression tests that verify large ONNX initializers are split into external data files.
+- Updated image-text model configs and Qwen3-VL examples for continuous batching and multi-specialization workflows.
+- Reverted a temporary Qwen VLM multi-image test/config change before landing the stable Qwen3-VL Dense continuous batching fix.
+
+---
+
 # Efficient Transformer Library - 1.21.0 Release Notes

 Welcome to the official release of **Efficient Transformer Library v1.21.0**! This release introduces advanced attention mechanisms, expanded model support, optimized serving capabilities, and significant improvements to fine-tuning and deployment workflows.

From 2bae9804087fd0ebecc588ddf9cc5a5348aa6421 Mon Sep 17 00:00:00 2001
From: Abukhoyer Shaik
Date: Tue, 26 May 2026 01:38:07 +0000
Subject: [PATCH 2/2] comments are incorporated

Signed-off-by: Abukhoyer Shaik
---
 docs/source/release_docs.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/docs/source/release_docs.md b/docs/source/release_docs.md
index afc48a95a1..9cd7332eeb 100644
--- a/docs/source/release_docs.md
+++ b/docs/source/release_docs.md
@@ -1,6 +1,6 @@
 # Efficient Transformer Library - 1.21.6 Release Notes

-Welcome to the official release of **Efficient Transformer Library v1.21.6**! This targeted release builds on the v1.21 line with multi-resolution Vision Language Model workflows, Qwen3-VL stability fixes, on-device sampling enablement, and compatibility updates for newer model and framework APIs.
+Welcome to the official release of **Efficient Transformer Library v1.21.6**! This targeted release builds on the v1.21 line with multi-resolution Vision Language Model workflows, Qwen3-VL stability fixes, on-device sampling enablement, online serving support for Gemma4 through vLLM, and compatibility updates for newer model and framework APIs.

 > ✅ The exact release content is available on the [`release/v1.21.6`](https://github.com/quic/efficient-transformers/tree/release/v1.21.6) branch. The package version for this branch is `1.21.6.0`.

@@ -17,10 +17,10 @@ Welcome to the official release of **Efficient Transformer Library v1.21.6**! Th
 ## Key Features & Enhancements

 - **Multi-specialization vision compilation for Qwen VLMs**
-  - Qwen2.5-VL, Qwen3-VL Dense, and Qwen3-VL-MoE can compile multiple vision resolution and frame configurations in one pass.
+  - Qwen2.5-VL and Qwen3-VL Dense can compile multiple vision resolution and frame configurations in one pass.
   - `height`, `width`, and `num_frames` can be supplied as lists when building specializations.
   - Runtime generation can select the matching specialization through the multi-frame generation path.
-  - New example scripts are available for [Qwen2.5-VL](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen2_5_vl), [Qwen3-VL Dense](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen3vl), and [Qwen3-VL-MoE](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen3_vl_moe).
+  - New example scripts are available for [Qwen2.5-VL](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen2_5_vl) and [Qwen3-VL Dense](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen3vl).

 - **Qwen3-VL Dense on-device sampling**
   - Registers Qwen3-VL Dense with the sampler transform path.
@@ -33,7 +33,6 @@ Welcome to the official release of **Efficient Transformer Library v1.21.6**! Th
   - Adds regression coverage for large embedding and reranker model export flows.

 - **Qwen VLM runtime stability**
-  - Fixes RoPE handling for Qwen3-VL-MoE disaggregated mode.
   - Fixes Qwen3-VL Dense continuous batching with multi-image, multi-prompt inputs by preserving the complete hidden-state tensor during broadcast.
   - Handles multi-resolution `vision_embeds` edge cases for Qwen2.5-VL, Qwen3-VL Dense, and Qwen3-VL-MoE.
   - Moves Qwen2.5-VL examples into a dedicated `qwen2_5_vl` example directory.
@@ -41,12 +40,16 @@ Welcome to the official release of **Efficient Transformer Library v1.21.6**! Th
 - **Gemma3 configuration compatibility**
   - Updates Gemma3 cache handling for the newer `_sliding_window_pattern` config field.
   - Preserves sliding-window behavior for Gemma3 models using updated Transformers configs.
+  - Adds online serving support for Gemma3 through vLLM.

 - **Llama4 compatibility with Transformers `4.57.3`**
   - Adds `**kwargs` support to `QEffLlama4VisionModel.forward()`.
   - Accepts `vision_feature_layer` and `vision_feature_select_strategy` forwarded by newer Transformers Llama4 APIs.
   - Fixes ONNX export failures for Llama4 vision models while remaining backward compatible.

+- **GPT-OSS batch size flexibility**
+  - Enables GPT-OSS 120B with batch size > 1 and GPT-OSS 20B with batch size > 2.
+
 ---

 ## Validation & Quality Updates