63 changes: 63 additions & 0 deletions docs/source/release_docs.md
@@ -1,3 +1,66 @@
# Efficient Transformer Library - 1.21.6 Release Notes

Welcome to the official release of **Efficient Transformer Library v1.21.6**! This targeted release builds on the v1.21 line with multi-resolution Vision Language Model workflows, Qwen3-VL stability fixes, on-device sampling enablement, online serving support for Gemma4 through vLLM, and compatibility updates for newer model and framework APIs.

> ✅ The exact release content is available on the [`release/v1.21.6`](https://github.com/quic/efficient-transformers/tree/release/v1.21.6) branch. The package version for this branch is `1.21.6.0`.

---

## Branch Summary

- **Release branch**: [`release/v1.21.6`](https://github.com/quic/efficient-transformers/tree/release/v1.21.6)
- **Release head**: `25e7c53` (`Updated release version to 1.21.6.0`)
- **Mainline comparison**: Reviewed against `upstream/main`; the release branch contains 11 release commits on top of merge base `d02f717`.

---

## Key Features & Enhancements

- **Multi-specialization vision compilation for Qwen VLMs**
  - Qwen2.5-VL and Qwen3-VL Dense can compile multiple vision resolution and frame configurations in one pass.
  - `height`, `width`, and `num_frames` can be supplied as lists when building specializations.
  - Runtime generation can select the matching specialization through the multi-frame generation path.
  - New example scripts are available for [Qwen2.5-VL](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen2_5_vl) and [Qwen3-VL Dense](https://github.com/quic/efficient-transformers/tree/release/v1.21.6/examples/image_text_to_text/models/qwen3vl).
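
As a rough illustration of list-valued `height`, `width`, and `num_frames`, the sketch below expands them into one specialization per entry and then picks the entry that matches a runtime input. The helper names `expand_vision_specializations` and `select_specialization` are hypothetical, and the index-aligned pairing of the lists is an assumption; the library's own expansion and selection logic may differ.

```python
from typing import Dict, List, Sequence, Union

IntOrList = Union[int, Sequence[int]]


def _as_list(value: IntOrList) -> List[int]:
    """Wrap a scalar in a list so scalars and lists are handled uniformly."""
    return [value] if isinstance(value, int) else list(value)


def expand_vision_specializations(
    height: IntOrList, width: IntOrList, num_frames: IntOrList
) -> List[Dict[str, int]]:
    """Hypothetical helper: build one specialization dict per list index.

    Assumption: lists are paired index by index, and a scalar is reused
    for every entry. This is not taken from the library source.
    """
    columns = {
        "height": _as_list(height),
        "width": _as_list(width),
        "num_frames": _as_list(num_frames),
    }
    count = max(len(values) for values in columns.values())
    for name, values in columns.items():
        if len(values) not in (1, count):
            raise ValueError(f"{name} has {len(values)} entries, expected 1 or {count}")
    return [
        {name: values[0] if len(values) == 1 else values[i] for name, values in columns.items()}
        for i in range(count)
    ]


def select_specialization(
    specs: List[Dict[str, int]], height: int, width: int, num_frames: int
) -> Dict[str, int]:
    """Hypothetical helper: return the compiled entry that matches the input exactly."""
    wanted = {"height": height, "width": width, "num_frames": num_frames}
    for spec in specs:
        if spec == wanted:
            return spec
    raise LookupError(f"no compiled specialization matches {wanted}")


specs = expand_vision_specializations(height=[354, 720], width=[536, 1280], num_frames=1)
chosen = select_specialization(specs, height=720, width=1280, num_frames=1)
```

The resolution values here are placeholders. An exact-match lookup fails loudly when an input shape was never compiled, which is the behavior this sketch assumes is preferable to silently picking a nearby shape.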

- **Qwen3-VL Dense on-device sampling**
  - Registers Qwen3-VL Dense with the sampler transform path.
  - Handles Qwen3-VL Dense deepstack feature inputs and outputs for on-device sampling.
  - Adds sampler coverage to validate the new transform behavior.

- **Large embedding export robustness**
  - Adds `SplitTensorsTransform` to `QEFFAutoModel` ONNX transforms so large initializers are emitted as `*.onnx.data` sidecar files.
  - Prevents ONNX ModelProto parser failures when exports exceed the 2 GB protobuf limit.
  - Adds regression coverage for large embedding and reranker model export flows.
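
To make the 2 GB constraint concrete, the sketch below works through the arithmetic for a single large embedding table and shows how moving big initializers to a sidecar leaves the serialized graph small. It is not the logic of `SplitTensorsTransform`; `plan_external_data`, the tensor names, and the sizes are hypothetical, and the 1024-byte threshold simply mirrors the default `size_threshold` of stock `onnx.save_model(..., save_as_external_data=True)`.

```python
from typing import Dict, List, Tuple

# A single serialized protobuf message cannot exceed roughly 2 GB.
PROTOBUF_LIMIT_BYTES = 2**31 - 1


def plan_external_data(
    initializer_sizes: Dict[str, int], size_threshold: int = 1024
) -> Tuple[List[str], int]:
    """Hypothetical helper: decide which initializers go to the sidecar file.

    Returns the names to externalize and the number of initializer bytes
    that would remain inline in the model proto.
    """
    external = [name for name, size in initializer_sizes.items() if size >= size_threshold]
    inline_bytes = sum(size for size in initializer_sizes.values() if size < size_threshold)
    return external, inline_bytes


# Illustrative sizes: a 262144 x 4096 float32 embedding is 2**32 bytes (4 GiB).
sizes = {
    "embed_tokens.weight": 262144 * 4096 * 4,
    "norm.weight": 4096 * 4,
    "rope.inv_freq": 256,
}

total_bytes = sum(sizes.values())
external_names, inline_bytes = plan_external_data(sizes)
fits_inline_before = total_bytes <= PROTOBUF_LIMIT_BYTES
fits_inline_after = inline_bytes <= PROTOBUF_LIMIT_BYTES
```

With these numbers the embedding alone is twice the limit, so the model cannot be serialized as one message, while the inline remainder after splitting is a few hundred bytes.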

- **Qwen VLM runtime stability**
  - Fixes Qwen3-VL Dense continuous batching with multi-image, multi-prompt inputs by preserving the complete hidden-state tensor during broadcast.
  - Handles multi-resolution `vision_embeds` edge cases for Qwen2.5-VL, Qwen3-VL Dense, and Qwen3-VL-MoE.
  - Moves Qwen2.5-VL examples into a dedicated `qwen2_5_vl` example directory.

- **Gemma3 configuration compatibility**
  - Updates Gemma3 cache handling for the newer `_sliding_window_pattern` config field.
  - Preserves sliding-window behavior for Gemma3 models using updated Transformers configs.
  - Adds online serving support for Gemma3 through vLLM.
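
One way to read the `_sliding_window_pattern` change is as a field lookup with a fallback, so that older and newer configs classify layers identically. The sketch below is an assumption-laden illustration, not the library's cache code: the helper names, the legacy field name `sliding_window_pattern`, the fallback value, and the rule that every N-th layer is global are all assumptions made for this example.

```python
from types import SimpleNamespace

# Assumed fallback for this sketch only, used when neither field is present.
DEFAULT_PATTERN = 6


def resolve_sliding_window_pattern(config) -> int:
    """Hypothetical helper: prefer the newer private field, then the older one."""
    for field in ("_sliding_window_pattern", "sliding_window_pattern"):
        value = getattr(config, field, None)
        if value is not None:
            return value
    return DEFAULT_PATTERN


def is_sliding_layer(config, layer_idx: int) -> bool:
    """Assumed rule: every `pattern`-th layer is global, the others are sliding."""
    return (layer_idx + 1) % resolve_sliding_window_pattern(config) != 0


# Stand-ins for an older and a newer config object.
older_config = SimpleNamespace(sliding_window_pattern=6)
newer_config = SimpleNamespace(_sliding_window_pattern=6)

older_layout = [is_sliding_layer(older_config, i) for i in range(12)]
newer_layout = [is_sliding_layer(newer_config, i) for i in range(12)]
```

Under these assumptions both configs yield the same per-layer layout, which is the property that keeps sliding-window cache behavior unchanged across config versions.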

- **Llama4 compatibility with Transformers `4.57.3`**
  - Adds `**kwargs` support to `QEffLlama4VisionModel.forward()`.
  - Accepts `vision_feature_layer` and `vision_feature_select_strategy` forwarded by newer Transformers Llama4 APIs.
  - Fixes ONNX export failures for Llama4 vision models while remaining backward compatible.
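
The effect of accepting `**kwargs` can be shown with two toy classes: one with a fixed signature that rejects the extra keyword arguments, and one that tolerates them. These classes are hypothetical stand-ins rather than the real `QEffLlama4VisionModel`, and the argument values are placeholders chosen for illustration.

```python
class StrictVisionForward:
    """Toy stand-in with a fixed signature, like a forward() without **kwargs."""

    def forward(self, pixel_values):
        return {"pixel_values": pixel_values}


class TolerantVisionForward:
    """Toy stand-in whose forward() accepts and ignores extra keyword arguments."""

    def forward(self, pixel_values, **kwargs):
        # Unknown keyword arguments are recorded here only to make them visible.
        return {"pixel_values": pixel_values, "ignored": sorted(kwargs)}


def call_like_newer_caller(model, pixel_values):
    """Mimics a caller that forwards the two extra arguments named in the notes."""
    return model.forward(
        pixel_values,
        vision_feature_layer=-1,  # placeholder value
        vision_feature_select_strategy="default",  # placeholder value
    )


try:
    call_like_newer_caller(StrictVisionForward(), "image")
    strict_failed = False
except TypeError:
    strict_failed = True

tolerant_result = call_like_newer_caller(TolerantVisionForward(), "image")
legacy_result = TolerantVisionForward().forward("image")  # older call style still works
```

The fixed signature raises `TypeError` on the newer call style, while the tolerant one handles both call styles, which is the sense in which the change stays backward compatible.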

- **GPT-OSS batch size flexibility**
  - Enables GPT-OSS 120B with batch size greater than 1 and GPT-OSS 20B with batch size greater than 2.

---

## Validation & Quality Updates

- Added tests for Qwen3-VL Dense on-device sampling transformations.
- Added regression tests that verify large ONNX initializers are split into external data files.
- Updated image-text model configs and Qwen3-VL examples for continuous batching and multi-specialization workflows.
- Reverted a temporary Qwen VLM multi-image test/config change before landing the stable Qwen3-VL Dense continuous batching fix.

---

# Efficient Transformer Library - 1.21.0 Release Notes

Welcome to the official release of **Efficient Transformer Library v1.21.0**! This release introduces advanced attention mechanisms, expanded model support, optimized serving capabilities, and significant improvements to fine-tuning and deployment workflows.