[None][fix] Honor Qwen Image quant ignore list #15599
d846c5d to
c94b440
📝 Walkthrough
QwenImageTransformer2DModel now applies per-module quantization exclusions during initialization by replacing excluded Linear submodules' quant configs with a no-quantization config that preserves the KV-cache quantization algorithm. A unit test now checks excluded modules lose quantization while non-excluded modules retain NVFP4.
Changes
Qwen-Image transformer quantization exclusions
Sequence Diagram(s)
sequenceDiagram
participant QwenImageTransformer2DModel
participant model_config_quant_config as model_config.quant_config
participant Linear
participant QuantConfig
QwenImageTransformer2DModel->>model_config_quant_config: read quant_config.exclude_modules
QwenImageTransformer2DModel->>QwenImageTransformer2DModel: call apply_quant_config_exclude_modules()
QwenImageTransformer2DModel->>Linear: inspect named_modules() for excluded submodules
QwenImageTransformer2DModel->>QuantConfig: build a no-quantization config with the KV-cache algorithm
QwenImageTransformer2DModel->>Linear: replace quant_config on excluded Linear modules
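The flow in the diagram can be sketched in plain Python. This is a minimal illustration, not the code from the PR: `SketchQuantConfig`, `SketchLinear`, `is_excluded`, and `apply_exclusions` are hypothetical stand-ins, the field names are assumptions, and matching names with shell-style wildcards is an assumed convention.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class SketchQuantConfig:
    """Hypothetical stand-in for a quant config; field names are assumptions."""
    quant_algo: Optional[str] = None
    kv_cache_quant_algo: Optional[str] = None


class SketchLinear:
    """Hypothetical stand-in for a Linear module; it only carries a config."""

    def __init__(self, quant_config: SketchQuantConfig) -> None:
        self.quant_config = quant_config


def is_excluded(name: str, patterns: Iterable[str]) -> bool:
    # Assumption: entries are exact module names or shell-style wildcards.
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def apply_exclusions(
    modules: Dict[str, SketchLinear], patterns: Iterable[str]
) -> List[str]:
    """Give excluded modules a no-quantization config; return their names."""
    patterns = list(patterns)
    changed = []
    for name, module in modules.items():
        if is_excluded(name, patterns):
            # Drop weight quantization but keep the KV-cache algorithm.
            module.quant_config = SketchQuantConfig(
                quant_algo=None,
                kv_cache_quant_algo=module.quant_config.kv_cache_quant_algo,
            )
            changed.append(name)
    return changed


global_cfg = SketchQuantConfig(quant_algo="NVFP4", kv_cache_quant_algo="FP8")
modules = {
    "txt_in": SketchLinear(global_cfg),
    "proj_out": SketchLinear(global_cfg),
    "transformer_blocks.0.attn.to_q": SketchLinear(global_cfg),
}
changed = apply_exclusions(modules, ["txt_in", "proj_out"])
```

Building a fresh config per excluded module, rather than mutating the shared one, keeps the non-excluded modules on the original NVFP4 config.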
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py (1)
829-891: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
Apply exclusions before Linear.create_weights() consumes the quant config.
Linear.__init__ creates weights immediately unless skip_create_weights_in_init=True, and create_weights() caches quant_method from the original NVFP4 config. Mutating only module.quant_config afterward can leave excluded modules with an NVFP4 quant_method and quantized weight layout, so forward/load/post-load paths can still behave quantized despite quant_algo is None. Move this exclusion before weight creation, or rebuild/reset the effective quant_method and weights for already-created Linear modules.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py` around lines 829 - 891, Apply the quantization exclusions before Linear.create_weights() locks in the original quant_method and weight layout. Update QwenImageTransformer2DModel.apply_quant_config_exclude_modules so excluded Linear modules are handled before or during construction/weight creation, or explicitly reset their effective quant_config, quant_method, and weights after mutation; otherwise excluded modules may still behave as NVFP4-quantized even when quant_config.quant_algo is None.
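The ordering concern in this comment can be reproduced with a toy class. This is a hedged sketch, not the real `Linear`: `ToyLinear` is hypothetical and only mimics the behavior described above, where the method is chosen once inside `create_weights()` and then cached.

```python
from typing import Optional


class ToyLinear:
    """Hypothetical module that caches its quant method at weight creation."""

    def __init__(
        self, quant_algo: Optional[str], skip_create_weights_in_init: bool = False
    ) -> None:
        self.quant_algo = quant_algo
        self.quant_method: Optional[str] = None
        if not skip_create_weights_in_init:
            self.create_weights()

    def create_weights(self) -> None:
        # The choice is made once here and not revisited later.
        self.quant_method = "quantized" if self.quant_algo else "unquantized"


# Eager creation: mutating the config afterward leaves a stale method.
eager = ToyLinear("NVFP4")
eager.quant_algo = None

# Deferred creation: the exclusion is applied before the method is cached.
deferred = ToyLinear("NVFP4", skip_create_weights_in_init=True)
deferred.quant_algo = None
deferred.create_weights()
```

In the eager case the config says "no quantization" while the cached method still says "quantized", which is the mismatch the comment warns about.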
🧹 Nitpick comments (1)
tests/unittest/_torch/visual_gen/test_qwen_image_registry.py (1)
70-70: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Add the return annotation for the new test.
The Python guidelines require all functions to be annotated.
Proposed fix
-def test_transformer_applies_quant_config_ignore_list():
+def test_transformer_applies_quant_config_ignore_list() -> None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/_torch/visual_gen/test_qwen_image_registry.py` at line 70, The new test function test_transformer_applies_quant_config_ignore_list is missing a return annotation, and the Python guidelines require every function to be annotated. Update the test definition to include the appropriate return type annotation for this test in test_qwen_image_registry.py, keeping the rest of the test logic unchanged.
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unittest/_torch/visual_gen/test_qwen_image_registry.py`:
- Around line 86-92: The current assertions only verify quant_config and can
miss cases where Linear.create_weights() has already cached a different
quant_method. Update the qwen image registry test to assert the effective
quantization behavior on the relevant modules, using symbols like
Linear.create_weights, quant_method, txt_in, proj_out, and transformer_blocks:
confirm excluded modules keep the unquantized quant_method while non-excluded
modules still resolve to NVFP4. Keep the coverage focused on the TensorRT-LLM
effective path, not just the config field.
---
Outside diff comments:
In `@tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py`:
- Around line 829-891: Apply the quantization exclusions before
Linear.create_weights() locks in the original quant_method and weight layout.
Update QwenImageTransformer2DModel.apply_quant_config_exclude_modules so
excluded Linear modules are handled before or during construction/weight
creation, or explicitly reset their effective quant_config, quant_method, and
weights after mutation; otherwise excluded modules may still behave as
NVFP4-quantized even when quant_config.quant_algo is None.
---
Nitpick comments:
In `@tests/unittest/_torch/visual_gen/test_qwen_image_registry.py`:
- Line 70: The new test function
test_transformer_applies_quant_config_ignore_list is missing a return
annotation, and the Python guidelines require every function to be annotated.
Update the test definition to include the appropriate return type annotation for
this test in test_qwen_image_registry.py, keeping the rest of the test logic
unchanged.
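For the inline comment on test_qwen_image_registry.py, one possible shape for an assertion on the effective method (rather than only the config field) is sketched below. The two `Fake*` classes and `check_effective_quant` are hypothetical stand-ins so the snippet runs without TensorRT-LLM; a real test would presumably check against the library's own method classes.

```python
from types import SimpleNamespace
from typing import Dict, Set


class FakeUnquantizedMethod:
    """Hypothetical stand-in for an unquantized linear method class."""


class FakeNVFP4Method:
    """Hypothetical stand-in for an NVFP4 linear method class."""


def check_effective_quant(
    modules: Dict[str, SimpleNamespace], excluded: Set[str]
) -> None:
    """Assert each module's cached quant_method matches its exclusion status."""
    for name, module in modules.items():
        expected = FakeUnquantizedMethod if name in excluded else FakeNVFP4Method
        assert isinstance(module.quant_method, expected), (
            f"{name}: expected {expected.__name__}, "
            f"got {type(module.quant_method).__name__}"
        )


# Illustrative module names; the objects only carry a quant_method attribute.
modules = {
    "txt_in": SimpleNamespace(quant_method=FakeUnquantizedMethod()),
    "proj_out": SimpleNamespace(quant_method=FakeUnquantizedMethod()),
    "transformer_blocks.0.attn.to_q": SimpleNamespace(quant_method=FakeNVFP4Method()),
}
check_effective_quant(modules, {"txt_in", "proj_out"})
```

Checking the type of `quant_method` would catch the case where the config was replaced but the cached method was not.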
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cf7ce349-5021-44bc-ad2e-d60309717bea
📒 Files selected for processing (2)
tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py
tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
/bot run
PR_Github #55626 [ run ] triggered by Bot. Commit: |
model.load_weights({})


def test_transformer_applies_quant_config_ignore_list():
I think an E2E test might be more beneficial here. Could you add another E2E test with quant ignore list to tests/integration/defs/examples/visual_gen/test_visual_gen.py and make sure output can be generated?
Added in 28d5424: test_qwen_image_example_with_quant_ignore in tests/integration/defs/examples/visual_gen/test_visual_gen.py. It writes a Qwen-Image dynamic FP8 config with an ignore list, runs examples/visual_gen/models/qwen_image.py, and asserts the PNG output is generated.
Signed-off-by: Alex Steiner <asteiner@nvidia.com>
c94b440 to
28d5424
PR_Github #55626 [ run ] completed with state
@pst2154 CI is currently blocked on the Pre-commit Check (GitHub Actions), not a test or infra failure. The It's a single missing blank line after the imports in
@@ -14,6 +14,7 @@ import pytest
 from tensorrt_llm._torch.modules.linear import NVFP4LinearMethod, UnquantizedLinearMethod
+
 # Importing the models package side-effects the ``@register_pipeline``
To unblock, run pre-commit locally, then re-stage + re-push:
pre-commit run --files tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
git add tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
git commit -s # (hook already applied the fix; just re-stage & commit)
git push
Once Pre-commit Check is green, the full |
Description
Fix Qwen-Image dynamic quantization so quant_config.ignore is applied to the module graph, matching the behavior used by other VisualGen transformers such as WAN and FLUX.
The Qwen-Image dynamic weight loader already skipped load-time quantization for ignored module names, but the Linear modules were still constructed with the global quant config. That meant excluded modules could still retain NVFP4/FP8 module state and activation behavior, making selective quantization appear ineffective.
This change adds an exclusion pass in QwenImageTransformer2DModel that replaces ignored Linear.quant_config values with a no-op weight quant config, while preserving any KV cache quant setting. It also adds a unit test that verifies ignored Qwen modules are left unquantized while non-ignored modules keep NVFP4.
Testing
python -m py_compile tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
git diff --check
Could not run targeted pytest locally because tests/unittest/conftest.py imports mpi4py, which is not installed in this environment.
Summary by CodeRabbit