[None][fix] Honor Qwen Image quant ignore list by pst2154 · Pull Request #15599 · NVIDIA/TensorRT-LLM

pst2154 · 2026-06-24T16:16:38Z

Description

Fix Qwen-Image dynamic quantization so quant_config.ignore is applied to the module graph, matching the behavior used by other VisualGen transformers such as WAN and FLUX.

The Qwen-Image dynamic weight loader already skipped load-time quantization for ignored module names, but the Linear modules were still constructed with the global quant config. That meant excluded modules could still retain NVFP4/FP8 module state and activation behavior, making selective quantization appear ineffective.

This change adds an exclusion pass in QwenImageTransformer2DModel that replaces ignored Linear.quant_config values with a no-op weight quant config, while preserving any KV cache quant setting. It also adds a unit test that verifies ignored Qwen modules are left unquantized while non-ignored modules keep NVFP4.
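The exclusion pass described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the code in this PR: `QuantConfig` and `Linear` are simplified stand-ins (not the TensorRT-LLM classes), the glob-style name matching is an assumption, and the module names are only examples. The point it shows is that each ignored module receives a config with no weight quantization while the KV cache setting is carried over.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


@dataclass
class QuantConfig:
    """Simplified stand-in for a quantization config (hypothetical)."""

    quant_algo: Optional[str] = None
    kv_cache_quant_algo: Optional[str] = None
    exclude_modules: List[str] = field(default_factory=list)


@dataclass
class Linear:
    """Stand-in Linear layer that only records its quant config."""

    quant_config: QuantConfig


def is_excluded(name: str, patterns: List[str]) -> bool:
    # Assumption: ignore entries are glob-style patterns over module names.
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def apply_quant_config_exclude_modules(
    modules: Dict[str, Linear], quant_config: QuantConfig
) -> None:
    # No-op weight quant config that keeps the KV cache quant setting.
    no_quant = QuantConfig(
        quant_algo=None,
        kv_cache_quant_algo=quant_config.kv_cache_quant_algo,
    )
    for name, module in modules.items():
        if is_excluded(name, quant_config.exclude_modules):
            module.quant_config = no_quant


global_cfg = QuantConfig(
    quant_algo="NVFP4",
    kv_cache_quant_algo="FP8",
    exclude_modules=["txt_in", "transformer_blocks.0.*"],
)
names = (
    "txt_in",
    "proj_out",
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.1.attn.to_q",
)
# Every module starts with the global config, as in the bug description.
modules = {name: Linear(global_cfg) for name in names}
apply_quant_config_exclude_modules(modules, global_cfg)
```

After the pass, `txt_in` and everything under `transformer_blocks.0` hold the no-op config, while `proj_out` and `transformer_blocks.1` still reference the global NVFP4 config.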

Testing

python -m py_compile tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
git diff --check

Could not run targeted pytest locally because tests/unittest/conftest.py imports mpi4py, which is not installed in this environment.

Summary by CodeRabbit

New Features
- Qwen Image models now respect quantization exclusion settings, allowing selected layers to stay unquantized while the rest of the model still uses the configured quantization behavior.
- Improved initialization so excluded components are handled automatically at model load time.

coderabbitai · 2026-06-24T21:43:24Z

📝 Walkthrough

QwenImageTransformer2DModel now applies per-module quantization exclusions during initialization by replacing excluded Linear submodules' quant configs with a no-quantization config that preserves the KV-cache quantization algorithm. A unit test now checks excluded modules lose quantization while non-excluded modules retain NVFP4.

Changes

Qwen-Image transformer quantization exclusions

Layer / File(s)	Summary
Initialization hook and exclusion helper `tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py`	Adds `QuantConfig` import, calls the exclusion helper from `__init__`, and defines logic that rewrites excluded Linear submodules' `quant_config` values.
Registry test for excluded submodules `tests/unittest/_torch/visual_gen/test_qwen_image_registry.py`	Adds a unit test that builds an NVFP4 dynamic quantization config with an exclude list and checks excluded modules disable quantization while others keep it.

Sequence Diagram(s)

sequenceDiagram
  participant QwenImageTransformer2DModel
  participant model_config_quant_config as model_config.quant_config
  participant Linear
  participant QuantConfig
  QwenImageTransformer2DModel->>model_config_quant_config: read quant_config.exclude_modules
  QwenImageTransformer2DModel->>QwenImageTransformer2DModel: call apply_quant_config_exclude_modules()
  QwenImageTransformer2DModel->>Linear: inspect named_modules() for excluded submodules
  QwenImageTransformer2DModel->>QuantConfig: build a no-quantization config with the KV-cache algorithm
  QwenImageTransformer2DModel->>Linear: replace quant_config on excluded Linear modules

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title follows the required [None][fix] format and clearly summarizes the Qwen Image quant ignore-list fix.
Description check	✅ Passed	The description covers the bug, fix, and testing, and is mostly complete despite missing the explicit PR Checklist and Test Coverage headings.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py (1)
829-891: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Apply exclusions before Linear.create_weights() consumes the quant config.

Linear.__init__ creates weights immediately unless skip_create_weights_in_init=True, and create_weights() caches quant_method from the original NVFP4 config. Mutating only module.quant_config afterward can leave excluded modules with an NVFP4 quant_method and a quantized weight layout, so the forward, load, and post-load paths can still behave as quantized even though quant_config.quant_algo is None. Move this exclusion before weight creation, or rebuild/reset the effective quant_method and weights for already-created Linear modules.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py`
around lines 829 - 891, Apply the quantization exclusions before
Linear.create_weights() locks in the original quant_method and weight layout.
Update QwenImageTransformer2DModel.apply_quant_config_exclude_modules so
excluded Linear modules are handled before or during construction/weight
creation, or explicitly reset their effective quant_config, quant_method, and
weights after mutation; otherwise excluded modules may still behave as
NVFP4-quantized even when quant_config.quant_algo is None.
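The ordering concern raised in this comment can be illustrated with a toy model. The sketch below does not use the real `Linear` class: `ToyLinear`, `pick_method`, and the two method classes are hypothetical stand-ins, and only the names `create_weights`, `quant_method`, and `skip_create_weights_in_init` are taken from the comment. It assumes, as the comment states, that the quant method is chosen once when weights are created.

```python
from typing import Optional


class UnquantizedMethod:
    """Stand-in for an unquantized linear method."""


class NVFP4Method:
    """Stand-in for an NVFP4 linear method."""


def pick_method(quant_algo: Optional[str]) -> object:
    # Hypothetical selector: choose the method from the config value.
    return NVFP4Method() if quant_algo == "NVFP4" else UnquantizedMethod()


class ToyLinear:
    def __init__(
        self,
        quant_algo: Optional[str],
        skip_create_weights_in_init: bool = False,
    ) -> None:
        self.quant_algo = quant_algo
        self.quant_method: Optional[object] = None
        if not skip_create_weights_in_init:
            self.create_weights()

    def create_weights(self) -> None:
        # The method is cached here and is not re-derived afterwards.
        self.quant_method = pick_method(self.quant_algo)


# Late mutation: the config changes, but the cached method does not.
late = ToyLinear("NVFP4")
late.quant_algo = None

# Deferred creation: the exclusion is applied before weights exist.
early = ToyLinear("NVFP4", skip_create_weights_in_init=True)
early.quant_algo = None
early.create_weights()
```

In this toy, `late` ends up with `quant_algo` set to `None` but an `NVFP4Method` instance still cached, whereas `early` resolves to `UnquantizedMethod`. Whether the real module behaves the same way depends on the actual `Linear` implementation, which should be checked against the current code.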

🧹 Nitpick comments (1)

tests/unittest/_torch/visual_gen/test_qwen_image_registry.py (1)
70-70: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add the return annotation for the new test.

The Python guidelines require all functions to be annotated.
Proposed fix
-def test_transformer_applies_quant_config_ignore_list():
+def test_transformer_applies_quant_config_ignore_list() -> None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/visual_gen/test_qwen_image_registry.py` at line 70, The
new test function test_transformer_applies_quant_config_ignore_list is missing a
return annotation, and the Python guidelines require every function to be
annotated. Update the test definition to include the appropriate return type
annotation for this test in test_qwen_image_registry.py, keeping the rest of the
test logic unchanged.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unittest/_torch/visual_gen/test_qwen_image_registry.py`:
- Around line 86-92: The current assertions only verify quant_config and can
miss cases where Linear.create_weights() has already cached a different
quant_method. Update the qwen image registry test to assert the effective
quantization behavior on the relevant modules, using symbols like
Linear.create_weights, quant_method, txt_in, proj_out, and transformer_blocks:
confirm excluded modules keep the unquantized quant_method while non-excluded
modules still resolve to NVFP4. Keep the coverage focused on the TensorRT-LLM
effective path, not just the config field.
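A check along the lines suggested here might look like the sketch below. It is only an illustration under stated assumptions: `FakeLinear` and the two method classes are local stand-ins that reuse the names `NVFP4LinearMethod` and `UnquantizedLinearMethod` seen in the test's imports, the prefix-based exclusion matching is assumed, and `find_quant_mismatches` is a hypothetical helper that does not exist in the repository.

```python
from typing import Dict, List


class UnquantizedLinearMethod:
    """Local stand-in, not the TensorRT-LLM class."""


class NVFP4LinearMethod:
    """Local stand-in, not the TensorRT-LLM class."""


class FakeLinear:
    def __init__(self, quant_method: object) -> None:
        self.quant_method = quant_method


def find_quant_mismatches(
    modules: Dict[str, FakeLinear], excluded_prefixes: List[str]
) -> List[str]:
    """Return names whose effective quant_method contradicts the ignore list."""
    mismatches = []
    for name, module in modules.items():
        excluded = any(
            name == prefix or name.startswith(prefix + ".")
            for prefix in excluded_prefixes
        )
        expected = UnquantizedLinearMethod if excluded else NVFP4LinearMethod
        if not isinstance(module.quant_method, expected):
            mismatches.append(name)
    return mismatches


excluded = ["txt_in", "proj_out"]

# Correctly excluded model: ignored modules resolve to the unquantized method.
good = {
    "txt_in": FakeLinear(UnquantizedLinearMethod()),
    "proj_out": FakeLinear(UnquantizedLinearMethod()),
    "transformer_blocks.0.attn.to_q": FakeLinear(NVFP4LinearMethod()),
}

# Stale model: txt_in kept a cached NVFP4 method despite being ignored.
stale = dict(good, txt_in=FakeLinear(NVFP4LinearMethod()))

good_mismatches = find_quant_mismatches(good, excluded)
stale_mismatches = find_quant_mismatches(stale, excluded)
```

Asserting on the type of `quant_method`, rather than on `quant_config` alone, is what lets a test like this catch the stale case.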


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cf7ce349-5021-44bc-ad2e-d60309717bea

📥 Commits

Reviewing files that changed from the base of the PR and between 7193f41 and c94b440.

📒 Files selected for processing (2)

tensorrt_llm/_torch/visual_gen/models/qwen_image/transformer_qwen_image.py
tests/unittest/_torch/visual_gen/test_qwen_image_registry.py

chang-l · 2026-06-25T00:13:59Z

/bot run

tensorrt-cicd · 2026-06-25T00:19:51Z

PR_Github #55626 [ run ] triggered by Bot. Commit: c94b440 Link to invocation

yibinl-nvidia · 2026-06-25T00:28:47Z

        model.load_weights({})


+def test_transformer_applies_quant_config_ignore_list():


I think an E2E test might be more beneficial here. Could you add another E2E test with quant ignore list to tests/integration/defs/examples/visual_gen/test_visual_gen.py and make sure output can be generated?

pst2154 (reply, Jun 25, 2026): Added in 28d5424: test_qwen_image_example_with_quant_ignore in tests/integration/defs/examples/visual_gen/test_visual_gen.py. It writes a Qwen-Image dynamic FP8 config with an ignore list, runs examples/visual_gen/models/qwen_image.py, and asserts that the PNG output is generated.

Signed-off-by: Alex Steiner <asteiner@nvidia.com>

tensorrt-cicd · 2026-06-25T02:26:56Z

PR_Github #55626 [ run ] completed with state FAILURE. Commit: c94b440
/LLM/main/L0_MergeRequest_PR pipeline #44541 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

chang-l · 2026-06-25T04:37:12Z

@pst2154 CI is currently blocked on the Pre-commit Check (GitHub Actions) — not a test or infra failure. The ruff hook auto-fixes a formatting issue but pre-commit fails whenever a hook modifies a file:

ruff .... Failed  (Found 1 error, 1 fixed, 0 remaining; files were modified by this hook)

It's a single missing blank line after the imports in tests/unittest/_torch/visual_gen/test_qwen_image_registry.py:

@@ -14,6 +14,7 @@ import pytest
 from tensorrt_llm._torch.modules.linear import NVFP4LinearMethod, UnquantizedLinearMethod
+
 # Importing the models package side-effects the ``@register_pipeline``

To unblock, run pre-commit locally, then re-stage + re-push:

pre-commit run --files tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
git add tests/unittest/_torch/visual_gen/test_qwen_image_registry.py
git commit -s   # (hook already applied the fix; just re-stage & commit)
git push

Once Pre-commit Check is green, the full /bot run pipeline can proceed.

github-actions Bot assigned pst2154 Jun 24, 2026

chang-l approved these changes Jun 24, 2026

chang-l requested a review from yibinl-nvidia June 24, 2026 19:02

pst2154 force-pushed the codex/qwen-image-quant-ignore branch from d846c5d to c94b440 Compare June 24, 2026 21:22

pst2154 marked this pull request as ready for review June 24, 2026 21:39

pst2154 requested a review from a team as a code owner June 24, 2026 21:39

coderabbitai Bot reviewed Jun 24, 2026

yibinl-nvidia approved these changes Jun 25, 2026

[None][fix] Honor Qwen Image quant ignore list

28d5424

Signed-off-by: Alex Steiner <asteiner@nvidia.com>

pst2154 force-pushed the codex/qwen-image-quant-ignore branch from c94b440 to 28d5424 Compare June 25, 2026 01:47

pst2154 wants to merge 1 commit into NVIDIA:main from pst2154:codex/qwen-image-quant-ignore
