
[6106576] Restore qdq_utils symbols used by edgellm 0.6.1 #1408

Open

ajrasane wants to merge 2 commits into main from ajrasane/nvbug_6106576_fix

Conversation

@ajrasane
Contributor

ajrasane commented May 7, 2026

What does this PR do?

Type of change: Bug fix (backward-compat restoration), follow-up to #1356.

#1356 restored modelopt.onnx.llm_export_utils as a deprecation shim so that TensorRT-Edge-LLM 0.6.1 could resume importing fold_fp8_qdq_to_dq. QA reports that the next line of tensorrt_edgellm/onnx_export/onnx_utils.py still aborts at module load:

from modelopt.onnx.llm_export_utils.surgeon_utils import fold_fp8_qdq_to_dq   # OK
from modelopt.onnx.quantization.qdq_utils import (fp4qdq_to_2dq,
                                                  quantize_weights_to_int4,
                                                  quantize_weights_to_mxfp8)
# ImportError: cannot import name 'fp4qdq_to_2dq' from 'modelopt.onnx.quantization.qdq_utils'

Those three symbols were removed when the LLM ONNX export pipeline was refactored into the staged modelopt.onnx.export exporters (INT4QuantExporter, NVFP4QuantExporter, MXFP8QuantExporter). The new exporters still implement the same conversions, but edgellm 0.6.1 imports the old top-level functions unconditionally, so every tensorrt-edgellm-* CLI fails immediately — same 5/5 (100%) failure rate as before, just one line further in.

What this PR does

  • Restores fp4qdq_to_2dq, quantize_weights_to_int4, and quantize_weights_to_mxfp8 on modelopt.onnx.quantization.qdq_utils as deprecation shims (sketched after this list) that:
    • emit a DeprecationWarning pointing readers at the new modelopt.onnx.export exporters and at TensorRT-Edge-LLM as the long-term migration target,
    • reuse the existing in-tree helpers (_cast_fp8 here, _cast_fp4 and _replace_fp4qdq_with_2dq from modelopt.onnx.export.nvfp4_exporter via a lazy import to avoid a circular import — nvfp4_exporter already imports onnx_dtype_map from this module),
    • reuse pack_weights_to_int4, get_amax, compute_e8m0, get_weights_scaling_factor[_2], quantize, get_attribute, has_attribute, and read_f16_tensor_as_fp32 from their canonical locations rather than duplicating them.
  • Adds the matching imports at the top of qdq_utils.py. No new files, no public API removals, no behavior changes for any existing caller.
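For orientation, a minimal sketch of the shim pattern described above, assuming the shims take an onnx.ModelProto; the message text and function body are placeholders, not the exact code in this PR:

# Illustrative sketch only; the real shims reuse the in-tree helpers listed above
# and carry a longer deprecation message.
import warnings

import onnx

_LEGACY_LLM_EXPORT_DEPRECATION_MSG = (
    "is deprecated; use the modelopt.onnx.export exporters "
    "(or TensorRT-Edge-LLM) instead."
)

def fp4qdq_to_2dq(onnx_model: onnx.ModelProto) -> onnx.ModelProto:
    """Deprecated legacy entry point kept for edgellm 0.6.1 compatibility."""
    warnings.warn(
        f"fp4qdq_to_2dq {_LEGACY_LLM_EXPORT_DEPRECATION_MSG}",
        DeprecationWarning,
        stacklevel=2,
    )
    # Lazy import avoids a circular import: nvfp4_exporter imports
    # onnx_dtype_map from this module at load time.
    from modelopt.onnx.export.nvfp4_exporter import _cast_fp4, _replace_fp4qdq_with_2dq

    # ... legacy TRT_FP4QDQ -> 2xDQ rewrite using the helpers above ...
    return onnx_model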

I also audited every other modelopt import edgellm 0.6.1 makes (modelopt.torch.quantization, modelopt.torch.opt, modelopt.torch.export.quant_utils, modelopt.torch.opt.plugins.huggingface, modelopt.onnx.quantization.gs_patching, modelopt.torch.quantization.nn.{QuantLinear,TensorQuantizer}, modelopt.torch.quantization.utils.{is_quantized,is_quantized_linear}) and confirmed each still resolves on current main; only the three qdq_utils symbols above were missing.

Testing

  • All four edgellm-side import lines from the failing traceback now succeed against this branch (verified by importing them from the v0.6.1 source tree).
  • tests/unit/onnx/quantization/test_qdq_utils.py + tests/unit/onnx/test_onnx_utils.py: 36 passed in 5s.
  • Each restored shim emits DeprecationWarning when called and runs to completion on an empty ModelProto smoke input.
  • pre-commit run --files modelopt/onnx/quantization/qdq_utils.py: ruff, ruff-format, mypy, bandit, license header all pass.

Notes for reviewers

  • The shim functions are intentionally near-verbatim copies of the pre-removal implementations rather than thin adapters over the new exporters: the new exporters split work across compute_scales / compress_weights / post_process stages, while edgellm calls the legacy single-shot entry points and depends on side effects (e.g. casting bias to fp16 inside quantize_weights_to_int4) that the staged exporters do not all reproduce in one call. Wrapping the staged exporters was judged higher-risk than restoring the legacy paths verbatim with a deprecation warning; a sketch contrasting the two call patterns follows this list.
  • Removal target: same as the existing llm_export_utils shim — drop together once edgellm releases a version that targets modelopt.onnx.export directly.
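For illustration, the two call patterns being contrasted. The staged method names (compute_scales, compress_weights, post_process) come from the note above, but the exporter construction and exact signatures are assumptions:

# Sketch only: exact signatures, return values, and exporter construction are assumed.
import onnx

from modelopt.onnx.quantization.qdq_utils import quantize_weights_to_int4

model = onnx.load("model.onnx")

# Legacy single-shot path (what edgellm 0.6.1 calls, restored by this PR):
# one call performs the whole rewrite, including side effects such as the
# bias-to-fp16 cast.
quantized = quantize_weights_to_int4(model)

# Staged path (new modelopt.onnx.export exporters), roughly:
#   exporter = INT4QuantExporter(...)   # hypothetical construction
#   exporter.compute_scales(model)
#   exporter.compress_weights(model)
#   exporter.post_process(model)
# No single stage reproduces all of the legacy side effects in one call.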

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes — purely additive; no existing API changes.
  • Did you write any new necessary tests?: relied on existing qdq_utils unit tests + a manual repro of the failing edgellm import path.
  • Did you add or update any necessary documentation?: deprecation message in each shim points to the migration target.
  • Did you update Changelog?: not applicable for a compat shim.

Summary by CodeRabbit

  • Chores
    • Added deprecated legacy quantization conversion utilities to preserve compatibility; invoking them now emits deprecation warnings.
  • Documentation
    • Updated helper docstrings to note compatibility/shim usage.
  • Tests
    • Added end-to-end smoke tests verifying the deprecated shims emit warnings and produce expected transformed/quantized models.

@ajrasane ajrasane requested a review from a team as a code owner May 7, 2026 19:01
@ajrasane ajrasane requested a review from galagam May 7, 2026 19:01
@ajrasane ajrasane added the cherry-pick-0.44.0 label May 7, 2026 19:01
@ajrasane ajrasane requested a review from gcunhase May 7, 2026 19:01
@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026

📝 Walkthrough

Adds three deprecated legacy compatibility shims to qdq_utils.py: quantize_weights_to_int4, quantize_weights_to_mxfp8, and fp4qdq_to_2dq, plus a shared deprecation message constant, expanded quantization utility imports, docstring notes in nvfp4_exporter.py, and unit tests exercising the shims.

Changes

Legacy Export Compatibility Layer

  • modelopt/onnx/export/nvfp4_exporter.py (NVFP4 exporter docstring notes): docstrings for _cast_fp4 and _replace_fp4qdq_with_2dq note reuse by the deprecated fp4qdq_to_2dq shim and warn against changing their signatures.
  • modelopt/onnx/quantization/qdq_utils.py (quantization utility imports): adds a warnings import and expands imports from quant_utils (INT4 packing, e8m0/amax helpers, quantization helpers).
  • modelopt/onnx/quantization/qdq_utils.py (deprecation message constant): defines _LEGACY_LLM_EXPORT_DEPRECATION_MSG, shared by the legacy shim functions.
  • modelopt/onnx/quantization/qdq_utils.py (legacy quantization shims): adds deprecated quantize_weights_to_int4() (rewrites the graph and packs INT4 weights), quantize_weights_to_mxfp8() (computes per-block e8m0 scales, converts weights to FP8, updates DQ/Gelu nodes), and fp4qdq_to_2dq() (rewrites TRT_FP4QDQ subgraphs into 2-DQ form and updates casts/initializers).
  • tests/unit/onnx/quantization/test_qdq_utils.py (tests): adds TestLegacyEdgeLLMShims with smoke tests asserting DeprecationWarning and validating transformed model graph nodes and initializer dtypes/names.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5 passed | ❌ 1 warning

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 53.85%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

  • Title check ✅ Passed: the title clearly identifies the main change: restoring three qdq_utils symbols (fp4qdq_to_2dq, quantize_weights_to_int4, quantize_weights_to_mxfp8) needed by edgellm 0.6.1.
  • Linked Issues check ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns ✅ Passed: no security anti-patterns found. Verified: no unsafe deserialization, no eval/exec, no nosec comments, no hardcoded secrets, no new dependencies. Deprecated shims properly emit DeprecationWarning.
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.


Bug 6106576 reports that even after #1356 restored
``modelopt.onnx.llm_export_utils`` as a deprecation shim, TensorRT-Edge-LLM
0.6.1 still fails at module load. The next line in ``onnx_utils.py`` is

    from modelopt.onnx.quantization.qdq_utils import (
        fp4qdq_to_2dq, quantize_weights_to_int4, quantize_weights_to_mxfp8,
    )

These three symbols were removed when the LLM ONNX export pipeline was
refactored into the new staged ``modelopt.onnx.export`` exporters
(INT4/NVFP4/MXFP8). Their behavior is preserved by the new exporters, but
edgellm 0.6.1 imports the old top-level functions unconditionally, so the
import chain still aborts with ``ImportError``.

This restores the three functions on ``modelopt.onnx.quantization.qdq_utils``
as deprecation shims that emit ``DeprecationWarning`` and reuse the existing
helpers (``_cast_fp8`` here, ``_cast_fp4`` and ``_replace_fp4qdq_with_2dq``
from ``modelopt.onnx.export.nvfp4_exporter`` via lazy import to avoid a
circular import). No new files; only one file changes. The deprecation
message points readers to the new exporters and to TensorRT-Edge-LLM as the
long-term migration target.

I also audited every other ``modelopt`` import edgellm 0.6.1 makes
(``modelopt.torch.quantization``, ``modelopt.torch.opt``,
``modelopt.torch.export.quant_utils``, ``modelopt.onnx.quantization.gs_patching``,
etc.) and confirmed they all still resolve on current main; only the three
``qdq_utils`` symbols above were missing.

Verified with:

- All 4 edgellm-side import lines from the bug's traceback now succeed.
- ``tests/unit/onnx/quantization/test_qdq_utils.py`` and
  ``tests/unit/onnx/test_onnx_utils.py`` (36 tests) pass.
- Each restored shim emits ``DeprecationWarning`` when called.

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane force-pushed the ajrasane/nvbug_6106576_fix branch from 41828cb to c61ac69 on May 7, 2026 19:04
@github-actions
Contributor

github-actions Bot commented May 7, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1408/

Built to branch gh-pages at 2026-05-08 14:33 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
modelopt/onnx/quantization/qdq_utils.py (2)

1571-1573: 💤 Low value

Fragile attribute access pattern.

Line 1573 assumes the first attribute of TRT_FP4QDQ is always block_size. If the node's attributes are reordered in future ONNX exports, this would silently use the wrong value.

Consider iterating over attributes to find block_size by name, similar to how other functions in this file handle attributes (e.g., lines 1330-1332, 1444-1458).

🔧 Suggested fix
-        block_size = node.attribute[0].i
+        block_size = None
+        for attr in node.attribute:
+            if attr.name == "block_size":
+                block_size = attr.i
+                break
+        assert block_size is not None, f"block_size attribute not found for {node.name}"

1555-1561: 💤 Low value

Magic number for dtype detection.

The check initializer.data_type == 16 uses a magic number for bfloat16. This is acceptable for deprecated code, but a brief comment would aid maintainability until removal.

📝 Suggested improvement
     def _get_precision_dtype() -> str:
         precision_dtype = "Half"
         for initializer in graph.initializer:
-            if initializer.data_type == 16:
+            if initializer.data_type == 16:  # BFLOAT16
                 precision_dtype = "BFloat16"
                 break
         return precision_dtype


📥 Commits

Reviewing files that changed from the base of the PR and between 555be6c and c61ac69.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/qdq_utils.py

@codecov

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 86.12440% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.63%. Comparing base (097293b) to head (c7e10f4).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/qdq_utils.py 86.12% 29 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1408      +/-   ##
==========================================
- Coverage   76.73%   74.63%   -2.10%     
==========================================
  Files         476      476              
  Lines       51306    51939     +633     
==========================================
- Hits        39368    38766     -602     
- Misses      11938    13173    +1235     
Flag Coverage Δ
examples 38.81% <3.34%> (-0.37%) ⬇️
gpu 59.61% <3.34%> (-0.82%) ⬇️
unit 52.69% <86.12%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

Collaborator

@cjluo-nv cjluo-nv left a comment

Bot review.

Compat-shim follow-up to #1356 that restores three removed symbols on qdq_utils (quantize_weights_to_int4, quantize_weights_to_mxfp8, fp4qdq_to_2dq) so edgellm 0.6.1 imports can resolve. Code looks fine: imports all resolve (pack_weights_to_int4, compute_e8m0, get_amax, quantize, get_weights_scaling_factor[_2] from quant_utils; get_attribute/has_attribute/read_f16_tensor_as_fp32 from modelopt.onnx.utils; _cast_fp8/cast_initializer_to_dtype already defined in this file; _cast_fp4/_replace_fp4qdq_with_2dq lazy-imported from nvfp4_exporter to avoid the documented circular import). Each shim emits a DeprecationWarning pointing at the staged exporters, as claimed. The implementations are near-verbatim copies of the pre-removal versions, so behavior should match what edgellm previously depended on.

Two things for a human to confirm before merging:

  1. No tests for the restored shims. tests/unit/onnx/quantization/test_qdq_utils.py exercises INT4QuantExporter / MXFP8QuantExporter / NVFP4QuantExporter.process_model(...), not the restored top-level quantize_weights_to_int4 / quantize_weights_to_mxfp8 / fp4qdq_to_2dq functions. The PR body claims these tests cover the shims, but they don't: the shims are ~390 lines of graph-manipulation logic with zero CI coverage on this branch. Since the staged exporters split work differently (per the PR's own "notes for reviewers"), passing tests on the exporters doesn't prove the legacy single-shot path still works. A couple of smoke tests that actually call each shim on a minimal ModelProto (as the PR description says were run manually) would be cheap insurance and would keep future refactors from silently re-breaking edgellm; a sketch follows this comment. The same gap was flagged for the sibling shim PR #1356.

  2. Private-symbol coupling. fp4qdq_to_2dq lazy-imports _cast_fp4 and _replace_fp4qdq_with_2dq (both underscore-prefixed) from modelopt.onnx.export.nvfp4_exporter. The dedup is nice, but this means any rename/signature change to those private helpers in the exporter will silently break the edgellm shim. Worth either (a) promoting them to module-private helpers in qdq_utils itself (duplication, but the shim is explicitly documented as a copy of the legacy path anyway), or (b) adding a comment in nvfp4_exporter.py noting that the shim depends on these symbols so a future refactor doesn't drop them.

Neither is a blocker — the PR is purely additive, scoped to an urgent compat fix, and removes as soon as edgellm moves to modelopt.onnx.export.
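A minimal sketch of the kind of shim smoke test suggested in point 1, assuming each shim accepts an onnx.ModelProto; an empty graph only checks that the call warns and completes, not the full rewrite:

# Sketch of a direct shim smoke test with pytest; a realistic test would build a
# graph containing the QDQ patterns each shim rewrites.
import pytest
from onnx import helper

from modelopt.onnx.quantization.qdq_utils import (
    fp4qdq_to_2dq,
    quantize_weights_to_int4,
    quantize_weights_to_mxfp8,
)

@pytest.mark.parametrize(
    "shim", [quantize_weights_to_int4, quantize_weights_to_mxfp8, fp4qdq_to_2dq]
)
def test_legacy_shim_warns_and_runs(shim):
    # Minimal valid ModelProto: an empty graph with no nodes or initializers.
    model = helper.make_model(helper.make_graph([], "empty", [], []))
    with pytest.warns(DeprecationWarning):
        shim(model)  # asserts only the DeprecationWarning and that it runs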

Collaborator

@cjluo-nv cjluo-nv left a comment

Bot review.

Compat-shim PR that restores fp4qdq_to_2dq, quantize_weights_to_int4, and quantize_weights_to_mxfp8 on qdq_utils for edgellm 0.6.1 — a follow-up to #1356. The code itself looks correct: all imports resolve to real symbols (pack_weights_to_int4, compute_e8m0, get_amax, quantize, get_weights_scaling_factor[_2] from quant_utils; get_attribute/has_attribute/read_f16_tensor_as_fp32 from modelopt.onnx.utils; _cast_fp4/_replace_fp4qdq_with_2dq lazy-imported from nvfp4_exporter with the circular-import rationale documented inline), each shim emits a DeprecationWarning, and the implementations are near-verbatim copies of the pre-removal versions.

Reason to nudge rather than approve: the previous review's most important comment — no direct test coverage for the restored shims — is still unaddressed. tests/unit/onnx/quantization/test_qdq_utils.py exercises INT4QuantExporter/MXFP8QuantExporter/NVFP4QuantExporter.process_model(...), not the three restored top-level functions. The PR body claims those tests cover the shims, but they don't — and the PR's own "notes for reviewers" section explains that the staged exporters split work differently than the legacy single-shot entry points (e.g. the bias-to-fp16 cast inside quantize_weights_to_int4), which is exactly why passing exporter tests doesn't prove the legacy path still works. A few smoke tests that actually call each shim on a minimal ModelProto would be cheap insurance and match what the PR description says was already run manually. The two CodeRabbit nitpicks (magic 16 for BFLOAT16, positional node.attribute[0] access for block_size) are minor and inherited from the pre-removal code, so I'd leave those to author discretion.

A human reviewer should confirm whether shipping this compat fix without direct shim tests is acceptable given the urgency.

- fp4qdq_to_2dq: look up block_size by attribute name instead of position so
  the shim does not silently use the wrong attribute if TRT_FP4QDQ attribute
  ordering changes.
- _get_precision_dtype: use onnx.TensorProto.BFLOAT16 instead of the literal
  16 for readability.
- nvfp4_exporter: note in the docstrings of _cast_fp4 and
  _replace_fp4qdq_with_2dq that they are reused by the deprecated
  qdq_utils.fp4qdq_to_2dq shim, so a future refactor does not silently drop
  them.
- Add direct smoke tests for quantize_weights_to_int4,
  quantize_weights_to_mxfp8, and fp4qdq_to_2dq that assert each shim emits a
  DeprecationWarning and produces the expected end-state graph (the existing
  tests only exercise the staged exporters).

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Collaborator

@cjluo-nv cjluo-nv left a comment

Bot review.

Re-review: all previous comments have been addressed.

  • Critical (cjluo-nv bot, repeated across two reviews): missing direct tests for the restored shims — addressed. The new TestLegacyEdgeLLMShims class in tests/unit/onnx/quantization/test_qdq_utils.py calls each shim directly on minimal ModelProto fixtures, asserts DeprecationWarning is emitted, and verifies the expected transformations (INT4 data type + Reshape/Transpose removal, FLOAT8E4M3FN + Gelu tanh approximation, TRT_FP4QDQ → 2×DQ with the three expected initializers).
  • Minor (cjluo-nv): private-symbol coupling from fp4qdq_to_2dq to _cast_fp4/_replace_fp4qdq_with_2dq — addressed via docstring warnings in nvfp4_exporter.py marking both helpers as load-bearing re-exports.
  • Minor (CodeRabbit): fragile node.attribute[0] positional access for block_size — addressed with a name-based lookup (next((attr for attr in node.attribute if attr.name == "block_size"), None)) plus an explicit assert.
  • Minor (CodeRabbit): magic number 16 for BFLOAT16 — addressed by using onnx.TensorProto.BFLOAT16.

Correctness is fine: imports are at the top, the one lazy import (_cast_fp4/_replace_fp4qdq_with_2dq) has a documented circular-import rationale, deprecation warnings use stacklevel=2, and the shims are near-verbatim copies of the pre-removal implementations per the PR's documented rationale. Purely additive compat shim, no licensing changes, no API removals.

Testing: Test plan has 1 unchecked item(s) out of 5. Finish or remove them before approving.

Collaborator

@cjluo-nv cjluo-nv left a comment

Bot review.

Re-review: all previous critical and minor comments have been addressed.

  • Critical (missing direct shim tests): Addressed. TestLegacyEdgeLLMShims now has three tests (four with the with_transpose parametrization) that directly call quantize_weights_to_int4, quantize_weights_to_mxfp8, and fp4qdq_to_2dq on minimal ModelProto fixtures, assert DeprecationWarning is emitted with the function name, and verify the expected graph transformations (INT4 data type + Reshape/Transpose removal; FLOAT8E4M3FN + Gelu tanh approximation; TRT_FP4QDQ → 2×DQ with the three expected initializers and original weight removed).
  • Minor (private-symbol coupling on _cast_fp4/_replace_fp4qdq_with_2dq): Addressed with docstring warnings in nvfp4_exporter.py marking both helpers as load-bearing re-exports.
  • Minor (magic number 16 for BFLOAT16): Addressed — now uses onnx.TensorProto.BFLOAT16.
  • Minor (positional node.attribute[0] for block_size): Addressed — name-based lookup with explicit assert.

Correctness is fine: top-level imports for all symbols except the one documented circular-import workaround (_cast_fp4/_replace_fp4qdq_with_2dq in fp4qdq_to_2dq), stacklevel=2 on warnings.warn, shims are near-verbatim copies of the pre-removal implementations per the PR's documented rationale. Purely additive, no API removals, standard NVIDIA headers preserved, no licensing changes. PR body's test checklist items all appear checked.

Minor (non-blocking) nit: the new test methods put import warnings and from modelopt.onnx.quantization.qdq_utils import ... inside the methods rather than at the top of the test file; would be cleaner to hoist, but this is a test-file convention and not worth blocking.

@ajrasane ajrasane enabled auto-merge (squash) May 8, 2026 14:45
Labels

cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc

2 participants