[6106576] Restore qdq_utils symbols used by edgellm 0.6.1#1408
Conversation
📝 Walkthrough: Adds three deprecated legacy compatibility shims to modelopt.onnx.quantization.qdq_utils (Changes: Legacy Export Compatibility Layer).
Estimated code review effort: 🎯 4 (Complex), ⏱️ ~45 minutes.
Pre-merge checks: ✅ 5 passed, ❌ 1 failed (1 warning).
Bug 6106576 reports that even after #1356 restored ``modelopt.onnx.llm_export_utils`` as a deprecation shim, TensorRT-Edge-LLM 0.6.1 still fails at module load. The next line in ``onnx_utils.py`` is:

from modelopt.onnx.quantization.qdq_utils import (
    fp4qdq_to_2dq,
    quantize_weights_to_int4,
    quantize_weights_to_mxfp8,
)

These three symbols were removed when the LLM ONNX export pipeline was refactored into the new staged ``modelopt.onnx.export`` exporters (INT4/NVFP4/MXFP8). Their behavior is preserved by the new exporters, but edgellm 0.6.1 imports the old top-level functions unconditionally, so the import chain still aborts with ``ImportError``.

This restores the three functions on ``modelopt.onnx.quantization.qdq_utils`` as deprecation shims that emit ``DeprecationWarning`` and reuse the existing helpers (``_cast_fp8`` here, ``_cast_fp4`` and ``_replace_fp4qdq_with_2dq`` from ``modelopt.onnx.export.nvfp4_exporter`` via lazy import to avoid a circular import). No new files; only one file changes. The deprecation message points readers to the new exporters and to TensorRT-Edge-LLM as the long-term migration target.

I also audited every other ``modelopt`` import edgellm 0.6.1 makes (``modelopt.torch.quantization``, ``modelopt.torch.opt``, ``modelopt.torch.export.quant_utils``, ``modelopt.onnx.quantization.gs_patching``, etc.) and confirmed they all still resolve on current main; only the three ``qdq_utils`` symbols above were missing.

Verified with:
- All 4 edgellm-side import lines from the bug's traceback now succeed.
- ``tests/unit/onnx/quantization/test_qdq_utils.py`` and ``tests/unit/onnx/test_onnx_utils.py`` (36 tests) pass.
- Each restored shim emits ``DeprecationWarning`` when called.

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
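For readers who don't want to open the diff, here is a minimal sketch of the shim pattern described above, assuming the real bodies are the near-verbatim legacy copies the PR claims; the exact signature and deprecation message text are illustrative, not the actual implementation.

```python
import warnings

import onnx


def fp4qdq_to_2dq(onnx_model: onnx.ModelProto) -> onnx.ModelProto:
    """Deprecated legacy entry point kept so that edgellm 0.6.1 imports resolve."""
    # stacklevel=2 attributes the warning to the edgellm call site, not this shim.
    warnings.warn(
        "fp4qdq_to_2dq is deprecated; use the staged exporters in "
        "modelopt.onnx.export or migrate to TensorRT-Edge-LLM's native export path.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Lazy import: nvfp4_exporter imports from qdq_utils at module load, so a
    # top-level import here would create a circular dependency.
    from modelopt.onnx.export.nvfp4_exporter import (  # noqa: F401
        _cast_fp4,
        _replace_fp4qdq_with_2dq,
    )

    # ... legacy single-shot graph rewrite, preserved from the pre-removal code ...
    return onnx_model
```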
41828cb to c61ac69
🧹 Nitpick comments (2)
modelopt/onnx/quantization/qdq_utils.py (2)
1571-1573: 💤 Low value: Fragile attribute access pattern.
Line 1573 assumes the first attribute of TRT_FP4QDQ is always block_size. If the node's attributes are reordered in future ONNX exports, this would silently use the wrong value. Consider iterating over the attributes to find block_size by name, similar to how other functions in this file handle attributes (e.g., lines 1330-1332, 1444-1458).
🔧 Suggested fix
- block_size = node.attribute[0].i
+ block_size = None
+ for attr in node.attribute:
+     if attr.name == "block_size":
+         block_size = attr.i
+         break
+ assert block_size is not None, f"block_size attribute not found for {node.name}"
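A self-contained illustration of the name-based lookup the fix reaches for; the node below is a toy stand-in built with onnx.helper, not the exporter's real graph.

```python
from onnx import helper

# Toy TRT_FP4QDQ node; in a real graph the attribute order is not guaranteed.
node = helper.make_node("TRT_FP4QDQ", inputs=["w"], outputs=["w_q"], block_size=16)

# Name-based lookup is robust to attribute reordering, unlike node.attribute[0].
block_size = next((attr.i for attr in node.attribute if attr.name == "block_size"), None)
assert block_size == 16
```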
1555-1561: 💤 Low value: Magic number for dtype detection.
The check initializer.data_type == 16 uses a magic number for bfloat16. This is acceptable for deprecated code, but a brief comment would aid maintainability until removal.
📝 Suggested improvement
  def _get_precision_dtype() -> str:
      precision_dtype = "Half"
      for initializer in graph.initializer:
-         if initializer.data_type == 16:
+         if initializer.data_type == 16:  # BFLOAT16
              precision_dtype = "BFloat16"
              break
      return precision_dtype
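For reference, the literal 16 maps to the named ONNX enum member, so the check can also be made self-documenting without a comment; the free-function variant below is hypothetical (the real helper closes over the graph).

```python
from onnx import TensorProto

# TensorProto.BFLOAT16 is the named constant behind the literal 16.
assert TensorProto.BFLOAT16 == 16


def get_precision_dtype(initializers) -> str:
    # Return "BFloat16" if any initializer is stored as bfloat16, else "Half".
    for initializer in initializers:
        if initializer.data_type == TensorProto.BFLOAT16:
            return "BFloat16"
    return "Half"
```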
📒 Files selected for processing (1)
modelopt/onnx/quantization/qdq_utils.py
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## main #1408 +/- ##
==========================================
- Coverage 76.73% 74.63% -2.10%
==========================================
Files 476 476
Lines 51306 51939 +633
==========================================
- Hits 39368 38766 -602
- Misses 11938 13173 +1235
cjluo-nv
left a comment
Bot review — DM the bot to share feedback.
Compat-shim follow-up to #1356 that restores three removed symbols on qdq_utils (quantize_weights_to_int4, quantize_weights_to_mxfp8, fp4qdq_to_2dq) so edgellm 0.6.1 imports can resolve. Code looks fine: imports all resolve (pack_weights_to_int4, compute_e8m0, get_amax, quantize, get_weights_scaling_factor[_2] from quant_utils; get_attribute/has_attribute/read_f16_tensor_as_fp32 from modelopt.onnx.utils; _cast_fp8/cast_initializer_to_dtype already defined in this file; _cast_fp4/_replace_fp4qdq_with_2dq lazy-imported from nvfp4_exporter to avoid the documented circular import). Each shim emits a DeprecationWarning pointing at the staged exporters, as claimed. The implementations are near-verbatim copies of the pre-removal versions, so behavior should match what edgellm previously depended on.
Two things for a human to confirm before merging:
- No tests for the restored shims. tests/unit/onnx/quantization/test_qdq_utils.py exercises INT4QuantExporter/MXFP8QuantExporter/NVFP4QuantExporter.process_model(...), not the restored top-level quantize_weights_to_int4/quantize_weights_to_mxfp8/fp4qdq_to_2dq functions. The PR body claims these tests cover the shims, but they don't — the shims are ~390 lines of graph-manipulation logic with zero CI coverage on this branch. Since the staged exporters split work differently (per the PR's own "notes for reviewers"), passing tests on the exporters doesn't prove the legacy single-shot path still works. A couple of smoke tests that actually call each shim on a minimal ModelProto (as the PR description says were run manually) would be cheap insurance and would keep future refactors from silently re-breaking edgellm; a hedged sketch of such a test follows this review. The same gap was flagged for the sibling shim PR #1356.
- Private-symbol coupling. fp4qdq_to_2dq lazy-imports _cast_fp4 and _replace_fp4qdq_with_2dq (both underscore-prefixed) from modelopt.onnx.export.nvfp4_exporter. The dedup is nice, but this means any rename or signature change to those private helpers in the exporter will silently break the edgellm shim. Worth either (a) promoting them to module-private helpers in qdq_utils itself (duplication, but the shim is explicitly documented as a copy of the legacy path anyway), or (b) adding a comment in nvfp4_exporter.py noting that the shim depends on these symbols so a future refactor doesn't drop them.
Neither is a blocker — the PR is purely additive, scoped to an urgent compat fix, and can be removed as soon as edgellm moves to modelopt.onnx.export.
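A hedged sketch of the kind of smoke test being asked for; the empty-graph fixture mirrors the PR description's claim that each shim runs to completion on an empty ModelProto, and matching the warning text against the function name is an assumption rather than a documented guarantee.

```python
import pytest
from onnx import helper

from modelopt.onnx.quantization.qdq_utils import (
    fp4qdq_to_2dq,
    quantize_weights_to_int4,
    quantize_weights_to_mxfp8,
)


def _empty_model():
    # Smallest valid ModelProto: a graph with no nodes, inputs, or outputs.
    return helper.make_model(helper.make_graph([], "smoke", [], []))


@pytest.mark.parametrize(
    "shim",
    [quantize_weights_to_int4, quantize_weights_to_mxfp8, fp4qdq_to_2dq],
)
def test_legacy_shim_emits_deprecation_warning(shim):
    # Each restored shim should warn about its own deprecation when called.
    with pytest.warns(DeprecationWarning, match=shim.__name__):
        shim(_empty_model())
```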
cjluo-nv
left a comment
Bot review — DM the bot to share feedback.
Compat-shim PR that restores fp4qdq_to_2dq, quantize_weights_to_int4, and quantize_weights_to_mxfp8 on qdq_utils for edgellm 0.6.1 — a follow-up to #1356. The code itself looks correct: all imports resolve to real symbols (pack_weights_to_int4, compute_e8m0, get_amax, quantize, get_weights_scaling_factor[_2] from quant_utils; get_attribute/has_attribute/read_f16_tensor_as_fp32 from modelopt.onnx.utils; _cast_fp4/_replace_fp4qdq_with_2dq lazy-imported from nvfp4_exporter with the circular-import rationale documented inline), each shim emits a DeprecationWarning, and the implementations are near-verbatim copies of the pre-removal versions.
Reason to nudge rather than approve: the previous review's most important comment — no direct test coverage for the restored shims — is still unaddressed. tests/unit/onnx/quantization/test_qdq_utils.py exercises INT4QuantExporter/MXFP8QuantExporter/NVFP4QuantExporter.process_model(...), not the three restored top-level functions. The PR body claims those tests cover the shims, but they don't — and the PR's own "notes for reviewers" section explains that the staged exporters split work differently than the legacy single-shot entry points (e.g. the bias-to-fp16 cast inside quantize_weights_to_int4), which is exactly why passing exporter tests doesn't prove the legacy path still works. A few smoke tests that actually call each shim on a minimal ModelProto would be cheap insurance and match what the PR description says was already run manually. The two CodeRabbit nitpicks (magic 16 for BFLOAT16, positional node.attribute[0] access for block_size) are minor and inherited from the pre-removal code, so I'd leave those to author discretion.
A human reviewer should confirm whether shipping this compat fix without direct shim tests is acceptable given the urgency.
- fp4qdq_to_2dq: look up block_size by attribute name instead of position so the shim does not silently use the wrong attribute if TRT_FP4QDQ attribute ordering changes.
- _get_precision_dtype: use onnx.TensorProto.BFLOAT16 instead of the literal 16 for readability.
- nvfp4_exporter: note in the docstrings of _cast_fp4 and _replace_fp4qdq_with_2dq that they are reused by the deprecated qdq_utils.fp4qdq_to_2dq shim, so a future refactor does not silently drop them.
- Add direct smoke tests for quantize_weights_to_int4, quantize_weights_to_mxfp8, and fp4qdq_to_2dq that assert each shim emits a DeprecationWarning and produces the expected end-state graph (the existing tests only exercise the staged exporters).

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
cjluo-nv
left a comment
Bot review — DM the bot to share feedback.
Re-review: all previous comments have been addressed.
- Critical (cjluo-nv bot, repeated across two reviews): missing direct tests for the restored shims — addressed. The new TestLegacyEdgeLLMShims class in tests/unit/onnx/quantization/test_qdq_utils.py calls each shim directly on minimal ModelProto fixtures, asserts DeprecationWarning is emitted, and verifies the expected transformations (INT4 data type + Reshape/Transpose removal, FLOAT8E4M3FN + Gelu tanh approximation, TRT_FP4QDQ → 2×DQ with the three expected initializers).
- Minor (cjluo-nv): private-symbol coupling from fp4qdq_to_2dq to _cast_fp4/_replace_fp4qdq_with_2dq — addressed via docstring warnings in nvfp4_exporter.py marking both helpers as load-bearing re-exports.
- Minor (CodeRabbit): fragile node.attribute[0] positional access for block_size — addressed with a name-based lookup (next((attr for attr in node.attribute if attr.name == "block_size"), None)) plus an explicit assert.
- Minor (CodeRabbit): magic number 16 for BFLOAT16 — addressed by using onnx.TensorProto.BFLOAT16.
Correctness is fine: imports are at the top, the one lazy import (_cast_fp4/_replace_fp4qdq_with_2dq) has a documented circular-import rationale, deprecation warnings use stacklevel=2, and the shims are near-verbatim copies of the pre-removal implementations per the PR's documented rationale. Purely additive compat shim, no licensing changes, no API removals.
Testing: Test plan has 1 unchecked item(s) out of 5. Finish or remove them before approving.
cjluo-nv
left a comment
Bot review — DM the bot to share feedback.
Re-review: all previous critical and minor comments have been addressed.
- Critical (missing direct shim tests): Addressed. TestLegacyEdgeLLMShims now has three tests (four with the with_transpose parametrization) that directly call quantize_weights_to_int4, quantize_weights_to_mxfp8, and fp4qdq_to_2dq on minimal ModelProto fixtures, assert DeprecationWarning is emitted with the function name, and verify the expected graph transformations (INT4 data type + Reshape/Transpose removal; FLOAT8E4M3FN + Gelu tanh approximation; TRT_FP4QDQ → 2×DQ with the three expected initializers and original weight removed).
- Minor (private-symbol coupling on _cast_fp4/_replace_fp4qdq_with_2dq): Addressed with docstring warnings in nvfp4_exporter.py marking both helpers as load-bearing re-exports.
- Minor (magic number 16 for BFLOAT16): Addressed — now uses onnx.TensorProto.BFLOAT16.
- Minor (positional node.attribute[0] for block_size): Addressed — name-based lookup with explicit assert.
Correctness is fine: top-level imports for all symbols except the one documented circular-import workaround (_cast_fp4/_replace_fp4qdq_with_2dq in fp4qdq_to_2dq), stacklevel=2 on warnings.warn, shims are near-verbatim copies of the pre-removal implementations per the PR's documented rationale. Purely additive, no API removals, standard NVIDIA headers preserved, no licensing changes. PR body's test checklist items all appear checked.
Minor (non-blocking) nit: the new test methods put import warnings and from modelopt.onnx.quantization.qdq_utils import ... inside the methods rather than at the top of the test file; would be cleaner to hoist, but this is a test-file convention and not worth blocking.
What does this PR do?
Type of change: Bug fix (backward-compat restoration), follow-up to #1356.
#1356 restored modelopt.onnx.llm_export_utils as a deprecation shim so that TensorRT-Edge-LLM 0.6.1 could resume importing fold_fp8_qdq_to_dq. QA reports that the next line of tensorrt_edgellm/onnx_export/onnx_utils.py still aborts at module load: the import of fp4qdq_to_2dq, quantize_weights_to_int4, and quantize_weights_to_mxfp8 from modelopt.onnx.quantization.qdq_utils.

Those three symbols were removed when the LLM ONNX export pipeline was refactored into the staged modelopt.onnx.export exporters (INT4QuantExporter, NVFP4QuantExporter, MXFP8QuantExporter). The new exporters still implement the same conversions, but edgellm 0.6.1 imports the old top-level functions unconditionally, so every tensorrt-edgellm-* CLI fails immediately — same 5/5 (100%) failure rate as before, just one line further in.

What this PR does
Restores fp4qdq_to_2dq, quantize_weights_to_int4, and quantize_weights_to_mxfp8 on modelopt.onnx.quantization.qdq_utils as deprecation shims that:
- emit a DeprecationWarning pointing readers at the new modelopt.onnx.export exporters and at TensorRT-Edge-LLM as the long-term migration target,
- reuse the existing helpers (_cast_fp8 here, _cast_fp4 and _replace_fp4qdq_with_2dq from modelopt.onnx.export.nvfp4_exporter via a lazy import to avoid a circular import — nvfp4_exporter already imports onnx_dtype_map from this module),
- import pack_weights_to_int4, get_amax, compute_e8m0, get_weights_scaling_factor[_2], quantize, get_attribute, has_attribute, and read_f16_tensor_as_fp32 from their canonical locations rather than duplicating them.

All changes are in qdq_utils.py. No new files, no public API removals, no behavior changes for any existing caller.
modeloptimport edgellm 0.6.1 makes (modelopt.torch.quantization,modelopt.torch.opt,modelopt.torch.export.quant_utils,modelopt.torch.opt.plugins.huggingface,modelopt.onnx.quantization.gs_patching,modelopt.torch.quantization.nn.{QuantLinear,TensorQuantizer},modelopt.torch.quantization.utils.{is_quantized,is_quantized_linear}) and confirmed each still resolves on currentmain; only the threeqdq_utilssymbols above were missing.Testing
Testing
- tests/unit/onnx/quantization/test_qdq_utils.py + tests/unit/onnx/test_onnx_utils.py: 36 passed in 5s.
- Each restored shim emits a DeprecationWarning when called and runs to completion on an empty ModelProto smoke input.
- pre-commit run --files modelopt/onnx/quantization/qdq_utils.py: ruff, ruff-format, mypy, bandit, license header all pass.

Notes for reviewers
- The staged exporters split the work into compute_scales/compress_weights/post_process stages, while edgellm calls the legacy single-shot entry points and depends on side effects (e.g. casting bias to fp16 inside quantize_weights_to_int4) that the staged exporters do not all reproduce in one call. Wrapping them up was higher-risk than restoring the legacy paths verbatim with a deprecation warning.
- Same removal plan as the llm_export_utils shim — drop together once edgellm releases a version that targets modelopt.onnx.export directly.