
fix: clamp dynamic NVFP4 FP8 scale export#1465

Open

ShawRong wants to merge 2 commits into NVIDIA:main from ShawRong:fix/nvfp4-dynamic-scale-clamp

Conversation


@ShawRong ShawRong commented May 12, 2026

What does this PR do?

Type of change: Bug fix

Clamp the dynamic NVFP4 per-block FP8 scale before casting it to torch.float8_e4m3fn during export.

The static quantizer-derived path already clamps the exported FP8 scale, but the dynamic path can still produce scale values larger than the finite FP8 E4M3 range. Since torch.float8_e4m3fn has no Inf representation, casting values such as 480 or 1000 produces NaN payloads in the exported scale tensor. This patch applies the same finite-range clamp to the dynamic path so exported dynamic NVFP4 scales remain finite.

Usage

# No API changes. Dynamic NVFP4 export now avoids NaN FP8 scale entries when
# the computed per-block scale exceeds the finite fp8_e4m3fn range.

Testing

uv run pytest tests/gpu/torch/quantization/test_qtensor_cuda.py::TestQTensor::test_nvfp4_dynamic_export_fp8_scale_no_nan_when_scale_exceeds_fp8 -q

Result: 1 passed.

Also ran pre-commit on the touched files:

uv run pre-commit run --files \
  modelopt/torch/quantization/qtensor/nvfp4_tensor.py \
  tests/gpu/torch/quantization/test_qtensor_cuda.py

Result: all applicable hooks passed.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A
  • Did you get Claude approval on this PR?: N/A

Additional Information

This complements the existing static NVFP4 scale clamp by applying the same safety guard to the dynamic scale export path.

Summary by CodeRabbit

Bug Fixes

  • Fixed NaN values in FP8 scale conversions during dynamic NVFP4 quantization export. Scale values now clamp properly when they exceed the valid FP8 range, preventing overflow and ensuring numerical stability in exported scale bytes.

Tests

  • Added regression test to verify FP8 scale exports no longer contain NaN bytes during dynamic NVFP4 operations.

Signed-off-by: ShawRong <shawnrong1213@gmail.com>

copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1919a19e-573f-4a66-901b-d1c0526bccaa

📥 Commits

Reviewing files that changed from the base of the PR and between 7a11fb2 and 446536a.

📒 Files selected for processing (2)
  • modelopt/torch/quantization/qtensor/nvfp4_tensor.py
  • tests/gpu/torch/quantization/test_qtensor_cuda.py

📝 Walkthrough

Walkthrough

NVFP4 quantization now prevents numeric saturation to NaN when converting very large per-block scales to FP8 format by clamping values to 448.0 before the conversion. A regression test verifies the fix works by checking that exported FP8 scale bytes contain no NaN values and saturate correctly.

Changes

NVFP4 FP8 Scale Saturation

  • FP8 scale clamping logic — modelopt/torch/quantization/qtensor/nvfp4_tensor.py: when keep_high_precision is False, per_block_scale is clamped to max=448.0 before casting to torch.float8_e4m3fn to avoid NaN overflow; comments updated to explain the NaN-avoidance rationale.
  • Regression test for FP8 saturation — tests/gpu/torch/quantization/test_qtensor_cuda.py: new CUDA test verifies that dynamic NVFP4 export of the FP8 scale bytes contains no NaN (0x7F) when the scale exceeds the FP8 range, and that the first byte saturates to 0x7E.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 66.67%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (5 passed)

  • Description Check ✅ — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ — The title clearly and concisely summarizes the main change: clamping the dynamic NVFP4 FP8 scale export to prevent NaN values.
  • Linked Issues Check ✅ — Check skipped; no linked issues were found for this pull request.
  • Out of Scope Changes Check ✅ — Check skipped; no linked issues were found for this pull request.
  • Security Anti-Patterns ✅ — No security anti-patterns found. The PR contains numerical changes to NVFP4 quantization plus test additions; none of the six critical patterns are present.


@ShawRong ShawRong marked this pull request as ready for review May 12, 2026 14:18
@ShawRong ShawRong requested a review from a team as a code owner May 12, 2026 14:18
@ShawRong ShawRong requested a review from sychen52 May 12, 2026 14:18
Collaborator

@shengliangxu shengliangxu left a comment


LGTM
