
fix(quantization): enforce Blackwell MX-unit alignment for NVFP4 block-size validation#1413

Open
makroumi wants to merge 2 commits into NVIDIA:main from makroumi:nvfp4-block-size-validation

Conversation

@makroumi makroumi commented May 7, 2026

What does this PR do?

Type of change: Bug fix

QuantizerAttributeConfig.validate_block_sizes validates axis conflicts, dynamic single-axis constraints, and key types, but does not constrain the integer block size values for NVFP4 quantization. NVFP4 (num_bits=(2,1), scale_bits=(4,3)) targets Blackwell MMA tiles hardwired for block sizes of 16 or 32 elements. An illegal block size (e.g. 64, 128, 4) passes validation silently, propagates through calibration, and corrupts scale tensors at TensorRT-LLM export or produces garbage at deployment.

This PR adds a guard that rejects any integer block size not in {16, 32} when the NVFP4 format signature is detected. The check fires at Pydantic config construction time, before any GPU work is spent. Non-NVFP4 formats (INT8, FP8, INT4, MXFP4, etc.) are unaffected.
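The guard can be sketched as follows. This is an illustrative, self-contained stand-in (`check_nvfp4_block_sizes` is a hypothetical name), not the exact body added to `QuantizerAttributeConfig.validate_block_sizes`:

```python
def check_nvfp4_block_sizes(num_bits, block_sizes):
    """Reject NVFP4 configs whose integer block sizes are not 16 or 32."""
    if block_sizes is None:
        return
    # NVFP4 signature: 4-bit E2M1 values with E4M3 block scales.
    if num_bits == (2, 1) and block_sizes.get("scale_bits") == (4, 3):
        for key, value in block_sizes.items():
            # Integer keys are quantization axes; skip metadata keys
            # like "scale_bits" and "type".
            if isinstance(key, int) and isinstance(value, int) and value not in (16, 32):
                raise ValueError(
                    f"NVFP4 block_size must be 16 or 32 (Blackwell MMA tile), got {value}"
                )
```

In the real validator this runs inside the existing Pydantic field validation, so the error surfaces as a `ValidationError` at config construction time.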

Usage

from modelopt.torch.quantization.config import QuantizeConfig

# This now raises ValidationError immediately
QuantizeConfig(
    quant_cfg=[{
        "quantizer_name": "*weight_quantizer",
        "cfg": {
            "num_bits": (2, 1),
            "block_sizes": {-1: 64, "scale_bits": (4, 3)},
        },
    }],
    algorithm="max",
)
# pydantic.ValidationError: NVFP4 block_size must be 16 or 32 (Blackwell MMA tile), got 64

# block_size=16 and block_size=32 continue to work as before

Testing

8 new test cases in TestNVFP4BlockSizeValidation:

  • test_nvfp4_block_16_accepted: canonical tile passes
  • test_nvfp4_block_32_accepted: alternative tile passes
  • test_nvfp4_illegal_block_size_rejected[8] through [256]: 5 parametrized illegal values rejected with the correct error message
  • test_non_nvfp4_block_size_unaffected: INT4 block_size=128 still passes
  • test_nvfp4_without_scale_bits_unaffected: MXFP4 (scale_bits=(8,0)) skips the constraint
All 60 tests in test_config_validation.py pass (52 existing + 8 new). All existing NVFP4 preset configs (NVFP4_DEFAULT_CFG, etc.) validated against the new constraint with zero regressions.
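A self-contained sketch of how the parametrized rejection tests might look. It uses a local stand-in `nvfp4_guard` instead of constructing a full QuantizeConfig so the example runs on its own; the actual tests in TestNVFP4BlockSizeValidation build real configs and match against pydantic's ValidationError:

```python
import pytest

def nvfp4_guard(block_size: int) -> int:
    # Stand-in for the NVFP4 check inside validate_block_sizes.
    if block_size not in (16, 32):
        raise ValueError(
            f"NVFP4 block_size must be 16 or 32 (Blackwell MMA tile), got {block_size}"
        )
    return block_size

@pytest.mark.parametrize("bad", [4, 8, 64, 128, 256])
def test_nvfp4_illegal_block_size_rejected(bad):
    with pytest.raises(ValueError, match="must be 16 or 32"):
        nvfp4_guard(bad)

@pytest.mark.parametrize("good", [16, 32])
def test_nvfp4_block_accepted(good):
    assert nvfp4_guard(good) == good
```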

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ✅
  • Did you get Claude approval on this PR?: N/A

Additional Information

7 lines added to modelopt/torch/quantization/config.py, 85 lines added to tests/unit/torch/quantization/test_config_validation.py, 4 lines added to CHANGELOG.rst. Zero lines modified, zero lines deleted, zero new imports, zero new functions.

Summary by CodeRabbit

  • Bug Fixes

    • Enforced NVFP4 block_size values to only 16 or 32 at configuration creation, preventing invalid configurations that could later corrupt scale data during export.
  • Tests

    • Added unit tests covering NVFP4 block_size validation, ensuring correct acceptance of 16/32 and rejection of invalid values while not affecting non-NVFP4 configurations.

NVFP4 quantization (num_bits=(2,1), scale_bits=(4,3)) targets Blackwell
MMA tiles which are hardwired for block sizes of 16 or 32 elements.
Prior to this change, illegal block sizes (e.g. 64, 128) passed
validation silently, corrupting scale tensors at export time after
wasting GPU hours on calibration.

Add a guard in QuantizerAttributeConfig.validate_block_sizes that
rejects any integer block_size not in {16, 32} when the NVFP4
signature is detected. Non-NVFP4 formats are unaffected.

Signed-off-by: Mehdi Makroumi <134870510+makroumi@users.noreply.github.com>
@makroumi makroumi requested a review from a team as a code owner May 7, 2026 23:31
@makroumi makroumi requested a review from Edwardf0t1 May 7, 2026 23:31

copy-pr-bot Bot commented May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai Bot commented May 7, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1c75f5bc-5206-4b35-9456-05d1f32f06cd

📥 Commits

Reviewing files that changed from the base of the PR and between 4fcb798 and b6852b0.

📒 Files selected for processing (3)
  • CHANGELOG.rst
  • modelopt/torch/quantization/config.py
  • tests/unit/torch/quantization/test_config_validation.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/unit/torch/quantization/test_config_validation.py
  • modelopt/torch/quantization/config.py

📝 Walkthrough

Walkthrough

Adds NVFP4-specific validation that restricts quantization block_size values to 16 or 32 at config construction, accompanies it with unit tests covering valid/invalid and non-applicable cases, and documents the change in the 0.45 changelog.

Changes

NVFP4 Block Size Validation

Layer / File(s) Summary
Validation Constraint
modelopt/torch/quantization/config.py
QuantizerAttributeConfig.validate_block_sizes adds an NVFP4-specific check: when num_bits == (2, 1) and scale_bits == (4, 3), integer block_size values must be 16 or 32.
Test Coverage
tests/unit/torch/quantization/test_config_validation.py
TestNVFP4BlockSizeValidation verifies that NVFP4 configs accept 16/32, reject other values with a ValidationError matching the NVFP4 message, and that non-NVFP4 or mismatched scale_bits cases are unaffected.
Release Documentation
CHANGELOG.rst
Adds a 0.45 bug-fix entry documenting that NVFP4 block_size is now validated at Pydantic config construction time and limited to 16 or 32.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: enforcing block-size validation for NVFP4 by requiring alignment with Blackwell MX-unit constraints (16 or 32), which directly matches the core objective of the PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed No security anti-patterns detected. PR contains only validation logic enhancements and unit tests. No torch.load, numpy.load, trust_remote_code, eval/exec, nosec comments, or new dependencies present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@makroumi makroumi force-pushed the nvfp4-block-size-validation branch from 4fcb798 to 19d26f8 Compare May 7, 2026 23:35

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around lines 458-464: The Pydantic validator validate_block_sizes in modelopt.torch.quantization.config uses Python assert statements, which are removed when the interpreter runs with -O, so validation would be silently skipped — including the new NVFP4 check that asserts `_v in (16, 32)`. Replace each assert in validate_block_sizes with an explicit exception (e.g., raise ValueError) that preserves the original assertion message, keeping the loop/conditional logic and return behavior unchanged.
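An illustrative before/after for this finding (hypothetical function names, not the upstream code): `assert` statements are stripped when Python runs with `-O`, so a validator relying on them silently stops validating.

```python
def validate_with_assert(block_size: int) -> int:
    # Disappears under `python -O`: the bad value passes through unchecked.
    assert block_size in (16, 32), f"NVFP4 block_size must be 16 or 32, got {block_size}"
    return block_size

def validate_with_raise(block_size: int) -> int:
    # Always runs, regardless of interpreter optimization flags.
    if block_size not in (16, 32):
        raise ValueError(
            f"NVFP4 block_size must be 16 or 32 (Blackwell MMA tile), got {block_size}"
        )
    return block_size
```

Raising ValueError inside a Pydantic validator is also the conventional way to surface the failure as a ValidationError to the caller.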

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b2f49d17-e721-4201-b641-7be9d07a287e

📥 Commits

Reviewing files that changed from the base of the PR and between 6a3b6b8 and 4fcb798.

📒 Files selected for processing (3)
  • CHANGELOG.rst
  • modelopt/torch/quantization/config.py
  • tests/unit/torch/quantization/test_config_validation.py

Comment thread modelopt/torch/quantization/config.py