Fix DeepSeek PTQ script #912

Merged
cjluo-nv merged 1 commit into main from chenjie/fix_deepseek
Feb 20, 2026
Conversation

@cjluo-nv
Collaborator

@cjluo-nv cjluo-nv commented Feb 20, 2026

What does this PR do?

Type of change: Bug fix

Overview: Fix two bugs in the DeepSeek PTQ script: a missing explicit dtype=torch.bfloat16 in a weight_dequant call in examples/deepseek/ptq.py, and an outdated import path for weight_dequant in examples/deepseek/quantize_to_nvfp4.py.

Testing

Run DeepSeek-V3.2 PTQ and export

Summary by CodeRabbit

  • Refactor
    • Enhanced data type handling in quantization examples for bf16 operations
    • Updated internal dependencies for quantization utilities to improve modularity

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv requested a review from a team as a code owner February 20, 2026 18:56
@cjluo-nv cjluo-nv requested a review from sugunav14 February 20, 2026 18:56
@coderabbitai
Contributor

coderabbitai bot commented Feb 20, 2026

📝 Walkthrough

Two example files are updated: one adds an explicit dtype parameter to a weight dequantization call for bfloat16 paths, and the other changes the import source for weight_dequant from ds_kernel to modelopt.torch.quantization.triton.

Changes

Dequantization Parameter (examples/deepseek/ptq.py):
  Added an explicit dtype=torch.bfloat16 argument to the weight_dequant call in the bf16 branch path.

Import Source Update (examples/deepseek/quantize_to_nvfp4.py):
  Changed the weight_dequant import source from ds_kernel to modelopt.torch.quantization.triton.
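For reference, a minimal sketch of what the two changes look like at a call site. The wrapper function and its dict arguments are hypothetical; only the import path and the weight_dequant(item, scale_inv, dtype=torch.bfloat16) call shape come from this PR and the review comment below.

```python
import torch
# New import location (previously: from ds_kernel import weight_dequant)
from modelopt.torch.quantization.triton import weight_dequant

def dequantize_fp8_weights_to_bf16(fp8_weights: dict, scale_invs: dict) -> dict:
    """Hypothetical helper: dequantize FP8 block-scaled weights into a bf16 state dict."""
    bf16_state_dict = {}
    for key, item in fp8_weights.items():
        scale_inv = scale_invs[key]
        # Pass the dtype explicitly so the result is bfloat16 regardless of the
        # function's captured default dtype (see the review discussion below).
        bf16_state_dict[key] = weight_dequant(item, scale_inv, dtype=torch.bfloat16)
    return bf16_state_dict
```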

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

Title check: ❓ Inconclusive. The title "Fix DeepSeek PTQ script" is vague and generic; it relates to the changeset but does not say which bugs were fixed (the dtype handling and the import path change). Consider a more specific title such as "Fix DeepSeek PTQ: add explicit dtype and update weight_dequant import" to clearly communicate the main changes.
✅ Passed checks (1 passed)
Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


@cjluo-nv cjluo-nv requested a review from meenchen February 20, 2026 18:58
Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/deepseek/quantize_to_nvfp4.py (1)

210-210: ⚠️ Potential issue | 🟠 Major

Missing dtype=torch.bfloat16 — same stale-default bug that was fixed in ptq.py but not here.

torch.set_default_dtype(torch.bfloat16) at line 173 sets the runtime default, but weight_dequant's parameter dtype=torch.get_default_dtype() is evaluated once at import time (when the module is defined), capturing torch.float32. The subsequent torch.set_default_dtype call has no effect on the already-captured default. Calling weight_dequant(item, scale_inv) without an explicit dtype will produce a float32 tensor, even though the result is stored in bf16_state_dict and fed into NVFP4/FP8 quantization — causing silent dtype mismatches and 2× memory overhead for the dequantized weights.
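The import-time capture is standard Python behavior rather than anything specific to weight_dequant: a default argument expression is evaluated once, when the def statement runs, so a later torch.set_default_dtype call has no effect on it. A small standalone illustration, not taken from the repository:

```python
import torch

def make_tensor(dtype=torch.get_default_dtype()):
    # The default above was evaluated when this function was defined (float32),
    # not at call time.
    return torch.zeros(2, dtype=dtype)

torch.set_default_dtype(torch.bfloat16)

print(make_tensor().dtype)                      # torch.float32: stale captured default
print(make_tensor(dtype=torch.bfloat16).dtype)  # torch.bfloat16: explicit dtype wins
print(torch.zeros(2).dtype)                     # torch.bfloat16: factory calls do see the new default
```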

🐛 Proposed fix — consistent with the `ptq.py` fix
-                    bf16_state_dict[key] = weight_dequant(item, scale_inv)
+                    bf16_state_dict[key] = weight_dequant(item, scale_inv, dtype=torch.bfloat16)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/deepseek/quantize_to_nvfp4.py` at line 210, The call to
weight_dequant(item, scale_inv) relies on a default dtype captured at import
time (torch.get_default_dtype()) and thus produces float32 despite
torch.set_default_dtype(torch.bfloat16); fix by passing an explicit
dtype=torch.bfloat16 when calling weight_dequant so bf16_state_dict entries are
actually bfloat16 (or alternatively update weight_dequant's signature to default
to torch.bfloat16), e.g., change the call site that writes into bf16_state_dict
to call weight_dequant(item, scale_inv, dtype=torch.bfloat16).

@cjluo-nv cjluo-nv enabled auto-merge (squash) February 20, 2026 19:18
@cjluo-nv cjluo-nv merged commit 9975ba1 into main Feb 20, 2026
35 checks passed
@cjluo-nv cjluo-nv deleted the chenjie/fix_deepseek branch February 20, 2026 19:45
@codecov

codecov bot commented Feb 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.10%. Comparing base (9e38041) to head (2c60f43).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #912      +/-   ##
==========================================
- Coverage   73.54%   73.10%   -0.44%     
==========================================
  Files         205      205              
  Lines       22000    22281     +281     
==========================================
+ Hits        16179    16288     +109     
- Misses       5821     5993     +172     

☔ View full report in Codecov by Sentry.

kevalmorabia97 pushed a commit that referenced this pull request Feb 20, 2026
## What does this PR do?

**Type of change:** Bug fix

**Overview:** Fix two bugs in the DeepSeek PTQ script: a missing explicit dtype in a weight_dequant call and an outdated import path for weight_dequant.

## Testing

Run DeepSeek-V3.2 PTQ and export

## Summary by CodeRabbit

* **Refactor**
  * Enhanced data type handling in quantization examples for bf16 operations
  * Updated internal dependencies for quantization utilities to improve modularity

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>