Fix BFloat16 conversion error in eager checkpoint loading #3347
phu0ngng wants to merge 2 commits into AI-Hypercomputer:main
Conversation
Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
Thanks for the fix!
I believe this error depends on the transformers version:
- With older transformers versions, `from_pretrained` loads bf16 checkpoints in fp32 by default, so `v.numpy()` passes.
- With newer transformers versions, the bf16 checkpoint is loaded as-is, and hence `v.numpy()` errors. Yes, your fix `v.float().numpy()` will help.
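The dtype behavior above can be reproduced without transformers at all; a minimal sketch, assuming PyTorch is installed (variable names are illustrative, not from the PR):

```python
# Minimal repro sketch: torch.Tensor.numpy() rejects bfloat16 tensors.
import torch

v = torch.ones(2, 2, dtype=torch.bfloat16)  # stand-in for a bf16 checkpoint weight

raised = False
try:
    v.numpy()  # raises TypeError: Got unsupported ScalarType BFloat16
except TypeError:
    raised = True

# The fix: upcast to float32 first, which numpy supports.
arr = v.float().numpy()
```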
Lastly, we are working more on loading & type conversion for `to_maxtext` in #3184.
```diff
- # Convert all to numpy immediately in eager mode
+ # Convert all to numpy immediately in eager mode.
+ # torch.Tensor.numpy() does not support bfloat16, so cast to float32 first.
+ import torch  # pylint: disable=g-import-not-at-top
```
move this import to the top?
We can, but then torch would be required in the lazy loading path as well, even though it is not needed there. I think we should keep the current implementation to avoid a torch requirement for lazy loading.
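The deferred-import approach discussed in this thread could be sketched as a small helper (the name `to_numpy_safe` is hypothetical; the PR inlines this logic rather than defining a function):

```python
def to_numpy_safe(v):
    """Convert a torch tensor to a numpy array, upcasting bfloat16 to float32.

    torch is imported lazily here so that the lazy checkpoint loading path,
    which never calls this helper, does not require torch to be installed.
    """
    import torch  # pylint: disable=g-import-not-at-top

    if v.dtype == torch.bfloat16:
        v = v.float()  # torch.Tensor.numpy() does not support bfloat16
    return v.numpy()
```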
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Thanks for your patience! We revamped the loading/conversion/save pipeline for `to_maxtext`. The current hf -> maxtext behavior would be:
Could you verify if the latest code solves your original issue? Thanks!
Description
`to_maxtext.py` crashes with `TypeError: Got unsupported ScalarType BFloat16` when converting bf16 HuggingFace models (e.g. Qwen3, Llama 3) in eager mode. PyTorch's `.numpy()` doesn't support bfloat16 tensors. This fix casts bf16 tensors to float32 before the numpy conversion. The lazy loading path is unaffected (safetensors handles this internally).
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.