fbgemm_fp8:Keep the current device aligned with the input tensor by kaixuanliu · Pull Request #46403 · huggingface/transformers

kaixuanliu · 2026-06-04T08:09:48Z

As the code comment at L119 notes, `x_quantized` and `x_scale` are not necessarily on the same device as `x`. On XPU we still hit this problem even though we use kernels-community/fp8-fbgemm. Even moving the output to the correct device afterwards can still produce incorrect results (NaN outputs in intermediate layers). This PR fixes the bug with a context manager and optimizes the code, as we did for mxfp4.
Test cases to reproduce:

RUN_SLOW=1 python -m pytest tests/quantization/fbgemm_fp8/test_fbgemm_fp8.py::FbgemmFp8Test::test_quantized_model_multi_gpu

RUN_SLOW=1 python -m pytest tests/quantization/fbgemm_fp8/test_fbgemm_fp8.py::FbgemmFp8Test::test_save_pretrained_multi_gpu

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>
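
While debugging, one way to confirm the mismatch described above is to compare the device of each quantized tensor against the input. The snippet below is a minimal sketch, not code from this PR: `mismatched_devices` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real tensors (anything with a `device` attribute works the same way).

```python
from types import SimpleNamespace


def mismatched_devices(reference, **tensors):
    """Return {name: device} for every tensor not on ``reference``'s device.

    Hypothetical debugging helper; it only reads the ``device`` attribute.
    """
    ref_device = str(reference.device)
    return {
        name: str(t.device)
        for name, t in tensors.items()
        if str(t.device) != ref_device
    }


# Stand-ins for x, x_quantized and x_scale on a two-device machine.
x = SimpleNamespace(device="xpu:1")
x_quantized = SimpleNamespace(device="xpu:0")  # landed on the current device
x_scale = SimpleNamespace(device="xpu:1")

report = mismatched_devices(x, x_quantized=x_quantized, x_scale=x_scale)
print(report)  # {'x_quantized': 'xpu:0'}
```

An empty dictionary means every tensor sits on the same device as the input.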

kaixuanliu · 2026-06-04T08:13:05Z

@SunMarc, please help review, thanks!

SunMarc

Thanks, just a nit

SunMarc · 2026-06-04T12:20:35Z

+@contextmanager
+def on_device(tensor):
+    """Force the global current device to match ``tensor``'s device.
+
+    This keeps quantization kernel launches aligned with the input tensor device when the
+    process current device differs from the module placement.
+    """
+    device = getattr(tensor, "device", None)
+    device_type = getattr(device, "type", None)
+    if device_type == "cuda":
+        with torch.cuda.device(device):
+            yield
+    elif _is_torch_xpu_available and device_type == "xpu":
+        with torch.xpu.device(device):
+            yield
+    else:
+        yield
+


Can we put that in another file so that we can reuse it across the quantization integrations?
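
A shared version of the helper could look roughly like the sketch below. This is an illustration under assumptions, not the code that was merged: the module location is left open, accepting a device or device string in addition to a tensor is an added convenience, and the `try`/`except` import only keeps the sketch importable where torch is missing.

```python
from contextlib import contextmanager

try:
    import torch
except ImportError:  # assumption: fall back to a no-op when torch is absent
    torch = None


@contextmanager
def on_device(target):
    """Temporarily make ``target``'s device the current device.

    ``target`` may be a tensor (anything with a ``device`` attribute),
    a ``torch.device``, or a device string such as ``"cuda:1"``.
    """
    device = getattr(target, "device", target)
    if torch is not None and isinstance(device, str):
        device = torch.device(device)
    device_type = getattr(device, "type", None)

    if torch is not None and device_type == "cuda":
        with torch.cuda.device(device):
            yield
    elif torch is not None and device_type == "xpu" and hasattr(torch, "xpu"):
        with torch.xpu.device(device):
            yield
    else:
        # CPU or an unrecognized device: there is no current device to switch.
        yield


# Usage: the body runs with the current device matched to the target.
calls = []
with on_device("cpu"):
    calls.append("quantize")  # a quantization kernel launch would go here
```

Each integration could then wrap its kernel launch in `with on_device(x):` instead of defining its own copy of the context manager.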

kaixuanliu added 2 commits June 4, 2026 07:35

fbgemm_fp8:Keep the current device aligned with the input tensor

5d653e0

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

update comment

320115f

Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>

Merge branch 'main' into fbgemm_fp8_xpu

27e3eb4

kaixuanliu changed the title ~~Fbgemm fp8 xpu~~ fbgemm_fp8:Keep the current device aligned with the input tensor Jun 4, 2026

SunMarc reviewed Jun 4, 2026

kaixuanliu wants to merge 3 commits into huggingface:main from kaixuanliu:fbgemm_fp8_xpu
