
support AR format FP8 in vLLM #1798

Open

Zhenzhong1 wants to merge 5 commits into main from zhenzhong/arformat_fp8

Conversation

@Zhenzhong1 (Contributor) commented May 11, 2026

Related Issues

#1536

Type of Change

New feature

Test

auto-round --model /models/Llama-3.1-8B --scheme FP8_BLOCK --iters 0 --format auto_round

Output Model:
(screenshot of the exported model)
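
Since the auto_round FP8 export targets vLLM, here is a minimal inference sketch; the model path below is a hypothetical export directory, not a path produced by this PR's test:

# Minimal vLLM loading sketch; the path is an assumed export directory.
from vllm import LLM, SamplingParams

llm = LLM(model="/models/Llama-3.1-8B-FP8_BLOCK")  # hypothetical output dir
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)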

Copilot AI review requested due to automatic review settings May 11, 2026 05:48
Copilot AI left a comment


Pull request overview

This PR aims to enable exporting block-wise FP8 quantization using the auto_round / auto_round:fp8 format (targeting vLLM compatibility), rather than forcing users to export with the standalone fp8 format.

Changes:

  • Removes the format-compatibility rewrite that previously replaced auto_round with fp8 for block-wise FP8 configs.
  • Updates FP8 export to always emit weight_block_size for tuple group_size, and additionally emits modules_to_not_convert alongside ignored_layers for layers kept in high precision (see the config sketch below).
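
A sketch of the relevant quantization_config keys after such an export; the key names come from this PR's description, while the concrete values (block size, layer names) are illustrative assumptions:

# Illustrative only: the keys this PR's FP8 export emits.
# The values below (128x128 blocks, lm_head) are assumptions.
quantization_config = {
    "weight_block_size": [128, 128],        # from a tuple group_size such as (128, 128)
    "ignored_layers": ["lm_head"],          # layers kept in high precision
    "modules_to_not_convert": ["lm_head"],  # emitted alongside ignored_layers
}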

Comment thread: auto_round/formats.py (Outdated)
formats = tmp_format_name.split(",")
if isinstance(ar.group_size, tuple) and any(["auto_round" in f.lower() for f in formats]):
    logger.warning(
        "`auto_round` format can't be used for deploying block-wise fp8 quantization now, use `fp8` instead."
    )
@wenhuach21 (Contributor) commented May 12, 2026


Please keep the warning and change it to: auto_round:fp8 format only supports vLLM inference for now. We recommend using the FP8 format via --format fp8 instead.

@Zhenzhong1 (Contributor, Author) replied:


Fixed in 33c5923.

Comment thread: auto_round/formats.py (Outdated)

if isinstance(ar.group_size, tuple) and any(["auto_round" in f.lower() for f in formats]):
    logger.warning(
        "auto_round:fp8 format only supports vLLM inference for now. We recommend using the FP8 format via --format fp8 instead."
    )
Contributor commented:


Add backticks around `--format fp8` in the warning message.

@Zhenzhong1 (Contributor, Author) replied:


OK, fixed in c167289.

@yiliu30 (Contributor) left a comment


Please add some unit tests (UTs); otherwise LGTM. A minimal sketch of one such test follows below.
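
A minimal pytest-style sketch of such a unit test; emit_fp8_quant_config below is a hypothetical stand-in that mirrors the config-emission behavior described in this PR, not auto-round's actual exporter API:

# Hypothetical stand-in mirroring the behavior described in this PR;
# not auto-round's real exporter entry point.
def emit_fp8_quant_config(group_size, high_precision_layers):
    cfg = {}
    if isinstance(group_size, tuple):
        # Tuple group_size means block-wise FP8: always emit weight_block_size.
        cfg["weight_block_size"] = list(group_size)
    # Layers kept in high precision appear under both keys.
    cfg["ignored_layers"] = list(high_precision_layers)
    cfg["modules_to_not_convert"] = list(high_precision_layers)
    return cfg

def test_block_fp8_emits_block_size_and_ignore_lists():
    cfg = emit_fp8_quant_config((128, 128), ["lm_head"])
    assert cfg["weight_block_size"] == [128, 128]
    assert cfg["modules_to_not_convert"] == cfg["ignored_layers"] == ["lm_head"]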
