Implement row-scaled NVFP4 fprop recipe #2931
Open
zianglih wants to merge 46 commits into NVIDIA:main from zianglih:fp4-per-token
Commits
14f77be Adapt initial implementation and make quantization bitwise exact (zianglih)
700cbce Add col (zianglih)
cfd13bb Add fp32 (zianglih)
866d337 Clean up tests (zianglih)
5a6ea13 Clean up ref (zianglih)
ee0aafb Clean up gemm wrapper (zianglih)
e852804 Clean up test (zianglih)
9dbb3ad Clean up (zianglih)
475de8a Rename and reformat (zianglih)
62a1c1e Avoid partial amax folding in gemm (zianglih)
44e4e0f Expand test coverage (zianglih)
4755f09 Expand more tests (zianglih)
55286ed Turn on test for grouped linear sanity (zianglih)
e4829b8 Rename pertoken to per_token (zianglih)
dbbdecb Expand .cu test (zianglih)
2374a6e Format after rebase (zianglih)
5798285 Fix test after rebase (zianglih)
233bb44 Clean up cpp test (zianglih)
47c9cde Extend cpp dequantize test (zianglih)
21a19f5 Only pass `per_token_activation` to forward activation quantizer and … (zianglih)
75c19d0 Minor fix test (zianglih)
a3e8305 Improve accuracy by unfolding weight per-tensor fp32 (zianglih)
027cb79 Fold row-wise quantization (zianglih)
93a06ad Drop column wise (zianglih)
db1c2a6 Clean up (zianglih)
9eb06c7 Clean up (zianglih)
21274d8 Clean up column wise (zianglih)
4cbb43a Move shared test helpers (zianglih)
d4ab1e7 Minor clean up test (zianglih)
363335b Readability (zianglih)
1a4d3b0 Rename (zianglih)
66622e8 Further refactor (zianglih)
94b05e3 Clean up bias (zianglih)
6c10ed2 Clean up cast (zianglih)
aa519d1 Avoid silently disable column wise (zianglih)
90a97a4 Clean up (zianglih)
600b4cd `is_quantizable` returns false (zianglih)
cc9a210 Error out grouped gemm (zianglih)
39f96c1 Tighten test (zianglih)
4d34527 Rename verbose rowwise_amax_is_row_scaled (zianglih)
9676563 Clean up (zianglih)
0187d80 Explicitly handle both gemm input and error out (zianglih)
ee74019 Minor (zianglih)
01a32ef Nits and lint (zianglih)
4e9bef5 Merge branch 'main' into fp4-per-token (zianglih)
afc99ad Minor fix A100 ci (zianglih)
We have to make this change here to stay aligned with the PyTorch reference.