4bit GEMM fix: per-device cudaFuncSetAttribute cache #1952

Merged
matthewdouglas merged 1 commit into main from gemm4bit-fix-setattribute on May 22, 2026

@matthewdouglas
Member

`cudaFuncSetAttribute` for the shared-memory (smem) limit must be called on each device for the MMA kernels, but it is currently called only once, for whichever device is active on the first call. This PR changes it to be set once per kernel per device. As with the GPU property cache, the state is cached for up to 16 devices.
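For illustration only, here is a minimal host-side sketch of a "once per kernel per device" cache. It is not the code from this PR. The names `ensure_smem_attribute`, `configured_flags`, `KernelA`, and `KernelB` are hypothetical, and the attribute call is passed in as a callable so the sketch compiles without the CUDA toolkit. In real CUDA code the callable would presumably wrap `cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, bytes)`, with the device index coming from `cudaGetDevice`. How devices beyond the 16 cached slots are handled is an assumption here: the sketch sets the attribute on every call for them.

```cpp
#include <array>
#include <atomic>

// Assumed cache capacity, matching the "up to 16 devices" in the description.
constexpr int kMaxCachedDevices = 16;

// One flag array per kernel. Templating only on KernelTag means every caller
// for the same kernel shares the same flags.
template <typename KernelTag>
std::array<std::atomic<bool>, kMaxCachedDevices>& configured_flags() {
    static std::array<std::atomic<bool>, kMaxCachedDevices> flags{};
    return flags;
}

// Hypothetical helper: invokes set_attr(device) the first time a given
// (KernelTag, device) pair is seen and skips it afterwards.
// Returns true when set_attr was actually invoked.
template <typename KernelTag, typename Setter>
bool ensure_smem_attribute(int device, Setter&& set_attr) {
    // Assumption: devices outside the cached range are configured on every call.
    if (device < 0 || device >= kMaxCachedDevices) {
        set_attr(device);
        return true;
    }
    auto& flags = configured_flags<KernelTag>();
    if (flags[device].load(std::memory_order_acquire)) {
        return false;  // already set for this kernel on this device
    }
    // Setting the attribute is idempotent, so a rare duplicate call when two
    // threads race on the same slot is harmless.
    set_attr(device);
    flags[device].store(true, std::memory_order_release);
    return true;
}

// Hypothetical tags standing in for two distinct MMA kernel instantiations.
struct KernelA {};
struct KernelB {};
```

Keying the flags by both kernel and device is the point of the sketch: a single process-wide "already set" flag would skip the call on a second GPU, which is the failure mode described above.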

matthewdouglas added this to the v0.50.0 milestone on May 22, 2026
matthewdouglas added the CUDA label (Issues and PRs related to the CUDA backend, excluding installation/support help) on May 22, 2026
@github-actions

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

matthewdouglas merged commit c59334e into main on May 22, 2026
152 checks passed