4bit GEMM fix: per-device cudaFuncSetAttribute cache by matthewdouglas · Pull Request #1952 · bitsandbytes-foundation/bitsandbytes

matthewdouglas · 2026-05-22T18:02:40Z

`cudaFuncSetAttribute` for the shared-memory (smem) limit needs to be set per device on the MMA kernels, but it is currently set for only one device, on the first call. This PR changes it to be set once per kernel per device. As with the GPU property cache, it is cached for up to 16 devices.
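For readers who want to see the shape of the fix, here is a minimal host-side sketch of a "once per kernel per device" cache. It is not the code from this PR. `ensure_smem_limit`, `KernelATag`, `KernelBTag`, `fake_get_device` and `fake_set_smem_limit` are hypothetical names, and the two `fake_*` functions stand in for `cudaGetDevice` and `cudaFuncSetAttribute(..., cudaFuncAttributeMaxDynamicSharedMemorySize, ...)` so the sketch runs without a GPU. What happens for device indices beyond the 16 cached slots is also an assumption here (the attribute is simply set again on every call).

```cpp
#include <array>
#include <atomic>
#include <cassert>

// Cache size, matching the 16-device limit mentioned in the PR description.
constexpr int kMaxCachedDevices = 16;

// Stand-ins for the CUDA runtime (assumptions, so the sketch runs on any host).
// Real code would call cudaGetDevice(&dev) and
// cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, bytes).
static int g_current_device = 0;
static int g_set_attribute_calls = 0;

int fake_get_device() { return g_current_device; }
void fake_set_smem_limit(const void* /*kernel*/, int /*bytes*/) { ++g_set_attribute_calls; }

// Hypothetical tags: each distinct tag type gets its own static flag array,
// which gives one cache per kernel.
struct KernelATag {};
struct KernelBTag {};

template <typename KernelTag>
void ensure_smem_limit(const void* kernel, int smem_bytes) {
    assert(smem_bytes >= 0);

    // One flag per device index; static storage starts out as all false.
    static std::array<std::atomic<bool>, kMaxCachedDevices> configured{};

    const int dev = fake_get_device();
    if (dev < 0 || dev >= kMaxCachedDevices) {
        // Assumed fallback: outside the cached range, set the attribute every time.
        fake_set_smem_limit(kernel, smem_bytes);
        return;
    }
    if (configured[dev].load(std::memory_order_acquire)) {
        return;  // already set for this kernel on this device
    }
    // Two threads racing here may both set the attribute; that is harmless
    // because the call is idempotent.
    fake_set_smem_limit(kernel, smem_bytes);
    configured[dev].store(true, std::memory_order_release);
}
```

Calling `ensure_smem_limit<KernelATag>(kernel, bytes)` before each launch then costs one atomic load after the first call on a given device. The bug described above corresponds to a single process-wide flag instead of the per-device array.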

github-actions · 2026-05-22T18:07:03Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

4bit GEMM fix: per-device cudaFuncSetAttribute cache

7f7146a

matthewdouglas added this to the v0.50.0 milestone May 22, 2026

matthewdouglas added the CUDA label (issues and PRs related to the CUDA backend, excluding installation/support help) May 22, 2026

matthewdouglas merged commit c59334e into main May 22, 2026
152 checks passed

matthewdouglas merged 1 commit into main from gemm4bit-fix-setattribute