Skip to content

Update CUDA/ROCm setup tests#1899

Open
sstamenk wants to merge 3 commits intobitsandbytes-foundation:mainfrom
sstamenk:update_setup_tests
Open

Update CUDA/ROCm setup tests#1899
sstamenk wants to merge 3 commits intobitsandbytes-foundation:mainfrom
sstamenk:update_setup_tests

Conversation

@sstamenk
Copy link
Contributor

@sstamenk sstamenk commented Mar 13, 2026

Updated backend-specific setup handling and tests for CUDA/ROCm library selection. The change makes get_cuda_bnb_library_path() fail fast when the wrong override variable is used (BNB_ROCM_VERSION on CUDA or BNB_CUDA_VERSION on ROCm), improves the warning/error guidance, and refreshes the evaluator tests to use BNB_BACKEND with clearer CUDA/ROCm coverage.
Removed the test_get_rocm_bnb_library_path_rocm_override_takes_priority unit-test as it doesn't provide much utility.

This PR is a subset of the changes originally proposed in #1888

@matthewdouglas matthewdouglas added CUDA Setup ROCm CUDA Issues and PRs related to the CUDA backend, excluding installation/support help. labels Mar 17, 2026
@matthewdouglas matthewdouglas added this to the v0.50.0 milestone Mar 17, 2026
@github-actions
Copy link

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Member

@matthewdouglas matthewdouglas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Update CUDA/ROCm setup tests

(Note: this review was created with the assistance of an LLM.)

Thanks for the PR! The fail-fast behavior for wrong override variables and the HIP_ENVIRONMENTBNB_BACKEND migration are good improvements. A couple of issues to address before merging:

1. Missing path assertion in test_get_rocm_bnb_library_path_override

The diff removes the functional assertion that validates the override actually produces the correct library path:

assert get_cuda_bnb_library_path(rocm70_spec).stem == "libbitsandbytes_rocm72"

The test now only checks assert "BNB_ROCM_VERSION" in caplog.text, which verifies the warning was emitted but not that the correct library was selected. This looks like an accidental omission — please restore it.

2. test_get_cuda_bnb_library_path_override is fragile

This test sets BNB_CUDA_VERSION but doesn't clear BNB_ROCM_VERSION. With the new logic, rocm_override_value is checked first — so if BNB_ROCM_VERSION happens to be set in the environment, the test will raise RuntimeError instead of exercising the CUDA override path. Please add:

monkeypatch.delenv("BNB_ROCM_VERSION", raising=False)

(The base test test_get_cuda_bnb_library_path already does this correctly.)

Minor notes

  • The dead re.sub(r"rocm\d+", ...) call in the CUDA+BNB_ROCM_VERSION error path is harmless (regex doesn't match libbitsandbytes_cuda*) but could be avoided by moving the substitution after the torch.version.cuda check.
  • Consider keeping an adapted version of test_get_rocm_bnb_library_path_rocm_override_takes_priority — the "both overrides set" edge case is still reachable and worth covering, especially now that the branching logic has changed. Not blocking though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CUDA Setup CUDA Issues and PRs related to the CUDA backend, excluding installation/support help. ROCm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants