Update CUDA/ROCm setup tests#1899
Update CUDA/ROCm setup tests#1899sstamenk wants to merge 3 commits intobitsandbytes-foundation:mainfrom
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Review: Update CUDA/ROCm setup tests
(Note: this review was created with the assistance of an LLM.)
Thanks for the PR! The fail-fast behavior for wrong override variables and the HIP_ENVIRONMENT → BNB_BACKEND migration are good improvements. A couple of issues to address before merging:
1. Missing path assertion in test_get_rocm_bnb_library_path_override
The diff removes the functional assertion that validates the override actually produces the correct library path:
assert get_cuda_bnb_library_path(rocm70_spec).stem == "libbitsandbytes_rocm72"The test now only checks assert "BNB_ROCM_VERSION" in caplog.text, which verifies the warning was emitted but not that the correct library was selected. This looks like an accidental omission — please restore it.
2. test_get_cuda_bnb_library_path_override is fragile
This test sets BNB_CUDA_VERSION but doesn't clear BNB_ROCM_VERSION. With the new logic, rocm_override_value is checked first — so if BNB_ROCM_VERSION happens to be set in the environment, the test will raise RuntimeError instead of exercising the CUDA override path. Please add:
monkeypatch.delenv("BNB_ROCM_VERSION", raising=False)(The base test test_get_cuda_bnb_library_path already does this correctly.)
Minor notes
- The dead
re.sub(r"rocm\d+", ...)call in the CUDA+BNB_ROCM_VERSIONerror path is harmless (regex doesn't matchlibbitsandbytes_cuda*) but could be avoided by moving the substitution after thetorch.version.cudacheck. - Consider keeping an adapted version of
test_get_rocm_bnb_library_path_rocm_override_takes_priority— the "both overrides set" edge case is still reachable and worth covering, especially now that the branching logic has changed. Not blocking though.
Updated backend-specific setup handling and tests for CUDA/ROCm library selection. The change makes
get_cuda_bnb_library_path()fail fast when the wrong override variable is used (BNB_ROCM_VERSIONon CUDA orBNB_CUDA_VERSIONon ROCm), improves the warning/error guidance, and refreshes the evaluator tests to useBNB_BACKENDwith clearer CUDA/ROCm coverage.Removed the
test_get_rocm_bnb_library_path_rocm_override_takes_priorityunit-test as it doesn't provide much utility.This PR is a subset of the changes originally proposed in #1888