
test(kv_cache): force TORCH/CUPY availability off in tier-order test #560

Open
FileSystemGuy wants to merge 1 commit into main from fix/group-G-kvcache-tier-order

@FileSystemGuy

Contributor

Summary

`test_tier_order_includes_fake_gpu`'s baseline assertion ("no `gpu` tier on a freshly constructed cache") was fragile: `MultiTierCache.__init__` adds the `gpu` backend whenever the module-level `TORCH_AVAILABLE` or `CUPY_AVAILABLE` flag is true (`cache.py:243`), regardless of `gpu_memory_gb`. On a dev box or CI runner with torch in the venv (the normal case for this repo), the cache starts with a real GPU backend that has a 0-byte limit, and the test's `assert 'gpu' not in tier_order_before` fails.

The test's intent is to exercise the fake-GPU-injection path: "injecting a backend after construction promotes the tier order to `['gpu', 'cpu', 'nvme']`". Whether the baseline cache happens to have a GPU backend at all is incidental, and depends on the runner's installed Python deps.
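A minimal sketch of the behavior the test is meant to pin down, assuming the tier order is derived fastest-first from whichever backends are currently registered. `ToyCache` and `FakeGPUBackend` are hypothetical stand-ins, not the real `MultiTierCache` API.

```python
class FakeGPUBackend:
    """Hypothetical stand-in for the fake GPU backend the test injects."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes


class ToyCache:
    """Hypothetical, simplified cache: tier order follows registered backends."""

    TIER_PRIORITY = ("gpu", "cpu", "nvme")  # fastest first

    def __init__(self):
        # Baseline with no GPU library available: only CPU and NVMe tiers.
        self.backends = {"cpu": object(), "nvme": object()}

    def tier_order(self):
        # Keep the fixed priority, but only for tiers that have a backend.
        return [t for t in self.TIER_PRIORITY if t in self.backends]


cache = ToyCache()
before = cache.tier_order()                               # ['cpu', 'nvme']
cache.backends["gpu"] = FakeGPUBackend(limit_bytes=1 << 30)  # inject after construction
after = cache.tier_order()                                # ['gpu', 'cpu', 'nvme']
print(before, after)
```

If the constructor had already registered a `gpu` backend, `before` would already include `gpu` and the baseline assertion would fail even though the injection path itself still works.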

Pin the test to the no-GPU-library precondition by monkeypatching `TORCH_AVAILABLE` and `CUPY_AVAILABLE` to False at the `kv_cache.cache` module before constructing the cache. The post-construction state then matches the test's stated baseline regardless of whether torch is installed in the test environment.
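The sketch below illustrates why patching the flags on the cache module works. It is a hedged illustration, not the actual test: `fake_kv_cache_cache` is a hypothetical in-memory module standing in for `kv_cache/cache.py`, its `MultiTierCache` is heavily simplified, and it uses the standard library's `unittest.mock.patch.object` so it runs without pytest. In the real test, pytest's `monkeypatch.setattr(cache_mod, "TORCH_AVAILABLE", False)` does the same job.

```python
import types
from unittest import mock

# Hypothetical, heavily simplified stand-in for kv_cache/cache.py, built
# in memory so this sketch runs without the real package installed.
CACHE_SOURCE = '''
TORCH_AVAILABLE = True   # as if torch were importable in the venv
CUPY_AVAILABLE = False

class MultiTierCache:
    def __init__(self, gpu_memory_gb=0.0):
        self.tiers = []
        # The flags are module globals, looked up when __init__ runs;
        # gpu_memory_gb does not gate whether the GPU tier is added.
        if TORCH_AVAILABLE or CUPY_AVAILABLE:
            self.tiers.append("gpu")
        self.tiers += ["cpu", "nvme"]
'''
cache_mod = types.ModuleType("fake_kv_cache_cache")
exec(CACHE_SOURCE, cache_mod.__dict__)


def baseline_tiers_without_gpu_libs():
    # Patch the flags on the module that MultiTierCache reads them from,
    # and construct the cache while both patches are active.
    with mock.patch.object(cache_mod, "TORCH_AVAILABLE", False), \
         mock.patch.object(cache_mod, "CUPY_AVAILABLE", False):
        return cache_mod.MultiTierCache(gpu_memory_gb=0).tiers


baseline = baseline_tiers_without_gpu_libs()
unpatched = cache_mod.MultiTierCache(gpu_memory_gb=0).tiers  # patches restored
print(baseline)   # ['cpu', 'nvme']
print(unpatched)  # ['gpu', 'cpu', 'nvme']
```

The patch target matters: `__init__` resolves the flags as globals of its own module at call time, so patching that module's attributes takes effect. Patching a copy bound into the test module with `from kv_cache.cache import TORCH_AVAILABLE` would leave the constructor's view unchanged.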

Open question (out of scope for this PR)

Should `gpu_memory_gb=0` skip the GPU backend in production? Today every allocation on such a cache performs a redundant `_ensure_space_in_tier('gpu')` check that always returns False before falling through to CPU. That looks like a wart, but removing it is a production behavior change that deserves its own discussion and PR; this PR only restores the test to passing.
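To make the "always returns False" point concrete, here is a hypothetical, simplified allocation walk. `ensure_space_in_tier` and `choose_tier` are illustrative names, and the real `_ensure_space_in_tier` may also attempt eviction, which this sketch ignores. With a 0-byte GPU limit, any non-empty request fails the GPU check and lands on CPU.

```python
def ensure_space_in_tier(limit_bytes, used_bytes, request_bytes):
    """Hypothetical simplification: does the request fit under the tier's limit?"""
    return used_bytes + request_bytes <= limit_bytes


def choose_tier(tiers, request_bytes):
    """Walk tiers fastest-first; return the first tier with room plus the checks made."""
    checks = []
    for name, limit_bytes, used_bytes in tiers:
        ok = ensure_space_in_tier(limit_bytes, used_bytes, request_bytes)
        checks.append((name, ok))
        if ok:
            return name, checks
    return None, checks


# GPU tier present but with a 0-byte limit; CPU 4 GiB, NVMe 64 GiB (illustrative sizes).
tiers = [("gpu", 0, 0), ("cpu", 4 << 30, 0), ("nvme", 64 << 30, 0)]
tier, checks = choose_tier(tiers, request_bytes=2 << 20)  # a 2 MiB allocation
print(tier, checks)  # cpu [('gpu', False), ('cpu', True)]
```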

Test plan

  • `uv run python -m pytest kv_cache_benchmark/tests/test_kv_cache.py::TestThreeTierEvictionCascade::test_tier_order_includes_fake_gpu -v` → 1 passed (was failing)
  • `uv run python -m pytest kv_cache_benchmark/tests/test_kv_cache.py::TestThreeTierEvictionCascade -v` → 3 passed (no collateral damage)


Result: 239 passed + 1 failed → 240 passed in
``kv_cache_benchmark/tests/test_kv_cache.py``.

@FileSystemGuy requested a review from a team on June 26, 2026 23:26
@github-actions


MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
