THIS ONE LAST: test+ci: fast-lane perf fixes, slow markers, run all 4 suites in CI by FileSystemGuy · Pull Request #562 · mlcommons/storage

FileSystemGuy · 2026-06-27T00:13:40Z

Summary

Speeds up the default test run by ~104s and closes a CI coverage gap that left three sibling test trees unexecuted.

- test perf (commit 1): three tests in the fast lane were silently spending ~30s on unintended waits. The root cause is fixed, so they remain in the fast lane.
  - test_collects_multiple_errors was hitting a real SSH connect timeout (~20s) because the test passes hosts=['node1','node2'] and validate_benchmark_environment runs SSH probes unless skip_remote_checks=True.
  - Two mocked-MPI tests (test_bcast_precedes_barrier_in_executed_heredoc_with_mocked_mpi4py, test_rank0_emits_markers_and_non_rank0_silent[0]) were execing the probe script in-process; rank 0 hits time.sleep(5.0). Patched time.sleep for the duration of the exec.
- slow markers (commit 2): mark three genuinely slow tests @pytest.mark.slow. Also declares the slow marker in kv_cache_benchmark/pyproject.toml (the root suite already has one).
  - test_init_then_closed_datagen_no_env_var (~17.5s): full in-process CLI dispatcher.
  - test_gpu_overflow_to_cpu (~32s): 100×10K-token allocations.
  - test_profile_allocate_vs_access_overhead (~5.8s): profiling test.
- CI coverage (commit 3): the previous workflow ran uv run pytest, which only picked up testpaths=['tests'] from the root pyproject. Three other test trees (mlpstorage_py/tests, vdb_benchmark/tests, kv_cache_benchmark/tests) never executed in CI, which is exactly the kind of gap PRs #551–#560 had to fix by hand. Each suite now runs in its own step (collecting tests/ and vdb_benchmark/tests/ together hits ImportPathMismatchError because both define a top-level tests package).
- version bump (commit 4): 3.0.23 → 3.0.25 and uv.lock regen. PR #550 takes 3.0.24, so this PR claims the next slot.

Local results (all 11 in-flight PRs + this PR's changes)

| Suite | Before | After |
| --- | --- | --- |
| `tests/` | 86s, 2380 passed, 12 deselected | 39s, 2379 passed, 13 deselected |
| `mlpstorage_py/tests/` | 3.98s, 778 passed | unchanged |
| `vdb_benchmark/tests/` | 0.86s, 136 passed | unchanged |
| `kv_cache_benchmark/tests/` | 155s, 240 passed | 98s, 238 passed, 2 deselected |

Total: 3531 passed, 15 deselected (slow), 1 xfailed, 0 failed in ~142s.
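The totals can be cross-checked against the per-suite rows. The short sketch below only re-adds the numbers from the table; it assumes the two "unchanged" suites keep their "Before" timings and have no deselected tests, which the table does not state explicitly.

```python
# Per-suite figures copied from the table above.
# Assumption: "unchanged" means the After column equals the Before column,
# with zero deselected tests for those two suites.
results = {
    "tests/": {"before_s": 86, "after_s": 39, "passed": 2379, "deselected": 13},
    "mlpstorage_py/tests/": {"before_s": 3.98, "after_s": 3.98, "passed": 778, "deselected": 0},
    "vdb_benchmark/tests/": {"before_s": 0.86, "after_s": 0.86, "passed": 136, "deselected": 0},
    "kv_cache_benchmark/tests/": {"before_s": 155, "after_s": 98, "passed": 238, "deselected": 2},
}

total_passed = sum(r["passed"] for r in results.values())
total_deselected = sum(r["deselected"] for r in results.values())
total_after_s = sum(r["after_s"] for r in results.values())
saved_s = sum(r["before_s"] - r["after_s"] for r in results.values())

print(total_passed, total_deselected, round(total_after_s), round(saved_s))
```

The sums agree with the summary: 3531 passed, 15 deselected, about 142s total, and about 104s saved on the default run.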

Test plan

- CI runs all four suites green on this PR.
- Confirm pytest -m slow on the root suite picks up the 13 deselected tests (12 pre-existing + the newly-marked integration test).
- Confirm pytest -m slow on kv_cache_benchmark/tests picks up the 2 newly-marked tests.
- After PR #550 merges, rebase and resolve the trivial pyproject.toml version conflict (3.0.24 → 3.0.25).

Notes

- The four 5.3s test_shared_fs_probe_real_mpi.py::TestSharedFsProbeRealMpi::* tests stay in the fast lane intentionally: MPI is a sensitive area for bugs and these exercise real mpirun.
- Stacked behind PRs #550–#560. The version bump will need a one-line rebase after #550 lands.

- test_collects_multiple_errors: pass skip_remote_checks=True. Both dependency checks are mocked to raise, but the test left hosts=['node1','node2'] in args, which triggered a real SSH probe to nonexistent hosts and ate ~20s of TCP connect timeouts before the assertion ran.
- test_bcast_precedes_barrier_in_executed_heredoc_with_mocked_mpi4py: patch time.sleep around the in-process exec of SHARED_FS_PROBE_SCRIPT. The probe's rank-0 D-49 quiesce path calls time.sleep(5.0); the unit test only locks call ordering, not timing.
- test_rank0_emits_markers_and_non_rank0_silent[0]: same root cause; rank 0 hits the 5s quiesce. monkeypatch time.sleep for the test.

No behavioral changes to production code.
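The time.sleep patching described in this commit can be illustrated with a minimal standard-library sketch. PROBE_SCRIPT and run_probe_without_sleeping are hypothetical stand-ins, not the repository's SHARED_FS_PROBE_SCRIPT or its test helpers; the point is only that a script exec'd in-process sees the patched time.sleep, so ordering can be asserted without paying for the wait.

```python
import time
from unittest import mock

# Hypothetical stand-in for the real probe script (not the repository's code).
PROBE_SCRIPT = """
import time
events.append("bcast")
time.sleep(5.0)  # quiesce wait; irrelevant to an ordering-only unit test
events.append("barrier")
"""


def run_probe_without_sleeping(script):
    """Exec the script in-process with time.sleep replaced by a mock."""
    events = []
    # Patching the attribute on the time module affects the exec'd code too,
    # because its `import time` returns the same module object.
    with mock.patch("time.sleep") as fake_sleep:
        exec(script, {"events": events})
    return events, fake_sleep


start = time.monotonic()
events, fake_sleep = run_probe_without_sleeping(PROBE_SCRIPT)
elapsed = time.monotonic() - start
```

The mock still records the call, so a test can assert that the 5-second wait was requested while finishing almost immediately.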

- tests/integration: mark test_init_then_closed_datagen_no_env_var slow (~17.5s; full in-process CLI dispatcher exercising init + datagen).
- kv_cache_benchmark/pyproject.toml: declare 'slow' marker and default to '-m not slow' (parity with the root suite). Without this, the next two slow marks would emit PytestUnknownMarkWarning and still run by default.
- kv_cache_benchmark/tests: mark test_gpu_overflow_to_cpu slow (~32s; 100 x 10K-token allocations) and test_profile_allocate_vs_access_overhead slow (~5.8s; profiling).

Net effect on default test run: root suite drops from 86s to 39s, kv_cache suite drops from 155s to 98s.
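For readers unfamiliar with the marker mechanism, a minimal sketch of a slow-marked test follows. The test name and body are hypothetical and much smaller than the real allocation test; only the decorator usage mirrors what this commit does.

```python
import pytest


@pytest.mark.slow
def test_many_large_allocations():
    # Hypothetical stand-in for an expensive test; the real tests named in
    # this commit are far heavier than this loop.
    blocks = [bytearray(1024) for _ in range(100)]
    assert len(blocks) == 100
```

With the slow marker declared and '-m not slow' as the suite default, a plain pytest run deselects this test and pytest -m slow selects it.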

The previous workflow ran 'uv run pytest', which picked up only the root pyproject's testpaths=['tests']. The three sibling suites (mlpstorage_py/tests, vdb_benchmark/tests, kv_cache_benchmark/tests) were never executed in CI, so regressions in those areas could land without CI catching them. That is exactly the gap that PRs #551-#560 had to fix by hand.

Each suite is invoked in its own step:
- tests/ and vdb_benchmark/tests/ can't be collected in one pytest process (both define a top-level 'tests' package, so their conftest.py modules collide and pytest raises ImportPathMismatchError).
- Each suite's pyproject defines its own '-m not slow' default, so subprocess-level invocation is the correct boundary.

Also installs vdb_benchmark and kv_cache_benchmark editable so their imports resolve.
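The one-process-per-suite boundary could be sketched locally as below. This is not the CI workflow itself, which is not shown here; the 'uv run pytest <path>' command form, running from the repository root, and the helper names are assumptions made for illustration.

```python
import subprocess

# The four suites named in this PR.
SUITES = [
    "tests",
    "mlpstorage_py/tests",
    "vdb_benchmark/tests",
    "kv_cache_benchmark/tests",
]


def suite_commands(suites=SUITES):
    """Build one pytest command per suite (assumed command form)."""
    return [["uv", "run", "pytest", path] for path in suites]


def run_all(commands, runner=subprocess.run):
    """Run each command in its own process and collect return codes."""
    results = []
    for cmd in commands:
        completed = runner(cmd)
        results.append((cmd, completed.returncode))
    return results
```

Calling run_all(suite_commands()) starts a separate pytest process per suite, so the two top-level tests packages are never imported into the same interpreter.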

PR #550 bumps 3.0.23 -> 3.0.24. This PR lands on top, so bump to 3.0.25 directly. uv.lock regenerated to reflect the new project version (no dependency changes).

github-actions · 2026-06-27T00:13:50Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

FileSystemGuy added 4 commits June 26, 2026 17:12

chore: bump version 3.0.23 -> 3.0.25; regenerate uv.lock

1e26fee

FileSystemGuy requested a review from a team June 27, 2026 00:13

FileSystemGuy changed the title ~~test+ci: fast-lane perf fixes, slow markers, run all 4 suites in CI~~ THIS ONE LAST: test+ci: fast-lane perf fixes, slow markers, run all 4 suites in CI Jun 27, 2026

FileSystemGuy wants to merge 4 commits into main from test/perf-and-ci-all-suites
