
THIS ONE LAST: test+ci: fast-lane perf fixes, slow markers, run all 4 suites in CI #562

Open · FileSystemGuy wants to merge 4 commits into main from test/perf-and-ci-all-suites

@FileSystemGuy (Contributor)

Summary

Speeds up the default test run by ~104s and closes a CI coverage gap that left three sibling test trees unexecuted.

  • test perf (commit 1): three tests in the fast lane were silently eating ~30s of unintended waits. Fixed the root cause — they remain in the fast lane.
    • test_collects_multiple_errors was hitting a real SSH connect timeout (~20s) because the test passes hosts=['node1','node2'] and validate_benchmark_environment runs SSH probes unless skip_remote_checks=True.
    • Two mocked-MPI tests (test_bcast_precedes_barrier_in_executed_heredoc_with_mocked_mpi4py, test_rank0_emits_markers_and_non_rank0_silent[0]) were execing the probe script in-process; rank 0 hits time.sleep(5.0). Patched time.sleep for the duration of the exec.
  • slow markers (commit 2): marks three genuinely slow tests with @pytest.mark.slow. Also declares the slow marker in kv_cache_benchmark/pyproject.toml (the root suite already has one).
    • test_init_then_closed_datagen_no_env_var — ~17.5s full in-process CLI dispatcher.
    • test_gpu_overflow_to_cpu — ~32s, 100×10K-token allocations.
    • test_profile_allocate_vs_access_overhead — ~5.8s profiling test.
  • CI coverage (commit 3): the previous workflow ran uv run pytest, which only picked up testpaths=['tests'] from the root pyproject. Three other test trees (mlpstorage_py/tests, vdb_benchmark/tests, kv_cache_benchmark/tests) never executed in CI. That is exactly the kind of gap PRs #551 (test(vdb): declare 'test' optional-deps extra (psutil, h5py, pyyaml, mpi4py)) through #560 (test(kv_cache): force TORCH/CUPY availability off in tier-order test) had to fix by hand. Each suite now runs in its own step, because collecting tests/ and vdb_benchmark/tests/ together hits ImportPathMismatchError (both define a top-level tests package).
  • version bump (commit 4): 3.0.23 → 3.0.25 plus a regenerated uv.lock. PR #550 (THIS ONE FIRST: chore: bump dlio-benchmark pin to DLIO #38 merged + #37; version 3.0.23 -> 3.0.24) takes 3.0.24, so this PR claims the next slot.
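The test_collects_multiple_errors fix comes down to keeping a validator from doing real network I/O when the test only cares about mocked dependency checks. A minimal sketch of the pattern, assuming a hypothetical validate_environment stand-in: the real validate_benchmark_environment signature isn't shown in this PR, so only the skip_remote_checks flag name and the node1/node2 hosts are taken from it, and the SSH probe is approximated as a TCP connect to port 22.

```python
import socket
from unittest import mock


def validate_environment(hosts, checks=(), skip_remote_checks=False):
    """Hypothetical stand-in for validate_benchmark_environment.

    Runs every local dependency check and collects all failures instead of
    stopping at the first one; then, unless skipped, probes each host's SSH port.
    """
    errors = []
    for check in checks:
        try:
            check()
        except Exception as exc:  # collect, don't abort
            errors.append(str(exc))
    if not skip_remote_checks:
        for host in hosts:
            try:
                # Real TCP connect: against unreachable hosts this can block
                # for the full timeout before failing.
                with socket.create_connection((host, 22), timeout=20):
                    pass
            except OSError as exc:
                errors.append(f"{host}: {exc}")
    return errors


# Hypothetical dependency checks, both mocked to fail.
def _failing_check_a():
    raise RuntimeError("dependency check A failed")


def _failing_check_b():
    raise RuntimeError("dependency check B failed")


# Guard: if any remote probe slipped through, the mock would record it.
with mock.patch("socket.create_connection") as fake_connect:
    errors = validate_environment(
        ["node1", "node2"],
        checks=[_failing_check_a, _failing_check_b],
        skip_remote_checks=True,
    )
fake_connect.assert_not_called()
print(errors)  # ['dependency check A failed', 'dependency check B failed']
```

With skip_remote_checks=True the test still exercises error collection but never touches the network, so its runtime no longer depends on TCP timeouts.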

Local results (with all 11 in-flight PRs plus this PR's changes applied)

| Suite | Before | After |
| --- | --- | --- |
| tests/ | 86s, 2380 passed, 12 deselected | 39s, 2379 passed, 13 deselected |
| mlpstorage_py/tests/ | 3.98s, 778 passed | unchanged |
| vdb_benchmark/tests/ | 0.86s, 136 passed | unchanged |
| kv_cache_benchmark/tests/ | 155s, 240 passed | 98s, 238 passed, 2 deselected |

Total: 3531 passed, 15 deselected (slow), 1 xfailed, 0 failed in ~142s.
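The headline figures can be re-derived from the table. A quick tally in Python, with the numbers copied from the table and 0 deselected assumed wherever the table lists none:

```python
# Per-suite results from the table: (seconds, passed, deselected).
before = {
    "tests/": (86.0, 2380, 12),
    "mlpstorage_py/tests/": (3.98, 778, 0),
    "vdb_benchmark/tests/": (0.86, 136, 0),
    "kv_cache_benchmark/tests/": (155.0, 240, 0),
}
after = {
    "tests/": (39.0, 2379, 13),
    "mlpstorage_py/tests/": (3.98, 778, 0),  # unchanged
    "vdb_benchmark/tests/": (0.86, 136, 0),  # unchanged
    "kv_cache_benchmark/tests/": (98.0, 238, 2),
}

saved = sum(before[s][0] - after[s][0] for s in before)  # 47s + 57s
total_time = sum(t for t, _, _ in after.values())
passed = sum(p for _, p, _ in after.values())
deselected = sum(d for _, _, d in after.values())

print(f"saved ~{saved:.0f}s, {passed} passed, {deselected} deselected, ~{total_time:.0f}s total")
# → saved ~104s, 3531 passed, 15 deselected, ~142s total
```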

Test plan

Notes

- test_collects_multiple_errors: pass skip_remote_checks=True. Both
  dependency checks are mocked to raise, but the test left
  hosts=['node1','node2'] in args, which triggered a real SSH probe
  to nonexistent hosts and ate ~20s of TCP connect timeouts before
  the assertion ran.
- test_bcast_precedes_barrier_in_executed_heredoc_with_mocked_mpi4py:
  patch time.sleep around the in-process exec of SHARED_FS_PROBE_SCRIPT.
  The probe's rank-0 D-49 quiesce path calls time.sleep(5.0); the unit
  test only locks call ordering, not timing.
- test_rank0_emits_markers_and_non_rank0_silent[0]: same root cause;
  rank 0 hits the 5s quiesce. Use monkeypatch to stub time.sleep for the test.
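Both mocked-MPI fixes rely on the same trick: stub time.sleep only while the probe runs in-process. A minimal sketch, assuming a hypothetical PROBE_SCRIPT standing in for SHARED_FS_PROBE_SCRIPT (whose body isn't reproduced in this PR) and reducing the mocked mpi4py calls to recorded "bcast"/"barrier" events:

```python
import time
from unittest import mock

# Hypothetical stand-in for SHARED_FS_PROBE_SCRIPT: the real script drives
# mocked mpi4py; here the collective calls are just recorded in order, and
# rank 0 takes a 5s quiesce sleep between them.
PROBE_SCRIPT = """
import time
events.append("bcast")
if rank == 0:
    time.sleep(5.0)
events.append("barrier")
"""


def run_probe_in_process(rank):
    """Exec the probe in-process with time.sleep stubbed out for the duration."""
    events = []
    # mock.patch("time.sleep") swaps the attribute on the time module itself,
    # so the exec'd script's own `import time; time.sleep(...)` hits the mock.
    with mock.patch("time.sleep") as fake_sleep:
        exec(PROBE_SCRIPT, {"events": events, "rank": rank})
    return events, fake_sleep


start = time.monotonic()
events, fake_sleep = run_probe_in_process(rank=0)
elapsed = time.monotonic() - start

assert events == ["bcast", "barrier"]   # call ordering is what the test locks
fake_sleep.assert_called_once_with(5.0)  # the quiesce was requested, not waited out
print(events, f"{elapsed:.3f}s")
```

Inside a pytest test, `monkeypatch.setattr(time, "sleep", lambda _seconds: None)` has the same effect and is undone automatically at teardown; mock.patch is handier when the test also wants to assert on the sleep call itself.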

No behavioral changes to production code.

- tests/integration: mark test_init_then_closed_datagen_no_env_var slow
  (~17.5s; full in-process CLI dispatcher exercising init + datagen).
- kv_cache_benchmark/pyproject.toml: declare 'slow' marker and default
  to '-m not slow' (parity with the root suite). Without this, the
  next two slow marks would emit PytestUnknownMarkWarning and still
  run by default.
- kv_cache_benchmark/tests: mark test_gpu_overflow_to_cpu slow
  (~32s; 100 x 10K-token allocations) and
  test_profile_allocate_vs_access_overhead slow (~5.8s; profiling).

Net effect on the default test run: root suite drops from 86s to 39s,
kv_cache suite drops from 155s to 98s.

The previous workflow ran 'uv run pytest', which picked up only the
root pyproject's testpaths=['tests']. The three sibling suites
(mlpstorage_py/tests, vdb_benchmark/tests, kv_cache_benchmark/tests)
were never executed in CI, so regressions in those areas could land
without CI catching them. That is exactly the gap that PRs #551-#560
had to fix by hand.

Each suite is invoked in its own step:
- tests/ and vdb_benchmark/tests/ can't be collected in one pytest
  process: both define a top-level 'tests' package, and their
  conftest.py modules collide, raising pytest's ImportPathMismatchError.
- Each suite's pyproject defines its own '-m not slow' default, so
  subprocess-level invocation is the correct boundary.
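The one-process-per-suite boundary can be sketched in Python. This is an illustration, not the workflow itself: the suite paths come from this PR, while invoking each one as `python -m pytest <path>` from the repo root is an assumption (pytest resolves its config file starting from the given path, so a suite directory whose pyproject.toml has a [tool.pytest.ini_options] table should pick up that file's settings).

```python
import subprocess
import sys

# The four suite roots named in this PR.
SUITES = [
    "tests",
    "mlpstorage_py/tests",
    "vdb_benchmark/tests",
    "kv_cache_benchmark/tests",
]


def pytest_argv(suite):
    """Command line for one suite (assumed invocation; CI may wrap it in uv run)."""
    return [sys.executable, "-m", "pytest", suite]


def run_suites(suites=SUITES, runner=subprocess.run):
    """Run each suite in its own interpreter and return {suite: exit code}.

    Separate processes mean tests/ and vdb_benchmark/tests/ never share
    sys.modules, so their two top-level `tests` packages can't collide.
    """
    return {suite: runner(pytest_argv(suite)).returncode for suite in suites}
```

A local driver would then fail if any value in `run_suites()` is nonzero. Unlike separate CI steps, which by default skip the remaining steps after the first failure, this loop runs every suite and reports all exit codes.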

Also installs vdb_benchmark and kv_cache_benchmark in editable mode so
their imports resolve.

PR #550 bumps 3.0.23 -> 3.0.24. This PR lands on top, so it bumps
directly to 3.0.25. uv.lock is regenerated to reflect the new project
version (no dependency changes).
@FileSystemGuy requested a review from a team on June 27, 2026 00:13
@github-actions

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@FileSystemGuy changed the title from "test+ci: fast-lane perf fixes, slow markers, run all 4 suites in CI" to "THIS ONE LAST: test+ci: fast-lane perf fixes, slow markers, run all 4 suites in CI" on Jun 27, 2026