THIS ONE LAST: test+ci: fast-lane perf fixes, slow markers, run all 4 suites in CI#562
Open
FileSystemGuy wants to merge 4 commits into
Open
THIS ONE LAST: test+ci: fast-lane perf fixes, slow markers, run all 4 suites in CI#562FileSystemGuy wants to merge 4 commits into
FileSystemGuy wants to merge 4 commits into
Conversation
- test_collects_multiple_errors: pass skip_remote_checks=True. Both dependency checks are mocked to raise, but the test left hosts=['node1','node2'] in args, which triggered a real SSH probe to nonexistent hosts and ate ~20s of TCP connect timeouts before the assertion ran. - test_bcast_precedes_barrier_in_executed_heredoc_with_mocked_mpi4py: patch time.sleep around the in-process exec of SHARED_FS_PROBE_SCRIPT. The probe's rank-0 D-49 quiesce path calls time.sleep(5.0); the unit test only locks call ordering, not timing. - test_rank0_emits_markers_and_non_rank0_silent[0]: same root cause — rank 0 hits the 5s quiesce. monkeypatch time.sleep for the test. No behavioral changes to production code.
- tests/integration: mark test_init_then_closed_datagen_no_env_var slow (~17.5s; full in-process CLI dispatcher exercising init + datagen). - kv_cache_benchmark/pyproject.toml: declare 'slow' marker and default to '-m not slow' (parity with the root suite). Without this, the next two slow marks would emit PytestUnknownMarkWarning and still run by default. - kv_cache_benchmark/tests: mark test_gpu_overflow_to_cpu slow (~32s; 100 x 10K-token allocations) and test_profile_allocate_vs_access_overhead slow (~5.8s; profiling). Net effect on default test run: root suite drops from 86s to 39s, kv_cache suite drops from 155s to 98s.
The previous workflow ran 'uv run pytest', which picked up only the root pyproject's testpaths=['tests']. The three sibling suites (mlpstorage_py/tests, vdb_benchmark/tests, kv_cache_benchmark/tests) were never executed in CI, so regressions in those areas could land without CI catching them — exactly the gap that PRs #551-#560 had to fix by hand. Each suite is invoked in its own step: - tests/ and vdb_benchmark/tests/ can't be collected in one pytest process (both define a top-level 'tests' package whose conftest.py modules collide via pytest's ImportPathMismatchError). - Each suite's pyproject defines its own '-m not slow' default, so subprocess-level invocation is the correct boundary. Also installs vdb_benchmark and kv_cache_benchmark editable so their imports resolve.
PR #550 bumps 3.0.23 -> 3.0.24. This PR lands on top, so bump to 3.0.25 directly. uv.lock regenerated to reflect the new project version (no dependency changes).
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Speeds up the default test run by ~104s and closes a CI coverage gap that left three sibling test trees unexecuted.
test_collects_multiple_errorswas hitting a real SSH connect timeout (~20s) because the test passeshosts=['node1','node2']andvalidate_benchmark_environmentruns SSH probes unlessskip_remote_checks=True.test_bcast_precedes_barrier_in_executed_heredoc_with_mocked_mpi4py,test_rank0_emits_markers_and_non_rank0_silent[0]) were execing the probe script in-process; rank 0 hitstime.sleep(5.0). Patchedtime.sleepfor the duration of the exec.@pytest.mark.slow. Also declares theslowmarker inkv_cache_benchmark/pyproject.toml(the root suite already has one).test_init_then_closed_datagen_no_env_var— ~17.5s full in-process CLI dispatcher.test_gpu_overflow_to_cpu— ~32s, 100×10K-token allocations.test_profile_allocate_vs_access_overhead— ~5.8s profiling test.uv run pytest, which only picked uptestpaths=['tests']from the root pyproject. Three other test trees (mlpstorage_py/tests,vdb_benchmark/tests,kv_cache_benchmark/tests) never executed in CI — exactly the kind of gap PRs test(vdb): declare 'test' optional-deps extra (psutil, h5py, pyyaml, mpi4py) #551–test(kv_cache): force TORCH/CUPY availability off in tier-order test #560 had to fix by hand. Each suite now runs in its own step (collectingtests/andvdb_benchmark/tests/together hitsImportPathMismatchErrorbecause both define a top-leveltestspackage).Local results (all 11 in-flight PRs + this PR's changes)
tests/mlpstorage_py/tests/vdb_benchmark/tests/kv_cache_benchmark/tests/Total: 3531 passed, 15 deselected (slow), 1 xfailed, 0 failed in ~142s.
Test plan
pytest -m slowon the root suite picks up the 13 deselected tests (12 pre-existing + the newly-marked integration test).pytest -m slowonkv_cache_benchmark/testspicks up the 2 newly-marked tests.pyproject.tomlversion conflict (3.0.24 → 3.0.25).Notes
test_shared_fs_probe_real_mpi.py::TestSharedFsProbeRealMpi::*tests stay in the fast lane intentionally — MPI is a sensitive area for bugs and these exercise realmpirun.