[TRTLLM-13613][test] Trim duplicated and dead multimodal accuracy tests from pre-merge CI #15615
Wanli-Jiang wants to merge 1 commit into
…pre-merge CI Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
3d6851a to c815cd7
/bot run --disable-fail-fast
No actionable comments were generated in the recent review.
Walkthrough: The PR removes VideoMME-related multimodal accuracy coverage, narrows several parametrized test cases to single configurations, and adjusts integration test lists for DGX B200 and L40S.
Estimated code review effort: 2 (Simple), ~10 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
PR_Github #55721 [ run ] triggered by Bot. Commit:
    skip_pre_blackwell,
    skip_pre_hopper,
)
from .accuracy_core import MMMU, LlmapiAccuracyTestHarness, VideoMME, VoxPopuli
Can we keep VideoMME? We have zero video coverage otherwise.
PR_Github #55721 [ run ] completed with state
Multimodal Accuracy Tests — Change Summary
Trim duplicated / dead / over-budget multimodal accuracy tests from pre-merge CI.
Runtimes = OpenSearch L0 CI, 45-day avg. No input modality (image/video/audio)
coverage is lost — every change keeps a cheaper or equivalent guard (logit-match
unittest, EPD test, post-merge, or QA).
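The claim that every trimmed case keeps a guard can be checked mechanically. The following is only a minimal sketch: the `TRIMMED_GUARDS` mapping and the `unguarded` helper are hypothetical bookkeeping, not part of this PR, and the two entries pair each test with the note that follows it in the summary.

```python
# Hypothetical bookkeeping, not part of this PR: map each trimmed test ID
# to the guard that still covers it, as read from the change summary.
TRIMMED_GUARDS = {
    "TestNemotron_Nano_12B_V2_VL::test_auto_dtype[full_budget]":
        "forced_chunked_prefill sibling covers the model",
    "TestVILA1_5_3B::test_auto_dtype":
        "modeling_vila (38 s) stays pre-merge; QA retains MMMU",
}


def unguarded(trimmed_guards):
    """Return the sorted test IDs whose guard entry is missing or blank."""
    return sorted(
        test_id
        for test_id, guard in trimmed_guards.items()
        if not (guard or "").strip()
    )


# An empty list means every trimmed case names a remaining guard.
print(unguarded(TRIMMED_GUARDS))
```

A check like this only confirms that a guard is named for each case; whether the named guard really exercises the same modality still needs a reviewer's judgment.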
TestNanoV3Omni::test_auto_dtype[fp8]
TestNanoV3Omni::test_auto_dtype[nvfp4]
TestKimiK25::test_nvfp4[dep8]
TestNemotron_Nano_12B_V2_VL::test_auto_dtype[full_budget]: forced_chunked_prefill sibling covers the model
TestQwen3VL::test_auto_dtype[full_budget]
TestMistralSmall24B::test_auto_dtype[full_budget]
TestKimiK25::test_nvfp4[tp8] / [tp8_attn_dp] / [ep8]: test_llm_api_pytorch.py runs these
TestQwen2_5_VL_7B::test_auto_dtype
TestVILA1_5_3B::test_auto_dtype: modeling_vila (38 s) stays pre-merge; QA retains MMMU
examples/test_multimodal.py::...[video-neva]+[kosmos-2]
Files touched
tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py: A1, B1–B4
tests/integration/test_lists/test-db/l0_dgx_b200.yml: A2
tests/integration/test_lists/test-db/l0_l40s.yml: C1–C3
Per-PR pre-merge time recovered: ~980 s (L40S) + ~1934 s (8× DGX_B200) + ~510 s (DGX_H100 + single-GPU DGX_B200 VideoMME slices).
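For a single headline number, the three quoted figures can be summed. This is a small sketch using only the approximate values stated above; the dictionary keys are labels chosen here for illustration. The total is summed machine time across pools, which is not necessarily wall-clock time if the stages run concurrently.

```python
# Approximate per-PR pre-merge seconds recovered, as quoted in the summary.
recovered_s = {
    "L40S": 980,
    "8x DGX_B200": 1934,
    "DGX_H100 + single-GPU DGX_B200 VideoMME slices": 510,
}

total_s = sum(recovered_s.values())
minutes, seconds = divmod(total_s, 60)
print(f"~{total_s} s total (~{minutes} min {seconds} s)")
# prints "~3424 s total (~57 min 4 s)"
```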
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.