[TRTLLM-13613][test] Trim duplicated and dead multimodal accuracy tests from pre-merge CI #15615

Open
Wanli-Jiang wants to merge 1 commit into NVIDIA:main from Wanli-Jiang:user/williamj/trim-mm-tests

Conversation

@Wanli-Jiang

@Wanli-Jiang Wanli-Jiang commented Jun 25, 2026

Collaborator

Multimodal Accuracy Tests — Change Summary

Trim duplicated, dead, and over-budget multimodal accuracy tests from pre-merge CI.
Runtimes are 45-day averages from OpenSearch L0 CI data. No input modality
(image/video/audio) coverage is lost: every change keeps a cheaper or equivalent
guard (logit-match unittest, EPD test, post-merge, or QA).

| ID | Test (Class::variant) | Change | Stage before → after | Runtime/PR | Why |
|----|-----------------------|--------|----------------------|------------|-----|
| A1 | TestNanoV3Omni::test_auto_dtype[fp8] | trim VideoMME sub-task | pre-merge (H100), kept | 779 s → ~505 s | VideoMME duplicated by EPD test for same model+quant; keep MMMU+VoxPopuli |
| A1 | TestNanoV3Omni::test_auto_dtype[nvfp4] | trim VideoMME sub-task | pre-merge (B200), kept | 379 s → ~140 s | same; video stays covered by EPD |
| A2 | TestKimiK25::test_nvfp4[dep8] | demote | pre-merge → post-merge (8× B200) | 1934 s | most expensive test (~32 min) every PR; low per-PR value; stays in post-merge + QA |
| B1 | TestNemotron_Nano_12B_V2_VL::test_auto_dtype[full_budget] | delete param | listed nowhere → removed | 0 s | dead variant; forced_chunked_prefill sibling covers the model |
| B2 | TestQwen3VL::test_auto_dtype[full_budget] | delete param | listed nowhere → removed | 0 s | dead variant; sibling covers the model |
| B3 | TestMistralSmall24B::test_auto_dtype[full_budget] | delete param | listed nowhere → removed | 0 s | dead variant; sibling covers the model |
| B4 | TestKimiK25::test_nvfp4[tp8] / [tp8_attn_dp] / [ep8] | delete params (mm file) | listed nowhere → removed | 0 s | dead in mm file; text-path namesake in test_llm_api_pytorch.py runs these |
| C1 | TestQwen2_5_VL_7B::test_auto_dtype | remove from pre-merge | pre-merge (L40S) → QA only | 689 s | "MMMU sanity" overlaps cheaper logit-match unittest (109 s) on same stage; QA retains MMMU |
| C2 | TestVILA1_5_3B::test_auto_dtype | remove from pre-merge | pre-merge (L40S) → QA only | 291 s | same; logit-match modeling_vila (38 s) stays pre-merge; QA retains MMMU |
| C3 | examples/test_multimodal.py::...[video-neva] + [kosmos-2] | remove from pre-merge | pre-merge (L40S) → removed | 0 s | legacy TRT workflow; 0 runs in 120 days; PyTorch serving e2e tests retained |
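
The "listed nowhere" rows (B1 to B4) amount to a set difference: variants the test file defines, minus variants that any test list references. A minimal sketch of that check follows, assuming hypothetical in-memory data rather than the real test file and test-db lists. The `find_dead_variants` helper and the sample list contents are illustrative only and are not part of the repository.

```python
def find_dead_variants(defined_ids, listed_ids):
    """Return parametrized test IDs that no test list references, sorted."""
    return sorted(set(defined_ids) - set(listed_ids))


# Hypothetical sample: IDs a test file defines through parametrization.
defined = [
    "TestNemotron_Nano_12B_V2_VL::test_auto_dtype[full_budget]",
    "TestNemotron_Nano_12B_V2_VL::test_auto_dtype[forced_chunked_prefill]",
    "TestKimiK25::test_nvfp4[dep8]",
    "TestKimiK25::test_nvfp4[tp8]",
]

# Hypothetical sample: IDs gathered from every test list (pre-merge, post-merge, QA).
listed = [
    "TestNemotron_Nano_12B_V2_VL::test_auto_dtype[forced_chunked_prefill]",
    "TestKimiK25::test_nvfp4[dep8]",
]

dead = find_dead_variants(defined, listed)
for test_id in dead:
    print(f"dead variant: {test_id}")
```

A variant that shows up in `dead` never runs in any stage, so deleting it costs 0 s of coverage and 0 s of runtime, which matches the Runtime/PR column for those rows.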

Files touched

  • tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py — A1, B1–B4
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml — A2
  • tests/integration/test_lists/test-db/l0_l40s.yml — C1–C3

Per-PR pre-merge time recovered: ~980 s (L40S) + ~1934 s (8× DGX_B200) + ~510 s
(DGX_H100 + single-GPU DGX_B200 VideoMME slices).
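
As a cross-check, the recovered-time figures can be re-derived from the Runtime/PR column of the table. This is only a sketch of the arithmetic: the variable names are made up for illustration, and the "after" runtimes for the trimmed tests are the approximate values quoted in the table.

```python
# Runtimes in seconds, copied from the summary table (45-day averages).

# A1: tests that stay in pre-merge but drop VideoMME, as (before, approx. after).
trimmed = {
    "TestNanoV3Omni::test_auto_dtype[fp8]": (779, 505),
    "TestNanoV3Omni::test_auto_dtype[nvfp4]": (379, 140),
}

# C1, C2: tests removed from the L40S pre-merge stage (C3 contributes 0 s).
removed_l40s = {
    "TestQwen2_5_VL_7B::test_auto_dtype": 689,
    "TestVILA1_5_3B::test_auto_dtype": 291,
}

# A2: test demoted from pre-merge to post-merge on 8x B200.
demoted_b200 = {"TestKimiK25::test_nvfp4[dep8]": 1934}

videomme_saved = sum(before - after for before, after in trimmed.values())
l40s_saved = sum(removed_l40s.values())
b200_saved = sum(demoted_b200.values())

print(f"L40S: {l40s_saved} s")
print(f"8x B200: {b200_saved} s")
print(f"VideoMME slices: {videomme_saved} s")
```

The VideoMME slices sum to 513 s, which the description rounds to ~510 s; the L40S and 8× B200 figures match exactly.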

Summary by CodeRabbit

  • Tests
    • Updated multimodal and accuracy test coverage to focus on the currently supported evaluation scenarios.
    • Simplified several benchmark runs to use a single, standard configuration, reducing redundant test variants.
    • Adjusted test scheduling so multimodal checks run in the appropriate pipeline stage.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…pre-merge CI

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
@Wanli-Jiang Wanli-Jiang force-pushed the user/williamj/trim-mm-tests branch from 3d6851a to c815cd7 Compare June 25, 2026 06:32
@Wanli-Jiang Wanli-Jiang changed the title [None][test] Trim duplicated and dead multimodal accuracy tests from pre-merge CI [TRTLLM-13613][test] Trim duplicated and dead multimodal accuracy tests from pre-merge CI Jun 25, 2026
@Wanli-Jiang Wanli-Jiang marked this pull request as ready for review June 25, 2026 06:35
@Wanli-Jiang Wanli-Jiang requested review from a team as code owners June 25, 2026 06:35
@Wanli-Jiang

Collaborator Author

/bot run --disable-fail-fast

@Wanli-Jiang Wanli-Jiang requested a review from 2ez4bz June 25, 2026 06:37
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4d9385aa-1de5-4ae2-a812-3260166417df

📥 Commits

Reviewing files that changed from the base of the PR and between 70a7528 and c815cd7.

📒 Files selected for processing (3)
  • tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml
  • tests/integration/test_lists/test-db/l0_l40s.yml
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/test-db/l0_l40s.yml

📝 Walkthrough

The PR removes VideoMME-related multimodal accuracy coverage, narrows several parametrized test cases to single configurations, and adjusts integration test lists for DGX B200 and L40S.

Changes

Multimodal test coverage updates

| Layer / File(s) | Summary |
|-----------------|---------|
| VideoMME cleanup: tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py | The module stops importing VideoMME, deletes its task-spec setup, and removes VideoMME from NanoV3Omni task-spec tuples. |
| Multimodal parameter narrowing: tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py | Chunked-prefill, KimiK25 NVFP4, and MistralSmall24B parametrizations are reduced to single configurations. |
| Integration test list updates: tests/integration/test_lists/test-db/l0_dgx_b200.yml, tests/integration/test_lists/test-db/l0_l40s.yml | The DGX B200 and L40S lists change multimodal entries for KimiK25 and remove two L40S test cases. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • lancelly
  • YihuiLu512
  • sunnyqgg
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title is concise, ticketed, and accurately summarizes the main change: trimming multimodal accuracy tests from pre-merge CI. |
| Description check | ✅ Passed | The description is detailed and on-topic, covering the change summary, affected files, and checklist, though explicit Test Coverage text is missing. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment @coderabbitai help to get the list of available commands.

@tensorrt-cicd

Collaborator

PR_Github #55721 [ run ] triggered by Bot. Commit: c815cd7 Link to invocation

skip_pre_blackwell,
skip_pre_hopper,
)
from .accuracy_core import MMMU, LlmapiAccuracyTestHarness, VideoMME, VoxPopuli

Collaborator

Can we keep VideoMME? We have zero video coverage otherwise.

@tensorrt-cicd

Collaborator

PR_Github #55721 [ run ] completed with state SUCCESS. Commit: c815cd7
/LLM/main/L0_MergeRequest_PR pipeline #44624 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation
