[TRTLLM-13613][test] Trim duplicated and dead multimodal accuracy tests from pre-merge CI by Wanli-Jiang · Pull Request #15615 · NVIDIA/TensorRT-LLM

Wanli-Jiang · 2026-06-25T06:21:56Z

Multimodal Accuracy Tests — Change Summary

Trim duplicated, dead, and over-budget multimodal accuracy tests from pre-merge CI.
Runtimes are 45-day averages from OpenSearch L0 CI data. No input-modality coverage
(image/video/audio) is lost: every change keeps a cheaper or equivalent guard
(logit-match unittest, EPD test, post-merge, or QA).

| ID | Test (Class::variant) | Change | Stage before → after | Runtime/PR | Why |
|---|---|---|---|---|---|
| A1 | `TestNanoV3Omni::test_auto_dtype[fp8]` | trim VideoMME sub-task | pre-merge (H100), kept | 779 s → ~505 s | VideoMME duplicated by EPD test for same model+quant; keep MMMU+VoxPopuli |
| A1 | `TestNanoV3Omni::test_auto_dtype[nvfp4]` | trim VideoMME sub-task | pre-merge (B200), kept | 379 s → ~140 s | same; video stays covered by EPD |
| A2 | `TestKimiK25::test_nvfp4[dep8]` | demote | pre-merge → post-merge (8× B200) | 1934 s | most expensive test (~32 min) every PR; low per-PR value; stays in post-merge + QA |
| B1 | `TestNemotron_Nano_12B_V2_VL::test_auto_dtype[full_budget]` | delete param | listed nowhere → removed | 0 s | dead variant; `forced_chunked_prefill` sibling covers the model |
| B2 | `TestQwen3VL::test_auto_dtype[full_budget]` | delete param | listed nowhere → removed | 0 s | dead variant; sibling covers the model |
| B3 | `TestMistralSmall24B::test_auto_dtype[full_budget]` | delete param | listed nowhere → removed | 0 s | dead variant; sibling covers the model |
| B4 | `TestKimiK25::test_nvfp4[tp8] / [tp8_attn_dp] / [ep8]` | delete params (mm file) | listed nowhere → removed | 0 s | dead in mm file; text-path namesake in `test_llm_api_pytorch.py` runs these |
| C1 | `TestQwen2_5_VL_7B::test_auto_dtype` | remove from pre-merge | pre-merge (L40S) → QA only | 689 s | "MMMU sanity" overlaps cheaper logit-match unittest (109 s) on same stage; QA retains MMMU |
| C2 | `TestVILA1_5_3B::test_auto_dtype` | remove from pre-merge | pre-merge (L40S) → QA only | 291 s | same; logit-match `modeling_vila` (38 s) stays pre-merge; QA retains MMMU |
| C3 | `examples/test_multimodal.py::...[video-neva]` + `[kosmos-2]` | remove from pre-merge | pre-merge (L40S) → removed | 0 s | legacy TRT workflow; 0 runs in 120 days; PyTorch serving e2e tests retained |
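
The "listed nowhere" rows (B1–B4) amount to a set difference between the variants a test module defines and the variants that any test list references. A minimal sketch of that check follows. The function name and the in-memory inputs are hypothetical; the repository's actual collection and test-db tooling are not shown here.

```python
def find_dead_variants(defined_ids, listed_ids):
    """Return test IDs that are defined but appear in no test list, sorted."""
    return sorted(set(defined_ids) - set(listed_ids))


# Hypothetical inputs for illustration only. In practice the defined IDs
# would come from pytest collection and the listed IDs from the YAML files
# under tests/integration/test_lists/.
defined = [
    "TestNemotron_Nano_12B_V2_VL::test_auto_dtype[full_budget]",
    "TestNemotron_Nano_12B_V2_VL::test_auto_dtype[forced_chunked_prefill]",
]
listed = [
    "TestNemotron_Nano_12B_V2_VL::test_auto_dtype[forced_chunked_prefill]",
]

dead = find_dead_variants(defined, listed)
print(dead)
```

A variant reported by such a check never runs in any stage, which is why the table lists its runtime impact as 0 s.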

Files touched

tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py — A1, B1–B4
tests/integration/test_lists/test-db/l0_dgx_b200.yml — A2
tests/integration/test_lists/test-db/l0_l40s.yml — C1–C3

Per-PR pre-merge time recovered: ~980 s (L40S) + ~1934 s (8× DGX_B200) + ~510 s
(DGX_H100 + single-GPU DGX_B200 VideoMME slices).
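
The recovered-time figures can be reproduced from the runtimes in the change-summary table. The sketch below only redoes that arithmetic; the "after" values for A1 are the approximate figures quoted in the table, so the VideoMME saving is approximate as well, and the combined sum is machine time across stages, which may differ from wall-clock savings if the stages run in parallel.

```python
# Runtimes in seconds, copied from the change-summary table.
l40s_removed = {"C1": 689, "C2": 291}
b200_8gpu_demoted = {"A2": 1934}
# (before, after) for the two A1 variants; "after" values are approximate.
videomme_trimmed = {"A1[fp8]": (779, 505), "A1[nvfp4]": (379, 140)}

l40s_saved = sum(l40s_removed.values())
b200_saved = sum(b200_8gpu_demoted.values())
videomme_saved = sum(before - after for before, after in videomme_trimmed.values())
total_saved = l40s_saved + b200_saved + videomme_saved

print(f"L40S: {l40s_saved} s")
print(f"8x DGX_B200: {b200_saved} s")
print(f"VideoMME slices: {videomme_saved} s")
print(f"Sum across stages: {total_saved} s (~{total_saved / 60:.0f} min)")
```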

Summary by CodeRabbit

Tests
- Updated multimodal and accuracy test coverage to focus on the currently supported evaluation scenarios.
- Simplified several benchmark runs to use a single, standard configuration, reducing redundant test variants.
- Adjusted test scheduling so multimodal checks run in the appropriate pipeline stage.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Wanli-Jiang · 2026-06-25T06:36:10Z

/bot run --disable-fail-fast

coderabbitai · 2026-06-25T06:39:57Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4d9385aa-1de5-4ae2-a812-3260166417df

📥 Commits

Reviewing files that changed from the base of the PR and between 70a7528 and c815cd7.

📒 Files selected for processing (3)

tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py
tests/integration/test_lists/test-db/l0_dgx_b200.yml
tests/integration/test_lists/test-db/l0_l40s.yml

💤 Files with no reviewable changes (1)

tests/integration/test_lists/test-db/l0_l40s.yml

📝 Walkthrough

The PR removes VideoMME-related multimodal accuracy coverage, narrows several parametrized test cases to single configurations, and adjusts integration test lists for DGX B200 and L40S.

Changes

Multimodal test coverage updates

| Layer / File(s) | Summary |
|---|---|
| VideoMME cleanup `tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py` | The module stops importing VideoMME, deletes its task-spec setup, and removes VideoMME from NanoV3Omni task-spec tuples. |
| Multimodal parameter narrowing `tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py` | Chunked-prefill, KimiK25 NVFP4, and MistralSmall24B parametrizations are reduced to single configurations. |
| Integration test list updates `tests/integration/test_lists/test-db/l0_dgx_b200.yml`, `tests/integration/test_lists/test-db/l0_l40s.yml` | The DGX B200 and L40S lists change multimodal entries for KimiK25 and remove two L40S test cases. |
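
To make the task-spec change concrete, the sketch below models a task-spec tuple before and after VideoMME is dropped and reports which modality that one test no longer exercises. `TaskSpec` and the tuple names are stand-ins invented for illustration, not the real harness types in `accuracy_core`; per the PR description, video remains covered elsewhere by the EPD test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Stand-in for an accuracy task entry (hypothetical, not the real type)."""
    name: str
    modality: str


# Before: image, video, and audio tasks evaluated by the same test.
TASK_SPECS_BEFORE = (
    TaskSpec("MMMU", "image"),
    TaskSpec("VideoMME", "video"),
    TaskSpec("VoxPopuli", "audio"),
)

# After: VideoMME is dropped; MMMU and VoxPopuli are kept.
TASK_SPECS_AFTER = tuple(s for s in TASK_SPECS_BEFORE if s.name != "VideoMME")


def covered_modalities(specs):
    """Return the set of input modalities exercised by the given task specs."""
    return {s.modality for s in specs}


lost_in_this_test = covered_modalities(TASK_SPECS_BEFORE) - covered_modalities(TASK_SPECS_AFTER)
print(sorted(lost_in_this_test))
```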

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

lancelly
YihuiLu512
sunnyqgg

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
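
For readers unfamiliar with the metric, docstring coverage is the share of definitions that carry a docstring. The sketch below computes it with Python's standard `ast` module over functions and classes. How CodeRabbit actually counts definitions is not stated in this thread, so this illustrates the idea rather than reproducing the bot's check; the sample source is invented.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the percentage of functions and classes in `source` with a docstring."""
    tree = ast.parse(source)
    nodes = [
        n for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not nodes:
        return 100.0  # nothing to document
    documented = sum(1 for n in nodes if ast.get_docstring(n) is not None)
    return 100.0 * documented / len(nodes)


# Invented sample: one documented function out of four.
SAMPLE = '''
def a():
    """Documented."""

def b():
    pass

def c():
    pass

def d():
    pass
'''

THRESHOLD = 80.0  # threshold quoted in the check's explanation
coverage = docstring_coverage(SAMPLE)
print(f"{coverage:.2f}%", "PASS" if coverage >= THRESHOLD else "WARN")
```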

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title is concise, ticketed, and accurately summarizes the main change: trimming multimodal accuracy tests from pre-merge CI. |
| Description check | ✅ Passed | The description is detailed and on-topic, covering the change summary, affected files, and checklist, though explicit Test Coverage text is missing. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

tensorrt-cicd · 2026-06-25T06:42:27Z

PR_Github #55721 [ run ] triggered by Bot. Commit: c815cd7 Link to invocation

2ez4bz · 2026-06-25T06:51:40Z

    skip_pre_blackwell,
    skip_pre_hopper,
 )
-from .accuracy_core import MMMU, LlmapiAccuracyTestHarness, VideoMME, VoxPopuli


Can we keep VideoMME? We have zero video coverage otherwise.

tensorrt-cicd · 2026-06-25T11:35:52Z

PR_Github #55721 [ run ] completed with state SUCCESS. Commit: c815cd7
/LLM/main/L0_MergeRequest_PR pipeline #44624 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

github-actions Bot assigned Wanli-Jiang Jun 25, 2026

[None][test] Trim duplicated and dead multimodal accuracy tests from …

c815cd7

…pre-merge CI Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

Wanli-Jiang force-pushed the user/williamj/trim-mm-tests branch from 3d6851a to c815cd7 June 25, 2026 06:32

Wanli-Jiang changed the title ~~[None][test] Trim duplicated and dead multimodal accuracy tests from pre-merge CI~~ [TRTLLM-13613][test] Trim duplicated and dead multimodal accuracy tests from pre-merge CI Jun 25, 2026

Wanli-Jiang marked this pull request as ready for review June 25, 2026 06:35

Wanli-Jiang requested review from a team as code owners June 25, 2026 06:35

Wanli-Jiang requested review from moraxu and yechank-nvidia June 25, 2026 06:35

Wanli-Jiang requested a review from 2ez4bz June 25, 2026 06:37

2ez4bz reviewed Jun 25, 2026

2ez4bz approved these changes Jun 25, 2026

Wanli-Jiang wants to merge 1 commit into NVIDIA:main from Wanli-Jiang:user/williamj/trim-mm-tests
