Use official TRT-LLM image (1.3.0rc15.post1) for DSv4 B300 TRT (non-MTP + MTP) by Oseltamivir · Pull Request #1636 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-06-01T21:27:00Z

Points both B300 DSv4 TRT configs at the official NVIDIA release image and adds the MTP sibling to the sweep:

dsv4-fp4-b300-trt (non-MTP): feat-deepseek_v4-2dd03e6 → nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1
dsv4-fp4-b300-trt-mtp (MTP): feat-deepseek_v4-9aa3715 → nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1

This drops the custom ghcr.io semianalysis feat/deepseek_v4 builds in favor of the official RC, to evaluate whether the official image can serve DeepSeek-V4-Pro (non-MTP and MTP). The non-MTP launcher's TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 workaround (specific to the custom build) is removed so the official image runs with its native behavior, matching the MTP launcher which never had it.

Known risk

A prior run of 1.3.0rc15.post1 with attention-DP (dpa=true) served a couple of iterations and then crashed with CUDA_ERROR_ILLEGAL_ADDRESS in kv_cache_manager.free_resources (run 26786937394) — a different failure from the custom build's SWA-scratch-revert crash. So dpa=true jobs may still fail on the official image; the pure-TP (dpa=false) cases are more likely to pass. MTP on the official RC is untested. This sweep is what tells us where it stands.

Scope

B200 TRT is unchanged (stays on feat-deepseek_v4-9aa3715); its OOM follow-up is tracked separately.

🤖 Generated with Claude Code

Note

Low Risk
Benchmark config and launcher env only; no application auth or data paths, though DPA jobs may still hit known GPU crashes on the official image.

Overview
B300 DeepSeek-V4-Pro TensorRT-LLM benchmarks (dsv4-fp4-b300-trt and dsv4-fp4-b300-trt-mtp) now use the official NVIDIA image nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc15.post1 instead of custom ghcr.io/semianalysisai/trtllm-deepseek-v4 tags.

Both dsv4_fp4_b300_trt.sh and dsv4_fp4_b300_trt_mtp.sh export TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE (default on via :-1) and log it so sweeps can turn SWA scratch reuse off and compare rc15.post1 DPA crashes against the old custom-build SWA-scratch failure. perf-changelog.yaml documents the image switch (and related launcher behavior).

B200 TRT configs are unchanged.

Reviewed by Cursor Bugbot for commit 285b79a.
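The default-on export described in the overview can be sketched as follows. This is a minimal illustration, assuming a hypothetical `launcher_swa_setting` function that stands in for the two launcher lines (export, then log); it is not the real launcher, and it only shows how a sweep could leave the variable unset for native behavior or set it to 0 for a comparison run.

```shell
# Hypothetical stand-in for the export + log lines in the B300 launchers.
# The body runs in a subshell so the export does not leak into the caller.
launcher_swa_setting() (
  # Keep the caller's value if one is set; otherwise default to 1 (reuse on).
  export TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE="${TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE:-1}"
  echo "TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE: $TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE"
)

# Run 1: variable unset, so the launcher default applies.
unset TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE
default_run="$(launcher_swa_setting)"

# Run 2: per-invocation override that turns scratch reuse off.
override_run="$(TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher_swa_setting)"

echo "$default_run"
echo "$override_run"
```

Because the value is logged on every run, the two sweeps can be told apart from their output alone.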

…03e6 Bumps the TensorRT-LLM DeepSeek-V4-Pro image for dsv4-fp4-b200-trt and dsv4-fp4-b300-trt to ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6. The -mtp variants are intentionally left on feat-deepseek_v4-9aa3715. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T21:27:08Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T21:27:48Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26783090679
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26783090679

functionstackx · 2026-06-01T21:47:50Z


 dsv4-fp4-b200-trt:
-  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-9aa3715
+  image: ghcr.io#semianalysisai/trtllm-deepseek-v4:feat-deepseek_v4-2dd03e6


is there any official nvidia RC that works...

Image is from dsv4 branch: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/deepseek_v4

Main dsv4 failing DPA: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26786937394

github-actions · 2026-06-01T22:33:04Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26783097365
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26783097365

github-actions · 2026-06-01T22:33:32Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26786056973
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26786056973

github-actions · 2026-06-01T22:37:48Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26786107993
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26786107993

… (non-MTP) Swap dsv4-fp4-b200-trt and dsv4-fp4-b300-trt from the custom ghcr.io semianalysis feat/deepseek_v4 build to the official nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1 to test whether the official RC can serve DeepSeek-V4-Pro. The -mtp variants are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-02T01:18:59Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26786937394
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26786937394

…on-MTP) The official nvcr.io tensorrt-llm/release:1.3.0rc15.post1 loads DSv4-Pro but its DP-attention path deadlocks/crashes under concurrent load (every dpa=true job hung or failed; only pure-TP conc-1 points passed). Revert to the stable custom build until upstream fixes DSv4 + attention-DP (NVIDIA/TensorRT-LLM#13431). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-02T06:57:56Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26803566770
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26803566770

github-actions · 2026-06-02T08:01:57Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26803566770
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26803566770

Bump dsv4-fp4-b200-trt and dsv4-fp4-b300-trt to ghcr.io#semianalysisai/trtllm-deepseek-v4:fix-dsv4-swa-scratch-revert-shrink-c914d6d (TRT-LLM feat/deepseek_v4 @ 084cf2ba + kv_cache_manager_v2 fix). This resolves the engine crash on attention-DP context/generation reverts at high concurrency (the b300 8k1k conc>=512 "LLM is shutting down" hang). The -mtp variants stay on feat-deepseek_v4-9aa3715. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-2dd03e6

…reuse The c914d6d image's kv_cache_manager_v2 patch was wrong: freeing SWA scratch slots on the attention-DP revert->resize(shrink) path hits finish_event=None (a deferred request never forwarded), crashing every dpa=true job and hanging the engine. Root cause is a V2-scheduler / SWA-scratch-reuse conflict: the V2 scheduler grows a context request's KV cache (incl. SWA scratch) before delay batching can defer it, so revert_allocate_context -> resize(shrink) must release scratch slots that have no finish_event. Revert both non-MTP images to feat-deepseek_v4-2dd03e6 and set TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 in the launchers so no scratch slots are allocated and the revert shrinks cleanly. MTP configs untouched (9aa3715). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-02T19:59:36Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26843313476
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26843313476

github-actions · 2026-06-02T22:17:01Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26843313476
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26843313476

github-actions · 2026-06-03T00:09:48Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26843313476
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26843313476

github-actions · 2026-06-03T19:13:15Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26843313476
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26843313476

…-2dd03e6

B200 reverts to feat-deepseek_v4-9aa3715: the 2dd03e6 image OOMs on B200's smaller HBM at conc-256 once SWA scratch reuse is disabled. Only B300 moves to 2dd03e6 + TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 in its launcher. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-03T21:14:24Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26912996470
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26912996470

…TP + MTP) Point dsv4-fp4-b300-trt and dsv4-fp4-b300-trt-mtp at the official nvcr.io#nvidia/tensorrt-llm/release:1.3.0rc15.post1 (from the custom feat/deepseek_v4 builds 2dd03e6 / 9aa3715) and drop the TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround so the official image runs with native behavior. B200 TRT unchanged (9aa3715). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-03T23:41:29Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26914210927
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26914210927

github-actions · 2026-06-04T06:07:38Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26914210927
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26914210927

…PA crash The previous sweep crashed all dpa=true jobs with CUDA_ERROR_ILLEGAL_ADDRESS on rc15.post1 without the SWA scratch workaround. Re-add TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 to both B300 TRT launchers (non-MTP and MTP) to determine whether the DPA crash is the same SWA-scratch bug or a separate FMHA kernel issue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor · 2026-06-05T04:45:10Z

+    - dsv4-fp4-b300-trt-mtp
+  description:
+    - "Point the B300 TensorRT-LLM DeepSeek-V4-Pro configs (non-MTP dsv4-fp4-b300-trt and MTP dsv4-fp4-b300-trt-mtp) at the official NVIDIA release image nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc15.post1, replacing the custom ghcr.io semianalysis feat/deepseek_v4 builds (2dd03e6 and 9aa3715 respectively), to evaluate the official RC for DeepSeek-V4-Pro. Also drops the TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround (specific to the custom build) so the official image runs with its native behavior. B200 TRT is unchanged (stays on feat-deepseek_v4-9aa3715)."
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636


Duplicate PR changelog entries

Low Severity

This commit adds two separate perf-changelog.yaml blocks for the same PR link and the same dsv4-fp4-b300-trt / dsv4-fp4-b300-trt-mtp config keys, with conflicting descriptions. That duplicates maintenance and leaves readers unsure which entry reflects the shipped change.

Additional Locations (1)

perf-changelog.yaml#L3454-L3460

Reviewed by Cursor Bugbot for commit c2381b7.

github-actions · 2026-06-05T09:51:48Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26999118817
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26999118817

…A crash) TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 did not fix the DPA crash on rc15.post1 — the CUDA_ERROR_ILLEGAL_ADDRESS persists during warmup. This is a separate bug from the SWA scratch revert issue. Default back to 1 (native behavior). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-06-05T19:07:22Z

+- config-keys:
+    - dsv4-fp4-b300-trt
+    - dsv4-fp4-b300-trt-mtp
+  description:
+    - "Point the B300 TensorRT-LLM DeepSeek-V4-Pro configs (non-MTP dsv4-fp4-b300-trt and MTP dsv4-fp4-b300-trt-mtp) at the official NVIDIA release image nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc15.post1, replacing the custom ghcr.io semianalysis feat/deepseek_v4 builds (2dd03e6 and 9aa3715 respectively), to evaluate the official RC for DeepSeek-V4-Pro. Also drops the TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround (specific to the custom build) so the official image runs with its native behavior. B200 TRT is unchanged (stays on feat-deepseek_v4-9aa3715)."
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636


🟡 Duplicate and contradictory perf-changelog entries for PR #1636: two entries cover the same config-keys (dsv4-fp4-b300-trt and dsv4-fp4-b300-trt-mtp) with contradictory descriptions. The first entry (lines 3390-3395) claims the PR "drops the TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround ... so the official image runs with its native behavior", but the same PR actually ADDS that export with default 0 to both launchers (dsv4_fp4_b300_trt.sh L62-65 and dsv4_fp4_b300_trt_mtp.sh L61-64). The second entry (lines 3455-3460) is correct; please remove the first (stale) entry so the changelog has a single, accurate source of truth.

Extended reasoning...

What the bug is

perf-changelog.yaml ends up with two distinct entries for the same PR (#1636) and the same config-keys (dsv4-fp4-b300-trt, dsv4-fp4-b300-trt-mtp), and the two entries say opposite things about whether the TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround is shipped.

Entry A (perf-changelog.yaml:3390-3395): "Point the B300 TensorRT-LLM DeepSeek-V4-Pro configs ... at the official NVIDIA release image ... Also drops the TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround (specific to the custom build) so the official image runs with its native behavior. B200 TRT is unchanged."

Entry B (perf-changelog.yaml:3455-3460): "Switch B300 DSv4 TRT (non-MTP + MTP) to official rc15.post1 image with TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 to test whether DPA crash is same SWA-scratch bug".

Why entry A is wrong

The same PR diff adds the export to both launcher scripts with default 0:

# benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt.sh:62-65
export TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE="${TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE:-0}"
echo "TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE: $TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE"
# benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt_mtp.sh:61-64 (identical)

Because the parameter expansion ${VAR:-0} defaults to 0 when unset, sweeps using the official rc15.post1 image still disable SWA scratch reuse — the workaround is kept, not dropped. The recent commit c2381b7 ("Re-add TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 on rc15.post1 to test DPA crash") confirms this is the intended shipped behavior, matching Entry B and contradicting Entry A.
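The `${VAR:-0}` behavior this argument relies on can be checked in isolation. The sketch below uses a hypothetical helper, `show_default`, that is not part of the launchers; it prints what the expansion yields when the variable is unset, set to the empty string, or set to 1. Note that the `:-` form falls back to the default for an empty value as well as an unset one.

```shell
# show_default MODE [VALUE]: hypothetical helper for illustration only.
# MODE "unset" clears the variable; any other MODE assigns VALUE to it.
# Runs in a subshell so the caller's environment is untouched.
show_default() (
  case "$1" in
    unset) unset TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE ;;
    *)     TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE="$2" ;;
  esac
  # Same expansion as in the quoted launcher lines, with default 0.
  printf '%s\n' "${TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE:-0}"
)

when_unset="$(show_default unset)"   # falls back to the default
when_empty="$(show_default set "")"  # empty also falls back with ':-'
when_one="$(show_default set 1)"     # an explicit value wins

echo "unset=$when_unset empty=$when_empty one=$when_one"
```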

Impact

Doc-only — the runtime behavior is unambiguously the one described in Entry B (the workaround is on). But the changelog is the project's audit trail: any human or tool that consumes it to ask "what did PR #1636 ship for dsv4-fp4-b300-trt?" now gets two contradictory answers for the same config-keys. Future debugging of the rc15.post1 DPA crash will be harder because the changelog suggests the workaround was removed when it actually was retained — exactly the opposite of the diagnostic intent.

Fix

Delete Entry A (perf-changelog.yaml:3389-3395). Entry B already covers the same configs with the accurate description. Independently flagged by Cursor Bugbot in two comments on this PR: Duplicate PR changelog entries (Low) and Changelog contradicts launcher workaround (Medium).

Step-by-step proof

Open perf-changelog.yaml, scroll to line 3389. Entry A lists pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636 with config-keys: [dsv4-fp4-b300-trt, dsv4-fp4-b300-trt-mtp].

Scroll to line 3454. Entry B lists the same pr-link and the same config-keys.

Read Entry A's description — it says the SWA workaround is dropped.

Read Entry B's description — it says the SWA workaround is set to 0 to test DPA crash.

Open benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt.sh at line 62. Observe export TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE="${TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE:-0}" — defaults to 0.

Same in dsv4_fp4_b300_trt_mtp.sh at line 61.

Therefore the shipped behavior matches Entry B; Entry A is stale and should be removed.
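The manual proof above could be approximated with a small check that lists any `pr-link` value appearing more than once in a changelog file. This is a rough sketch, not existing repo tooling: `find_duplicate_pr_links` is a hypothetical helper, the demo file is made up apart from the #1636 link, and a repeated link is only a prompt for human review, since one PR may legitimately carry more than one entry.

```shell
# find_duplicate_pr_links FILE: print each pr-link value that occurs
# more than once in FILE (hypothetical helper, plain text matching only).
find_duplicate_pr_links() {
  grep -E '^[[:space:]]*pr-link:' "$1" \
    | sed -E 's/^[[:space:]]*pr-link:[[:space:]]*//' \
    | sort | uniq -d
}

# Throwaway demo file shaped like the changelog entries quoted in this PR.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
- config-keys:
    - dsv4-fp4-b300-trt
  description:
    - "entry A (illustrative)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636
- config-keys:
    - some-other-config
  description:
    - "unrelated (illustrative)"
  pr-link: https://example.com/pull/1
- config-keys:
    - dsv4-fp4-b300-trt
  description:
    - "entry B (illustrative)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636
EOF

dups="$(find_duplicate_pr_links "$demo")"
echo "$dups"
rm -f "$demo"
```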

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 65bddbb.

cursor · 2026-06-05T19:07:57Z


+# Disable DSv4 SWA scratch reuse to test whether the rc15.post1 DPA crash
+# is the same SWA-scratch bug or a separate FMHA kernel issue.
+export TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE="${TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE:-1}"


SWA default contradicts disable comment

Medium Severity

The new launcher comments say DSv4 SWA scratch reuse is disabled to isolate the rc15.post1 DPA crash, but TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE defaults to 1 when unset, which keeps scratch reuse enabled. Sweep runs without an override therefore do not match the stated experiment.

Additional Locations (1)

benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt_mtp.sh#L60-L63

Reviewed by Cursor Bugbot for commit 65bddbb.
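A mismatch like the one flagged here, where the comment says "Disable" but the fallback is 1, can be spotted mechanically. The sketch below assumes a hypothetical helper, `extract_swa_default`, and a temporary file holding the three lines from the diff above; it reads the fallback out of the `${VAR:-N}` expansion and compares it with what the comment claims. It matches only the exact export form quoted in this review.

```shell
# extract_swa_default FILE: print the fallback N from the launcher's
# export line of the form VAR="${VAR:-N}" (hypothetical helper).
extract_swa_default() {
  sed -nE 's/^export TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE="\$\{TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE:-([^}]*)\}"$/\1/p' "$1"
}

# Temporary file with the three lines quoted in the review comment.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
# Disable DSv4 SWA scratch reuse to test whether the rc15.post1 DPA crash
# is the same SWA-scratch bug or a separate FMHA kernel issue.
export TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE="${TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE:-1}"
EOF

default="$(extract_swa_default "$demo")"

# A "Disable" comment only matches the code if the fallback is 0.
if grep -q '^# Disable DSv4 SWA scratch reuse' "$demo" && [ "$default" != "0" ]; then
  verdict="mismatch: comment says disabled, default is $default"
else
  verdict="ok"
fi
rm -f "$demo"

echo "$verdict"
```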

github-actions · 2026-06-05T19:16:13Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27034741001
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27034741001

Oseltamivir requested a review from a team June 1, 2026 21:27

Oseltamivir requested review from jgangani and kedarpotdar-nv as code owners June 1, 2026 21:27

github-project-automation Bot added this to InferenceMAX Board Jun 1, 2026

Oseltamivir added the full-sweep-enabled label Jun 1, 2026

Backfill PR number in changelog pr-link

6b7558c

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

claude Bot reviewed Jun 1, 2026

Comment thread perf-changelog.yaml Outdated

functionstackx reviewed Jun 1, 2026

functionstackx requested changes Jun 1, 2026

Oseltamivir added sweep-enabled and removed full-sweep-enabled labels Jun 1, 2026

Merge branch 'main' into update-dsv4-trt-image-2dd03e6

bd3c94c

Oseltamivir changed the title ~~Update DSv4 TRT image for B200/B300 (non-MTP) to feat-deepseek_v4-2dd03e6~~ Try official TRT-LLM release image 1.3.0rc15.post1 for DSv4 B200/B300 (non-MTP) Jun 1, 2026

Oseltamivir changed the title ~~Try official TRT-LLM release image 1.3.0rc15.post1 for DSv4 B200/B300 (non-MTP)~~ Update DSv4 TRT image for B200/B300 (non-MTP) to feat-deepseek_v4-2dd03e6 Jun 2, 2026

cursor Bot reviewed Jun 2, 2026

Comment thread .github/configs/nvidia-master.yaml Outdated

Merge branch 'main' into update-dsv4-trt-image-2dd03e6

1b0afeb

Oseltamivir changed the title ~~Update DSv4 TRT image for B200/B300 (non-MTP) to feat-deepseek_v4-2dd03e6~~ Update DSv4 TRT image for B200/B300 (non-MTP) to the SWA-scratch-fix build (c914d6d) Jun 2, 2026

Oseltamivir and others added 2 commits June 2, 2026 12:32

Merge remote-tracking branch 'origin/main' into update-dsv4-trt-image…

242ab88

…-2dd03e6

Oseltamivir force-pushed the update-dsv4-trt-image-2dd03e6 branch from 1f70cac to e23a541 Compare June 2, 2026 19:34

Oseltamivir and others added 2 commits June 3, 2026 14:03

Merge remote-tracking branch 'origin/main' into update-dsv4-trt-image…

6118a76

…-2dd03e6

Oseltamivir changed the title ~~Update DSv4 TRT image for B200/B300 (non-MTP) to the SWA-scratch-fix build (c914d6d)~~ Update DSv4 TRT image for B300 (non-MTP) to 2dd03e6 + disable SWA scratch reuse Jun 3, 2026

Oseltamivir added full-sweep-enabled and removed sweep-enabled labels Jun 3, 2026

Oseltamivir changed the title ~~Update DSv4 TRT image for B300 (non-MTP) to 2dd03e6 + disable SWA scratch reuse~~ Use official TRT-LLM image (1.3.0rc15.post1) for DSv4 B300 TRT (non-MTP + MTP) Jun 3, 2026

cursor Bot reviewed Jun 5, 2026

Merge branch 'main' into update-dsv4-trt-image-2dd03e6

b09619e

cursor Bot reviewed Jun 5, 2026

Comment thread perf-changelog.yaml

Oseltamivir closed this Jun 5, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 5, 2026

Oseltamivir reopened this Jun 5, 2026

claude Bot reviewed Jun 5, 2026

cursor Bot reviewed Jun 5, 2026

Merge branch 'main' into update-dsv4-trt-image-2dd03e6

285b79a
