chore(sweep): re-run MiniMax-M2.5 vLLM sweeps for monitoring by arygupt · Pull Request #1666 · SemiAnalysisAI/InferenceX

arygupt · 2026-06-04T21:11:44Z

Why

The power/energy canvas currently models per-GPU power because its source rows predate the power-capture merge (#1558, merged 2026-05-27). Those MiniMax-M2.5 runs carry throughput / interactivity / latency but no measured power (avg_power_w).

This PR re-runs the exact same configs (no recipe change) on current main, so the new rows land with measured power telemetry. The canvas can then swap its modeled power layer for measured.

What

Adds one perf-changelog.yaml entry arming a full sweep of the five canvas configs:

| config-key | HW | precision |
| --- | --- | --- |
| `minimaxm2.5-fp8-h100-vllm` | H100 | FP8 |
| `minimaxm2.5-fp8-h200-vllm` | H200 | FP8 |
| `minimaxm2.5-fp4-b200-vllm` | B200 | FP4 |
| `minimaxm2.5-fp4-b300-vllm` | B300 | FP4 |
| `minimaxm2.5-fp4-mi355x-vllm` | MI355X | FP4 |

No recipe/code changes — changelog-only. Locally validated with utils/process_changelog.py: generates 107 single-node runs (b200:26, b300:23, mi355x:28, h100:12, h200:18) across 1k1k + 8k1k seq-len groups.
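As a quick sanity check on the run count quoted above, the per-hardware numbers can be summed directly. This is only arithmetic on the figures in the description, not output from utils/process_changelog.py.

```python
# Per-hardware single-node run counts as quoted in the PR description.
runs_per_hw = {"b200": 26, "b300": 23, "mi355x": 28, "h100": 12, "h200": 18}

# The description states 107 runs in total across the five configs.
total_runs = sum(runs_per_hw.values())
print(total_runs)  # 107
```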

Downstream

Once this sweep completes, the rows publish via the weekly DB dump (unblocked by InferenceX-app#418, which fixes the 2 GiB asset cap), and the canvas re-points to the new dump to use measured power.

🤖 Generated with Claude Code

Note

Low Risk
Changelog-only sweep trigger plus small validation/processing flags; no inference recipes or runtime benchmark logic changed beyond skipping eval jobs when flagged.

Overview
Adds a benchmarks-only changelog path so power re-runs can schedule throughput sweeps without lm-eval jobs, and arms a MiniMax-M2.5 re-run across five single-node vLLM configs to backfill measured avg_power_w for the power/energy canvas.

Changelog plumbing: ChangelogEntry gains benchmarks-only (YAML alias), default false, mutually exclusive with existing evals-only. process_changelog.py skips the eval-generation pass when that flag is set; benchmarks still run with --no-evals as today.
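The flag semantics described above (hyphenated YAML alias, default false, mutual exclusion with evals-only, unknown keys rejected) can be sketched with the standard library alone. This is a hypothetical stand-in, not the repo's ChangelogEntry: the class name `ChangelogEntrySketch`, the `from_yaml_dict` helper, and the alias table are assumptions made for illustration, and the real model may be built on a validation library instead.

```python
from dataclasses import dataclass

# Hypothetical alias table: YAML keys (hyphenated) -> Python field names.
# The key names are the ones mentioned in this PR; the mapping itself is assumed.
_ALIASES = {
    "config-keys": "config_keys",
    "description": "description",
    "pr-link": "pr_link",
    "benchmarks-only": "benchmarks_only",
    "evals-only": "evals_only",
}


@dataclass(frozen=True)
class ChangelogEntrySketch:
    config_keys: list
    description: list
    pr_link: str
    benchmarks_only: bool = False  # default false, as described in the PR
    evals_only: bool = False

    def __post_init__(self):
        # The two flags are mutually exclusive.
        if self.benchmarks_only and self.evals_only:
            raise ValueError("benchmarks-only and evals-only are mutually exclusive")

    @classmethod
    def from_yaml_dict(cls, raw: dict) -> "ChangelogEntrySketch":
        # Reject unknown keys so a typo such as "benchmark-only" fails loudly.
        unknown = set(raw) - set(_ALIASES)
        if unknown:
            raise ValueError(f"unknown keys: {sorted(unknown)}")
        return cls(**{_ALIASES[key]: value for key, value in raw.items()})
```

Rejecting unknown keys matters here because a misspelled flag would otherwise be ignored silently and the eval pass would still be scheduled.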

Sweep entry: New perf-changelog.yaml block targets minimaxm2.5-fp8-h100-vllm, minimaxm2.5-fp8-h200-vllm, minimaxm2.5-fp4-b200-vllm, minimaxm2.5-fp4-b300-vllm, and minimaxm2.5-fp4-mi355x-vllm with no recipe changes—only re-execution so rows pick up power telemetry from #1558.

Tests: TestChangelogEntry covers defaults, alias mapping, mutual exclusion, and extra=forbid on typos.

Reviewed by Cursor Bugbot for commit 5fa9848. Bugbot is set up for automated code reviews on this repo.

Re-runs the MiniMax-M2.5 single-node vLLM configs (H100/H200 FP8, B200/B300/MI355X FP4) with no recipe change, so the new rows carry the per-GPU power telemetry (avg_power_w) added in #1558. The power/energy canvas currently models power because its source rows predate the 2026-05-27 capture merge; this re-run lets it use measured power.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-04T21:11:53Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

claude · 2026-06-04T21:17:17Z

+  description:
+    - "Re-run MiniMax-M2.5 single-node vLLM sweeps (H100/H200 FP8, B200/B300/MI355X FP4) with no recipe change, to capture per-GPU power telemetry (avg_power_w) added in #1558 for the power/energy canvas"
+    - "Source rows for the canvas predate the 2026-05-27 power-capture merge, so they carry throughput/latency but no measured power; this re-run replaces the modeled power layer with measured power"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1666


🔴 The new changelog entry's pr-link is set to https://github.com/SemiAnalysisAI/InferenceX/pull/XXX — a literal XXX placeholder rather than the actual PR number. The PR description references pull/1666 and every other entry in this file resolves to a real PR number; please replace XXX with 1666 before merge so the canvas re-run rows remain traceable.

Extended reasoning...

What the bug is

perf-changelog.yaml:3487 (the only line added by this PR's pr-link: field) literally reads:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

The XXX is a leftover template placeholder that was never substituted with the real PR number (1666).

How it manifests

Every other entry in perf-changelog.yaml resolves to a numeric PR — e.g. /pull/1544 at line 3476, and /pull/1648, /pull/1663, /pull/1647 in nearby blocks. This entry is the only one whose link does not resolve. As shipped, anyone clicking the link from a canvas row that originated in this sweep would get a 404, and any tooling that joins changelog rows back to their originating PR (for traceability or audit) will see an unparseable PR id.

Code path that triggers it

This is a pure data/config bug — the row is appended verbatim to perf-changelog.yaml, which is the authoritative changelog for sweep triggers. The placeholder is in the field that downstream tooling (and humans) use to map a sweep back to the PR that armed it. Because the sweep itself is armed by config-keys/description, the bad pr-link will not block execution, so it will silently land on main.

Why existing code doesn't prevent it

There is no schema validator on pr-link requiring a numeric PR id, and utils/process_changelog.py (mentioned in the PR description as the local validator) keys on config-keys, not the link. The author validated processing but not link well-formedness, so the placeholder slipped through.
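A check of the kind the review says is missing could look roughly like the sketch below. The function name `parse_pr_number` and the exact URL pattern are assumptions based on the links quoted in this review; the repo's validation code may be structured differently.

```python
import re

# Assumed URL shape, taken from the pr-link values quoted in this review.
_PR_LINK = re.compile(r"https://github\.com/SemiAnalysisAI/InferenceX/pull/(\d+)")


def parse_pr_number(pr_link: str) -> int:
    """Return the numeric PR id, or raise if the link is not well-formed."""
    match = _PR_LINK.fullmatch(pr_link)
    if match is None:
        # A leftover placeholder such as ".../pull/XXX" ends up here.
        raise ValueError(f"pr-link is not a numeric PR URL: {pr_link!r}")
    return int(match.group(1))
```

Using `fullmatch` rather than `search` means trailing text after the number is also rejected, which keeps the field strictly machine-parseable.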

Impact

Traceability is broken for every row generated by this entry's five canvas re-run configs. A follow-up cleanup PR would be required to replace XXX with 1666, which is exactly the kind of trivial follow-up that wastes a review cycle when it could be caught here.

Fix

Change line 3487 from:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

to:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1666

Step-by-step proof

Read perf-changelog.yaml at lines 3474–3487 on the PR's HEAD (commit c772387).

Line 3476 (prior entry) ends in /pull/1544 — a valid PR id.

Lines 3478–3487 are the new entry added by this PR.

Line 3487 ends in /pull/XXX — a literal three-character placeholder, not a number.

This PR is itself #1666, and every other entry links to the PR that added it, confirming the intended value is 1666.

Conclusion: the placeholder was never substituted before commit, and will be merged as-is unless fixed.

…e-run

Adds `benchmarks-only: true` to a changelog entry to skip the eval pass (symmetric with the existing `evals-only`; the two are mutually exclusive). Power telemetry comes from the benchmark runs, so the MiniMax power re-run doesn't need evals; it sets the flag, dropping 14 unnecessary eval runs.

- validation.py: new `benchmarks_only` field + mutual-exclusion validator
- process_changelog.py: skip eval generation when benchmarks_only is set
- test_validation.py: ChangelogEntry coverage (aliases, exclusivity, forbid)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
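The pass-selection behaviour this commit describes can be sketched as follows. The function `plan_passes` is hypothetical and is not taken from process_changelog.py; it also assumes that `evals-only` skips the benchmark pass, which the commit message implies by calling the two flags symmetric but does not state outright.

```python
def plan_passes(benchmarks_only: bool = False, evals_only: bool = False) -> list:
    """Return the generation passes to run for one changelog entry (illustrative)."""
    if benchmarks_only and evals_only:
        raise ValueError("benchmarks-only and evals-only are mutually exclusive")
    passes = []
    if not evals_only:
        # Benchmark runs carry the power telemetry, so they are kept.
        passes.append("benchmarks")
    if not benchmarks_only:
        # The eval-generation pass is skipped when benchmarks-only is set.
        passes.append("evals")
    return passes
```

For the MiniMax power re-run, `plan_passes(benchmarks_only=True)` yields only the benchmark pass, matching the stated goal of dropping the eval runs.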

github-actions · 2026-06-04T21:35:39Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26979891411
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26979891411

github-actions · 2026-06-04T22:23:01Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26980953103
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26980953103

github-actions · 2026-06-04T22:49:59Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26980953103
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26980953103

arygupt requested a review from a team June 4, 2026 21:11

github-project-automation Bot added this to InferenceMAX Board Jun 4, 2026

chore(sweep): point changelog pr-link at #1666

b271a56

arygupt added the full-sweep-enabled label Jun 4, 2026

claude Bot reviewed Jun 4, 2026

functionstackx changed the title ~~chore(sweep): re-run MiniMax-M2.5 vLLM sweeps to capture power telemetry~~ chore(sweep): re-run MiniMax-M2.5 vLLM sweeps for monitoring Jun 4, 2026

chore(sweep): re-run MiniMax-M2.5 vLLM sweeps for monitoring #1666
arygupt wants to merge 3 commits into main from chore/recapture-minimax-power-canvas
