chore(sweep): re-run MiniMax-M2.5 vLLM sweeps for monitoring #1666
arygupt wants to merge 3 commits into
Re-runs the MiniMax-M2.5 single-node vLLM configs (H100/H200 FP8, B200/B300/MI355X FP4) with no recipe change, so the new rows carry the per-GPU power telemetry (avg_power_w) added in #1558. The power/energy canvas currently uses modeled power because its source rows predate the 2026-05-27 power-capture merge; this re-run lets it use measured power.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
description:
  - "Re-run MiniMax-M2.5 single-node vLLM sweeps (H100/H200 FP8, B200/B300/MI355X FP4) with no recipe change, to capture per-GPU power telemetry (avg_power_w) added in #1558 for the power/energy canvas"
  - "Source rows for the canvas predate the 2026-05-27 power-capture merge, so they carry throughput/latency but no measured power; this re-run replaces the modeled power layer with measured power"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1666
🔴 The new changelog entry's pr-link is set to https://github.com/SemiAnalysisAI/InferenceX/pull/XXX — a literal XXX placeholder rather than the actual PR number. The PR description references pull/1666 and every other entry in this file resolves to a real PR number; please replace XXX with 1666 before merge so the canvas re-run rows remain traceable.
Extended reasoning...
What the bug is
The pr-link: field added by this PR, at perf-changelog.yaml:3487, literally reads:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX
The XXX is a leftover template placeholder that was never substituted with the real PR number (1666).
How it manifests
Every other entry in perf-changelog.yaml resolves to a numeric PR — e.g. /pull/1544 at line 3476, and /pull/1648, /pull/1663, /pull/1647 in nearby blocks. This entry is the only one whose link does not resolve. As shipped, anyone clicking the link from a canvas row that originated in this sweep would get a 404, and any tooling that joins changelog rows back to their originating PR (for traceability or audit) will see an unparseable PR id.
Code path that triggers it
This is a pure data/config bug — the row is appended verbatim to perf-changelog.yaml, which is the authoritative changelog for sweep triggers. The placeholder is in the field that downstream tooling (and humans) use to map a sweep back to the PR that armed it. Because the sweep itself is armed by config-keys/description, the bad pr-link will not block execution, so it will silently land on main.
Why existing code doesn't prevent it
There is no schema validator on pr-link requiring a numeric PR id, and utils/process_changelog.py (mentioned in the PR description as the local validator) keys on config-keys, not the link. The author validated processing but not link well-formedness, so the placeholder slipped through.
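A well-formedness check of the kind described as missing here could be quite small. The sketch below is not part of the repo: `parse_pr_number` and `PR_LINK_RE` are hypothetical names, and the pattern assumes every pr-link points at a numbered PR in SemiAnalysisAI/InferenceX, as the neighbouring entries do.

```python
import re

# Hypothetical pattern: assumes pr-link always targets a numbered PR in
# SemiAnalysisAI/InferenceX, like the other perf-changelog.yaml entries.
PR_LINK_RE = re.compile(r"https://github\.com/SemiAnalysisAI/InferenceX/pull/(\d+)")


def parse_pr_number(pr_link: str) -> int:
    """Return the PR number in a pr-link, raising ValueError if it is malformed."""
    # fullmatch rejects anything after the digits, so ".../pull/XXX" fails.
    match = PR_LINK_RE.fullmatch(pr_link.strip())
    if match is None:
        raise ValueError(f"pr-link is not a numbered PR URL: {pr_link!r}")
    return int(match.group(1))
```

With this, `.../pull/1666` parses to 1666 while the `/pull/XXX` placeholder raises `ValueError`, so wiring a check like it into the changelog schema would have failed this entry before merge.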
Impact
Traceability is broken for the canvas re-run rows generated by this entry across the five configs. A follow-up cleanup PR would be required to replace XXX with 1666, exactly the kind of trivial follow-up that wastes a review cycle when it could be caught here.
Fix
Change line 3487 from:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX
to:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1666
Step-by-step proof
- Read perf-changelog.yaml at lines 3474–3487 on the PR's HEAD (commit c772387).
- Line 3476 (prior entry) ends in /pull/1544, a valid PR id.
- Lines 3478–3487 are the new entry added by this PR.
- Line 3487 ends in /pull/XXX, a literal three-character placeholder, not a number.
- The PR description explicitly states the canvas should point to pull/1666 ("the canvas re-points to the new dump…"), and this PR is itself #1666, confirming the intended value is 1666.
- Conclusion: the placeholder was never substituted before commit, and will be merged as-is unless fixed.
…e-run

Adds `benchmarks-only: true` to a changelog entry to skip the eval pass (symmetric with the existing `evals-only`; the two are mutually exclusive). Power telemetry comes from the benchmark runs, so the MiniMax power re-run doesn't need evals; the entry sets the flag, dropping 14 unnecessary eval runs.

- validation.py: new `benchmarks_only` field + mutual-exclusion validator
- process_changelog.py: skip eval generation when benchmarks_only is set
- test_validation.py: ChangelogEntry coverage (aliases, exclusivity, forbid)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
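The commit lists three behaviours for the new flag: a kebab-case alias, mutual exclusion with `evals-only`, and rejection of unknown keys. The repo's validation.py is not shown here, so the sketch below only mimics those behaviours with a stdlib dataclass; `YAML_KEY_TO_FIELD`, `from_yaml_dict`, and the exact field set are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical YAML-key -> field-name table (kebab-case aliases).
YAML_KEY_TO_FIELD = {
    "config-keys": "config_keys",
    "description": "description",
    "pr-link": "pr_link",
    "evals-only": "evals_only",
    "benchmarks-only": "benchmarks_only",
}


@dataclass(frozen=True)
class ChangelogEntry:
    config_keys: list[str]
    description: list[str]
    pr_link: str
    evals_only: bool = False  # skip the benchmark pass, run evals only
    benchmarks_only: bool = False  # skip the eval pass, run benchmarks only

    def __post_init__(self) -> None:
        # Mutual exclusion: setting both flags would skip both passes.
        if self.evals_only and self.benchmarks_only:
            raise ValueError("evals-only and benchmarks-only are mutually exclusive")

    @classmethod
    def from_yaml_dict(cls, raw: dict[str, Any]) -> "ChangelogEntry":
        # Reject unknown keys (in the spirit of a "forbid extras" rule) so typos fail loudly.
        unknown = sorted(set(raw) - set(YAML_KEY_TO_FIELD))
        if unknown:
            raise ValueError(f"unknown changelog keys: {unknown}")
        return cls(**{YAML_KEY_TO_FIELD[key]: value for key, value in raw.items()})
```

Rejecting unknown keys matters for a flag like this: a typo such as `benchmark-only` would otherwise be ignored silently and the eval pass would still run.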
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26979891411

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26980953103
Why
The power/energy canvas currently models per-GPU power because its source rows predate the power-capture merge (#1558, merged 2026-05-27). Those MiniMax-M2.5 runs carry throughput, interactivity, and latency but no measured power (avg_power_w).

This PR re-runs the exact same configs (no recipe change) on current main, so the new rows land with measured power telemetry. The canvas can then swap its modeled power layer for the measured one.

What
Adds one perf-changelog.yaml entry arming a full sweep of the five canvas configs:
- minimaxm2.5-fp8-h100-vllm
- minimaxm2.5-fp8-h200-vllm
- minimaxm2.5-fp4-b200-vllm
- minimaxm2.5-fp4-b300-vllm
- minimaxm2.5-fp4-mi355x-vllm

No recipe/code changes; changelog-only. Locally validated with utils/process_changelog.py: generates 107 single-node runs (b200: 26, b300: 23, mi355x: 28, h100: 12, h200: 18) across the 1k1k + 8k1k seq-len groups.

Downstream
Once this sweep completes, the rows publish via the weekly DB dump (unblocked by InferenceX-app#418, which fixes the 2 GiB asset cap), and the canvas re-points to the new dump to use measured power.
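The per-platform counts from the local process_changelog.py validation are consistent with the 107-run total. A quick arithmetic check follows; the dict simply restates the quoted figures and is not read from any tool output, and the FP8/FP4 split follows the config names listed above.

```python
# Per-platform single-node run counts quoted from the local
# utils/process_changelog.py validation.
RUNS_PER_PLATFORM = {"b200": 26, "b300": 23, "mi355x": 28, "h100": 12, "h200": 18}

total_runs = sum(RUNS_PER_PLATFORM.values())
# FP8 configs run on H100/H200; the remaining platforms run FP4.
fp8_runs = RUNS_PER_PLATFORM["h100"] + RUNS_PER_PLATFORM["h200"]
fp4_runs = total_runs - fp8_runs

# The per-platform figures should add up to the quoted 107-run total.
assert total_runs == 107, f"expected 107 runs, got {total_runs}"
```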
🤖 Generated with Claude Code
Note
Low Risk
Changelog-only sweep trigger plus small validation/processing flags; no inference recipes or runtime benchmark logic changed beyond skipping eval jobs when flagged.
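The only behavioural change, skipping eval jobs when an entry is flagged, amounts to gating one pass on the flag. The sketch below is a hypothetical simplification, not the actual process_changelog.py logic: `Job` and `plan_jobs` are invented names, and it emits one job per config key, whereas the real planner expands each config across seq-len groups and other sweep parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    config_key: str
    kind: str  # "benchmark" or "eval"


def plan_jobs(config_keys: list[str], *, evals_only: bool = False,
              benchmarks_only: bool = False) -> list[Job]:
    """Hypothetical planner: which job kinds a changelog entry schedules."""
    if evals_only and benchmarks_only:
        raise ValueError("evals-only and benchmarks-only are mutually exclusive")
    jobs: list[Job] = []
    if not evals_only:
        # Benchmark pass: throughput/latency sweeps, which also capture power telemetry.
        jobs += [Job(key, "benchmark") for key in config_keys]
    if not benchmarks_only:
        # Eval pass: the jobs that benchmarks-only drops.
        jobs += [Job(key, "eval") for key in config_keys]
    return jobs
```

For example, `plan_jobs(["minimaxm2.5-fp4-b200-vllm"], benchmarks_only=True)` yields a single benchmark job and no eval job, while an unflagged entry yields both.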
Overview
Adds a benchmarks-only changelog path so power re-runs can schedule throughput sweeps without lm-eval jobs, and arms a MiniMax-M2.5 re-run across five single-node vLLM configs to backfill measured avg_power_w for the power/energy canvas.

Changelog plumbing: ChangelogEntry gains benchmarks-only (YAML alias), default false, mutually exclusive with the existing evals-only. process_changelog.py skips the eval-generation pass when that flag is set; benchmarks still run with --no-evals as today.

Sweep entry: New perf-changelog.yaml block targets minimaxm2.5-fp8-h100-vllm, minimaxm2.5-fp8-h200-vllm, minimaxm2.5-fp4-b200-vllm, minimaxm2.5-fp4-b300-vllm, and minimaxm2.5-fp4-mi355x-vllm with no recipe changes, only re-execution so rows pick up power telemetry from #1558.

Tests: TestChangelogEntry covers defaults, alias mapping, mutual exclusion, and extra=forbid on typos.

Reviewed by Cursor Bugbot for commit 5fa9848. Bugbot is set up for automated code reviews on this repo.