deps: switch sglang to prebuilt PyPI wheels (v0.5.11) by kajalj22 · Pull Request #2535 · NVIDIA-NeMo/RL

kajalj22 · 2026-05-20T17:30:04Z

Summary

Upgrade sglang 0.5.10 → 0.5.11 and sglang-kernel 0.4.1 → 0.4.2, switching from source builds to prebuilt PyPI wheels
Eliminates the slow sglang-kernel CUDA compilation during Docker builds
Possible because sglang v0.5.11 declares torch==2.11.0, which matches our torch pin after the vLLM 0.20.0 upgrade in #2384 (chore: Upgrade vLLM from 0.17.1 to 0.20.0)

Dependency changes (`pyproject.toml`)

Pinned sglang==0.5.11 and sglang-kernel==0.4.2 as regular PyPI deps
Removed VCS git source entries for sglang and sglang-kernel from [tool.uv.sources]
Removed sglang-kernel from extra-build-variables, no-build-isolation-package, and extra-build-dependencies
Removed manual [[tool.uv.dependency-metadata]] sections for sglang and sglang-kernel (uv reads metadata from PyPI directly)
Updated override-dependencies comments (flashinfer now aligned between vllm and sglang at 0.6.8.post1)
Regenerated uv.lock — scikit-build-core dropped, easydict/kernels added, torchao bumped 0.9.0 → 0.17.0

Port fix (`sglang_worker.py`)

sglang v0.5.11 derives grpc_port = port + 10000 and validates that grpc_port <= 65535
Added a retry loop that re-selects the server port whenever the OS assigns one above 55535, so grpc_port stays valid
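
The constraint above can be sketched in a few lines. This is a minimal illustration, not the code in `sglang_worker.py`: the helper names `os_assigned_port` and `pick_server_port`, the `max_attempts` parameter, and the bind-to-port-0 approach are all assumptions made for the example. Only the `port + 10000 <= 65535` rule comes from the description above.

```python
import socket
from typing import Callable

MAX_PORT = 65535
GRPC_PORT_OFFSET = 10000  # from the PR text: grpc_port = port + 10000
MAX_SERVER_PORT = MAX_PORT - GRPC_PORT_OFFSET  # 55535


def os_assigned_port() -> int:
    """Ask the OS for a currently free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def pick_server_port(
    pick: Callable[[], int] = os_assigned_port,
    max_attempts: int = 100,
) -> int:
    """Hypothetical helper: retry until the port leaves room for the gRPC offset."""
    for _ in range(max_attempts):
        port = pick()
        if port <= MAX_SERVER_PORT:
            return port  # port + 10000 is guaranteed to be <= 65535
    raise RuntimeError(
        f"no port <= {MAX_SERVER_PORT} found after {max_attempts} attempts"
    )
```

The port source is injectable so the retry logic can be exercised without touching the network. Note that any "find a free port, then release it" approach leaves a short window in which another process can claim the port before the server binds to it.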

CI / test changes

Removed SKIP_SGLANG_BUILD=1 from cicd-main.yml — sglang now installs from PyPI, no source compilation to skip
Removed python -c "import sglang" guards from test scripts (reverts c45e889) — sglang is always installed

Risks / things to verify

CUDA arch compatibility: PyPI sglang-kernel wheels should include sm_90a (Hopper) and sm_100a (Blackwell) — needs runtime validation
transformers override: sglang v0.5.11 declares transformers==5.6.0 but our override forces 5.3.0 — needs functional testing
xgrammar override: sglang v0.5.11 declares 0.1.32, but we override to 0.1.33 for a CVE fix
torchao bump: sglang v0.5.11 pulls in torchao==0.17.0 (up from 0.9.0) — verify no regressions
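
One way to confirm that the overrides listed above actually took effect in a built environment is to compare installed versions against the expected pins. The sketch below is an assumption-laden illustration: the `EXPECTED` table is copied from the versions named in this description, the helper names are hypothetical, and the distribution names are assumed to match the package names used here. Local version suffixes such as `+cu128` are ignored when comparing.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Expected pins, taken from the PR description (override values where they apply).
EXPECTED = {
    "sglang": "0.5.11",
    "sglang-kernel": "0.4.2",
    "torch": "2.11.0",
    "transformers": "5.3.0",
    "xgrammar": "0.1.33",
    "torchao": "0.17.0",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up each distribution's installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(
    expected: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (wanted, found)} for every pin that does not match."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        # Drop a local version label such as "+cu128" before comparing.
        public = have.split("+", 1)[0] if have else None
        if public != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in find_mismatches(
        EXPECTED, installed_versions(EXPECTED)
    ).items():
        print(f"{name}: expected {want}, found {have}")
```

A matching version only shows that resolution behaved as intended. It does not replace the functional testing called for above, since sglang v0.5.11 was declared against different transformers and xgrammar versions.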

Depends on #2384 (vLLM 0.17.1 → 0.20.0 + torch 2.11.0). Merge that first.

Test plan

Build Docker image successfully (verify sglang-kernel installs from PyPI, no source compilation)
Run sglang unit tests (test_sglang_non_divisible_batch_handling etc.)
Run sglang functional/nightly tests (GRPO with sglang inference backend)
Verify CUDA kernels work on H100 (sm_90a) and B200 (sm_100a) if available
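
As a first step for the hardware check in the last item, the following sketch reports which compute capabilities are visible on the test machine. It is only a helper for the test plan, not part of the PR: the function names are hypothetical, and it assumes `torch` is importable in the environment. It identifies the GPUs but does not prove that the sglang-kernel wheel ships kernels for them; that still requires running the sglang tests on each device.

```python
from typing import List, Tuple

# Base architectures for the targets named in the test plan
# (sm_90a and sm_100a are arch-specific builds for these capabilities).
TARGET_ARCHS = {"sm_90", "sm_100"}


def arch_name(capability: Tuple[int, int]) -> str:
    """Map a (major, minor) compute capability to an sm_XY string."""
    major, minor = capability
    return f"sm_{major}{minor}"


def describe_local_gpus() -> List[Tuple[str, str]]:
    """Return (device name, arch) pairs, or [] if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        (
            torch.cuda.get_device_name(i),
            arch_name(torch.cuda.get_device_capability(i)),
        )
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    for name, arch in describe_local_gpus():
        status = "target" if arch in TARGET_ARCHS else "not a listed target"
        print(f"{name}: {arch} ({status})")
```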

🤖 Generated with Claude Code

copy-pr-bot · 2026-05-20T17:30:08Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

kajalj22 · 2026-05-20T17:38:58Z

/ok to test d702b90

kajalj22 · 2026-05-21T05:14:00Z

/ok to test a0e27d8

- vLLM 0.17.1 → 0.20.0, torch 2.10 → 2.11, torchvision 0.25 → 0.26, flashinfer 0.6.4 → 0.6.8.post1
- Cap requires-python to <3.14
- Adapt to vLLM 0.20 render architecture: move prefix-token override to NeMoRLOpenAIServingRender
- Wrap process_weights_after_loading in set_current_vllm_config context
- Fix cloudpickle ConfigModuleInstance error (torch 2.11)
- Fix Eagle3 draft weight loading: trim padded vocab embeddings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

vLLM 0.20 moved chat preprocessing to the render layer, but create_chat_completion no longer catches errors from that path. Prompts exceeding max_model_len now raise VLLMValidationError as an unhandled exception (500) instead of returning ErrorResponse (400). The Gym proxy only detects context-length overflow on 400, so the 500 crashes rollouts. Catch VLLMValidationError at the endpoint and return HTTP 400 to restore the graceful handling chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

sglang is incompatible with the vLLM 0.20 / torch 2.11 upgrade. Unconditionally set SKIP_SGLANG_BUILD=1 so the Docker build and tests skip sglang until it is updated separately. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

Eagle3 draft models can have draft_vocab_size (32k) different from vocab_size (151k). The old code used a single org_vocab_size from the first VocabParallelEmbedding (embed_tokens), which meant lm_head weights were never trimmed (padded_32k < 151k → condition false), causing an assertion failure in vLLM's weight_loader. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

vLLM 0.20 added quant_config arg to ParallelLMHead in Eagle3LlamaForCausalLM.__init__, so the old_snippet no longer matched and the has_own_lm_head patch silently failed to apply. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

Replace source builds of sglang and sglang-kernel with prebuilt PyPI wheels. This is possible because sglang v0.5.11 declares torch==2.11.0, matching our torch pin after the vLLM 0.20.0 upgrade.

- Bump sglang 0.5.10 → 0.5.11, sglang-kernel 0.4.1 → 0.4.2
- Remove VCS source entries from [tool.uv.sources]
- Remove sglang-kernel build config (extra-build-variables, no-build-isolation, extra-build-dependencies)
- Remove manual dependency-metadata for sglang and sglang-kernel (uv reads metadata from PyPI directly)
- Update override-dependencies comments (flashinfer now aligned between vllm and sglang at 0.6.8.post1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

sglang v0.5.11 derives grpc_port = port + 10000 and validates it is <= 65535. Retry port selection if the OS assigns a port > 55535. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

copy-pr-bot · 2026-05-22T18:44:22Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Remove SKIP_SGLANG_BUILD=1 from CI — no longer needed since sglang installs from prebuilt PyPI wheels instead of compiling from source. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

sglang is now always installed via prebuilt PyPI wheels, so the python -c "import sglang" check is no longer needed. Reverts c45e889. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

kajalj22 · 2026-05-22T19:11:17Z

/ok to test c25281a

kajalj22 requested review from a team as code owners May 20, 2026 17:30

kajalj22 added the CI:L1 Run doctests, unit tests, and functional tests label May 20, 2026

copy-pr-bot Bot temporarily deployed to public May 20, 2026 17:39 Inactive

kajalj22 changed the title ~~deps: switch sglang to prebuilt PyPI wheels (v0.5.11)~~ ci: switch sglang to prebuilt PyPI wheels (v0.5.11) May 20, 2026

copy-pr-bot Bot temporarily deployed to public May 20, 2026 17:39 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci May 20, 2026 17:39 Error

copy-pr-bot Bot temporarily deployed to public May 20, 2026 17:39 Inactive

kajalj22 marked this pull request as draft May 20, 2026 17:40

copy-pr-bot Bot temporarily deployed to public May 20, 2026 17:43 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 20, 2026 19:40 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 20, 2026 20:58 Inactive

kajalj22 force-pushed the kajalj/sglang-prebuilt-wheels branch from d702b90 to a0e27d8 Compare May 21, 2026 05:13

copy-pr-bot Bot temporarily deployed to public May 21, 2026 05:14 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 21, 2026 05:14 Inactive

copy-pr-bot Bot temporarily deployed to public May 21, 2026 05:14 Inactive

copy-pr-bot Bot temporarily deployed to public May 21, 2026 05:18 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 21, 2026 06:11 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci May 21, 2026 08:36 Failure

kajalj22 and others added 4 commits May 21, 2026 18:20

kajalj22 and others added 5 commits May 21, 2026 19:24

Merge branch 'main' into kajalj/upgrade-vllm-2.11

73e9b54

chore: regenerate uv.lock for sglang PyPI wheels

7c0e005

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>

kajalj22 force-pushed the kajalj/sglang-prebuilt-wheels branch from a0e27d8 to c574c9d Compare May 22, 2026 18:44

github-actions Bot added the CI Relating to CI label May 22, 2026

kajalj22 and others added 2 commits May 22, 2026 13:45

ci: re-enable sglang in Docker build

39dc17d

kajalj22 changed the title ~~ci: switch sglang to prebuilt PyPI wheels (v0.5.11)~~ deps: switch sglang to prebuilt PyPI wheels (v0.5.11) May 22, 2026

copy-pr-bot Bot temporarily deployed to public May 22, 2026 19:11 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 22, 2026 19:11 Inactive

copy-pr-bot Bot temporarily deployed to public May 22, 2026 19:11 Inactive

copy-pr-bot Bot temporarily deployed to public May 22, 2026 19:12 Inactive

copy-pr-bot Bot temporarily deployed to public May 22, 2026 19:15 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci May 22, 2026 20:10 Inactive

kajalj22 wants to merge 11 commits into main from kajalj/sglang-prebuilt-wheels
