deps: switch sglang to prebuilt PyPI wheels (v0.5.11)#2535

Draft
kajalj22 wants to merge 11 commits into main from kajalj/sglang-prebuilt-wheels

@kajalj22 kajalj22 commented May 20, 2026

Summary

  • Upgrade sglang 0.5.10 → 0.5.11 and sglang-kernel 0.4.1 → 0.4.2, switching from source builds to prebuilt PyPI wheels
  • Eliminates the slow sglang-kernel CUDA compilation during Docker builds
  • This is possible because sglang v0.5.11 declares torch==2.11.0, matching our torch pin after the vLLM 0.20.0 upgrade in #2384 (chore: Upgrade vLLM from 0.17.1 to 0.20.0)

Dependency changes (pyproject.toml)

  • Pinned sglang==0.5.11 and sglang-kernel==0.4.2 as regular PyPI deps
  • Removed VCS git source entries for sglang and sglang-kernel from [tool.uv.sources]
  • Removed sglang-kernel from extra-build-variables, no-build-isolation-package, and extra-build-dependencies
  • Removed manual [[tool.uv.dependency-metadata]] sections for sglang and sglang-kernel (uv reads metadata from PyPI directly)
  • Updated override-dependencies comments (flashinfer now aligned between vllm and sglang at 0.6.8.post1)
  • Regenerated uv.lock: scikit-build-core dropped, easydict/kernels added, torchao bumped 0.9.0 → 0.17.0
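
One way to confirm that an environment actually picked up the new pins is a small check against installed package metadata. This is only a sketch: the helper name `find_pin_mismatches` and its injectable `get_version` parameter are hypothetical and not part of this PR, and it assumes the installed version strings match the pins exactly (a wheel with a local version suffix would be reported as a mismatch).

```python
from importlib import metadata

# Pins taken from the PR description above.
EXPECTED_PINS = {"sglang": "0.5.11", "sglang-kernel": "0.4.2"}


def find_pin_mismatches(expected, get_version=metadata.version):
    """Hypothetical helper: return {name: (wanted, installed)} for every pin that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # An empty dict means the environment matches the pins.
    print(find_pin_mismatches(EXPECTED_PINS))
```

Passing the version lookup as a parameter keeps the check testable without installing sglang.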

Port fix (sglang_worker.py)

  • sglang v0.5.11 derives grpc_port = port + 10000 and validates <= 65535
  • Added a retry loop that re-selects the server port until it is <= 55535, so grpc_port stays valid
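
The constraint above can be sketched as follows. This is an illustration under stated assumptions, not the code in sglang_worker.py: the names `find_sglang_safe_port`, `pick_free_port`, and `max_attempts` are hypothetical, and asking the OS for an ephemeral port is just one way to select a candidate. Only the offset (10000) and the limit (65535) come from the description.

```python
import socket

GRPC_PORT_OFFSET = 10000  # per the PR description: grpc_port = port + 10000
MAX_PORT = 65535          # highest valid TCP port
MAX_SERVER_PORT = MAX_PORT - GRPC_PORT_OFFSET  # 55535


def pick_free_port() -> int:
    """Ask the OS for a currently free TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def find_sglang_safe_port(pick_port=pick_free_port, max_attempts: int = 100) -> int:
    """Hypothetical helper: retry until the port leaves room for the derived grpc_port."""
    for _ in range(max_attempts):
        port = pick_port()
        if port + GRPC_PORT_OFFSET <= MAX_PORT:
            return port
    raise RuntimeError(
        f"no port <= {MAX_SERVER_PORT} found after {max_attempts} attempts"
    )
```

Note that a port found this way is released before the server binds it, so another process could claim it in between; the sketch does not address that race.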

CI / test changes

  • Removed SKIP_SGLANG_BUILD=1 from cicd-main.yml — sglang now installs from PyPI, no source compilation to skip
  • Removed python -c "import sglang" guards from test scripts (reverts c45e889) — sglang is always installed

Risks / things to verify

  • CUDA arch compatibility: PyPI sglang-kernel wheels should include sm_90a (Hopper) and sm_100a (Blackwell) — needs runtime validation
  • transformers override: sglang v0.5.11 declares transformers==5.6.0 but our override forces 5.3.0 — needs functional testing
  • xgrammar override: sglang v0.5.11 wants 0.1.32; we override to 0.1.33 for a CVE fix
  • torchao bump: sglang v0.5.11 pulls in torchao==0.17.0 (up from 0.9.0) — verify no regressions
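
The two overrides listed above pull in opposite directions, which is easy to see with a small comparison. A minimal sketch: the helper names are hypothetical, the version numbers are copied from the risk list, and the parser only handles plain dotted numeric versions (it would not parse something like 0.6.8.post1).

```python
def parse_version(version: str) -> tuple:
    """Parse a plain dotted numeric version such as '5.6.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# What sglang v0.5.11 declares vs. what our overrides force (from the risk list).
DECLARED = {"transformers": "5.6.0", "xgrammar": "0.1.32"}
OVERRIDES = {"transformers": "5.3.0", "xgrammar": "0.1.33"}


def classify_overrides(declared, overrides):
    """Hypothetical helper: label each override relative to the declared version."""
    result = {}
    for name, forced in overrides.items():
        if name not in declared:
            continue  # nothing to compare against
        want, have = parse_version(declared[name]), parse_version(forced)
        if have < want:
            result[name] = "downgrade"
        elif have > want:
            result[name] = "upgrade"
        else:
            result[name] = "same"
    return result


print(classify_overrides(DECLARED, OVERRIDES))
# {'transformers': 'downgrade', 'xgrammar': 'upgrade'}
```

The transformers override is the riskier of the two, since sglang is forced to run against an older release than the one it declares.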

Depends on #2384 (vLLM 0.17.1 → 0.20.0 + torch 2.11.0). Merge that first.

Test plan

  • Build Docker image successfully (verify sglang-kernel installs from PyPI, no source compilation)
  • Run sglang unit tests (test_sglang_non_divisible_batch_handling etc.)
  • Run sglang functional/nightly tests (GRPO with sglang inference backend)
  • Verify CUDA kernels work on H100 (sm_90a) and B200 (sm_100a) if available
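
For the last item, a first step could be to confirm which architecture the test machine needs before exercising any kernels. This sketch only maps the device's compute capability to the arch names used in this PR; it does not prove the wheel contains matching kernels, which still requires running them. The helper names are hypothetical, and `local_gpu_arch` assumes torch is installed.

```python
# Compute capability -> arch target named in this PR (H100 = 9.0, B200 = 10.0).
ARCH_BY_CAPABILITY = {(9, 0): "sm_90a", (10, 0): "sm_100a"}


def required_arch(capability):
    """Return the arch name for a (major, minor) capability, or None if not listed."""
    return ARCH_BY_CAPABILITY.get(tuple(capability))


def local_gpu_arch(device_index: int = 0):
    """Hypothetical helper: arch name for the local GPU, or None without CUDA."""
    import torch  # imported lazily so the mapping is usable without torch

    if not torch.cuda.is_available():
        return None
    return required_arch(torch.cuda.get_device_capability(device_index))
```

A `None` result on a GPU machine means the device is neither of the two architectures this PR calls out.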

🤖 Generated with Claude Code

@kajalj22 kajalj22 requested review from a team as code owners May 20, 2026 17:30

copy-pr-bot Bot commented May 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kajalj22

/ok to test d702b90

@kajalj22 kajalj22 added the CI:L1 Run doctests, unit tests, and functional tests label May 20, 2026
@kajalj22 kajalj22 changed the title from "deps: switch sglang to prebuilt PyPI wheels (v0.5.11)" to "ci: switch sglang to prebuilt PyPI wheels (v0.5.11)" May 20, 2026
@kajalj22 kajalj22 marked this pull request as draft May 20, 2026 17:40
@kajalj22 kajalj22 force-pushed the kajalj/sglang-prebuilt-wheels branch from d702b90 to a0e27d8 Compare May 21, 2026 05:13
@kajalj22

/ok to test a0e27d8

kajalj22 and others added 4 commits May 21, 2026 18:20
- vLLM 0.17.1 → 0.20.0, torch 2.10 → 2.11, torchvision 0.25 → 0.26,
  flashinfer 0.6.4 → 0.6.8.post1
- Cap requires-python to <3.14
- Adapt to vLLM 0.20 render architecture: move prefix-token override
  to NeMoRLOpenAIServingRender
- Wrap process_weights_after_loading in set_current_vllm_config context
- Fix cloudpickle ConfigModuleInstance error (torch 2.11)
- Fix Eagle3 draft weight loading: trim padded vocab embeddings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>

vLLM 0.20 moved chat preprocessing to the render layer, but
create_chat_completion no longer catches errors from that path.
Prompts exceeding max_model_len now raise VLLMValidationError
as an unhandled exception (500) instead of returning ErrorResponse
(400). The Gym proxy only detects context-length overflow on 400,
so the 500 crashes rollouts.

Catch VLLMValidationError at the endpoint and return HTTP 400 to
restore the graceful handling chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>

sglang is incompatible with the vLLM 0.20 / torch 2.11 upgrade.
Unconditionally set SKIP_SGLANG_BUILD=1 so the Docker build and
tests skip sglang until it is updated separately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>

Eagle3 draft models can have draft_vocab_size (32k) different from
vocab_size (151k). The old code used a single org_vocab_size from the
first VocabParallelEmbedding (embed_tokens), which meant lm_head
weights were never trimmed (padded_32k < 151k → condition false),
causing an assertion failure in vLLM's weight_loader.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>

kajalj22 and others added 5 commits May 21, 2026 19:24
vLLM 0.20 added quant_config arg to ParallelLMHead in
Eagle3LlamaForCausalLM.__init__, so the old_snippet no longer matched
and the has_own_lm_head patch silently failed to apply.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>

Replace source builds of sglang and sglang-kernel with prebuilt PyPI
wheels. This is possible because sglang v0.5.11 declares torch==2.11.0,
matching our torch pin after the vLLM 0.20.0 upgrade.

- Bump sglang 0.5.10 → 0.5.11, sglang-kernel 0.4.1 → 0.4.2
- Remove VCS source entries from [tool.uv.sources]
- Remove sglang-kernel build config (extra-build-variables,
  no-build-isolation, extra-build-dependencies)
- Remove manual dependency-metadata for sglang and sglang-kernel
  (uv reads metadata from PyPI directly)
- Update override-dependencies comments (flashinfer now aligned
  between vllm and sglang at 0.6.8.post1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>

sglang v0.5.11 derives grpc_port = port + 10000 and validates it
is <= 65535. Retry port selection if the OS assigns a port > 55535.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>

@kajalj22 kajalj22 force-pushed the kajalj/sglang-prebuilt-wheels branch from a0e27d8 to c574c9d Compare May 22, 2026 18:44

copy-pr-bot Bot commented May 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the CI Relating to CI label May 22, 2026
kajalj22 and others added 2 commits May 22, 2026 13:45
Remove SKIP_SGLANG_BUILD=1 from CI — no longer needed since sglang
installs from prebuilt PyPI wheels instead of compiling from source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>

sglang is now always installed via prebuilt PyPI wheels, so the
python -c "import sglang" check is no longer needed. Reverts c45e889.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Kajal Jain <kajalj@nvidia.com>

@kajalj22 kajalj22 changed the title from "ci: switch sglang to prebuilt PyPI wheels (v0.5.11)" to "deps: switch sglang to prebuilt PyPI wheels (v0.5.11)" May 22, 2026
@kajalj22

/ok to test c25281a
