deps: switch sglang to prebuilt PyPI wheels (v0.5.11)#2535
Draft
kajalj22 wants to merge 11 commits into
/ok to test d702b90
d702b90 to
a0e27d8
/ok to test a0e27d8
- vLLM 0.17.1 → 0.20.0, torch 2.10 → 2.11, torchvision 0.25 → 0.26, flashinfer 0.6.4 → 0.6.8.post1
- Cap requires-python to <3.14
- Adapt to vLLM 0.20 render architecture: move prefix-token override to NeMoRLOpenAIServingRender
- Wrap process_weights_after_loading in set_current_vllm_config context
- Fix cloudpickle ConfigModuleInstance error (torch 2.11)
- Fix Eagle3 draft weight loading: trim padded vocab embeddings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>
vLLM 0.20 moved chat preprocessing to the render layer, but create_chat_completion no longer catches errors from that path. Prompts exceeding max_model_len now raise VLLMValidationError as an unhandled exception (500) instead of returning ErrorResponse (400). The Gym proxy only detects context-length overflow on 400, so the 500 crashes rollouts. Catch VLLMValidationError at the endpoint and return HTTP 400 to restore the graceful handling chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>
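The error-handling change described in this commit can be sketched in isolation. This is a minimal illustration only: `ValidationErrorStandIn`, `render_prompt`, and `handle_chat_completion` are hypothetical names standing in for vLLM's `VLLMValidationError`, the render-layer preprocessing, and the endpoint, and the response body shape is an assumption rather than the actual `ErrorResponse` schema.

```python
from http import HTTPStatus


class ValidationErrorStandIn(ValueError):
    """Hypothetical stand-in for vLLM's VLLMValidationError."""


def render_prompt(prompt_tokens: int, max_model_len: int) -> str:
    # Stand-in for render-layer preprocessing that rejects over-long prompts.
    if prompt_tokens > max_model_len:
        raise ValidationErrorStandIn(
            f"prompt has {prompt_tokens} tokens, max_model_len is {max_model_len}"
        )
    return "rendered"


def handle_chat_completion(prompt_tokens: int, max_model_len: int) -> tuple[int, dict]:
    """Return (status, body); validation errors become 400 instead of escaping as 500."""
    try:
        rendered = render_prompt(prompt_tokens, max_model_len)
    except ValidationErrorStandIn as exc:
        # Assumed body shape; the real endpoint returns vLLM's own error type.
        return HTTPStatus.BAD_REQUEST.value, {"error": {"message": str(exc)}}
    return HTTPStatus.OK.value, {"result": rendered}
```

With this shape, a caller that only inspects 400 responses for context-length overflow (as the Gym proxy is described as doing) sees the overflow again instead of an unhandled 500.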
sglang is incompatible with the vLLM 0.20 / torch 2.11 upgrade. Unconditionally set SKIP_SGLANG_BUILD=1 so the Docker build and tests skip sglang until it is updated separately. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>
Eagle3 draft models can have draft_vocab_size (32k) different from vocab_size (151k). The old code used a single org_vocab_size from the first VocabParallelEmbedding (embed_tokens), which meant lm_head weights were never trimmed (padded_32k < 151k → condition false), causing an assertion failure in vLLM's weight_loader. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>
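A toy sketch of why a single shared `org_vocab_size` skips the trim, using small numbers in place of 32k and 151k. The helper `trim_padded_rows` and the list-of-rows representation are hypothetical and only illustrate the comparison; they are not the repository's or vLLM's weight-loading code.

```python
def trim_padded_rows(rows: list, org_vocab_size: int) -> list:
    """Drop padding rows beyond the given vocabulary size, if there are any."""
    if len(rows) > org_vocab_size:
        return rows[:org_vocab_size]
    return rows


# Toy sizes: embed_tokens has 10 real rows, the draft lm_head has 4 real rows
# padded up to 8.
EMBED_VOCAB_SIZE = 10
DRAFT_VOCAB_SIZE = 4
padded_lm_head = list(range(8))

# Using the embed_tokens size for every module: 8 > 10 is false, nothing is trimmed.
untrimmed = trim_padded_rows(padded_lm_head, EMBED_VOCAB_SIZE)

# Using the lm_head's own size: 8 > 4 is true, padding rows are removed.
trimmed = trim_padded_rows(padded_lm_head, DRAFT_VOCAB_SIZE)
```

The point of the sketch is that the size used in the comparison has to belong to the module whose weights are being trimmed.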
vLLM 0.20 added quant_config arg to ParallelLMHead in Eagle3LlamaForCausalLM.__init__, so the old_snippet no longer matched and the has_own_lm_head patch silently failed to apply. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>
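One way to keep a snippet-based source patch from failing silently is to require exactly one match before replacing. The following is a hedged sketch, not the repository's patching code; `apply_snippet_patch` and its arguments are hypothetical names.

```python
def apply_snippet_patch(source: str, old_snippet: str, new_snippet: str) -> str:
    """Replace old_snippet with new_snippet, failing loudly unless it matches exactly once."""
    matches = source.count(old_snippet)
    if matches != 1:
        raise ValueError(
            f"expected exactly one match for old_snippet, found {matches}; "
            "the upstream source may have changed"
        )
    return source.replace(old_snippet, new_snippet)
```

A plain `str.replace` returns the input unchanged when nothing matches, which is how a patch can appear to succeed while doing nothing after an upstream signature change.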
Replace source builds of sglang and sglang-kernel with prebuilt PyPI wheels. This is possible because sglang v0.5.11 declares torch==2.11.0, matching our torch pin after the vLLM 0.20.0 upgrade.

- Bump sglang 0.5.10 → 0.5.11, sglang-kernel 0.4.1 → 0.4.2
- Remove VCS source entries from [tool.uv.sources]
- Remove sglang-kernel build config (extra-build-variables, no-build-isolation, extra-build-dependencies)
- Remove manual dependency-metadata for sglang and sglang-kernel (uv reads metadata from PyPI directly)
- Update override-dependencies comments (flashinfer now aligned between vllm and sglang at 0.6.8.post1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>
sglang v0.5.11 derives grpc_port = port + 10000 and validates it is <= 65535. Retry port selection if the OS assigns a port > 55535. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>
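The retry described in this commit could look roughly like the sketch below. It assumes the port is obtained by binding to port 0; `os_assigned_port`, `pick_base_port`, and the `max_attempts` limit are hypothetical names and choices, not the code in `sglang_worker.py`.

```python
import socket
from typing import Callable

MAX_PORT = 65535
GRPC_PORT_OFFSET = 10000  # offset named in the commit message
MAX_BASE_PORT = MAX_PORT - GRPC_PORT_OFFSET  # 55535


def os_assigned_port() -> int:
    """Ask the OS for a currently free TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def pick_base_port(
    get_candidate: Callable[[], int] = os_assigned_port,
    max_attempts: int = 100,
) -> int:
    """Return a port whose derived gRPC port (port + 10000) is still <= 65535."""
    for _ in range(max_attempts):
        port = get_candidate()
        if port <= MAX_BASE_PORT:
            return port
    raise RuntimeError(f"no port <= {MAX_BASE_PORT} found in {max_attempts} attempts")
```

The socket is closed before the port is returned, so another process could claim the port in between; the sketch only addresses the range check, not that race.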
a0e27d8 to
c574c9d
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Remove SKIP_SGLANG_BUILD=1 from CI — no longer needed since sglang installs from prebuilt PyPI wheels instead of compiling from source. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>
sglang is now always installed via prebuilt PyPI wheels, so the python -c "import sglang" check is no longer needed. Reverts c45e889. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kajal Jain <kajalj@nvidia.com>
/ok to test c25281a
Summary

- Switch sglang and sglang-kernel from source builds to prebuilt PyPI wheels. sglang v0.5.11 declares `torch==2.11.0`, matching our torch pin after the vLLM 0.20.0 upgrade in chore: Upgrade vLLM from 0.17.1 to 0.20.0 #2384

Dependency changes (`pyproject.toml`)

- `sglang==0.5.11` and `sglang-kernel==0.4.2` as regular PyPI deps
- Remove VCS source entries from `[tool.uv.sources]`
- Remove sglang-kernel build config (`extra-build-variables`, `no-build-isolation-package`, and `extra-build-dependencies`)
- Remove `[[tool.uv.dependency-metadata]]` sections for sglang and sglang-kernel (uv reads metadata from PyPI directly)
- `uv.lock`: `scikit-build-core` dropped, `easydict`/`kernels` added, `torchao` bumped 0.9.0 → 0.17.0

Port fix (`sglang_worker.py`)

- sglang v0.5.11 derives `grpc_port = port + 10000` and validates `<= 65535`
- Retry port selection if the OS assigns a port > 55535

CI / test changes

- Remove `SKIP_SGLANG_BUILD=1` from `cicd-main.yml`: sglang now installs from PyPI, so there is no source compilation to skip
- Remove `python -c "import sglang"` guards from test scripts (reverts c45e889): sglang is always installed

Risks / things to verify

- […] `transformers==5.6.0` but our override forces `5.3.0`: needs functional testing
- […] `0.1.32`, we override to `0.1.33` for CVE fix
- `torchao==0.17.0` (up from 0.9.0): verify no regressions

Test plan

- […] `test_sglang_non_divisible_batch_handling` etc.)

🤖 Generated with Claude Code