feat(vllm): add grammar and structured output support #8806
Open
eureka928 wants to merge 8 commits into mudler:master from eureka928:feat/vllm-structured-output
+209 −6
8 commits by eureka928:

- 666f8c7 feat(proto): add JSONSchema and ResponseFormat fields to PredictOptions
- 1fd670c feat(backend): pass JSONSchema and ResponseFormat through gRPC
- 0fa07d3 feat(endpoints): extract raw JSON schema for structured output
- ea89ee8 feat(vllm): add structured output support via guided decoding
- d65b35f fix: refine vLLM structured output implementation
- 8511c50 fix(vllm): support both vLLM API versions and add grammar passthrough
- bb08454 docs: update constrained grammars with vLLM structured output support
- 278e7e2 refactor: use Metadata map instead of dedicated proto fields for stru…
Conversations
Do we need a fallback? We usually pin the upstream version.
Good point. I checked and vLLM is actually not pinned to a specific version: requirements-after.txt just lists vllm with no version constraint, and different platform builds (CPU/CUDA/ROCm) may end up with different vLLM versions. That said, if the project plans to pin vLLM to a specific version, I'm happy to drop the fallback and target whichever API is current. Let me know which you'd prefer.
OK, when you say newer versions, how new? If it's a very recent change then maybe we need this; otherwise we probably don't.
The rename happened in vLLM v0.8.x → latest: GuidedDecodingParams was renamed to StructuredOutputsParams, and the corresponding SamplingParams field changed from guided_decoding to structured_outputs. Since vLLM isn't pinned (requirements-after.txt just says vllm), builds can land on either version depending on when/how the image is built. If we pin to a specific version, I can drop the fallback and target that API directly; let me know which version to target.

Also in the latest push: I've refactored to use the Metadata map instead of new proto fields, as discussed.
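The dual-API fallback under discussion can be sketched roughly like this. A minimal sketch, not the PR's actual code: the StructuredOutputsParams import path and its json=/grammar= kwargs are assumptions based on the rename described above, and the final stub branch exists only so the snippet runs where vLLM isn't installed.

```python
# Version-compat fallback for the vLLM rename discussed above.
# Assumption: the post-rename class lives at the same module path and
# accepts the same json=/grammar= kwargs as GuidedDecodingParams did.
try:
    # Newer vLLM: renamed class and SamplingParams field.
    from vllm.sampling_params import StructuredOutputsParams as _Params
    _FIELD = "structured_outputs"
except ImportError:
    try:
        # Older vLLM: original names.
        from vllm.sampling_params import GuidedDecodingParams as _Params
        _FIELD = "guided_decoding"
    except ImportError:
        # Illustration-only stub so the sketch runs without vLLM installed.
        _Params, _FIELD = dict, "guided_decoding"


def guided_kwargs(json_schema=None, grammar=None):
    """Build the kwargs to splat into SamplingParams(**guided_kwargs(...)),
    using whichever field name the installed vLLM expects."""
    if json_schema is None and grammar is None:
        return {}
    return {_FIELD: _Params(json=json_schema, grammar=grammar)}
```

Resolving the field name once at import time keeps the per-request path free of version checks; if the project later pins vLLM, the try/except collapses to a single import.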