Add /tokenize and /apply-template endpoints like llama.cpp offers. #51
Closed
unverbraucht wants to merge 1 commit into SearchSavior:main from
Conversation
This aids in running benchmarks and getting compatibility up with llama-server
Owner
Apologies for the late reply. I think this work could be integrated in the future as a set of utilities. Logprobs are on the todo list; they will take some work to implement and will demand some refactoring to accommodate cleanly, namely a new set of endpoints related to tokens/tokenizers, similar to what you implemented here.
I wanted to understand the performance OpenArc offers, so I tried to run server-benchmark.py from llama.cpp. That script relies on retrieving a model's prompt template via the API and then tokenizing the template plus the user query with an output-token cut-off, so that each request has a fixed number of input tokens.
I've added these two endpoints both at the paths llama.cpp uses and at the OpenAI-compatible paths with the /v1/ prefix. This aids in running benchmarks, improves compatibility with llama-server, and I assume other uses can be found as well (retrieving the system prompt in particular sounds handy for front-ends).
Would you consider merging these?
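For reference, a minimal sketch of what requests to these endpoints could look like. The payload shapes below follow llama-server's conventions (`content` for /tokenize, `messages` for /apply-template), but the exact field names and the helper functions here are assumptions for illustration, not the shapes this PR necessarily implements:

```python
import json

def tokenize_payload(content: str, add_special: bool = False) -> dict:
    """Build a JSON body for POST /tokenize (llama-server style, assumed).

    `add_special` controls whether special tokens (e.g. BOS) are added;
    the field name is an assumption based on llama-server's API.
    """
    return {"content": content, "add_special": add_special}

def apply_template_payload(messages: list[dict]) -> dict:
    """Build a JSON body for POST /apply-template (llama-server style, assumed).

    Takes OpenAI-style chat messages and asks the server to render them
    through the model's prompt template.
    """
    return {"messages": messages}

# Example request bodies a benchmark script might send:
tok_body = tokenize_payload("Hello world")
tpl_body = apply_template_payload([{"role": "user", "content": "Hello world"}])
print(json.dumps(tok_body))
print(json.dumps(tpl_body))
```

A benchmark like server-benchmark.py would first POST the template body to /apply-template to get the rendered prompt, then POST that prompt to /tokenize to count input tokens before issuing the actual completion request.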