
Add /tokenize and /apply-template endpoints like llama.cpp offers. #51

Closed
unverbraucht wants to merge 1 commit into SearchSavior:main from KIntegrated:feat/template_and_tokenize_endpoints

Conversation

@unverbraucht

To understand the performance OpenArc offers, I wanted to run server-benchmark.py from llama.cpp for benchmarking. That script relies on retrieving the model's prompt template via the API, then tokenizing this prompt plus the user query with an output-token cut-off so that we get a fixed number of input tokens.

I've added these two endpoints both under the paths that llama.cpp offers and under the OpenAI-compatible /v1/ prefix. This aids in running benchmarks and improves compatibility with llama-server, and I assume other uses can be found as well (retrieving the system prompt, in particular, sounds handy for front-ends).
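For illustration, a minimal client sketch of how a benchmark script might call the two endpoints. The base URL and the exact payload/response shapes are assumptions here, modeled on llama.cpp's `/tokenize` (`{"content": ...}` in, `{"tokens": [...]}` out) and `/apply-template` (chat messages in, rendered prompt out):

```python
import json
import urllib.request

# Assumed server address; adjust to wherever OpenArc is listening.
BASE_URL = "http://localhost:8000/v1"

def post_json(path, payload):
    """POST a JSON payload to the server and return the parsed JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Hypothetical usage, assuming llama.cpp-style request/response shapes:
# tokens = post_json("/tokenize", {"content": "Hello world"})["tokens"]
# prompt = post_json("/apply-template",
#                    {"messages": [{"role": "user", "content": "Hi"}]})["prompt"]
```

With a pair like this, a benchmark can render the template once, tokenize the result, and trim to a fixed input-token budget before issuing completion requests.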

Would you consider merging these?

@SearchSavior
Owner

Apologies for the late reply.

I think this work could be integrated in the future as a set of utilities. Logprobs are on the to-do list; they will take some work to implement and demand some refactoring to accommodate cleanly, namely a new set of endpoints related to tokens/tokenizers, similar to what you implemented here.



2 participants