Mkulakow/fix output tokens #4282

Open
michalkulakowski wants to merge 5 commits into main from mkulakow/fix_output_tokens

Conversation

@michalkulakowski

Collaborator

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

Copilot AI review requested due to automatic review settings June 10, 2026 14:01

Copilot AI left a comment

Contributor


Pull request overview

This PR fixes legacy streaming response generation so that token usage fields (prompt/completion/output tokens) are populated correctly in serialized streaming events—especially for the OpenAI Responses endpoint where usage is embedded in the response.completed / response.incomplete event payload.

Changes:

  • Set prompt/completion token usage on the API handler before calling serializeStreamingChunk() in legacy LM and VLM streaming finalization paths.
  • Add unit tests validating correct usage fields for Responses endpoint completed/incomplete events and for chat_completions usage SSE chunk behavior.
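The ordering issue described above can be illustrated with a minimal sketch. `ToyHandler`, `serializeFinalChunk()`, and both `finalize*` functions are hypothetical stand-ins written for this illustration, not the project's actual classes or signatures; the setter names are guessed from the `set*TokensUsage()` pattern mentioned in the review, and the JSON shape is simplified. The point is only that a serializer reading usage counters at call time will emit zeros if the counters are set afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the API handler; NOT the project's real class.
struct ToyHandler {
    std::size_t promptTokens = 0;
    std::size_t completionTokens = 0;

    // Assumed setter names, modeled on the set*TokensUsage() pattern.
    void setPromptTokensUsage(std::size_t n) { promptTokens = n; }
    void setCompletionTokensUsage(std::size_t n) { completionTokens = n; }

    // Serializes a final event from whatever usage values are set right now.
    std::string serializeFinalChunk() const {
        return "{\"usage\":{\"input_tokens\":" + std::to_string(promptTokens) +
               ",\"output_tokens\":" + std::to_string(completionTokens) + "}}";
    }
};

// Problematic ordering: the event is serialized before usage is recorded,
// so the payload carries the default (zero) counters.
std::string finalizeSerializeFirst(ToyHandler& h, std::size_t prompt, std::size_t completion) {
    std::string out = h.serializeFinalChunk();
    h.setPromptTokensUsage(prompt);
    h.setCompletionTokensUsage(completion);
    return out;
}

// Corrected ordering: usage is recorded first, then the event is serialized.
std::string finalizeUsageFirst(ToyHandler& h, std::size_t prompt, std::size_t completion) {
    h.setPromptTokensUsage(prompt);
    h.setCompletionTokensUsage(completion);
    return h.serializeFinalChunk();
}
```

With a fresh `ToyHandler`, `finalizeSerializeFirst(h, 7, 3)` returns a payload with both counters at 0, while `finalizeUsageFirst(h, 7, 3)` returns one with 7 and 3. This matters most when usage is embedded in the final event itself, as the overview says is the case for `response.completed` / `response.incomplete`.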

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/test/http_openai_handler_test.cpp | Adds legacy servable streaming tests to assert correct token usage serialization for Responses and chat_completions. |
| src/llm/language_model/legacy/servable.cpp | Moves set*TokensUsage() calls before final serializeStreamingChunk() so usage is available during serialization. |
| src/llm/visual_language_model/legacy/servable.cpp | Same ordering fix as LM legacy servable for VLM legacy streaming finalization. |

Comment on lines +5681 to +5684
std::optional<uint32_t> maxTokensLimit;
const absl::Status parseStatus = apiHandler->parseRequest(maxTokensLimit, 0, std::nullopt);
ASSERT_TRUE(parseStatus.ok()) << parseStatus;
ctx->apiHandler = apiHandler;