VLM chat application via Python Jinja #4235

Open
dkalinowski wants to merge 13 commits into main from vlm-python-jinja

dkalinowski (Collaborator) commented May 22, 2026
This change switches chat template application from GenAI to Python's Jinja for VLM & VLM_CB pipelines.

MMMU_VAL benchmark results

| Model | Current application via minja | Chat application via Python Jinja | llama.cpp |
| --- | --- | --- | --- |
| Qwen3-VL-8B-Instruct | 0.51 | 0.5033 | 0.5122 |
| Phi-3.5-vision-instruct-int8-ov | 0.4322 | 0.4333 | |
| Qwen3.6-35B-A3B-int4-ov | 0.226 | 0.2567 | |

BFCL with new chat templates (multi-turn)

| Model | Current application via minja | Chat application via Python Jinja |
| --- | --- | --- |
| Qwen3-VL-8B-Instruct | 0.55 | 0.515 |
| Qwen3.6-35B-A3B-int4-ov | 0.226 | TBD |
| Qwen3.5-35B-A3B-int4-ov | ? | TBD |

Copilot AI review requested due to automatic review settings May 22, 2026 11:52

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the Visual Language Model (VLM) servables to support applying chat templates via a Python/Jinja processor (when Python support is enabled), including injecting <ov_genai_image_*> tags into the request JSON before template rendering, and adds exception handling around the C++ tokenizer chat-template path.

Changes:

  • Added RapidJSON-based rewriting of the request JSON to prepend image tags into messages[*].content before calling PyJinjaTemplateProcessor::applyChatTemplate (Python-enabled builds).
  • Wrapped tokenizer.apply_chat_template(...) in try/catch and improved error handling for invalid/missing chat templates (Python-disabled builds).
  • Added validation that the final prompt after template application is non-empty.
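
The image-tag injection listed above can be illustrated with a small sketch that uses only the standard library. It is not the PR's code: `ChatMessage`, `makeImageTag`, and `injectImageTags` are hypothetical names, plain structs stand in for the RapidJSON document, and the zero-based `<ov_genai_image_N>` layout with no separator is an assumption based on the `<ov_genai_image_*>` pattern named in the overview.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one chat message; the real servables rewrite
// a RapidJSON document instead of a vector of structs.
struct ChatMessage {
    std::string role;
    std::string content;
};

// Builds "<ov_genai_image_N>" for one image. The exact tag layout
// (zero-based index, no trailing separator) is assumed for this sketch.
std::string makeImageTag(std::size_t imageIndex) {
    return "<ov_genai_image_" + std::to_string(imageIndex) + ">";
}

// Prepends one tag per image to the message of the chat turn it belongs to.
// `images` holds (chatTurnIndex, imageIndex) pairs in request order;
// pairs that point past the end of `messages` are skipped.
void injectImageTags(std::vector<ChatMessage>& messages,
                     const std::vector<std::pair<std::size_t, std::size_t>>& images) {
    // Collect the tags per turn first so image order is preserved.
    std::vector<std::string> tagsPerTurn(messages.size());
    for (const auto& image : images) {
        if (image.first < messages.size()) {
            tagsPerTurn[image.first] += makeImageTag(image.second);
        }
    }
    for (std::size_t i = 0; i < messages.size(); ++i) {
        messages[i].content = tagsPerTurn[i] + messages[i].content;
    }
}
```

With a two-message conversation and two images attached to the second turn, the second message's content would start with both tags, and the first message would be left unchanged. The rewritten messages are what would then be handed to the template renderer.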

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/llm/visual_language_model/legacy/servable.cpp | Adds Python/Jinja chat-template path with image-tag injection and improves error handling for chat-template application. |
| src/llm/visual_language_model/continuous_batching/servable.cpp | Mirrors the Python/Jinja chat-template path and image-tag injection for continuous batching, plus exception handling around tokenizer template application. |

Comments suppressed due to low confidence (2)

src/llm/visual_language_model/legacy/servable.cpp:322

  • msg.HasMember("content") asserts if the messages[chatTurnIndex] element is not an object. Guard with msg.IsObject() (or use msg.GetObject().FindMember) before HasMember to avoid crashes on unexpected request shapes.
                    if (chatTurnIndex < messages.Size()) {
                        auto& msg = messages[chatTurnIndex];
                        if (msg.HasMember("content") && msg["content"].IsString()) {
                            std::string newContent = imageTagString + msg["content"].GetString();
                            msg["content"].SetString(newContent.c_str(), newContent.length(), jsonDoc.GetAllocator());
                        }

src/llm/visual_language_model/continuous_batching/servable.cpp:126

  • msg.HasMember("content") asserts if messages[chatTurnIndex] is not an object. Guard with msg.IsObject() (or use msg.GetObject().FindMember) before HasMember to avoid crashes on unexpected request shapes.
                    if (chatTurnIndex < messages.Size()) {
                        auto& msg = messages[chatTurnIndex];
                        if (msg.HasMember("content") && msg["content"].IsString()) {
                            std::string newContent = imageTagString + msg["content"].GetString();
                            msg["content"].SetString(newContent.c_str(), newContent.length(), jsonDoc.GetAllocator());
                        }
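
The guard suggested in both comments amounts to checking the shape of each value before reading from it. The C++17 sketch below shows that order of checks with standard-library types only. It deliberately does not use RapidJSON: `Scalar`, `Object`, `Element`, and `prependToContent` are hypothetical stand-ins, and the comments only indicate which RapidJSON check each step is meant to mirror.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Minimal stand-ins for JSON values; the servables use RapidJSON instead.
using Scalar = std::variant<std::monostate, double, std::string>;  // null, number, string
using Object = std::map<std::string, Scalar>;
using Element = std::variant<Scalar, Object>;  // one entry of the "messages" array

// Prepends `imageTags` to messages[turn]["content"] only when every shape
// check passes; returns false and leaves the input untouched otherwise.
bool prependToContent(std::vector<Element>& messages, std::size_t turn,
                      const std::string& imageTags) {
    if (turn >= messages.size()) return false;           // index guard
    auto* obj = std::get_if<Object>(&messages[turn]);    // mirrors msg.IsObject()
    if (obj == nullptr) return false;
    auto it = obj->find("content");                      // mirrors a member lookup
    if (it == obj->end()) return false;
    auto* text = std::get_if<std::string>(&it->second);  // mirrors IsString()
    if (text == nullptr) return false;
    *text = imageTags + *text;
    return true;
}
```

A message that is a bare string, or whose `content` is a number, makes the function return `false` instead of failing, which is the behavior the review comments ask for on unexpected request shapes.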

Comment thread src/llm/visual_language_model/legacy/servable.cpp Outdated
Comment thread src/llm/visual_language_model/continuous_batching/servable.cpp Outdated
Comment thread src/llm/visual_language_model/continuous_batching/servable.cpp Outdated