Fix for split UTF-8 characters, based on GenAI solution #4284

Open
przepeck wants to merge 1 commit into servables_refactor_phase1 from servable_refactor_utf8_fix

@przepeck

Collaborator

🛠 Summary

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

@przepeck przepeck requested a review from dkalinowski June 11, 2026 08:51

// 1. Newline flush: emit everything and reset.
if (!text.empty() && text.back() == '\n' && text.size() > m_printed_len) {
    const auto status = flush_chunk(text, text.size(), ov::genai::GenerationFinishReason::NONE);

Collaborator

In GenAI the output is cut at this point: only the part after m_printed_len is emitted, not the whole text:

res << std::string_view{text.data() + m_printed_len, text.size() - m_printed_len};

https://github.com/openvinotoolkit/openvino.genai/blob/master/src/cpp/src/text_streamer.cpp#L36
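To illustrate the point of this comment, here is a minimal sketch of emitting only the not-yet-printed tail of the decoded text. The helper name `unprinted_suffix` and its bounds check are assumptions made for illustration; they appear in neither the PR nor GenAI. Only the `text.data() + m_printed_len` slicing comes from the line quoted above.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of the PR or GenAI): returns the part of
// `text` that has not been emitted yet, given how many bytes were printed.
std::string_view unprinted_suffix(std::string_view text, std::size_t printed_len) {
    // Guard against a stale offset that points past the end of the text.
    if (printed_len >= text.size()) {
        return {};
    }
    // Same slicing as the quoted GenAI line: skip the already printed prefix.
    return std::string_view{text.data() + printed_len, text.size() - printed_len};
}
```

With a helper like this, a newline flush would pass on only the new tail, so a chunk that was already streamed is not sent twice.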

const size_t n = m_decoded_lengths.size();
if (n >= 2 && m_decoded_lengths[n - 1] == m_decoded_lengths[n - 2]) {
    const size_t text_size = text.size();
    char replacement[] = "\xef\xbf\xbd";

Collaborator

Please create an is_incomplete function, just like in GenAI; it will be easier for us to debug later.
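A possible shape for such a helper is sketched below. The function name comes from this comment and the replacement bytes from the diff above. The signature and body are assumptions, namely that "incomplete" means the decoded text ends with the UTF-8 encoding of U+FFFD. This is not a copy of the GenAI implementation.

```cpp
#include <string_view>

// Sketch only: the signature and the exact check are assumptions.
// Returns true when the decoded text ends with U+FFFD (bytes EF BF BD),
// which detokenizers typically emit for a partially decoded character.
bool is_incomplete(std::string_view text) {
    constexpr std::string_view replacement = "\xef\xbf\xbd";
    return text.size() >= replacement.size() &&
           text.compare(text.size() - replacement.size(), replacement.size(), replacement) == 0;
}
```

Keeping the check in one named function gives a single place to set a breakpoint or add logging when a split character slips through.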

if (n < DELAY_N_TOKENS) {
    return ov::genai::StreamingStatus::RUNNING;
}

Collaborator

Don't we also need compute_decoded_length_for_position from GenAI?
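For context, here is one possible reading of what a helper with this name might do: lazily decode the token prefix up to a given position and record either its byte length or a marker that the text ends mid-character. Everything except the function name is an assumption: the sentinel values, the `Decoder` callback standing in for the tokenizer, and the free-function signature. It should not be taken as the GenAI code.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Assumed sentinel values, chosen for this sketch only.
constexpr std::int64_t NOT_COMPUTED = -2;  // position has not been decoded yet
constexpr std::int64_t INCOMPLETE = -1;    // decoded text ends with U+FFFD

// Hypothetical stand-in for the tokenizer's decode call.
using Decoder = std::function<std::string(const std::vector<std::int64_t>&)>;

// True when the text ends with the UTF-8 encoding of U+FFFD.
bool ends_with_replacement(std::string_view text) {
    constexpr std::string_view replacement = "\xef\xbf\xbd";
    return text.size() >= replacement.size() &&
           text.compare(text.size() - replacement.size(), replacement.size(), replacement) == 0;
}

// Decodes tokens [0, position] once and caches the resulting byte length,
// or INCOMPLETE when the prefix ends in the middle of a character.
void compute_decoded_length_for_position(const std::vector<std::int64_t>& tokens_cache,
                                         std::vector<std::int64_t>& decoded_lengths,
                                         std::size_t position,
                                         const Decoder& decode) {
    if (position >= tokens_cache.size() || position >= decoded_lengths.size()) {
        return;  // nothing to compute for an out-of-range position
    }
    if (decoded_lengths[position] != NOT_COMPUTED) {
        return;  // already decoded earlier, keep the cached value
    }
    const auto end = tokens_cache.begin() + static_cast<std::ptrdiff_t>(position) + 1;
    const std::vector<std::int64_t> prefix(tokens_cache.begin(), end);
    const std::string text = decode(prefix);
    decoded_lengths[position] =
        ends_with_replacement(text) ? INCOMPLETE : static_cast<std::int64_t>(text.size());
}
```

Under these assumptions, a delayed streamer could look up the cached length for position `n - DELAY_N_TOKENS` and print only up to that offset, skipping positions marked `INCOMPLETE`.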

2 participants