fix(HuggingFaceLocalGenerator): remove stop_words cross-product in reply post-processing #11502

Open
alvinttang wants to merge 1 commit into deepset-ai:main from alvinttang:fix/hf-local-generator-stop-words-cross-product

@alvinttang

Refs #11409.

HuggingFaceLocalGenerator.run post-processes replies with a nested list comprehension:

replies = [reply.replace(stop_word, '').rstrip()
           for reply in replies
           for stop_word in self.stop_words]

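That's a cross-product. With N replies and M stop words it emits N*M replies, one per (reply, stop word) pair, and each entry has only that one stop word removed. A reply that contains more than one stop word never comes out fully cleaned, and the extra entries silently still contain stop words.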
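A minimal, self-contained reproduction of that behavior on plain strings, without loading the generator. The sample replies and stop words are made up for illustration and are not taken from the test suite.

```python
# Hypothetical inputs, chosen only to make the cross-product visible.
replies = ["Hello STOP world END", "Bye END"]
stop_words = ["STOP", "END"]

# Same shape as the nested comprehension quoted above:
# one output entry per (reply, stop_word) pair.
buggy = [reply.replace(stop_word, "").rstrip()
         for reply in replies
         for stop_word in stop_words]

print(len(replies), "replies in,", len(buggy), "replies out")
for entry in buggy:
    print(repr(entry))
```

Two replies and two stop words give four entries, and the first reply never appears with both `STOP` and `END` removed.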

The chat sibling at chat/hugging_face_local.py:660 already does this correctly with a sequential loop, so this PR aligns the non-chat path with the same pattern.
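Roughly, the sequential pattern looks like the sketch below. `strip_stop_words` is a hypothetical standalone helper used here for illustration; it is not the literal diff and not a copy of the chat component's code.

```python
def strip_stop_words(replies: list[str], stop_words: list[str]) -> list[str]:
    """Remove every stop word from every reply, keeping one output per input."""
    cleaned = []
    for reply in replies:
        # Apply the stop words one after another to the same string,
        # so each removal builds on the previous one.
        for stop_word in stop_words:
            reply = reply.replace(stop_word, "").rstrip()
        cleaned.append(reply)
    return cleaned


# Hypothetical inputs, for illustration only.
result = strip_stop_words(["Hello STOP world END", "Bye END"], ["STOP", "END"])
print(result)
```

The output has the same length as the input, and removing a stop word from the middle of a reply leaves the surrounding whitespace in place, since only trailing whitespace is stripped.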

RED

Two new regression tests fail on main:

FAILED test_run_stop_words_removal_with_multiple_stop_words
  assert {'replies': [...4 entries...]} == {'replies': ['Paris is the capital.', 'France is in Europe.']}

FAILED test_run_stop_words_removal_all_stop_words_removed_from_each_reply
  assert {'replies': ['Hello STOP world']} == {'replies': ['Hello  world']}

The original test_run_stop_words_removal (single stop word) keeps passing.

GREEN

test_run_stop_words_removal                                              PASSED
test_run_stop_words_removal_with_multiple_stop_words                     PASSED
test_run_stop_words_removal_all_stop_words_removed_from_each_reply       PASSED

Full file: 28 passed, 1 deselected (integration, model download), 6 warnings in 140.60s. No regression elsewhere.

…ply post-processing

With N replies and M stop_words, the previous nested comprehension
produced N*M replies instead of N. Each of them had only one stop
word removed, because each iteration stripped a single stop word
from the original reply.

Switching to a sequential loop (already what the chat sibling at
chat/hugging_face_local.py:660 does) keeps the count at N and removes
every stop word from every reply.

Refs deepset-ai#11409
@alvinttang alvinttang requested a review from a team as a code owner June 4, 2026 00:37
@alvinttang alvinttang requested review from anakin87 and removed request for a team June 4, 2026 00:37
@vercel

vercel Bot commented Jun 4, 2026

Someone is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


alvinttang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Member

@anakin87 anakin87 left a comment

Thank you for this PR.

Please sign the CLA, then ping me and I'll proceed with the actual review.
