fix(HuggingFaceLocalGenerator): remove stop_words cross-product in reply post-processing by alvinttang · Pull Request #11502 · deepset-ai/haystack

alvinttang · 2026-06-04T00:37:06Z

HuggingFaceLocalGenerator.run post-processes replies with a nested list comprehension:

replies = [reply.replace(stop_word, '').rstrip()
           for reply in replies
           for stop_word in self.stop_words]

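That's a cross-product. With N replies and M stop words it emits N*M replies, and each one has only a single stop word stripped, so any reply that contains a different stop word leaks it into the output. With two stop words, for example, half of the output can still contain a stop word.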
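A standalone reproduction of the quoted comprehension, outside Haystack, makes the N*M growth visible. The `replies` and `stop_words` values below are illustrative and are not the PR's test fixtures.

```python
# Standalone reproduction of the nested comprehension quoted above.
# Inputs are illustrative only, not the fixtures used in the PR's tests.
replies = ["Paris is the capital.<END>", "France is in Europe.###"]
stop_words = ["<END>", "###"]

# Buggy pattern: the two `for` clauses form a cross-product, so every reply
# is emitted once per stop word, with only that one stop word removed.
buggy = [reply.replace(stop_word, "").rstrip()
         for reply in replies
         for stop_word in stop_words]

print(len(buggy))  # → 4 (N * M = 2 * 2)
print(buggy)
# → ['Paris is the capital.', 'Paris is the capital.<END>',
#    'France is in Europe.###', 'France is in Europe.']
```

Two of the four outputs still carry a stop word, because each was produced by removing the *other* one.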

The chat sibling at chat/hugging_face_local.py:660 already does this correctly with a sequential loop, so this PR aligns the non-chat path with the same pattern.
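A minimal sketch of the sequential pattern, written as a hypothetical standalone helper `strip_stop_words` (not a Haystack API, and not a verbatim copy of the chat generator's code). Each stop word is applied to the same string in turn, so removals accumulate and the output keeps exactly one entry per reply.

```python
from typing import List


def strip_stop_words(replies: List[str], stop_words: List[str]) -> List[str]:
    """Hypothetical helper: remove every stop word from every reply, one output per reply."""
    cleaned = []
    for reply in replies:
        # Reassign `reply` on each pass so every stop word is removed from the same string.
        for stop_word in stop_words:
            reply = reply.replace(stop_word, "")
        # Trim trailing whitespace once, after all removals.
        cleaned.append(reply.rstrip())
    return cleaned


print(strip_stop_words(["Paris is the capital.<END>", "France is in Europe.###"], ["<END>", "###"]))
# → ['Paris is the capital.', 'France is in Europe.']
print(strip_stop_words(["Hello STOP world END"], ["STOP", "END"]))
# → ['Hello  world']
```

Calling `.rstrip()` once after all removals is a choice made in this sketch; the original comprehension called it after each single replacement. Inner whitespace is left alone either way, which is why `'Hello  world'` keeps its double space.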

RED

Two new regression tests, run on main before the fix:

FAILED test_run_stop_words_removal_with_multiple_stop_words
  assert {'replies': [...4 entries...]} == {'replies': ['Paris is the capital.', 'France is in Europe.']}

FAILED test_run_stop_words_removal_all_stop_words_removed_from_each_reply
  assert {'replies': ['Hello STOP world']} == {'replies': ['Hello  world']}

The original test_run_stop_words_removal (single stop word) keeps passing.
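The two new tests pin down two invariants: one output per input reply, and no stop word left in any output. A hedged sketch of equivalent checks as pytest-discoverable functions, run against a hypothetical standalone helper rather than `HuggingFaceLocalGenerator.run` itself; the inputs are illustrative and are not the PR's fixtures.

```python
# Hypothetical standalone helper mirroring the sequential-loop fix (not a Haystack API).
def strip_stop_words(replies, stop_words):
    out = []
    for reply in replies:
        for stop_word in stop_words:
            reply = reply.replace(stop_word, "")
        out.append(reply.rstrip())
    return out


def test_multiple_stop_words_keep_reply_count():
    replies = ["Paris is the capital.<END>", "France is in Europe.###"]
    result = strip_stop_words(replies, ["<END>", "###"])
    # One output per input reply, not len(replies) * len(stop_words).
    assert len(result) == len(replies)
    assert result == ["Paris is the capital.", "France is in Europe."]


def test_all_stop_words_removed_from_each_reply():
    stop_words = ["STOP", "END"]
    result = strip_stop_words(["Hello STOP world END"], stop_words)
    # Inner whitespace is untouched; only trailing whitespace is stripped.
    assert result == ["Hello  world"]
    assert not any(sw in r for r in result for sw in stop_words)
```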

GREEN

test_run_stop_words_removal                                              PASSED
test_run_stop_words_removal_with_multiple_stop_words                     PASSED
test_run_stop_words_removal_all_stop_words_removed_from_each_reply       PASSED

Full file: 28 passed, 1 deselected (integration, model download), 6 warnings in 140.60s. No regression elsewhere.

…ply post-processing

With N replies and M stop_words, the previous nested comprehension produced N*M replies instead of N, and each of them had only one stop word stripped, so replies containing any other stop word kept it. Switching to a sequential loop (already what the chat sibling at chat/hugging_face_local.py:660 does) keeps the count at N and removes every stop word from every reply.

Refs deepset-ai#11409

vercel · 2026-06-04T00:37:11Z

Someone is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant · 2026-06-04T00:37:14Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

alvinttang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

anakin87

Thank you for this PR.

Please sign the CLA, then ping me and I'll proceed with the actual review.

alvinttang requested a review from a team as a code owner June 4, 2026 00:37

alvinttang requested review from anakin87 and removed request for a team June 4, 2026 00:37

github-actions Bot added the topic:tests label Jun 4, 2026

anakin87 requested changes Jun 4, 2026

alvinttang wants to merge 1 commit into deepset-ai:main from alvinttang:fix/hf-local-generator-stop-words-cross-product
