fix: prevent N×M reply explosion in HuggingFaceLocalGenerator with multiple stop_words by NIK-TIGER-BILL · Pull Request #11413 · deepset-ai/haystack

NIK-TIGER-BILL · 2026-05-26T23:09:44Z

Related Issues

fixes bug: HuggingFaceLocalGenerator returns N×M replies instead of N when stop_words has multiple entries #11409

Proposed Changes:

The list comprehension in HuggingFaceLocalGenerator.run() used two for clauses to remove stop words from replies:

replies = [reply.replace(stop_word, "").rstrip() for reply in replies for stop_word in self.stop_words]

This creates a cross-product: with N replies and M stop words, the output contains N×M entries instead of N. Each entry has only one stop word removed, so entries can still contain the other stop words, and downstream components receive an unexpected number of replies.
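
The cross-product is easy to reproduce outside Haystack. The following is a minimal standalone sketch that applies the same double-`for` comprehension to plain lists; the reply strings and stop words are made up for illustration and are not taken from the library or its tests.

```python
# Hypothetical sample data; not taken from Haystack or its test suite.
replies = ["Hello world STOP", "Goodbye END"]
stop_words = ["STOP", "END"]

# Same shape as the buggy comprehension: the two `for` clauses iterate over
# every (reply, stop_word) pair, so the result has N * M entries.
buggy = [reply.replace(stop_word, "").rstrip() for reply in replies for stop_word in stop_words]

print(len(buggy))  # 4 entries for 2 replies and 2 stop words
print(buggy)       # ['Hello world', 'Hello world STOP', 'Goodbye END', 'Goodbye']
```

Each input reply appears once per stop word, and each copy has had only that single stop word removed.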

The fix replaces the comprehension with an explicit outer loop over stop_words, matching the already-correct implementation in HuggingFaceChatLocalGenerator (chat/hugging_face_local.py, line 654):

for stop_word in self.stop_words:
    replies = [reply.replace(stop_word, "").rstrip() for reply in replies]
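
As a rough illustration of why the sequential form preserves the reply count, here is a standalone sketch of the same loop wrapped in a helper. The function name `strip_stop_words` and the sample data are hypothetical and do not exist in Haystack.

```python
def strip_stop_words(replies, stop_words):
    """Remove every stop word from every reply, keeping one output per input.

    Hypothetical helper for illustration only; not part of Haystack.
    """
    for stop_word in stop_words:
        # Each pass rewrites the whole list, so its length never changes.
        replies = [reply.replace(stop_word, "").rstrip() for reply in replies]
    return replies


fixed = strip_stop_words(["Hello world STOP", "Goodbye END"], ["STOP", "END"])
print(fixed)  # ['Hello world', 'Goodbye']
```

Because the outer loop reassigns `replies` on every pass, each stop word is removed from the already-cleaned list rather than from a fresh copy of the original replies.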

How did you test it?

Added a new unit test test_run_multiple_stop_words_removal that mocks a pipeline returning 2 replies with 2 stop words configured.
Before the fix the test would produce 4 replies; after the fix it correctly returns 2 replies with both stop words stripped.
Verified the exact logic with a standalone Python snippet.
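
A test along these lines could look like the sketch below. It is not the test added in this PR: it assumes the mocked pipeline returns a list of dicts with a `generated_text` key, and `run_with_stop_words` is a hypothetical stand-in for the generator's post-processing step so that the example runs without Haystack installed.

```python
from unittest.mock import MagicMock


def run_with_stop_words(pipeline, prompt, stop_words):
    # Hypothetical stand-in for the generator's post-processing; not Haystack code.
    # Assumes the pipeline returns dicts with a "generated_text" key.
    output = pipeline(prompt)
    replies = [o["generated_text"] for o in output]
    for stop_word in stop_words:
        replies = [reply.replace(stop_word, "").rstrip() for reply in replies]
    return {"replies": replies}


def test_multiple_stop_words_removal():
    # The mock returns 2 generations, each ending in a different stop word.
    fake_pipeline = MagicMock(return_value=[
        {"generated_text": "Paris is the capital. STOP"},
        {"generated_text": "Berlin is the capital. END"},
    ])
    result = run_with_stop_words(fake_pipeline, "irrelevant prompt", ["STOP", "END"])

    # 2 replies in, 2 replies out, with both stop words stripped.
    assert len(result["replies"]) == 2
    assert result["replies"] == ["Paris is the capital.", "Berlin is the capital."]


test_multiple_stop_words_removal()
```

Asserting on the length as well as the content catches the cross-product regression directly, even if a future change happens to strip the stop words correctly.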

Notes for the reviewer

This is a minimal, surgical change. The sibling chat generator already uses the same sequential approach, so this only brings the non-chat generator in line with the existing pattern.

Checklist

I have read the contributors guidelines and the code of conduct.
I have updated the related issue with new insights and changes.
I have added unit tests and updated the docstrings.
I have used one of the conventional commit types for my PR title: fix: ....
I have documented my code.
I have added a release note file, following the contributors guidelines.
I have run pre-commit hooks and fixed any issue.

This PR was fully generated with an AI assistant. I have reviewed the changes and run the relevant tests.

vercel · 2026-05-26T23:09:50Z

Someone is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant · 2026-05-26T23:13:21Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

NIK-TIGER-BILL seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

julian-risch · 2026-05-27T06:52:22Z

@NIK-TIGER-BILL Thank you for opening this pull request. Would you please agree to our CLA? Otherwise we can't merge this pull request. #11413 (comment)

NIK-TIGER-BILL · 2026-05-27T07:09:55Z

Done!
https://cla-assistant.io/deepset-ai/haystack - "You have agreed to the CLA for deepset-ai/haystack"

julian-risch · 2026-05-29T12:12:27Z

@NIK-TIGER-BILL Your commit in this pull request appears not to be linked to your user account. Could you please fix that? Here are instructions: https://docs.github.com/en/pull-requests/committing-changes-to-your-project/troubleshooting-commits/why-are-my-commits-linked-to-the-wrong-user#commits-are-not-linked-to-any-user

NIK-TIGER-BILL · 2026-05-29T23:04:06Z

@julian-risch The commit author is already set to NIK-TIGER-BILL <nik.tiger.bill@github.com>. Could the issue be that this email address needs to be verified in my GitHub account settings? I have amended the commits with this email previously. Please let me know if there is anything else I should adjust.

julian-risch · 2026-06-01T07:01:59Z

@NIK-TIGER-BILL Unfortunately, the problem with the CLA is not solved yet in this PR. What you did in this other PR #11385 solved the problem there. Did you force push already?

git config user.email "new.email@example.com"
git commit --amend --author="Your Name <new.email@example.com>" --no-edit
git push --force-with-lease

NIK-TIGER-BILL · 2026-06-01T23:04:32Z

@julian-risch Done — I rebased the branch on latest main, kept the original fix, and force-pushed. The commit author is set to NIK-TIGER-BILL <nik.tiger.bill@github.com>. Let me know if the CLA check passes now.

NIK-TIGER-BILL requested a review from a team as a code owner May 26, 2026 23:09

NIK-TIGER-BILL requested review from bogdankostic and removed request for a team May 26, 2026 23:09

github-actions Bot added the topic:tests label May 26, 2026

sjrl requested a review from julian-risch May 27, 2026 06:06

sachinn854 mentioned this pull request May 27, 2026

bug: HuggingFaceLocalGenerator returns N×M replies instead of N when stop_words has multiple entries #11409

Open

1 task

julian-risch mentioned this pull request May 29, 2026

Fix duplicated replies when HuggingFaceLocalGenerator uses multiple stop words #11414

Closed

fix: prevent N×M reply explosion in HuggingFaceLocalGenerator with multiple stop_words

Signed-off-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>

49c4c0f

NIK-TIGER-BILL force-pushed the fix-hf-local-stop-words-cross-product branch from fa48350 to 49c4c0f June 1, 2026 23:04

NIK-TIGER-BILL wants to merge 1 commit into deepset-ai:main from NIK-TIGER-BILL:fix-hf-local-stop-words-cross-product