Fix duplicated replies when HuggingFaceLocalGenerator uses multiple stop words #11414

Closed
18062706139fcz wants to merge 1 commit into deepset-ai:main from 18062706139fcz:fix-11409-hf-local-generator-stop-words

Conversation

@18062706139fcz

Summary

Fixes #11409.

HuggingFaceLocalGenerator currently duplicates replies when multiple stop_words are configured. This PR updates the stop word post-processing so each stop word is removed sequentially from the existing replies instead of producing a cross-product of replies and stop words.

Problem

When stop_words contains more than one entry, the generator returns N×M replies instead of N, where N is the number of generated replies and M is the number of stop words.

Root Cause

The current implementation uses a list comprehension with two for clauses, which nests the loop over self.stop_words inside the loop over replies:

[reply.replace(stop_word, "").rstrip() for reply in replies for stop_word in self.stop_words]

That creates one output entry per (reply, stop_word) pair, which duplicates replies. Each entry also has only one of the stop words removed.
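
The effect can be reproduced outside the component. The following is a minimal sketch, assuming two sample replies and two stop words; the sample strings and the stand-alone stop_words variable are illustrative and not taken from the Haystack code or test suite.

```python
# Illustrative inputs (assumptions): in the component these would be the
# generated replies and self.stop_words.
replies = ["Berlin is the capital. STOP", "Paris is the capital. END"]
stop_words = ["STOP", "END"]

# Same shape as the comprehension quoted above: two `for` clauses produce
# one entry per (reply, stop_word) pair.
buggy = [reply.replace(stop_word, "").rstrip() for reply in replies for stop_word in stop_words]

print(len(replies), len(buggy))  # 2 4
for entry in buggy:
    print(repr(entry))
```

With these inputs the list grows from 2 to 4 entries, and two of the four still contain a stop word, because each entry had only one stop word removed.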

Fix

Apply stop words sequentially to the current reply list:

  • iterate over self.stop_words
  • rewrite replies in place for each stop word
  • preserve the original number of replies while still stripping all configured stop words
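
The steps above could look roughly like the sketch below. strip_stop_words is a hypothetical stand-alone helper written for illustration; the actual change in this PR lives inside HuggingFaceLocalGenerator and may differ in detail.

```python
from typing import List


def strip_stop_words(replies: List[str], stop_words: List[str]) -> List[str]:
    """Hypothetical helper: remove every stop word from every reply.

    The outer loop walks the stop words; each pass rebuilds the reply list
    with that one stop word removed, so the list length never changes.
    """
    for stop_word in stop_words:
        replies = [reply.replace(stop_word, "").rstrip() for reply in replies]
    return replies


cleaned = strip_stop_words(
    ["Berlin is the capital. STOP", "Paris is the capital. END"],
    ["STOP", "END"],
)
print(cleaned)  # ['Berlin is the capital.', 'Paris is the capital.']
```

This sketch rebinds the local name to a new list on each pass rather than mutating the caller's list. Either approach keeps one output per reply.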

Tests

Ran:

hatch run test:unit test/components/generators/test_hugging_face_local_generator.py

Result:

  • 25 passed
  • 3 deselected

Added a regression test that verifies multiple stop words do not duplicate replies.

18062706139fcz requested a review from a team as a code owner May 27, 2026 03:34
18062706139fcz requested review from julian-risch and removed request for a team May 27, 2026 03:34
@vercel

vercel Bot commented May 27, 2026

Someone is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


ryker seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@julian-risch
Member

@18062706139fcz Thank you for opening this pull request. We are closing this PR as a duplicate of #11413.
If you would like to contribute with another PR, please make sure to agree to our CLA (see #11414 (comment)) and make sure that your commits are linked to your account: https://help.github.com/articles/why-are-my-commits-linked-to-the-wrong-user/#commits-are-not-linked-to-any-user

Labels

topic:tests, type:documentation (Improvements on the docs)

Development

Successfully merging this pull request may close these issues.

bug: HuggingFaceLocalGenerator returns N×M replies instead of N when stop_words has multiple entries

3 participants