fix(bedrock): wrap user text/image in guardContent to prevent tool result false positives#1886
Open
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
Conversation
…sult false positives When guardrails are enabled, tool results (role='user') containing text like 'You are Test Admin User.' can trigger false-positive prompt injection detections because the guardrail treats them as user input. The fix wraps ALL user text/image content blocks in guardContent when guardrails are enabled (not just when guardrail_latest_message=True). This signals the guardrail to evaluate ONLY those blocks, excluding tool results from scanning. Behavior change: - guardrail_latest_message=True: unchanged (only latest user text wrapped) - guardrail_latest_message=False (default): all user text/image wrapped, tool results excluded from guardrail scanning Closes strands-agents#1671
9c5cdb1 to
6114956
Compare
Contributor
Author
|
Friendly ping — wraps user text/image content in |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #1671
When Bedrock guardrails are enabled, tool results stored with
role: "user"can trigger false-positive prompt injection detections. For example, a tool returning"You are Test Admin User."gets flagged as a prompt injection attack on subsequent messages.Root Cause
Without
guardrail_latest_message=True, no content blocks are wrapped inguardContent. The Bedrock guardrail then evaluates all message content — including tool results that happen to haverole: "user"— leading to false positives on system-generated content.Even with the existing
_find_last_user_text_message_indexfix (PR #1658), this only activates whenguardrail_latest_message=True. The default behavior (False) leaves tool results exposed to guardrail scanning.Fix
When guardrails are enabled (
guardrail_id+guardrail_versionare set), all usertextandimagecontent blocks are wrapped inguardContent. This signals the guardrail to evaluate only those blocks, excluding tool results (which containtoolResultblocks, nottext/image) from scanning.Behavior
guardrail_latest_messageTrueFalse(default)Tests
test_format_request_guardrail_default_wraps_all_user_text— verifies all user text is wrapped when guardrails enabledtest_format_request_guardrail_default_excludes_tool_results— reproduces the exact scenario from [BUG] Bedrock Guardrail False Positive on Tool Results #1671test_format_request_no_guardrail_no_wrapping— verifies no wrapping without guardrails