Skip to content

fix(bedrock): wrap user text/image in guardContent to prevent tool result false positives#1886

Open
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/guardrail-tool-result-false-positive
Open

fix(bedrock): wrap user text/image in guardContent to prevent tool result false positives#1886
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/guardrail-tool-result-false-positive

Conversation

@giulio-leone
Copy link
Contributor

Summary

Fixes #1671

When Bedrock guardrails are enabled, tool results stored with role: "user" can trigger false-positive prompt injection detections. For example, a tool returning "You are Test Admin User." gets flagged as a prompt injection attack on subsequent messages.

Root Cause

Without guardrail_latest_message=True, no content blocks are wrapped in guardContent. The Bedrock guardrail then evaluates all message content — including tool results that happen to have role: "user" — leading to false positives on system-generated content.

Even with the existing _find_last_user_text_message_index fix (PR #1658), this only activates when guardrail_latest_message=True. The default behavior (False) leaves tool results exposed to guardrail scanning.

Fix

When guardrails are enabled (guardrail_id + guardrail_version are set), all user text and image content blocks are wrapped in guardContent. This signals the guardrail to evaluate only those blocks, excluding tool results (which contain toolResult blocks, not text/image) from scanning.

Behavior

guardrail_latest_message Before After
True Only latest user text wrapped Unchanged
False (default) No wrapping — guardrail scans everything All user text/image wrapped — tool results excluded

Tests

  • Added test_format_request_guardrail_default_wraps_all_user_text — verifies all user text is wrapped when guardrails enabled
  • Added test_format_request_guardrail_default_excludes_tool_results — reproduces the exact scenario from [BUG] Bedrock Guardrail False Positive on Tool Results #1671
  • Added test_format_request_no_guardrail_no_wrapping — verifies no wrapping without guardrails
  • Updated 2 existing config tests to reflect new wrapping behavior
  • All 128 bedrock tests pass

…sult false positives

When guardrails are enabled, tool results (role='user') containing text like
'You are Test Admin User.' can trigger false-positive prompt injection
detections because the guardrail treats them as user input.

The fix wraps ALL user text/image content blocks in guardContent when
guardrails are enabled (not just when guardrail_latest_message=True).
This signals the guardrail to evaluate ONLY those blocks, excluding
tool results from scanning.

Behavior change:
- guardrail_latest_message=True: unchanged (only latest user text wrapped)
- guardrail_latest_message=False (default): all user text/image wrapped,
  tool results excluded from guardrail scanning

Closes strands-agents#1671
@giulio-leone giulio-leone force-pushed the fix/guardrail-tool-result-false-positive branch from 9c5cdb1 to 6114956 Compare March 15, 2026 16:12
@github-actions github-actions bot added size/m and removed size/m labels Mar 15, 2026
@giulio-leone
Copy link
Contributor Author

Friendly ping — wraps user text/image content in guardContent format for Bedrock, preventing guardrail checks from treating tool results as user input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Bedrock Guardrail False Positive on Tool Results

1 participant