Skip to content

fix: guardrail redact targets last user message, not trailing LTM context#1884

Open
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/guardrail-redact-last-user-message
Open

fix: guardrail redact targets last user message, not trailing LTM context#1884
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/guardrail-redact-last-user-message

Conversation

@giulio-leone
Copy link
Contributor

Issue

Closes #1639

Problem

When guardrail redaction is enabled (guardrail_redact_input=True) together with a long-term memory (LTM) session manager like AgentCoreMemorySessionManager, the redact logic incorrectly modifies the LTM context message instead of the user's input.

The LTM session manager appends an assistant message after the user turn:

messages[0]: {role: 'user',      content: [{text: 'Tell me something bad'}]}      ← should be redacted
messages[1]: {role: 'assistant', content: [{text: '<user_context>...</user_context>'}]}  ← was being redacted

The redact handler used self.messages[-1], which blindly picked the last message regardless of role.

Root Cause

In agent.py, the guardrail redaction code assumed self.messages[-1] is always the user's input:

self.messages[-1]['content'] = self._redact_user_content(
    self.messages[-1]['content'], ...
)

With LTM enabled, messages[-1] is the assistant's context message, not the user's input.

Solution

Replaced self.messages[-1] with a reverse search for the last message with role == 'user':

last_user_msg = next(
    (m for m in reversed(self.messages) if m['role'] == 'user'),
    None,
)

This matches the pattern already used by _find_last_user_text_message_index() in the Bedrock model for guardrail_latest_message wrapping.

Testing

  • Added test_agent_redacts_user_message_not_ltm_context: Simulates the LTM scenario with a trailing assistant context message, verifies the user message is redacted and the LTM context is preserved
  • All 8 guardrail-related tests pass
  • All 113 agent tests pass

Changes

  • src/strands/agent/agent.py: Changed guardrail redact handler to find last user-role message
  • tests/strands/agent/test_agent.py: Added test for LTM + guardrail interaction

When long-term memory (LTM) session managers like
AgentCoreMemorySessionManager append an assistant message containing
user context after the user turn, the guardrail redaction logic
incorrectly redacted the LTM context instead of the actual user input.

Root cause: the redact handler used `self.messages[-1]` which assumes
the last message is the user's input.  With LTM enabled, the message
list looks like:

  [0] user: 'Tell me something bad'       ← should be redacted
  [1] assistant: '<user_context>...</user_context>'  ← was being redacted

The fix replaces `self.messages[-1]` with a reverse search for the
last message with `role == 'user'`, matching the pattern already used
by `_find_last_user_text_message_index()` in the Bedrock model for
guardrail_latest_message wrapping.

Closes strands-agents#1639
@giulio-leone giulio-leone force-pushed the fix/guardrail-redact-last-user-message branch from 44b6bb3 to ce2e12f Compare March 15, 2026 16:12
@github-actions github-actions bot removed the size/s label Mar 15, 2026
@giulio-leone
Copy link
Contributor Author

Friendly ping — fixes guardrail redaction to target the actual last user message instead of trailing long-term memory context, which was causing false positive redactions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] guardrail_redact_input override ltm_msg instead of the last user message

1 participant