Add untrusted annotation to proposed mitigations by johannhof · Pull Request #137 · webmachinelearning/webmcp

johannhof · 2026-03-11T21:15:39Z

No description provided.

johannhof · 2026-03-11T21:16:26Z

victorhuangwq · 2026-03-11T21:20:59Z

docs/security-privacy-considerations.md

+
+**Threats addressed:** Prompt Injection Attacks (Output Injection Attacks)
+
+**How:** A boolean `contains_untrusted_content: true` annotation that acts as a signal to the client that the payload requires heightened security handling, allowing the client to sanitize the payload, use indicators such as spotlighting to highlight untrustworthy content to the model, or hide that part of the response entirely.
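
As a rough illustration of the three client-side options named in the diff above (sanitize, spotlight, hide), a client could handle the annotation along these lines. This is only a sketch under stated assumptions: the `contains_untrusted_content` field comes from the proposal, but the `ToolResponsePart` shape, the per-part placement of the annotation, the `UntrustedPolicy` type, the delimiter strings, and all function names are hypothetical and not part of WebMCP.

```typescript
// Hypothetical response shape: only `contains_untrusted_content` is from the proposal.
interface ToolResponsePart {
  text: string;
  contains_untrusted_content?: boolean;
}

// Hypothetical client policy covering the three options named in the proposal.
type UntrustedPolicy = "sanitize" | "spotlight" | "hide";

// Illustrative only, not a complete sanitizer: drops tag-like sequences
// and non-printing control characters.
function sanitize(text: string): string {
  return text
    .replace(/<[^>]*>/g, "")
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "");
}

// Wraps the text in delimiters (made up for this sketch) so the model can be
// told to treat everything between them as data rather than instructions.
function spotlight(text: string): string {
  return `<<UNTRUSTED_CONTENT>>\n${text}\n<<END_UNTRUSTED_CONTENT>>`;
}

// Applies the chosen policy to annotated parts; unannotated parts pass through.
function prepareForModel(
  parts: ToolResponsePart[],
  policy: UntrustedPolicy,
): string[] {
  const out: string[] = [];
  for (const part of parts) {
    if (part.contains_untrusted_content !== true) {
      out.push(part.text);
      continue;
    }
    switch (policy) {
      case "sanitize":
        out.push(sanitize(part.text));
        break;
      case "spotlight":
        out.push(spotlight(part.text));
        break;
      case "hide":
        // Omit the untrusted part from the model input entirely.
        break;
    }
  }
  return out;
}
```

Treating a missing annotation as trusted is a choice made for this sketch only; a stricter client might default to the opposite.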


I would prefer this to be part of `ToolAnnotations`, but I think we can discuss that in the thread.

victorhuangwq

LGTM

johannhof · 2026-03-12T14:26:03Z

@domfarolino PTAL

Add untrusted annotation to proposed mitigations

416bad9

johannhof force-pushed the add-untrusted-annotation-mitigation branch from 18bfb9c to 416bad9 · March 11, 2026 21:16

victorhuangwq reviewed Mar 11, 2026

victorhuangwq approved these changes Mar 11, 2026

johannhof mentioned this pull request Mar 11, 2026

Proposal: Untrusted Annotation for Tool Responses #136

Open

johannhof wants to merge 1 commit into webmachinelearning:main from johannhof:add-untrusted-annotation-mitigation
