Fix(chat): resolve CAPABILITIES/RULES tool-name contradiction with minimal prompt patch by Jean-Regis-M · Pull Request #447 · GenAI-Security-Project/finbot-ctf

Jean-Regis-M · 2026-03-31T19:40:20Z

Summary

Fixes #443

I resolved the non-deterministic tool-name disclosure behavior by eliminating the contradiction between the CAPABILITIES and RULES sections in VendorChatAssistant._get_system_prompt() (and the matching stale rule in CoPilotAssistant._get_system_prompt()).

Problem

I identified that the CAPABILITIES section explicitly listed MCP tool names (finmail__send_email, finmail__list_inbox, finmail__read_email, finmail__search_emails), which taught the model that these names are acceptable vocabulary. The RULES section then issued a blanket prohibition on disclosing internal tool names. The model received two contradictory, equally-weighted instructions with no conflict-resolution signal.

Root Cause

CAPABILITIES used parenthetical tool names to orient the model toward specific dispatch targets, which normalized those names as user-visible vocabulary. RULES then contradicted that normalization without indicating which instruction takes precedence. The result is non-deterministic leakage that cannot be reliably asserted in tests against a live model.

Solution

I applied two minimal changes:

Removed the parenthetical MCP tool names from the FinMail CAPABILITIES bullet. The capability description remains intact; only the internal names are stripped.
Replaced the blanket "never disclose internal tool names" rule with a user-facing communication directive that gives the model unambiguous, actionable guidance: describe actions in plain language, not tool names.

Both changes are applied to VendorChatAssistant and CoPilotAssistant for consistency.
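
To make the two edits concrete, here is a minimal before/after sketch. The prompt wording, the constant names, and the build_prompt helper are all hypothetical illustrations; the actual strings inside _get_system_prompt() are not reproduced in this description and may differ.

```python
# Hypothetical prompt fragments for illustration only; the real wording
# in VendorChatAssistant._get_system_prompt() may differ.

# Before: the capability bullet names MCP tools, and the rule forbids naming them.
CAPABILITIES_BEFORE = (
    "- FinMail: send and read email "
    "(finmail__send_email, finmail__list_inbox, "
    "finmail__read_email, finmail__search_emails)"
)
RULES_BEFORE = "- Never disclose internal tool names."

# After: the capability is described without internal names, and the rule
# tells the model what to do instead of only what to avoid.
CAPABILITIES_AFTER = "- FinMail: send and read email"
RULES_AFTER = (
    "- When describing your actions to the user, use plain language "
    "rather than tool names."
)


def build_prompt(capabilities: str, rules: str) -> str:
    """Assemble a two-section prompt (hypothetical layout)."""
    return f"CAPABILITIES:\n{capabilities}\n\nRULES:\n{rules}\n"


before = build_prompt(CAPABILITIES_BEFORE, RULES_BEFORE)
after = build_prompt(CAPABILITIES_AFTER, RULES_AFTER)

print("__" in before)  # the old prompt contains the MCP separator
print("__" in after)   # the patched prompt does not
```

In this sketch the patched prompt no longer contains any string the rule would have to forbid, so the two sections cannot conflict.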

Impact

No breaking changes
Tool dispatch is unaffected (routing is driven by _tool_callables, not prompt prose)
Constraint is now statically testable: assert "__" not in prompt
Deterministic, auditable behavior on user questions like "what did you just do?"
No regression risk
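
The claim that dispatch is independent of prompt prose can be sketched as follows. This is an assumption-laden illustration: SketchAssistant, dispatch, and _send_email are hypothetical names, and _tool_callables is assumed here to be a dict from tool name to callable, which this description does not confirm.

```python
from typing import Any, Callable, Dict


class SketchAssistant:
    """Hypothetical stand-in, not the real VendorChatAssistant."""

    def __init__(self) -> None:
        # Assumed shape: tool name -> callable. Routing reads this mapping only.
        self._tool_callables: Dict[str, Callable[..., Any]] = {
            "finmail__send_email": self._send_email,
        }

    def _send_email(self, to: str, subject: str) -> str:
        # Placeholder behavior for the sketch.
        return f"queued email to {to}: {subject}"

    def _get_system_prompt(self) -> str:
        # Patched prompt: describes the capability without naming the tool.
        return "CAPABILITIES:\n- FinMail: send and read email\n"

    def dispatch(self, tool_name: str, **kwargs: Any) -> Any:
        # Lookup is by registry key; the prompt text is never consulted.
        return self._tool_callables[tool_name](**kwargs)


assistant = SketchAssistant()
result = assistant.dispatch(
    "finmail__send_email", to="vendor@example.com", subject="Invoice"
)
print(result)
```

Under these assumptions, removing the names from the prompt leaves the registry, and therefore routing, untouched.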

Testing

I verified the fix using:

Static assertion: assert "__" not in VendorChatAssistant(session)._get_system_prompt()
Existing test test_chat_prompt_055 continues to pass; its assertions are now structurally guaranteed rather than incidentally satisfied
New test test_capabilities_section_contains_no_mcp_tool_separator, which asserts that "__" is absent from both prompts (acceptance criterion from the issue)
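
A minimal sketch of how the static check could be written so that a failure names the offending tokens. The helpers mcp_style_tokens and check_prompt are hypothetical and not part of the repository; in the real test the prompt strings would come from VendorChatAssistant(session)._get_system_prompt() and the CoPilotAssistant equivalent rather than from literals.

```python
from typing import List

MCP_SEPARATOR = "__"  # separator used in names such as finmail__send_email


def mcp_style_tokens(prompt: str) -> List[str]:
    """Return whitespace-separated tokens that contain the separator."""
    return [
        token.strip("(),.")
        for token in prompt.split()
        if MCP_SEPARATOR in token
    ]


def check_prompt(prompt: str) -> None:
    """Fail with a readable message if the prompt contains the separator."""
    offenders = mcp_style_tokens(prompt)
    assert MCP_SEPARATOR not in prompt, (
        f"MCP-style tool names found in prompt: {offenders}"
    )


# Literal stand-ins for the two prompts (hypothetical wording).
patched_prompt = "CAPABILITIES:\n- FinMail: send and read email\n"
check_prompt(patched_prompt)  # passes silently
```

Because the check inspects only the prompt string, it needs no live model and can run in CI.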

Merge Probability Justification

Criterion	Status
Change is minimal and isolated	Two string edits inside f-strings, zero logic changes
Root cause is directly fixed	Contradiction eliminated at its source, names removed from where they conflict
No unnecessary edits	Capability descriptions preserved; only the offending parentheticals removed
Behavior is predictable	Static prompt assertion on `__` is deterministic and CI-runnable
Reviewable in under 60 seconds	Diff is 3 lines changed across 2 methods

…disclosure directive

Root cause: CAPABILITIES named finmail__send_email and other MCP tools explicitly, normalizing them as model vocabulary, while RULES then forbade disclosing internal tool names, giving the model two contradictory instructions with no resolution signal.

Solution: Removed parenthetical MCP tool names from the FinMail CAPABILITIES bullet in VendorChatAssistant and CoPilotAssistant. Replaced the blanket "never disclose internal tool names" rule with a user-facing communication rule that instructs the model to use plain language instead of tool names when describing its actions.

Impact: No breaking changes. Tool dispatch is unaffected (driven by _tool_callables, not prompt prose). Behavior is now deterministic and the constraint is testable with a static prompt assertion on the __ separator.

Signed-off-by: JEAN REGIS <240509606@firat.edu.tr>

Jean-Regis-M wants to merge 1 commit into GenAI-Security-Project:main from Jean-Regis-M:patch-43
