Trojan Source (CVE-2021-42574) bidirectional control characters in file contents were undetected #40
Open
asadbekXodjayev wants to merge 1 commit into
The static prompt-injection analyzer now flags Unicode bidirectional control characters (U+202A-U+202E, U+2066-U+2069, U+061C) in file contents. These enable Trojan Source attacks (CVE-2021-42574; the related CVE-2021-42694 covers homoglyphs), where source code or text renders differently than it executes, so a reviewer can approve logic the agent/interpreter does not actually run.

This was a real gap: P2 scans only markdown/other files for zero-width characters, and the bidi check in mcp_tool_poisoning inspects only skill *metadata* fields (and omits U+202A/U+202B/U+061C). A bidi-reordered helper.py went undetected by every analyzer. P9 scans every file type, including source code.

- Add P9_PATTERNS + detection loop to static_patterns_prompt_injection.analyze()
- Register P9 explanation/category/name/remediation in pattern_defaults
- Tests: RLO override in .py, RLI isolate in SKILL.md body, and a no-false-positive case on legitimate Arabic (RTL) text
- README: pattern count 64 -> 65; Prompt Injection table adds P9

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: asadbekXodjayev <xadasad67@gmail.com>
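The check described in the commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `BIDI_CONTROLS` and `find_bidi_controls` are hypothetical names, not the PR's actual `P9_PATTERNS` or the body of `analyze()`, and the real analyzer may report findings in a different shape.

```python
import re
import unicodedata

# Code points listed in the commit message: U+202A-U+202E, U+2066-U+2069, U+061C.
# Escapes are used so this file itself contains no literal bidi controls.
BIDI_CONTROLS = re.compile("[\u202a-\u202e\u2066-\u2069\u061c]")


def find_bidi_controls(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for each bidi control in text.

    Hypothetical helper; line and column are 1-based.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in BIDI_CONTROLS.finditer(line):
            name = unicodedata.name(match.group(), "UNKNOWN")
            findings.append((line_no, match.start() + 1, name))
    return findings


# A source line whose comment hides a right-to-left override (U+202E).
sample = "access = 'user'\nif access != 'admin':  # \u202e check\n    pass\n"
for finding in find_bidi_controls(sample):
    print(finding)
```

Matching on an explicit set of control code points, rather than on any right-to-left script, is what lets ordinary Arabic or Hebrew text pass without a finding.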
Problem: Trojan Source (CVE-2021-42574) bidirectional control characters in file contents were undetected. P2 scans only markdown for zero-width characters, and mcp_tool_poisoning's bidi check covers only metadata and omits U+202A, U+202B, and U+061C.

Change: Adds pattern P9 to static_patterns_prompt_injection, scanning every file type (including source code) for the full bidi set. Registers explanation/category/name/remediation in pattern_defaults; updates README (64→65).

Tests: RLO override in .py, RLI isolate in SKILL.md body, plus a no-false-positive case on legitimate Arabic text. ruff check/format clean; new tests pass.

Environment: Authored with Claude Code (Opus 4.8). Verified locally on CPython 3.13 via uv (the repo's make-equivalent: ruff + pytest). DCO sign-off included.
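The three test cases named above might look something like this in pytest style. It is a sketch, not the PR's test file: `has_bidi_control` is a hypothetical stand-in for the analyzer, since the signature of `static_patterns_prompt_injection.analyze()` is not shown here, and the sample strings are invented for illustration.

```python
# Hypothetical stand-in for the analyzer entry point (assumption, not the PR's code).
BIDI_CODE_POINTS = frozenset(
    [*range(0x202A, 0x202F), *range(0x2066, 0x206A), 0x061C]
)


def has_bidi_control(text: str) -> bool:
    """True if text contains any of the bidi control code points above."""
    return any(ord(ch) in BIDI_CODE_POINTS for ch in text)


def test_rlo_override_in_python_source():
    # U+202E (RLO) hidden inside a comment of a .py file.
    source = "is_admin = False  # \u202e hidden reordering\n"
    assert has_bidi_control(source)


def test_rli_isolate_in_skill_md_body():
    # U+2067 (RLI) ... U+2069 (PDI) wrapped around a word in a markdown body.
    body = "# Skill\n\nRun the \u2067setup\u2069 step first.\n"
    assert has_bidi_control(body)


def test_plain_arabic_text_is_not_flagged():
    # Ordinary Arabic letters are right-to-left but are not control characters.
    arabic = "\u0645\u0631\u062d\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645"
    assert not has_bidi_control(arabic)
```

The negative case matters as much as the positive ones: a detector keyed on script direction instead of control code points would flag every legitimate RTL document.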