[submit-attacker] strengthen attacker policy feedback loop by MollyMoriJing · Pull Request #8 · sszz01/logomesh-lambda

MollyMoriJing · 2026-03-30T23:23:17Z

Summary

This PR strengthens the attacker selection loop on top of the latest main.

The goal is to improve attacker-side generalization by making strategy selection more responsive to prior defender behavior, instead of relying mainly on static heuristics.

What Changed

expanded defender response classification into a richer failure taxonomy
- hard refusal
- policy citation
- safe redirect
- structured fallback
- hard filter
- unknown engagement
fed failure type back into the attacker policy loop
- family cooldown / banning
- strategy reranking
- injection-approach reranking
made SearchPolicy participate more directly in final candidate selection
- not just candidate admission
- also affects final direct / injection ranking
improved target modeling for injection attacks
- schema-aware target extraction
- payload-shape routing
- shape-specific generic fallback templates
kept the change attacker-only for this submission path

Validation

env UV_CACHE_DIR=/tmp/uv-cache uv run pytest tests/test_attacker.py -q
env UV_CACHE_DIR=/tmp/uv-cache uv run pytest -q

Local result:

153 passed for attacker tests
321 passed for full test suite

Notes

This branch was rebased onto the latest origin/main first, including the recent attacker update in 7d42c51.

Direct pushes to main are currently blocked by repo rules, so this PR is the clean path to land the prepared [submit-attacker] commit and trigger the attacker submission flow after merge.

[submit-attacker] strengthen attacker policy feedback loop

08fa7c8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[submit-attacker] strengthen attacker policy feedback loop#8

[submit-attacker] strengthen attacker policy feedback loop#8
MollyMoriJing wants to merge 1 commit intomainfrom
codex/submit-attacker-feedback-loop

MollyMoriJing commented Mar 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MollyMoriJing commented Mar 30, 2026

Summary

What Changed

Validation

Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant