[submit-defender] grounding, provenance, single-profile guardrails, streamlined system prompt #10

Open
MollyMoriJing wants to merge 1 commit into main from submit-defender

Conversation

@MollyMoriJing
Collaborator

Changes:

  • Structured guardrails: single-profile mode, provenance-aware rationale synthesis, entity grounding with foreign-entity detection, improved fallback scoring
  • Security pipeline: updated spotlighting format, refined threat analysis
  • Agent: evidence compilation, generation pipeline hardening
  • System template: streamlined security principles for better LLM adherence
  • Framework: run_scenario and scorecard improvements
  • Tests: updated defender execute, guardrails, security, ablation, and scorecard tests

@MollyMoriJing MollyMoriJing changed the title defender: grounding, provenance, single-profile guardrails, streamlined system prompt [submit-attacker] grounding, provenance, single-profile guardrails, streamlined system prompt Mar 31, 2026
@MollyMoriJing MollyMoriJing changed the title [submit-attacker] grounding, provenance, single-profile guardrails, streamlined system prompt [submit-deffender] grounding, provenance, single-profile guardrails, streamlined system prompt Mar 31, 2026
@MollyMoriJing MollyMoriJing requested a review from sszz01 March 31, 2026 06:44
