feat: Add CD001 - OrchestratorAgent Unit Tests (CD001, #27) #461

Open
steadhac wants to merge 2 commits into GenAI-Security-Project:main from steadhac:steadhac/feat/orchestrator-coverage-tests


@steadhac steadhac commented Apr 4, 2026

PR — OrchestratorAgent Unit Tests

Add a comprehensive unit test suite for the OrchestratorAgent — the LLM-powered
workflow coordinator that delegates tasks across 6 specialized agents. Tests cover
initialization, config, system/user prompts, tool definitions, delegation limits,
workflow context propagation, delegate callables, event emission, CTF vulnerabilities,
and edge cases.

Tests follow the established pattern with:

Title / Basically question / Steps / Expected Results

Bug-exposing tests included for each confirmed production defect.
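
To make the docstring pattern concrete, here is a minimal sketch of one test written in that shape. `StubOrchestrator` is a hypothetical stand-in, not the real `OrchestratorAgent`, and the docstring wording is illustrative; only the four section labels and the expected value (from ORCH-INIT-005) come from this PR.

```python
class StubOrchestrator:
    """Hypothetical stand-in; the real OrchestratorAgent is not shown here."""

    def __init__(self) -> None:
        # Value taken from ORCH-INIT-005 in the tables of this PR.
        self.max_delegation_attempts = 2


def test_orch_init_005_max_delegation_attempts() -> None:
    """ORCH-INIT-005

    Title: max_delegation_attempts is 2
    Basically question: Does a fresh orchestrator cap delegation at two attempts?
    Steps:
        1. Construct the orchestrator.
        2. Read max_delegation_attempts.
    Expected Results:
        The attribute equals 2.
    """
    agent = StubOrchestrator()
    assert agent.max_delegation_attempts == 2
```

Keeping the test ID on the first docstring line makes it easy to match a failing test back to the tables in this description.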


📁 Test Files

tests/unit/agents/test_orchestrator.py


TestOrchestratorInit

| Test ID | Title |
| --- | --- |
| ORCH-INIT-001 | _delegation_attempts is an empty dict on init |
| ORCH-INIT-002 | _current_task_data starts as None |
| ORCH-INIT-003 | _workflow_context starts as empty list |
| ORCH-INIT-004 | agent_name is 'orchestrator_agent' |
| ORCH-INIT-005 | max_delegation_attempts is 2 |
| ORCH-INIT-006 | workflow_id stored on init |

TestOrchestratorConfig

| Test ID | Title |
| --- | --- |
| ORCH-CFG-001 | load_config returns custom_goals as None by default |
| ORCH-CFG-002 | max_iterations is 15 |

TestOrchestratorPrompts

| Test ID | Title |
| --- | --- |
| ORCH-PROMPT-001 | System prompt contains all six agents |
| ORCH-PROMPT-002 | System prompt has no CUSTOM GOALS block by default |
| ORCH-PROMPT-003 | System prompt includes custom goals when set |
| ORCH-PROMPT-004 | User prompt returns default when no task data |
| ORCH-PROMPT-005 | User prompt includes task description |
| ORCH-PROMPT-006 | User prompt includes context fields |

TestOrchestratorTools

| Test ID | Title |
| --- | --- |
| ORCH-TOOLS-001 | get_tool_definitions returns exactly 6 tools |
| ORCH-TOOLS-002 | Tool names match expected set |
| ORCH-TOOLS-003 | get_callables returns 6 entries |
| ORCH-TOOLS-004 | All callables are callable |
| ORCH-TOOLS-005 | Communication tool has notification_type enum |

TestDelegationLimit

| Test ID | Title |
| --- | --- |
| ORCH-DELIM-001 | First delegation call returns None |
| ORCH-DELIM-002 | Second delegation call returns None |
| ORCH-DELIM-003 | Third call returns failure result |
| ORCH-DELIM-004 | Counters are per agent key |
| ORCH-DELIM-005 | Attempt count increments correctly |
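
The behaviour these delegation-limit tests pin down can be sketched as follows. This is a simplified model and not the production code: `DelegationLimiter`, its `check` method, and the shape of the failure dict are all assumptions; only the cap of 2 and the "None, None, then failure" sequence come from the test titles.

```python
from typing import Any, Dict, Optional


class DelegationLimiter:
    """Hypothetical stand-in for the orchestrator's per-agent attempt cap."""

    def __init__(self, max_attempts: int = 2) -> None:
        self.max_attempts = max_attempts
        self.attempts: Dict[str, int] = {}

    def check(self, agent_key: str) -> Optional[Dict[str, Any]]:
        """Return None while the agent is under its cap, else a failure result."""
        count = self.attempts.get(agent_key, 0) + 1
        self.attempts[agent_key] = count  # counters are kept per agent key
        if count > self.max_attempts:
            # Assumed result shape; the real failure payload is not shown in this PR.
            return {
                "status": "failed",
                "error": f"delegation limit reached for {agent_key}",
            }
        return None


limiter = DelegationLimiter()
first = limiter.check("invoice")   # under the cap
second = limiter.check("invoice")  # still under the cap
third = limiter.check("invoice")   # over the cap: failure result
other = limiter.check("fraud")     # a different agent key has its own counter
```

Because the counter keeps incrementing after the cap is hit, a capped agent never becomes available again in this model, which matches the intent of ORCH-EDGE-006.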

TestWorkflowContext

| Test ID | Title |
| --- | --- |
| ORCH-CTX-001 | _enrich_with_prior_context returns original when no context |
| ORCH-CTX-002 | _enrich_with_prior_context appends prior agent summaries |
| ORCH-CTX-003 | _capture_agent_context stores summary |
| ORCH-CTX-004 | _capture_agent_context skips empty summary |
| ORCH-CTX-005 | Multiple contexts accumulated correctly |
| ORCH-CTX-006 | _enrich_with_prior_context includes all prior contexts |
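
As a rough illustration of the capture-then-enrich flow these tests exercise, the sketch below models it with a small helper class. `WorkflowContext`, `capture`, `enrich`, and the `"Prior agent context:"` header are hypothetical names chosen for the example; the real methods are `_capture_agent_context` and `_enrich_with_prior_context`, whose exact output format is not reproduced here.

```python
from typing import Any, Dict, List


class WorkflowContext:
    """Hypothetical stand-in for the orchestrator's context bookkeeping."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def capture(self, agent_name: str, result: Dict[str, Any]) -> None:
        """Store the agent's task_summary, skipping a missing key, None, or ''."""
        summary = result.get("task_summary")
        if not summary:
            return
        self.entries.append({"agent": agent_name, "summary": summary})

    def enrich(self, description: str) -> str:
        """Append every captured summary; return the input unchanged if none."""
        if not self.entries:
            return description
        lines = [f"- {e['agent']}: {e['summary']}" for e in self.entries]
        # Placeholder header, not the production wording.
        return description + "\n\nPrior agent context:\n" + "\n".join(lines)


ctx = WorkflowContext()
unchanged = ctx.enrich("Process invoice 42")
ctx.capture("onboarding", {"task_summary": "vendor approved"})
ctx.capture("invoice", {})                      # missing key: skipped
ctx.capture("fraud", {"task_summary": None})    # None: skipped
enriched = ctx.enrich("Process invoice 42")
```

Note that the summaries are concatenated into the next prompt as-is in this model, which is the same property the CTF tests in this PR probe.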

TestDelegateCallables

| Test ID | Title |
| --- | --- |
| ORCH-DEL-001 | delegate_to_onboarding calls runner |
| ORCH-DEL-002 | delegate_to_invoice calls runner |
| ORCH-DEL-003 | delegate_to_fraud calls runner |
| ORCH-DEL-004 | delegate_to_payments appends next_step on success |
| ORCH-DEL-005 | delegate_to_communication passes notification_type |
| ORCH-DEL-006 | Delegation cap blocks third call |
| ORCH-DEL-007 | Delegation captures agent context |
| ORCH-DEL-008 | attachment_file_ids forwarded to invoice agent |
| ORCH-DEL-009 | to_addresses included when provided |

TestEventEmission

| Test ID | Title |
| --- | --- |
| ORCH-EVENT-001 | _emit_delegation_event calls event bus |
| ORCH-EVENT-002 | Emit includes target agent |
| ORCH-EVENT-003 | task_summary truncated to 200 chars |
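
The truncation rule checked by ORCH-EVENT-003 (and its boundary case, ORCH-EDGE-011) can be modelled with a plain slice. In this sketch `RecordingBus`, `emit_delegation_event`, and the event dict keys are assumptions made for illustration; whether the real `_emit_delegation_event` appends an ellipsis or uses a different payload shape is not stated in this PR.

```python
from typing import Any, Dict, List

SUMMARY_LIMIT = 200  # limit taken from ORCH-EVENT-003


class RecordingBus:
    """Hypothetical event bus double that records what was emitted."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def emit(self, event: Dict[str, Any]) -> None:
        self.events.append(event)


def emit_delegation_event(
    bus: RecordingBus, target_agent: str, result: Dict[str, Any]
) -> None:
    """Emit one event, cutting task_summary down to SUMMARY_LIMIT characters."""
    summary = result.get("task_summary") or ""  # tolerate an empty result dict
    bus.emit(
        {
            "target_agent": target_agent,
            # A slice leaves strings of exactly 200 characters untouched.
            "task_summary": summary[:SUMMARY_LIMIT],
        }
    )


bus = RecordingBus()
emit_delegation_event(bus, "fraud", {"task_summary": "x" * 250})
emit_delegation_event(bus, "fraud", {"task_summary": "y" * 200})
emit_delegation_event(bus, "fraud", {})
```

Using a recording double instead of a mock keeps the assertions on the emitted payload itself, not on call bookkeeping.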

TestCtfVulnerability

| Test ID | Title |
| --- | --- |
| ORCH-CTF-001 | Attacker-controlled summary injected verbatim into downstream prompt |
| ORCH-CTF-002 | ⚠️ Whitespace-only summary stored in _workflow_context |
| ORCH-CTF-003 | ⚠️ payment_confirmation next_step injected on failed payment |
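
One plausible cause of the whitespace-only bypass in ORCH-CTF-002 is a bare truthiness check on the summary. That is an assumption, since the production guard is not shown in this PR; the two helper functions below are hypothetical and only contrast the two checks.

```python
from typing import Optional


def should_capture_truthy(summary: Optional[str]) -> bool:
    """Assumed current behaviour: any non-empty string passes, even '   '."""
    return bool(summary)


def should_capture_stripped(summary: Optional[str]) -> bool:
    """Stricter guard: the summary must contain a non-whitespace character."""
    return bool(summary and summary.strip())


whitespace_only = "   \n\t"
passes_truthy = should_capture_truthy(whitespace_only)      # stored
passes_stripped = should_capture_stripped(whitespace_only)  # rejected
```

Both guards agree on `None`, the empty string, and ordinary text; they differ only on whitespace-only input, which is the case the ⚠️ tests target.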

TestEdgeCases

| Test ID | Title |
| --- | --- |
| ORCH-EDGE-001 | _capture_agent_context ignores missing task_summary key |
| ORCH-EDGE-002 | _capture_agent_context skips None task_summary |
| ORCH-EDGE-003 | _enrich_with_prior_context appends context to empty description |
| ORCH-EDGE-004 | Context block contains 'include all directives' header |
| ORCH-EDGE-005 | Empty task_data uses fallback description |
| ORCH-EDGE-006 | Capped agent stays blocked indefinitely |
| ORCH-EDGE-008 | Empty cc_addresses not forwarded |
| ORCH-EDGE-009 | Empty bcc_addresses not forwarded |
| ORCH-EDGE-010 | _emit_delegation_event handles empty result dict |
| ORCH-EDGE-011 | Summary exactly 200 chars not truncated |
| ORCH-EDGE-012 | system_maintenance routes through fraud agent |
| ORCH-EDGE-013 | on_task_completion does not raise |
| ORCH-EDGE-014 | process stores task_data before running |

TestQAFindings

| Test ID | Title |
| --- | --- |
| ORCH-QA-001 | ⚠️ Whitespace-only task_summary should not be captured |
| ORCH-QA-002 | ⚠️ next_step on failed payment misleads LLM |
| ORCH-QA-004 | system_maintenance injects dangerous tool names |
| ORCH-QA-005 | Context header promotes injected context to directives |
| ORCH-QA-006 | Poisoned context propagated to system_maintenance with tool access |
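
The defect behind ORCH-QA-002 (and ORCH-CTF-003) is that a `next_step` hint reaches the LLM even when the payment failed. The sketch below contrasts an unconditional hint with a success-gated one. The function names, the `status` field, and its `"success"` value are assumptions; `payment_confirmation` is taken from the test title and treated here as the hint's value.

```python
from typing import Any, Dict

NEXT_STEP = "payment_confirmation"  # name taken from ORCH-CTF-003


def annotate_unconditional(result: Dict[str, Any]) -> Dict[str, Any]:
    """Assumed buggy behaviour: the hint is added whatever the outcome."""
    out = dict(result)
    out["next_step"] = NEXT_STEP
    return out


def annotate_on_success(result: Dict[str, Any]) -> Dict[str, Any]:
    """Gated behaviour: the hint is added only for a successful payment."""
    out = dict(result)
    if out.get("status") == "success":  # assumed status field and value
        out["next_step"] = NEXT_STEP
    return out


failed = {"status": "failed", "error": "insufficient funds"}
buggy = annotate_unconditional(failed)
gated = annotate_on_success(failed)
```

Both helpers copy the input dict, so a test can compare the annotated result against the untouched original.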

Related Bug Tickets

Bug_202, Bug_203

saikishu and others added 2 commits March 24, 2026 17:41
- 63 tests across 11 groups: ORCH-INIT, ORCH-CFG, ORCH-PROMPT, ORCH-TOOLS,
  ORCH-DELIM, ORCH-CTX, ORCH-DEL, ORCH-EVENT, ORCH-CTF, ORCH-EDGE, ORCH-QA
- 61 passing / 2 failing (ORCH-QA-001, ORCH-QA-002) — real bugs Bug_202, Bug_203
- CTF vulnerability tests confirm 3 known vulnerabilities are present:
  context injection, whitespace summary bypass, unconditional payment_confirmation
@steadhac changed the title from "feat: Add ORCH — OrchestratorAgent Unit Tests (Bug_202, Bug_203)" to "feat: Add CD001 - OrchestratorAgent Unit Tests (CD001, #27)" on Apr 4, 2026