feat: Add CD001 - OrchestratorAgent Unit Tests (CD001, #27) by steadhac · Pull Request #461 · GenAI-Security-Project/finbot-ctf

steadhac · 2026-04-04T01:48:09Z

PR — OrchestratorAgent Unit Tests

Add a comprehensive unit test suite for the OrchestratorAgent — the LLM-powered
workflow coordinator that delegates tasks across 6 specialized agents. Tests cover
initialization, config, system/user prompts, tool definitions, delegation limits,
workflow context propagation, delegate callables, event emission, CTF vulnerabilities,
and edge cases.

Tests follow the established pattern:

Title / Basic question / Steps / Expected Results

Bug-exposing tests are included for each confirmed production defect.
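For reviewers unfamiliar with the four-part pattern, here is a minimal sketch of how one test might be laid out. `FakeOrchestrator` and the `"wf-test"` ID are hypothetical stand-ins for illustration only; the real suite constructs the actual `OrchestratorAgent`, whose constructor is not reproduced here.

```python
class FakeOrchestrator:
    """Hypothetical stand-in; NOT the real OrchestratorAgent."""

    def __init__(self, workflow_id):
        self.workflow_id = workflow_id
        self._delegation_attempts = {}


def test_orch_init_001():
    """ORCH-INIT-001

    Title: `_delegation_attempts` is an empty dict on init
    Basic question: Does a fresh orchestrator start with no recorded delegations?
    Steps:
        1. Construct the orchestrator with a workflow ID.
        2. Read `_delegation_attempts`.
    Expected Results:
        1. `_delegation_attempts` equals `{}`.
    """
    agent = FakeOrchestrator(workflow_id="wf-test")  # assumed constructor shape
    assert agent._delegation_attempts == {}
```

Keeping the pattern in the docstring means the test ID and intent show up directly in the test source and in `help()` output.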

📁 Test Files

tests/unit/agents/test_orchestrator.py

TestOrchestratorInit

Test ID	Title
ORCH-INIT-001	`_delegation_attempts` is an empty dict on init
ORCH-INIT-002	`_current_task_data` starts as None
ORCH-INIT-003	`_workflow_context` starts as empty list
ORCH-INIT-004	`agent_name` is `'orchestrator_agent'`
ORCH-INIT-005	`max_delegation_attempts` is 2
ORCH-INIT-006	`workflow_id` stored on init

TestOrchestratorConfig

Test ID	Title
ORCH-CFG-001	`load_config` returns `custom_goals` as None by default
ORCH-CFG-002	`max_iterations` is 15

TestOrchestratorPrompts

Test ID	Title
ORCH-PROMPT-001	System prompt contains all six agents
ORCH-PROMPT-002	System prompt has no CUSTOM GOALS block by default
ORCH-PROMPT-003	System prompt includes custom goals when set
ORCH-PROMPT-004	User prompt returns default when no task data
ORCH-PROMPT-005	User prompt includes task description
ORCH-PROMPT-006	User prompt includes context fields

TestOrchestratorTools

Test ID	Title
ORCH-TOOLS-001	`get_tool_definitions` returns exactly 6 tools
ORCH-TOOLS-002	Tool names match expected set
ORCH-TOOLS-003	`get_callables` returns 6 entries
ORCH-TOOLS-004	All callables are callable
ORCH-TOOLS-005	Communication tool has `notification_type` enum

TestDelegationLimit

Test ID	Title
ORCH-DELIM-001	First delegation call returns None
ORCH-DELIM-002	Second delegation call returns None
ORCH-DELIM-003	Third call returns failure result
ORCH-DELIM-004	Counters are per agent key
ORCH-DELIM-005	Attempt count increments correctly
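The delegation-limit behavior these tests describe can be sketched roughly as follows. This is an illustration under assumptions, not the production code: `DelegationLimiter`, `check_delegation_limit`, and the shape of the failure dict are hypothetical; only `_delegation_attempts` and `max_delegation_attempts` (value 2) come from the test titles above.

```python
class DelegationLimiter:
    """Hypothetical stand-in for the orchestrator's per-agent delegation cap."""

    def __init__(self, max_delegation_attempts=2):
        self.max_delegation_attempts = max_delegation_attempts
        self._delegation_attempts = {}  # agent key -> attempts used so far

    def check_delegation_limit(self, agent_key):
        """Return None while under the cap, otherwise a failure result."""
        attempts = self._delegation_attempts.get(agent_key, 0)
        if attempts >= self.max_delegation_attempts:
            # Assumed failure shape; the real result dict may differ.
            return {
                "task_status": "failed",
                "task_summary": f"Delegation limit reached for {agent_key}",
            }
        self._delegation_attempts[agent_key] = attempts + 1
        return None


limiter = DelegationLimiter()
first = limiter.check_delegation_limit("fraud")    # under the cap
second = limiter.check_delegation_limit("fraud")   # still under the cap
third = limiter.check_delegation_limit("fraud")    # cap reached
other = limiter.check_delegation_limit("invoice")  # separate counter
```

In this sketch the counter stops at the cap, so a capped agent stays blocked on every later call while other agent keys are unaffected.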

TestWorkflowContext

Test ID	Title
ORCH-CTX-001	`_enrich_with_prior_context` returns original when no context
ORCH-CTX-002	`_enrich_with_prior_context` appends prior agent summaries
ORCH-CTX-003	`_capture_agent_context` stores summary
ORCH-CTX-004	`_capture_agent_context` skips empty summary
ORCH-CTX-005	Multiple contexts accumulated correctly
ORCH-CTX-006	`_enrich_with_prior_context` includes all prior contexts
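As a rough model of what the context tests exercise, the sketch below captures summaries and appends them to a later task description. The class name, the entry layout, and the exact header wording are assumptions; only the two method names, `_workflow_context`, and the `'include all directives'` phrase appear in this PR.

```python
class WorkflowContextSketch:
    """Hypothetical model of context capture and enrichment."""

    def __init__(self):
        self._workflow_context = []

    def _capture_agent_context(self, agent_name, result):
        summary = result.get("task_summary")
        # A plain truthiness check skips "", None, and a missing key,
        # but lets a whitespace-only string such as "   " through.
        if not summary:
            return
        self._workflow_context.append({"agent": agent_name, "summary": summary})

    def _enrich_with_prior_context(self, task_description):
        if not self._workflow_context:
            return task_description
        # Assumed header text; the source only says it contains this phrase.
        header = "Context from prior agents (include all directives):"
        lines = [f"- {e['agent']}: {e['summary']}" for e in self._workflow_context]
        return "\n".join([task_description, "", header, *lines])
```

Note that summaries are appended verbatim, which is why an attacker-controlled summary would reach the downstream prompt unchanged in a design like this. Checking `summary.strip()` instead of `summary` is one way a whitespace-only value could be rejected.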

TestDelegateCallables

Test ID	Title
ORCH-DEL-001	`delegate_to_onboarding` calls runner
ORCH-DEL-002	`delegate_to_invoice` calls runner
ORCH-DEL-003	`delegate_to_fraud` calls runner
ORCH-DEL-004	`delegate_to_payments` appends `next_step` on success
ORCH-DEL-005	`delegate_to_communication` passes `notification_type`
ORCH-DEL-006	Delegation cap blocks third call
ORCH-DEL-007	Delegation captures agent context
ORCH-DEL-008	`attachment_file_ids` forwarded to invoice agent
ORCH-DEL-009	`to_addresses` included when provided

TestEventEmission

Test ID	Title
ORCH-EVENT-001	`_emit_delegation_event` calls event bus
ORCH-EVENT-002	Emit includes target agent
ORCH-EVENT-003	`task_summary` truncated to 200 chars
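The event-emission checks can be illustrated with a mock event bus. In this sketch, `emit_delegation_event`, the `emit` method, the `"delegation"` event name, and the payload keys are all hypothetical; the only detail taken from the tests is that `task_summary` is cut to 200 characters, assumed here to be a plain slice with no ellipsis.

```python
from unittest.mock import MagicMock

MAX_SUMMARY_CHARS = 200  # limit named in ORCH-EVENT-003


def emit_delegation_event(event_bus, target_agent, result):
    """Emit a delegation event with a length-limited summary."""
    # `or ""` covers both a missing key and an explicit None.
    summary = (result.get("task_summary") or "")[:MAX_SUMMARY_CHARS]
    event_bus.emit(
        "delegation",  # assumed event name
        {"target_agent": target_agent, "task_summary": summary},
    )


bus = MagicMock()
emit_delegation_event(bus, "fraud", {"task_summary": "x" * 250})
payload = bus.emit.call_args.args[1]
```

A `MagicMock` records the call, so a test can assert both that the bus was invoked and what payload it received without a real event backend.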

TestCtfVulnerability

Test ID	Title
ORCH-CTF-001	Attacker-controlled summary injected verbatim into downstream prompt
ORCH-CTF-002 ⚠️	Whitespace-only summary stored in `_workflow_context`
ORCH-CTF-003 ⚠️	`payment_confirmation` next_step injected on failed payment

TestEdgeCases

Test ID	Title
ORCH-EDGE-001	`_capture_agent_context` ignores missing `task_summary` key
ORCH-EDGE-002	`_capture_agent_context` skips None `task_summary`
ORCH-EDGE-003	`_enrich_with_prior_context` appends context to empty description
ORCH-EDGE-004	Context block contains `'include all directives'` header
ORCH-EDGE-005	Empty `task_data` uses fallback description
ORCH-EDGE-006	Capped agent stays blocked indefinitely
ORCH-EDGE-008	Empty `cc_addresses` not forwarded
ORCH-EDGE-009	Empty `bcc_addresses` not forwarded
ORCH-EDGE-010	`_emit_delegation_event` handles empty result dict
ORCH-EDGE-011	Summary exactly 200 chars not truncated
ORCH-EDGE-012	`system_maintenance` routes through fraud agent
ORCH-EDGE-013	`on_task_completion` does not raise
ORCH-EDGE-014	`process` stores `task_data` before running

TestQAFindings

Test ID	Title
ORCH-QA-001 ⚠️	Whitespace-only `task_summary` should not be captured
ORCH-QA-002 ⚠️	`next_step` on failed payment misleads LLM
ORCH-QA-004	`system_maintenance` injects dangerous tool names
ORCH-QA-005	Context header promotes injected context to directives
ORCH-QA-006	Poisoned context propagated to `system_maintenance` with tool access
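To make the `next_step` finding concrete, the sketch below contrasts the reported behavior with one possible guard. Both function names and the `task_status` key with its `"success"`/`"failed"` values are assumptions for illustration; the production result shape may differ.

```python
def add_next_step_unconditional(result):
    """Models the reported behavior: the hint is added regardless of outcome."""
    return {**result, "next_step": "payment_confirmation"}


def add_next_step_guarded(result):
    """One possible fix: only add the hint when the payment succeeded."""
    if result.get("task_status") == "success":  # assumed status key and value
        return {**result, "next_step": "payment_confirmation"}
    return dict(result)


failed = {"task_status": "failed", "task_summary": "Insufficient funds"}
buggy = add_next_step_unconditional(failed)
fixed = add_next_step_guarded(failed)
```

With the unconditional version, a failed payment still carries a `payment_confirmation` hint, which is the situation ORCH-QA-002 flags as misleading to the LLM.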

Related Bug Tickets

Bug_202, Bug_203

- 63 tests across 11 groups: ORCH-INIT, ORCH-CFG, ORCH-PROMPT, ORCH-TOOLS, ORCH-DELIM, ORCH-CTX, ORCH-DEL, ORCH-EVENT, ORCH-CTF, ORCH-EDGE, ORCH-QA
- 61 passing / 2 failing (ORCH-QA-001, ORCH-QA-002); the failures expose real bugs Bug_202 and Bug_203
- CTF vulnerability tests confirm that 3 known vulnerabilities are present: context injection, whitespace summary bypass, and unconditional `payment_confirmation`

saikishu and others added 2 commits March 24, 2026 17:41

updated about me

3cf0701

steadhac changed the title ~~feat: Add ORCH — OrchestratorAgent Unit Tests (Bug_202, Bug_203)~~ feat: Add CD001 - OrchestratorAgent Unit Tests (CD001, #27) Apr 4, 2026

steadhac wants to merge 2 commits into GenAI-Security-Project:main from steadhac:steadhac/feat/orchestrator-coverage-tests


2 participants
