Skip to content

Latest commit

 

History

History
244 lines (192 loc) · 7.51 KB

File metadata and controls

244 lines (192 loc) · 7.51 KB

Multi-Agent Orchestration

Coordinating multiple AI agents working on the same codebase.


The Challenge

When multiple AI agents work concurrently:

  • They may modify the same files
  • They don't know about each other's changes
  • Merge conflicts increase
  • Quality inconsistency across agents

Orchestration Patterns

Pattern 1: Queue-Based (Recommended)

Agents process tickets sequentially from a shared queue.

Ticket Queue: [T1] [T2] [T3] [T4] [T5]
                │
                ▼
         ┌─────────────┐
         │ Orchestrator│
         └─────────────┘
                │
    ┌───────────┴───────────┐
    ▼                       ▼
┌─────────┐           ┌─────────┐
│ Agent A │◄──────────│ Agent B │
│ Gets T1 │  Waits    │ Gets T2 │
└─────────┘  for T1   └─────────┘

Advantages:

  • No merge conflicts
  • Predictable behavior
  • Easy to debug

Disadvantages:

  • Limited parallelism
  • Slower throughput

Pattern 2: File-Locking

Agents work in parallel but lock files they're modifying.

┌─────────┐     ┌─────────────────┐     ┌─────────┐
│ Agent A │────▶│   File Locks    │◀────│ Agent B │
│ T1      │     │ auth.py: A      │     │ T2      │
│         │     │ api.py: B       │     │         │
└─────────┘     │ models.py: FREE │     └─────────┘
                └─────────────────┘

Lock Rules:

  1. Agent requests lock on files it needs
  2. If locked by another agent, wait or skip ticket
  3. Release locks on commit
  4. Timeout after 30 minutes

Advantages:

  • Higher parallelism
  • Independent tickets run concurrently

Disadvantages:

  • Lock contention on shared files
  • More complex implementation

Pattern 3: Branch Isolation

Each agent works on a separate branch, human resolves conflicts.

main ────────────────────────────────────▶
       │              │              │
       ▼              ▼              ▼
   agent/T1       agent/T2       agent/T3
       │              │              │
       └──────────────┼──────────────┘
                      ▼
              Human Review Queue
              (resolve conflicts)

Advantages:

  • Maximum parallelism
  • Agents never block each other

Disadvantages:

  • Conflict resolution burden on humans
  • Potential for significant rework

Ticket Assignment

Assignment Criteria

Factor Weight Description
Complexity High Match ticket complexity to agent capability
File overlap High Avoid assigning tickets that touch same files
Dependencies Medium Sequence dependent tickets
Priority Medium High-priority tickets assigned first
Agent load Low Balance work across agents

Assignment Algorithm

FOR each unassigned ticket T:
  1. Check dependencies — if depends on unfinished ticket, skip
  2. Estimate file overlap with in-progress tickets
  3. If overlap > threshold, skip
  4. Find agent with lowest load and matching capability
  5. Assign T to agent, lock estimated files

Escalation Handling

When an agent hits an escalation trigger:

Agent detects escalation trigger
        │
        ▼
┌───────────────────┐
│ Pause execution   │
│ Record context    │
│ Notify orchestrator│
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ Orchestrator adds │
│ to Human Queue    │
│ with priority     │
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ Human responds    │
│ Resolution logged │
└───────────────────┘
        │
        ▼
┌───────────────────┐
│ Agent resumes     │
│ with guidance     │
└───────────────────┘

Escalation Queue Fields

Field Description
ticket_id Original ticket reference
agent_id Which agent is blocked
trigger_type Architecture/Security/Data/etc.
context What agent was doing, what it tried
options Agent's proposed solutions
priority Based on ticket priority + wait time

Failure Handling

Agent Failure Scenarios

Scenario Response
Agent times out Restart agent, reassign ticket
Agent produces invalid code Mark ticket for human review
Agent hits rate limit Pause agent, continue others
Agent escalates repeatedly Flag ticket as complex, require human

Retry Policy

MAX_RETRIES = 3
BACKOFF = [1min, 5min, 15min]

ON failure:
  IF retries < MAX_RETRIES:
    wait BACKOFF[retries]
    retry with fresh context
  ELSE:
    move to human queue
    mark as "agent-failed"

Monitoring

Key Metrics

Metric Target Alert Threshold
Tickets in progress < 2x agent count > 3x agent count
Average completion time < 30 min > 60 min
Escalation rate < 20% > 40%
Conflict rate < 5% > 15%
Agent utilization > 70% < 40%

Orchestrator Dashboard

╔═══════════════════════════════════════════════════════╗
║                 AGENT ORCHESTRATOR                     ║
╠═══════════════════════════════════════════════════════╣
║ Agents: 3/3 active    Tickets: 12 queued, 3 in progress║
╠═══════════════════════════════════════════════════════╣
║ Agent A: T-42 (15 min) ████████░░ 80%                 ║
║ Agent B: T-39 (8 min)  ████░░░░░░ 40%                 ║
║ Agent C: ESCALATED     ⚠️ Waiting for human           ║
╠═══════════════════════════════════════════════════════╣
║ Escalation Queue: 2 pending                           ║
║ Review Queue: 5 MRs ready                             ║
╚═══════════════════════════════════════════════════════╝

Implementation Checklist

  • Ticket queue with priority and dependencies
  • Agent pool with health monitoring
  • File locking mechanism (if using Pattern 2)
  • Escalation queue with notification
  • Retry logic with backoff
  • Metrics collection
  • Dashboard for visibility

Start with Pattern 1 (Queue-Based) and evolve based on actual throughput needs.