Drover is a durable workflow orchestrator for parallel AI agent execution. It coordinates multiple Claude Code instances to complete project tasks while handling failures, dependencies, and resource contention.
Primary Workflow Engine: Drover uses DBOS, a durable execution library for workflows, as its primary orchestration engine. DBOS provides:
- Automatic crash recovery and checkpointing
- Built-in retry logic for failed operations
- Exactly-once execution guarantees
- Queue-based parallel execution with concurrency control
Database Configuration:
- Development: SQLite (default, zero setup)
- Production: PostgreSQL (via DBOS_SYSTEM_DATABASE_URL)
Goals:
- Durability: Never lose progress, survive any failure
- Parallelism: Maximize throughput with concurrent agents
- Correctness: Respect dependencies, avoid conflicts
- Simplicity: Minimal configuration, sensible defaults
- Observability: Clear visibility into progress and issues (OpenTelemetry integration)
Non-goals:
- Real-time collaboration (agents work independently)
- Custom AI models (Claude Code only)
- Distributed execution (single machine, multiple workers)
- IDE integration (CLI-first)
A task is the atomic unit of work. Each task:
- Has a unique ID, title, and optional description
- Belongs to zero or one epic
- Can depend on other tasks (blocked-by relationship)
- Has a priority (higher = more urgent)
- Tracks execution attempts and errors
type Task struct {
	ID          string
	Title       string
	Description string
	EpicID      string
	Priority    int
	Status      TaskStatus // ready, claimed, in_progress, blocked, completed, failed
	Attempts    int
	MaxAttempts int
	LastError   string
}

An epic groups related tasks. Epics provide:
- Logical organization
- Filtered execution (drover run --epic X)
- Progress tracking at the feature level
Tasks can declare dependencies via blocked-by relationships:
Task A ──blocked-by──► Task B
       ──blocked-by──► Task C
Task A remains in blocked status until both B and C are completed. When the last blocker completes, A automatically transitions to ready.
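The unblocking rule can be sketched in a few lines of Go. This is a minimal in-memory illustration, not Drover's implementation: the `unblocked` function and the two maps are hypothetical stand-ins for the dependency and task tables.

```go
package main

import "fmt"

// unblocked reports whether every blocker of task has completed.
// blockedBy maps a task ID to the IDs it waits on; status maps a task ID
// to its current status. Both maps are illustrative stand-ins for the
// dependency and task tables.
func unblocked(task string, blockedBy map[string][]string, status map[string]string) bool {
	for _, blocker := range blockedBy[task] {
		if status[blocker] != "completed" {
			return false // at least one blocker is still outstanding
		}
	}
	return true
}

func main() {
	blockedBy := map[string][]string{"A": {"B", "C"}}
	status := map[string]string{"A": "blocked", "B": "completed", "C": "in_progress"}

	fmt.Println(unblocked("A", blockedBy, status)) // false: C is still running
	status["C"] = "completed"
	fmt.Println(unblocked("A", blockedBy, status)) // true: the last blocker finished
}
```

A task with no blockers is trivially unblocked, which matches tasks starting in ready status.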
A worker is a goroutine that:
- Claims a ready task (atomic operation)
- Creates an isolated git worktree
- Executes Claude Code with the task prompt
- Commits changes and merges to main
- Marks task complete and unblocks dependents
Workers are managed by a DBOS queue with concurrency limits.
┌─────────────────────────────────────────────────────────────┐
│                         CLI (Cobra)                         │
│      drover init | run | add | epic | status | resume       │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                    DBOS Workflow Engine                     │
│  ┌─────────────────┐    ┌─────────────────────────────────┐ │
│  │DroverRunWorkflow│    │       ExecuteTaskWorkflow       │ │
│  │ (orchestrator)  │───►│       (per-task, queued)        │ │
│  └─────────────────┘    └─────────────────────────────────┘ │
│           │                              │                  │
│           │         ┌────────────────────┘                  │
│           │         │                                       │
│           ▼         ▼                                       │
│  ┌───────────────────────────────────────────────────────┐  │
│  │           DBOS Queue (concurrency=workers)            │  │
│  │   Controls parallel execution, handles backpressure   │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│ Git Worktree 1  │   │ Git Worktree 2  │   │ Git Worktree N  │
│  ┌───────────┐  │   │  ┌───────────┐  │   │  ┌───────────┐  │
│  │Claude Code│  │   │  │Claude Code│  │   │  │Claude Code│  │
│  └───────────┘  │   │  └───────────┘  │   │  └───────────┘  │
└─────────────────┘   └─────────────────┘   └─────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                     PostgreSQL / SQLite                     │
│  ┌──────────┐ ┌──────────┐ ┌──────────────┐ ┌────────────┐  │
│  │  tasks   │ │  epics   │ │ dependencies │ │ dbos_state │  │
│  └──────────┘ └──────────┘ └──────────────┘ └────────────┘  │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   ┌──────┐    ┌───────┐    ┌──────┐    ┌──────┐    ┌──────────┐ │
│   │ready │───►│claimed│───►│in_   │───►│compl-│───►│ unblock  │ │
│   │      │    │       │    │progr-│    │eted  │    │dependents│ │
│   └──────┘    └───────┘    │ess   │    └──────┘    └──────────┘ │
│       ▲                    └──┬───┘                     │       │
│       │                       │                         │       │
│       │         ┌─────────────┼─────────────┐           │       │
│       │         ▼             ▼             ▼           │       │
│       │     ┌────────┐  ┌───────────┐   ┌────────┐      │       │
│       │     │blocked │  │  failed   │   │ retry  │      │       │
│       │     │        │  │(max tries)│   │        │      │       │
│       │     └───┬────┘  └───────────┘   └───┬────┘      │       │
│       │         │                           │           │       │
│       ├─────────┴───────────────────────────┘           │       │
│       │                                                 │       │
│       └─────────────────────────────────────────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Decision: Use DBOS Go as the primary workflow orchestration engine.
Rationale:
- Battle-tested durable execution with automatic crash recovery
- Built-in queues with concurrency control
- Exactly-once execution semantics via checkpointing
- SQLite support for local development (zero setup)
- PostgreSQL for production (optional upgrade)
Default Configuration:
- Development: SQLite (.drover/drover.db), no additional setup required
- Production: PostgreSQL via the DBOS_SYSTEM_DATABASE_URL environment variable
Trade-offs:
- Additional dependency (mitigated by SQLite zero-config)
- Learning curve for DBOS patterns (well-documented)
- Production requires PostgreSQL (but dev works with SQLite)
Decision: Each worker operates in its own git worktree.
Rationale:
- Complete filesystem isolation between workers
- No merge conflicts during parallel execution
- Easy cleanup on task completion
- Natural audit trail via git history
Trade-offs:
- Disk space overhead (full working copy per worker)
- Worktree management complexity
- Merge conflicts deferred to completion time
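As a rough sketch of the worktree lifecycle, the Go snippet below builds the arguments for `git worktree add` and `git worktree remove`. The directory layout (`.drover/worktrees`), the `drover/<task-id>` branch naming, and both function names are assumptions made for illustration; the source does not specify them.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// worktreeAddArgs builds the arguments for `git worktree add`, creating a
// new branch for the task from base. The path layout and branch naming are
// assumptions for this sketch.
func worktreeAddArgs(root, taskID, base string) []string {
	path := filepath.Join(root, taskID)
	branch := "drover/" + taskID
	return []string{"worktree", "add", "-b", branch, path, base}
}

// worktreeRemoveArgs builds the arguments for removing the task's worktree
// once the task has finished.
func worktreeRemoveArgs(root, taskID string) []string {
	return []string{"worktree", "remove", filepath.Join(root, taskID)}
}

func main() {
	// Printing rather than executing keeps the sketch free of side effects;
	// the slices can be passed to exec.Command("git", args...) as-is.
	fmt.Println("git", worktreeAddArgs(".drover/worktrees", "task-123", "main"))
	fmt.Println("git", worktreeRemoveArgs(".drover/worktrees", "task-123"))
}
```

Because each task gets its own path and branch, parallel workers never share a working directory; conflicts can only surface when a branch is merged back.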
Decision: Store task state in SQLite by default, with PostgreSQL for production.
Rationale:
- Atomic operations via transactions
- Consistent view across workers
- Native DBOS integration (workflow state + task state in same DB)
- Query flexibility for status and reporting
- Zero setup for local development
Configuration:
- Default: SQLite at .drover/drover.db
- Production: Set DBOS_SYSTEM_DATABASE_URL to a PostgreSQL connection string
Trade-offs:
- State not visible in filesystem (but queryable via CLI)
- Schema migrations required for changes (handled by DBOS)
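The SQLite-by-default rule can be illustrated with a small resolver. This is a sketch under stated assumptions: `databaseTarget` and the backend labels are hypothetical, and treating an empty variable as unset is a choice made here, not something the source specifies.

```go
package main

import (
	"fmt"
	"os"
)

const defaultSQLitePath = ".drover/drover.db"

// databaseTarget reports which backend to use. lookup has the same shape as
// os.LookupEnv so the function can be exercised without touching the real
// environment.
func databaseTarget(lookup func(string) (string, bool)) (backend, location string) {
	if url, ok := lookup("DBOS_SYSTEM_DATABASE_URL"); ok && url != "" {
		return "postgres", url
	}
	return "sqlite", defaultSQLitePath
}

func main() {
	backend, location := databaseTarget(os.LookupEnv)
	fmt.Printf("backend=%s location=%s\n", backend, location)
}
```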
Decision: One DroverRunWorkflow orchestrates all task execution.
Rationale:
- Single point of control for scheduling
- Easier to reason about concurrency
- Natural fit for DBOS durable workflow pattern
- Survives restarts cleanly
Trade-offs:
- Single point of failure (mitigated by DBOS durability)
- Potential bottleneck at high scale
- All state in one workflow context
Decision: Execute Claude Code as a subprocess, not embedded.
Rationale:
- Claude Code is a separate tool with its own runtime
- Clean separation of concerns
- Easy to swap AI backends in future
- No Node.js dependency in Go codebase
Trade-offs:
- Process spawn overhead
- Limited integration (stdout/stderr only)
- Error handling across process boundary
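A minimal sketch of the subprocess approach using Go's `os/exec` follows. The `runAgent` helper and its parameters are hypothetical; the example runs `go version` as a stand-in command so it works without the Claude Code CLI installed.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runAgent starts name with args inside dir and captures both output
// streams. A non-nil error covers spawn failures, non-zero exit codes, and
// the timeout expiring.
func runAgent(timeoutSeconds int, dir, name string, args ...string) (stdout, stderr string, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutSeconds)*time.Second)
	defer cancel()

	cmd := exec.CommandContext(ctx, name, args...)
	cmd.Dir = dir // in Drover's case this would be the task's worktree

	var out, errBuf bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &errBuf

	err = cmd.Run()
	return out.String(), errBuf.String(), err
}

func main() {
	// "go version" is a stand-in so the sketch runs anywhere Go is installed;
	// Drover would invoke the Claude Code CLI with the task prompt instead.
	stdout, stderr, err := runAgent(30, ".", "go", "version")
	fmt.Printf("stdout=%q stderr=%q err=%v\n", stdout, stderr, err)
}
```

The only channels back from the child are its exit status and the two captured streams, which is the limited integration noted in the trade-offs.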
Tasks are claimed atomically using database transactions:
UPDATE tasks
SET status = 'claimed', claimed_at = ?  -- Unix timestamp supplied by the caller
WHERE id = ? AND status = 'ready'

Only one worker can successfully claim a task. A failed claim (0 rows affected) means another worker got there first.
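The claim check can be sketched in Go against the `database/sql` result interface. Everything named here is illustrative: `claimTask`, `raceClaims`, and the in-memory `memDB` are hypothetical, and `memDB` only mimics the conditional UPDATE so the example runs without a database driver. The sketch passes the claim timestamp as a parameter, which is an assumption.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"sync"
)

// execer is the subset of *sql.DB and *sql.Tx that claimTask needs.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

const claimSQL = `UPDATE tasks SET status = 'claimed', claimed_at = ?
WHERE id = ? AND status = 'ready'`

// claimTask reports whether this caller won the claim. Zero affected rows
// means another worker already moved the task out of 'ready'.
func claimTask(ctx context.Context, db execer, taskID string, now int64) (bool, error) {
	res, err := db.ExecContext(ctx, claimSQL, now, taskID)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	return n == 1, err
}

// memDB is an in-memory stand-in that mimics the conditional UPDATE.
type memDB struct {
	mu     sync.Mutex
	status map[string]string
}

type affected int64

func (a affected) LastInsertId() (int64, error) { return 0, nil }
func (a affected) RowsAffected() (int64, error) { return int64(a), nil }

func (m *memDB) ExecContext(_ context.Context, _ string, args ...any) (sql.Result, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	id := args[1].(string) // second placeholder is the task ID
	if m.status[id] != "ready" {
		return affected(0), nil
	}
	m.status[id] = "claimed"
	return affected(1), nil
}

// raceClaims lets several goroutines compete for one task and returns how
// many of them won.
func raceClaims(db execer, taskID string, workers int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	wins := 0
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ok, err := claimTask(context.Background(), db, taskID, 1700000000)
			if err == nil && ok {
				mu.Lock()
				wins++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return wins
}

func main() {
	db := &memDB{status: map[string]string{"task-1": "ready"}}
	fmt.Println("winners:", raceClaims(db, "task-1", 8)) // winners: 1
}
```

With a real database the same `claimTask` function accepts a `*sql.DB`, and the atomicity comes from the database executing the conditional UPDATE as a single statement.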
DBOS queues manage worker concurrency:
queue := dbos.NewWorkflowQueue(ctx, "drover-tasks",
	dbos.QueueConcurrency(config.Workers),
)

The queue ensures:
- At most N tasks execute simultaneously
- Backpressure when all workers busy
- Fair scheduling across enqueued tasks
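To make the first two guarantees concrete, here is a plain-Go sketch of bounded concurrency using a buffered channel as a semaphore. It does not use the DBOS API and is not how DBOS implements its queues; `runBounded` is a hypothetical helper that only illustrates the "at most N" and backpressure behavior.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runBounded executes every task with at most limit running at once and
// returns the highest concurrency it observed.
func runBounded(limit int, tasks []func()) int {
	slots := make(chan struct{}, limit) // one token per worker slot
	var wg sync.WaitGroup
	var mu sync.Mutex
	running, peak := 0, 0

	for _, task := range tasks {
		slots <- struct{}{} // blocks while all slots are taken (backpressure)
		wg.Add(1)
		go func(run func()) {
			defer wg.Done()
			defer func() { <-slots }() // free the slot when the task ends

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			run()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return peak
}

func main() {
	tasks := make([]func(), 10)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(10 * time.Millisecond) }
	}
	fmt.Println("peak concurrency:", runBounded(3, tasks))
}
```

The observed peak never exceeds the limit because a task must hold a slot for its entire run.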
When a task completes:
- Query all tasks blocked by this one
- For each blocked task, count remaining incomplete blockers
- If count = 0, transition to ready
-- Find newly unblocked tasks: those blocked by the completed task (?)
-- that have no incomplete blockers left. Run this after the completed
-- task's status has been updated.
SELECT td.task_id
FROM task_dependencies td
JOIN tasks t ON td.task_id = t.id
WHERE td.blocked_by = ? AND t.status = 'blocked'
  AND NOT EXISTS (
    SELECT 1
    FROM task_dependencies d
    JOIN tasks b ON d.blocked_by = b.id
    WHERE d.task_id = td.task_id AND b.status <> 'completed'
  )

When a task fails:
- Retry: If attempts < max_attempts, re-enqueue
- Fail: If max retries exhausted, mark failed
- Block: If failure indicates external blocker, mark blocked
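The retry, fail, and block outcomes can be expressed as a small decision function. This is a sketch, not Drover's code: `nextStatus` is hypothetical, checking the external blocker before the attempt count is an assumption, and so is modeling "re-enqueue" as a return to ready status.

```go
package main

import "fmt"

// nextStatus picks the status for a task whose latest attempt failed.
// attempts includes the attempt that just failed. Checking the external
// blocker first is an assumption of this sketch.
func nextStatus(attempts, maxAttempts int, externalBlocker bool) string {
	switch {
	case externalBlocker:
		return "blocked"
	case attempts < maxAttempts:
		return "ready" // re-enqueue for another attempt
	default:
		return "failed"
	}
}

func main() {
	fmt.Println(nextStatus(1, 3, false)) // ready
	fmt.Println(nextStatus(3, 3, false)) // failed
	fmt.Println(nextStatus(1, 3, true))  // blocked
}
```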
DBOS handles workflow failures automatically:
- Each step's result is checkpointed when the step completes
- On crash, dbos.Launch() recovers all incomplete workflows
- Workflows resume from the last completed step
- Idempotent steps prevent duplicate side effects
If a worker crashes mid-task:
- Task remains in claimed or in_progress status
- Orchestrator detects stale claims (claimed_at older than the timeout)
- Stale tasks are returned to the ready pool
- Another worker picks up the task
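Stale-claim detection amounts to comparing each claim's age against a timeout. The sketch below assumes Unix-second timestamps and a caller-chosen timeout; the `claim` type and `staleClaims` function are hypothetical.

```go
package main

import "fmt"

// claim is an illustrative view of the fields needed to detect stale work.
type claim struct {
	TaskID    string
	Status    string
	ClaimedAt int64 // Unix seconds
}

// staleClaims returns the IDs of tasks that are still claimed or in
// progress and whose claim is older than timeoutSeconds.
func staleClaims(claims []claim, now, timeoutSeconds int64) []string {
	var stale []string
	for _, c := range claims {
		active := c.Status == "claimed" || c.Status == "in_progress"
		if active && now-c.ClaimedAt > timeoutSeconds {
			stale = append(stale, c.TaskID)
		}
	}
	return stale
}

func main() {
	now := int64(1_700_000_000)
	claims := []claim{
		{"task-1", "in_progress", now - 30},
		{"task-2", "claimed", now - 900},
		{"task-3", "completed", now - 5000},
	}
	fmt.Println(staleClaims(claims, now, 600)) // [task-2]
}
```

Returned IDs would then be reset to ready, after which the normal atomic claim decides which worker picks each one up.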
-- Epics group related tasks
CREATE TABLE epics (
    id          TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    status      TEXT DEFAULT 'open',
    created_at  INTEGER NOT NULL
);

-- Tasks are the unit of work
CREATE TABLE tasks (
    id              TEXT PRIMARY KEY,
    title           TEXT NOT NULL,
    description     TEXT,
    epic_id         TEXT,
    parent_id       TEXT,               -- For hierarchical sub-tasks
    sequence_number INTEGER DEFAULT 0,  -- For ordering within parent
    priority        INT DEFAULT 0,
    status          TEXT DEFAULT 'ready',
    attempts        INT DEFAULT 0,
    max_attempts    INT DEFAULT 3,
    last_error      TEXT,
    claimed_by      TEXT,               -- Worker that claimed this task
    claimed_at      INTEGER,            -- Unix timestamp
    created_at      INTEGER NOT NULL,
    updated_at      INTEGER NOT NULL,
    FOREIGN KEY (epic_id) REFERENCES epics(id),
    FOREIGN KEY (parent_id) REFERENCES tasks(id) ON DELETE CASCADE
);

-- Dependencies define blocked-by relationships
CREATE TABLE task_dependencies (
    task_id    TEXT NOT NULL,
    blocked_by TEXT NOT NULL,
    PRIMARY KEY (task_id, blocked_by),
    FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE,
    FOREIGN KEY (blocked_by) REFERENCES tasks(id) ON DELETE CASCADE
);

-- Worktrees track git worktree lifecycle for cleanup
CREATE TABLE worktrees (
    task_id      TEXT PRIMARY KEY,
    path         TEXT NOT NULL,
    branch       TEXT NOT NULL,
    created_at   INTEGER NOT NULL,
    last_used_at INTEGER NOT NULL,
    status       TEXT DEFAULT 'active',
    disk_size    INTEGER DEFAULT 0,
    FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
);

-- Indexes for common queries
CREATE INDEX idx_tasks_status ON tasks(status);
CREATE INDEX idx_tasks_epic ON tasks(epic_id);
CREATE INDEX idx_tasks_parent ON tasks(parent_id);
CREATE INDEX idx_tasks_parent_seq ON tasks(parent_id, sequence_number);
CREATE INDEX idx_dependencies_blocked_by ON task_dependencies(blocked_by);
CREATE INDEX idx_worktrees_status ON worktrees(status);

-- DBOS manages its own tables for workflow state
-- dbos_workflow_status, dbos_workflow_inputs, etc.

| Workers | Expected Throughput | Database Load |
|---|---|---|
| 1-4 | Low | Minimal |
| 4-8 | Medium | Light |
| 8-16 | High | Moderate |
| 16+ | Very High | Heavy (needs tuning) |

Bottlenecks:
- Database connections: Each worker needs a connection
- Git operations: Worktree creation/merge are I/O heavy
- Claude Code API: Rate limits may apply
- Disk space: Each worktree is a full copy

Optimizations:
- Connection pooling via DBOS
- Lazy worktree creation (on claim, not startup)
- Batch dependency resolution
- Incremental status updates (not full scan)
Drover provides built-in observability via OpenTelemetry:
Traces - Distributed tracing for operations:
- Workflow execution (root span)
- Task execution (child spans per task)
- Agent execution (Claude Code calls)
- Git operations (commit, merge)
- Error tracking with categorized errors
Metrics - Quantitative measurements:
- Counters: tasks claimed/completed/failed
- Histograms: task duration, agent duration
- Gauges: active workers, pending tasks
- Agent-specific: prompts sent, errors by type
Storage: ClickHouse for scalable trace/metric storage
Visualization: Grafana dashboards (included)
Configuration:
export DROVER_OTEL_ENABLED=true            # Enable (default: false)
export DROVER_OTEL_ENDPOINT=localhost:4317 # OTLP collector

drover.workflow.run (root)
├── drover.task.execute
│   ├── drover.agent.execute (claude-code)
│   └── dbos.step (worktree, commit, merge)
└── drover.workflow.metrics
Drover uses OpenTelemetry semantic conventions:
| Attribute | Type | Description |
|---|---|---|
| drover.task.id | string | Task identifier |
| drover.task.title | string | Human-readable title |
| drover.task.state | string | ready/in_progress/completed/failed |
| drover.worker.id | string | Worker identifier |
| drover.agent.type | string | "claude-code" |
| drover.epic.id | string | Epic identifier |
See scripts/telemetry/ for full documentation.
Claude Code executes arbitrary code. Drover does not sandbox this execution. Users should:
- Review generated code before merging to protected branches
- Use separate credentials for Drover execution
- Run in isolated environments for sensitive projects
- Use least-privilege database users
- Enable SSL for Postgres connections
- Rotate credentials regularly
- Drover commits as configured git user
- Consider dedicated git identity for Drover
- Protect main branch with required reviews
Planned enhancements:
- Web UI: Real-time dashboard with task visualization
- Distributed execution: Multiple machines coordinating
- Custom agents: Support for other AI coding tools
- Beads integration: Bidirectional sync with Beads format
- Webhooks: Notifications on task completion/failure
- Metrics: Prometheus/OpenTelemetry integration (implemented)

Current limitations:
- Single machine only (no distributed workers)
- Git-based projects only
- Claude Code dependency
- No real-time collaboration between agents
- Limited conflict resolution (fail-fast on merge conflicts)
Drover incorporates ideas and concepts from several innovative projects:
Beads heavily influenced Drover's task hierarchy design:
- Hierarchical Task IDs: Beads' task-id.subtask format inspired Drover's task-123.1 syntax
- Sub-task decomposition: Breaking complex work into sequential, ordered pieces
- Flat storage with hierarchy: Hierarchical structure without nested complexity
Drover's sub-task system directly implements these patterns while adding durable workflow orchestration.
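As an illustration of how a hierarchical ID such as task-123.1 could map onto flat storage, the sketch below splits an ID into a parent ID and a sequence number. The `parseTaskID` function and the choice to split on the last dot are assumptions; the source does not describe Drover's actual parsing rules.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTaskID splits an ID like "task-123.1" into its parent ID and
// sequence number. IDs without a numeric suffix after the last dot are
// treated as top-level tasks.
func parseTaskID(id string) (parent string, seq int, isSubtask bool) {
	dot := strings.LastIndex(id, ".")
	if dot <= 0 {
		return "", 0, false
	}
	n, err := strconv.Atoi(id[dot+1:])
	if err != nil || n < 0 {
		return "", 0, false
	}
	return id[:dot], n, true
}

func main() {
	for _, id := range []string{"task-123", "task-123.1", "task-123.1.2"} {
		parent, seq, ok := parseTaskID(id)
		fmt.Printf("%s -> parent=%q seq=%d subtask=%v\n", id, parent, seq, ok)
	}
}
```

Each row only needs to store its immediate parent and its position, so arbitrarily deep hierarchies still fit in a single flat table.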
Geoffrey Huntley articulated the "Ralph Wiggum" pattern:
- Delegating repetitive, well-defined tasks to AI agents
- Human provides direction and oversight
- Agents execute routine work at scale
- Focus on completion over perfection
This philosophy is central to Drover's design: break down projects, queue the work, and let agents drive it to completion.
DBOS provides the foundational workflow engine:
- Durable execution with automatic crash recovery
- Exactly-once semantics via checkpointing
- Built-in queues with concurrency control
- SQLite support for zero-config development
The DBOS Go SDK makes these capabilities accessible to Go applications.
Anthropic's Claude Code is the AI execution layer:
- Advanced code understanding and generation
- Tool use for file operations, testing, and more
- Stateful conversation context for complex tasks
- Human-in-the-loop oversight when needed
| Term | Definition |
|---|---|
| Task | Atomic unit of work for an AI agent |
| Epic | Logical grouping of related tasks |
| Worker | Goroutine executing tasks via Claude Code |
| Worktree | Isolated git working directory |
| Claim | Atomic acquisition of a task by a worker |
| Blocker | Task that must complete before another can start |
| Checkpoint | DBOS-persisted workflow state for recovery |
| Step | Individual operation within a workflow |