Docker Secret Operator (DSO)

Runtime secret injection and automatic rotation for Docker Compose — as a Docker CLI Plugin

DSO is a cloud-native infrastructure platform for Docker Compose—not a CLI utility. It handles the complete secret lifecycle: initialization, rotation, audit, and recovery.

What is DSO?

DSO is a runtime secret injection daemon for Docker and Docker Compose. It solves a concrete operational problem: how to rotate secrets in containerized applications safely without exposing them to the host filesystem or Docker's metadata layers.

Key features:

Avoids disk persistence — decrypted secrets exist only in process memory and container tmpfs, never written to host storage
Supports automatic rotation — detects secret changes and refreshes containers with blue-green deployment
Multi-provider — works with Vault, AWS Secrets Manager, Azure Key Vault, or local encrypted storage
Deterministic rollback — failed rotations automatically restore the previous container state
No Swarm/Kubernetes — works with standard docker compose on any machine

Designed for: Teams running Docker Compose in development, CI/CD, and single-host production who need secrets management without adopting Kubernetes or Swarm.

Quick Start

Local Development (5 minutes)

# 1. Install DSO as a Docker plugin
curl -fsSL https://raw.githubusercontent.com/docker-secret-operator/dso/main/scripts/install.sh | sh

# 2. Initialize your local environment
docker dso bootstrap local

# 3. Set a secret
docker dso secret set app/db_password mypassword

# 4. Update docker-compose.yaml to use the secret
# services:
#   postgres:
#     environment:
#       DB_PASSWORD: dso://app/db_password

# 5. Deploy
docker dso compose up

Production Deployment (Systemd)

# 1. Install DSO binary
curl -fsSL https://raw.githubusercontent.com/docker-secret-operator/dso/main/scripts/install.sh | sudo sh

# 2. Initialize agent runtime
# Optional: Add --enable-nonroot to auto-configure current user for non-root CLI access
sudo docker dso bootstrap agent --enable-nonroot

# 3. Configure providers and settings
sudo nano /etc/dso/dso.yaml

# 4. Enable and start the agent
sudo docker dso system enable

# 5. Verify installation
docker dso doctor
docker dso status

# 6. Deploy with docker-compose.yaml
docker compose up

The Problem

Most teams running Docker Compose face a difficult choice: either hardcode secrets in compose files, store them in .env files (which risk accidental commits), or adopt Kubernetes. DSO addresses three concrete issues:

Disk Persistence: Secrets in .env files or environment variables live on disk, risking forensic recovery and accidental exposure.
Metadata Leakage: Any user with Docker socket access can run docker inspect and read all environment variables in plaintext.
Manual Rotation: Rotating a secret requires manual container recreation, coordination, and error-prone scripts.

How DSO Works

DSO injects secrets at container startup and automatically rotates them via blue-green deployment:

Secret Backend (Vault/AWS/Local)
    ↓
DSO Agent detects change
    ↓
Create new container with updated secret
    ↓
Verify health
    ↓
Atomic container swap (rename)
    ↓
Stop old container
    ↓
Rollback on failure (auto-restore)

Result:

Secrets never written to host disk
Rotation is deterministic and reversible
Rotation completes in ~30 seconds with minimal application disruption
Failed rotations automatically restore previous state

Key Capabilities

Capability	Details
Secret Injection	Inject via `dso://` (environment) or `dsofile://` (tmpfs). Resolved at container startup, not during parsing.
Automatic Rotation	Event-driven detection from secret backend. Blue-green deployment with automatic rollback on failure.
Multi-Provider	Vault, AWS Secrets Manager, Azure Key Vault, local encrypted storage, or custom plugins.
Health Verification	Native Docker health checks or custom probes. Configurable timeouts and retries.
Observability	Prometheus metrics, structured JSON logs, runtime status queries.
Crash Recovery	Persisted rotation state enables detection and recovery after agent restarts.

Architecture Overview

DSO is a single-agent daemon that monitors Docker events and secret backends, then orchestrates container rotations via blue-green deployment.

Secret Backends (Vault/AWS/Local)
                ↓
        DSO Agent Process
    ┌────────────────────┐
    │ • Event Watcher    │
    │ • Rotation Engine  │
    │ • Health Checks    │
    │ • State Tracker    │
    │ • Provider Plugins │
    └────────────────────┘
                ↓
        Docker Host Containers

Core design:

Single-agent per host — manages all containers on a Docker host
Event-driven — responds to Docker events and provider webhooks
Blue-green rotation — create new container, verify health, atomic swap
Local state — persists rotation state for crash recovery
Plugin-based providers — isolated subprocesses for each secret backend

For detailed architecture, see docs/architecture.md.

Core Concepts

Secret Resolution: When containers reference dso:// or dsofile://, DSO fetches secrets at container startup (not during parsing). Secrets are resolved from the configured provider, optionally cached, then injected via environment or tmpfs.

Event-Driven Rotation: DSO monitors Docker and provider events. When a secret changes, it's queued, debounced (5-second window), then rotation is triggered.

Blue-Green Deployment: Rotation follows a fixed sequence: create new container with updated secret → verify health → atomic container rename → stop old → rollback on any failure.

Crash Recovery: Rotation state is persisted to disk, enabling detection of incomplete rotations on restart.

For detailed workflows, see docs/architecture.md and docs/runtime.md.

Installation

Prerequisites

Requirement	Details
Docker	Any recent version (20.10+); `docker compose` required for Local Mode
OS	Linux (amd64, arm64) or macOS (amd64, arm64)
Root	Not required for Local Mode; required for Cloud Mode (systemd setup)
Go	Not required — DSO ships as prebuilt binary

Binary Installation

User install (Local Mode only):

curl -fsSL https://raw.githubusercontent.com/docker-secret-operator/dso/main/scripts/install.sh | sh

Global install (Cloud Mode + systemd):

curl -fsSL https://raw.githubusercontent.com/docker-secret-operator/dso/main/scripts/install.sh | sudo sh

Manual install:

# Download for your platform
curl -Lo dso https://github.com/docker-secret-operator/dso/releases/download/v3.4.0/dso-linux-amd64

# Verify checksum (optional but recommended)
echo "..." | sha256sum -c -

# Install to Docker plugin directory
mkdir -p ~/.docker/cli-plugins
chmod +x dso
mv dso ~/.docker/cli-plugins/

# Or install globally
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo mv dso /usr/local/lib/docker/cli-plugins/
sudo chmod +x /usr/local/lib/docker/cli-plugins/dso

Verify Installation

docker dso version
# Docker Secret Operator v3.4.0

docker dso doctor
# ┌─────────────────────────────────────────┐
# │     DSO Diagnostics Report              │
# ├─────────────────────────────────────────┤
# │ ✓ Docker socket                        │
# │   running                              │
# │ - Local vault                          │
# │   not initialized (run: dso bootstrap) │
# ...

Operational Commands

DSO provides a comprehensive set of operational commands following infrastructure platform patterns:

Bootstrap — Initialize Runtime

# Development environment (non-root)
docker dso bootstrap local

# Production agent (requires sudo + systemd)
sudo docker dso bootstrap agent

Creates directory structure, generates configuration, initializes encryption, and validates environment. Run this first after installation.

Doctor — Validate Environment

# Quick health check
docker dso doctor

# Comprehensive validation (system checks)
docker dso doctor --level full

# Machine-readable JSON output
docker dso doctor --json

Validates Docker connectivity, providers, containers, cache, system resources, and permissions. Use this to diagnose setup issues.

Status — Monitor Operations

# Single status check
docker dso status

# Real-time monitoring (refreshes every 2 seconds)
docker dso status --watch

# JSON output for scripting
docker dso status --json

Shows runtime mode, version, uptime, provider health, container status, cache metrics, rotation stats, and queue health.

Config — Manage Configuration

# View configuration file
docker dso config show

# Edit configuration in $EDITOR
docker dso config edit

# Validate configuration for errors
docker dso config validate

Manages the YAML configuration file with syntax validation and smart error reporting.

System — Agent Management

# Show systemd service status
sudo docker dso system status

# Enable and start agent service
sudo docker dso system enable

# Disable and stop agent service
sudo docker dso system disable

# Restart agent service
sudo docker dso system restart

# View agent logs (follow with -f)
docker dso system logs
docker dso system logs -f -n 50

Manages the DSO agent systemd service for production deployments.

Quick Start

Local Mode (5 minutes)

1. Initialize vault

docker dso init
# Created ~/.dso/vault.enc with AES-256-GCM encryption

2. Store a secret

docker dso secret set app/postgres_password
# Enter secret value (will be hidden): ••••••••••

3. Create docker-compose.yaml

version: '3.8'
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD_FILE: dsofile://app/postgres_password
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:

4. Deploy with DSO

docker dso up -d

# What happens:
# 1. DSO intercepts docker compose up
# 2. Resolves dsofile://app/postgres_password from vault
# 3. Mounts tmpfs at /run/secrets/app/postgres_password
# 4. Starts postgres with POSTGRES_PASSWORD_FILE=/run/secrets/app/postgres_password
# 5. postgres reads secret from file (never exposed to docker inspect)

5. Verify

# Secret is NOT in docker inspect
docker inspect postgres | grep POSTGRES_PASSWORD
# (no output)

# But the file exists in the container
docker exec postgres cat /run/secrets/app/postgres_password
# postgres_secret_value

6. Rotate the secret

docker dso secret set app/postgres_password
# Enter new secret value: ••••••••••

# DSO detects the change and automatically:
# 1. Creates new postgres container with new secret
# 2. Waits for it to become healthy
# 3. Atomically swaps container names
# 4. Stops old container
# (All happens in ~30 seconds; stateless workloads see minimal disruption)

Cloud Mode (with Vault)

1. Install globally

curl -fsSL https://raw.githubusercontent.com/docker-secret-operator/dso/main/scripts/install.sh | sudo sh

2. Create configuration

sudo tee /etc/dso/dso.yaml > /dev/null <<EOF
providers:
  vault:
    addr: https://vault.example.com:8200
    auth:
      method: "token"
      token_env: "VAULT_TOKEN"
    mount_path: "secret/data"

agent:
  watch:
    polling_interval: "5m"
    debounce_window: "5s"
  
  cache:
    ttl: "1h"
    max_cache_size: "100Mi"
EOF

3. Setup systemd service

sudo VAULT_TOKEN=s.xxxxxxxxxxxxx docker dso system setup
# Downloads provider plugins
# Creates /etc/systemd/system/dso-agent.service
# Starts dso-agent daemon

4. Create docker-compose.yaml

version: '3.8'
services:
  app:
    image: myapp:latest
    environment:
      DATABASE_URL: dso://vault:database/url
      API_KEY: dso://vault:api/key

5. Deploy

docker compose up -d
# DSO agent fetches secrets from Vault
# Injects into containers
# Rotates when Vault secret changes

Configuration

dso.yaml Schema

# Define secret providers
providers:
  vault:
    addr: "https://vault.example.com:8200"
    auth:
      method: "token"  # token, approle, kubernetes, jwt
      token: "s.xxxxx" # or token_env: "VAULT_TOKEN"
    mount_path: "secret/data"  # KV v2 mount
    tls:
      insecure: false
      ca_cert: "/etc/dso/ca.crt"

  aws:
    region: "us-east-1"
    auth:
      method: "iam"  # iam, sso, static
      # IAM role attached to instance for credentials

  azure:
    vault_url: "https://my-vault.vault.azure.net"
    auth:
      method: "managed_identity"  # managed_identity, client_secret

# Agent runtime configuration
agent:
  # Secret resolution behavior
  watch:
    polling_interval: "5m"  # How often to check if secrets changed
    debounce_window: "5s"   # Batch rapid changes
  
  # Health verification
  health_check:
    timeout: "30s"           # Max time to wait for container healthy
    retries: 3
    interval: "2s"
  
  # Cache behavior
  cache:
    ttl: "1h"                # How long before re-fetching from provider
    max_cache_size: "100Mi"  # Total cache size limit
    max_secret_size: "10Mi"  # Per-secret size limit
  
  # Rotation behavior
  rotation:
    strategy: "restart"      # restart, signal, none
    timeout: "30s"           # Max time for rotation
    rollback_on_failure: true

# Defaults for all containers
defaults:
  rotation:
    strategy: "restart"

Secret Mapping in Compose

services:
  app:
    image: myapp:latest
    environment:
      # Environment variable injection (visible to docker inspect)
      DATABASE_PASSWORD: dso://vault:db/password
      
      # File injection via tmpfs (invisible to docker inspect, preferred)
      DATABASE_PASSWORD_FILE: dsofile://vault:db/password
      
      # Multiple providers
      API_KEY: dso://aws:api/key
      SIGNING_KEY: dso://vault:crypto/signing-key

Provider System

Supported Providers

Provider	Status	Authentication	Rotation Support
Local	✅ GA	File-based encryption	Yes (manual)
Vault	✅ GA	Token, AppRole, Kubernetes, JWT	Yes (lease renewal)
AWS Secrets Manager	✅ GA	IAM role, static credentials	Yes (automatic)
Azure Key Vault	✅ GA	Managed identity, client secret	Yes (automatic)
Custom	✅ Supported	Via plugin interface	Via plugin

Plugin Architecture

Custom providers implement the SecretProvider interface:

type SecretProvider interface {
    // Get secret by name
    GetSecret(ctx context.Context, secretName string) (map[string]string, error)
    
    // Watch for secret changes (webhooks)
    WatchSecret(secretName string, interval time.Duration) (<-chan SecretUpdate, error)
}

Plugins are:

Standalone binaries implementing go-plugin interface
Discovered in /usr/local/lib/dso/plugins/
Executed as isolated subprocesses
Verified via SHA256 hash before execution

Example: Create a custom provider for an internal secrets service

# Plugin binary at /usr/local/lib/dso/plugins/dso-provider-internal
# DSO automatically discovers and loads it

Secret Rotation Lifecycle

From Change Detection to Completion

Time    Event                          State
────    ─────                          ─────
T0      Secret changes at Vault        [original container running]
        ↓
        Provider webhook notifies DSO
        ↓
T1      DSO watcher receives event     [event queued]
        ↓
        Debounce window (5s)
        ↓
T6      Rotation triggered             [lock acquired, shadow creating]
        ↓
T7      Shadow container created       [health check starting]
        with new secret
        ↓
T8-25   Health verification            [waiting for "healthy" status]
        (native health check or timeout)
        ↓
T25     Health confirmed               [original running, shadow healthy]
        ↓
T26     Atomic swap:                   [brief window: both named]
        - original → original-old
        - original-new → original
        ↓
T27     Old container stopped          [only new container remains]
        ↓
T28     Rotation complete              [new container fully active]

Timeline: ~28 seconds for typical rotation

5s debounce
~15-20s health verification (depends on app startup time)
<1s for atomic swap
3s graceful stop of old container

Failure Scenarios

Scenario 1: Health check fails

Shadow container doesn't pass health check within 30s
  ↓
Timeout detected
  ↓
Rollback: Stop shadow, don't perform swap
  ↓
Original container still running
  ↓
Operator reviews logs, fixes issue, retries

Scenario 2: Atomic swap fails

docker rename original → original-old succeeds
docker rename original-new → original FAILS
  ↓
State verification detects: original is running (swap failed)
  ↓
Recovery: Restore original-new back to shadow name
           Stop original-new
  ↓
Rollback: docker rename original-old → original
  ↓
Original container restored

Scenario 3: Agent crashes mid-rotation

Rotation in progress (status: "in_progress" in state tracker)
Agent crashes
  ↓
Agent restarts
  ↓
recoverPendingRotations() runs
  ↓
Detects: rotation > 5 minutes old
  ↓
Mark as "rollback_required" in state tracker
  ↓
Operators see ERROR logs with container IDs
  ↓
Manual recovery: 
    - Inspect state with docker ps
    - Clean up shadow containers
    - Restart rotation if needed

Observability

Prometheus Metrics

DSO exposes metrics on :9090/metrics:

# Rotation metrics
dso_rotation_duration_seconds{secret="db_password",status="success"}
dso_rotation_total{secret="db_password",status="success|failure|rollback"}
dso_rotation_health_check_duration_seconds{secret="db_password"}

# Event metrics
dso_event_queue_size{} = 15
dso_event_queue_drops_total{reason="full"} = 2
dso_event_debounce_batches_total{} = 123

# Cache metrics
dso_cache_size_bytes{} = 1048576
dso_cache_entries{} = 5
dso_cache_hits_total{} = 1500
dso_cache_misses_total{} = 23

# Provider metrics
dso_provider_request_duration_seconds{provider="vault",method="GetSecret"}
dso_provider_errors_total{provider="vault",error="timeout"}

# Container metrics
dso_container_rotation_duration_seconds{container="postgres"}
dso_container_rename_failures_total{}

Logging

Logs are structured JSON for parsing:

{"level":"info","msg":"Rotation started","secret":"db_password","provider":"vault","ts":"2024-01-15T10:30:00Z"}
{"level":"info","msg":"Health check passed","container":"postgres-new","status":"healthy","duration_ms":5200}
{"level":"info","msg":"Atomic swap completed","original":"postgres","old":"postgres-old","new":"postgres"}
{"level":"error","msg":"Health check failed","container":"postgres-new","reason":"unhealthy","timeout_ms":30000}
{"level":"error","msg":"ROTATION ROLLBACK","container":"postgres","original_container":"postgres-old","reason":"health_check_failed"}

Runtime Status

Query agent status:

docker dso status
# Pending Rotations:
#   db_password (started 2m5s ago, status: in_progress)
#
# Cache:
#   Entries: 5
#   Size: 1.2 MB
#   TTL: 1h
#
# Providers:
#   vault: healthy (last_contact: 30s ago)
#   aws: healthy (last_contact: 45s ago)

Security Model

Quick summary: DSO keeps secrets out of persistent storage and Docker metadata, but inherits trust assumptions from Docker and the configured secret provider.

Key design decisions:

Secrets are decrypted only in process memory (agent or container)
No secrets written to host filesystem (except encrypted vault file)
Container secrets injected via tmpfs (invisible to docker inspect)
Agents assume Docker daemon is trusted and secure

What DSO relies on:

Docker daemon isolation (if Docker is compromised, secrets are compromised)
Configured provider security (Vault, AWS, Azure)
Host kernel process isolation
Operator trust (admins have access to secrets)

What DSO does NOT provide:

Multi-tenant isolation (all secrets in single agent process)
Operator audit logs (use provider audit: Vault, AWS CloudTrail)
Secret scanning or compliance checking
Protection against root/privileged attackers (they can inspect agent memory)
Plugin sandboxing (binary verification only)

Detailed threat model: See SECURITY.md and THREAT_MODEL.md.

Operational Considerations

Deployment Model

DSO is designed for single-agent deployments per Docker host:

One dso-agent process per machine
Manages all secrets for all containers on that machine
State is local to machine (no distributed coordination)

Not recommended for:

Multi-agent setups (concurrency issues without consensus layer)
HA deployments (state is local; failover requires manual recovery)
Kubernetes (use ExternalSecrets Operator instead)

Scaling Constraints

Constraint	Limit	Reason
Containers per agent	~500	Event queue depth (2000 events max)
Secrets per agent	~1000	Cache memory (default 100 MB limit)
Concurrent rotations	~10	File-based locking, Docker API limits
Secret size	~10 MB	Per-secret limit; configurable
Event queue depth	2000	Fixed size; drops on overflow

For larger deployments:

Run multiple agents (one per subset of containers)
Partition secrets by agent
Monitor queue drop rate (dso_event_queue_drops_total)

Failure Handling Philosophy

DSO's approach to failure:

Fail safe — Incomplete rotation rolls back automatically; original container remains running
Fail observable — All errors logged at ERROR level with context; no silent failures
Fail manual — Agent crashes during rotation require operator review (marked in state tracker)
Fail recoverable — State tracker enables detection and recovery on restart

Operator responsibility:

Monitor logs for ERROR level messages
Periodically check docker dso status for pending rotations
Clean up orphaned containers if manual recovery required
Review provider connectivity

Network Partition Behavior

If Docker daemon becomes unreachable:

Event stream disconnects
  ↓
Exponential backoff reconnect (max 30s between attempts)
  ↓
Event stream reconnects
  ↓
CRITICAL: Immediate reconcileRuntimeState()
  ├─ List containers
  ├─ Detect dual-running (e.g., postgres + postgres-old both named "postgres")
  ├─ If found: keep newest, stop older
  └─ Detect stale rotation state
     └─ Mark for manual intervention

Dual-running detection: DSO uses naming pattern (e.g., <name>-old, <name>-new) to detect incomplete rotations. On reconnection, it cleans up.

Current Scope & Limitations

DSO is optimized for single-host Docker Compose environments. It is not a replacement for Kubernetes-grade secret management.

Explicit scope:

Single Docker host per agent (no multi-host coordination)
Local state persistence (no distributed consensus)
File-based locking (scales to ~100s of secrets, not 1000s)
Manual recovery after agent crashes during rotation (marked in state tracker)
Docker socket access required (must trust Docker daemon)

Known limitations:

No HA: State is local; failover requires manual recovery
No distributed locking: Multiple agents on same host cause race conditions
No multi-tenant isolation: All secrets decrypted in single agent process
No audit logs: Use provider audit logs (Vault, AWS CloudTrail) for compliance
Reconciliation still evolving: Dual-running detection works but may need manual cleanup in edge cases
Plugin trust: Binary verification only; no plugin sandboxing

Intentional non-goals:

Kubernetes support (use ExternalSecrets Operator)
HSM integration (future, if needed)
Multi-region coordination (out of scope)
Secret scanning / compliance checking (use external tools)

When Not to Use DSO

DSO is not appropriate for:

Use Case	Better Choice	Reason
Kubernetes environments	ExternalSecrets Operator or similar	DSO is Docker Compose–only
Multi-tenant strict isolation	Vault or dedicated HSM	DSO has no multi-tenant controls
Highly regulated environments	Vault with audit trail	DSO has minimal audit capabilities
Distributed systems	Consul or etcd–based secrets	DSO assumes single host
1000s of secrets	Enterprise secret management	DSO's cache/locking assumes 100s of secrets
HA/failover required	Kubernetes or Consul	DSO requires manual recovery

If you're already using: Vault Agent, Sealed Secrets, or Doppler, DSO may duplicate your infrastructure. Evaluate if the Docker Compose–specific focus justifies replacing existing tooling.

Project Structure

dso/
├── cmd/                          # CLI entrypoints
│   ├── dso/                      # Main CLI binary
│   ├── dso-agent/                # Agent daemon (Cloud Mode)
│   └── dso-cli/                  # Operations CLI
│
├── internal/                     # Private packages (not importable by plugins)
│   ├── agent/                    # Agent lifecycle, state, trigger engine
│   │   ├── trigger.go            # Secret rotation orchestration
│   │   ├── state_tracker.go      # Persist rotation state (crash recovery)
│   │   ├── cache.go              # Secret caching with TTL
│   │   └── timeout_controller.go # Per-secret timeout isolation
│   │
│   ├── rotation/                 # Container rotation logic
│   │   ├── rolling_strategy.go   # Blue-green deployment, atomic swap
│   │   ├── health_check.go       # Health verification with retries
│   │   └── lock_manager.go       # Distributed locking (concurrency)
│   │
│   ├── watcher/                  # Event detection
│   │   └── controller.go         # Docker event stream, reconciliation
│   │
│   ├── events/                   # Event queue
│   │   ├── queue.go              # Bounded queue with backpressure
│   │   ├── deduplication.go      # Rapid event coalescing
│   │   └── backpressure.go       # Drop policy when queue full
│   │
│   ├── providers/                # Provider system
│   │   ├── manager.go            # Plugin discovery, lifecycle
│   │   ├── store.go              # Provider registry
│   │   └── supervisor.go         # Plugin restart on crash
│   │
│   └── core/                     # Shared utilities
│       ├── compose.go            # Docker Compose AST handling
│       └── encryption.go         # AES-256-GCM cipher
│
├── pkg/                          # Public packages (importable by plugins)
│   ├── api/                      # SecretProvider interface
│   ├── config/                   # dso.yaml schema
│   └── observability/            # Prometheus metrics
│
├── providers/                    # Built-in provider plugins
│   ├── vault/                    # HashiCorp Vault
│   ├── aws/                      # AWS Secrets Manager
│   ├── azure/                    # Azure Key Vault
│   └── local/                    # Local encrypted vault
│
├── test/                         # Tests
│   ├── integration/              # Docker integration tests
│   ├── unit/                     # Unit tests
│   └── testdata/                 # Test fixtures
│
├── docs/                         # Documentation
├── scripts/                      # Build, install, CI
└── ARCHITECTURE.md               # Deep architecture reference

Key Package Dependencies

Agent tier (orchestration):
  trigger.go → rotation/ + providers/ + watcher/

Rotation tier (execution):
  rolling_strategy.go → health_check.go, lock_manager.go

Provider tier (backends):
  providers/manager.go → plugins (isolated processes)

Event tier (detection):
  watcher/controller.go → events/ (queue, dedup)

Cache tier (optimization):
  agent/cache.go → (no dependencies, standalone)

Development

Local Setup

# Clone repository
git clone https://github.com/docker-secret-operator/dso.git
cd dso

# Install dependencies
go mod download

# Build locally
go build -o dso ./cmd/dso
go build -o dso-agent ./cmd/dso-agent

# Run tests with race detection
go test -race ./...

# Lint
golangci-lint run ./...

Running Tests

# Unit tests only
go test ./internal/... -v

# Integration tests (requires Docker)
go test -tags=integration ./test/integration/... -v

# Race detector (catch concurrency bugs)
go test -race ./...

# Coverage
go test -cover ./...

Key Test Patterns

State Tracker: Verifies persistence of rotation state across restarts Health Check: Tests container restart detection, timeout handling Lock Manager: Tests concurrent access, stale lock cleanup Event Queue: Tests backpressure, deduplication under load Rotation: End-to-end blue-green deployment with health checks

Contributing

Fork repository
Create feature branch: git checkout -b feature/...
Make changes; ensure tests pass: go test -race ./...
Lint: golangci-lint run ./...
Commit with clear message
Push and open PR against main

Code review criteria:

Tests for all new functionality
No race conditions (-race flag must pass)
No silent error swallowing
Backward compatible APIs
Clear log messages for operational visibility

Architecture Deep Dive

Why Event-Driven Rotation?

Alternative approaches and their tradeoffs:

Approach	Latency	Complexity	Reliability
Event-driven (DSO)	~30s	High (event queue, dedup)	High (handles failures well)
Polling only	5-10m	Low	Medium (periodic misses)
Manual (operator)	Hours	Low	Low (error-prone)
Kubernetes operators	Real-time	Very high	High (consensus built-in)

DSO chose event-driven because:

Latency: 30s is acceptable for rotation; faster than polling
Reliability: Complete recovery from network partitions
Operability: Structured events make debugging easier

Why Local State Tracking?

DSO persists rotation state to ~/.dso/state/ because:

Crash detection: Orphaned containers are detectable
Operator visibility: State query shows pending rotations
Manual recovery: Operators can inspect and retry

Tradeoff: Single-agent deployments only. Multi-agent coordination would require distributed consensus (etcd, Consul).

Why File-Based Locking?

DSO uses file-based locks (/var/lib/dso/locks/) because:

Simplicity: No external service required
Visibility: Locks are inspectable files
Stale detection: Can identify and clean up old locks

Tradeoff: Does not scale to 1000+ secrets. For large deployments, would need etcd-based locking.

Roadmap & Future Direction

Near-term (Current Release)

✅ Event-driven rotation
✅ Blue-green deployment with rollback
✅ Multi-provider support (Vault, AWS, Azure)
✅ State recovery after crashes
✅ Prometheus observability

Medium-term (Next releases)

Distributed consensus layer (etcd/Consul) for multi-agent coordination
Automated cleanup of orphaned containers
Additional providers (HashiCorp Boundary, custom HTTP backends)
Health check extensibility (custom scripts, HTTP endpoints)
Metrics-driven rotation (rotate only if changed)

Long-term (Strategic direction)

HA deployments (active-active or active-standby)
Integration with orchestrators (Docker, Podman)
Policy-based rotation (rotate all secrets every X days)
Audit trail integration (ship to external audit log)

Out of Scope

Kubernetes (use ExternalSecrets Operator)
Secrets management backend (use Vault, AWS, Azure)
Secret scanning / compliance checking
Multi-tenant isolation

FAQ

Why not Docker Secrets?

Docker Secrets requires Docker Swarm. If you're on docker compose (most teams), it's not available.

Feature	Docker Secrets	DSO
Works with `docker compose`	❌ No	✅ Yes
Requires Swarm	✅ Yes	❌ No
Automatic rotation	❌ No	✅ Yes
Local dev workflow	❌ No	✅ Yes

Use Docker Secrets if: You're already on Swarm and happy with it.
Use DSO if: You're on docker compose and need rotation.

Does DSO persist secrets to disk?

No. Secrets are:

Decrypted only in agent memory
Injected into container at runtime via tmpfs
Never written to host filesystem
Cleaned up when container stops

Only the encrypted vault file (~/.dso/vault.enc or provider backend) is persisted.

Does DSO support Kubernetes?

No. DSO is for Docker Compose. For Kubernetes, use ExternalSecrets Operator.

How fast is rotation?

~30 seconds typical:

5s debounce
15-20s app startup + health check
1s atomic swap
3s graceful stop

Depends on app startup time. Can be tuned with health_check.timeout in config.

What happens if rotation fails?

Rotation is fully rolled back:

Old container is restored (renamed back)
New container is stopped and removed
Error is logged with container IDs
Operators review logs and retry

No data loss; original container remains active.

What if the agent crashes?

State is persisted to disk. On restart:

Agent reads rotation state file
Detects in-progress rotations (older than 5 minutes)
Marks them as "rollback_required"
Logs ERROR message with container IDs
Operators clean up manually

Prevented by: automatic rollback of incomplete rotations.

Can I run multiple DSO agents?

Not recommended. Each agent assumes it owns the local Docker socket and manages all containers. Multiple agents could:

Race on container renames
Interfere with each other's rotations
Corrupt state

Recommended pattern: One agent per Docker host. For multi-host, run agent on each host independently.

How do I debug a stuck rotation?

# Check agent status
docker dso status

# Check logs
docker logs <agent_container>

# Inspect container state
docker ps -a | grep <secret_name>

# Check state file
cat ~/.dso/state/rotations.json

# For manual recovery:
docker inspect <container_id> | jq '.Name'
docker rm <orphaned_container>

How do I add a custom provider?

Implement the SecretProvider interface in pkg/api/:

type SecretProvider interface {
    GetSecret(ctx context.Context, name string) (map[string]string, error)
    WatchSecret(name string, interval time.Duration) (<-chan SecretUpdate, error)
}

Create binary dso-provider-myservice
Implement go-plugin interface
Place at /usr/local/lib/dso/plugins/dso-provider-myservice
Reference in dso.yaml: provider: myservice

See provider development guide for details.

How do I report a security issue?

Please email security@docker-secret-operator.org with:

Description of vulnerability
Steps to reproduce
Your contact information

Do not open public issues for security vulnerabilities.

Comparison to Alternatives

vs. Doppler

	DSO	Doppler
Requires SaaS	❌ No	✅ Yes
Self-hosted	✅ Yes	❌ No
Docker Compose support	✅ Yes	✅ Yes (CLI)
Automatic rotation	✅ Yes	Manual
Local development	✅ Yes	✅ Yes
Cost	Free (open source)	Paid SaaS

Use DSO if you want self-hosted, automatic rotation, zero cost.

vs. External Secrets Operator

ESO is for Kubernetes; DSO is for Docker Compose.

	DSO	ESO
Kubernetes	❌ No	✅ Yes
Docker Compose	✅ Yes	❌ No
Auto rotation	✅ Yes	✅ Yes
Multi-backend	✅ Yes	✅ Yes

Use ESO if you're on Kubernetes; use DSO if you're on Docker Compose.

vs. Vault Agent

Vault Agent templates secrets into files for consumption. DSO injects at container startup.

	DSO	Vault Agent
No setup required	✅ Local mode	❌ Requires Vault
Auto rotation	✅ Yes	✅ Yes
Works with docker compose	✅ Yes	⚠️ Via template
Operational complexity	Low	Medium

Use DSO for simplicity; use Vault Agent if already running Vault infrastructure.

Support & Community

GitHub Issues: Report bugs
Discussions: Community Q&A
Security: security@docker-secret-operator.org
Documentation: Full docs

Built by the open-source community for Docker Compose teams who need secret rotation without Swarm or Kubernetes.

Troubleshooting

Bootstrap Issues

"Docker socket not accessible"

# Verify Docker is running
docker ps

# Check socket permissions
ls -la /var/run/docker.sock

# Add current user to docker group
sudo usermod -aG docker $USER
newgrp docker

"Vault initialization failed"

# Remove failed state and retry
rm -rf ~/.dso
docker dso bootstrap local

Common Diagnostics

Check environment health:

docker dso doctor --level full

Monitor runtime status:

docker dso status --watch

View agent logs (production):

docker dso system logs -f

Validate configuration:

docker dso config validate

Agent Service Issues

Service won't start:

# Check systemd status
sudo systemctl status dso-agent

# View full service logs
sudo journalctl -u dso-agent -n 100

# Verify permissions
sudo docker dso doctor --level full

Configuration mismatch:

# Edit configuration
sudo nano /etc/dso/dso.yaml

# Validate changes
sudo docker dso config validate

# Restart service
sudo docker dso system restart

Development Modes

Local Development

Use for development and testing on a single machine. Secrets stored in encrypted local vault.

docker dso bootstrap local
docker dso status
docker dso compose up

Characteristics:

Non-root operation
Local encrypted storage
No systemd service
Quick setup (<2 minutes)

Agent Mode (Production)

Use for production deployments on single Docker hosts. Managed via systemd with automatic restarts.

sudo docker dso bootstrap agent
sudo docker dso system enable
docker dso status --watch

Characteristics:

Requires root/sudo
Systemd-managed service
Persistent state tracking
Health checks and monitoring
Automatic restart on failure

Complete Workflow Example

Step 1: Install

# Download and install binary
curl -fsSL https://raw.githubusercontent.com/docker-secret-operator/dso/main/scripts/install.sh | sh

Step 2: Initialize

# Bootstrap local environment
docker dso bootstrap local

Step 3: Validate

# Check everything is working
docker dso doctor
docker dso status

Step 4: Configure

# View current configuration
docker dso config show

# Edit if needed
docker dso config edit

# Validate changes
docker dso config validate

Step 5: Deploy

# Use secrets in docker-compose.yaml
# services:
#   database:
#     environment:
#       PASSWORD: dso://myapp/db_password

# Deploy with DSO
docker dso compose up

Step 6: Monitor

# Watch status in real-time
docker dso status --watch

# View logs
docker dso system logs -f

# Check diagnostics
docker dso doctor

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
.github		.github
cmd		cmd
docs		docs
examples		examples
internal		internal
pkg		pkg
plugin		plugin
scripts		scripts
test		test
.gitignore		.gitignore
.goreleaser.yml		.goreleaser.yml
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
DOCUMENTATION_INDEX.md		DOCUMENTATION_INDEX.md
Dockerfile		Dockerfile
LICENSE		LICENSE
MAINTAINERS.md		MAINTAINERS.md
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
THREAT_MODEL.md		THREAT_MODEL.md
dso		dso
go.mod		go.mod
go.sum		go.sum
run_all_tests.sh		run_all_tests.sh

Folders and files

Latest commit

History

Repository files navigation

Docker Secret Operator (DSO)

What is DSO?

Quick Start

Local Development (5 minutes)

Production Deployment (Systemd)

The Problem

How DSO Works

Key Capabilities

Architecture Overview

Core Concepts

Installation

Prerequisites

Binary Installation

Verify Installation

Operational Commands

Bootstrap — Initialize Runtime

Doctor — Validate Environment

Status — Monitor Operations

Config — Manage Configuration

System — Agent Management

Quick Start

Local Mode (5 minutes)

Cloud Mode (with Vault)

Configuration

dso.yaml Schema

Secret Mapping in Compose

Provider System

Supported Providers

Plugin Architecture

Secret Rotation Lifecycle

From Change Detection to Completion

Failure Scenarios

Observability

Prometheus Metrics

Logging

Runtime Status

Security Model

Operational Considerations

Deployment Model

Scaling Constraints

Failure Handling Philosophy

Network Partition Behavior

Current Scope & Limitations

When Not to Use DSO

Project Structure

Key Package Dependencies

Development

Local Setup

Running Tests

Key Test Patterns

Contributing

Architecture Deep Dive

Why Event-Driven Rotation?

Why Local State Tracking?

Why File-Based Locking?

Roadmap & Future Direction

Near-term (Current Release)

Medium-term (Next releases)

Long-term (Strategic direction)

Out of Scope

FAQ

Why not Docker Secrets?

Does DSO persist secrets to disk?

Does DSO support Kubernetes?

How fast is rotation?

What happens if rotation fails?

What if the agent crashes?

Can I run multiple DSO agents?

How do I debug a stuck rotation?

How do I add a custom provider?

How do I report a security issue?

Comparison to Alternatives

vs. Doppler

vs. External Secrets Operator

vs. Vault Agent

Support & Community

Packages