sind - Slurm in Docker

SPDX-License-Identifier: LGPL-3.0-or-later

https://github.com/GSI-HPC/sind

A CLI tool for running local Slurm clusters using Docker containers, inspired by kind (Kubernetes in Docker).

Prerequisites

  • Linux host with cgroupv2 and nsdelegate mount option (mount -o remount,nsdelegate /sys/fs/cgroup)
  • Docker Engine 28.0+ (required for --security-opt writable-cgroups=true)
  • For clusters with 10+ nodes: fs.inotify.max_user_instances >= 1024 (default 128 is too low)

Supported Versions

  • Slurm 25.11
  • OpenMPI 5.0 (with PMIx 6.x, PRRTE 4.x, UCX 1.20)

Overview

sind creates and manages containerized Slurm clusters for development, testing, and CI/CD workflows. Each node runs as a separate Docker container with systemd as init, providing a realistic multi-node Slurm environment without requiring bare-metal infrastructure.

Operational Model

While the cluster configuration file resembles a Kubernetes manifest, sind is not a reconciling controller. The configuration is a one-shot, one-way input for cluster creation:

  • sind create cluster interprets the manifest once to generate the cluster (via --config FILE or piped to stdin)
  • sind does not continuously watch or reconcile cluster state
  • sind does not automatically repair drift or failures

sind provides commands for inspection (get), modification (create/delete worker), and simulation (power), but these are imperative operations, not declarative state management.

This design is intentional: sind is a development and testing tool that aids the creation of more sophisticated Slurm cluster management tooling, not a production cluster controller.

Container Startup

sind creates cluster resources in a specific order to ensure dependencies are available:

Phase 1: Global Infrastructure

  1. Create sind-mesh network (if not exists)
  2. Start sind-dns container (if not exists)
  3. Create sind-ssh-config volume and generate keypair (if not exists)
  4. Start sind-ssh container (if not exists)

Phase 2: Cluster Resources (concurrent pipelines, no barriers)

  1. Create cluster network
  2. Create config volume → write Slurm configuration
  3. Create munge volume → generate and write munge key
  4. Create data volume (if needed)

Phase 3: Node Containers

  1. Create and start each node container in parallel
  2. Start per-node systemd D-Bus monitor immediately after each container starts
  3. Wait for each node to become ready, accelerated by events

There is no barrier between node creation and readiness probing — each node's goroutine creates its container, starts a systemd monitor, and begins probing in a single pipeline. This allows early-starting nodes to be probed while later nodes are still being created.
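A minimal sketch of this pipeline using golang.org/x/sync/errgroup (the helper names createContainer, startSystemdMonitor, and waitReady are illustrative, not the actual sind internals):

package sketch

import (
    "context"
    "fmt"

    "golang.org/x/sync/errgroup"
)

type node struct{ name string }

// Stubs standing in for the real Docker, systemd-monitor, and probe calls.
func createContainer(ctx context.Context, n node) error      { return nil }
func startSystemdMonitor(ctx context.Context, n node) func() { return func() {} }
func waitReady(ctx context.Context, n node) error            { return nil }

// startNodes runs each node's create/monitor/probe pipeline in its own
// goroutine; there is no barrier between container creation and probing.
func startNodes(ctx context.Context, nodes []node) error {
    g, gctx := errgroup.WithContext(ctx)
    for _, n := range nodes {
        n := n // capture loop variable (needed before Go 1.22)
        g.Go(func() error {
            if err := createContainer(gctx, n); err != nil {
                return fmt.Errorf("create %s: %w", n.name, err)
            }
            stop := startSystemdMonitor(gctx, n) // busctl monitor stream
            defer stop()
            return waitReady(gctx, n) // readiness probes, event-accelerated
        })
    }
    return g.Wait()
}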

Event-Driven Readiness

sind uses two event sources to accelerate readiness detection:

  • Docker events — a single docker events stream watches all cluster containers for start/die events
  • Systemd D-Bus monitors — per-node busctl monitor --watch-bind=yes streams watch for unit state changes (e.g., sshd.service becoming active)

When an event arrives, readiness probes re-evaluate immediately instead of waiting for the next poll tick. If the event sources are unavailable, sind falls back to poll-only mode transparently.
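A minimal sketch of event-accelerated probing (function and channel names are illustrative):

package sketch

import (
    "context"
    "time"
)

// waitFor re-runs check immediately whenever an event arrives and otherwise
// polls on a ticker. A nil events channel blocks forever in select, so the
// function degrades transparently to poll-only mode.
func waitFor(ctx context.Context, events <-chan struct{}, check func() bool) error {
    tick := time.NewTicker(500 * time.Millisecond)
    defer tick.Stop()
    for {
        if check() {
            return nil
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-events: // Docker event or systemd D-Bus unit state change
        case <-tick.C: // fallback poll tick
        }
    }
}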

Readiness Checks

Check               Description
Container running   Docker container in running state
systemd ready       systemctl is-system-running returns running or degraded
sshd listening      Port 22 accepting connections
slurmctld ready     scontrol ping succeeds (controller only)
slurmd ready        slurmd service active (worker only)

If any node fails to become ready within the timeout, sind create cluster fails and reports which nodes/checks failed. Partial clusters are not automatically cleaned up — use sind delete cluster to remove them.
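For illustration, the same checks can be reproduced by hand against nodes of the dev cluster:

docker inspect -f '{{.State.Status}}' sind-dev-worker-0    # container running
docker exec sind-dev-worker-0 systemctl is-system-running  # systemd ready
docker exec sind-dev-controller scontrol ping              # slurmctld ready
docker exec sind-dev-worker-0 systemctl is-active slurmd   # slurmd ready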

Phase 4: Mesh Registration and Slurm (concurrent)

After all nodes are ready, sind runs mesh registration (batch DNS + known_hosts) and Slurm enablement concurrently. This is safe because Slurm uses short hostnames (controller, worker-0) resolved by Docker's embedded DNS on the cluster network. The mesh DNS records (*.cluster.realm.sind) are only used for SSH relay access and host-side resolution.

Design Goals

  • Familiar UX for kind users
  • No root/admin privileges required
  • SELinux compatible
  • Support for both static and dynamic Slurm node configurations

Implementation

sind is written in Go and designed for dual use:

  1. CLI tool - Standalone command-line interface
  2. Go library - Embeddable package for wrapper tools and integrations

The CLI command structure is reflected in the library API, allowing programmatic access to all sind operations.
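A hypothetical embedding sketch: pkg/cluster is a real package, but the function name, options type, and exact import path shown here are assumptions, not the published API.

package main

import (
    "context"
    "log"

    "github.com/GSI-HPC/sind/pkg/cluster" // pkg/cluster exists; exact path assumed
)

func main() {
    ctx := context.Background()
    // Hypothetical call mirroring `sind create cluster dev`; the real
    // function name and options type may differ.
    if err := cluster.Create(ctx, cluster.Options{Name: "dev"}); err != nil {
        log.Fatal(err)
    }
}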

Go Dependencies

sind uses a minimal set of dependencies, following kind's approach of favoring simplicity and compatibility.

Dependency                     Purpose
github.com/spf13/cobra         CLI framework
sigs.k8s.io/yaml               YAML configuration parsing
log/slog (stdlib)              Structured logging interface
github.com/charmbracelet/log   Colorized log output (slog handler)
github.com/mattn/go-isatty     TTY detection for interactive commands
github.com/njayp/ophis         MCP server framework
github.com/spf13/afero         Filesystem abstraction for testability
golang.org/x/sys               Advisory file locking (flock) for realm locks

Nodeset expansion (e.g., worker-[0-2,5] → individual hostnames) is implemented internally rather than using an external library, keeping the dependency footprint small.
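A condensed sketch of bracket expansion (the actual implementation also handles comma-separated multi-nodeset specs, cluster suffixes, and stricter validation):

package nodeset // illustrative sketch, not the actual sind implementation

import (
    "fmt"
    "strconv"
    "strings"
)

// Expand turns "worker-[0-2,5]" into worker-0, worker-1, worker-2, worker-5.
func Expand(spec string) ([]string, error) {
    open := strings.Index(spec, "[")
    if open < 0 {
        return []string{spec}, nil // plain hostname, nothing to expand
    }
    end := strings.Index(spec, "]")
    if end < open {
        return nil, fmt.Errorf("unbalanced brackets in %q", spec)
    }
    prefix, suffix := spec[:open], spec[end+1:]
    var out []string
    for _, part := range strings.Split(spec[open+1:end], ",") {
        lo, hi, isRange := strings.Cut(part, "-")
        if !isRange {
            hi = lo // single index like "5"
        }
        a, err := strconv.Atoi(lo)
        if err != nil {
            return nil, err
        }
        b, err := strconv.Atoi(hi)
        if err != nil {
            return nil, err
        }
        for i := a; i <= b; i++ {
            out = append(out, fmt.Sprintf("%s%d%s", prefix, i, suffix))
        }
    }
    return out, nil
}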

Docker Interaction

sind interacts with Docker by shelling out to the docker CLI rather than using the Docker SDK for Go. This approach, proven by kind, provides:

  • Simpler maintenance and fewer dependencies
  • Wider compatibility across Docker versions
  • Avoids tight coupling to Docker daemon internals

sind wraps command execution in a thin abstraction layer (pkg/cmdexec) using Go's os/exec package, with proper output handling and error reporting. The executor interface is shared across pkg/docker, pkg/mesh, and pkg/cluster.
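A minimal sketch of such an executor (interface and method names are assumptions in the spirit of pkg/cmdexec):

package cmdexec // sketch; the real interface may differ

import (
    "bytes"
    "context"
    "fmt"
    "os/exec"
)

// Executor abstracts command execution so docker calls can be mocked in tests.
type Executor interface {
    Run(ctx context.Context, name string, args ...string) (string, error)
}

// OSExecutor shells out via os/exec, capturing stdout and stderr separately.
type OSExecutor struct{}

func (OSExecutor) Run(ctx context.Context, name string, args ...string) (string, error) {
    var stdout, stderr bytes.Buffer
    cmd := exec.CommandContext(ctx, name, args...)
    cmd.Stdout, cmd.Stderr = &stdout, &stderr
    if err := cmd.Run(); err != nil {
        // Surface stderr in the error for actionable reporting.
        return "", fmt.Errorf("%s %v: %w: %s", name, args, err, stderr.String())
    }
    return stdout.String(), nil
}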

Runtime support: Docker only. Support for alternative runtimes (Podman, nerdctl) may be added later via a provider abstraction pattern.

License

This project is licensed under the GNU Lesser General Public License v3.0 or later (LGPL-3.0-or-later).

Commit Guidelines

The git history follows Conventional Commits style.

Principles

  • Fine-grained commits - Each commit should represent a single logical change, sized for easy comprehension when reading history
  • Context-free messages - Commit messages state facts about the change, not the development story; they are written for future readers of the history, not as a journal of the development process
  • No narrative - Avoid "I tried X, then Y, finally Z worked"; instead state what the commit does

Format

<type>(<scope>): <description>

[optional body]

[optional footer]

Types: feat, fix, docs, style, refactor, test, chore, build, ci

CLI Design Guidelines

Rules for maintaining consistency when adding new commands, flags, and output.

Command Structure

Commands follow verb-noun ordering with a two-level hierarchy:

sind <verb> <noun> [ARGS] [FLAGS]
  • Multi-resource verbs (create, delete, get, power) group noun subcommands
  • Single-purpose verbs (ssh, enter, exec, logs, doctor) stand alone
  • Standalone verbs are reserved for frequently-used operations that justify a short path

Argument Conventions

Pattern             Positional            Default                         Examples
Cluster name        [NAME] or [CLUSTER]   "default"                       get cluster, enter, get nodes
Node targets        NODES (required)      -                               power shutdown, delete worker
Node format         shortname.cluster     cluster defaults to "default"   worker-0.dev, controller
Nodeset expansion   bracket patterns      -                               worker-[0-2].dev
Pass-through        after -- separator    -                               ssh NODE -- cmd, exec -- cmd

Rules:

  • Cluster names are always positional, never flags
  • Node targets support nodeset expansion and comma-separated specs
  • Use cobra.MaximumNArgs(1) for optional cluster, cobra.MinimumNArgs(1) for required nodes

Flag Conventions

  • Long-form only by default; add short flags (-f) only for frequently-typed flags
  • Kebab-case for multi-word flags: --tmp-size, --munge-key
  • Boolean flags for mode switches: --all, --pull, --unmanaged
  • One persistent root flag: --realm (inherited by every subcommand)
  • One persistent root counter: -v (repeatable, controls log verbosity; inherited by every subcommand)

Output Conventions

Command type                        Output                                                Target
List resources (get)                tabwriter table, uppercase headers, 3-space padding   stdout
Single value (get munge-key)        raw value, one line                                   stdout
Mutations (create, delete, power)   silent on success                                     -
Errors                              structured slog at error level (always visible)       stderr
Warnings                            Warning: ... prefix                                   stderr
Logs (-v)                           structured key=value, colorized on TTYs               stderr

Rules:

  • Mutations are silent — exit 0 is the confirmation; use -v for progress
  • Errors are always visible (slog error level is always enabled, even without -v)
  • Command output (tables, status, doctor) is monochrome — no ANSI escapes
  • Log output (-v) is colorized on interactive terminals, plain when piped
  • Unicode checkmarks (✓/✗) only in get cluster and doctor output
  • All get subcommands accept --output|-o {human,json}; default is human

Logging Conventions

Logging uses pkg/log with context-based injection. Silent by default. All log lines include millisecond timestamps (HH:MM:SS.mmm) for timing analysis.

Level   Flag     What to log
Error   always   Command failures (always visible)
Info    -v       Phase transitions: "creating cluster", "nodes ready", "slurm services enabled"
Debug   -vv      Individual operations: "waiting for node", "creating network", "enabling slurmd"
Trace   -vvv     Docker commands, probe retry attempts with error details

Rules:

  • Use sindlog.From(ctx) to extract the logger — never slog.Default()
  • In errgroup goroutines, log with gctx not the outer ctx
  • Log messages use lowercase, present tense: "creating network", not "Created network"
  • Include identifying attrs: "node", shortName, "name", netName, "service", svcName
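A short sketch applying these rules (sindlog.From and pkg/log come from the conventions above; the surrounding function and exact import path are illustrative):

package sketch

import (
    "context"

    sindlog "github.com/GSI-HPC/sind/pkg/log" // import path is an assumption
    "golang.org/x/sync/errgroup"
)

// enableWorkers shows the conventions: logger from context, gctx inside
// errgroup goroutines, lowercase present-tense messages, identifying attrs.
func enableWorkers(ctx context.Context, workers []string) error {
    log := sindlog.From(ctx) // never slog.Default()
    log.Info("enabling slurm services")

    g, gctx := errgroup.WithContext(ctx)
    for _, w := range workers {
        w := w
        g.Go(func() error {
            sindlog.From(gctx).Debug("enabling slurmd", "node", w)
            return nil // real code would run the enable operation here
        })
    }
    return g.Wait()
}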

Shell Completion

All commands that accept cluster names or node names must set ValidArgsFunction:

  • Cluster name commands → completeClusterNames
  • Node name commands → completeNodeNames
  • Commands with DisableFlagParsing (ssh, exec) → ValidArgsFunction with heuristics (best-effort despite cobra limitations)

When adding a get subcommand with positional arg completion for a second argument (like logs NODE SERVICE), write a dedicated completion function that switches on len(args).
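A sketch of such a function (the service list and the completeNodeNames helper shown here are illustrative):

package sketch

import "github.com/spf13/cobra"

// completeNodeNames stands in for the real node-name completion helper.
func completeNodeNames(cmd *cobra.Command, args []string, toComplete string) ([]string, cobra.ShellCompDirective) {
    return nil, cobra.ShellCompDirectiveNoFileComp
}

// completeLogsArgs completes `sind logs NODE SERVICE` by switching on the
// number of arguments already present.
func completeLogsArgs(cmd *cobra.Command, args []string, toComplete string) ([]string, cobra.ShellCompDirective) {
    switch len(args) {
    case 0: // first positional: node name
        return completeNodeNames(cmd, args, toComplete)
    case 1: // second positional: service name (illustrative list)
        return []string{"slurmctld", "slurmd", "munge", "sshd"}, cobra.ShellCompDirectiveNoFileComp
    default:
        return nil, cobra.ShellCompDirectiveNoFileComp
    }
}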

New Command Checklist

When introducing a new command:

  1. Structure: verb-noun ordering, consistent with existing hierarchy
  2. Args: cluster as optional positional (default "default"), nodes as required positional
  3. Flags: long-form, kebab-case, minimal short flags
  4. Completion: add ValidArgsFunction for cluster/node args
  5. Output: table for lists, silence for mutations (exit 0 is the confirmation), pass-through output untouched
  6. Logging: info for phases, debug for operations, trace for raw commands
  7. Errors: wrap with fmt.Errorf("context: %w", err), no error prefixes
  8. Tests: unit test with mock executor, integration test in lifecycle test
  9. Docs: update DESIGN.md CLI Commands section, update docs/content/

Testing

Development follows Test-Driven Development (TDD) style:

  1. Write failing test
  2. Implement minimal code to pass
  3. Refactor

Requirements

  • High unit test coverage for all packages
  • Integration tests for CLI commands and cluster operations
  • Tests run in CI for every commit

CLI Commands

Cluster Management

sind create cluster [NAME] [--config FILE] [--data PATH] [--pull]
sind delete cluster [NAME]
sind delete cluster --all
sind get cluster [NAME]
sind get clusters
sind get node NODE[.CLUSTER]
sind get nodes [CLUSTER]
sind get networks
sind get realms
sind get volumes
sind get mesh
sind get dns
sind get ssh-config
sind get ssh-private-key
sind get ssh-public-key
sind get ssh-known-hosts
sind get munge-key [CLUSTER]

All get subcommands accept --output|-o {human,json}. The default is human (tabular text); json emits a machine-readable document.

NAME/CLUSTER defaults to default if omitted.

sind create cluster validates the environment before creating, warning or failing if conflicting resources (containers, networks, volumes with matching names) already exist.

sind delete cluster is idempotent and robust:

  • Deleting a non-existent cluster is not an error
  • Handles partial/broken clusters (e.g., failed creation)
  • Removes all matching Docker resources regardless of state
  • Updates ~/.local/state/sind/<realm>/known_hosts (or $XDG_STATE_HOME/sind/<realm>/known_hosts) to remove deleted nodes
  • Order: stops/removes containers → disconnects/removes networks → removes volumes

Example output:

$ sind get clusters
NAME      NODES (S/C/W)   SLURM   STATUS
default   4 (1/1/2)       25.11   running
dev       3 (0/1/2)       25.11   running

NODES column shows total count and breakdown: Submitter / Controller / Worker.

$ sind get nodes dev
CONTAINER            ROLE         FQDN                         IP           STATUS
sind-dev-controller  controller   controller.dev.sind.sind     172.19.0.2   running
sind-dev-worker-0    worker       worker-0.dev.sind.sind       172.19.0.3   running
sind-dev-worker-1    worker       worker-1.dev.sind.sind       172.19.0.4   running

Without a cluster argument, sind get nodes lists every node in the realm and adds a CLUSTER column. Rows are sorted by (cluster, role, natural-name) so worker-2 precedes worker-10:

$ sind get nodes
CONTAINER                CLUSTER   ROLE         FQDN                             IP           STATUS
sind-default-controller  default   controller   controller.default.sind.sind     172.19.0.2   running
sind-default-worker-0    default   worker       worker-0.default.sind.sind       172.19.0.3   running
sind-dev-controller      dev       controller   controller.dev.sind.sind         172.20.0.2   running
sind-dev-worker-2        dev       worker       worker-2.dev.sind.sind           172.20.0.3   running
sind-dev-worker-10       dev       worker       worker-10.dev.sind.sind          172.20.0.4   running
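A sketch of the natural-name comparison (the real ordering code may differ): names are split into digit and non-digit runs, and digit runs compare numerically.

package sketch

import "strconv"

// naturalLess reports whether a orders before b, so worker-2 < worker-10.
func naturalLess(a, b string) bool {
    for a != "" && b != "" {
        ah, arest := chunk(a)
        bh, brest := chunk(b)
        if ah != bh {
            ai, aErr := strconv.Atoi(ah)
            bi, bErr := strconv.Atoi(bh)
            if aErr == nil && bErr == nil && ai != bi {
                return ai < bi // numeric runs compare by value
            }
            return ah < bh // everything else compares lexically
        }
        a, b = arest, brest
    }
    return a == "" && b != "" // shorter name sorts first
}

// chunk splits off the leading run of digits or non-digits.
func chunk(s string) (head, rest string) {
    digit := s[0] >= '0' && s[0] <= '9'
    i := 1
    for i < len(s) && (s[i] >= '0' && s[i] <= '9') == digit {
        i++
    }
    return s[:i], s[i:]
}

Plugged into sort.Slice with a (cluster, role, name) key, this yields the ordering shown above.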

Cluster Diagnostics

sind get cluster [NAME] displays detailed health information for a cluster:

$ sind get cluster dev
CLUSTER   SLURM     STATUS (R/S/P/T)
dev       25.11.4   running (3/0/0/3)

NETWORKS
NAME             DRIVER   SUBNET           GATEWAY        STATUS
sind-mesh        bridge   172.18.0.0/16    172.18.0.1     ✓
sind-dev-net     bridge   172.19.0.0/16    172.19.0.1     ✓

MESH SERVICES
NAME   CONTAINER   STATUS
dns    sind-dns    ✓

MOUNTS
MOUNT        SOURCE                    TYPE       STATUS
/etc/slurm   sind-dev-config           volume     ✓
/etc/munge   sind-dev-munge            volume     ✓
/data        /home/user/project        hostPath   ✓

NODES
NAME              ROLE        IP            STATUS    SERVICES
controller.dev    controller  172.19.0.2    running   munge ✓ slurmctld ✓ sshd ✓
worker-0.dev      worker      172.19.0.3    running   munge ✓ slurmd ✓ sshd ✓
worker-1.dev      worker      172.19.0.4    running   munge ✓ slurmd ✗ sshd ✓

sind get node NODE[.CLUSTER] shows detailed health for a single node. NODE uses the format shortName or shortName.cluster (the cluster defaults to "default"). Passing a fully qualified DNS name ending in .sind is rejected — use the bare short name or the NODE.CLUSTER form:

$ sind get node controller.dev
CONTAINER            ROLE         FQDN                         IP           STATUS
sind-dev-controller  controller   controller.dev.sind.sind     172.19.0.2   running

SERVICES
NAME        STATUS
munge       ✓
sshd        ✓
slurmctld   ✓

Host Diagnostics

sind doctor validates host prerequisites for running sind:

sind doctor                              # check Docker version, cgroupv2, DNS policy

Checks the Docker Engine version, that cgroupv2 is mounted with nsdelegate, and that polkit allows host DNS resolution via systemd-resolved. Exits non-zero if any required prerequisite fails.

Node Access

sind ssh [SSH_OPTIONS] NODE [-- COMMAND]  # SSH into a specific node (passthrough)
sind enter [CLUSTER]                      # Interactive shell on submitter/controller
sind exec [CLUSTER] -- <cmd>              # One-shot command on submitter/controller

NODE uses DNS-style naming (see Node Arguments). CLUSTER defaults to default.

sind ssh passes all options and arguments through to the underlying SSH command. See the SSH section for details.

Worker Lifecycle

sind create worker [CLUSTER] [FLAGS]    # add worker nodes
sind delete worker NODES               # remove worker nodes from cluster

create worker flags:

Flag                 Default                  Description
--count N            1                        Number of nodes to add
--image IMAGE        cluster default          Container image
--cpus N             cluster default (1)      CPU limit per node
--memory SIZE        cluster default (512m)   Memory limit
--tmp-size SIZE      256m                     /tmp tmpfs size
--unmanaged          false                    Don't start slurmd, don't add to slurm.conf
--pull               false                    Pull images before creating containers
--cap-add CAP        none                     Add Linux capability (repeatable; e.g. SYS_ADMIN)
--cap-drop CAP       none                     Drop Linux capability (repeatable)
--device PATH        none                     Expose host device (repeatable; e.g. /dev/fuse)
--security-opt OPT   none                     Security option (repeatable)

Examples:

sind create worker                           # 1 managed node with cluster defaults
sind create worker --count 3                 # 3 managed nodes
sind create worker --count 2 --unmanaged     # 2 unmanaged nodes (slurmd not started)
sind create worker --cpus 2 --memory 1g      # 1 managed node with resource limits
sind create worker dev --count 2             # 2 managed nodes in dev cluster

Managed node workflow:

By default (without --unmanaged), sind:

  1. Verifies sind-nodes.conf exists in /etc/slurm (fails if not present)
  2. Creates the worker container(s)
  3. Appends node definition(s) to sind-nodes.conf
  4. Reconfigures slurmctld (scontrol reconfigure)
  5. Starts slurmd on the new node(s)

Managed nodes require the sind-generated Slurm configuration (see Generated Configuration). If sind-nodes.conf is missing (e.g., user replaced the config), the command fails with an error. Use --unmanaged to add nodes without modifying Slurm configuration.

delete worker deletes containers entirely. Works with both managed and unmanaged nodes. For managed nodes, sind removes them from sind-nodes.conf and reconfigures slurmctld before deleting the container.

Power Control

sind power shutdown NODES               # graceful shutdown
sind power cut NODES                    # hard power off
sind power on NODES                     # power on
sind power reboot NODES                 # graceful cycle (shutdown + on)
sind power cycle NODES                  # hard cycle (cut + on)
sind power freeze NODES                 # simulate unresponsive node
sind power unfreeze NODES               # resume frozen node

Command    Implementation
shutdown   docker stop (SIGTERM, then SIGKILL)
cut        docker kill (immediate SIGKILL)
on         docker start
reboot     docker stop + docker start
cycle      docker kill + docker start
freeze     docker pause (cgroup freezer)
unfreeze   docker unpause

Freeze/unfreeze uses Docker's cgroup freezer to suspend all processes. The container remains "running" but is completely unresponsive, simulating a hung or unreachable node.

Logs

sind logs NODE [--follow]              # container logs (stdout/stderr)
sind logs NODE SERVICE [--follow]      # journalctl for specific service

Examples:

sind logs controller --follow          # tail container logs
sind logs controller slurmctld         # slurmctld journal logs
sind logs worker-0 slurmd --follow    # follow slurmd logs

Utilities

sind version [--json]                  # print version information
sind doctor                            # check host prerequisites
sind get realms                        # list active realms
sind get munge-key [CLUSTER]           # output munge key (base64)
sind get ssh-config                    # show SSH config path for Include
sind get mesh                          # show mesh infrastructure info
sind get dns                           # list mesh DNS records
sind get ssh-private-key               # output SSH private key
sind get ssh-public-key                # output SSH public key
sind get ssh-known-hosts               # output SSH known_hosts

sind version prints version, commit, Go version, and platform. For release builds the output is sind <version> (<commit>). For dev builds git describe --tags --always --dirty is used as the version, embedding tag distance and commit hash directly: sind 0.5.0-3-gabc1234-dirty. The --json flag outputs all fields as JSON.

sind get munge-key outputs the cluster's munge key encoded as base64, suitable for injection into external management tooling.

sind get ssh-config outputs the path to the SSH config file for the current realm. Add it as an Include in ~/.ssh/config to enable direct SSH access to nodes.

sind get mesh shows mesh infrastructure info: network name, DNS container/IP/zone/image, SSH container/volume/image. Useful for external consumers that need to connect to sind networks.

sind get ssh-private-key, sind get ssh-public-key, and sind get ssh-known-hosts dump SSH credentials to stdout. This replaces the need to extract files from Docker volumes.

Node Arguments

Commands accepting node arguments use DNS-style names with optional nodeset expansion.

Format

<role>.<cluster>
<role>-<N>.<cluster>

The cluster suffix defaults to .default if omitted.

Nodeset Notation

Nodeset notation (as used in Slurm, pdsh, ClusterShell) is supported for specifying multiple nodes:

Pattern            Expansion
worker-[0-3]       worker-0, worker-1, worker-2, worker-3
worker-[0,2,4]     worker-0, worker-2, worker-4
worker-[0-2,5]     worker-0, worker-1, worker-2, worker-5
worker-[0-1].dev   worker-0.dev, worker-1.dev

Multiple nodesets can be comma-separated:

sind power shutdown controller,worker-[0-3]
sind power cycle worker-[0-1].dev,worker-[0-3].default

Examples

sind power shutdown controller                   # controller.default
sind power cycle worker-0                        # worker-0.default
sind power freeze worker-[0-3].dev               # 4 nodes in dev cluster
sind power reboot controller,worker-[0-1]        # multiple nodes in default

Configuration Schema

Minimal Configuration

The simplest valid configuration creates a minimal cluster with 1 controller and 1 worker node using the generic sind-node image:

kind: Cluster

This is equivalent to:

kind: Cluster
name: default
defaults:
  image: ghcr.io/gsi-hpc/sind-node:latest
nodes:
  - role: controller
  - role: worker

When defaults.image is omitted, sind uses the generic image ghcr.io/gsi-hpc/sind-node:latest.

Shorthand Node Syntax

Nodes can be specified in short form when only role (and optionally count) are needed:

nodes:
  - controller                           # just the role
  - submitter                            # optional roles work too
  - worker: 3                            # role with count

This is equivalent to:

nodes:
  - role: controller
  - role: submitter
  - role: worker
    count: 3

The shorthand and full forms can be mixed in the same configuration.

Full Configuration Example

kind: Cluster
name: test-cluster                       # default: "default"
realm: sind                              # default: "sind"

defaults:
  image: ghcr.io/gsi-hpc/sind-node:25.11.2  # default: sind-node:latest
  tmpSize: 256m                          # per-node /tmp tmpfs size
  cpus: 1                                # container CPU limit
  memory: 512m                           # container memory limit

storage:
  dataStorage:
    type: volume                         # volume | hostPath
    hostPath: ./data                     # only if type=hostPath
    mountPath: /data                     # default: /data

slurm:
  main: |                                # appended to slurm.conf
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory
  cgroup: |                              # appended to cgroup.conf
    ConstrainCores=yes

nodes:
  - role: controller
    tmpSize: 512m                        # override default
    cpus: 1
    memory: 1g

  - role: submitter                      # optional, at most one

  - role: worker
    count: 3                             # default: 1
    cpus: 2
    memory: 1g

  - role: worker
    count: 2
    managed: false                       # slurmd not started, not in slurm.conf

Slurm Configuration Sections

The slurm key contains named sections that map to Slurm config files. Each section supports two forms:

  • String: content appended directly to the config file
  • Map: each key creates a fragment in a .conf.d/ directory, included via explicit include directives per fragment file

Section     Config file      sind generates defaults
main        slurm.conf       yes
cgroup      cgroup.conf      yes
gres        gres.conf        no
topology    topology.conf    no
plugstack   plugstack.conf   yes (always scaffolded)

String form — content appended to the config file:

slurm:
  main: |
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory
  cgroup: |
    ConstrainCores=yes

Map form — named fragments in a .conf.d/ directory:

slurm:
  main:
    scheduling: |
      SchedulerType=sched/backfill
      SchedulerParameters=bf_continue
    resources: |
      SelectType=select/cons_tres

This produces:

/etc/slurm/
├── slurm.conf              # sind defaults + explicit includes per fragment
├── slurm.conf.d/
│   ├── resources.conf
│   └── scheduling.conf
├── sind-nodes.conf
├── cgroup.conf
├── plugstack.conf          # always: include plugstack.conf.d/*
└── plugstack.conf.d/

plugstack.conf is always created with an include plugstack.conf.d/* directive, and PlugStackConfig is always set in slurm.conf. This allows SPANK plugins to be dropped in without additional configuration.
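For example, a SPANK plugin can then be enabled with a one-line fragment (plugin path illustrative); plugstack lines use the standard optional|required <plugin.so> [args] form:

# /etc/slurm/plugstack.conf.d/example.conf
optional /usr/lib64/slurm/spank_example.so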

Standalone sections (gres, topology) are only created when configured. They require enabling in slurm.conf (e.g., GresTypes=gpu, TopologyPlugin=topology/tree) via the main section.

Validation rules:

  • Fragment names must be plain filenames (no path separators)
  • Fragment names and content must not be empty

Node Roles

Role         Count       Required   Slurm Daemons         Description
controller   exactly 1   yes        slurmctld             Cluster controller
submitter    0-1         no         none (clients only)   Job submission node
worker       1+          yes        slurmd                Worker nodes

Node Parameters

Parameter     Scope               Default                            Description
image         global + per-node   ghcr.io/gsi-hpc/sind-node:latest   Container image
tmpSize       global + per-node   256m                               tmpfs size for /tmp
cpus          global + per-node   1                                  CPU limit
memory        global + per-node   512m                               Memory limit
capAdd        global + per-node   none                               Extra Linux capabilities (e.g. SYS_ADMIN)
capDrop       global + per-node   none                               Dropped Linux capabilities
devices       global + per-node   none                               Host devices to expose (e.g. /dev/fuse)
securityOpt   global + per-node   none                               Extra security options
count         worker only         1                                  Number of worker nodes
managed       worker only         true                               Start slurmd and add to slurm.conf

Per-node scalar values override the defaults section. List fields (capAdd, capDrop, devices, securityOpt) are merged with defaults rather than replacing them.
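A sketch of these merge semantics (NodeSpec is an illustrative stand-in for the real config types):

package sketch

// NodeSpec holds a subset of the node parameters above.
type NodeSpec struct {
    CPUs   int
    CapAdd []string
}

// merge applies the documented semantics: scalar fields fall back to the
// defaults section, list fields combine defaults with per-node entries.
func merge(def, n NodeSpec) NodeSpec {
    out := n
    if out.CPUs == 0 {
        out.CPUs = def.CPUs // scalar: per-node value overrides the default
    }
    // list: merged with defaults, not replaced
    out.CapAdd = append(append([]string{}, def.CapAdd...), n.CapAdd...)
    return out
}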

Validation Rules

  • nodes - optional; if omitted, creates 1 controller + 1 worker
  • role: controller - exactly one (auto-created if nodes omitted)
  • role: submitter - at most one
  • role: worker - at least one (auto-created if nodes omitted)
  • count - only valid for worker role

Docker Resources

Per-Cluster Resources

Type            Name Pattern                   Example (sind create cluster dev)
Network         <realm>-<cluster>-net          sind-dev-net
Controller      <realm>-<cluster>-controller   sind-dev-controller
Submitter       <realm>-<cluster>-submitter    sind-dev-submitter
Worker          <realm>-<cluster>-worker-<N>   sind-dev-worker-0
Config volume   <realm>-<cluster>-config       sind-dev-config
Munge volume    <realm>-<cluster>-munge        sind-dev-munge
Data volume     <realm>-<cluster>-data         sind-dev-data

Global Resources (Mesh)

Type            Name Pattern         Example
Mesh network    <realm>-mesh         sind-mesh
DNS container   <realm>-dns          sind-dns
SSH container   <realm>-ssh          sind-ssh
SSH volume      <realm>-ssh-config   sind-ssh-config

Defaults

The default realm is sind and the default cluster name is default, resulting in prefixes like sind-default-*.

Volume Mounts

Volume                  Mount Point   Controller   Worker     Submitter
sind-<cluster>-config   /etc/slurm    rw           ro         ro
sind-<cluster>-munge    /etc/munge    ro           ro         ro
sind-<cluster>-data     /data         rw           rw         rw
tmpfs                   /tmp          per-node     per-node   per-node

Mount Options

SELinux relabeling (:z) is not used because containers run with --security-opt label=disable. This avoids expensive recursive relabeling of bind-mounted host directories.

Container mount flags:

-v sind-<cluster>-config:/etc/slurm:rw     # controller
-v sind-<cluster>-config:/etc/slurm:ro     # all others
-v sind-<cluster>-munge:/etc/munge:ro      # all nodes
-v sind-<cluster>-data:/data:rw            # all nodes
--tmpfs /tmp:rw,nosuid,nodev,size=1g       # configurable size
--tmpfs /run:exec,mode=755                 # systemd runtime
--tmpfs /run/lock                          # systemd lock files

Data Mount

By default, sind create cluster bind-mounts the current working directory as /data on all nodes:

-v /absolute/path/to/cwd:/data:rw

The --data flag controls the mount source:

  • --data . (default) — bind-mount the current working directory
  • --data /path — bind-mount a specific host directory
  • --data volume — use a Docker-managed volume (sind-<cluster>-data)

When a YAML config specifies storage.dataStorage, the config takes precedence over --data.

The resolved host path is stored on each container as the sind.data.hostpath label so that dynamically added workers (sind create worker) inherit the same mount.

Container Labels

sind applies labels to containers for filtering and metadata:

Label                Example              Description
sind.realm           sind                 Realm namespace
sind.cluster         dev                  Cluster name
sind.role            worker               Node role
sind.slurm.version   25.11.4              Slurm version
sind.data.hostpath   /home/user/project   Resolved data mount host path

Enter and Exec

sind enter and sind exec run commands directly inside the target container via docker exec with the working directory set to /data. This means commands operate on the shared data mount.

sind ssh continues to use the SSH relay container for full SSH access (port forwarding, etc.).

Networking

Cluster Network

Each cluster has an isolated Docker bridge network:

  • Name: sind-<cluster>-net
  • Nodes can reach each other by container hostname

Mesh Network

All clusters automatically join a shared mesh network for cross-cluster communication:

Event                   Result
First cluster created   Creates sind-mesh network, starts sind-dns
Subsequent clusters     Connects cluster nodes to sind-mesh, updates DNS
Cluster deleted         Disconnects cluster nodes, updates DNS
Last cluster deleted    Removes sind-dns and sind-mesh network

DNS

The sind-dns container (CoreDNS) provides name resolution across meshed clusters using a realm-aware zone:

<realm>.sind:53

Records follow the pattern:

<role>.<cluster>.<realm>.sind → container IP

Nodes are configured with:

--dns <sind-dns-ip>
--dns-search <cluster>.<realm>.sind

The DNS container is lightweight and does not run systemd/sshd.

SSH

The sind-ssh container provides SSH access to all cluster nodes. It is a lightweight container (no systemd) that runs on the mesh network.

Global SSH Resources

Resource                 Purpose
sind-ssh container       SSH client for accessing nodes
sind-ssh-config volume   SSH keypair and known_hosts

The sind-ssh-config volume contains:

File             Description
id_ed25519       Private key (generated on first cluster creation)
id_ed25519.pub   Public key (injected into node images)
known_hosts      Host keys of all nodes (updated dynamically)

Lifecycle

Event                   Result
First cluster created   Creates sind-ssh-config volume, generates keypair, starts sind-ssh container
Node created            Collects sshd host key, appends to known_hosts
Node deleted            Removes entry from known_hosts
Last cluster deleted    Removes sind-ssh container and sind-ssh-config volume

Host Key Collection

When sind creates a node, it waits for sshd to start, then collects the host key:

docker exec <node> cat /etc/ssh/ssh_host_ed25519_key.pub

The key is added to known_hosts with the node's DNS name:

controller.dev.sind.sind ssh-ed25519 AAAA...
worker-0.dev.sind.sind ssh-ed25519 AAAA...

Public Key Injection

The public key from sind-ssh-config is injected into nodes via:

docker exec <node> mkdir -p /root/.ssh
docker exec <node> sh -c 'cat >> /root/.ssh/authorized_keys' < pubkey

This happens after container start, before host key collection.

User Access

sind only configures SSH access for the root user. Additional user management (creating users, distributing SSH keys, configuring sudo, etc.) is left to the user.

sind ssh Implementation

sind ssh executes SSH via the sind-ssh container:

sind ssh [SSH_OPTIONS] NODE [-- COMMAND [ARGS...]]

Internally:

docker exec -it sind-ssh ssh [SSH_OPTIONS] <node>.<realm>.sind [COMMAND [ARGS...]]

All SSH options and arguments are passed through verbatim. Examples:

sind ssh worker-0                           # interactive shell
sind ssh worker-0.dev                       # node in dev cluster
sind ssh -v worker-0                        # verbose SSH
sind ssh worker-0 -- hostname               # run command
sind ssh -t worker-0 -- top                 # force TTY allocation
sind ssh -L 8080:localhost:80 controller     # port forwarding

User SSH Client Integration

sind exports SSH configuration per realm to $XDG_STATE_HOME/sind/<realm>/ (defaulting to ~/.local/state/sind/<realm>/) for integration with the user's SSH client:

File          Description
ssh_config    SSH config snippet
id_ed25519    Private key (copy from volume)
known_hosts   Host keys (copy from volume)

The generated ssh_config (for default realm sind):

CanonicalizeHostname yes
CanonicalDomains default.sind.sind sind.sind
CanonicalizeMaxDots 2

Host *.sind.sind
    ProxyCommand docker exec -i sind-ssh bash -c 'exec 3<>/dev/tcp/%h/22; cat <&3 & cat >&3; kill $!'
    IdentityFile ~/.local/state/sind/sind/id_ed25519
    UserKnownHostsFile ~/.local/state/sind/sind/known_hosts
    User root
    StrictHostKeyChecking yes

The Canonicalize* directives enable short-name resolution for the default realm: ssh controller expands to controller.default.sind.sind, and ssh controller.dev expands to controller.dev.sind.sind. For custom realms, the CanonicalDomains list reflects that realm's clusters.

To find the path for a realm, use sind get ssh-config. Add to the top of ~/.ssh/config (before any Host or Match blocks) for a single realm:

Include ~/.local/state/sind/sind/ssh_config

Or include all realms at once using a wildcard (supported by OpenSSH's Include):

Include ~/.local/state/sind/*/ssh_config

This allows direct use of standard SSH tools:

ssh controller.default.sind.sind
ssh worker-0.dev.sind.sind hostname
scp file.txt controller.dev.sind.sind:/tmp/

sind updates these files automatically when clusters or nodes are created/deleted. When the last cluster in a realm is deleted, the files and realm directory are removed.

Command Routing

Interactive sessions are routed based on cluster configuration:

Command                        Target Node
sind ssh <node>                explicit node
sind enter [cluster]           submitter (if exists) → controller
sind exec [cluster] -- <cmd>   submitter (if exists) → controller

sind enter

Opens an interactive shell on the submitter (or controller if no submitter configured). Equivalent to sind ssh submitter or sind ssh controller.

sind exec

One-shot command execution. Equivalent to sind ssh <target> -- <cmd>.

Container Images

Generic Image

sind provides a generic multi-role image that works for all node types:

ghcr.io/gsi-hpc/sind-node:latest
ghcr.io/gsi-hpc/sind-node:<slurm-version>

This is the default image when defaults.image is not specified in the cluster configuration.

The generic image:

  • Based on Rocky Linux 10
  • Builds Slurm, OpenMPI, PMIx, PRRTE, and UCX from source
  • Contains all Slurm daemons (slurmctld, slurmd) and a full MPI stack
  • Slurm is built with --with-pmix for native PMIx job launch support
  • sind enables the appropriate services based on node role

The Dockerfile uses a multi-stage build with a shared builder-base stage. UCX and PMIx build in parallel, PRRTE and Slurm depend on PMIx, and OpenMPI depends on all three. Component versions are pinned as ARG defaults in the Dockerfile and mirrored in docker-bake.hcl.

Custom Images

Custom images must provide:

All roles:

  • systemd as init (PID 1)
  • sshd service (enabled, sind injects authorized_keys at runtime)
  • munge service (enabled)
  • Slurm client tools (srun, sbatch, squeue, etc.)

Per-role requirements:

Role         Additional Requirements
controller   slurmctld (installed, not enabled)
worker       slurmd (installed, not enabled)
submitter    Slurm client tools only

sind enables Slurm services at container start based on the node's role. Services should be installed but not enabled in the image.

Example Dockerfiles are provided in the images/ directory.

Generated Configuration

Munge

During sind create cluster, before starting any containers, sind generates a random munge key and writes it to the sind-<cluster>-munge volume. This ensures all nodes share the same key from first boot.
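For illustration, a comparable key can be produced by hand; one common approach, shown in the MUNGE installation docs, is 1024 bytes of random data:

dd if=/dev/urandom of=munge.key bs=1 count=1024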

Slurm Configuration

sind auto-generates a minimal Slurm configuration based on cluster topology and writes it to the sind-<cluster>-config volume.

Multi-file Configuration

sind generates a multi-file configuration structure:

/etc/slurm/
├── slurm.conf              # main config
├── sind-nodes.conf         # sind-managed node definitions
├── cgroup.conf             # cgroupv2 configuration
├── plugstack.conf          # SPANK plugin config (always created)
├── plugstack.conf.d/       # SPANK plugin fragments (always created)
├── slurm.conf.d/           # main config fragments (if slurm.main is a map)
├── cgroup.conf.d/          # cgroup fragments (if slurm.cgroup is a map)
├── gres.conf               # generic resources (if slurm.gres is set)
└── topology.conf           # network topology (if slurm.topology is set)

The main slurm.conf always contains:

include /etc/slurm/sind-nodes.conf
PlugStackConfig=/etc/slurm/plugstack.conf

sind-nodes.conf

This file contains node and partition definitions for sind-managed nodes. sind assumes exclusive ownership of this file:

  • sind create cluster generates initial node definitions here
  • sind create worker appends new nodes (unless --unmanaged)
  • sind delete worker removes nodes (for managed nodes)

Users should not edit sind-nodes.conf directly. To add custom node definitions, create a separate file and add an include directive to slurm.conf.

Nodes with managed: false in the cluster config are excluded from sind-nodes.conf.

cgroup.conf

sind generates a cgroup.conf for cgroupv2 support on worker nodes. This enables resource isolation and accounting for jobs.

User Customization

sind delivers a working starter configuration. The slurm config key allows extending it declaratively at creation time (see Slurm Configuration Sections above). For post-creation changes, the /etc/slurm volume is writable on the controller node.

Users may:

  • Use slurm.main, slurm.cgroup, etc. to extend config at creation time
  • Edit config files directly after creation (sind does not modify them after creation)
  • Add additional include files for custom configuration
  • Replace the entire configuration (but sind create worker will then fail for managed nodes)

Slurm Version Discovery

sind does not manage Slurm versions directly — the version is implicit in the chosen container images. However, sind discovers the Slurm version before cluster creation to:

  1. Generate version-appropriate configuration (slurm.conf)
  2. Display version information in CLI output
  3. Store version metadata on containers and volumes

Discovery Method

Before creating any cluster resources, sind runs an ephemeral container to discover the Slurm version:

docker run --rm <image> scontrol --version
# Output: "slurm 25.11.0"

This happens once per unique image in the cluster configuration. The discovered version is then stored as labels on cluster resources:

--label sind.slurm.version=25.11.0

Version Consistency

When the cluster configuration specifies multiple images (e.g., different images per role), sind discovers the version from each unique image. If images report different Slurm versions, sind logs a warning but continues with cluster creation. The controller image's version is used for configuration generation.

Mismatched Slurm versions can cause subtle runtime issues, but users may have legitimate reasons for mixed versions (e.g., testing rolling upgrades).

Config Adaptation

sind maintains awareness of version-specific configuration changes and generates compatible slurm.conf. This includes handling deprecated parameters and new required parameters across Slurm versions.

DNS Naming Convention

The mesh DNS uses a realm-aware hierarchical namespace:

<role>.<cluster>.<realm>.sind
<role>-<N>.<cluster>.<realm>.sind

The hierarchy is: node . cluster . realm . sind

Each realm gets its own CoreDNS zone (<realm>.sind), and nodes within a cluster are configured with --dns-search <cluster>.<realm>.sind so short names resolve within the cluster.

Examples (default realm sind):

  • controller.default.sind.sind
  • submitter.default.sind.sind
  • worker-0.default.sind.sind
  • worker-1.default.sind.sind
  • controller.dev.sind.sind

Examples (custom realm ci-42):

  • controller.default.ci-42.sind
  • worker-0.dev.ci-42.sind

Within a cluster, short names resolve via the search domain: a node in the dev cluster of realm sind can reach controller without the full controller.dev.sind.sind.

Realm Advisory Locking

Mutating operations acquire a per-realm advisory lock (flock) to prevent concurrent modifications to shared realm state. The lock file is stored at:

$XDG_STATE_HOME/sind/<realm>/lock    # default: ~/.local/state/sind/<realm>/lock

Protected operations

  • sind create cluster
  • sind delete cluster (single and --all)
  • sind create worker
  • sind delete worker

Read-only operations (get, logs, ssh, etc.) do not acquire the lock.

Behavior

  • Lock is attempted non-blocking first; if free, the operation proceeds immediately
  • If another operation holds the lock, sind logs "waiting for another operation to complete" (info level) and blocks until the lock is released
  • Lock is released when the operation completes (success or failure)
  • Context cancellation (e.g., Ctrl+C) unblocks a waiting operation
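A sketch of this behavior using golang.org/x/sys/unix (the real implementation may block in flock directly; this version retries on an interval so cancellation can interrupt the wait):

package sketch

import (
    "context"
    "os"
    "time"

    "golang.org/x/sys/unix"
)

// acquireRealmLock takes the per-realm advisory lock: non-blocking flock
// first, then retry until the lock frees or the context is cancelled.
func acquireRealmLock(ctx context.Context, path string) (release func(), err error) {
    f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
    if err != nil {
        return nil, err
    }
    for {
        err := unix.Flock(int(f.Fd()), unix.LOCK_EX|unix.LOCK_NB)
        if err == nil {
            return func() {
                unix.Flock(int(f.Fd()), unix.LOCK_UN)
                f.Close()
            }, nil
        }
        if err != unix.EWOULDBLOCK {
            f.Close()
            return nil, err
        }
        // Lock held by another operation: wait and retry, honoring Ctrl+C.
        select {
        case <-ctx.Done():
            f.Close()
            return nil, ctx.Err()
        case <-time.After(200 * time.Millisecond):
        }
    }
}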

Realm independence

Locks are per-realm. Operations in different realms run concurrently without contention. This makes realm-based CI isolation safe for parallel jobs.

Future Features

Cluster Lifecycle Commands

Planned commands for suspending and resuming clusters without destroying them:

sind stop cluster [NAME]               # stop all containers, preserve volumes
sind start cluster [NAME]              # start previously stopped cluster

stop cluster:

  • Stops all node containers (docker stop)
  • Preserves all volumes (config, munge, data)
  • Preserves network configuration
  • Cluster appears as "stopped" in sind get clusters

start cluster:

  • Starts previously stopped containers
  • Nodes rejoin mesh network
  • DNS records restored
  • Slurm daemons resume normal operation

This enables resource conservation when clusters are not actively in use without losing cluster state or configuration.

Database Role (slurmdbd)

Planned support for a dedicated database node role:

nodes:
  - role: db                           # slurmdbd + MariaDB
  - role: controller
  - role: worker
    count: 3

The db role would run slurmdbd and MariaDB for job accounting. sind would:

  • Generate slurmdbd.conf with appropriate settings
  • Configure slurm.conf to use the accounting database
  • Initialize the MariaDB database schema

This enables testing of Slurm accounting features and multi-cluster federation scenarios.