
feat: containerd runtime — drive containerd's gRPC API in process #7

Open
adiled wants to merge 13 commits into main from feat/containerd-runtime

adiled commented Jun 6, 2026

Adds --runtime containerd: orchd runs containers by driving containerd's gRPC API in process.

How it works

The runtime emits one stateless foreground command, orchd containerd-run --spec <base64>, which the supervisor (orchdi/launchd/systemd) tracks. That process:

  • pulls the image via containerd's Transfer service (daemon-side),
  • folds the layers' diffIDs into the rootfs chainID and prepares an overlayfs snapshot on top of it,
  • creates the container + task with an OCI spec in the host network namespace (so there's no CNI/iptables dependency),
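  • waits in the foreground, and on SIGTERM stops the task (SIGTERM, escalating to SIGKILL if it overruns the grace period), waits for it to exit, then deletes the task, container, and snapshot (idempotent on restart).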

One command owns the whole lifecycle, so there's no separate pre_start/stop/post_stop.
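The chainID fold in the second step follows the OCI image spec: the bottom layer's chainID is its diffID, and each further layer's chainID is the digest of the previous chainID, a space, and that layer's diffID. A minimal Rust sketch of just that fold; `chain_id` is a hypothetical name, not orchd's function, and the digest is injected so the example has no dependencies (a real build would hash with SHA-256, for example via the sha2 crate, and format the result as sha256:<hex>).

```rust
/// Folds layer diffIDs (bottom layer first) into the rootfs chainID, per the
/// OCI image spec:
///   ChainID(L0)        = DiffID(L0)
///   ChainID(L0|...|Ln) = Digest(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
/// `digest` is a stand-in for "sha256:" + hex(sha256(input)).
/// Returns None for an image with no layers.
pub fn chain_id<F>(diff_ids: &[&str], digest: F) -> Option<String>
where
    F: Fn(&str) -> String,
{
    let (first, rest) = diff_ids.split_first()?;
    let mut acc = first.to_string();
    for diff_id in rest {
        acc = digest(&format!("{acc} {diff_id}"));
    }
    Some(acc)
}
```

The writable container snapshot is then prepared with the top chainID as its parent, which is why the whole stack has to be folded before the container can be created.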

Design notes

  • Rust, feature-gated. containerd owns the wire (gRPC = HTTP/2 + protobuf), so this is the right place for tonic/prost. It's behind the containerd cargo feature: the default orchd build pulls zero of these deps and needs no protoc. Build the container-capable orchd with --features containerd (just build-linux does this).
  • no_pivot_root. orchd-osx boots its VM as an initramfs (ramfs root), where runc's pivot_root fails; setting no_pivot_root=true in the runc options makes runc use MS_MOVE+chroot instead.
  • Layout. src/runtime/containerd/{mod.rs,run.rs} (runtime + the containerd-run leaf); apple.rs likewise moved to apple/mod.rs.
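The feature gate in the first note can be pictured as two cfg-selected versions of the same entry point: the real containerd-run leaf exists only in a --features containerd build, and the lean default build answers with a clear error instead. A sketch only; `run_containerd_leaf`, `containerd_enabled`, and the error text are hypothetical, and the gated body is stubbed where the tonic/containerd-client code would go.

```rust
/// Compiled only with `--features containerd`; this is where the gRPC client
/// (tonic/prost via containerd-client) would live. Stubbed for the sketch.
#[cfg(feature = "containerd")]
pub fn run_containerd_leaf(spec_b64: &str) -> Result<(), String> {
    // Real code: decode the spec, connect to containerd's socket, pull,
    // prepare the snapshot, create container + task, wait, tear down.
    let _ = spec_b64;
    Ok(())
}

/// Default (lean) build: no heavy deps, so the leaf just explains itself.
#[cfg(not(feature = "containerd"))]
pub fn run_containerd_leaf(_spec_b64: &str) -> Result<(), String> {
    Err("containerd-run requires an orchd built with --features containerd".to_string())
}

/// Lets callers such as the runtime's check() report the capability.
pub fn containerd_enabled() -> bool {
    cfg!(feature = "containerd")
}
```

In Cargo.toml the feature would list the gRPC crates as optional dependencies, so a plain cargo build never compiles them and never needs protoc.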

Supporting commits on the branch

  • just build-linux: cross-compile a static aarch64-linux orchd (--features containerd).
  • orchd-osx: boot a local OCI image layout offline, and follow the nested image index emitted by container image save.

Validated end to end — examples/inception

A nested composability test: orchd (macOS) boots a Debian VM via its apple runtime → containerd runs inside → orchd (Linux) drives containerd over gRPC → an alpine task runs in host netns. containerd's own view confirms it:

orchd survey:        orch.web   running
ctr -n orch tasks:   orch-web   191   RUNNING
containerd-run: started orch-web (docker.io/library/alpine:latest)

This doubles as the spec-alignment proof: the VM is sized from the Orchfile (memory/cpus) and the containerd toolchain (containerd + runc) is mounted as a volume, both of which only work because orchd-osx honors the full service spec (shipped in v0.3.1).

Tests

Runtime exec_set unit tests (decode the spec, assert image/id/args/env). Full suite green; the default (lean) build pulls no heavy deps.
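The runtime hands the whole service spec to the leaf as one base64 argument (--spec <base64>), and the unit tests decode it back. A dependency-free sketch of that round trip, assuming standard RFC 4648 base64 with padding and a JSON payload; both the encoding variant and the payload format are assumptions, and `encode`/`decode` are hypothetical helpers (orchd may use a crate for this).

```rust
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 (RFC 4648) with '=' padding.
pub fn encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Inverse of `encode`; rejects bad lengths, bad characters, and over-padding.
pub fn decode(input: &str) -> Result<Vec<u8>, String> {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return Err("length is not a multiple of 4".into());
    }
    let value = |c: u8| -> Result<u32, String> {
        ALPHABET
            .iter()
            .position(|&a| a == c)
            .map(|p| p as u32)
            .ok_or_else(|| format!("invalid character {:?}", c as char))
    };
    let mut out = Vec::with_capacity(bytes.len() / 4 * 3);
    for quad in bytes.chunks(4) {
        let pad = quad.iter().rev().take_while(|&&c| c == b'=').count();
        if pad > 2 {
            return Err("too much padding".into());
        }
        let mut n = 0u32;
        for &c in &quad[..4 - pad] {
            n = (n << 6) | value(c)?;
        }
        n <<= 6 * pad as u32; // realign a short group to 24 bits
        let decoded = [(n >> 16) as u8, (n >> 8) as u8, n as u8];
        out.extend_from_slice(&decoded[..3 - pad]);
    }
    Ok(out)
}
```

A test in the spirit of the exec_set tests would encode a spec such as {"image":"docker.io/library/alpine:latest","id":"orch-web"}, pass it through the argument, and assert the decoded fields.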

adiled force-pushed the feat/containerd-runtime branch from ad6692a to d92fda5 on June 6, 2026 15:58
adiled added 4 commits June 6, 2026 22:55
Add containerd as a third runtime alongside bare and apple. It drives
containerd through nerdctl (its docker-compatible CLI), so containers run
with no Docker daemon on top. Fits the four-string ExecSet contract today:

  pre_start  nerdctl pull <image>
  start      nerdctl run --name <ns>-<svc> --init [flags] <image> [cmd]
  stop       nerdctl stop <ns>-<svc>
  post_stop  nerdctl rm -f <ns>-<svc>

Full spec coverage (env, env-files, volumes, publish, memory, cpus, user,
workdir, entrypoint, cmd) and host-mode passthrough, same shape as apple's
CLI path. check() gates on nerdctl/containerd reachability, so it errors
cleanly off-Linux. A future mode-2 will drive containerd's gRPC API in
process; this CLI path is the v1.

  orchd --runtime containerd grow
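The four-string contract above can be pictured as one optional shell command per supervisor hook. A sketch under assumptions: the `ExecSet` field names mirror the hooks listed in the commit, but the real type's shape and the `nerdctl_exec_set`/`grpc_exec_set` helpers are hypothetical, and the [flags] part of the run line is elided.

```rust
/// One optional command per supervisor hook (hypothetical shape).
#[derive(Debug, Default, PartialEq)]
pub struct ExecSet {
    pub pre_start: Option<String>,
    pub start: Option<String>,
    pub stop: Option<String>,
    pub post_stop: Option<String>,
}

/// The nerdctl mapping: pull before start, run in the foreground, stop
/// gracefully, then force-remove. The container is named "<ns>-<svc>".
/// Arguments are joined without shell quoting, for brevity.
pub fn nerdctl_exec_set(ns: &str, svc: &str, image: &str, cmd: &[&str]) -> ExecSet {
    let name = format!("{ns}-{svc}");
    let mut start = format!("nerdctl run --name {name} --init {image}");
    for arg in cmd {
        start.push(' ');
        start.push_str(arg);
    }
    ExecSet {
        pre_start: Some(format!("nerdctl pull {image}")),
        start: Some(start),
        stop: Some(format!("nerdctl stop {name}")),
        post_stop: Some(format!("nerdctl rm -f {name}")),
    }
}

/// The later gRPC mode fills only `start`: one foreground leaf owns the
/// whole lifecycle, so the other hooks stay empty.
pub fn grpc_exec_set(spec_b64: &str) -> ExecSet {
    ExecSet {
        start: Some(format!("orchd containerd-run --spec {spec_b64}")),
        ..Default::default()
    }
}
```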
First ingredient of the containerd dogfood harness: a Linux orchd to run
inside an orchd-osx VM. cargo-zigbuild + rustup musl target produce a 1.4M
statically-linked ELF with no runtime deps, droppable into any Linux guest.

resolve() now routes a reference that names a local path (starts with '/'
or '.') to the existing unpackLayout(), skipping the registry pull. This is
the harness enabler: a locally-assembled image (containerd + runc + orchd)
boots with no registry push.

  orchd-osx run <name> /path/to/oci-layout
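The routing rule is a prefix test on the reference. A Rust sketch for illustration only: orchd-osx's resolve() and unpackLayout() are not shown in this PR, so the `Source` enum and `route` function below are hypothetical.

```rust
use std::path::PathBuf;

/// Where an image reference should be loaded from.
#[derive(Debug, PartialEq)]
pub enum Source {
    /// A local OCI image layout directory: unpack it, no network.
    LocalLayout(PathBuf),
    /// Anything else is a registry reference and gets pulled.
    Registry(String),
}

/// A reference starting with '/' or '.' names a local path; everything
/// else goes to the registry.
pub fn route(reference: &str) -> Source {
    if reference.starts_with('/') || reference.starts_with('.') {
        Source::LocalLayout(PathBuf::from(reference))
    } else {
        Source::Registry(reference.to_string())
    }
}
```

The prefix test is unambiguous because a registry reference such as docker.io/library/alpine:latest can never start with '/' or '.'.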
container image save emits index.json -> image-index -> arm64 manifest.
unpackLayout picked the first index.json entry and parsed it as a manifest,
failing on the nested index. Now it follows nested indexes (bounded depth)
until it reaches a manifest with layers. This is what lets orchd-osx boot an
image built by the container CLI and exported to a local OCI layout.
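The fix amounts to a bounded walk: start at index.json, keep following entries while the current blob is an index, and stop at the first manifest that lists layers. A self-contained sketch over an in-memory blob map; the `Blob` enum, `resolve_manifest`, the architecture preference, and the depth bound used in the usage are assumptions for illustration (in a real OCI layout these are JSON blobs under blobs/sha256/, told apart by mediaType).

```rust
use std::collections::HashMap;

/// A parsed blob from an OCI layout, reduced to what the walk needs.
pub enum Blob {
    /// An image index: (digest, architecture) of each child descriptor.
    Index(Vec<(String, Option<String>)>),
    /// An image manifest: its layer digests.
    Manifest(Vec<String>),
}

/// Follows nested indexes from `root` (standing in for index.json) until it
/// reaches a manifest with layers, preferring the child for `arch` and
/// falling back to the first entry. Gives up after `max_depth` hops, so a
/// malformed or cyclic layout cannot loop forever.
pub fn resolve_manifest(
    blobs: &HashMap<String, Blob>,
    root: &str,
    arch: &str,
    max_depth: usize,
) -> Result<(String, Vec<String>), String> {
    let mut digest = root.to_string();
    for _ in 0..=max_depth {
        match blobs.get(&digest) {
            None => return Err(format!("missing blob {digest}")),
            Some(Blob::Manifest(layers)) if !layers.is_empty() => {
                return Ok((digest, layers.clone()));
            }
            Some(Blob::Manifest(_)) => return Err(format!("manifest {digest} has no layers")),
            Some(Blob::Index(children)) => {
                let child = children
                    .iter()
                    .find(|(_, a)| a.as_deref() == Some(arch))
                    .or_else(|| children.first())
                    .ok_or_else(|| format!("index {digest} is empty"))?;
                digest = child.0.clone();
            }
        }
    }
    Err(format!("no manifest within {max_depth} hops"))
}
```

The old behavior corresponds to stopping after the first lookup and treating whatever it found as a manifest, which is exactly what failed on the index.json -> image-index -> arm64 manifest chain.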
adiled force-pushed the feat/containerd-runtime branch from d92fda5 to 75f40be on June 6, 2026 17:56
adiled added 3 commits June 6, 2026 23:11
A nested composability + robustness example: orchd --runtime apple (osx mode)
boots a Linux microVM, sized and mounted from the Orchfile spec, and inside it
orchd --runtime containerd drives containerd to run a container. orchd
orchestrating orchd, two runtimes deep.

It only runs if the full spec is honored end to end (the VM's memory/cpus come
from the Orchfile; the containerd toolchain is mounted as a volume rather than
baked into an image that could never fit the in-RAM initramfs), so it doubles
as the spec-alignment proof. setup.sh stages the ~600MB toolchain (not
committed); needs ~3 GiB free RAM to boot the VM.

debian-slim ships no CA bundle; containerd's TLS could not verify the registry
(x509: unknown authority). setup.sh now stages the host CA bundle into tools/,
and run-test.sh points both Go (SSL_CERT_FILE) and apt/openssl (/etc/ssl/certs)
at it. The alpine pull now completes; the remaining gap is nerdctl's CNI bridge
requiring iptables (the case for driving containerd directly instead).

Replace the nerdctl ExecSet with an in-process containerd client. The runtime
now emits a single stateless foreground command, `orchd containerd-run --spec
<base64>`, which talks to containerd's gRPC socket directly: Transfer-service
pull -> chainID snapshot -> create container -> create/start task (host netns,
no CNI/iptables) -> wait -> SIGTERM kill+delete. No nerdctl, no ctr, no Docker.

- src/runtime/containerd/{mod.rs,run.rs}: runtime + the containerd-run leaf,
  moved into a folder; apple.rs likewise moved to apple/mod.rs.
- Feature-gated behind `containerd` (containerd-client/tonic/tokio + host
  protoc), so the default build stays lean (verified: zero heavy deps pulled).
- no_pivot_root=true in the runc options: orchd-osx boots the VM as an
  initramfs where pivot_root fails; runc uses MS_MOVE+chroot instead.
- just build-linux now builds --features containerd.

Validated end to end via examples/inception: orchd boots a Debian VM (apple-osx),
containerd runs inside, orchd drives it over gRPC, alpine task RUNNING (ctr
confirms PID). The iptables wall is gone (host netns).
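Running in the host netns needs no networking code at all: per the OCI runtime spec, a container inherits the runtime's namespace for any type left out of linux.namespaces, so the spec simply omits "network" (containerd's Go client exposes the same idea as oci.WithHostNamespace). A sketch that renders that array as JSON; the function name and the particular set of isolated namespaces are assumptions, not the contents of orchd's generated spec.

```rust
/// Namespace types the container gets its own copy of.
const ISOLATED: [&str; 4] = ["pid", "ipc", "uts", "mount"];

/// Renders the OCI `linux.namespaces` array. With `host_network`, "network"
/// is left out, so the task shares the host's network stack: no CNI plugin,
/// bridge, or iptables rules are involved.
pub fn linux_namespaces_json(host_network: bool) -> String {
    let mut types: Vec<&str> = ISOLATED.to_vec();
    if !host_network {
        types.push("network");
    }
    let entries: Vec<String> = types
        .iter()
        .map(|t| format!(r#"{{"type":"{t}"}}"#))
        .collect();
    format!("[{}]", entries.join(","))
}
```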
adiled changed the title from "feat: containerd runtime (nerdctl ExecSet)" to "feat: containerd runtime — drive containerd's gRPC API in process (no nerdctl)" on Jun 6, 2026
…placement'

Drop the nerdctl framing from the living surfaces (module docs, comments, the
inception example): describe the containerd runtime by what it is (drives
containerd's gRPC API in process, container in host netns) rather than by what
it is not. The inception toolchain now fetches containerd + runc directly
(~106 MB) instead of the nerdctl-full bundle (~612 MB); leaner and nerdctl-free.
History keeps the earlier commits intact. Re-validated end to end: orch-web
RUNNING under containerd via gRPC.
adiled changed the title from "feat: containerd runtime — drive containerd's gRPC API in process (no nerdctl)" to "feat: containerd runtime — drive containerd's gRPC API in process" on Jun 6, 2026
adiled added 5 commits June 7, 2026 01:31
Stress testing (repeated grow/fell over multiple containers) surfaced a leak:
teardown killed the task then immediately deleted the task/container/snapshot,
but the container's PID 1 ignores SIGTERM (the kernel drops signals that a PID-namespace init has no handler for), so
the task survived the delete and leaked. Teardown now kills (SIGTERM), waits
for the task to actually exit, SIGKILLs if it overruns the grace, and only
then deletes. Verified: repeated cycles leave containerd with zero leaked
tasks/containers/snapshots.
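The fixed teardown order (signal, wait for exit, escalate, and only then delete) is sketched below against a hypothetical `Task` trait standing in for the containerd task, container, and snapshot calls; the trait, the `Pid1Mock` type, and the grace value in the usage are illustrative assumptions. The mock's PID 1 ignores SIGTERM, reproducing the leak scenario from the commit.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Signal {
    Term,
    Kill,
}

/// Hypothetical stand-in for the containerd calls used at teardown.
pub trait Task {
    fn kill(&mut self, sig: Signal);
    /// Returns true if the task exited within `timeout`.
    fn wait_exit(&mut self, timeout: Duration) -> bool;
    /// Deletes task, container, and snapshot; only valid once the task exited.
    fn delete_all(&mut self);
}

/// SIGTERM, wait up to `grace`, SIGKILL on overrun, wait again, then delete.
/// Returns false (deleting nothing) if the task never exits, leaving it for
/// the next idempotent teardown instead of leaking a half-deleted container.
pub fn teardown<T: Task>(task: &mut T, grace: Duration) -> bool {
    task.kill(Signal::Term);
    if !task.wait_exit(grace) {
        // SIGKILL from the host is delivered even to a namespace's PID 1.
        task.kill(Signal::Kill);
        if !task.wait_exit(grace) {
            return false;
        }
    }
    task.delete_all();
    true
}

/// Mock whose PID 1 ignores SIGTERM, like a namespace init with no handler.
pub struct Pid1Mock {
    pub alive: bool,
    pub log: Vec<String>,
}

impl Task for Pid1Mock {
    fn kill(&mut self, sig: Signal) {
        self.log.push(format!("kill {sig:?}"));
        if sig == Signal::Kill {
            self.alive = false;
        }
    }
    fn wait_exit(&mut self, _timeout: Duration) -> bool {
        self.log.push("wait".to_string());
        !self.alive
    }
    fn delete_all(&mut self) {
        assert!(!self.alive, "deleting a live task is the leak");
        self.log.push("delete".to_string());
    }
}
```

The old order (kill, then delete immediately) would call delete_all while the mock is still alive and trip its assertion.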
stress.sh runs N grow/fell cycles over several containers and asserts
containerd is left with zero leaked tasks/containers/snapshots after each
teardown (waiting out the SIGTERM grace before measuring). setup.sh stages it
and writes Orchfile.stress. Result on the dev box: PASS.

The orchdi supervisor parsed services and then ran each exactly once: RESTART,
RESTART_DELAY, START_LIMIT_*, ONESHOT, and STDOUT/STDERR were all dropped.
Now the supervise loop honors them: restart on policy (no/on-failure/always)
after a delay, give up past the start-limit burst/interval, never restart a
oneshot, and redirect the service's stdout/stderr to the configured paths.
Verified: 5 restarts observed for a crash-looping service, frozen by fell.
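The restart rules split into two small pieces: a policy check on each exit and a sliding-window start limiter. A sketch with hypothetical names (`Restart`, `should_restart`, `StartLimiter`) and assumed start-limit semantics (at most `burst` starts within any `interval`, in the spirit of systemd's StartLimitBurst/StartLimitIntervalSec); the restart delay is left out, and orchdi's implementation may differ.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Restart {
    No,
    OnFailure,
    Always,
}

/// Whether a service that just exited should be started again.
pub fn should_restart(policy: Restart, oneshot: bool, exit_ok: bool) -> bool {
    if oneshot {
        return false; // a oneshot runs to completion and is never restarted
    }
    match policy {
        Restart::No => false,
        Restart::OnFailure => !exit_ok,
        Restart::Always => true,
    }
}

/// Allows at most `burst` starts within any `interval`; past that, give up.
pub struct StartLimiter {
    burst: usize,
    interval: Duration,
    starts: VecDeque<Instant>,
}

impl StartLimiter {
    pub fn new(burst: usize, interval: Duration) -> Self {
        Self { burst, interval, starts: VecDeque::new() }
    }

    /// Records a start attempt at `now`; returns false if the limit is hit.
    pub fn try_start(&mut self, now: Instant) -> bool {
        // Forget starts that have slid out of the window.
        while let Some(&t) = self.starts.front() {
            if now.duration_since(t) >= self.interval {
                self.starts.pop_front();
            } else {
                break;
            }
        }
        if self.starts.len() >= self.burst {
            return false;
        }
        self.starts.push_back(now);
        true
    }
}
```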
The containerd runtime dropped env_files and volumes and ignored all of
resources (the OCI spec even hardcoded RLIMIT_NOFILE 1024). Now the spec
carries them: env_files are merged into the env, volumes become rw bind mounts,
and resources map to the OCI spec's cgroup block (memory.limit, cpu quota/period,
pids.limit, blockIO.weight) and rlimits (nofile/nproc from the spec). A
username (rather than a numeric uid) now triggers a warning instead of silently running as root.
Verified in-container: VOLUME bind mount, ENV, and cgroup memory.max=64MiB.
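The resources mapping is mostly unit conversion into the OCI linux.resources block: memory as a byte limit, CPU as a CFS quota over a period (cpus × period), and a pids limit. A sketch that renders that fragment as JSON; the `Resources` struct, the 100 ms period, and the helper names are assumptions rather than orchd's code, and blockIO weight and rlimits are left out for brevity.

```rust
/// Service-level limits as a spec might carry them (hypothetical shape).
pub struct Resources {
    pub memory_bytes: Option<u64>,
    pub cpus: Option<f64>,
    pub pids: Option<u64>,
}

/// CFS period used when converting fractional CPUs into a quota (100 ms).
pub const CPU_PERIOD_US: u64 = 100_000;

/// cpus = 1.5 means 150 ms of CPU time per 100 ms period.
pub fn cpu_quota_us(cpus: f64, period_us: u64) -> i64 {
    (cpus * period_us as f64).round() as i64
}

/// Renders the OCI `linux.resources` object; absent limits are omitted.
pub fn linux_resources_json(r: &Resources) -> String {
    let mut parts = Vec::new();
    if let Some(bytes) = r.memory_bytes {
        parts.push(format!(r#""memory":{{"limit":{bytes}}}"#));
    }
    if let Some(cpus) = r.cpus {
        let quota = cpu_quota_us(cpus, CPU_PERIOD_US);
        parts.push(format!(r#""cpu":{{"quota":{quota},"period":{period}}}"#, period = CPU_PERIOD_US));
    }
    if let Some(n) = r.pids {
        parts.push(format!(r#""pids":{{"limit":{n}}}"#));
    }
    format!("{{{}}}", parts.join(","))
}
```

A 64 MiB memory limit renders as "limit":67108864, which on a cgroup v2 host is what shows up inside the container as memory.max.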
… spec)

stress2.sh exercises 6-container fan-out, a oneshot (must not restart),
crash/restart under RESTART on-failure (and fell freezing the loop), and a
spec-alignment check that reads the volume mount, env, and cgroup memory.max
from inside a real container. Result: PASS.