feat: containerd runtime — drive containerd's gRPC API in process #7
Open
adiled wants to merge 13 commits into
ad6692a to d92fda5
Add containerd as a third runtime alongside bare and apple. It drives containerd through nerdctl (its Docker-compatible CLI), so containers run with no Docker daemon on top. It fits the four-string ExecSet contract today:

    pre_start  nerdctl pull <image>
    start      nerdctl run --name <ns>-<svc> --init [flags] <image> [cmd]
    stop       nerdctl stop <ns>-<svc>
    post_stop  nerdctl rm -f <ns>-<svc>

Full spec coverage (env, env-files, volumes, publish, memory, cpus, user, workdir, entrypoint, cmd) and host-mode passthrough, same shape as apple's CLI path. check() gates on nerdctl/containerd reachability, so it errors cleanly off-Linux. A future mode-2 will drive containerd's gRPC API in process; this CLI path is the v1.

    orchd --runtime containerd grow
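The four-string contract above can be sketched in Rust as follows. `ExecSet` is named in the commit, but this struct shape and the helper `nerdctl_exec_set` are hypothetical illustrations, and shell quoting of arguments is deliberately left out.

```rust
/// Hypothetical shape of the four-string ExecSet contract; orchd's real type may differ.
#[derive(Debug)]
struct ExecSet {
    pre_start: String,
    start: String,
    stop: String,
    post_stop: String,
}

/// Render the nerdctl commands for one service. `flags` are already-rendered
/// spec flags (env, volumes, publish, ...); shell quoting is omitted.
fn nerdctl_exec_set(ns: &str, svc: &str, image: &str, flags: &[&str], cmd: &[&str]) -> ExecSet {
    let name = format!("{ns}-{svc}");
    // start: nerdctl run --name <ns>-<svc> --init [flags] <image> [cmd]
    let mut run: Vec<&str> = vec!["nerdctl", "run", "--name", name.as_str(), "--init"];
    run.extend_from_slice(flags);
    run.push(image);
    run.extend_from_slice(cmd);
    ExecSet {
        pre_start: format!("nerdctl pull {image}"),
        start: run.join(" "),
        stop: format!("nerdctl stop {name}"),
        post_stop: format!("nerdctl rm -f {name}"),
    }
}

fn main() {
    let set = nerdctl_exec_set("demo", "web", "alpine", &["--memory", "64m"], &["sleep", "60"]);
    println!("{set:#?}");
}
```

Because every step is a plain command string, any supervisor that can run four commands can host the runtime; the cost is that the container's lifecycle is split across four separate processes.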
First ingredient of the containerd dogfood harness: a Linux orchd to run inside an orchd-osx VM. cargo-zigbuild + rustup musl target produce a 1.4M statically-linked ELF with no runtime deps, droppable into any Linux guest.
resolve() now routes a reference that names a local path (one starting with '/' or '.') to the existing unpackLayout(), skipping the registry pull. This is the harness enabler: a locally assembled image (containerd + runc + orchd) boots with no registry push.

    orchd-osx run <name> /path/to/oci-layout
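A minimal Rust sketch of the routing rule follows. It models only the prefix check; the real resolve() and unpackLayout() live in orchd-osx and may not be written in Rust, and `ImageSource` is a hypothetical type introduced for illustration.

```rust
use std::path::PathBuf;

/// Where an image reference should be loaded from (hypothetical type).
#[derive(Debug, PartialEq)]
enum ImageSource {
    /// A local OCI layout directory, unpacked without a registry pull.
    Layout(PathBuf),
    /// A registry reference such as "alpine:3.20".
    Registry(String),
}

/// A reference starting with '/' or '.' names a local path; anything else
/// is treated as a registry reference.
fn resolve(reference: &str) -> ImageSource {
    if reference.starts_with('/') || reference.starts_with('.') {
        ImageSource::Layout(PathBuf::from(reference))
    } else {
        ImageSource::Registry(reference.to_string())
    }
}

fn main() {
    for r in ["/path/to/oci-layout", "./layout", "alpine:3.20"] {
        println!("{r} -> {:?}", resolve(r));
    }
}
```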
`container image save` emits index.json -> image index -> arm64 manifest. unpackLayout picked the first index.json entry and parsed it as a manifest, which failed on the nested index. It now follows nested indexes (to a bounded depth) until it reaches a manifest with layers. This is what lets orchd-osx boot an image built by the container CLI and exported to a local OCI layout.
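The nested-index walk can be sketched as below. This is a simplified Rust model, not the orchd-osx code: it assumes blobs are already parsed into a hypothetical `Blob` enum keyed by digest (real OCI blobs are JSON), an assumed depth bound of 4, and a preference for the entry whose platform architecture matches, falling back to the first entry.

```rust
use std::collections::HashMap;

/// A content descriptor: a blob digest plus its platform architecture, if declared.
#[derive(Clone, Debug)]
struct Descriptor {
    digest: String,
    arch: Option<String>,
}

/// The two document kinds that matter here (simplified; real blobs are JSON).
enum Blob {
    Index(Vec<Descriptor>),
    Manifest { layers: Vec<String> },
}

const MAX_DEPTH: usize = 4; // assumed bound; the real limit is not stated

/// Starting from the entries of index.json, descend through nested indexes
/// until a manifest with layers is found, preferring entries for `arch`.
fn find_manifest<'a>(
    blobs: &'a HashMap<String, Blob>,
    root: &[Descriptor],
    arch: &str,
) -> Result<&'a [String], String> {
    let mut entries: Vec<Descriptor> = root.to_vec();
    for _ in 0..MAX_DEPTH {
        // Prefer a descriptor for the wanted architecture, else take the first.
        let pick = entries
            .iter()
            .find(|d| d.arch.as_deref() == Some(arch))
            .or_else(|| entries.first())
            .ok_or("empty index")?;
        match blobs.get(&pick.digest) {
            Some(Blob::Manifest { layers }) if !layers.is_empty() => return Ok(layers.as_slice()),
            Some(Blob::Manifest { .. }) => return Err(format!("{} has no layers", pick.digest)),
            Some(Blob::Index(next)) => entries = next.clone(), // nested index: go one level down
            None => return Err(format!("missing blob {}", pick.digest)),
        }
    }
    Err(format!("no manifest within {MAX_DEPTH} levels"))
}

fn main() {
    let d = |digest: &str, arch: Option<&str>| Descriptor {
        digest: digest.to_string(),
        arch: arch.map(str::to_string),
    };
    let mut blobs = HashMap::new();
    // index.json -> image index -> arm64 manifest
    blobs.insert(
        "sha256:idx".to_string(),
        Blob::Index(vec![d("sha256:amd", Some("amd64")), d("sha256:arm", Some("arm64"))]),
    );
    blobs.insert("sha256:arm".to_string(), Blob::Manifest { layers: vec!["sha256:l0".to_string()] });
    let root = vec![d("sha256:idx", None)];
    println!("{:?}", find_manifest(&blobs, &root, "arm64"));
}
```

The depth bound also protects against a malformed layout whose indexes point at each other, which would otherwise loop forever.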
d92fda5 to 75f40be
A nested composability + robustness example: orchd --runtime apple (osx mode) boots a Linux microVM, sized and mounted from the Orchfile spec, and inside it orchd --runtime containerd drives containerd to run a container. orchd orchestrating orchd, two runtimes deep. It only runs if the full spec is honored end to end (the VM's memory/cpus come from the Orchfile; the containerd toolchain is mounted as a volume rather than baked into an image that could never fit the in-RAM initramfs), so it doubles as the spec-alignment proof. setup.sh stages the ~600MB toolchain (not committed); needs ~3 GiB free RAM to boot the VM.
debian-slim ships no CA bundle; containerd's TLS could not verify the registry (x509: unknown authority). setup.sh now stages the host CA bundle into tools/, and run-test.sh points both Go (SSL_CERT_FILE) and apt/openssl (/etc/ssl/certs) at it. The alpine pull now completes; the remaining gap is nerdctl's CNI bridge requiring iptables (the case for driving containerd directly instead).
Replace the nerdctl ExecSet with an in-process containerd client. The runtime
now emits a single stateless foreground command, `orchd containerd-run --spec
<base64>`, which talks to containerd's gRPC socket directly: Transfer-service
pull -> chainID snapshot -> create container -> create/start task (host netns,
no CNI/iptables) -> wait -> SIGTERM kill+delete. No nerdctl, no ctr, no Docker.
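The "chainID snapshot" step rests on the OCI image spec's ChainID recurrence: ChainID(L0) = DiffID(L0), and each later ChainID is the digest of the parent ChainID, a space, and the next layer's DiffID. containerd's snapshotters key committed layer snapshots by this value, so the top ChainID is the parent for the container's active snapshot. The Rust sketch below computes every prefix ChainID; the digest function is injected so the example stays std-only (in real use it would be SHA-256 formatted as `sha256:<hex>`), and `chain_ids` is a hypothetical helper name.

```rust
/// Compute the ChainID of every layer prefix from the image's DiffIDs.
/// `digest` maps a string to "algo:hex"; SHA-256 in practice.
fn chain_ids(diff_ids: &[&str], digest: impl Fn(&str) -> String) -> Vec<String> {
    let mut out: Vec<String> = Vec::with_capacity(diff_ids.len());
    for diff_id in diff_ids {
        let next = match out.last() {
            // ChainID(L0) = DiffID(L0)
            None => diff_id.to_string(),
            // ChainID(L0|...|Ln) = Digest(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
            Some(parent) => digest(&format!("{parent} {diff_id}")),
        };
        out.push(next);
    }
    out
}

fn main() {
    // A readable stand-in "digest" that shows the nesting; NOT a real hash.
    let ids = chain_ids(&["a", "b", "c"], |s: &str| format!("h({s})"));
    println!("{ids:?}");
}
```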
- src/runtime/containerd/{mod.rs,run.rs}: runtime + the containerd-run leaf,
moved into a folder; apple.rs likewise moved to apple/mod.rs.
- Feature-gated behind `containerd` (containerd-client/tonic/tokio + host
protoc), so the default build stays lean (verified: zero heavy deps pulled).
- no_pivot_root=true in the runc options: orchd-osx boots the VM as an
initramfs where pivot_root fails; runc uses MS_MOVE+chroot instead.
- just build-linux now builds --features containerd.
Validated end to end via examples/inception: orchd boots a Debian VM (apple-osx),
containerd runs inside, orchd drives it over gRPC, alpine task RUNNING (ctr
confirms PID). The iptables wall is gone (host netns).
…placement'
Drop the nerdctl framing from the living surfaces (module docs, comments, the inception example): describe the containerd runtime by what it is (drives containerd's gRPC API in process, container in host netns) rather than by what it is not. The inception toolchain now fetches containerd + runc directly (~106 MB) instead of the nerdctl-full bundle (~612 MB); leaner and nerdctl-free. History keeps the earlier commits intact. Re-validated end to end: orch-web RUNNING under containerd via gRPC.
Stress testing (repeated grow/fell over multiple containers) surfaced a leak: teardown killed the task and then immediately deleted the task, container, and snapshot, but the container's PID 1 ignores SIGTERM (the kernel drops signals that a PID-namespace init has no handler for, other than SIGKILL/SIGSTOP from an ancestor namespace), so the task survived the delete and leaked. Teardown now kills (SIGTERM), waits for the task to actually exit, sends SIGKILL if it overruns the grace period, and only then deletes. Verified: repeated cycles leave containerd with zero leaked tasks/containers/snapshots.
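The teardown ordering can be modeled as below. `Task` is a hypothetical trait standing in for the containerd task API (kill, wait, delete), and polling replaces the real client's wait for exit status; the point is only the order: SIGTERM, wait up to the grace period, SIGKILL on overrun, delete last. SIGKILL succeeds where SIGTERM did not because a namespace init cannot ignore SIGKILL sent from an ancestor namespace.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the containerd task API; only the ordering matters.
trait Task {
    fn kill(&mut self, signal: i32);
    /// True once the task's process has exited.
    fn exited(&self) -> bool;
    /// Remove the task (and, in the real flow, the container and snapshot).
    fn delete(&mut self);
}

const SIGTERM: i32 = 15; // Linux signal numbers
const SIGKILL: i32 = 9;

/// SIGTERM, wait up to `grace` for exit, SIGKILL on overrun, and delete only
/// after the task has exited. Returns whether SIGKILL was needed.
/// (A real implementation would also bound the wait after SIGKILL.)
fn teardown(task: &mut dyn Task, grace: Duration, poll: Duration) -> bool {
    task.kill(SIGTERM);
    let deadline = Instant::now() + grace;
    let mut escalated = false;
    while !task.exited() {
        if !escalated && Instant::now() >= deadline {
            task.kill(SIGKILL);
            escalated = true;
        }
        std::thread::sleep(poll);
    }
    task.delete();
    escalated
}

/// Mock task whose init has no SIGTERM handler, so only SIGKILL stops it.
struct HandlerlessInit {
    alive: bool,
    signals: Vec<i32>,
    deleted_while_alive: bool,
}

impl Task for HandlerlessInit {
    fn kill(&mut self, signal: i32) {
        self.signals.push(signal);
        if signal == SIGKILL {
            self.alive = false;
        }
    }
    fn exited(&self) -> bool {
        !self.alive
    }
    fn delete(&mut self) {
        // Records the original bug: deleting while the process still runs.
        self.deleted_while_alive |= self.alive;
    }
}

fn main() {
    let mut t = HandlerlessInit { alive: true, signals: Vec::new(), deleted_while_alive: false };
    let killed = teardown(&mut t, Duration::from_millis(50), Duration::from_millis(5));
    println!("sigkill needed: {killed}, signals: {:?}, leaked: {}", t.signals, t.deleted_while_alive);
}
```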
stress.sh runs N grow/fell cycles over several containers and asserts containerd is left with zero leaked tasks/containers/snapshots after each teardown (waiting out the SIGTERM grace before measuring). setup.sh stages it and writes Orchfile.stress. Result on the dev box: PASS.
The orchdi supervisor parsed services and then ran each exactly once: RESTART, RESTART_DELAY, START_LIMIT_*, ONESHOT, and STDOUT/STDERR were all dropped. Now the supervise loop honors them: restart on policy (no/on-failure/always) after a delay, give up past the start-limit burst/interval, never restart a oneshot, and redirect the service's stdout/stderr to the configured paths. Verified: 5 restarts observed for a crash-looping service, frozen by fell.
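The restart decision can be sketched as a pure function. This is a hypothetical Rust model, not orchdi's code: the field names are assumptions mapped onto the RESTART, RESTART_DELAY, START_LIMIT_* and ONESHOT keys, and the start limit is treated as a sliding window (at most `burst` starts within `interval`), modeled on systemd's StartLimitBurst/StartLimitIntervalSec semantics.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq)]
enum Restart {
    No,
    OnFailure,
    Always,
}

#[derive(Debug, PartialEq)]
enum Decision {
    RestartAfter(Duration),
    Stop,
    GiveUp,
}

/// Hypothetical per-service supervisor settings.
struct Policy {
    restart: Restart,   // RESTART
    delay: Duration,    // RESTART_DELAY
    burst: usize,       // START_LIMIT_* (max starts per window)
    interval: Duration, // START_LIMIT_* (window length)
    oneshot: bool,      // ONESHOT
}

/// Decide what to do after the service exits. `starts` holds recent start
/// times; entries older than `interval` are dropped first.
fn on_exit(p: &Policy, exit_ok: bool, starts: &mut VecDeque<Instant>, now: Instant) -> Decision {
    if p.oneshot {
        return Decision::Stop; // a oneshot is never restarted
    }
    let wanted = match p.restart {
        Restart::No => false,
        Restart::OnFailure => !exit_ok,
        Restart::Always => true,
    };
    if !wanted {
        return Decision::Stop;
    }
    while let Some(&t) = starts.front() {
        if now.duration_since(t) > p.interval {
            starts.pop_front();
        } else {
            break;
        }
    }
    if starts.len() >= p.burst {
        return Decision::GiveUp; // past the start-limit burst within the interval
    }
    Decision::RestartAfter(p.delay)
}

fn main() {
    let p = Policy {
        restart: Restart::Always,
        delay: Duration::from_millis(500),
        burst: 5,
        interval: Duration::from_secs(60),
        oneshot: false,
    };
    let mut starts = VecDeque::new();
    // Simulate a crash loop: each restart records its start time.
    loop {
        match on_exit(&p, false, &mut starts, Instant::now()) {
            Decision::RestartAfter(_) => starts.push_back(Instant::now()),
            other => {
                println!("after {} restarts: {other:?}", starts.len());
                break;
            }
        }
    }
}
```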
The containerd runtime dropped env_files and volumes and ignored the resources block entirely (the OCI spec even hardcoded RLIMIT_NOFILE 1024). Now the spec carries them: env_files are merged into the env, volumes become rw bind mounts, and resources map to the OCI spec's cgroup block (memory.limit, cpu quota/period, pids.limit, blockIO.weight) and rlimits (nofile/nproc from the spec). A username (as opposed to a numeric uid) now triggers a warning instead of the container silently running as root. Verified in-container: VOLUME bind mount, ENV, and cgroup memory.max=64MiB.
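The resources mapping can be sketched as follows. The output struct mirrors a subset of the OCI runtime spec's `linux.resources` fields (memory limit in bytes, CPU quota and period in microseconds, pids limit, blockIO weight); the input `Resources` type, the `parse_size` syntax, and using the kernel's default 100 ms CFS period are assumptions for illustration, not orchd's actual Orchfile format.

```rust
/// Subset of the OCI runtime spec's linux.resources block.
#[derive(Debug, PartialEq)]
struct LinuxResources {
    memory_limit: Option<i64>, // memory.limit, bytes
    cpu_quota: Option<i64>,    // cpu.quota, microseconds per period
    cpu_period: Option<u64>,   // cpu.period, microseconds
    pids_limit: Option<i64>,   // pids.limit
    blkio_weight: Option<u16>, // blockIO.weight
}

/// Hypothetical Orchfile-side resources; real field names and units may differ.
struct Resources {
    memory_bytes: Option<i64>,
    cpus: Option<f64>,
    pids: Option<i64>,
    io_weight: Option<u16>,
}

const CPU_PERIOD_US: u64 = 100_000; // assumed period: the kernel's default 100 ms

/// Parse sizes like "64MiB" or "2GiB" into bytes (hypothetical syntax).
/// "B" is checked last because it is also a suffix of the other units.
fn parse_size(s: &str) -> Option<i64> {
    let units = [("GiB", 1i64 << 30), ("MiB", 1 << 20), ("KiB", 1 << 10), ("B", 1)];
    for (suffix, mult) in units {
        if let Some(n) = s.strip_suffix(suffix) {
            return n.trim().parse::<i64>().ok().and_then(|n| n.checked_mul(mult));
        }
    }
    s.parse().ok()
}

fn to_oci(r: &Resources) -> LinuxResources {
    LinuxResources {
        memory_limit: r.memory_bytes,
        // cpus = 0.5 -> 50_000us of CPU time per 100_000us period
        cpu_period: r.cpus.map(|_| CPU_PERIOD_US),
        cpu_quota: r.cpus.map(|c| (c * CPU_PERIOD_US as f64).round() as i64),
        pids_limit: r.pids,
        blkio_weight: r.io_weight,
    }
}

fn main() {
    let r = Resources { memory_bytes: parse_size("64MiB"), cpus: Some(0.5), pids: Some(100), io_weight: None };
    println!("{:#?}", to_oci(&r));
}
```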
… spec)
stress2.sh exercises 6-container fan-out, a oneshot (must not restart), crash/restart under RESTART on-failure (and fell freezing the loop), and a spec-alignment check that reads the volume mount, env, and cgroup memory.max from inside a real container. Result: PASS.
Adds `--runtime containerd`: orchd runs containers by driving containerd's gRPC API in process.

How it works
The runtime emits one stateless foreground command, `orchd containerd-run --spec <base64>`, which the supervisor (orchdi/launchd/systemd) tracks. That process:

One command owns the whole lifecycle, so there's no separate pre_start/stop/post_stop.
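The single-command shape can be sketched as below: an ExecSet with only `start` populated, carrying the spec as standard base64. The spec's serialization (JSON is assumed here), this `ExecSet` shape, and the helper names are hypothetical; the base64 codec is hand-rolled only to keep the sketch std-only, where a real build would more likely use a crate.

```rust
/// Standard base64 alphabet (RFC 4648), used with '=' padding.
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

fn b64_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to 3 bytes into a 24-bit group, zero-filling a short tail.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        // A tail of k bytes yields k + 1 data characters, then '=' padding.
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(ALPHABET[((n >> (18 - 6 * i)) & 63) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

fn b64_decode(s: &str) -> Option<Vec<u8>> {
    let s = s.as_bytes();
    if s.len() % 4 != 0 {
        return None;
    }
    let val = |c: u8| ALPHABET.iter().position(|&a| a == c).map(|p| p as u32);
    let mut out = Vec::with_capacity(s.len() / 4 * 3);
    for quad in s.chunks(4) {
        let pad = quad.iter().rev().take_while(|&&c| c == b'=').count();
        if pad > 2 {
            return None;
        }
        let mut n = 0u32;
        for &c in &quad[..4 - pad] {
            n = (n << 6) | val(c)?; // any non-alphabet byte rejects the input
        }
        n <<= 6 * pad as u32;
        let bytes = [(n >> 16) as u8, (n >> 8) as u8, n as u8];
        out.extend_from_slice(&bytes[..3 - pad]);
    }
    Some(out)
}

/// Hypothetical ExecSet shape: only `start` is set, because the leaf command
/// owns pull, run and teardown itself.
#[derive(Debug)]
struct ExecSet {
    pre_start: Option<String>,
    start: String,
    stop: Option<String>,
    post_stop: Option<String>,
}

fn containerd_exec_set(spec: &[u8]) -> ExecSet {
    ExecSet {
        pre_start: None,
        start: format!("orchd containerd-run --spec {}", b64_encode(spec)),
        stop: None,
        post_stop: None,
    }
}

fn main() {
    // Assumed JSON serialization of the service spec.
    let spec = br#"{"image":"alpine","id":"demo-web"}"#;
    let set = containerd_exec_set(spec);
    println!("{}", set.start);
    // The leaf would decode its last argument back into the spec.
    let encoded = set.start.rsplit(' ').next().unwrap_or_default();
    assert_eq!(b64_decode(encoded).as_deref(), Some(&spec[..]));
}
```

Base64 keeps the whole spec in a single shell-safe argument, so the supervisor only needs to store and replay one command line.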
Design notes
- tonic/prost. It's behind the `containerd` cargo feature: the default `orchd` build pulls zero of these deps and needs no `protoc`. Build the container-capable orchd with `--features containerd` (`just build-linux` does this).
- `no_pivot_root`. orchd-osx boots its VM as an initramfs (ramfs root), where runc's `pivot_root` fails; the runc options make it use `MS_MOVE` + `chroot` instead.
- `src/runtime/containerd/{mod.rs,run.rs}` (runtime + the `containerd-run` leaf); `apple.rs` likewise moved to `apple/mod.rs`.

Supporting commits on the branch
- `just build-linux`: cross-compile a static aarch64-linux orchd (`--features containerd`).
- `container image save` emits.

Validated end to end: `examples/inception`
A nested composability test: orchd (macOS) boots a Debian VM via its apple runtime → containerd runs inside → orchd (Linux) drives containerd over gRPC → an alpine task runs in host netns. containerd's own view confirms it:
This doubles as the spec-alignment proof: the VM is sized from the Orchfile (memory/cpus) and the containerd toolchain (containerd + runc) is mounted as a volume, both of which only work because orchd-osx honors the full service spec (shipped in v0.3.1).
Tests
Runtime exec_set unit tests (decode the spec, assert image/id/args/env). Full suite green; the default (lean) build pulls no heavy deps.