fix(bootstrap): load kernel modules on install and fix Podman socket detection #24
Conversation
RPM spec:
- Add br_netfilter to modules-load.d config for K3s bridge netfilter
- Ship sysctl.d/99-openshell.conf with net.bridge.bridge-nf-call-iptables
- Add %post scriptlet to modprobe modules immediately (no reboot required)
- Add Recommends: podman-docker as belt-and-suspenders for socket compat

Podman socket detection:
- Add connect_local_auto() helper in docker.rs for auto-detecting runtime
- Replace all 7 Docker::connect_with_local_defaults() calls outside docker.rs with runtime-aware alternatives (connect_local, connect_local_auto, or metadata-based lookup with fallback)
- Remove unused bollard::Docker import from build.rs
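A rough sketch of what the spec changes described in this commit could look like. This is a hedged illustration following the commit message, not the actual `openshell.spec`; the exact scriptlet wording is assumed:

```
Recommends: podman-docker

%post
# systemd-modules-load.service already ran at boot, so load the
# module now; a fresh install then works without a reboot.
modprobe -a br_netfilter || :
%sysctl_apply 99-openshell.conf
```

The shipped `modules-load.d/openshell.conf` would list `br_netfilter`, and `sysctl.d/99-openshell.conf` would set `net.bridge.bridge-nf-call-iptables = 1`; `%sysctl_apply` (from systemd-rpm-macros) applies the sysctl file immediately at install time.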
…ctions

Add connect_for_gateway(name) helper that resolves the container runtime from stored gateway metadata first, falling back to detect_runtime() with full error propagation instead of silently defaulting to Docker.

Replace the duplicated inline metadata-detect-fallback blocks in extract_and_store_pki and gateway_container_logs with the new helper.
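The resolution order described in this commit can be sketched as follows. All names besides the metadata-first/detect-second logic are stand-ins (the real crate connects via bollard; `stored_runtime` and `detect_runtime` here are stubs modelling the described behavior):

```rust
use std::fmt;

// Stand-in types: this sketch only models the runtime-resolution logic,
// not the actual bollard connection.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Runtime {
    Docker,
    Podman,
}

#[derive(Debug)]
struct RuntimeError(String);

impl fmt::Display for RuntimeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

// Hypothetical: read the runtime recorded in gateway metadata at create time.
fn stored_runtime(name: &str) -> Option<Runtime> {
    if name == "gw1" { Some(Runtime::Podman) } else { None }
}

// Hypothetical: probe for a live Docker or Podman socket.
fn detect_runtime() -> Result<Runtime, RuntimeError> {
    Ok(Runtime::Docker)
}

/// Resolve a gateway's runtime: stored metadata first, then detect_runtime(),
/// propagating the detection error instead of silently assuming Docker.
fn runtime_for_gateway(name: &str) -> Result<Runtime, RuntimeError> {
    match stored_runtime(name) {
        Some(r) => Ok(r),
        None => detect_runtime(),
    }
}

fn main() {
    // Gateway with stored metadata uses the stored runtime.
    println!("{:?}", runtime_for_gateway("gw1").unwrap());
    // No metadata: falls back to detection.
    println!("{:?}", runtime_for_gateway("gw2").unwrap());
}
```

The key point is the `None => detect_runtime()` arm: a detection failure surfaces as an error rather than a hardcoded `Runtime::Docker`.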
Hmm... I am skeptical of this "load kernel modules" stuff. First of all, most of that stuff should be dynamically loaded already - it's an anti-pattern to eagerly load modules. Is something blocking the load?
@cgwalters I don't know if anything is blocking it, but it's definitely not loading dynamically. The usage of it happens inside the k3s container if that provides any insight as to what it might be.
In order for logic to work on e.g. macOS, kernel modules have to be loaded inside the podman machine VM. So anything that involves having the client tool (in this case, an RPM) manipulate the host system state is, I think, wrong. I'm not an expert in the iptables bits; it's possible that the problem is that k3s is trying to do iptables/nft from inside a privileged container, which would break dynamic module loading. A general fix with privileged containers like this is to have them fork off host-level operations via …
(followup since I know this was confusing) - For sure most people on Linux use podman without podman-machine, but it is architecturally valid to do so, and there are some use cases for it (albeit obscure). But it's helpful to think of it this way: anything we do in shipping the client binary should, I think, work symmetrically across macOS and Linux. And the client binary shouldn't have anything to do with kmods itself.
…es modules

When running under Podman, the k3s cluster now uses:
- Native nftables kube-proxy mode (--kube-proxy-arg=proxy-mode=nftables)
- Host DNS resolution instead of iptables DNAT proxy (Podman DNS is routable)
- Skipped iptables backend probe (unnecessary with nftables kube-proxy)

This eliminates the need for legacy iptables kernel modules (ip_tables, iptable_nat, iptable_filter, iptable_mangle) on the host when using Podman. The Docker path is completely unchanged; all new behavior is gated on CONTAINER_RUNTIME=podman.

Container image: add nftables package (provides nft binary for kube-proxy).

RPM spec: modules-load.d now only loads br_netfilter (still required for bridged pod traffic regardless of iptables/nftables). Remove podman-docker Recommends (no longer needed with native Podman socket detection and nftables networking).
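The gating described in this commit might look roughly like the sketch below. Only the `--kube-proxy-arg=proxy-mode=nftables` flag and the `CONTAINER_RUNTIME=podman` gate come from the commit message; the arg-builder function itself is hypothetical:

```rust
// Illustrative sketch: all new behavior hangs off the runtime gate, so the
// Docker path builds exactly the same k3s arguments as before.
fn k3s_extra_args(container_runtime: &str) -> Vec<&'static str> {
    if container_runtime == "podman" {
        vec![
            // Native nftables kube-proxy: avoids the legacy ip_tables /
            // iptable_* kernel modules on the host.
            "--kube-proxy-arg=proxy-mode=nftables",
        ]
    } else {
        // Docker path unchanged.
        Vec::new()
    }
}

fn main() {
    assert!(k3s_extra_args("docker").is_empty());
    assert_eq!(
        k3s_extra_args("podman"),
        vec!["--kube-proxy-arg=proxy-mode=nftables"]
    );
    println!("ok");
}
```

Keeping the condition in one place makes it easy to verify the claim that the Docker path is byte-for-byte unchanged.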
Can you elaborate? It's not immediately obvious to me what you mean there. Thanks! :)
Basically in a privileged container, bind mounting in the host …
Add :dev tag to both gateway and cluster multi-arch manifests in the midstream container build workflow. Local cargo builds default to the dev tag (OPENSHELL_IMAGE_TAG is unset), so this ensures locally-built CLI binaries can pull images from GHCR without needing to override the tag. The dev and midstream tags are kept in sync — both point to the same image built from the midstream branch on every merge.
@cgwalters I don't follow ... openshell doesn't bind mount in the host
@cgwalters I want to continue to discuss this, but I'm going to merge this for now to unblock some other efforts
Summary
Fix two issues that cause gateway startup failures on Podman-only Fedora systems.
Issue 2: Kernel Module Loading in `%post`

The RPM spec ships a `modules-load.d/openshell.conf` file, but `systemd-modules-load.service` runs at boot — long before package installation. Modules are never loaded until reboot, causing `gateway start` to fail on fresh installs.

Additionally, `br_netfilter` (required by K3s for `net.bridge.bridge-nf-call-iptables`) was missing entirely.

Changes:
- Add `br_netfilter` to `modules-load.d/openshell.conf`
- Ship `sysctl.d/99-openshell.conf` with bridge netfilter settings
- Add `%post` scriptlet that runs `modprobe -a` immediately + `%sysctl_apply`
- Update `%files`

Issue 3: Podman Socket Detection
Multiple code paths call `Docker::connect_with_local_defaults()`, which hardcodes `/var/run/docker.sock`. On Podman-only systems without `podman-docker`, this fails with `Socket not found`.

The crate already has a runtime-aware `docker::connect_local(runtime)` function, but 7 call sites bypassed it.

Changes:
- Add `Recommends: podman-docker` to RPM spec (belt-and-suspenders)
- Add `connect_local_auto()` helper that auto-detects runtime and connects
- Replace `connect_with_local_defaults()` calls with runtime-aware alternatives:
  - `runtime` in scope → `docker::connect_local(runtime)`
  - `name` → metadata lookup for stored runtime, fallback to auto-detect
  - `docker::connect_local_auto()` with graceful fallback
- Remove unused `bollard::Docker` import from `build.rs`

Related Issue
See plan: `architecture/plans/fix-rpm-modules-and-podman-socket.md`

Changes

- `openshell.spec` — `br_netfilter`, sysctl config, `%post` scriptlet, `Recommends: podman-docker`
- `crates/openshell-bootstrap/src/docker.rs` — `connect_local_auto()` helper
- `crates/openshell-bootstrap/src/lib.rs` — replace `connect_with_local_defaults()` calls
- `crates/openshell-bootstrap/src/build.rs` — replace `connect_with_local_defaults()` calls, remove unused import

Testing
- `cargo check` — full workspace, zero errors
- `cargo test -p openshell-bootstrap` — 125/125 tests pass
- `cargo fmt --check` — clean
- The only remaining `connect_with_local_defaults()` is the legitimate one inside `connect_local()` for the Docker runtime path

Checklist