fix(bootstrap): fix nftables healthcheck and warn on missing flannel modules #25

Merged: maxamillion merged 1 commit into midstream from fix/flannel-iptables-and-healthcheck on Apr 8, 2026


@maxamillion maxamillion commented Apr 8, 2026

Summary

Fixes openshell gateway start failing on Fedora 43+ and other modern distributions when using Podman with nftables kube-proxy mode.

Related: Follow-up to #24 which added nftables kube-proxy support for Podman.

Problem

After #24 switched kube-proxy to nftables mode under Podman, two issues remained:

  1. Flannel still needs legacy iptables modules. The flannel traffic manager embedded in k3s v1.35.x is compiled without an nft backend — it only has iptables.(*IPTablesManager), no nftablesMgr or equivalent. When flannel calls iptables -t nat for masquerade rules, it fails on Fedora 43+ because iptable_nat is not loaded:

    iptables v1.8.11 (legacy): can't initialize iptables table 'nat': Table does not exist
    
  2. Health check NodePort probe fails with nftables kube-proxy. The health check tests 127.0.0.1:30051, but nftables kube-proxy only adds the node's real IP to the nodeport-ips set — loopback is never matched, so the probe always fails.

Changes

deploy/docker/cluster-entrypoint.sh

  • When running under Podman, check whether iptable_nat is loaded and emit an actionable warning if not
  • The RPM %post scriptlet loads these modules immediately at install time and persists them via modules-load.d; the warning covers non-RPM installs
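
The module check described above could be sketched roughly as follows. This is a minimal illustration, not the code from cluster-entrypoint.sh: the function names `module_loaded` and `warn_if_missing_iptable_nat`, the `MODULES_FILE` override, and the warning wording are all assumptions, and the Podman detection step is left out because the source does not show how it is done.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named module is listed in the
# kernel's module table. MODULES_FILE is an assumed override so the
# check can be exercised against a fixture instead of /proc/modules.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${MODULES_FILE:-/proc/modules}"
}

# Hypothetical warning routine: prints an actionable message to stderr
# and returns non-zero when iptable_nat is not listed.
warn_if_missing_iptable_nat() {
    if ! module_loaded iptable_nat; then
        echo "WARNING: kernel module iptable_nat is not loaded;" \
             "flannel masquerade rules may fail." >&2
        echo "         On the host, try: sudo modprobe iptable_nat" >&2
        return 1
    fi
    return 0
}
```

Calling `warn_if_missing_iptable_nat || true` early in an entrypoint would surface the problem without aborting startup. Note that modules compiled into the kernel do not appear in /proc/modules, so a check like this can warn even when the functionality is present.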

deploy/docker/cluster-healthcheck.sh

  • Replace hardcoded 127.0.0.1 with the node's actual InternalIP from kubectl
  • Falls back to 127.0.0.1 if the node IP can't be determined
  • Works with both iptables and nftables kube-proxy modes
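
As a rough sketch of the lookup-with-fallback described above (not the actual contents of cluster-healthcheck.sh), the node address can be read with a kubectl JSONPath query. The function names `node_ip` and `probe_target`, the `KUBECTL` override, and the choice of the first node and first address are assumptions made for illustration; port 30051 comes from the problem description.

```shell
#!/bin/sh
# Hypothetical helper: print the first node's InternalIP, or 127.0.0.1
# if kubectl fails or returns nothing. KUBECTL is an assumed override
# so the function can be tried without a running cluster.
node_ip() {
    ip=$("${KUBECTL:-kubectl}" get nodes \
        -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}' \
        2>/dev/null) || ip=""
    # A dual-stack node may report several addresses separated by
    # spaces; keep only the first one.
    ip=${ip%% *}
    echo "${ip:-127.0.0.1}"
}

# Hypothetical helper: the host:port pair a NodePort probe would target.
probe_target() {
    echo "$(node_ip):30051"
}
```

Because the fallback keeps the old loopback address, a check built this way behaves as before whenever the node IP cannot be determined.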

Testing

Full lifecycle tested on an ephemeral Fedora 43 Linode instance (kernel 6.19.10, Podman 5.8.1, openshell 0.0.22 from COPR):

Test                                                Result
openshell gateway start                             ✅ Gateway ready
openshell status                                    ✅ Connected
openshell provider create (OpenAI + Anthropic)
openshell provider list / delete
openshell sandbox create
openshell sandbox exec (uname, whoami, os-release)
openshell sandbox delete
openshell sandbox create --no-keep (ephemeral)

Checklist

  • Follows conventional commit format
  • No secrets or credentials committed
  • Changes scoped to the issue at hand
  • Tested on target platform (Fedora 43 + Podman)


coderabbitai bot commented Apr 8, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


maxamillion force-pushed the fix/flannel-iptables-and-healthcheck branch 2 times, most recently from d0b6ee6 to 3f9a770, on April 8, 2026 02:13
…modules

Flannel's embedded traffic manager in k3s v1.35.x is compiled without the
nft backend — it only has iptables-legacy support, which requires kernel
modules (ip_tables, iptable_nat, iptable_filter, iptable_mangle) that
modern distributions (Fedora 43+, RHEL 10+) no longer load by default.

Changes:

- cluster-entrypoint.sh: When running under Podman, check whether the
  iptable_nat module is loaded and emit an actionable warning if not.
  The modules are expected to be loaded at boot via modules-load.d
  (installed by the RPM spec); the warning covers the case where the
  host hasn't rebooted since installation.

- cluster-healthcheck.sh: Replace the hardcoded 127.0.0.1 NodePort check
  with the node's actual InternalIP.  When kube-proxy runs in nftables
  mode, NodePort DNAT rules only match the node's real IP addresses —
  loopback is not in the nftables nodeport-ips set, so the old check
  always failed.

Tested on Fedora 43 (kernel 6.19, Podman 5.8.1) with the full lifecycle:
gateway start, provider create/list/delete, sandbox create/exec/delete.

maxamillion force-pushed the fix/flannel-iptables-and-healthcheck branch from 3f9a770 to 32f46b7 on April 8, 2026 02:15
maxamillion changed the title from "fix(bootstrap): load flannel iptables modules and fix nftables healthcheck" to "fix(bootstrap): fix nftables healthcheck and warn on missing flannel modules" on Apr 8, 2026
maxamillion merged commit 797bca0 into midstream on Apr 8, 2026
10 of 15 checks passed