Add agent-managed firewall with nftables for NATS/monit access control #394

Draft
rkoster wants to merge 46 commits into cloudfoundry:main from rkoster:feature/agent-managed-firewall

Conversation

@rkoster
Contributor

@rkoster rkoster commented Jan 30, 2026

Summary

  • Move firewall management from stemcell scripts to bosh-agent to properly handle mixed cgroup environments in nested container scenarios
  • Add nftables-based firewall using github.com/google/nftables library (pure Go, no C dependencies)
  • Support DNS re-resolution on NATS reconnect for HA failover scenarios where director may move to a different IP

Replaces

Supported Container Scenarios

| Host  | Container | Cgroup         | Matching Strategy                 |
|-------|-----------|----------------|-----------------------------------|
| Noble | Jammy     | v2 (inherited) | cgroup v2 socket matching         |
| Noble | Noble     | v2             | cgroup v2 socket matching         |
| Jammy | Noble     | v1/hybrid      | UID-based fallback (meta skuid 0) |
| Jammy | Jammy     | v1             | net_cls classid matching          |
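The runtime detection behind this table can be sketched as a small classifier over the contents of `/proc/self/cgroup`. This is an illustrative sketch, not the agent's actual code; the function name is hypothetical. A pure v2 (unified) hierarchy exposes a single `0::` entry, while any additional controller entries indicate v1 or a hybrid layout:

```go
package main

import (
	"fmt"
	"strings"
)

// detectCgroupVersion classifies a /proc/self/cgroup snapshot.
// "0::" alone means a unified (v2) hierarchy; "0::" alongside
// controller entries means hybrid; otherwise pure v1.
func detectCgroupVersion(procSelfCgroup string) string {
	hasUnified := false
	hasLegacy := false
	for _, line := range strings.Split(strings.TrimSpace(procSelfCgroup), "\n") {
		if strings.HasPrefix(line, "0::") {
			hasUnified = true
		} else if line != "" {
			hasLegacy = true
		}
	}
	switch {
	case hasUnified && !hasLegacy:
		return "v2"
	case hasUnified && hasLegacy:
		return "hybrid"
	default:
		return "v1"
	}
}

func main() {
	fmt.Println(detectCgroupVersion("0::/system.slice/bosh-agent.service"))
	fmt.Println(detectCgroupVersion("12:pids:/\n0::/\n3:net_cls:/"))
}
```

Because the input is just file content, the same logic works unchanged inside containers that inherit the host's hierarchy (the Noble-host/Jammy-container row above).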

Architectural Changes

Firewall lifecycle moved from network setup to agent bootstrap + mbus hooks

Before (iptables):

  • Firewall rules set up during SetupNetworking() in net managers
  • Single-shot setup at network configuration time
  • DNS resolved once at startup; no re-resolution on reconnect

After (nftables):

  • SetupFirewall(mbusURL) called during agent bootstrap after networking
  • Monit chain: Set up once at bootstrap, never modified
  • NATS chain: Flushed and rebuilt before each NATS connect/reconnect via NatsFirewallHook.BeforeConnect() hook
  • DNS re-resolved on every reconnect attempt, enabling HA failover to new director IP
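The re-resolution step above hinges on extracting the mbus host before each connect attempt. A minimal sketch of that parsing, assuming a hypothetical `resolveMbusHost` helper (the real hook lives behind the `NatsFirewallHook.BeforeConnect()` interface):

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// resolveMbusHost splits an mbus URL into host and port. A real
// BeforeConnect hook would then call net.LookupHost(host) on every
// reconnect attempt and rebuild the NATS chain with the fresh IPs,
// so a director failover to a new IP is picked up automatically.
func resolveMbusHost(mbusURL string) (host, port string, err error) {
	u, err := url.Parse(mbusURL)
	if err != nil {
		return "", "", err
	}
	host, port, err = net.SplitHostPort(u.Host)
	if err != nil {
		return "", "", err
	}
	return host, port, nil
}

func main() {
	h, p, err := resolveMbusHost("nats://director.example.com:4222")
	fmt.Println(h, p, err)
}
```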

New Platform interface methods

SetupFirewall(mbusURL string) error
GetNatsFirewallHook() firewall.NatsFirewallHook

Cross-table coordination with packet marks

nftables evaluates each table's chains independently. To coordinate with the stemcell's base bosh_firewall table:

  1. Agent's bosh_agent table (priority -1) sets packet mark 0xb054 on allowed packets
  2. Stemcell's bosh_firewall table (priority 0) checks for this mark and skips DROP rules
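A schematic view of the two-table handshake, with illustrative chain names and a placeholder for the agent's cgroup match (the exact expressions differ in the real rulesets):

```
# agent table, evaluated first (priority -1): mark allowed traffic
table inet bosh_agent {
  chain monit {
    type filter hook input priority -1; policy accept;
    tcp dport 2822 socket cgroupv2 level 0 <agent-cgroup> meta mark set 0xb054 accept
  }
}

# stemcell base table (priority 0): honour the mark, drop the rest
table inet bosh_firewall {
  chain input {
    type filter hook input priority 0; policy accept;
    meta mark 0xb054 accept
    tcp dport 2822 drop
  }
}
```

The mark is needed because an `accept` verdict in `bosh_agent` ends evaluation only within that table; without the mark, `bosh_firewall` would still evaluate and drop the packet.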

Runtime cgroup detection

Cgroup version detected at runtime from /proc/self/cgroup:

  • Jammy container on Noble host → detects cgroup v2 (inherited from host)
  • Hybrid cgroup systems (v2 mounted but no controllers) → falls back to UID matching

New CLI command for BOSH jobs

bosh-agent firewall-allow monit

Called by BOSH jobs needing monit API access for controlled failover scenarios.

Note

The CLI-based nftables fallback (NftablesFirewallCLI) is temporary and is only included for debugging while getting the integration tests to pass. It will be removed before merge.

Files Changed

New

  • platform/firewall/ - nftables implementation with cgroup detection

Modified

  • agent/bootstrap.go - Call SetupFirewall() after networking
  • mbus/nats_handler.go - Add firewall hook before connect/reconnect
  • platform/*_platform.go - Implement new interface methods
  • main/agent.go - Add firewall-allow CLI command

Removed

  • platform/net/firewall_provider*.go - Old iptables/Windows Firewall implementation

@aramprice
Member

It looks like there are a lot of diffs in generated fakes. Are these because of a change in counterfeiter? If so, could these changes be broken out?

@rkoster
Contributor Author

rkoster commented Jan 30, 2026

Yeah definitely, will do some more cleanup before marking it as ready for review.

Move firewall management from stemcell scripts to bosh-agent to properly
handle Jammy containers running on Noble hosts (cgroup v2 inherited from
host).

- Create platform/firewall package with nftables implementation using
  github.com/google/nftables library (pure Go, no C deps)
- Add SetupFirewall() to Platform interface, called during bootstrap
- Add 'bosh-agent firewall-allow <service>' CLI command for monit wrapper
- Detect cgroup version at runtime from /proc/self/cgroup
- NATS firewall only enabled for Jammy (Noble uses ephemeral credentials)
- Gracefully handle missing base firewall (warn, don't fail)
Refactor NftablesFirewall to use dependency injection for testability:
- Add NftablesConn, CgroupResolver, and OSReader interfaces
- Add NewNftablesFirewallWithDeps constructor for testing
- Export TableName, ChainName, and MonitPort constants
- Generate counterfeiter fakes with Linux build tags

Add scenario-based test coverage for:
- SetupAgentRules with various NATS URL formats
- OS-specific behavior (Jammy adds NATS rules, Noble skips them)
- Cgroup v2 rule generation with path verification (including nested container paths)
- Cgroup v1 rule generation with classid verification
- AllowService and Cleanup operations
The firewall-allow CLI is called by BOSH-deployed jobs that need to
interact with the monit API directly for controlled failover scenarios
(e.g., monitoring process lifecycle). Jobs call this via the
permit_monit_access helper function.
Remove OSReader dependency from firewall manager. NATS firewall enablement
is now controlled via Platform.Linux.EnableNATSFirewall in agent.json.

- Jammy stemcells set EnableNATSFirewall: true (uses static NATS credentials)
- Noble stemcells leave it unset/false (uses ephemeral NATS credentials)

This properly handles all deployment scenarios including Jammy containers
running on Noble hosts with cgroup v2.
Migrate NATS firewall management from network configuration time (iptables)
to mbus connection hook (nftables). This enables DNS re-resolution before
each connection/reconnection, supporting HA failover scenarios where the
director may move to a different IP.

Architecture changes:
- Monit chain: set up once at bootstrap, never modified
- NATS chain: flushed and rebuilt before each NATS connect/reconnect
  via BeforeConnect hook in mbus handler

Key changes:
- Add NatsFirewallHook interface with BeforeConnect(mbusURL) method
- Add GetNatsFirewallHook() to Platform interface
- Store firewall manager in linux platform for reuse
- Call BeforeConnect before initial connection and in reconnect handler
- Remove old iptables-based firewall_provider from platform/net
- Update integration tests from iptables to nftables
The NATS firewall now adds two rules per director IP:
1. ACCEPT rule for agent's cgroup (allows agent to connect)
2. DROP rule for everyone else (blocks malicious workloads)

This protects against metadata service attacks where workloads could
extract NATS certs and impersonate the agent.
The netlink-based nftables library fails with EOVERFLOW in nested
container environments (e.g., Garden containers inside Concourse).
This adds a CLI-based implementation that uses the nft command
directly, which works reliably in these environments.

When the netlink implementation fails, the agent automatically
falls back to the CLI implementation.
On hybrid cgroup systems (cgroup v2 mounted but with no controllers
enabled), nftables 'socket cgroupv2' matching doesn't work because
the socket-to-cgroup association isn't established.

This is detected by checking if /sys/fs/cgroup/cgroup.controllers
has content. If empty, we fall back to UID-based matching (meta skuid 0)
which allows any root process to connect to monit/NATS.

Trade-off: UID matching is less secure (any root process can connect)
but works on all systems including nested containers on hybrid hosts.
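The hybrid-cgroup check described above reduces to inspecting one file. A sketch, with a hypothetical function name; the real code reads `/sys/fs/cgroup/cgroup.controllers` and passes its content here:

```go
package main

import (
	"fmt"
	"strings"
)

// socketMatchUsable reports whether 'socket cgroupv2' matching can work.
// An empty cgroup.controllers file means cgroup v2 is mounted but has no
// controllers (hybrid mode), so the socket-to-cgroup association is not
// established and the firewall must fall back to UID matching (meta skuid 0).
func socketMatchUsable(controllersFileContent string) bool {
	return strings.TrimSpace(controllersFileContent) != ""
}

func main() {
	fmt.Println(socketMatchUsable("cpuset cpu io memory pids"))
	fmt.Println(socketMatchUsable("\n"))
}
```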
nftables evaluates each table's chains independently - an ACCEPT verdict
in one table doesn't prevent other tables from also evaluating the packet.

When the agent allows a packet to monit, it now sets mark 0xb054. The base
bosh_firewall table checks for this mark and skips the DROP rule when set.

This enables proper coordination between the agent-managed bosh_agent table
(priority -1) and the stemcell's bosh_firewall table (priority 0).
…ments

- Accept both socket cgroupv2 classid and meta cgroup matching patterns
- Use (?s) flag for multiline regex matching
- Parse hostname from BOSH_ENVIRONMENT URL
- Handle hybrid cgroup v1+v2 systems (grep ^0:: for unified hierarchy)
- Support both /sys/fs/cgroup and /sys/fs/cgroup/unified mount points
- Skip cgroup access test when cgroup manipulation is not permitted (nested containers)
Noble stemcells use systemd for service management instead of runit.
This change adds OS detection to use the appropriate service manager:
- isSystemdSystem() detects Noble via /etc/lsb-release
- StopAgent/StartAgent use systemctl for Noble, sv for Jammy
- system_mounts_test accepts both service manager log patterns
Use OS detection to stop agent with appropriate service manager
(systemctl for Noble, sv for Jammy) before installing new agent binary.
- Add regex pattern for pure cgroupv2 with systemd path format
- Wait for NATS rules to be populated before checking nftables
- Use Eventually for nftables check to handle timing issues
- Truncate log file in BeforeEach to avoid finding old logs from pipeline setup
- Delete existing nftables table to ensure clean state for fresh agent startup
- Remove unreliable log-based waiting (differs between runit/systemd)
- Wait up to 300 seconds for NATS rules to appear in nftables
- Add meta skuid pattern for hybrid cgroup UID-based matching
Remove nohup wrapper for systemd commands since systemctl is designed
to work synchronously and doesn't need background execution like runit's sv.
On systemd systems, 'systemctl start' is a no-op if the service is already
running. Tests that need a fresh agent state (e.g., firewall tests that
delete the nftables table) require a restart to ensure the agent recreates
its firewall rules on startup.
On Noble (systemd-based stemcells), auditd requires /var/log/audit
to exist. When CleanupDataDir unmounts /var/log and recreates it
as an empty directory, the audit subdirectory is missing, causing
auditd to fail. The agent's bootstrap then fails because the
bosh-start-logging-and-auditing script cannot start auditd.

This fix ensures /var/log/audit is created with proper permissions
after /var/log is recreated.
- Fix SetupAgentRules signature in nftables_firewall_other.go to match
  Manager interface (add enableNATSFirewall parameter)
- Fix nil pointer dereference in nats_handler.go ClosedHandler when
  connection closes gracefully without error
- Remove CLI-based nftables fallback from linux_platform.go
- Delete nftables_firewall_cli.go (CLI implementation no longer needed)
Add //nolint:errcheck comments for intentionally ignored errors and
check url.Parse error return value.
The CgroupResolver and NftablesConn interfaces are defined in
nftables_firewall.go which has //go:build linux. Use counterfeiter's
-header option to automatically include the linux build tag when
generating these fakes.

Add linux_header.txt containing the build tag, and reference it in
the counterfeiter:generate directives for Linux-only interfaces.
Validate that the parsed port is within the valid TCP port range (1-65535)
before converting to uint16 in buildTCPDestPortExprs. This prevents
potential integer overflow when converting from architecture-dependent
int to uint16.
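The range check described above can be sketched as follows; `parsePort` is an illustrative stand-in for the validation inside `buildTCPDestPortExprs`:

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort rejects anything outside 1-65535 before narrowing to uint16,
// so an out-of-range value cannot be silently truncated when converting
// from an architecture-dependent int.
func parsePort(s string) (uint16, error) {
	p, err := strconv.Atoi(s)
	if err != nil {
		return 0, err
	}
	if p < 1 || p > 65535 {
		return 0, fmt.Errorf("port %d out of valid TCP range 1-65535", p)
	}
	return uint16(p), nil
}

func main() {
	fmt.Println(parsePort("4222"))
	fmt.Println(parsePort("70000"))
}
```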
The google/nftables Go library has a bug encoding 'socket cgroupv2'
expressions via netlink, causing 'netlink receive: value too large
for defined data type' errors on Ubuntu Noble. The nft CLI works
correctly, but the Go netlink implementation is broken.

Until the library is fixed, always use UID-based matching (meta skuid 0)
for cgroup v2 systems instead of socket cgroupv2 path matching. This
allows root processes to access protected services (NATS, monit).

Updated tests to reflect the new behavior where cgroup v2 systems
always use UID-based matching regardless of IsCgroupV2SocketMatchFunctional.
The nftables kernel expects an 8-byte cgroup inode ID for 'socket cgroupv2'
matching, not the path string. The nft CLI translates paths to inode IDs
automatically, but the Go nftables library does not.

This commit:
- Adds GetCgroupID() to look up the cgroup inode via syscall.Stat
- Updates buildCgroupMatchExprs() to use the 8-byte inode ID
- Re-enables cgroup v2 socket matching (was disabled in previous workaround)
- Updates tests to verify cgroup ID-based matching

Verified on Ubuntu Noble (24.04) with cgroup v2: agent starts successfully
and nftables rules correctly match the bosh-agent cgroup.
Simplify cgroup matching logic by removing the UID-based fallback that was
used for hybrid cgroup systems (v2 mounted but with v1 controllers).

The socket cgroupv2 matching should work regardless of whether cgroup
controllers are delegated to v2, since the kernel's socket-to-cgroup
association is based on the cgroup hierarchy, not controller delegation.

This simplifies the code from 3 matching strategies to 2:
- Cgroup v1: meta cgroup <classid>
- Cgroup v2: socket cgroupv2 level 2 <inode_id>

Integration tests will verify if this works in nested container scenarios.
Add integration tests to verify nftables firewall rules work correctly
inside nested Garden/runC containers using OCI stemcell images.

New files:
- integration/utils/garden_client.go: Garden client helper that connects
  through SSH tunnel, supports container creation, command execution,
  file streaming, and cgroup detection
- integration/garden_firewall_test.go: Tests cgroup detection, nftables
  availability, and firewall rule creation in Garden containers

Tests verify that on cgroup v2 (Noble), firewall rules use proper cgroup
socket matching rather than falling back to UID-based matching.

Tests skip automatically if GARDEN_ADDRESS env var is not set.

Dependencies added:
- code.cloudfoundry.org/garden
- code.cloudfoundry.org/lager/v3

Vendor cleanup: removed unused dependencies that were cleaned up by
go mod tidy (iptables, go-systemd, staticcheck, etc.)
Tests now fail immediately with clear error messages when:
- GARDEN_ADDRESS environment variable is not set
- Garden client fails to connect
- bosh-agent-linux-amd64 binary is not built
- Agent fails to create firewall table

This ensures test failures are visible in CI rather than silently skipped.
…g testing

The Garden container firewall tests need to run in an environment where
Garden remains functional throughout the test. The main integration test
suite's SynchronizedBeforeSuite runs CleanupDataDir() which removes
/var/vcap/data, destroying the Garden job.

This commit:
- Moves garden_firewall_test.go to integration/garden/ as a separate test suite
- Adds deploy-to-noble.sh script for local testing with Garden deployment
- Fixes test assertions and adds proper agent configuration (agent.json)
- Uses CGO_ENABLED=0 for static binary that works in containers

All 7 Garden firewall tests now pass, confirming that the agent correctly
uses cgroup v2 socket matching (not UID fallback) in nested containers.
- Create nft-dump Go utility that uses nftables library to inspect firewall
  rules without requiring nft CLI. Outputs human-readable YAML with
  interpreted values (IP addresses, ports, marks, etc.)

- Update Garden firewall tests to use nft-dump instead of nft CLI for
  verifying firewall rules work correctly in containers

- Fix Jammy stemcell container tests by unmounting Garden's bind-mounted
  /etc/resolv.conf, /etc/hosts, /etc/hostname before starting the agent.
  This prevents the 'Device or resource busy' error when the agent runs
  resolvconf -u (same approach used by bosh-warden-cpi)

- Add nft-dump helper methods to GardenClient and TestEnvironment for
  use in both container and VM-based integration tests

- Update deploy-to-noble.sh to build nft-dump binary for container tests

Tests now pass for both Noble and Jammy stemcells.
@rkoster rkoster force-pushed the feature/agent-managed-firewall branch from f18859f to d262967 on February 3, 2026 at 11:12
Update InstallNftDump in both TestEnvironment and GardenClient to
automatically build the nft-dump binary if it doesn't exist, rather
than requiring it to be pre-built. This makes the integration tests
self-contained and removes the need for CI pipelines to build the
binary separately.
On systemd-based systems like Noble, /var/tmp can have issues similar to
/var/log where systemd services briefly hold file handles open even after
fuser -km. This caused intermittent 'rm -rf /var/tmp' failures in test
cleanup.

Changes:
- Use lazy unmount for /var/tmp (like we already do for /var/log)
- Add retry logic with brief sleeps for rm -rf to handle transient busy state
The test was being skipped because we couldn't move a process into the
agent's cgroup on systemd systems. This adds:
1. Debug output to understand the cgroup hierarchy
2. Try creating a child cgroup under the agent's cgroup (which should be
   allowed with proper delegation) instead of writing directly to the
   agent's cgroup.procs
The previous pattern 'pgrep -f "bosh-agent$"' expected the command line
to END with 'bosh-agent', but on systemd the full command line is:
/var/vcap/bosh/bin/bosh-agent -P ubuntu -C /var/vcap/bosh/agent.json

Use 'pgrep -x bosh-agent' to match the exact process name, with fallback
to matching the full binary path.
On systemd systems, /var/tmp may be held by various services and difficult
to remove even with retries. Since we always recreate /var/tmp right after
the cleanup attempt, this failure is not critical.
When losetup -a returns empty (no loop devices), splitting on newline
produces an empty string which caused 'sudo losetup -d' to run with no
argument and fail. Skip empty strings to avoid this spurious error.
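The pitfall fixed here is easy to reproduce: splitting an empty string on newline yields one empty element, not zero. A minimal sketch with a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
)

// nonEmptyLines filters out empty strings after a newline split. Without
// this, empty `losetup -a` output would produce [""] and feed an empty
// argument to 'losetup -d'.
func nonEmptyLines(out string) []string {
	var lines []string
	for _, l := range strings.Split(out, "\n") {
		if strings.TrimSpace(l) != "" {
			lines = append(lines, l)
		}
	}
	return lines
}

func main() {
	fmt.Println(len(nonEmptyLines("")))
	fmt.Println(len(nonEmptyLines("/dev/loop0: (/tmp/store.img)\n")))
}
```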
Add a Go package that installs Garden from a compiled BOSH release tarball
onto a VM via SSH, enabling integration tests to verify nftables firewall
behavior inside Garden containers.

Key components:
- gardeninstaller: Extracts packages/templates from release tarball
- SSH driver with sudo support for non-root users
- XFS-backed grootfs stores (10GB for multiple stemcell images)
- Generates config files matching garden-runc-release format

The test suite now automatically installs Garden when GARDEN_RELEASE_TARBALL
is set, removing the need for garden-runc as a BOSH release dependency.

Tests verify firewall rules work in both Noble and Jammy stemcells.
…upport

- Add UseDirectStore config option to skip XFS loopback creation for nested
  containers where loop devices are unavailable
- Add initDirectStores() for grootfs init without store-size-bytes
- Add initXFSStores() for standard XFS loopback initialization
- Fix POSIX compatibility: use '.' instead of 'source' for shell sourcing
- Add GardenClient methods for nested Garden containers:
  - NewGardenClientWithAddress: connect to nested Garden via forwarded ports
  - CreateNestedGardenContainer: create privileged container with cgroup mounts
  - StreamTarball: stream gzipped tarballs into containers
  - Container: expose underlying container for GardenDriver
…n tests

- Extract Driver interface to integration/installerdriver package with
  Bootstrap(), Cleanup(), and IsBootstrapped() lifecycle methods
- Move SSHDriver and GardenDriver to installerdriver package with
  config-based constructors (SSHDriverConfig, GardenDriverConfig)
- Add GardenAPIClient helper for connecting to Garden through SSH tunnel
- Create agentinstaller package for installing bosh-agent in containers
- Add DialAgentThroughJumpbox() to utils for proper SSH tunneling to
  agent VM (fixes bind mount directory creation on wrong host)
- Update garden_firewall_test.go and nested_garden_firewall_test.go to
  use the new Driver pattern with proper SSH connections
- Add test_helpers.go with stemcell image constants and utility functions

The refactor enables running Garden inside Garden containers (nested)
for testing firewall rules at multiple nesting levels. All 10 single-level
Garden firewall tests pass (Noble + Jammy stemcells).
…upport

- Disable containerd mode for L1/L2 Garden installations (containerd cannot
  run inside containers due to missing capabilities and cgroups)
- Use dynamic IP allocation from Garden pools instead of NetIn port forwarding
  (nftables-based systems may not support iptables NetIn rules)
- Add TCP forwarder (socat) in L1 to reach L2 Garden from host
- Add ContainerdMode and NetworkPool options to gardeninstaller Config
- Add ContainerIP() and Handle() methods to GardenDriver
- Fix busybox version mismatch (1.37.0 -> 1.36.1) to match release tarball
- Document cgroup bind mount requirements for firewall in nested containers

Network topology:
  Host -> SSH -> Agent VM (Garden L0 :7777, containerd=on)
           -> L1 container (Garden :7777, containerd=off, pool 10.253.0.0/22)
              -> L2 container (Garden :7777, containerd=off, pool 10.252.0.0/22)

All 8 nested garden firewall tests now pass, verifying nftables rules work
correctly up to 3 levels of container nesting.
Add netcat-based tunneling to reach nested Garden containers (L2) from
the test runner. This approach spawns netcat processes inside intermediate
containers to tunnel TCP traffic.

Files added:
- garden_process_conn.go: net.Conn wrapper around Garden process stdin/stdout
- container_tunnel.go: persistent tunnel attempt (incomplete)
- tcp_forwarder.go: Python-based TCP proxy for container port forwarding

Changes:
- garden_api_client.go: Add ensureLogger() to fix nil logger panic,
  add NewGardenAPIClientThroughContainer() for netcat tunneling
- driver_garden.go: Add Container() method to expose underlying container
- nested_garden_firewall_test.go: Use netcat tunnel for L2 Garden connectivity

Known issue: EOF errors occur during long operations like container creation
with large images, likely due to netcat process lifecycle issues.

This commit preserves the netcat approach for reference before switching
to the NetIn port forwarding approach.
NetIn is Garden's built-in port forwarding mechanism that creates iptables
DNAT rules. This is more reliable than the netcat tunnel approach which
suffered from EOF errors during long operations.

- Use L0 Garden NetIn (17777→7777) to reach L1 Garden via external IP
- Use L1 Garden NetIn (27777→7777) to reach L2 Garden via L1 container IP
- Remove netcat-based tunnel code (garden_process_conn.go, container_tunnel.go,
  tcp_forwarder.go)
- Simplify garden_api_client.go to only NewGardenAPIClient and
  NewGardenAPIClientDirect
…nnections

- Fix cgroup v2 socket matching to use Level 0 (direct cgroup) instead of Level 2
- Add DROP rule for non-agent connections to monit port 2822
- Add IsRunningUnderSystemd() detection to warn when cgroup isolation is ineffective
- Improve nested firewall test with agent log capture for debugging
- Add agentinstaller package for bosh-agent installation in test containers
- Add cgrouputils package for cgroup diagnostics in tests