fix(net): empty net_allow deny-all bypass + strengthen false-passing security tests#103
Merged
Conversation
`test_no_network_blocks_connect` connected to 127.0.0.1:80 (nothing listening) and treated any OSError as "blocked", so it passed with zero network enforcement. `test_xoa_executor_no_network` asserted `not result.success` on a malformed executor command, failing for plumbing reasons rather than because egress was denied. Both are negative tests that assert "the connection failed" without controlling why. Rebind each against a live loopback listener so an *allowed* connect would succeed, and assert the connect is denied with EACCES (errno 13) — Landlock's CONNECT_TCP hook. This makes them discriminate a real deny from an incidental failure. Signed-off-by: Cong Wang <cwang@multikernel.io>
Cover the contract that an empty `net_allow` denies all outbound TCP even when filesystem grants are present — the configuration in which any fs grant turns on the named-AF_UNIX connect gate (`has_unix_fs_gate`) and `connect()` starts being trapped for every address family. - Rust: crates/sandlock-core/tests/integration/test_network.rs - Python: python/tests/test_sandbox.py (TestNetAllowDenyAll) Both use a live loopback listener so an *allowed* connect would succeed, and assert the connect is denied with EACCES (errno 13). They fail on the pre-fix code (the supervisor performs the connect on-behalf, bypassing Landlock) and pass once the on-behalf path defers unconstrained IP connects to the kernel. Signed-off-by: Cong Wang <cwang@multikernel.io>
… invariant The sendto/sendmsg on-behalf handlers already follow a clear rule: with a network destination policy active the supervisor is the IP-level enforcer and must never Continue; without one there is nothing to enforce, so the syscall is returned to the kernel and Landlock/BPF govern it. connect() violated that invariant. Once any fs grant turns on the named- AF_UNIX connect gate, connect() is trapped for every address family, and the handler performed IP connects on-behalf unconditionally — in the (unconfined) supervisor, bypassing the child's Landlock CONNECT_TCP decision. With fs grants present (i.e. essentially always), an empty net_allow (deny-all) silently allowed all outbound TCP. Bring connect() in line with the datagram handlers: add `NotifPolicy::ip_connect_supervised(dest_is_loopback)` and return SECCOMP_USER_NOTIF_FLAG_CONTINUE for an IP connect that needs no supervision. The child then issues the connect directly and its own Landlock CONNECT_TCP rules govern it — deny-all for an empty net_allow, allow-all for `*:*` (CONNECT_TCP left unhandled). port-remap still needs on-behalf, but only for loopback, so it is the sole extra reason to supervise an otherwise-unpoliced IP connect. AllowList/DenyList checks, HTTP ACL, IP allowlists, and UDP/ICMP handling are unchanged. Also rename the flag that drives this invariant from the misleading `has_net_allowlist` to `has_net_destination_policy` — it is set from `network_destination_policy` (net_allow OR net_deny OR policy_fn OR http_acl), not just an allowlist — so the connect gate, the sendto/sendmsg fast path, and the dispatch registration all read honestly. This also makes the redundant `|| has_http_acl` (already subsumed) in the connect trap condition visible, and drops it. With the empty case now gated to Continue, `NetworkPolicy::Unrestricted` is only ever reached meaning "allow all", so its doc is corrected accordingly. Signed-off-by: Cong Wang <cwang@multikernel.io>
…p assert) test_deny_returns_true connected to a dead port (127.0.0.1:1) with connect_ex, ignored the result, and asserted only that the process ran — so it passed even if the deny did nothing (port 1 fails regardless). Same false-positive shape as the network deny-tests fixed in b7d800a. Rebind it against a live loopback listener on an allowlisted port (so the connect would succeed without the deny) and assert EPERM (errno 1) plus a post-deny marker proving the process survives. Adds a shared `_loopback_listener` helper for the deny-tests that follow. Signed-off-by: Cong Wang <cwang@multikernel.io>
test_deny_with_errno connected to 127.0.0.1:1 — outside the net_allow allowlist — so Landlock denied it with the same errno 13 the policy_fn returns. The assertion passed even if the policy_fn errno path were a no-op. Target a live listener on an allowlisted port instead: Landlock permits it and the listener would accept it, so errno 13 can only originate from the policy_fn verdict. Signed-off-by: Cong Wang <cwang@multikernel.io>
test_restrict_network_on_execve called restrict_network([]) — an empty list is a no-op (the override is skipped when it lists no IPs) — and asserted only that the program printed output. It verified nothing about network restriction. Use two live loopback listeners (127.0.0.1 and 127.0.0.2), both allowlisted so either would connect, then restrict to ["127.0.0.1"] and assert the first connects while the second is refused (errno 111). Without enforcement both would connect. Signed-off-by: Cong Wang <cwang@multikernel.io>
…ed IPs test_restrict_pid_network restricted the exec'd pid then ran `echo ok` — it made no network call, so it observed nothing. Mirror the restrict_network test: two live loopback listeners (both allowlisted), restrict the pid to ["127.0.0.1"], and assert the listed IP connects while the non-listed one is refused (errno 111). Signed-off-by: Cong Wang <cwang@multikernel.io>
…al test handle_memory read the limit from the static NotifPolicy snapshot (`policy.max_memory_bytes`) and never from the LivePolicy that `restrict_max_memory` writes, so dynamically tightening the limit had no effect — a sandbox could allocate far past the restricted size. Read the live limit when a policy_fn is active (it is seeded from the static `max_memory` ceiling and only lowered by `restrict_max_memory`), falling back to the static limit otherwise. The policy_fn lock is taken and released before the resource lock, preserving lock ordering. test_restrict_max_memory previously restricted memory, ran `echo`, and asserted nothing. It now sets a 256 MiB ceiling, restricts to 64 MiB, and asserts a 128 MiB allocation is killed — while the same allocation under the un-restricted ceiling (control) succeeds, proving the kill is due to the dynamic restriction. Signed-off-by: Cong Wang <cwang@multikernel.io>
With policy_fn active, every fork-like syscall is ptrace-traced so the new child is registered before it can run user code (argv-safety TOCTOU). That path deadlocked, and fixing it surfaced three couplings, all addressed here. 1. Pre-Continue ptrace-stop deadlock. prepare_process_creation_tracking PTRACE_INTERRUPT'd the caller and waited for a stop *before* the seccomp Continue was sent — but a task parked in the seccomp user-notification wait cannot reach a ptrace-stop, so waitpid blocked forever. Now the tracee is only PTRACE_SEIZE'd before Continue (SEIZE does not stop it); the fork event is awaited afterward. 2. Single-thread coordinator, no busy-poll. ptrace *commands* are per-tracer-thread (cross-thread issue fails with ESRCH; only waitpid may be cross-thread), so the whole SEIZE -> wait -> detach sequence runs in one spawn_blocking worker, driven over a channel by finish() after Continue. The worker uses a single blocking waitpid (deterministic for a successful fork — the event is synchronous). A *failed* fork emits no event, so finish() bounds the wait with an async tokio::time watchdog that wakes the worker via kill(SIGURG); the worker treats that signal as "no child". No dedicated thread, no spinning, no INTERRUPT race. 3. Competing waiter. Sandbox::wait() reaped the top-level child with a waitpid(pid, 0) loop, and waitpid (any flags) reaps a tracee's ptrace-stops — so it raced the fork-tracking worker for the child's fork events and hung repeated forks. wait() now detects exit via the child's pidfd + AsyncFd (readable on exit only, never consuming ptrace-stops), mirroring spawn_pid_watcher, then reaps the status. A blocking waitpid fallback remains for kernels without pidfd_open. Verified: 200 sequential forks under policy_fn, forced fork failures (EAGAIN), and repeated separate runs all complete with no hangs; full Rust (398 lib + 257 integration) and Python (354) suites pass. Signed-off-by: Cong Wang <cwang@multikernel.io>
handle_fork compared the concurrent process count against the static ResourceState.max_processes and never against the LivePolicy that restrict_max_processes writes, so dynamically tightening the process limit had no effect. Read the live limit when a policy_fn is active (it is seeded from the static max_processes and only lowered by restrict_max_processes), falling back to the static limit otherwise — mirroring the restrict_max_memory fix. handle_fork now takes the SupervisorCtx (for ctx.policy_fn and ctx.resource) instead of the resource handle alone; the policy_fn lock is taken and released before the resource lock to preserve lock ordering. Verified: restrict_max_processes(1) now denies the fork with EAGAIN while the unrestricted baseline allows it, deterministically and without hanging (the latter relying on the fork-tracking deadlock fix). Signed-off-by: Cong Wang <cwang@multikernel.io>
test_restrict_max_processes restricted the limit then ran `echo` and asserted nothing. Restrict to 1 and fork: assert the fork is denied with EAGAIN (errno 11), while the unrestricted control run forks successfully — proving the denial is due to the dynamic limit. Now meaningful (and no longer hangs) given the fork-tracking deadlock fix and the restrict_max_processes enforcement fix. Signed-off-by: Cong Wang <cwang@multikernel.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Everything here traces to one failure mode: security tests that pass without the security actually happening. Fixing those tests exposed real bugs in the enforcement they were supposed to cover.
1.
net_allow=[]silently allowed all outbound TCP (Landlock bypass)Once any fs grant turns on the named-
AF_UNIXconnect gate (has_unix_fs_gate— i.e. essentially always),connect()is trapped for every address family. The on-behalf handler then performed IP connects in the unconfined supervisor, bypassing the child's LandlockCONNECT_TCPdeny-all. So an emptynet_allow(deny-all) permitted all outbound TCP.Fix: bring
connect()in line with the invariant thesendto/sendmsghandlers already follow — no network destination policy ⇒ nothing to enforce ⇒ return the syscall to the kernel so Landlock governs it. Also renames the misnamedhas_net_allowlist→has_net_destination_policyand corrects theNetworkPolicy::Unrestricteddoc.Regression introduced in
4a8a7bf(post-v0.8.3tag). The publishedv0.8.3release is NOT affected — verified against the PyPI wheel; only source builds frommainbetween4a8a7bfand this PR are.2. False-passing security tests
A cluster of tests exercised a deny/restrict API but only asserted "the process still ran", so they passed even when enforcement was a complete no-op:
restrict_network([])— an empty list is a no-oprestrict_pid_network— ranecho, never made a network callrestrict_max_memory/restrict_max_processes— restricted, then ranecho, asserted nothingAll now use live loopback listeners (so an allowed op would actually succeed) or real allocations/forks, and assert the enforcement-specific outcome (EACCES/EPERM/EAGAIN, killed process, refused connect). Equivalently weak tests in
test_mcp_integration.py/test_pipeline.pyare fixed too.3. Three
policy_fnproduct bugs the strengthening uncoveredMaking those tests real revealed the underlying dynamic-policy features were broken:
restrict_max_memorywas a no-op —handle_memoryread the staticNotifPolicylimit, never theLivePolicythatrestrict_max_memorywrites. Now reads the live (tightened) limit.restrict_max_processeswas a no-op — same shape inhandle_fork(read staticrs.max_processes). Now reads the live limit.policy_fn+fork()deadlocked the sandbox. The fork-event ptrace tracking thatpolicy_fnenables hung on any fork. Root cause unfolded into three couplings, all fixed:Continueptrace-stop deadlock —prepareINTERRUPT'd and waited for a stop before the seccompContinue, but a tracee parked in the notify wait can't reach a ptrace-stop. NowSEIZE-only beforeContinue; the fork event is awaited after.ESRCH; onlywaitpidmay be cross-thread), so the wholeSEIZE → wait → detachruns in onespawn_blockingworker driven over a channel. A successful fork is caught by a single blockingwaitpid; a failed fork (no event) is bounded by an asynctokio::timewatchdog that wakes the worker viakill(SIGURG). No dedicated thread, no spin, noINTERRUPTrace.Sandbox::wait()reaped the top-level child with awaitpid(pid,0)loop, andwaitpid(any flags) reaps a tracee's ptrace-stops, so it raced the fork-tracking worker and hung repeated forks.wait()now detects exit via the child'spidfd+AsyncFd(readable on exit only, never consuming ptrace-stops), mirroringspawn_pid_watcher.Fork tracking still uses ptrace (the namespace-less TOCTOU guarantee requires
TRACEFORK); pidfd is only for exit detection.Test plan
cargo test -p sandlock-core— 398 lib + 257 integration pass (incl. newtest_empty_net_allow_denies_tcp_despite_fs_grantsand the fork-tracking integration test exercising the new coordinator).pytest python/tests— 354 pass.policy_fn, forced fork failures (EAGAIN), and repeated runs all complete with zero hangs.net_allow=[]→ EACCES; matching rule → ok; non-matching → refused;*:*→ ok.