File descriptor leak in supervisor after SECCOMP_IOCTL_NOTIF_ADDFD #6
Context
We're evaluating sandlock for integration into our code execution platform and performed a security audit of the codebase. The project is very promising — the unprivileged Landlock + seccomp approach is exactly what we need, and the recent additions (confine(), dry-run, deterministic mode) are great. We'd like to contribute fixes and become active users.
Our audit found several issues; this is the most critical one.
Summary
Every NotifAction::InjectFdSend leaks one file descriptor in the supervisor process. The supervisor-side fd is never closed after the SECCOMP_IOCTL_NOTIF_ADDFD ioctl duplicates it into the child. A long-lived supervisor that runs many sandboxes will eventually hit EMFILE and fail.
Severity
Critical — leads to supervisor resource exhaustion (DoS) under normal workload, not just adversarial input.
Affected code
6 call sites create an fd, call mem::forget() to prevent OwnedFd from closing it, pass the raw integer via InjectFdSend, and never close it afterward:
| File | Function | Trigger |
|---|---|---|
| procfs.rs:147-176 | inject_memfd | Every /proc/cpuinfo, /proc/meminfo, /proc/net/* read |
| procfs.rs:160-173 | handle_hostname_open | Every /etc/hostname read (when hostname is set) |
| cow/dispatch.rs:107-115 | handle_cow_open | Every openat in COW workdir |
| random.rs:80-88 | handle_random_open | Every /dev/urandom, /dev/random open (when random_seed is set) |
| chroot/dispatch.rs:261 | handle_chroot_open | Every redirected open under chroot |
| chroot/dispatch.rs:274 | handle_chroot_open | Every redirected open under chroot |
Root cause
NotifAction::InjectFdSend stores a RawFd (plain i32), not an OwnedFd. The handler in send_response() (notif.rs:442-449) passes this integer to inject_fd_and_send() which calls SECCOMP_IOCTL_NOTIF_ADDFD. The kernel duplicates the fd into the child but does not close the supervisor's copy. After the ioctl returns, send_response returns Ok(()) without closing srcfd. The NotifAction enum has no Drop implementation, so when the action value is dropped, the i32 is simply discarded — no close() ever happens.
The call sites use std::mem::forget(memfd) specifically to prevent OwnedFd from closing the fd before the ioctl. This is correct — but the fd must be closed after the ioctl, and nothing does that.
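The lifecycle described above can be modelled outside sandlock. The following is a minimal sketch, not the project's code: `inject_and_send`, `handle_leaky`, and `handle_fixed` are hypothetical names, `os.dup()` merely stands in for the kernel duplicating the fd into the child, and `/dev/null` stands in for the memfd. It assumes Linux with `/proc` mounted.

```python
import os


def open_fd_count() -> int:
    """Number of fds currently open in this process (Linux /proc)."""
    return len(os.listdir("/proc/self/fd"))


def inject_and_send(srcfd: int) -> int:
    """Stand-in for the ADDFD ioctl (hypothetical helper).

    Like the kernel, it creates a duplicate for the receiver and
    leaves the caller's srcfd open.
    """
    return os.dup(srcfd)


def handle_leaky() -> int:
    """Mirrors the current behaviour: srcfd is never closed."""
    srcfd = os.open(os.devnull, os.O_RDONLY)  # supervisor-side fd
    return inject_and_send(srcfd)             # srcfd stays open: leak


def handle_fixed() -> int:
    """Closes the supervisor-side fd once the duplicate exists."""
    srcfd = os.open(os.devnull, os.O_RDONLY)
    try:
        return inject_and_send(srcfd)
    finally:
        os.close(srcfd)                       # close after the "ioctl"
```

Even after the receiver closes its copy, `handle_leaky` leaves one extra fd open in the calling process, while `handle_fixed` leaves none.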
Flow diagram
```mermaid
sequenceDiagram
    participant H as Handler<br/>(procfs/random/cow)
    participant SR as send_response<br/>(notif.rs:434)
    participant K as Kernel<br/>(ADDFD ioctl)
    participant C as Child process
    H->>H: memfd = memfd_create("sandlock-*")
    Note over H: supervisor owns fd 7
    H->>H: std::mem::forget(memfd)
    Note over H: OwnedFd destructor disabled<br/>fd 7 will NOT auto-close
    H->>SR: InjectFdSend { srcfd: 7 }
    SR->>K: ioctl(SECCOMP_IOCTL_NOTIF_ADDFD,<br/>srcfd=7, ADDFD_FLAG_SEND)
    K->>C: duplicates fd 7 → child gets fd 3
    K-->>SR: returns new_fd
    Note over SR: srcfd=7 still open in supervisor<br/>no close() called
    SR-->>H: Ok(())
    Note over H: fd 7 leaked forever<br/>process fd table grows by 1
```
Impact
A Python script importing NumPy or Pandas triggers ~10 /proc/* reads. With random_seed enabled, each /dev/urandom open adds another leak. In COW mode, every file open leaks. A conservative estimate for a typical data science script:
- ~10 procfs reads + ~5 urandom opens + ~50 COW opens = ~65 leaked fds per execution
- Default soft RLIMIT_NOFILE = 1024
- ~15 executions → supervisor hits EMFILE → all sandboxes on this supervisor fail
With deterministic mode enabled (procfs + random + hostname + getdents), the leak rate is even higher.
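The estimate above is plain integer division. As a rough sketch, with a hypothetical helper name and an assumed baseline of 3 fds already open (stdin, stdout, stderr):

```python
def runs_until_emfile(leaked_per_run: int,
                      soft_limit: int = 1024,
                      baseline_fds: int = 3) -> int:
    """Complete runs that fit before the fd table is exhausted.

    baseline_fds is an assumption: fds the supervisor already holds
    before the first sandbox run.
    """
    return (soft_limit - baseline_fds) // leaked_per_run


# 65 leaked fds per run against the default soft limit of 1024
print(runs_until_emfile(65))  # → 15
```

A real supervisor holds more than three fds at baseline, so the actual headroom would be somewhat smaller than this figure.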
Reproduction
Quick check: fd count grows after each run
```python
"""Run this with Python SDK — shows fd leak accumulating per sandbox execution."""
import os

from sandlock import Sandbox, Policy


def count_fds():
    return len(os.listdir(f"/proc/{os.getpid()}/fd"))


policy = Policy(
    fs_readable=["/"],
    random_seed=42,     # enables /dev/urandom interception
    hostname="test",    # enables /etc/hostname interception
    num_cpus=2,         # enables /proc/cpuinfo interception
)

script = """
import os
for f in ['cpuinfo', 'meminfo', 'stat']:
    open(f'/proc/{f}').read()
os.urandom(16)
"""

print(f"before: {count_fds()} fds")
for i in range(20):
    Sandbox(policy).run(["python3", "-c", script], timeout=10)
    print(f"after run {i+1:>2}: {count_fds()} fds")

# Expected: fd count grows by ~10-15 per iteration, never decreases.
# Around iteration 15-20 (depending on RLIMIT_NOFILE), runs will start
# failing with EMFILE ("Too many open files").
```
Crash scenario: EMFILE after ~15 runs
The leak is per-process, not system-wide — the OS is unaffected, but the supervisor
process (which stays alive across sandbox invocations) accumulates leaked fds until
it hits RLIMIT_NOFILE (typically 1024 soft). After that, all subsequent memfd_create,
pipe, open, and socket calls fail with EMFILE, and no new sandboxes can run.
Restarting the process clears all leaked fds. The leak does not persist across
process restarts — it only affects long-lived processes that run multiple sandboxes
(servers, daemons, worker pools, test loops via the SDK).
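Because a restart reclaims the leaked fds, a long-lived worker could watch its own headroom as a stopgap until a fix lands. This is only a sketch under assumptions: `fd_headroom` and `should_recycle` are hypothetical names, the 0.8 threshold is arbitrary, and nothing here is part of the sandlock API.

```python
import os
import resource
from typing import Tuple


def fd_headroom() -> Tuple[int, int]:
    """Return (open_fds, soft_limit) for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return len(os.listdir("/proc/self/fd")), soft


def should_recycle(threshold: float = 0.8) -> bool:
    """True once open fds reach the given fraction of the soft limit."""
    open_fds, soft = fd_headroom()
    if soft == resource.RLIM_INFINITY:
        return False  # no finite limit to run into
    return open_fds >= threshold * soft
```

A worker pool might call `should_recycle()` between sandbox runs and exit cleanly when it returns True, letting its process manager start a fresh worker.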
Single-shot CLI is not affected
sandlock run -- python3 script.py spawns a new process each time, so leaked fds
are reclaimed by the kernel on exit. The leak only matters when the same process
reuses the Sandbox API repeatedly.
Suggested fix
Change InjectFdSend to own the fd, so it is automatically closed after the ioctl:
```rust
// notif.rs: change the variant type
use std::os::fd::{AsRawFd, OwnedFd};

pub enum NotifAction {
    // ...
    InjectFdSend { srcfd: OwnedFd }, // was: RawFd
    // ...
}

// notif.rs, send_response: the OwnedFd is dropped after the ioctl
NotifAction::InjectFdSend { srcfd } => {
    match inject_fd_and_send(fd, id, srcfd.as_raw_fd()) {
        Ok(_new_fd) => Ok(()),
        Err(_) => respond_continue(fd, id),
    }
    // srcfd: OwnedFd dropped here → close(fd) called automatically
}
```
Then all call sites replace mem::forget + raw fd with OwnedFd::into():
```rust
// procfs.rs, before:
std::mem::forget(memfd);
NotifAction::InjectFdSend { srcfd: raw }

// procfs.rs, after:
NotifAction::InjectFdSend { srcfd: memfd } // OwnedFd moved, no forget needed
```
This fix:
- Closes all 6 leak points at once (the handler does the close, not each call site)
- Prevents future leaks: new InjectFdSend callers can't forget to close because OwnedFd does it automatically
- Is backwards compatible: no API change outside the crate
An alternative minimal fix (without changing NotifAction) would be to add close(srcfd) after inject_fd_and_send() in send_response(). This is simpler but doesn't prevent future call sites from leaking.
Next steps
We're happy to submit a PR for whichever approach you prefer. We also found a few other issues during the audit (pipe created without O_CLOEXEC, unchecked read_exact return on the control pipe, fs_denied policy field not enforced) — we can file separate issues for those if that's helpful.
Thanks for building this project — the unprivileged sandbox space really needs a well-engineered solution.