File descriptor leak in supervisor after SECCOMP_IOCTL_NOTIF_ADDFD

## Context

We're evaluating sandlock for integration into our code execution platform and performed a security audit of the codebase. The project is very promising — the unprivileged Landlock + seccomp approach is exactly what we need, and the recent additions (`confine()`, dry-run, deterministic mode) are great. We'd like to contribute fixes and become active users.

Our audit found several issues; this is the most critical one.

## Summary

Every `NotifAction::InjectFdSend` leaks one file descriptor in the supervisor process. The supervisor-side fd is never closed after the `SECCOMP_IOCTL_NOTIF_ADDFD` ioctl duplicates it into the child. A long-running supervisor will eventually hit `EMFILE`, after which new sandboxes fail.

## Severity

**Critical** — leads to supervisor resource exhaustion (DoS) under normal workload, not just adversarial input.

## Affected code

6 call sites create an fd, call `mem::forget()` to prevent `OwnedFd` from closing it, pass the raw integer via `InjectFdSend`, and never close it afterward:

| File | Function | Trigger |
|---|---|---|
| `procfs.rs:147-176` | `inject_memfd` | Every `/proc/cpuinfo`, `/proc/meminfo`, `/proc/net/*` read |
| `procfs.rs:160-173` | `handle_hostname_open` | Every `/etc/hostname` read (when `hostname` is set) |
| `cow/dispatch.rs:107-115` | `handle_cow_open` | Every `openat` in COW workdir |
| `random.rs:80-88` | `handle_random_open` | Every `/dev/urandom`, `/dev/random` open (when `random_seed` is set) |
| `chroot/dispatch.rs:261` | `handle_chroot_open` | Every redirected open under chroot |
| `chroot/dispatch.rs:274` | `handle_chroot_open` | Every redirected open under chroot |

## Root cause

`NotifAction::InjectFdSend` stores a `RawFd` (plain `i32`), not an `OwnedFd`. The handler in `send_response()` (notif.rs:442-449) passes this integer to `inject_fd_and_send()` which calls `SECCOMP_IOCTL_NOTIF_ADDFD`. The kernel **duplicates** the fd into the child but does **not** close the supervisor's copy. After the ioctl returns, `send_response` returns `Ok(())` without closing `srcfd`. The `NotifAction` enum has no `Drop` implementation, so when the action value is dropped, the `i32` is simply discarded — no `close()` ever happens.

The call sites use `std::mem::forget(memfd)` specifically to prevent `OwnedFd` from closing the fd before the ioctl. This is correct — but the fd must be closed **after** the ioctl, and nothing does that.
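
The duplicate-then-close semantics can be checked outside the codebase. The following is a minimal Python sketch, not sandlock code: `os.dup()` stands in for the ADDFD ioctl (which installs the copy in another process's fd table rather than the caller's), `/dev/null` stands in for the memfd, and all helper names are hypothetical. It shows that closing the source after duplication leaves the duplicate usable, and that without the close the source stays open.

```python
import os


def is_open(fd):
    """Return True if fd is a valid descriptor in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def duplicate_then_close():
    """Close the source fd after duplicating it; report which copies survive."""
    src = os.open(os.devnull, os.O_RDONLY)  # stand-in for the supervisor's memfd
    dup = os.dup(src)                       # stand-in for the copy installed in the child
    os.close(src)                           # the step the supervisor currently skips
    result = (is_open(src), is_open(dup))
    os.close(dup)
    return result


def duplicate_without_close():
    """Current behaviour: the source fd is still open after duplication."""
    src = os.open(os.devnull, os.O_RDONLY)
    dup = os.dup(src)
    result = (is_open(src), is_open(dup))
    os.close(dup)
    os.close(src)  # cleanup for this demo only
    return result


if __name__ == "__main__":
    print(duplicate_then_close())     # (False, True)
    print(duplicate_without_close())  # (True, True)
```

The first result is the reason a close after the ioctl is safe: the duplicate is an independent descriptor, so the child's copy does not depend on the supervisor keeping its own.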

## Flow diagram

```mermaid
sequenceDiagram
 participant H as Handler (procfs/random/cow)
 participant SR as send_response (notif.rs:434)
 participant K as Kernel (ADDFD ioctl)
 participant C as Child process

 H->>H: memfd = memfd_create("sandlock-*")
 Note over H: supervisor owns fd 7

 H->>H: std::mem::forget(memfd)
 Note over H: OwnedFd destructor disabled<br/>fd 7 will NOT auto-close

 H->>SR: InjectFdSend { srcfd: 7 }

 SR->>K: ioctl(SECCOMP_IOCTL_NOTIF_ADDFD, srcfd=7, ADDFD_FLAG_SEND)
 K->>C: duplicates fd 7 → child gets fd 3
 K-->>SR: returns new_fd

 Note over SR: srcfd=7 still open in supervisor<br/>no close() called
 SR-->>H: Ok(())

 Note over H: fd 7 leaked forever<br/>process fd table grows by 1
```

## Impact

A Python script importing NumPy or Pandas triggers ~10 `/proc/*` reads. With `random_seed` enabled, each `/dev/urandom` open adds another leak. In COW mode, every file open leaks. A conservative estimate for a typical data science script:

- ~10 procfs reads + ~5 urandom opens + ~50 COW opens = **~65 leaked fds per execution**
- Default soft `RLIMIT_NOFILE` = 1024
- **~15 executions → supervisor hits EMFILE → all sandboxes on this supervisor fail**

With deterministic mode enabled (procfs + random + hostname + getdents), the leak rate is even higher.
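
The estimate above is plain integer arithmetic and can be rerun with other numbers. This is a small sketch; the function name and the `baseline_fds` parameter (descriptors the supervisor already holds before any sandbox runs) are assumptions added for illustration.

```python
def full_runs_before_emfile(soft_limit, leaked_per_run, baseline_fds=0):
    """Number of complete runs that fit before the fd table is full."""
    if leaked_per_run <= 0:
        raise ValueError("leaked_per_run must be positive")
    return max(0, (soft_limit - baseline_fds) // leaked_per_run)


if __name__ == "__main__":
    leaked = 10 + 5 + 50  # procfs reads + urandom opens + COW opens
    print(leaked)                                 # 65
    print(full_runs_before_emfile(1024, leaked))  # 15
```

A lighter workload leaking 10 to 15 fds per run would take proportionally longer (about 68 to 102 runs at a 1024 limit), so the time to failure depends heavily on which interception features are enabled.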

## Reproduction

### Quick check: fd count grows after each run

```python
"""Run this with Python SDK — shows fd leak accumulating per sandbox execution."""
import os
from sandlock import Sandbox, Policy

def count_fds():
 return len(os.listdir(f"/proc/{os.getpid()}/fd"))

policy = Policy(
 fs_readable=["/"],
 random_seed=42, # enables /dev/urandom interception
 hostname="test", # enables /etc/hostname interception
 num_cpus=2, # enables /proc/cpuinfo interception
)

script = """
import os
for f in ['cpuinfo', 'meminfo', 'stat']:
 open(f'/proc/{f}').read()
os.urandom(16)
"""

print(f"before: {count_fds()} fds")

for i in range(20):
 Sandbox(policy).run(["python3", "-c", script], timeout=10)
 print(f"after run {i+1:>2}: {count_fds()} fds")

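# Expected: fd count grows by ~10-15 per iteration and never decreases.
# At that rate a soft RLIMIT_NOFILE of 1024 lasts roughly 70-100 iterations.
# With a lower limit (e.g. `ulimit -n 256`) runs start failing with EMFILE
# ("Too many open files") around iteration 15-25; raise the loop count if needed.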
```

### Crash scenario: EMFILE after ~15 runs

The leak is per-process, not system-wide — the OS is unaffected, but the supervisor
process (which stays alive across sandbox invocations) accumulates leaked fds until
it hits `RLIMIT_NOFILE` (typically 1024 soft). After that, all subsequent `memfd_create`,
`pipe`, `open`, and `socket` calls fail with `EMFILE`, and no new sandboxes can run.

Restarting the process clears all leaked fds. The leak does **not** persist across
process restarts — it only affects long-lived processes that run multiple sandboxes
(servers, daemons, worker pools, test loops via the SDK).
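
Until a fix lands, a long-lived worker could watch how close it is to the limit. The sketch below is Linux-only and uses only the Python standard library; `fd_headroom` is a hypothetical helper name, not part of the sandlock SDK.

```python
import os
import resource


def fd_headroom():
    """Return (soft_limit, open_fds, remaining) for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The listing briefly includes the fd used to read the directory itself,
    # so the count is an upper bound that is off by at most one.
    used = len(os.listdir("/proc/self/fd"))
    remaining = None if soft == resource.RLIM_INFINITY else soft - used
    return soft, used, remaining


if __name__ == "__main__":
    soft, used, remaining = fd_headroom()
    print(f"soft limit={soft} open={used} remaining={remaining}")
```

Logging these values between sandbox runs makes the leak visible well before `EMFILE` appears, and gives a point at which a worker can be recycled.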

### Single-shot CLI is not affected

`sandlock run -- python3 script.py` spawns a new process each time, so leaked fds
are reclaimed by the kernel on exit. The leak only matters when the same process
reuses the `Sandbox` API repeatedly.
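
The reclaim-on-exit behaviour can be observed with a child process that deliberately leaks descriptors. This is an illustrative sketch (Linux-only, standard library only, hypothetical helper names); the child here is a plain Python interpreter, not a sandlock invocation.

```python
import os
import subprocess
import sys

# Child program that opens 50 descriptors and exits without closing them.
LEAKY_CHILD = (
    "import os\n"
    "for _ in range(50):\n"
    "    os.open(os.devnull, os.O_RDONLY)\n"
)


def open_fd_count():
    """Count open descriptors in the current process."""
    return len(os.listdir("/proc/self/fd"))


def parent_growth_after_leaky_child():
    """Run the leaky child and return how much the parent's fd count changed."""
    before = open_fd_count()
    subprocess.run([sys.executable, "-c", LEAKY_CHILD], check=True)
    return open_fd_count() - before


if __name__ == "__main__":
    print(parent_growth_after_leaky_child())  # 0
```

The same property suggests a stopgap for SDK users: running each sandbox from a short-lived helper process keeps the leak out of the long-lived service, at the cost of an extra process per job.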

## Suggested fix

Change `InjectFdSend` to own the fd, so it is automatically closed after the ioctl:

```rust
// notif.rs — change the variant type:
pub enum NotifAction {
 // ...
 InjectFdSend { srcfd: OwnedFd }, // was: RawFd
 // ...
}

// notif.rs — send_response: OwnedFd is dropped after ioctl
NotifAction::InjectFdSend { srcfd } => {
 match inject_fd_and_send(fd, id, srcfd.as_raw_fd()) {
 Ok(_new_fd) => Ok(()),
 Err(_) => respond_continue(fd, id),
 }
 // srcfd: OwnedFd dropped here → close(fd) called automatically
}
```

Then every call site drops `mem::forget` and the raw fd, and moves the `OwnedFd` into the variant instead:

```rust
// procfs.rs — before:
std::mem::forget(memfd);
NotifAction::InjectFdSend { srcfd: raw }

// procfs.rs — after:
NotifAction::InjectFdSend { srcfd: memfd } // OwnedFd moved, no forget needed
```

This fix:
- Closes all 6 leak points at once (the handler does the close, not each call site)
- Prevents future leaks — new `InjectFdSend` callers can't forget to close because `OwnedFd` does it automatically
- Is backwards compatible — no API change outside the crate

An alternative minimal fix (without changing `NotifAction`) would be to add `close(srcfd)` after `inject_fd_and_send()` in `send_response()`. This is simpler but doesn't prevent future call sites from leaking.
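
Whichever approach is chosen, a regression check along the following lines could guard against the leak coming back. This is a sketch under assumptions: `fd_growth` and the two stand-in workloads are hypothetical, the check is Linux-only, and it treats the workload as an opaque callable rather than depending on the SDK.

```python
import os


def open_fd_count():
    """Count open descriptors in the current process."""
    return len(os.listdir("/proc/self/fd"))


def fd_growth(run_once, iterations=5):
    """Return how many descriptors the process gained over repeated runs."""
    run_once()  # warm-up, so one-time lazy opens are not counted as leaks
    before = open_fd_count()
    for _ in range(iterations):
        run_once()
    return open_fd_count() - before


def leaky_run():
    """Stand-in for a run that leaks one fd, like the current InjectFdSend path."""
    os.open(os.devnull, os.O_RDONLY)


def clean_run():
    """Stand-in for a run that closes what it opens."""
    fd = os.open(os.devnull, os.O_RDONLY)
    os.close(fd)


if __name__ == "__main__":
    print(fd_growth(leaky_run))  # 5
    print(fd_growth(clean_run))  # 0
```

With the SDK calls from the reproduction script, the callable would be something like `lambda: Sandbox(policy).run(["python3", "-c", script], timeout=10)`, and the expected growth after the fix is zero.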

## Next steps

We're happy to submit a PR for whichever approach you prefer. We also found a few other issues during the audit (pipe created without `O_CLOEXEC`, unchecked `read_exact` return on the control pipe, `fs_denied` policy field not enforced) — we can file separate issues for those if that's helpful.

Thanks for building this project — the unprivileged sandbox space really needs a well-engineered solution.
