28 changes: 28 additions & 0 deletions seccomp/default_linux.go
@@ -646,6 +646,34 @@ func DefaultProfile() *Seccomp {
					Arches: []string{"s390", "s390x"},
				},
			},
			{
				LinuxSyscall: specs.LinuxSyscall{
					Names: []string{
						"unshare",
					},
					Action: specs.ActAllow,
					Args: []specs.LinuxSeccompArg{
						{
							Index:    0,
							Value:    unix.CLONE_NEWNS,
							ValueTwo: 0,
							Op:       specs.OpMaskedEqual,
						},
						{
							Index:    0,
							Value:    unix.CLONE_NEWUTS,
							ValueTwo: 0,
							Op:       specs.OpMaskedEqual,
						},
						{
							Index:    0,
							Value:    unix.CLONE_NEWUSER,
							ValueTwo: 0,
							Op:       specs.OpMaskedEqual,
						},
					},
				},
			},
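
For readers unfamiliar with `OpMaskedEqual`, here is a minimal sketch of how a single masked-equality condition is evaluated. It assumes the runtime passes `Value` and `ValueTwo` to libseccomp unchanged as the mask and the expected value of `SCMP_CMP_MASKED_EQ`. The helper `maskedEqual` and the local constants are hypothetical names for illustration; the constants copy the flag values from `<linux/sched.h>` so the sketch builds without `golang.org/x/sys/unix`. How a runtime combines several conditions on the same argument index is runtime-specific and is not modelled here.

```go
package main

import "fmt"

// Namespace flag values from the Linux UAPI header <linux/sched.h>,
// copied locally so this sketch has no external dependencies.
const (
	cloneNewNS   uint64 = 0x00020000
	cloneNewUTS  uint64 = 0x04000000
	cloneNewUser uint64 = 0x10000000
)

// maskedEqual (hypothetical helper) mirrors SCMP_CMP_MASKED_EQ:
// the condition holds when (arg & mask) == want.
// In specs.LinuxSeccompArg terms, mask is Value and want is ValueTwo.
func maskedEqual(arg, mask, want uint64) bool {
	return arg&mask == want
}

func main() {
	calls := []struct {
		name  string
		flags uint64
	}{
		{"unshare(0)", 0},
		{"unshare(CLONE_NEWNS)", cloneNewNS},
		{"unshare(CLONE_NEWUSER)", cloneNewUser},
		{"unshare(CLONE_NEWUSER|CLONE_NEWUTS)", cloneNewUser | cloneNewUTS},
	}
	for _, c := range calls {
		// Same shape as a condition with Value: CLONE_NEWUSER, ValueTwo: 0.
		flagClear := maskedEqual(c.flags, cloneNewUser, 0)
		// Alternative shape: Value: CLONE_NEWUSER, ValueTwo: CLONE_NEWUSER.
		flagSet := maskedEqual(c.flags, cloneNewUser, cloneNewUser)
		fmt.Printf("%-38s ValueTwo=0: %-5v ValueTwo=mask: %v\n", c.name, flagClear, flagSet)
	}
}
```

Under that reading, a condition with `ValueTwo: 0` holds when the masked flag is absent from the argument, while `ValueTwo` equal to `Value` holds when the flag is present.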
Member

CLONE_NEWUSER does not seem covered in this PR.

Anyway, it might still be scary to allow CLONE_NEWUSER by default, given its several past vulnerabilities (CVE-2023-0386, etc.).

The Docker documentation lists unshare among the syscalls blocked by default, but it gives no example of how an authorized user would unblock it, or a particular argument to it, for a specific workload. It just says you can pass a seccomp profile (which it seems to expect you to already know how to derive from the default one) or use --security-opt seccomp=unconfined.

Given how few people actually know how to write a seccomp profile, and that no slightly more permissive profile is included as an available option, probably the vast majority of workloads that need user namespaces are currently running completely unconfined.

Author

> CLONE_NEWUSER does not seem covered in this PR.

Whoops, I somehow messed up the PR and did CLONE_NEWUTS twice.

Author

@slonopotamus, Oct 2, 2025

> Anyway, it might still be scary to allow CLONE_NEWUSER by default, given its several past vulnerabilities (CVE-2023-0386, etc.).

There is a long discussion in moby/moby#42441 about the safety of these. I just wanted to make a PR that would implement the change if this feature is declared safe.

As I said, my specific use case is to make Buildah work. Building with Buildah inside a container will always be at least as safe as (and normally much safer than) building with docker build outside of a container, even if a future vulnerability in unshare turns up.

I agree with @adamnovak that writing a custom seccomp policy is too hard.

> probably the vast majority of workloads that need user namespaces are currently running completely unconfined

Or, worse yet, they use custom builds of applications with sandboxing disabled (https://bugs.passt.top/show_bug.cgi?id=116#c6).

			{
				LinuxSyscall: specs.LinuxSyscall{
					Names: []string{