Changes to `.nvmrc` (1 addition, 1 deletion): `v18` changed to `v20`.

New file `blog/2026-05-10-kernel-root-exploits.md` (254 additions):

---
slug: kernel_local_root_exploits
title: Linux Kernel local root exploits CVE-2026-31431, -43284, -43500
authors: [garloff]
tags: [security, linux, cve, copy.fail, dirtyfrag]
---

## Linux root exploits (Local Privilege Escalation)

Unix is designed as a multi-user system. Different users have their own
files and processes and can work without interference from others.
Linux follows that tradition and has extended the concept with namespaces,
which give users a private view of networking, the process list, filesystems
and other resources that are traditionally shared (read-only) on a Unix system,
along with resource management that improves performance isolation.

It is the job of the operating system kernel to keep this separation safe; in
particular, normal users must not be able to obtain system administrator (root)
privileges. Where the kernel fails to ensure this, we have a "local root"
vulnerability, a Local Privilege Escalation (LPE).

The Linux kernel is a large and complex beast. On the one hand, it has sophisticated
mechanisms to get really good performance out of increasingly complex hardware.
On the other hand, it comes with a huge variety of device drivers. Vulnerabilities
are found, reported and fixed from time to time; the Linux kernel sees several
LPEs per year. Most of the time, they affect only a small fraction of users
(typically because they are located in a device driver or a somewhat exotic feature),
and they are often hard to exploit, requiring many attempts to win a race condition
and sometimes causing crashes along the way (which may not go unnoticed).

We do not normally report on these LPEs. They are fixed by the upstream Linux kernel
developers, released as stable updates by the maintainers and delivered to end
users via kernel updates from the Linux distributors.

## copy.fail and Dirty Frag

The Linux kernel issues [copy.fail](https://copy.fail/) and
[Dirty Frag](https://github.com/V4bel/dirtyfrag), which are currently drawing a lot
of attention, are both LPEs (local root vulnerabilities). We report on them because
both affect most Linux users (running kernels from the last 9 years) and are easy
to exploit.

Like [Dirty Pipe](https://dirtypipe.cm4all.com/) and, before it,
[Dirty Cow](https://dirtycow.ninja/), both LPEs rely on improper protection
of the page cache.
The Linux kernel keeps file system contents in the page cache; when code is
executed, it is mapped into your virtual memory. When a memory page is accessed
that is not yet present in physical memory, a page fault occurs and the relevant
blocks are loaded from disk, or the access is denied and your program receives
`SIGSEGV` and is terminated. Copying pages is costly, so the kernel avoids it
where possible to achieve higher performance. When you write to a memory page,
the kernel may take a page fault on a read-only mapping (which it created to
avoid copying) and only then make a private writable copy.
This approach is called copy-on-write (COW) and is common in modern operating
systems. If a page from the page cache is changed in memory, it is also marked
"dirty", so the kernel knows it needs to write the changes back to the file system.

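The page cache and its dirty state can be observed on a running system through
the kernel's counters in `/proc/meminfo`. The following is a minimal sketch; the
helper name `page_cache_stats` is ours and purely illustrative. A page corrupted
through the bugs described below is not marked dirty, so it does not add to the
`Dirty` counter and is never written back to disk.

```shell
#!/bin/sh
# page_cache_stats is a hypothetical helper used for illustration only.
# Prints the size of the page cache ("Cached") and the part of it that has been
# modified in memory and still has to be written back to disk ("Dirty"), in kB.
page_cache_stats() {
  grep -E '^(Cached|Dirty):' /proc/meminfo
}

page_cache_stats
```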
In copy.fail, the `aead` crypto module performs some cryptographic operations in
place, avoiding the need to allocate an extra buffer. Unfortunately, under some
conditions it needs 4 extra bytes of space and writes to them; normally, aead is
used by IPsec, where that location is a designated place in a network buffer.
However, a local attacker can make this write hit a page cache page by using
`splice`. This way, the page cache copy of the `sudo` binary can be overwritten,
allowing the attacker to circumvent its safeguards and trivially become root.
As the page is not marked dirty, no trace of the corruption will be visible on disk.
[copy.fail](https://copy.fail/) has been assigned CVE-2026-31431.

In Dirty Frag, a network buffer that is split into several fragments is not
handled properly, and the fragmented buffer is not correctly COW'ed. The AEAD
crypto operation then again overwrites 4 bytes. A local attacker can again trigger
this to become root very quickly by overwriting the page cache's view of
`sudo`. (Of course, other sensitive binary code could be overwritten in memory as well.)
This can be triggered via the IPsec `esp_input` path (for both IPv4 and IPv6) as well
as via the `rxrpc` code. The ESP variant requires the privilege to create user
namespaces and then allows easy writes of 4 bytes at a time. It has been assigned
CVE-2026-43284. The rxrpc variant overwrites 8 bytes and does not require the
privilege to create namespaces, but as these bytes are the output of a cryptographic
operation, the attacker needs to brute-force them to achieve a controlled result.
This variant has been assigned CVE-2026-43500.

_Exploiting these vulnerabilities requires access to the system and the ability
to execute code there, hence their categorization as Local Privilege Escalation (LPE),
not Remote Code Execution (RCE)._

## Impact

Any system where normal (non-root) users can log in to execute code under their
own control is no longer secure: these users can use the publicly available
exploits to gain root privileges and get access to whatever the (virtual)
machine has access to. This includes other users' data as well as secrets
that may be stored by the system administrator.

Such systems are less common these days than they were 20 years ago, because
virtualization has become a commodity: in many scenarios, individual users
have their own virtual machine rather than access to a shared (virtual)
machine.

Note that these vulnerabilities do NOT break the isolation of virtual machines.
VMs remain as securely isolated as they would be without these vulnerabilities.
These LPEs do NOT enable a virtualization escape.

There is, however, a common scenario where individual users and workloads
run inside a container. The LPEs also allow escaping from containers:
a shell inside a Kubernetes pod can be used to take control of the
Kubernetes node and thus of everything your Kubernetes cluster has
access to. Running untrusted code in a container is therefore very risky,
which affects, e.g., CI setups.

## Fixes

A fix for copy.fail was silently merged into the Linux kernel at the end of March
2026 (for 7.0-rc7) and has also been merged into the stable kernel series (6.18.22,
6.12.85, 6.6.137).
It simply disables the in-place optimization for `algif_aead`. As of early May,
Linux distributors are in the process of shipping fixed kernels.
Without a fixed kernel, a workaround is to place a file `copyfail.conf` in
`/etc/modprobe.d/` with the following contents:

```shell
# Temporary workaround for copy.fail CVE-2026-31431
install algif_aead /bin/false
```
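An `install ... /bin/false` rule only takes effect the next time the module
would be loaded; a module that is already in memory stays usable until it is
unloaded. A minimal sketch that writes such a rule file and then unloads the
listed modules if they are currently loaded (the helper `block_modules` is
hypothetical, and it must run as root to write to `/etc/modprobe.d/`):

```shell
#!/bin/sh
# block_modules is a hypothetical helper for illustration, not part of any tool.
# Usage (as root): block_modules <conf-file> <module>...
block_modules() {
  conf=$1
  shift
  # Write one "install <module> /bin/false" rule per module (overwrites the file).
  echo "# Temporary LPE workaround, remove once a fixed kernel is running" > "$conf"
  for mod in "$@"; do
    echo "install $mod /bin/false" >> "$conf"
  done
  # The rules only block future loads, so unload modules that are already loaded.
  for mod in "$@"; do
    # /proc/modules lists one loaded module per line, name first
    if grep -qs "^$mod " /proc/modules; then
      modprobe -r "$mod" || echo "could not unload $mod (still in use?)" >&2
    fi
  done
}
```

For example: `block_modules /etc/modprobe.d/copyfail.conf algif_aead`. Unloading
fails while a module is in use (e.g., by active IPsec connections); the sketch
reports this instead of forcing it.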

The fixes for Dirty Frag are still in development as of May 8. The first fixes
have been merged upstream and released in 7.0.5, 6.18.28, 6.12.87, 6.6.138,
6.1.172, 5.15.206 and 5.10.255, but there is
[more to come for rxrpc](https://lwn.net/ml/all/2026050859-ahead-anchovy-05e2@gregkh/).
The responsible disclosure process for Dirty Frag was unfortunately broken,
so this time the upstream maintainers and the distributors did not have time
to carefully prepare and test fixes before the issue was published.
We therefore have to expect that it will take a few days until all Linux
distributors manage to ship tested, fixed kernels.
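For kernels that follow upstream stable numbering, the release numbers above can
be compared mechanically. The sketch below (both helper names are hypothetical)
does this for the first Dirty Frag fixes. It is only a rough indicator: the rxrpc
fixes are still incomplete, and distribution kernels backport fixes without
changing the upstream version number, so for those the distributor's advisory
is what counts.

```shell
#!/bin/sh
# ASSUMPTION: the kernel release follows upstream stable numbering.
# Prints the first patch level with the initial Dirty Frag fixes for a
# "major.minor" series, or nothing if the series is not listed in this post.
dirtyfrag_first_fixed() {
  case "$1" in
    7.0) echo 5 ;;
    6.18) echo 28 ;;
    6.12) echo 87 ;;
    6.6) echo 138 ;;
    6.1) echo 172 ;;
    5.15) echo 206 ;;
    5.10) echo 255 ;;
  esac
}

# Usage: check_dirtyfrag_version <kernel-release>, e.g. "$(uname -r)"
check_dirtyfrag_version() {
  ver=${1%%[-+]*}                       # drop suffixes like "-generic" or "+deb13"
  series=$(echo "$ver" | cut -d. -f1,2)
  patch=$(echo "$ver" | cut -d. -f3)
  fixed=$(dirtyfrag_first_fixed "$series")
  case "$patch" in
    '') patch=0 ;;                      # "7.0" means "7.0.0"
    *[!0-9]*) echo "$1: cannot parse"; return 1 ;;
  esac
  if [ -z "$fixed" ]; then
    echo "$1: series $series not listed, check manually"
  elif [ "$patch" -ge "$fixed" ]; then
    echo "$1: at or above $series.$fixed"
  else
    echo "$1: below $series.$fixed, update needed"
  fi
}

check_dirtyfrag_version "$(uname -r)"
```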

A fully effective workaround is again to prevent loading the affected modules
by placing another file `dirtyfrag.conf` in `/etc/modprobe.d/`:

```shell
# Temporary workaround for Dirty Frag CVE-2026-43284, CVE-2026-43500
# This breaks IPsec
install esp4 /bin/false
install esp6 /bin/false
install rxrpc /bin/false
```

Note that these workarounds prevent IPsec from working.
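Whether these workarounds can take effect on a given machine can be checked per
module. This is a sketch under stated assumptions: the helper `check_lpe_modules`
is hypothetical, it only looks at `/etc/modprobe.d/*.conf` and only recognizes
rules of the exact form used above, and it treats a module listed in
`modules.builtin` as built into the kernel, where modprobe rules cannot block it.

```shell
#!/bin/sh
# check_lpe_modules is a hypothetical helper for illustration.
# Reports, per module, whether the modprobe.d workaround can take effect.
check_lpe_modules() {
  builtin_list="/lib/modules/$(uname -r)/modules.builtin"
  for mod in "$@"; do
    if grep -qs "/$mod\.ko" "$builtin_list"; then
      # Built into the kernel image: modprobe.d rules cannot block it.
      echo "$mod: built-in, workaround ineffective"
    elif grep -qs "^$mod " /proc/modules; then
      # Already loaded: an install rule does not affect it until unloaded.
      echo "$mod: loaded, unload it with modprobe -r $mod"
    elif grep -Eqs "^install[[:space:]]+$mod[[:space:]]+/bin/false" /etc/modprobe.d/*.conf; then
      echo "$mod: blocked"
    else
      echo "$mod: not loaded, not blocked"
    fi
  done
}

check_lpe_modules algif_aead esp4 esp6 rxrpc
```

A line reporting `loaded` or `built-in` means the workaround does not (yet)
protect that machine.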

If a system is suspected to have already been exploited, the system owner can
drop the page cache by running `echo 3 > /proc/sys/vm/drop_caches` as root
and unload the affected modules to prevent re-exploitation.
This discards the modified page cache pages. However, an attacker could have
used the gained privileges to install further backdoors etc. into the system, so
it will need to be reinstalled or fully audited before it can be considered trustworthy again.

## SCS IaaS Cloud Provider exposure

None of the control-plane / management systems in a normal SCS cloud infrastructure
allow logins by normal users, so the LPEs cannot be exploited there. However,
should another exploit be found and used successfully, the LPEs may be used
to escalate privileges further, e.g. breaking out of the containers that run
the OpenStack services, Ceph or some of the management tools, and thus remove
one layer of a defense-in-depth concept.

Cloud providers are advised to install updated kernels to re-establish this defense.
In the meantime, they can apply the module loading prevention measures. Providers
should do this with care on the network nodes: if these need to support
IPsec (e.g. for OpenStack's VPNaaS, which is part of Neutron), the blocked
modules may prevent correct operation. Please note that there is no known remote
exploit via IPsec, so a temporary trade-off that lives without this defense-in-depth
layer rather than breaking IPsec (and thereby creating security and functionality
issues for customers) may be justified.

Cloud providers often provide VM images for their customers.
To help customers maintain the security separation inside their VMs,
providers are advised to watch for new distribution images
and make them available promptly via their image service (Glance).

## SCS Kubernetes Provider exposure

The default implementation with SCS Cluster Stacks is vulnerable; the current
node images have a kernel that is affected by this weakness. This allows a user
to break out of the containers running in the cluster to take over the node
VM and other containers.

With Cluster API and the SCS Cluster Stacks building
on it, creating, updating and removing Kubernetes clusters has become
a commodity; it is thus common to create clusters per development team and
not share them. In this scenario, the breakout may allow a developer to
take over their teammates' containers, which may not constitute a real danger
in many setups. For clusters shared across teams, or worse, for setups where several
clusters that belong to different entities share a control plane, this becomes
more serious.

Note that the LPEs also undermine defense in depth: if a user of
a service running in a k8s cluster exploits a vulnerability to
execute code in a container, the LPEs can then be used to escalate
privileges further.

As soon as new kernels become available, the node images will be rebuilt and
shipped with the next cluster stack patch releases. For users, the normal
rolling upgrade will then be all that's needed to be secure against this LPE
again.

We will update this advisory as soon as new node images are available.

For highly critical workloads, cluster operators can log in to the nodes
and deploy the mechanisms to prevent loading the above-mentioned modules.
(Again, this will break IPsec.) Note that logging in to nodes in an SCS
Cluster Stack cluster is not possible by default; it requires either booting
into a rescue image (if the cluster runs on OpenStack) to inject an ssh
key, or using a tool like kubectl-node-shell with the appropriate
privileges.

```bash
# Write both modprobe.d workaround files on every node via kubectl-node-shell
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  kubectl node-shell "$node" -- bash -c 'echo -e "# Temporarily disable algif_aead (copy.fail)\ninstall algif_aead /bin/false" > /etc/modprobe.d/disable-aead-copyfail.conf'
  kubectl node-shell "$node" -- bash -c 'echo -e "# Temporarily disable esp4, esp6, rxrpc (Dirty Frag)\ninstall esp4 /bin/false\ninstall esp6 /bin/false\ninstall rxrpc /bin/false" > /etc/modprobe.d/disable-esp46-rxrpc-dirtyfrag.conf'
done
```

## SCS Cloud users

Customers of SCS IaaS clouds are responsible for their own VMs. For exposed
VMs, they should apply the documented workaround inside their VMs and then
either update online and reboot into a fixed kernel, or redeploy their VMs
from a fixed upstream image.

Customers that run their own Kubernetes container cluster management
with e.g. SCS Cluster Stacks are advised to watch for new node
images and then perform the rolling upgrade. If their use case puts
them at increased risk, they are advised to prevent the module loading
in the meantime, as described above.

## SCS community infrastructure

The SCS community infrastructure was secured on May 8 by disabling the
relevant modules.

## Thanks

The authors would like to thank Taeyang Lee at Xint (who initiated the
research on copy.fail) and Hyunwoo Kim (@v4bel, who discovered Dirty Frag).
They would also like to thank the upstream Linux kernel maintainers and
Linux distributors for their reliable work on handling the issues and
getting fixes out.

## Sovereign Cloud Stack Security Contact

SCS security contact is [security@scs.community](mailto:security@scs.community), as published on
[https://scs.community/.well-known/security.txt](https://scs.community/.well-known/security.txt).

## Version history

- Initial Draft, v0.1, 2026-05-08, 17:15 CEST.
- kubectl node-shell instructions, v0.2, 2026-05-09, 12:45 CEST.
- Mention successful patching of community infra, v0.3, 2026-05-09, 13:30 CEST.