---
slug: kernel_local_root_exploits
title: Linux Kernel local root exploits CVE-2026-31431, -43284, -43500
authors: [garloff]
tags: [security, linux, cve, copy.fail, dirtyfrag]
---

## Linux root exploits (Local Privilege Escalation)

Unix is designed as a multi-user system. Different users have their own
files and processes and can work without interference from others.
Linux lives in that tradition. It has extended the concept with namespaces,
which give users a private view of networking, the process list, file systems
and other resources that are traditionally shared (read-only) on a Unix system,
complemented by resource management to improve performance isolation.

It is the operating system kernel's job to keep this separation safe; in
particular, normal users must not be able to obtain system administrator (root)
privileges. Where the kernel fails to ensure this, we have a "local root"
vulnerability, also called a Local Privilege Escalation (LPE).

The Linux kernel is a large and complex beast. On the one hand, it has sophisticated
mechanisms to get really good performance out of increasingly complex hardware.
On the other hand, it comes with a huge variety of device drivers. From time to
time, vulnerabilities are found, reported and fixed; the Linux kernel has several
LPEs per year. Most of the time, they affect only a small fraction of users
(typically because they are located in a device driver or a somewhat exotic feature),
and often they are hard to exploit, requiring many attempts to win a race condition
and sometimes causing crashes along the way (which may not go unnoticed).

We don't normally report on these LPEs.
They get fixed by the upstream Linux kernel
developers, released as stable updates by the stable maintainers and delivered to end
users via kernel updates from the Linux distributors.

## copy.fail and Dirty Frag

The currently highly visible Linux kernel issues [copy.fail](https://copy.fail/)
and [Dirty Frag](https://github.com/V4bel/dirtyfrag) are both LPEs (local root
vulnerabilities). We report on them because they both affect
most Linux users (running kernels from the last nine years) and are easy to exploit.

Like [Dirty Pipe](https://dirtypipe.cm4all.com/) and, before it,
[Dirty Cow](https://dirtycow.ninja/), both LPEs rely on improper protection
of the page cache.
The Linux kernel keeps file system contents in the page cache; when code
is executed, it is mapped into the process's virtual memory. When a memory page
that is not yet loaded into physical memory is accessed, a page fault occurs and
the relevant blocks are loaded from disk (or the access is denied, and the
program receives `SIGSEGV` and is terminated). Copying pages is costly, so the
kernel avoids it to achieve higher performance. When you write to a memory page
that the kernel has mapped read-only (to avoid copying), a page fault occurs, and only
then does the kernel copy the page to create a private writable copy.
This approach is called copy-on-write (COW) and is common in modern operating
systems. If a page from the page cache is changed in memory, it is also marked
"dirty", so the kernel knows it needs to write the changes back to the file system.

In copy.fail, the `aead` crypto code performs its cryptographic operation in place,
avoiding the need to allocate an extra buffer. Unfortunately, under some conditions
it needs 4 extra bytes; AEAD is normally used by IPsec, where that location is a
designated area in a network buffer. However, a local attacker can use `splice`
to make this 4-byte write land in a page-cache page.
This way, the copy of the
`sudo` binary in the page cache can be overwritten, circumventing the
safeguards built into it. The attacker can trivially become root. As the page is not
marked dirty, the corruption is never written back, so no trace of it will be visible on disk.
[copy.fail](https://copy.fail/) has been assigned CVE-2026-31431.

In Dirty Frag, a network buffer that is split over several fragments is not
handled correctly, and the fragmented buffer is not properly COW'ed. The AEAD
crypto operation then again overwrites 4 bytes. A local attacker can trigger
this to, again, become root very quickly by overwriting the page cache's view of
`sudo`. (Other sensitive binaries could of course be overwritten in memory as well.)
This can be triggered via the IPsec `esp_input` path (for both IPv4 and IPv6) as well
as via the `rxrpc` code. The ESP variant requires the privilege to create user
namespaces and then allows easy 4-byte writes, one at a time. It has been assigned
CVE-2026-43284. The rxrpc variant overwrites 8 bytes and does not require
namespace creation privileges, but as these bytes are encrypted,
the attacker needs to brute-force them to achieve a controlled result. This
variant was assigned CVE-2026-43500.

_Exploiting these vulnerabilities requires access to the system and the ability
to execute code there, hence the classification as Local Privilege Escalation (LPE)
rather than Remote Code Execution (RCE)._

## Impact

Any system where normal (non-root) users can log in and execute code under their
own control is no longer secure: these users can use the publicly available
exploits to gain root privileges and access whatever the (virtual)
machine has access to. This includes other users' data as well as secrets
that may be stored by the system administrator.

Such systems are less common these days than they were 20 years ago.
The reason
is that virtualization has become a commodity, so in many scenarios individual
users have their own virtual machine rather than access to a shared
(virtual) machine.

Note that these vulnerabilities do NOT break the isolation of virtual machines.
VMs remain as securely isolated as they would be without them.
These LPEs do NOT enable a virtualization escape.

There is, however, a common scenario in which individual users and workloads
run inside containers, and the LPEs also allow escaping from containers.
Running a shell inside a Kubernetes pod allows you to take control of the
Kubernetes node and thus of everything that your Kubernetes cluster has
access to. Running untrusted code in a container is thus very risky; this
affects, for example, CI setups.

## Fixes

A fix for copy.fail was silently merged into the Linux kernel at the end of March
2026 (for 7.0-rc7) and has also been merged into the stable kernel series (6.18.22,
6.12.85, 6.6.137).
It simply disables the in-place optimization for `algif_aead`. As of early May,
Linux distributors are in the process of shipping fixed kernels.
Without a fixed kernel, a workaround is to place a file `copyfail.conf` in
`/etc/modprobe.d/` with the following contents:

```shell
# Temporary workaround for copy.fail CVE-2026-31431
install algif_aead /bin/false
```

The `install` line only prevents future loading; if `algif_aead` is already loaded,
it stays loaded until it is removed, e.g. with `modprobe -r algif_aead`.

The fixes for Dirty Frag are still in development as of May 8. The first fixes
have been merged upstream and released in 7.0.5, 6.18.28, 6.12.87, 6.6.138,
6.1.172, 5.15.206 and 5.10.255, but there is
[more to come for rxrpc](https://lwn.net/ml/all/2026050859-ahead-anchovy-05e2@gregkh/).
The responsible disclosure process for Dirty Frag unfortunately failed because
[the patches were spotted](https://www.openwall.com/lists/oss-security/2026/05/07/12),
so this time the upstream maintainers and the distributors did not have time
to carefully prepare and test fixes ahead of the publication of the issue.
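
For a vanilla (upstream stable) kernel, the release numbers listed above give a first indication of whether the fixes are present. The following is a minimal sketch that compares a version string, such as the output of `uname -r`, with those releases. The helper names `version_ge` and `fix_status` are hypothetical, the comparison relies on GNU `sort -V`, and the result is meaningless for distribution kernels, which backport fixes without changing the upstream version number; for those, the distributor's advisory is the reference. The sketch treats 7.0 as containing the copy.fail fix because that fix went into 7.0-rc7. It does not handle release candidates.

```shell
#!/bin/sh
# Hypothetical helper: compare an upstream kernel version with the fixed
# stable releases named in this advisory. Not meaningful for distribution
# kernels, which backport fixes without changing the version number.

COPYFAIL_FIXED="7.0 6.18.22 6.12.85 6.6.137"
# Only the first round of Dirty Frag fixes; more rxrpc fixes are pending.
DIRTYFRAG_FIRST_FIXED="7.0.5 6.18.28 6.12.87 6.6.138 6.1.172 5.15.206 5.10.255"

# version_ge A B: succeeds if A >= B in version order (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# fix_status VERSION LIST: prints "fixed", "vulnerable" or "unknown".
# "unknown" means LIST names no release for VERSION's X.Y series.
# Release candidates (-rcN) are not handled.
fix_status() {
    v=$(printf '%s\n' "$1" | sed 's/[^0-9.].*//')   # "6.12.85-foo" -> "6.12.85"
    series=$(printf '%s\n' "$v" | cut -d. -f1,2)
    for fixed in $2; do
        if [ "$(printf '%s\n' "$fixed" | cut -d. -f1,2)" = "$series" ]; then
            if version_ge "$v" "$fixed"; then echo fixed; else echo vulnerable; fi
            return 0
        fi
    done
    echo unknown
}
```

For example, `fix_status "$(uname -r)" "$COPYFAIL_FIXED"` checks the running kernel against the copy.fail releases. A `fixed` result against `DIRTYFRAG_FIRST_FIXED` only means that the first round of Dirty Frag patches is included, not the pending rxrpc fixes.
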

Because of the failed disclosure process, we have to expect that it will take a few
days until all Linux distributors manage to ship tested, fixed kernels.

A fully effective workaround is again to prevent loading the affected modules
by placing another file `dirtyfrag.conf` in `/etc/modprobe.d/`:

```shell
# Temporary workaround for Dirty Frag CVE-2026-43284, CVE-2026-43500
# This breaks IPsec
install esp4 /bin/false
install esp6 /bin/false
install rxrpc /bin/false
```

Note that this workaround prevents IPsec from working.

If a system is suspected of having already been exploited, the system owner can
discard the page cache by running `echo 3 > /proc/sys/vm/drop_caches` as root
and unload the affected modules to prevent re-exploitation.
This discards the modified page-cache pages. However, an attacker could have
used the gained privileges to install further backdoors into the system, so
it will need to be reinstalled or fully audited before it can be trusted again.

## SCS IaaS Cloud Provider exposure

None of the control-plane / management systems in a normal SCS cloud infrastructure
allow logins by normal users, so the LPEs cannot be exploited there. However,
should another exploit be found and used successfully, the LPEs could be used
to escalate privileges further, e.g. to break out of the containers that run
the OpenStack services, Ceph or some of the management tools, thus removing
one layer of a defense-in-depth concept.

Cloud providers are advised to install updated kernels to reestablish this defense.
They can apply the module-loading prevention measures in the meantime. Providers
are advised to do so with care on the network nodes: if these need to support
IPsec (e.g. for OpenStack's VPNaaS, which is part of Neutron), the non-loadable
modules may prevent correct operation.
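
On network nodes, one cautious approach is to apply the Dirty Frag workaround only where IPsec does not appear to be in use. The sketch below assumes, as the workaround itself does, that `esp4`, `esp6` and `rxrpc` are built as loadable modules. It uses a simple heuristic: if `esp4` or `esp6` is currently listed in `/proc/modules`, the node has used IPsec since boot, and the script refuses to proceed unless `FORCE=1` is set. The function names and the `MODPROBE_DIR`/`PROC_MODULES` overrides are hypothetical; they exist so the logic can be tried out without root. A node where ESP is not loaded yet may still need IPsec later (e.g. when a VPNaaS service is created), so this check does not replace knowing what the node is used for.

```shell
#!/bin/sh
# Sketch (heuristic, see above): apply the Dirty Frag module workaround
# only if esp4/esp6 are not currently loaded.
MODPROBE_DIR="${MODPROBE_DIR:-/etc/modprobe.d}"   # override for testing
PROC_MODULES="${PROC_MODULES:-/proc/modules}"     # override for testing

# esp_loaded: succeeds if esp4 or esp6 appears in column 1 of /proc/modules.
esp_loaded() {
    awk '$1 == "esp4" || $1 == "esp6" { found = 1 } END { exit !found }' "$PROC_MODULES"
}

apply_dirtyfrag_workaround() {
    if esp_loaded && [ "${FORCE:-0}" != 1 ]; then
        echo "esp4/esp6 loaded: IPsec may be in use, not applying (FORCE=1 overrides)" >&2
        return 1
    fi
    cat > "$MODPROBE_DIR/dirtyfrag.conf" <<'EOF'
# Temporary workaround for Dirty Frag CVE-2026-43284, CVE-2026-43500
# This breaks IPsec
install esp4 /bin/false
install esp6 /bin/false
install rxrpc /bin/false
EOF
    # Try to unload the modules; failures (not loaded, built in, or
    # still in use) are ignored here.
    for m in esp4 esp6 rxrpc; do
        modprobe -r "$m" 2>/dev/null || true
    done
}
```

Run as root without overrides, `apply_dirtyfrag_workaround` writes `/etc/modprobe.d/dirtyfrag.conf` with the same content as shown above and then attempts to unload the three modules.
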

Please note that there is no known remote
exploit via IPsec, so a temporary trade-off may be justified: living without this
layer of defense in depth rather than breaking IPsec (and thereby creating security
and functionality issues for customers).

Cloud providers often offer VM images to their customers.
To help customers maintain the security separation inside their VMs,
providers are advised to watch for the availability of new distribution images
and make them available promptly via their image service (Glance).

## SCS Kubernetes Provider exposure

The default implementation with SCS Cluster Stacks is vulnerable; the current
node images have a kernel that is affected by these weaknesses. This allows a user
to break out of the containers running in the cluster and take over the node
VM and other containers.

With Cluster API and the SCS Cluster Stacks built
on it, creating, updating and removing Kubernetes clusters has become
a commodity; it is thus normal to create clusters per development team and
not share them. In this scenario, the breakout may allow a developer to
take over their teammates' containers, which may not constitute a real danger
in many setups. For clusters shared across teams, or worse, for setups where several
clusters belonging to different entities share a control plane, this becomes
more serious.

Note that the LPEs also remove a layer of defense in depth: if a user of
a service running in a Kubernetes cluster exploits a vulnerability to
execute code in a container, the LPEs can then be used to escalate
privileges further.

As soon as new kernels become available, the node images will be rebuilt and
shipped with the next cluster stack patch releases. For users, the normal
rolling upgrade will then be all that is needed to be secure against these LPEs
again.

We will update this advisory as soon as new node images are available.
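
Once the rolling upgrade has been performed, it can be worth confirming that every node actually runs the new kernel. The following is a minimal sketch that asks the Kubernetes API for the kernel version each node reports (`.status.nodeInfo.kernelVersion` in the node status) and prints the nodes that do not match an expected version string. The function names are hypothetical. The expected kernel string is a placeholder for whatever kernel the new node image ships.

```shell
#!/bin/sh
# Sketch: list the kernel version reported by each Kubernetes node and flag
# nodes that are not yet on an expected kernel (function names hypothetical).

# node_kernels: prints "NODE<TAB>KERNEL", one line per node.
node_kernels() {
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'
}

# nodes_not_on KERNEL: prints the names of nodes whose kernel differs from KERNEL.
nodes_not_on() {
    node_kernels | awk -F '\t' -v want="$1" '$2 != want { print $1 }'
}
```

Calling `nodes_not_on` with the kernel version of the new node image prints nothing once all nodes have been upgraded.
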

For highly critical workloads, cluster operators can log in to the nodes
and deploy the measures described above to prevent loading the affected modules.
(Again, this will break IPsec.) Note that logging in to nodes in an SCS
Cluster Stack cluster is not possible by default; it requires either booting
into a rescue image (if the cluster runs on OpenStack) to inject an SSH
key, or using a tool like kubectl-node-shell with the appropriate
privileges.

```bash
# Requires the kubectl-node-shell plugin and sufficient privileges on the cluster
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  kubectl node-shell "$node" -- bash -c 'echo -e "# Temporarily disable algif_aead (copy.fail)\ninstall algif_aead /bin/false" > /etc/modprobe.d/disable-aead-copyfail.conf'
  kubectl node-shell "$node" -- bash -c 'echo -e "# Temporarily disable esp4, esp6, rxrpc (Dirty Frag)\ninstall esp4 /bin/false\ninstall esp6 /bin/false\ninstall rxrpc /bin/false" > /etc/modprobe.d/disable-esp46-rxrpc-dirtyfrag.conf'
done
```

## SCS Cloud users

Customers of SCS IaaS clouds are responsible for their own VMs. For exposed VMs,
they should apply the documented workaround inside the VMs, update online
and reboot into a fixed kernel, or redeploy their VMs based
on a fixed upstream image.

Customers who manage their own Kubernetes clusters,
e.g. with SCS Cluster Stacks, are advised to watch for new node
images and then perform the rolling upgrade. If their use case puts
them at increased risk, they are advised to prevent loading of the affected modules
in the meantime, as described above.

## SCS community infrastructure

The SCS community infrastructure was secured on May 8 by disabling the
relevant modules.

## Thanks

The authors would like to thank Taeyang Lee at Xint (who initiated the
research on copy.fail) and Hyunwoo Kim (@v4bel, who discovered Dirty Frag).
They would also like to thank the upstream Linux kernel maintainers and
Linux distributors for their reliable work on handling the issues and
getting fixes out.

## Sovereign Cloud Stack Security Contact

The SCS security contact is [security@scs.community](mailto:security@scs.community), as published on
[https://scs.community/.well-known/security.txt](https://scs.community/.well-known/security.txt).

## Version history

- Initial Draft, v0.1, 2026-05-08, 17:15 CEST.
- kubectl node-shell instructions, v0.2, 2026-05-09, 12:45 CEST.
- Mention successful patching of community infra, v0.3, 2026-05-09, 13:30 CEST.
- Correct facts on the failure of the responsible disclosure. Release as v1.0, 2026-05-09, 20:00 CEST.