SovereignCloudStack · garloff · May 8, 2026 · May 9, 2026 · May 9, 2026 · May 9, 2026
diff --git a/.nvmrc b/.nvmrc
@@ -1 +1 @@
-v18
+v20
diff --git a/blog/2026-05-10-kernel-root-exploits.md b/blog/2026-05-10-kernel-root-exploits.md
@@ -0,0 +1,254 @@
+---
+slug: kernel_local_root_exploits
+title: Linux Kernel local root exploits CVE-2026-31431, -43284, -43500
+authors: [garloff]
+tags: [security, linux, cve, copy.fail, dirtyfrag]
+---
+
+## Linux root exploits (Local Privilege Escalation)
+
+Unix is designed as a multi-user system. Different users have their own
+files and processes and can work without interference from others.
+Linux lives in that tradition. It has advanced the concept with namespaces
+where users can also have a private view of networking, the process list, filesystems
+and other pieces that are traditionally shared (read-only) on a Unix system,
+also including some resource management to enhance performance isolation.
+
+It is the job of the operating system kernel to keep the separation safe; in
+particular, normal users must not be able to obtain system administrator (root)
+privileges. Where the kernel fails to ensure this, we have a "local root"
+vulnerability, a Local Privilege Escalation (LPE).
+
+The Linux kernel is a large and complex beast. On one hand it has sophisticated
+mechanisms to get really good performance out of increasingly complex hardware.
+On the other hand, it comes with a huge variety of device drivers. From time to
+time, vulnerabilities are found, reported and fixed. The Linux kernel has several
+LPEs per year. Most of the time, they affect only a small fraction of users
+(typically by being located in a device driver or somewhat exotic feature)
+and often they are hard to exploit, needing to win a race condition with
+many attempts and sometimes causing crashes in trying (which may not go unnoticed).
+
+We don't normally report on these LPEs. They get fixed by the upstream Linux kernel
+developers, shipped as stable updates by the maintainers and shipped to the end
+users via kernel updates from the Linux distributors.
+
+## copy.fail and Dirty Frag
+
+The currently highly visible Linux kernel issues [copy.fail](https://copy.fail/)
+and [Dirty Frag](https://github.com/V4bel/dirtyfrag) are both LPEs (local root
+vulnerabilities). The reason we report on them is that they both affect
+most Linux users (with kernels from the last 9 years) and are easy to exploit.
+
+Like [Dirty Pipe](https://dirtypipe.cm4all.com/) and, before it,
+[Dirty Cow](https://dirtycow.ninja/), both LPEs rely on improper protection
+of the page cache.
+The Linux kernel keeps contents from file systems in the page cache; when code
+gets executed, it is mapped into your virtual memory. When the memory page is
+accessed and not yet loaded into your physical memory, a page fault occurs and
+the relevant blocks are loaded from disk — or the access is denied and your
+program receives `SIGSEGV` and is terminated. Copying pages is costly and the
+kernel avoids it to achieve higher performance. If you write to a memory page,
+the kernel may receive a page fault on a read-only mapping (that it created to
+avoid copying) and only then do the copy to create a private writable copy.
+This approach is called copy-on-write (COW) and is common in modern operating
+systems. If a page from the page cache is changed in memory, it is also marked
+"dirty", so the kernel knows it needs to write the changes back to the file system.
+
+In copy.fail, the `aead` crypto module did some cryptography in place, avoiding
+the need to allocate an extra buffer. Unfortunately, it requires 4 extra bytes
+under some conditions; normally aead is used by IPsec and that location is a
+designated place in a network buffer. However, a local attacker can make this
+write happen to a page cache page by using `splice`. This way, the copy of the
+`sudo` binary in the page cache can be overwritten, allowing the attacker to
+circumvent the safeguards there and trivially become root. As the page is not
+dirtied, no trace of the corruption will be visible on the disk.
+[copy.fail](https://copy.fail/) has been assigned CVE-2026-31431.
+
+In Dirty Frag, a network buffer that is split over several fragments is not
+properly handled and the fragmented buffer is not properly COW'ed. The AEAD
+crypto operation then again overwrites 4 bytes. A local attacker can trigger
+this and again become root very quickly by overwriting the page cache's view of
+`sudo`. (Of course, other sensitive binary code could be overwritten in memory too.)
+This can be triggered via the IPsec `esp_input` (for both IPv4 and IPv6) as well
+as via the `rxrpc` code. The esp variant requires the privilege to create user
+namespaces and then allows for easy writes of 4 bytes at a time. It has been assigned
+CVE-2026-43284. The rxrpc variant overwrites 8 bytes and does not require the
+namespace creation privileges, but as these bytes are encrypted,
+the user needs to brute force them in order to achieve a controlled result. This
+variant was assigned CVE-2026-43500.
+
+_Exploiting these vulnerabilities requires access to the system and the ability
+to execute code there, thus the categorization as Local Privilege Escalation (LPE),
+not Remote Code Execution (RCE)._
+
+## Impact
+
+Any system where normal (non-root) users can log in to execute code under their
+own control is no longer secure: The users can use the publicly available
+exploits to gain root privileges and get access to whatever the (virtual)
+machine has access to. This means accessing other users' data as well as secrets
+that may be stored by the system administrator.
+
+Such systems are less common these days than they were 20 years ago. The reason
+is that virtualization has become a commodity, so in many scenarios, individual
+users may use their own virtual machine rather than having access to a shared
+(virtual) machine.
+
+Note that these vulnerabilities do NOT break the isolation of virtual machines.
+VMs remain as securely isolated as they would be without these vulnerabilities.
+These LPEs do NOT establish a virtualization escape.
+
+There is however a common scenario where individual users and workloads
+are running inside a container. The LPEs also allow escaping from containers.
+Running a shell inside a Kubernetes pod allows you to get control of the
+Kubernetes node and thus of everything that your Kubernetes cluster has
+access to. Running untrusted code in a container is thus very risky, something
+that will affect e.g. CI setups.
+
+## Fixes
+
+A fix to the Linux kernel for copy.fail was silently merged at the end of March
+2026 (for 7.0-rc7) and has also been merged to the stable kernel series (6.18.22,
+6.12.85, 6.6.137).
+It just disables the in-place optimization for `algif_aead`. As of early May,
+Linux distributors are in the process of shipping fixed kernels.
+Without a fixed kernel, a workaround is to place a file `copyfail.conf` in
+`/etc/modprobe.d/` with the contents:
+
+```shell
+# Temporary workaround for copy.fail CVE-2026-31431
+install algif_aead /bin/false
+```
+
+The fixes for Dirty Frag are still in development as of May 8. The first fixes
+have been merged upstream and released in 7.0.5, 6.18.28, 6.12.87, 6.6.138,
+6.1.172, 5.15.206 and 5.10.255 but there is
+[more to come for rxrpc](https://lwn.net/ml/all/2026050859-ahead-anchovy-05e2@gregkh/).
+The responsible disclosure process for Dirty Frag was unfortunately broken,
+so the upstream maintainers and the distributors this time did not have time
+to carefully prepare and test fixes ahead of the publication of the issue.
+So we have to expect that it will take a few days until all Linux distributors
+manage to ship tested fixed kernels.
+
+A fully effective workaround is again to prevent loading the affected modules
+by placing another file `dirtyfrag.conf` in `/etc/modprobe.d/`:
+
+```shell
+# Temporary workaround for Dirty Frag CVE-2026-43284, CVE-2026-43500
+# This breaks IPsec
+install esp4 /bin/false
+install esp6 /bin/false
+install rxrpc /bin/false
+```
+
+Note that these workarounds prevent IPsec from working.
+
+If a system is suspected to already have been exploited, the system owner can
+dispose of the page cache by doing `echo 3 > /proc/sys/vm/drop_caches` as root
+and unload the affected modules to prevent re-exploitation.
+This will discard the modified page cache pages. However, an attacker could have
+used the gained privileges to install further backdoors etc. into the system, so
+it will need to be reinstalled or fully audited to be considered trustworthy again.
+
+## SCS IaaS Cloud Provider exposure
+
+None of the control-plane / management systems in a normal SCS cloud infrastructure
+allows normal users to log in. The LPEs thus cannot be exploited. However,
+should another exploit be found and used successfully, the LPEs may be used
+to escalate privileges further, e.g. breaking out of the containers that run
+the OpenStack services or Ceph or some of the management tools and thus remove
+one layer of a defense-in-depth concept.
+
+Cloud Providers are advised to install updated kernels to reestablish the defense.
+They can apply the module loading prevention measures in the meantime. Providers
+are advised to use this with care on the network nodes — if these need to support
+IPsec (e.g. for OpenStack's VPNaaS which is part of Neutron), the non-loadable
+modules may prevent correct operation. Please note that there is no known remote
+exploit via IPsec, so a temporary trade-off to live without the defense-in-depth
+and not break IPsec (which would create security and functionality issues for
+customers) may be justified.
+
+Cloud providers often provide VM images for their customers.
+To help customers maintain the security separation inside their VMs,
+they are advised to watch out for the availability of new distribution images
+and provide them promptly via their image service (Glance).
+
+## SCS Kubernetes Provider exposure
+
+The default implementation with SCS Cluster Stacks is vulnerable; the current
+node images have a kernel that is affected by this weakness. This allows a user
+to break out of the containers running in the cluster to take over the node
+VM and other containers.
+
+With Cluster-API and the SCS Cluster Stacks building
+on them, creating, updating and removing Kubernetes clusters has become
+a commodity; it is thus normal to create clusters per development team and
+not share them. In this scenario, the break out may allow a developer to
+take over containers from teammates, which may not constitute a real danger
+in many setups. For cluster setups across teams or worse for setups where several
+clusters that belong to different entities share a control plane, this becomes
+more serious.
+
+Note that the LPEs also remove a layer of defense in depth: if a user of
+a service running in a k8s cluster exploits a vulnerability to be able to
+execute code in the container, the LPEs can then be used to escalate the
+privileges further.
+
+As soon as new kernels become available, the node images will be rebuilt and
+shipped with the next cluster stack patch releases. For users, the normal
+rolling upgrade will then be all that's needed to be secure against these LPEs
+again.
+
+We will update this advisory as soon as new node images are available.
+
+For highly critical workloads, cluster operators can log in to the nodes
+and deploy the mechanisms to prevent loading the above-mentioned modules.
+(Again, this will break IPsec.) Note that logging in to nodes in an SCS
+Cluster Stack cluster is not possible by default; it requires booting
+into a rescue image (if the cluster runs on OpenStack) to inject an ssh
+key or to use a tool like kubectl-node-shell with the appropriate
+privileges.
+
+```bash
+for node in $(kubectl get nodes | grep -v '^NAME' | awk '{print $1;}'); do
+  kubectl node_shell "$node" -- bash -c 'echo -e "# Temporarily disable algif_aead (copy.fail)\ninstall algif_aead /bin/false" > /etc/modprobe.d/disable-aead-copyfail.conf'
+  kubectl node_shell "$node" -- bash -c 'echo -e "# Temporarily disable esp4, esp6, rxrpc (Dirty Frag)\ninstall esp4 /bin/false\ninstall esp6 /bin/false\ninstall rxrpc /bin/false" > /etc/modprobe.d/disable-esp46-rxrpc-dirtyfrag.conf'
+done
+```
+
+## SCS Cloud users
+
+Customers of SCS IaaS clouds are responsible for their own VMs. For VMs
+that are exposed, they should apply the documented workarounds inside their VMs,
+perform an online update and reboot into a fixed kernel, or redeploy their VMs based
+on a fixed upstream image.
+
+Customers that do their own Kubernetes Container Cluster Management
+with e.g. SCS Cluster Stacks are advised to watch out for new node
+images and then perform the rolling upgrade. If their use scenario puts
+them at increased risk, they are advised to prevent the module loading
+in the meantime, as advised above.
+
+## SCS community infrastructure
+
+The SCS community infrastructure was secured on May 8 by disabling the
+relevant modules.
+
+## Thanks
+
+The authors would like to thank Taeyang Lee at Xint (who initiated the
+research on copy.fail) and Hyunwoo Kim (@v4bel, who discovered Dirty Frag).
+They would also like to thank the upstream Linux kernel maintainers and
+Linux distributors for their reliable work on handling the issues and
+getting fixes out.
+
+## Sovereign Cloud Stack Security Contact
+
+The SCS security contact is [security@scs.community](mailto:security@scs.community), as published on
+[https://scs.community/.well-known/security.txt](https://scs.community/.well-known/security.txt).
+
+## Version history
+
+- Initial Draft, v0.1, 2026-05-08, 17:15 CEST.
+- kubectl node-shell instructions, v0.2, 2026-05-09, 12:45 CEST.
+- Mention successful patching of community infra, v0.3, 2026-05-09, 13:30 CEST.