
kernel-module: detect nft_expr_ops.validate signature from headers#2085

Open
andrewyager wants to merge 3 commits into sipwise:master from andrewyager:fix/nft-validate-ubuntu-kernel-compat


@andrewyager

Summary

The current LINUX_VERSION_CODE < KERNEL_VERSION(6,12,0) check for the nft_expr_ops.validate callback signature breaks on distribution kernels that backport the API change without updating LINUX_VERSION_CODE.

Upstream commit: eaf9b2c875ec — "netfilter: nf_tables: drop unused 3rd argument from validate callback ops" (Florian Westphal, 2024-09-03)

Affected: Ubuntu 24.04 kernel 6.8.0-106 (stable patchset 2026-01-27, LP: #2139158) backports this commit to a 6.8 kernel, causing the DKMS build to fail:

error: initialization of 'int (*)(const struct nft_ctx *, const struct nft_expr *)'
from incompatible pointer type 'int (*)(const struct nft_ctx *, const struct nft_expr *,
const struct nft_data **)' [-Werror=incompatible-pointer-types]

Fix

Replace the LINUX_VERSION_CODE check with compile-time header detection:

  1. Makefile: Inspect the installed kernel headers for the actual validate callback signature. If nft_data appears in the validate parameter list, define NFT_EXPR_OPS_VALIDATE_HAS_DATA.
  2. nft_rtpengine.c: Use #if defined(NFT_EXPR_OPS_VALIDATE_HAS_DATA) instead of the version check.

This correctly handles:

  • Mainline kernels < 6.12 (3-param) ✅
  • Mainline kernels >= 6.12 (2-param) ✅
  • Distribution kernels with backported API changes (e.g. Ubuntu 6.8.0-106) ✅
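As a rough illustration of the guard described above, the sketch below shows how one callback definition can follow the NFT_EXPR_OPS_VALIDATE_HAS_DATA define. It is a userspace mock, not the actual nft_rtpengine.c code: the struct is a stand-in for the kernel's nft_expr_ops, the define is set by hand rather than by the build system, and example_validate and example_ops are hypothetical names.

```c
#include <stddef.h>

/* Userspace stand-ins so the sketch builds outside the kernel tree.
 * In the module these come from <net/netfilter/nf_tables.h>. */
struct nft_ctx;
struct nft_expr;
struct nft_data;

/* Assumption: the build system passes this when the headers still
 * declare the old three-parameter callback. Remove it to model the
 * two-parameter API. */
#define NFT_EXPR_OPS_VALIDATE_HAS_DATA 1

/* Mock of the one struct member that matters here. */
struct nft_expr_ops {
#if defined(NFT_EXPR_OPS_VALIDATE_HAS_DATA)
	int (*validate)(const struct nft_ctx *ctx, const struct nft_expr *expr,
			const struct nft_data **data);
#else
	int (*validate)(const struct nft_ctx *ctx, const struct nft_expr *expr);
#endif
};

/* Hypothetical callback: only the parameter list differs per API. */
static int example_validate(const struct nft_ctx *ctx,
			    const struct nft_expr *expr
#if defined(NFT_EXPR_OPS_VALIDATE_HAS_DATA)
			    , const struct nft_data **data
#endif
			    )
{
	(void)ctx;
	(void)expr;
#if defined(NFT_EXPR_OPS_VALIDATE_HAS_DATA)
	(void)data;
#endif
	return 0; /* accept every hook in this sketch */
}

static const struct nft_expr_ops example_ops = {
	.validate = example_validate,
};
```

With the define removed, both the mock struct and the callback switch to the two-parameter form, so the initializer still type-checks.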

Testing

Compiled and verified against:

  • Ubuntu 24.04 6.8.0-90-generic (old 3-param API) — builds successfully
  • Ubuntu 24.04 6.8.0-106-generic (backported 2-param API) — builds successfully
  • DKMS install on Ubuntu 24.04 with kernel 6.8.0-106 — module loads correctly

The current LINUX_VERSION_CODE check assumes the nft_expr_ops.validate
callback signature changed only in kernel 6.12+ (upstream commit
eaf9b2c875ec "netfilter: nf_tables: drop unused 3rd argument from
validate callback ops", Florian Westphal, 2024-09-03).

However, distribution kernels may backport this change to earlier
versions without updating LINUX_VERSION_CODE. For example, Ubuntu
24.04's 6.8.0-106 kernel (stable patchset 2026-01-27, LP: #2139158)
includes this backport, causing DKMS builds to fail with:

  error: initialization of 'int (*)(const struct nft_ctx *,
  const struct nft_expr *)' from incompatible pointer type
  'int (*)(const struct nft_ctx *, const struct nft_expr *,
  const struct nft_data **)' [-Werror=incompatible-pointer-types]

Replace the version-based #if with compile-time header detection:
the Makefile inspects the installed kernel headers for the actual
validate callback signature and sets NFT_EXPR_OPS_VALIDATE_HAS_DATA
accordingly.

Tested against Ubuntu 6.8.0-90 (3-param, old API) and 6.8.0-106
(2-param, backported API) — both compile cleanly.
@rfuchs
Member

rfuchs commented Mar 23, 2026

Ubuntu still doesn't have their own versioning define (akin to RHEL_RELEASE_CODE) to be able to detect this?

@andrewyager
Author

Ubuntu does have UTS_UBUNTU_RELEASE_ABI (defined in include/generated/utsrelease.h) — it's 90 on 6.8.0-90 and 106 on 6.8.0-106. However, it's not really practical for this use case:

  1. You'd need to hardcode ABI thresholds per kernel series. The backport landed somewhere between ABI 90 and 106 on the Noble 6.8 kernel, but the exact cutover is unknown without bisecting the changelogs. And the ABI number would be different for Jammy HWE (6.8.0-xx~22.04.x), Oracular, etc.

  2. The ABI number has no series context. Unlike RHEL_RELEASE_CODE which encodes major.minor, Ubuntu's ABI is just a bare incrementing integer — UTS_UBUNTU_RELEASE_ABI >= 100 means completely different things on Noble vs Jammy HWE vs a future derivative.

  3. Other distros backport too. SUSE, Debian backports, or any downstream could make the same backport. A version-based check only helps for the specific distro you code for. While this hasn't surfaced in any bug reports yet, that doesn't mean there aren't users out there quietly confused about why their DKMS builds are failing.

The header grep approach sidesteps all of this — it detects the actual API the kernel headers expose, regardless of which distro or ABI version you're building against.

@rfuchs
Member

rfuchs commented Mar 23, 2026

Thanks ChatGPT.

I find a simple grep to be a bit too fragile for this, as there's no guarantee that the pattern (*validate) won't appear elsewhere in the same file in a future version. Using a #define provided by the distro explicitly for this purpose is actually preferable.

Either that, or a more sophisticated method to parse the header file to extract the necessary information.

@andrewyager
Author

Point taken, but no I don't use ChatGPT.

I'm going to rework this to do it properly with a compile test. UTS_UBUNTU_RELEASE_ABI feels too fragile: it does work for Jammy HWE and Noble in this case, but it doesn't feel robust enough. Your comment on (*validate) potentially appearing somewhere else is fair.

Replace the fragile grep-based header inspection with a proper compile
test in gen-rtpengine-kmod-flags, which already serves as the configure
phase for the kernel module build.

The test tries to compile a small module that assigns a 3-param function
to nft_expr_ops.validate. If it compiles, the old API is present and
NFT_EXPR_OPS_VALIDATE_HAS_DATA is set. If it fails, the kernel has the
new 2-param version (mainline 6.12+ or distro backport).

This approach:
- Avoids fragile pattern matching on header contents
- Works with any distribution's kernel regardless of version defines
- Follows the established pattern used by ZFS, DAHDI, and other
  out-of-tree kernel modules for cross-distro API detection
- Runs during the existing gen-rtpengine-kmod-flags configure phase,
  not inline in the kbuild Makefile (which would deadlock on the
  jobserver)

Tested against Ubuntu 6.8.0-90 (3-param) and 6.8.0-106 (2-param).
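For illustration, the probe translation unit might look roughly like the sketch below. This is not the code from gen-rtpengine-kmod-flags: the struct is a userspace stand-in that models the older three-parameter layout so the sketch builds on its own, and probe_validate and probe_ops are hypothetical names. A real probe would include <net/netfilter/nf_tables.h> from the target kernel and be built through kbuild.

```c
#include <stddef.h>

/* Stand-in for <net/netfilter/nf_tables.h>, modelling the older
 * three-parameter layout. A real probe would take the declaration
 * from the target kernel's headers instead. */
struct nft_ctx;
struct nft_expr;
struct nft_data;

struct nft_expr_ops {
	int (*validate)(const struct nft_ctx *ctx, const struct nft_expr *expr,
			const struct nft_data **data);
};

/* A callback written against the old three-parameter API. */
static int probe_validate(const struct nft_ctx *ctx,
			  const struct nft_expr *expr,
			  const struct nft_data **data)
{
	(void)ctx;
	(void)expr;
	(void)data;
	return 0;
}

/* The probe proper: this initializer type-checks only while the
 * header still declares the three-parameter pointer type. */
static const struct nft_expr_ops probe_ops = {
	.validate = probe_validate,
};
```

Against headers with the two-parameter signature, the same initializer produces the incompatible-pointer-types diagnostic quoted in the PR description. That diagnostic is promoted to an error there (-Werror=incompatible-pointer-types), so the probe build fails and the define is left unset.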
@andrewyager
Author

OK - this fix should be better.

One issue would still surface when upgrading from 6.8.0-90 to 6.8.0-106, where the signature changes, because the Makefile sets

KSRC ?= /lib/modules/$(shell uname -r)/build

which will always pick the current Kernel headers. KSRC in the module Makefile is never overridden, so it still points to the running kernel. Any logic in the Makefile or its helper scripts that depends on KSRC therefore inspects the running kernel's headers, not the target kernel's. That said, this is how it has always worked.

@rfuchs
Copy link
Member

rfuchs commented Mar 23, 2026

which will always pick the current Kernel headers

Well, that's a bit of a problem then, isn't it. 😄

dkms seems to set KERNELRELEASE to the target version, so maybe use that?

Alternatively, since this script is called from make, the MAKE variable should be set, which may include the required information, and could be used in place of just plain make? (I haven't checked this in more detail)

Or perhaps there's a way to integrate this into the main makefile somehow... I need to spin up some VMs to play with this.

kbuild exports KERNELRELEASE as the target kernel version, which may
differ from the running kernel during DKMS cross-kernel builds (e.g.
installing a new kernel before rebooting). Use it to construct KSRC
so the compile test runs against the correct kernel headers.

Falls back to uname -r for standalone builds outside kbuild.
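The fallback order described above can be sketched in C as follows. The actual change lives in the build scripts, not in C, so this is only a model of the logic: resolve_ksrc is a hypothetical helper, and an empty release string is assumed to be treated the same as an unset one.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: prefer the release handed in by the build,
 * otherwise fall back to the running kernel (what uname -r prints).
 * Returns 0 on success, -1 on error or if buf is too small. */
static int resolve_ksrc(const char *kernelrelease, char *buf, size_t len)
{
	struct utsname running;
	int n;

	if (kernelrelease == NULL || strlen(kernelrelease) == 0) {
		if (uname(&running) != 0)
			return -1;
		kernelrelease = running.release;
	}

	n = snprintf(buf, len, "/lib/modules/%s/build", kernelrelease);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass getenv("KERNELRELEASE") as the first argument. During a DKMS build for 6.8.0-106-generic on a machine still running 6.8.0-90, that yields /lib/modules/6.8.0-106-generic/build rather than the running kernel's directory.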
@andrewyager
Author

andrewyager commented Mar 24, 2026

Yep, KERNELRELEASE seems like a good choice here, and in my testing it resolved correctly for the problem releases. I've fixed this within the scope of this patch, but the issue remains in the Makefile (where I pulled the pattern from).

If this were to be patched, it would look like:

diff --git a/kernel-module/Makefile b/kernel-module/Makefile
index be0f349..XXXXXXX 100644
--- a/kernel-module/Makefile
+++ b/kernel-module/Makefile
@@ -1,4 +1,4 @@
-KSRC   ?= /lib/modules/$(shell uname -r)/build
+KSRC   ?= /lib/modules/$(or $(KERNELRELEASE),$(shell uname -r))/build
 KBUILD := $(KSRC)
 M      ?= $(CURDIR)

I can add this to this PR, or it can go in as a separate concern.
