Please pull 26.04 linux nvidia.glue #428

Open
fyu1 wants to merge 191 commits into NVIDIA:26.04_linux-nvidia from fyu1:26.04_linux-nvidia.glue

Conversation

Collaborator

@fyu1 fyu1 commented May 18, 2026

This MPAM PR has 4 parts:

- 1-47: backported from upstream
- 48: enable RESCTRL_FS
- 49-52: forward ported from 6.17 hwe
- 53: fix issues on Grace

Please review and merge to 7.0 hwe.

jacobmartin0 and others added 30 commits April 14, 2026 11:27
Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…idia

Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
(cherry picked from commit 1a32c7f noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
(cherry picked from commit 0d1a2de noble:linux-nvidia-6.17)
[jacobmartin: dropped uses of CONFIG_PREEMPT_NONE /
CONFIG_PREEMPT_VOLUNTARY, these have been disabled upstream for arm64
and amd64 arches by commit 7dadeaa ("sched: Further restrict the
preemption modes") in favor of CONFIG_PREEMPT_LAZY, which is default in
the parent kernel.]
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Add support for exposing rprovides data for standalone modules
too. Switch to exposing provides as a shared debian/substvar file
and use that in the templates.

Ignore: yes
Signed-off-by: Brad Figg <bfigg@nvidia.com>
Signed-off-by: Ian May <ian.may@canonical.com>
(cherry picked from commit afacdda noble:linux-nvidia/main-next)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
(cherry picked from commit 52ba185)

(cherry picked from commit 52ba185 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit 8f0710a noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
This reverts commit 7a51fff.

This stale debian/dkms-versions scripting is still used for derivatives
of linux without a linux-main-modules package to parse the main
package's dkms-versions file for out-of-tree module builds.

Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
When nvidia-fs-dkms is available as a dkms package, we want to
default to using the signed modules if possible.  Adding
a version number for the nvidia-fs modules package enables the inbox
modules to be selected over an equivalent dkms version.

Ignore: yes
Signed-off-by: Ian May <ian.may@canonical.com>
(cherry picked from commit 607379d noble:linux-nvidia/main-next)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
(cherry picked from commit f6927df)

(cherry picked from commit f6927df noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit 750ba56 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…nvidia kernels

BugLink: https://bugs.launchpad.net/bugs/2060327

Signed-off-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Signed-off-by: Ian May <ian.may@canonical.com>
[jacobmartin: Add note to changed configs]
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

(cherry picked from commit 9b2615a noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
(cherry picked from commit e3b5061 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…ations

BugLink: https://bugs.launchpad.net/bugs/2060327

Signed-off-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Signed-off-by: Ian May <ian.may@canonical.com>
[jacobmartin: Add annotations note for changed configs]
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

(cherry picked from commit 3d31ea0 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
(cherry picked from commit 6de4075 noble:linux-nvidia-6.17)
[jacobmartin: set new CoreSight configs:
  - CONFIG_CORESIGHT_TNOC=m
  - CONFIG_CORESIGHT_CTCU=m]
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
Ignore: yes
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
(cherry picked from commit 448ddcb)

(cherry picked from commit 448ddcb noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
(cherry picked from commit b4e9b91 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2061930
BugLink: https://bugs.launchpad.net/bugs/2067106

Some systems in production do not have firmware that supports
coresight_etm4x. Instead of removing the module completely,
blacklist coresight_etm4x so that systems with the correct
firmware can still use it.

Signed-off-by: Ian May <ian.may@canonical.com>
Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
(backported from commit 217d1ae noble:linux-nvidia-6.14)
[maskedarray: adjusted context]
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit 3f7d900 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2067111

NVIDIA provides a way to flash the UEFI firmware via the capsule loader
on arm64. CAPSULE_LOADER is also built in to the L4T kernel, so for ease
of use, make CAPSULE_LOADER built-in on arm64.

Nvidia-BugLink: https://nvbugspro.nvidia.com/bug/4601764

Signed-off-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
(cherry picked from commit efbc80a noble:linux-nvidia-6.11)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
(cherry picked from commit 58d6077)

(cherry picked from commit 58d6077 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit 812ae1e noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2095028

This is used for GPU memory mapping. It is a workaround (WAR) until the
upstream solution lands, which would use dmabuf to map the entire range
in a single sequence.

Related topics:
https://lore.kernel.org/kvm/20240624065552.1572580-1-vivek.kasireddy@intel.com/
https://lore.kernel.org/kvm/cover.1719909395.git.leon@kernel.org/

Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
(cherry picked from commit d3d7b64f1a3274e5df04dee1a8062f54a3fa1116 nvidia/kstable/dev/nic/iommufd_vsmmu-12122024)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 15e066a noble:linux-nvidia-6.11)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

(cherry picked from commit 8fcaed8 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit ef306c8 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…ualization

BugLink: https://bugs.launchpad.net/bugs/2095028

This adds the following config options to annotations:

            CONFIG_ARM_SMMU_V3_IOMMUFD=y
            CONFIG_IOMMUFD_DRIVER_CORE=y
            CONFIG_IOMMUFD_VFIO_CONTAINER=y
            CONFIG_NVGRACE_GPU_VFIO_PCI=m
            CONFIG_VFIO_CONTAINER=n
            CONFIG_VFIO_IOMMU_TYPE1=-

For CMA size requirements, the 64K kernel configuration needs 640MB
in the worst-case scenario, while the 4K kernel configuration requires 40MB.
Because CMA currently must be aligned to 512MB on the 64K kernel and
128MB on the 4K kernel, each requirement is rounded up to its alignment
and used as the default:
            For 64k kernel, CONFIG_CMA_SIZE_MBYTES=1024
            For 4k kernel, CONFIG_CMA_SIZE_MBYTES=128
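
The two defaults above follow from rounding each worst-case requirement up to the next multiple of the CMA alignment. A minimal sketch of that arithmetic, where the helper name `round_up_to_alignment` is made up for illustration and is not part of the kernel or the packaging scripts:

```python
def round_up_to_alignment(required_mb: int, alignment_mb: int) -> int:
    """Round required_mb up to the next multiple of alignment_mb."""
    # Ceiling division written with floor division on negated operands.
    return -(-required_mb // alignment_mb) * alignment_mb


# Figures taken from the commit message: worst-case need and alignment, in MB.
cma_64k_mb = round_up_to_alignment(640, 512)
cma_4k_mb = round_up_to_alignment(40, 128)
```

With the figures quoted in the commit message, this yields the CONFIG_CMA_SIZE_MBYTES values listed above for the 64K and 4K kernels.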

These config options have already been defined in debian.master
            CONFIG_IOMMUFD=m
            CONFIG_IOMMU_IOPF=y

Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Kai-Heng Feng <kaihengf@nvidia.com>
Acked-by: Koba Ko <kobak@nvidia.com>
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
(backported from commit 35a55f3 24.04_linux-nvidia-adv-6.8-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit 1314cf0 noble:linux-nvidia-6.11)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

(cherry picked from commit d09b7e2 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(backported from commit 3660ee5 noble:linux-nvidia-6.17)
[mochs: Removed CONFIG_TEGRA241_CMDQV=n; we want it =y from debian.master]
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2096888

Add ACPI support to 8250_mtk driver. This makes it possible to
use UART on ARM-based desktops with EDK2 UEFI firmware.

Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 4647186 noble:linux-nvidia-6.11)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

(cherry picked from commit d73760e noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit 072848c noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2096882

Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit a99eb0f noble:linux-nvidia-6.11)
[jacobmartin: Drop addition of 13d3:3604 already added by upstream
commit f9685f3 ("Bluetooth: btusb: Add MediaTek MT7925-B22M support
ID 0x13d3:0x3604"). Drop driver_info flag "BTUSB_VALID_LE_STATES" as it
was inverted by upstream commit 0fec656 ("Bluetooth: btusb: Invert
LE State flag to set invalid rather then valid")]
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
(backported from commit a1d77cd noble:linux-nvidia-6.14)
[maskedarray: adjusted context]
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit f79eaa9 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…eam low power

BugLink: https://bugs.launchpad.net/bugs/2107509

Add a quirk to avoid U1 and U2 low power state operations
during bulk stream transfers.

Change-Id: Iaff484625eca6708713d0c2acaeddfc1103ac7d2
Signed-off-by: Us Chien <us.chien@mediatek.com>
Signed-off-by: Yenchia Chen <yenchia.chen@mediatek.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Acked-by: Brad Figg <bfigg@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 07399e8 noble:linux-nvidia-6.11)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
(backported from commit e521e80)
[maskedarray: changed the XHCI_NVIDIA_MT8901_HOST quirk bit value
to 51]
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit 08ca4af noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2109730

The Realtek R8127 driver can be downloaded from
https://www.realtek.com/Download/List?cate_id=584

where it is maintained as an out-of-tree module.

This patch adds the extracted content of r8127-11.014.00.tar.bz2 in
the folder drivers/net/ethernet/realtek/r8127.

4bd62fc87de32760fb1f3b9cd3ec14e933035623  r8127-11.014.00.tar.bz2

All the clean-up, Makefile and Kconfig related changes will be
done in subsequent commits. The source code carries a GPL2-compatible
license. All the license information and the Realtek copyright notice
will be retained in each file and in newly added files.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Ian May <ianm@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
(cherry picked from commit 7faf7ac noble:linux-nvidia-6.11)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

(cherry picked from commit e45f1b7 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit 24068b2 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2109730

These files are not needed to build r8127 as part of the kernel
source build, so remove them.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Ian May <ianm@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
(cherry picked from commit 063d338 noble:linux-nvidia-6.11)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

(cherry picked from commit 712fc60 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit ee5f3b0 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2109730

This commit moves all files from the src folder into the parent folder.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Ian May <ianm@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
(cherry picked from commit a5fe39b noble:linux-nvidia-6.11)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

(cherry picked from commit 1802cd3 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit f83397f noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2109730

In the original code, the r8127 driver was built as an out-of-tree module.
This commit adds a Kconfig entry and updates the Makefile so that the
driver is built as part of the kernel build.

The r8127 driver internally uses several config flags, which were set
through EXTRA_CFLAGS. These flags are now set in the Makefile with
ccflags-y. All the flags that were enabled by default in the original
code are enabled in ccflags-y; this commit does not enable any extra
flags.

Some files are compiled only when a particular flag is set. Because only
the default flags are now set, those files are unused, and this commit
removes them.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Ian May <ianm@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
(backported from commit 04ea6d0 noble:linux-nvidia-6.11)
[jacobmartin: adjust context around RTASE definitions introduced in
K6.14]
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>

(cherry picked from commit 6217fea noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit d423ea7 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
…127 module

BugLink: https://bugs.launchpad.net/bugs/2109730

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Matt Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Ian May <ianm@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Ian May <ianm@nvidia.com>
(cherry picked from commit 59db394 noble:linux-nvidia-6.11)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
(cherry picked from commit aaa5490)

(cherry picked from commit aaa5490 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
(cherry picked from commit 1edd05e noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2111511

- crb_acpi_add() checks the start method.
- If the start method is ACPI_TPM2_CRB_WITH_ARM_FFA, it
  invokes tpm_crb_ffa_init().
- tpm_crb_ffa_init() is declared under IS_REACHABLE():

    #if IS_REACHABLE(CONFIG_TCG_ARM_CRB_FFA)
    int tpm_crb_ffa_init(void);
    #else
    static inline int tpm_crb_ffa_init(void) { return 0; }
    #endif

  So either tpm_crb (configured with CONFIG_TCG_CRB)
  must be a module, or tpm_crb_ffa
  (CONFIG_TCG_ARM_CRB_FFA) must be built-in.

- CONFIG_TCG_CRB is selected by other configs, so making
  it a module is not feasible. Instead, we can
  enable CONFIG_TCG_ARM_CRB_FFA to make tpm_crb_ffa
  built-in.

- This also requires selecting CONFIG_ARM_FFA_TRANSPORT=y
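
The reasoning in the list above hinges on how IS_REACHABLE() evaluates: it is true when the option is built-in, or when the option is a module and the calling code is itself built as a module. A small Python model of that rule, where the function name and the string encoding of Kconfig values are assumptions made for illustration only:

```python
def is_reachable(option_value: str, caller_is_module: bool) -> bool:
    """Model of the kernel's IS_REACHABLE() for one Kconfig option.

    option_value is 'y' (built-in), 'm' (module) or 'n' (disabled).
    caller_is_module says whether the code using the option is a module.
    """
    if option_value == "y":
        return True
    return option_value == "m" and caller_is_module


# tpm_crb is built-in here because CONFIG_TCG_CRB is selected by other configs.
ffa_as_module_from_builtin_crb = is_reachable("m", caller_is_module=False)
ffa_builtin_from_builtin_crb = is_reachable("y", caller_is_module=False)
```

In this model, a built-in tpm_crb with tpm_crb_ffa as a module gets the stub that returns 0, which is why the commit makes CONFIG_TCG_ARM_CRB_FFA built-in.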

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 60809f8)

(cherry picked from commit 60809f8 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>
(cherry picked from commit b054f0b noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

The FFH (Functional Fixed Hardware) operation region is maintained by
ARM in https://developer.arm.com/documentation/den0048/latest/

OperationRegion (RegionName, RegionSpace, Offset, Length)

For ARM FFH, Offset is used to identify the functionality offered by
this FFH address space. It must be set to one of the following values:

- 0x0 to indicate usage of 32-bit calling convention
- 0x1 to indicate usage of 64-bit calling convention.
- All other values are reserved.

For GB10 and similar SoCs, a new specification is being defined for
communicating with the embedded controller. It is currently in draft
stage and is maintained at

https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/README.md
https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md

Offset 4 section:

https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md#operation-region-definition

This specification uses offset 0x4, which is not defined in the
published ARM specification. As a result, an ACPI request with offset 0x4
fails due to missing support. This commit adds support for a
custom offset handler. A new EC interface driver, added in
subsequent patches, will register its callback function.
When an FFH operation region is executed with an offset other
than 0x0 or 0x1, the request is forwarded to the custom handler.
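
The dispatch rule described above can be sketched as follows. This is only an illustration in Python: `register_custom_offset_handler` and `handle_ffh_request` are hypothetical names, and the real kernel interface added by this commit is not shown in the commit message.

```python
from typing import Callable, Optional, Tuple

# Offsets defined by the published ARM FFH specification, per the text above.
STANDARD_OFFSETS = {
    0x0: "32-bit calling convention",
    0x1: "64-bit calling convention",
}

# Hypothetical single registration slot for a custom offset handler.
_custom_handler: Optional[Callable[[int, bytes], bytes]] = None


def register_custom_offset_handler(handler: Callable[[int, bytes], bytes]) -> None:
    """Record the callback that serves offsets outside the standard set."""
    global _custom_handler
    _custom_handler = handler


def handle_ffh_request(offset: int, payload: bytes) -> Tuple[str, object]:
    """Route a request: standard offsets stay local, others go to the callback."""
    if offset in STANDARD_OFFSETS:
        return ("standard", STANDARD_OFFSETS[offset])
    if _custom_handler is None:
        # Mirrors the failure described above when no handler is registered.
        raise LookupError(f"no handler registered for FFH offset {offset:#x}")
    return ("custom", _custom_handler(offset, payload))
```

Until a driver registers a callback, a request with offset 0x4 fails in this sketch, which matches the behaviour the commit message describes before the EC interface driver is added.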

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 89b7d03 noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit df76ec3 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to

https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md
for details on the FFA device used for secure EC
services communication.

The HID 'MSFT000C' is reserved for FFA devices.
This HID is documented in

https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md#hid-definition

This commit adds a platform driver that binds to the FFA device.
In its probe routine, it executes the AVAL method to check
whether FFA can be used for secure EC services communication.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 555e41e noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit bdd6ed0 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to
https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md
for details on the FFA device used for secure
EC services communication.

Each secure EC service is identified by a separate UUID.
When the generic FFA module (ffa_module) loads, it gets the list of
partitions. Each EC service is an FFA partition, and ffa_module creates
a device for each partition. These devices are added to the
arm_ffa bus type and are named arm-ffa-<number>.
To bind to these devices, a driver must be registered on the
arm_ffa bus type. The driver uses 'struct ffa_driver', whose
ID table lists UUIDs. Driver-to-device binding happens on the
basis of UUID.
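
The UUID-based binding described above can be sketched as a lookup of the device UUID in each driver's ID table. This Python model is illustrative only: the class names, the `match_driver` helper and both UUID values are made up, and it does not reproduce the kernel's arm_ffa bus code.

```python
import uuid
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class FfaDevice:
    name: str                 # e.g. "arm-ffa-1", following the naming above
    service_uuid: uuid.UUID   # UUID of the FFA partition behind the device


@dataclass(frozen=True)
class FfaDriver:
    name: str
    id_table: Tuple[uuid.UUID, ...]   # UUIDs this driver is willing to bind to


def match_driver(device: FfaDevice, drivers: Sequence[FfaDriver]) -> Optional[FfaDriver]:
    """Return the first driver whose ID table contains the device UUID."""
    for driver in drivers:
        if device.service_uuid in driver.id_table:
            return driver
    return None


# Made-up UUIDs, for illustration only.
EC_SERVICE = uuid.UUID("11111111-2222-3333-4444-555555555555")
OTHER_SERVICE = uuid.UUID("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")

ec_driver = FfaDriver("hypothetical-ec-driver", (EC_SERVICE,))
bound = match_driver(FfaDevice("arm-ffa-1", EC_SERVICE), [ec_driver])
unbound = match_driver(FfaDevice("arm-ffa-2", OTHER_SERVICE), [ec_driver])
```

A device whose UUID appears in no ID table simply stays unbound, which is why each EC service needs its UUID listed by some driver.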

The secure EC services FFA driver depends on the main FFA
device (which uses ACPI ID MSFT000C) having been created, so
ffa_driver_register()/ffa_driver_unregister() is invoked from
nvidia_ffa_probe()/nvidia_ffa_remove().

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 9613a5c noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit 5ede0e8 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

Please refer to

https://github.com/OpenDevicePartnership/documentation/blob/main/bookshelf/Shelf%204%20Specifications/EC%20Interface/src/secure-ec-services-overview.md

for details on the FFA device used for secure EC
services communication.

When the ACPI interpreter runs code with FFH operation region offset 4,
the data is meant for EC secure services. The FFH buffer holds
data in FFA_REQ_PACKET format: the UUID of the EC service
followed by the service-specific raw data. This commit adds
a custom FFH offset handler. A request with the custom offset
is handled by the nvidia FFA EC driver. Inside the custom
FFH callback, the driver extracts the UUID and gets the ffa_device for it.
It then fills the raw data into ffa_send_direct_data2 and
invokes the sync_send_receive2() routine for that ffa_device.
Once the response arrives, the driver fills the data in
FFA_RESP_PACKET format, and the ACPI interpreter passes that data to
the upper layer.

NOTE: In the above document, FFA_REQ_PACKET and FFA_RESP_PACKET
use different formats. In the latest firmware code, however, the ACPI
implementation uses the same format for both request and response
(the FFA_REQ_PACKET format). The status bit is updated
in the response (0 for success and 1 for failure).

The mixed-endian UUID encoding is documented in
https://cdrdv2-public.intel.com/772722/asl-tutorial-v20190625.pdf

  In addition to Concatenate, there are several useful macros that generate
  buffers from strings. For example, the ToUUID macro takes a string of the
  form aabbccdd-eeff-gghh-iijj-kkllmmnnoopp where aa through pp represent
  one byte values encoded with hexadecimal characters. This string gets
  converted to a 16-byte buffer that looks like the following:
  Buffer()
  {
  dd, cc, bb, aa,
  ff, ee,
  hh, gg,
  ii, jj, kk, ll, mm, nn, oo, pp
  }

  This mixture of little endian and big-endian encoding UUID is called
  a mixed-endian format. The use of strings and the ToUUID macro is a
  convenient way to avoid having to manually encode the mixed-endian
  format. There are many other macros that provide similar
  conveniences, such as EISAID. In kernel, it is represented with guid_t.

Inside nvidia_ffh_handler(), we need to convert the 16-byte
buffer from AML UUID format to FFA UUID format.
nvidia_get_uuid_from_aml_buf() performs this conversion.
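
The byte swapping involved can be illustrated with Python's standard `uuid` module, whose `bytes_le` form uses the same mixed-endian layout as the ToUUID buffer quoted above (first three fields little-endian, the rest in order). This is a sketch of the layout only, not the driver's code; the helper names are made up, and it assumes the FFA-side UUID is the plain in-order byte sequence.

```python
import uuid


def aml_buffer_to_uuid(buf: bytes) -> uuid.UUID:
    """Interpret a 16-byte mixed-endian AML buffer as a UUID."""
    return uuid.UUID(bytes_le=buf)


def uuid_to_aml_buffer(value: uuid.UUID) -> bytes:
    """Produce the mixed-endian buffer that ToUUID would generate."""
    return value.bytes_le


# aabbccdd-eeff-gghh-iijj-kkllmmnnoopp with concrete hex digits.
example = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
aml_buf = uuid_to_aml_buffer(example)   # 33 22 11 00 55 44 77 66 88 99 ... ff
plain_buf = example.bytes               # 00 11 22 33 44 55 66 77 88 99 ... ff
```

Only the first eight bytes differ between the two buffers; the trailing eight bytes are identical in both layouts.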

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 40ca7bc noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit 613505b noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

- During boot time, ACPI probe happens first. It calls _STA method for
  each added device.

- Inside _STA method for device managed by EC, it uses FFH offset 4.

- The request will fail since there is no custom handler registered
  for offset 0x4 and device will be disabled.

- If rescan happens on acpi bus, then device _STA method will be
  called again.

This commit adds support for looking up the ACPI ID from the UUID and
invokes acpi_bus_scan().

NOTE: nvidia_get_acpi_id_from_uuid() returns an ACPI ID for only
a few services. The current code does not have a corresponding driver
for every service. Only a few services have a node that uses
a generic ACPI ID and an available driver.
For the remaining services, either the driver is not yet available
or the published spec has not been updated with full ACPI sample code.
Once drivers become available for those services, their
ACPI IDs can be added to this list.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>

(cherry picked from commit 971a25e noble:linux-nvidia-6.14)
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit e4ec414 noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2114230

The commit 897e9e6 ("firmware: arm_ffa: Initial support for scheduler
receiver interrupt") adds support for SGI interrupts in the FFA driver.
However, the validation for SGIs in the GICv3 is too strict, causing the
driver probe to fail.

This patch relaxes the SGI validation check, allowing callers to use SGIs
if the requested SGI number is greater than or equal to MAX_IPI, which
fixes the FFA driver probe failure.

This issue is observed on NVIDIA server platform with FFA-v1.1.

 PTP clock support registered
 EDAC MC: Ver: 3.0.0
 ARM FF-A: Driver version 1.1
 ARM FF-A: Firmware version 1.1 found
 GICv3: [Firmware Bug]: Illegal GSI8 translation request
 ARM FF-A: Failed to create IRQ mapping!
 ARM FF-A: Notification setup failed -61, not enabled
 ARM FF-A: Failed to register driver sched callback -95
 scmi_core: SCMI protocol bus registered

This patch was sent to the arm mailing list for upstream inclusion but
was rejected.

https://patchwork.kernel.org/project/linux-arm-kernel/patch/20240813033925.925947-1-sdonthineni@nvidia.com/

The proper fix requires a mechanism by which a module can request
an SGI, but that needs discussion with Arm and
will take time. This patch breaks only if the value of MAX_IPI
changes, so it adds a BUILD_BUG_ON() to catch that situation.
Once a proper solution is agreed on, this patch will be reverted.
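
The relaxed check and its guard can be sketched as below. This is a Python model under stated assumptions, not the GICv3 code: the value 8 used for MAX_IPI and EXPECTED_MAX_IPI is hypothetical, and the module-level assert merely stands in for the BUILD_BUG_ON() mentioned above. The only architectural fact relied on is that GIC SGIs occupy interrupt IDs 0 to 15.

```python
NUM_SGIS = 16          # GIC SGIs use interrupt IDs 0..15
MAX_IPI = 8            # hypothetical value, for illustration only
EXPECTED_MAX_IPI = 8   # hypothetical: the value the relaxed check was written for

# Stand-in for BUILD_BUG_ON(): refuse to proceed if MAX_IPI has changed.
assert MAX_IPI == EXPECTED_MAX_IPI, "MAX_IPI changed; revisit the SGI check"


def sgi_request_allowed(sgi: int) -> bool:
    """Allow an SGI only if it lies at or above MAX_IPI and is a valid SGI."""
    return MAX_IPI <= sgi < NUM_SGIS
```

SGIs below MAX_IPI remain off limits in this model, so the numbers the kernel already uses for its own IPIs are never handed to a caller.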

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit fd136cf)
[maskedarray: removed enum ipi_msg_type definition as it appears in
upstream commit "irqchip/gic-v5: Add GICv5 LPI/IPI support"]
Signed-off-by: Abdur Rahman <abdur.rahman@canonical.com>

(cherry picked from commit df84d5d noble:linux-nvidia-6.17)
Signed-off-by: Jacob Martin <jacob.martin@canonical.com>
James Morse and others added 15 commits May 18, 2026 15:37
CMN-650 is afflicted with an erratum where the CSU NRDY bit never clears.
This tells us the monitor never finishes scanning the cache. The erratum
document says to wait the maximum time, then ignore the field.

Add a flag to indicate whether this is the final attempt to read the
counter, and when this quirk is applied, ignore the NRDY field.

This means accesses to this counter will always retry, even if the counter
was previously programmed to the same values.

The counter value is not expected to be stable, it drifts up and down with
each allocation and eviction. The CSU register provides the value for a
point in time.
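
The quirk handling described above can be sketched as a retry loop with a final-attempt flag. This is a simplified Python model, not the MPAM driver code: `CsuReading`, `read_csu` and the attempt-count parameter are made up for illustration, and the real driver waits for a maximum time rather than counting attempts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CsuReading:
    value: int    # cache storage usage reported by the monitor
    nrdy: bool    # True while the monitor reports "not ready"


def read_csu(read_register: Callable[[], CsuReading],
             max_attempts: int,
             nrdy_never_clears: bool) -> Optional[int]:
    """Read the counter, ignoring NRDY on the final attempt when quirked."""
    for attempt in range(1, max_attempts + 1):
        reading = read_register()
        final_attempt = attempt == max_attempts
        if not reading.nrdy:
            return reading.value
        if final_attempt and nrdy_never_clears:
            # Erratum workaround: NRDY never clears, so accept the value.
            return reading.value
    return None   # monitor never became ready and no quirk applies
```

With the quirk set, every read runs through all attempts before returning, which corresponds to the note above that accesses to this counter always retry.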

Tested-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Co-developed-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit aeb8595)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
MPAM (Memory Partitioning and Monitoring) is now exposed to user-space via
resctrl. Add some documentation so the user knows what features to expect.

Reviewed-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Jesse Chick <jessechick@os.amperecomputing.com>
Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 4ce0a2c)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
The last_cmd_status file is intended to report details about the most recent
resctrl filesystem operation, specifically to aid in diagnosing failures.

However, when parsing io_alloc_cbm, if a user provides a domain ID that does
not exist in the resource, the operation fails with -EINVAL without updating
last_cmd_status. This results in inconsistent behaviour where the system call
returns an error, but last_cmd_status misleadingly reports "ok", leaving the
user unaware that the failure was caused by an invalid domain ID.

Write an error message to last_cmd_status when the target domain ID cannot
be found.
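
The fix pattern can be sketched in user-space C. This is only an illustration under stated assumptions: the `last_cmd_status` buffer, `struct cache_domain`, `set_domain_cbm()`, and the message text are hypothetical stand-ins, not the kernel's actual symbols or wording.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the buffer behind the last_cmd_status file. */
static char last_cmd_status[128] = "ok\n";

struct cache_domain {
	int id;
	unsigned long cbm;
};

/*
 * Set the bitmask of one domain. On an unknown ID, record a message
 * before returning -EINVAL so the status buffer explains the failure.
 */
static int set_domain_cbm(struct cache_domain *doms, size_t n,
			  int dom_id, unsigned long cbm)
{
	size_t i;

	/* Each operation starts from a clean status. */
	snprintf(last_cmd_status, sizeof(last_cmd_status), "ok\n");

	for (i = 0; i < n; i++) {
		if (doms[i].id == dom_id) {
			doms[i].cbm = cbm;
			return 0;
		}
	}

	snprintf(last_cmd_status, sizeof(last_cmd_status),
		 "Invalid domain ID %d\n", dom_id);
	return -EINVAL;
}
```

The point of the sketch is that the error return and the status text are updated together, so the two can no longer disagree.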

Fixes: 28fa2cc ("fs/resctrl: Introduce interface to modify io_alloc capacity bitmasks")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://patch.msgid.link/20260325001159.447075-2-atomlin@atomlin.com
(cherry picked from commit d06b8e7)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Configuring the io_alloc_cbm interface requires an explicit domain ID for each
cache domain. On systems with high core counts and numerous cache clusters,
this requirement becomes cumbersome for automation and management tasks that
aim to apply a uniform policy.

Introduce a wildcard domain ID selector "*" for the io_alloc_cbm interface.
This enables users to set the same Capacity Bitmask (CBM) across all cache
domains in a single operation.
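
A rough sketch of how a "*" selector might be handled when parsing a single `id=mask` token is shown below. The names `struct cbm_domain` and `apply_cbm_token()`, the hexadecimal mask format, and the error handling are assumptions for illustration; the real parser in fs/resctrl may differ.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct cbm_domain {
	int id;
	unsigned long cbm;
};

/*
 * Apply one "id=mask" token. The mask is parsed as hexadecimal.
 * An id of "*" applies the mask to every domain; a numeric id
 * applies it to the matching domain only.
 */
static int apply_cbm_token(struct cbm_domain *doms, size_t n, const char *tok)
{
	const char *eq = strchr(tok, '=');
	unsigned long cbm;
	char *end;
	long id;
	size_t i;

	/* Require a non-empty id and a non-empty mask. */
	if (!eq || eq == tok || eq[1] == '\0')
		return -EINVAL;

	cbm = strtoul(eq + 1, &end, 16);
	if (*end != '\0')
		return -EINVAL;

	/* Wildcard: same mask for all domains. */
	if (eq - tok == 1 && tok[0] == '*') {
		for (i = 0; i < n; i++)
			doms[i].cbm = cbm;
		return 0;
	}

	id = strtol(tok, &end, 10);
	if (end != eq)
		return -EINVAL;

	for (i = 0; i < n; i++) {
		if (doms[i].id == id) {
			doms[i].cbm = cbm;
			return 0;
		}
	}

	return -EINVAL;
}
```

With this shape, a single `*=ff` token replaces one token per domain, which is the automation benefit the commit message describes.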

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://patch.msgid.link/20260325001159.447075-3-atomlin@atomlin.com
(cherry picked from commit d2bf45d)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
The x86 maintainers handle the resctrl filesystem and x86 architectural
resctrl code. Even so, the x86 maintainers are not part of the resctrl
section and not returned when scripts/get_maintainer.pl is run on resctrl
filesystem code. With patches flowing via x86 maintainers resctrl should
also ensure it follows the tip rules.

Add the x86 maintainer alias, x86@kernel.org, to the resctrl section to
ensure x86 maintainers are included in associated resctrl submissions.

Add a reference to the tip tree handbook to make it clear which rules
resctrl follows.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/4c14dd82e81737c6413e10fe097475b1cc0886fc.1775576382.git.reinette.chatre@intel.com
(cherry picked from commit c611752)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Using the stricter "./tools/docs/kernel-doc -Wall -v" to verify proper
formatting of documentation comments includes warnings related to return
markup on functions that are omitted during the default verification
checks. This stricter verification reports a couple of missing return
descriptions in resctrl:

    Warning: .../fs/resctrl/rdtgroup.c:1536 No description found for return value of 'rdtgroup_cbm_to_size'
    Warning: .../fs/resctrl/rdtgroup.c:3131 No description found for return value of 'mon_get_kn_priv'
    Warning: .../fs/resctrl/rdtgroup.c:3523 No description found for return value of 'cbm_ensure_valid'
    Warning: .../fs/resctrl/monitor.c:238 No description found for return value of 'resctrl_find_cleanest_closid'

Add the missing return descriptions.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/1c50b9f7c73251c007133590986f127e1af57780.1775576382.git.reinette.chatre@intel.com
(cherry picked from commit 7972701)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
The code to set MBA's alloc_capable to true appears to be trying to
restore alloc_capable on unmount. This can never work because
resctrl_arch_set_cdp_enabled() is never invoked with RDT_RESOURCE_MBA
as the rid parameter. Consequently,
mpam_resctrl_controls[RDT_RESOURCE_MBA].cdp_enabled always remains false.

The alloc_capable setting in resctrl_arch_set_cdp_enabled() is to
re-enable MBA if the caller opts in to separate control values using
CDP for this resource. This doesn't happen today.

Add a comment to describe this.

However, a bug remains where MBA allocation is permanently disabled after
mounting with the CDP option. Remounting without CDP cannot restore the MBA
partition capability.

Add a check to re-enable MBA when CDP is disabled, which happens on
unmount.

Fixes: 6789fb9 ("arm_mpam: resctrl: Add CDP emulation")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
[ morse: Added comment for existing code, added hunk to fix this bug from
  Ben H ]
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit f758340)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Dan Carpenter reports that, in mpam_resctrl_alloc_domain(), any_mon_comp is
used in an 'if' condition when it may be uninitialized. Initialize it to
NULL so that the check behaves correctly when no monitor components are
found.

Reported-by: Dan Carpenter <error27@gmail.com>
Fixes: 264c285 ("arm_mpam: resctrl: Add monitor initialisation and domain boilerplate")
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 67c0a48)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
resctrl_mon_ctx_waiters is not used outside of this file, so make it
static. This fixes the sparse warning:

drivers/resctrl/mpam_resctrl.c:25:1: warning: symbol 'resctrl_mon_ctx_waiters' was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603281842.c2K96tJA-lkp@intel.com/
Fixes: 2a3c79c ("arm_mpam: resctrl: Allow resctrl to allocate monitors")
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 4d5bbba)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Enable resctrl by setting CONFIG_RESCTRL_FS=y

Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…bm counters

resctrl has two types of counters, NUMA-local and global. MPAM has only
bandwidth counters, but the position of the MSC may mean it counts
NUMA-local, or global traffic.
But the topology information is not available.
Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
probably NUMA-local. If the memory controller supports bandwidth
monitors, they are probably global.
This also allows us to assert that we don't have the same class
backing two different resctrl events.
Because the class or component backing the event may not be 'the L3',
it is necessary for mpam_resctrl_get_domain_from_cpu() to search
the monitor domains too. This matters the most for 'monitor only'
systems, where 'the L3' control domains may be empty, and the
ctrl_comp pointer NULL.
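
The heuristic described above could look roughly like this. The types `struct msc_class`, `enum mbm_kind`, and the function `guess_mbm_kind()` are hypothetical names chosen for the sketch and do not come from the driver.

```c
#include <stdbool.h>

enum msc_class_type { MSC_CLASS_CACHE, MSC_CLASS_MEMORY };
enum mbm_kind { MBM_KIND_NONE, MBM_KIND_LOCAL, MBM_KIND_TOTAL };

struct msc_class {
	enum msc_class_type type;
	int level;		/* cache level; unused for memory */
	bool has_bw_monitors;
};

/*
 * Guess which resctrl bandwidth event a class should back:
 * L2/L3 caches are assumed to see NUMA-local traffic, memory
 * controllers are assumed to see global (total) traffic.
 */
static enum mbm_kind guess_mbm_kind(const struct msc_class *c)
{
	if (!c->has_bw_monitors)
		return MBM_KIND_NONE;

	if (c->type == MSC_CLASS_MEMORY)
		return MBM_KIND_TOTAL;

	if (c->level == 2 || c->level == 3)
		return MBM_KIND_LOCAL;

	return MBM_KIND_NONE;
}
```

Since each class maps to at most one event in this sketch, the same class cannot back two different resctrl events, which is the property the commit message wants to assert.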

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 40e0b07 https://github.com/NVIDIA/NV-Kernels 24.04_linux-nvidia-6.17-next)
[fenghuay:
  - mon_comp[] is defined in upstream. Remove its definition in this patch.
  - Resolve minor conflicts in `drivers/resctrl/mpam_resctrl.c`;
]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
When there are enough monitors, the resctrl mbm local and total
files can be exposed. These need all the monitors that resctrl
may use to be allocated up front.
Add helpers to do this.
If a different candidate class is discovered, the old array
should be freed and the allocated monitors returned to the
driver.

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 355bc5f https://github.com/NVIDIA/NV-Kernels 24.04_linux-nvidia-6.17-next)
[fenghuay:
  - Resolve minor conflicts in `drivers/resctrl/mpam_resctrl.c`;
]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
When there are not enough monitors, MPAM is able to emulate ABMC by making
a smaller number of monitors assignable. These monitors still need to be
allocated from the driver, and mapped to whichever control/monitor group
resctrl wants to use them with.
Add a second array to hold the monitor values indexed by resctrl's
cntr_id.
When CDP is in use, two monitors are needed so the available number of
counters halves. Platforms with one monitor will have zero monitors
when CDP is in use.
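
The halving rule is simple integer arithmetic. A minimal sketch, using the hypothetical helper name `assignable_counters()`:

```c
#include <stdbool.h>

/*
 * Number of counters resctrl can assign. With CDP, each assignable
 * counter consumes two hardware monitors, so integer division by two
 * rounds an odd monitor count down.
 */
static unsigned int assignable_counters(unsigned int num_monitors,
					bool cdp_enabled)
{
	return cdp_enabled ? num_monitors / 2 : num_monitors;
}
```

For a platform with a single monitor, the division yields zero assignable counters once CDP is enabled, as the commit message states.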

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit d8a0ad3 https://github.com/NVIDIA/NV-Kernels 24.04_linux-nvidia-6.17-next)
[fenghuay:
  - Resolve minor conflicts in `drivers/resctrl/mpam_resctrl.c`;
]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
The current MPAM driver only considers the first component associated
with an online/offline CPU during domain creation and teardown. This
is insufficient, as CPU-initiated traffic may traverse multiple MSCs
before reaching the target, and each MSC must be programmed consistently
for proper resource partitioning.

Update the MPAM driver to include all components associated with a
given CPU during domain setup/teardown to expose expected schemata
to userspace for effective resource control.
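
The difference between "first component" and "all components" can be illustrated with a small user-space sketch. Here affinity is modelled as a 64-bit mask instead of a kernel cpumask, and `struct component` and `for_each_component_of_cpu()` are hypothetical names, not the driver's.

```c
#include <stddef.h>
#include <stdint.h>

struct component {
	int id;
	uint64_t affinity;	/* bit n set: CPU n is associated */
};

typedef void (*component_fn)(const struct component *comp, void *arg);

/*
 * Visit every component whose affinity contains the CPU, instead of
 * stopping at the first match. Returns the number of components
 * visited. fn may be NULL to only count.
 */
static size_t for_each_component_of_cpu(const struct component *comps,
					size_t n, unsigned int cpu,
					component_fn fn, void *arg)
{
	size_t i, visited = 0;

	if (cpu >= 64)
		return 0;

	for (i = 0; i < n; i++) {
		if (!(comps[i].affinity & (UINT64_C(1) << cpu)))
			continue;
		if (fn)
			fn(&comps[i], arg);
		visited++;
	}

	return visited;
}
```

A CPU online or offline handler built this way would set up or tear down a domain for each visited component, not just one.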

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(forward ported from commit ac1e5be https://github.com/NVIDIA/NV-Kernels 24.04_linux-nvidia-6.17-next)
[fenghuay:
  - Leaves drivers/resctrl/mpam_internal.h untouched; mpam_resctrl_offline_cpu()
    is already void in the baseline used here.
  - Tightens callers (mpam_resctrl_pick_mba, mpam_resctrl_pick_counters) around
    traffic_matches_l3() together with topology_matches_l3() and
    cpumask_equal(&class->affinity, cpu_possible_mask) and does not add a
    traffic_matches_l3() function body here, which is already defined in
    upstream.
  - Omits any edit to exposed_alloc_capable or exposed_mon_capable; those
    symbols are already absent from the baseline in favor of
    resctrl_arch_alloc_capable() / resctrl_arch_mon_capable().
  - Adds for_each_mpam_resctrl_control() only; does not add MPAM_MAX_EVENT or a
    new for_each_mpam_resctrl_mon() / mpam_resctrl_counters[] sizing hunk
    because that monitor macro and array shape are already in the baseline.
  - Omits INIT_LIST_HEAD_RCU() on res->resctrl_res.ctrl_domains and
    mon_domains, omits moving mpam_resctrl_domain_insert() after
    resctrl_online_*(), and omits adding static void
    mpam_resctrl_online_domain_hdr(); that list setup and insert ordering are
    already in the baseline.
  - Does not replay a void→int conversion for mpam_resctrl_monitor_init() or a
    mpam_pmg_max + 1 num_rmid path; the baseline already has int-returning
    mpam_resctrl_monitor_init() and resctrl_arch_system_num_rmid_idx() for
    num_rmid, so only surrounding line context shifts in this file.
  - Adds for_each_mpam_resctrl_control(), mpam_resctrl_mon_from_res() /
    mpam_resctrl_res_from_mon(), mpam_resctrl_monitor_sync_abmc_vals(struct
    rdt_resource *r), extends mpam_resctrl_alloc_domain() /
    mpam_resctrl_get_domain_from_cpu() / mpam_resctrl_get_mon_domain_from_cpu()
    with struct mpam_component *comp, hardens topology_matches_l3() with
    matched_once, switches resctrl_arch_mbm_cntr_assign_enabled() to use
    mon->assigned_counters, and extends mpam_resctrl_pick_domain_id() so
    memory level > 3 uses component IDs like cache-backed classes]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Add local bytes counter in mpam_resctrl_counters[] to fix missing
mbm_local_bytes monitoring on Grace.

Add mon->assigned_counters check to enable mbm_L3_assignments config
file on Grace.

Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
@nirmoy
Collaborator

nirmoy commented May 18, 2026

Boro watcher review skipped

The GitHub watcher skips automatic boro reviews for PRs with more than 50 commits. This PR currently has 53 commits.

To run the review anyway, ask BaseOS_Kernel_Bot in #baseos-kernel:

review https://github.com/NVIDIA/NV-Kernels/pull/428

Head: f3404d4f7d2d

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher sees a newer PR head.

@github-actions
Contributor

github-actions Bot commented May 18, 2026

PR Validation Report

Patchscan ✅ No Missing Fixes

All cherry-picked commits checked — no missing upstream fixes found.

PR Lint ❌ Errors found

Details
Checking 53 commits...

Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ f3404d4f7d2d │ [SAUCE] fix mbm_l3_assign and mon_local_bytes                    │ N/A        │ N/A     │ fenghuay                  │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ cac60bd6aeb8 │ [SAUCE] arm_mpam: include all associated                         │ N/A        │ N/A     │ sdonthin, fenghuay        │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 951532871c83 │ [SAUCE] arm_mpam: resctrl: pre-allocate assignable monitors      │ N/A        │ N/A     │ morse, fenghuay           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 02743bd38a8e │ [SAUCE] arm_mpam: resctrl: pre-allocate free running monitors    │ N/A        │ N/A     │ morse, fenghuay           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 43f25ff87ed9 │ [SAUCE] untested: arm_mpam: resctrl: pick classes for use as mbm │ N/A        │ N/A     │ morse, fenghuay           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 0a9fd91d8ed3 │ [SAUCE] update annotations to set config_resctrl_fs              │ N/A        │ N/A     │ fenghuay                  │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ ee03cad2087e │ 4d5bbbafc170 arm_mpam: resctrl: Make resctrl_mon_ctx_waiters sta │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ ea7565865b08 │ 67c0a487efa5 arm_mpam: resctrl: Fix the check for no monitor com │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 1a4da7a6a213 │ f758340da529 arm_mpam: resctrl: Fix MBA CDP alloc_capable handli │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 8586341f2886 │ 79727019ce3d fs/resctrl: Add missing return value descriptions   │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ dfc3a32b70c2 │ c611752be9d7 MAINTAINERS: Update resctrl entry                   │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ e3107fc2f4b1 │ d2bf45d067c7 fs/resctrl: Add "*" shorthand to set io_alloc CBM f │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c77869ef1ace │ d06b8e7c97c3 fs/resctrl: Report invalid domain ID when parsing i │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d9731fd459db │ 4ce0a2ccc035 arm64: mpam: Add initial MPAM documentation         │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ fa8dcaf5300d │ aeb8595a5f8b arm_mpam: Quirk CMN-650's CSU NRDY behaviour        │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c692e14074bd │ dc48eb1ff27c arm_mpam: Add workaround for T241-MPAM-6            │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ ef3bf0252c61 │ a7efe23ed6dd arm_mpam: Add workaround for T241-MPAM-4            │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2963b4e06ea3 │ 70e81fbedc65 arm_mpam: Add workaround for T241-MPAM-1            │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3516cd90c2d2 │ fa7745218c98 arm_mpam: Add quirk framework                       │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d506f7ce11a5 │ fb481ec08699 arm_mpam: resctrl: Call resctrl_init() on platforms │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ dea0ef567586 │ 4aab135bda16 arm64: mpam: Select ARCH_HAS_CPU_RESCTRL            │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3fe4d6d95182 │ ec9a788620be ALSA: usb-audio: Replace hard-coded number with MAX │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 95686f58d927 │ efc775eadce2 arm_mpam: resctrl: Add empty definitions for assort │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 898cae34d9ee │ 49b04e401825 arm_mpam: resctrl: Update the rmid reallocation lim │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 22725c667010 │ fb56b29932ca arm_mpam: resctrl: Add resctrl_arch_rmid_read()     │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 1c60c45a1950 │ 2a3c79c61539 arm_mpam: resctrl: Allow resctrl to allocate monito │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ b3f31a50f9d4 │ 1458c4f05335 arm_mpam: resctrl: Add support for csu counters     │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 7ce8a9120691 │ 264c285999fc arm_mpam: resctrl: Add monitor initialisation and d │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 24bd6422730c │ 5dc8f73eaa5d arm_mpam: resctrl: Add kunit test for control forma │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 732f3cbd8e9f │ 36528c7681b8 arm_mpam: resctrl: Add support for 'MB' resource    │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c0cb2971508f │ 1c1e2968a860 arm_mpam: resctrl: Wait for cacheinfo to be ready   │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ f5de905f9179 │ 3e9b35823aab arm_mpam: resctrl: Add rmid index helpers           │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 0c8e5e60b464 │ 80d147d29313 arm_mpam: resctrl: Convert to/from MPAMs fixed-poin │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c68f15c930a4 │ 01a0021f6c39 arm_mpam: resctrl: Hide CDP emulation behind CONFIG │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ e5b30002a536 │ 6789fb99282c arm_mpam: resctrl: Add CDP emulation                │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 29bb2ae85b36 │ 9d2e1a99fae5 arm_mpam: resctrl: Add plumbing against arm64 task  │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 116568246674 │ 9cd2b522be2c arm_mpam: resctrl: Implement helpers to update conf │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 89e82f34b3c8 │ 02cc66168788 arm_mpam: resctrl: Add resctrl_arch_get_config()    │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3af8f6b31127 │ 370d166d878d arm_mpam: resctrl: Implement resctrl_arch_reset_all │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c0af06defaae │ 52a4edb16121 arm_mpam: resctrl: Pick the caches we will use as r │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 64e940b1fc00 │ 09e61daf8e96 arm_mpam: resctrl: Add boilerplate cpuhp and domain │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ a4993fc2738e │ 2cf9ca3fae38 arm64: mpam: Add helpers to change a task or cpu's  │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 0bbe5af8e6ab │ 37fe0f984d9c arm64: mpam: Initialise and context switch the MPAM │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 09d3ea169379 │ 735dad999905 arm64: mpam: Add cpu_pm notifier to restore MPAM sy │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 899983c0231d │ 831a7f16728c arm64: mpam: Advertise the CPUs MPAM limits to the  │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 33f441c95743 │ c544f00a4732 arm64: mpam: Drop the CONFIG_EXPERT restriction     │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3eb37c37e0e2 │ 87b78a5d70e8 arm64: mpam: Re-initialise MPAM regs when CPU comes │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3aefb8b9aca3 │ 8e06d04ff1cf arm64: mpam: Context switch the MPAM registers      │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ dfb9d14f00fd │ 2e7c684bdb50 KVM: arm64: Make MPAMSM_EL1 accesses UNDEF          │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 247fd70832bd │ eda1cd1f9d29 KVM: arm64: Preserve host MPAM configuration when c │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 345fa075f58f │ 29fa1be82b83 arm64/sysreg: Add MPAMSM_EL1 register               │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 4c5a8b034132 │ a1cb6577f575 arm_mpam: Reset when feature configuration bit unse │ match      │ match   │ preserved + fenghuay adde │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d4397951a9b5 │ f91e913355f4 arm_mpam: Ensure in_reset_state is false after appl │ match      │ match   │ preserved + fenghuay adde │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint results:
W: d506f7ce11a5 ("arm_mpam: resctrl: Call resctrl_init() on platform"): subject 76 chars (>72)
W: a4993fc2738e ("arm64: mpam: Add helpers to change a task or cpu's"): subject 73 chars (>72)

PR metadata:
W: PR title missing [<branch>] prefix: "Please pull 26.04 linux nvidia.glue"
E: PR targets 26.04_linux-nvidia but body has no https://bugs.launchpad.net/... link

@fyu1 fyu1 changed the title 26.04 linux nvidia.glue Please pull 26.04 linux nvidia.glue May 18, 2026
@nvmochs
Collaborator

nvmochs commented May 18, 2026

@fyu1

cac60bd NVIDIA: SAUCE: arm_mpam: Include all associated
9515328 NVIDIA: SAUCE: arm_mpam: resctrl: Pre-allocate assignable monitors
02743bd NVIDIA: SAUCE: arm_mpam: resctrl: Pre-allocate free running monitors
43f25ff NVIDIA: SAUCE: untested: arm_mpam: resctrl: pick classes for use as mbm counters

What function are these patches providing? Is it a Grace feature?

What are the upstream plans for these patches? (It looks like they were part of the MPAM Part 2 series at one point but were dropped?)


I verified the 47 patches from upstream were clean picks. No issues with those or with the annotations patch.


cac60bd NVIDIA: SAUCE: arm_mpam: Include all associated

Codex found 2 issues with this patch...

cac60bd drops a source hunk from ac1e5be in drivers/resctrl/mpam_devices.c.

The source patch changes mpam_ris_get_affinity() so memory-class components with empty affinity, or memory classes above level 3, are associated with cpu_possible_mask. The target commit does not carry that over. Current code still just does:

drivers/resctrl/mpam_devices.c:505

	case MPAM_CLASS_MEMORY:
		get_cpumask_from_node_id(comp->comp_id, affinity);
		/* affinity may be empty for CPU-less memory nodes */
		break;

The source has:

		if (cpumask_empty(affinity)) {
			dev_warn_once(..., "CPU-less numa node");
			cpumask_copy(affinity, cpu_possible_mask);
		} else if (class->level > 3)
			cpumask_copy(affinity, cpu_possible_mask);

That matters because cac60bd changes CPU online/offline handling to iterate all components whose affinity contains the CPU. Without the affinity hunk, CPU-less memory nodes stay empty, and level >3 memory components stay tied to their NUMA node mask instead of all CPUs. That undermines the “include all associated” behavior for those components.

The [fenghuay:] note should be improved. It currently does not mention omitting the mpam_ris_get_affinity() hunk. I think this is not just an annotation problem; the hunk should likely be added unless there is a deliberate branch-specific reason to omit it.


cac60bd duplicates for_each_mpam_resctrl_control(). The exact same macro is defined twice in drivers/resctrl/mpam_resctrl.c. That is a cleanup/build-hygiene issue, and the commit note should not say “adds” that macro if it was already present.


break;
case MPAM_CLASS_MEMORY:
get_cpumask_from_node_id(comp->comp_id, affinity);
Collaborator

In pr#419, this portion of the code had:

		if (cpumask_empty(affinity)) {
			dev_warn_once(&msc->pdev->dev, "CPU-less numa node");
			cpumask_copy(affinity, cpu_possible_mask);
		} else if (class->level > 3)
			cpumask_copy(affinity, cpu_possible_mask);

Do you still need to keep the "else" case here?

@nirmoy
Collaborator

nirmoy commented May 19, 2026

Boro review

Latest watcher review: open review

Head: f3404d4f7d2d

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.

@nirmoy nirmoy added help wanted Extra attention is needed question Further information is requested labels May 21, 2026
@nvidia-bfigg nvidia-bfigg force-pushed the 26.04_linux-nvidia branch 2 times, most recently from bbda548 to 837b23f Compare May 22, 2026 19:36