NVIDIA Open GPU Kernel Modules Version
595.71.05 (Open Kernel Modules, Release Build, built 2026-04-24, builder dvs-builder@U22-I3-G08-03-1)
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Fedora Linux 44 (Workstation Edition)
Kernel Release
6.19.14-300.fc44.x86_64 - by Fedora
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
NVIDIA GeForce RTX 5070 (GB205, PCI ID 10de:2f04 rev a1, ASUSTeK subsystem 1043:89e6).
Describe the bug
After long uptime of normal Wayland desktop use with many GPU-accelerated clients, the driver starts failing to allocate VA space for DMA mappings (dmaAllocMapping_GM107, with NV_ERR_NO_MEMORY returned from mapping_reuse.c:273). Within the same millisecond, nvidia-drm requests a mapping whose range crosses the boundary between PCI BAR1 and BAR3, and the kernel's "resource sanity check" warns that it spans more than one BAR. About six seconds later, the nvidia-drm atomic-modeset path fails to initialize a plane fence semaphore, and the GPU's RC watchdog declares the GPU "probably locked" and keeps firing indefinitely. The display session does not recover: restarting gdm is the lightest path back, and a full reboot is sometimes required. At the moment of the first failure only ~1.7 GiB of the 12 GiB of VRAM was in use, so this is not VRAM exhaustion in bytes; it appears to be exhaustion or fragmentation of the BAR1 mapping window.
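A toy model shows how a mapping window can fail long before it runs out of bytes. The sketch below is a plain first-fit allocator over a 256 MiB window; it is not how the driver actually manages BAR1, and the 4 MiB mapping size and the churn pattern are made up for illustration. Half the window ends up free, yet a 12 MiB request (about the size of the range the kernel flagged) finds no hole.

```python
MiB = 1 << 20
WINDOW = 256 * MiB  # BAR1 size with Resizable BAR disabled

def first_fit(used, size, window=WINDOW):
    """Lowest offset where `size` bytes fit between the half-open
    (start, end) intervals in `used`, or None if no hole is big enough."""
    cursor = 0
    for start, end in sorted(used):
        if start - cursor >= size:
            return cursor
        cursor = max(cursor, end)
    return cursor if window - cursor >= size else None

def free_bytes(used, window=WINDOW):
    """Total unmapped bytes, assuming the intervals do not overlap."""
    return window - sum(end - start for start, end in used)

# Hypothetical churn: fill the window with 4 MiB mappings, then release
# every other one. Half the window is free, but only as 4 MiB holes.
used = [(off, off + 4 * MiB) for off in range(0, WINDOW, 4 * MiB)][1::2]

request = 12 * MiB  # roughly the size of the range flagged in the log
print(f"{free_bytes(used) // MiB} MiB free; "
      f"{request // MiB} MiB request fits at: {first_fit(used, request)}")
```

This prints "128 MiB free; 12 MiB request fits at: None": the byte count alone says there is plenty of room, while the largest contiguous hole is only 4 MiB.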
Relevant hardware state
PCI BAR layout for the GPU (lspci -vv -s 08:00.0):
Region 0: Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M] ← BAR1
Region 3: Memory at e0000000 (64-bit, prefetchable) [size=32M] ← BAR3
Capabilities: [134 v1] Physical Resizable BAR
Capabilities: [140 v1] Virtual Resizable BAR
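The inclusive BAR end addresses used in the log analysis below are start + size - 1 for each region. A minimal sketch that derives them from the Region lines above, assuming the "Memory at <hex> (...) [size=N<K|M|G>]" form shown here; other lspci line variants (for example unassigned regions) simply do not match and are skipped.

```python
import re

SIZE_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) .*\[size=(\d+)([KMG])\]")

def parse_regions(text):
    """Map region index -> (start, inclusive end) from `lspci -vv` text."""
    regions = {}
    for m in REGION_RE.finditer(text):
        index = int(m.group(1))
        start = int(m.group(2), 16)
        size = int(m.group(3)) * SIZE_UNITS[m.group(4)]
        regions[index] = (start, start + size - 1)
    return regions

LSPCI = """\
Region 0: Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at e0000000 (64-bit, prefetchable) [size=32M]
"""

for index, (start, end) in sorted(parse_regions(LSPCI).items()):
    print(f"BAR{index}: {start:#010x}-{end:#010x}")
```

For this GPU it yields BAR1 = 0xd0000000-0xdfffffff and BAR3 = 0xe0000000-0xe1ffffff, the two ranges the failing request straddles.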
The GPU advertises both Physical and Virtual Resizable BAR capabilities, but the system has Resizable BAR disabled. Per /proc/driver/nvidia/params:
So BAR1 stays at 256 MiB rather than being resized to span the full 12 GiB of VRAM. This appears to be the predisposing condition for the failure.
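For scale, a 256 MiB BAR1 exposes only about 2% of the nominal 12 GiB (12,288 MiB) of VRAM through the CPU-visible window at any one time:

```math
\frac{\text{BAR1}}{\text{VRAM}} = \frac{256\ \text{MiB}}{12\,288\ \text{MiB}} = \frac{1}{48} \approx 2.1\,\%
```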
Kernel log of the failure
Most informative dmesg excerpt (kernel timestamps from a single boot, in chronological order):
[168814.845214] NVRM: dmaAllocMapping_GM107: can't alloc VA space for mapping.
[168814.845222] NVRM: nvAssertOkFailedNoLog: Assertion failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051)
returned from pReuseMappingDb->pMapCb(pReuseMappingDb->pGlobalCtx, pAllocCtx, range,
cachingFlags, &token, _reusemappingdbAddMappingCallback) @ mapping_reuse.c:273
[168814.845231] NVRM: dmaAllocMapping_GM107: can't alloc VA space for mapping.
[168814.845313] NVRM: dmaAllocMapping_GM107: can't alloc VA space for mapping.
[168814.845434] resource: resource sanity check: requesting [mem 0x00000000df550000-0x00000000e013ffff],
which spans more than 0000:08:00.0 [mem 0xd0000000-0xdfffffff 64bit pref]
[168814.845438] caller __nv_drm_gem_nvkms_map+0x99/0xf0 [nvidia_drm] mapping multiple BARs
[168821.090046] [drm:__nv_drm_convert_in_fences [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000800]
Failed to initialize semaphore for plane fence
[168821.090058] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000800]
Failed to apply atomic modeset. Error code: -11
[168824.609543] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked! Notify Timeout Seconds: 7
[168832.801431] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked! Notify Timeout Seconds: 7
... (krcWatchdog_IMPL repeats every ~8 s indefinitely) ...
[169656.854308] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000800]
Flip event timeout on head 3
... (kernel falls back to fbcon shortly after) ...
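To check the timing on other occurrences, here is a small sketch that pulls the key events out of kernel log text (for example saved dmesg or journalctl -k output) and reports their spacing. It assumes one timestamped line per message, as dmesg prints them, and matches on text taken from the excerpt above; lines without a timestamp are ignored. The sample input is the timestamped first line of each relevant message from the excerpt.

```python
import re

LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Substrings copied from the log excerpt in this report.
MARKERS = {
    "first_alloc_failure": "dmaAllocMapping_GM107: can't alloc VA space",
    "bar_sanity_check": "resource sanity check",
    "fence_failure": "__nv_drm_convert_in_fences",
    "watchdog": "RC watchdog: GPU is probably locked",
}

def timeline(dmesg_text):
    """Return {marker: [timestamps in seconds]} for every marker found."""
    hits = {name: [] for name in MARKERS}
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # wrapped continuation line: no timestamp
        ts, msg = float(m.group(1)), m.group(2)
        for name, needle in MARKERS.items():
            if needle in msg:
                hits[name].append(ts)
    return hits

SAMPLE = """\
[168814.845214] NVRM: dmaAllocMapping_GM107: can't alloc VA space for mapping.
[168814.845434] resource: resource sanity check: requesting [mem 0x00000000df550000-0x00000000e013ffff],
[168821.090046] [drm:__nv_drm_convert_in_fences [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000800]
[168824.609543] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked! Notify Timeout Seconds: 7
[168832.801431] NVRM: krcWatchdog_IMPL: RC watchdog: GPU is probably locked! Notify Timeout Seconds: 7
"""

hits = timeline(SAMPLE)
first = hits["first_alloc_failure"][0]
wd = hits["watchdog"]
print(f"first failure at {first / 3600:.1f} h of uptime")
print(f"fence failure {hits['fence_failure'][0] - first:.1f} s later")
print(f"watchdog period ~{wd[1] - wd[0]:.1f} s")
```

With the sample input this prints 46.9 h, 6.2 s and 8.2 s, consistent with the ~47 h uptime and the ~8 s watchdog period described in this report.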
The requested range 0xdf550000–0xe013ffff is 0xbf0000 bytes (≈ 12 MiB). It starts inside BAR1 (0xd0000000–0xdfffffff, 256 MiB) and ends inside BAR3 (0xe0000000–0xe1ffffff, 32 MiB), which is why the kernel's resource sanity check reports it as spanning two distinct BARs. The -11 (EAGAIN) error appears about six seconds later, from nv_drm_atomic_commit.
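The BAR-spanning condition is easy to check offline. This is a simplified model of what the kernel's resource sanity check looks for (a request that overlaps a resource without fitting inside it), not the kernel implementation; the BAR table is copied from the lspci output above.

```python
# Inclusive physical ranges from `lspci -vv -s 08:00.0`.
BARS = {
    "BAR0": (0xF8000000, 0xFBFFFFFF),
    "BAR1": (0xD0000000, 0xDFFFFFFF),
    "BAR3": (0xE0000000, 0xE1FFFFFF),
}

def overlapping_bars(start, end):
    """Return {bar name: bytes of [start, end] inside that BAR}."""
    out = {}
    for name, (lo, hi) in BARS.items():
        a, b = max(start, lo), min(end, hi)
        if a <= b:
            out[name] = b - a + 1
    return out

def spans_multiple_bars(start, end):
    """True if the range overlaps some BAR without fitting entirely in it."""
    return any(not (lo <= start and end <= hi)
               for lo, hi in BARS.values()
               if start <= hi and end >= lo)

MiB = 1 << 20
start, end = 0xDF550000, 0xE013FFFF  # range from the sanity-check warning
print(f"size = {end - start + 1:#x} bytes ({(end - start + 1) / MiB} MiB)")
for name, nbytes in overlapping_bars(start, end).items():
    print(f"  {name}: {nbytes / MiB} MiB")
print("spans multiple BARs:", spans_multiple_bars(start, end))
```

For the logged request this reports 0xbf0000 bytes (11.9375 MiB), of which 10.6875 MiB falls in BAR1 and 1.25 MiB in BAR3.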
Suspected root cause
Likely predisposing factor: with ReBAR disabled, BAR1 stays at 256 MiB even though the GB205 has 12 GiB of VRAM and advertises the Resizable BAR capability. Under sustained mapping-reuse churn (pReuseMappingDb, mapping_reuse.c:273) the driver can no longer allocate VA space for new mappings, and the range subsequently requested by __nv_drm_gem_nvkms_map crossed the BAR1→BAR3 boundary, as recorded by the kernel sanity check. The driver does not handle these failures cleanly: NV_ERR_NO_MEMORY is propagated upward, and the GPU is left in a state the RC watchdog cannot recover from.
Concurrent system state at first failure
$ nvidia-smi --query-gpu=name,memory.total,memory.used,memory.free,driver_version --format=csv
NVIDIA GeForce RTX 5070, 12227 MiB, 1710 MiB, 10062 MiB, 595.71.05
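The VRAM figures can be checked the same way. A minimal sketch parsing the data row shown above; real --format=csv output also starts with a header row, which this sketch does not handle.

```python
import csv
import io

# Data row copied from the nvidia-smi output above.
ROW = "NVIDIA GeForce RTX 5070, 12227 MiB, 1710 MiB, 10062 MiB, 595.71.05"

def mib(field):
    """Convert a field such as '1710 MiB' to an int number of MiB."""
    return int(field.removesuffix(" MiB"))

name, total, used, free, driver = next(
    csv.reader(io.StringIO(ROW), skipinitialspace=True))
total, used, free = mib(total), mib(used), mib(free)
print(f"{name} ({driver}): {used} of {total} MiB in use "
      f"({100 * used / total:.1f}%, {used / 1024:.2f} GiB)")
```

This reports 1710 of 12227 MiB (14.0%, about 1.67 GiB) in use at the time of the failure.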
$ uptime # at SSH login, after the lockup was already in progress
23:46:58 up 1 day, 23:01, 3 users, load average: 2.60, 1.92, 1.32
$ free -h
total used free shared buff/cache available
Mem: 62Gi 27Gi 3.2Gi 1.8Gi 35Gi 35Gi
Swap: 8.0Gi 71Mi 7.9Gi
Host CPU and system RAM are not under pressure. VRAM is mostly free. Failure is GPU-side only.
Kernel command line
BOOT_IMAGE=(hd4,gpt4)/vmlinuz-6.19.14-300.fc44.x86_64 root=UUID=... ro rootflags=subvol=root
rd.luks.uuid=luks-... rhgb quiet nvidia-drm.modeset=1 snd_hda_intel.power_save=0
rd.driver.blacklist=nouveau,nova_core modprobe.blacklist=nouveau,nova_core
nouveau and nova_core are blacklisted. nvidia-drm.modeset=1 is set.
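These boot parameters can be re-verified on the running system by reading /proc/cmdline (for example open("/proc/cmdline").read()). A sketch, using an abbreviated copy of the command line above with the elided root= and rd.luks.uuid= values left out; the checks cover only the settings this report relies on.

```python
CMDLINE = ("BOOT_IMAGE=(hd4,gpt4)/vmlinuz-6.19.14-300.fc44.x86_64 ro "
           "rootflags=subvol=root rhgb quiet nvidia-drm.modeset=1 "
           "snd_hda_intel.power_save=0 rd.driver.blacklist=nouveau,nova_core "
           "modprobe.blacklist=nouveau,nova_core")

def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to ''.
    Names are normalized because the kernel treats '-' and '_' in
    parameter names as equivalent."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name.replace("-", "_")] = value
    return params

def check_nvidia_params(params):
    """Return a list of problems with the settings this report relies on."""
    problems = []
    if params.get("nvidia_drm.modeset") != "1":
        problems.append("nvidia-drm.modeset=1 is not set")
    for key in ("rd.driver.blacklist", "modprobe.blacklist"):
        listed = params.get(key, "").split(",")
        for module in ("nouveau", "nova_core"):
            if module not in listed:
                problems.append(f"{module} missing from {key}")
    return problems

print(check_nvidia_params(parse_cmdline(CMDLINE)) or "all expected settings present")
```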
Compositor / userspace
GNOME Shell on Wayland (gdm), stock Fedora xorg-x11-drv-nvidia userspace at 595.71.05.
To Reproduce
Difficult to reproduce on demand. Observed once after very long uptime:
- Boot Fedora 44, GNOME on Wayland, NVIDIA open kernel modules 595.71.05 (nvidia-drm.modeset=1, ReBAR disabled in BIOS).
- Use the desktop normally for ~47 hours with a busy mix of GPU-accelerated clients:
  - Brave Browser (~15 windows, ~20 renderer/utility processes)
  - Discord (Electron)
  - Spotify (Electron)
  - Steam + steamwebhelper
  - RustRover (JetBrains)
  - Xwayland hosting several X11 clients
  - GNOME Shell + extensions
- After roughly that uptime, dmaAllocMapping failures begin appearing, immediately followed by the BAR-spanning sanity-check warning and, about ten seconds later, krcWatchdog.
- The desktop session becomes unresponsive. SSH still works, no Xid is logged, and GSP appears healthy (no Xid 119); the kernel just keeps logging the watchdog every ~8 s until gdm is restarted or the box is rebooted.
Bug Incidence
Sometimes
nvidia-bug-report.log.gz
nvidia-bug-report.log.gz
More Info
… Flip event timeout on RTX 4070, but driven by monitor power-save handling, not BAR-mapping exhaustion. Only the late Flip event timeout line is shared with my bug.
GitHub-issue searches that returned no existing match:
- dmaAllocMapping
- mapping_reuse
- krcWatchdog GPU is probably locked
- NV_ERR_NO_MEMORY VA
- GB205
- __nv_drm_gem_nvkms_map mapping multiple BARs
Function-name note
The log says dmaAllocMapping_GM107 even though the GPU is Blackwell (GB205). This appears to be a legacy-named HAL function (GM107 is a Maxwell chip) still used for newer architectures; flagged in case it is informative.
Workaround being tried
Will enable Resizable BAR in BIOS and re-test. If that prevents recurrence, this points strongly at the BAR1-size / mapping-reuse interaction described above. Will update this issue with the result either way.