fix: retain GPU library paths when symlinks resolve outside mountpoint #165

Open
blinkagent[bot] wants to merge 1 commit into main from fix/gpu-lib-detection-usr-lib64

blinkagent bot commented Mar 9, 2026

Problem

When the host uses /usr/lib64 (common on RHEL/Amazon Linux) and it is mounted into the outer container at /var/coder/usr/lib, the usrLibGPUs() function fails to detect NVIDIA libraries.

The root cause is in recursiveSymlinks(): on these systems, symlinks inside /usr/lib64 use absolute paths referencing the original host path (e.g., libnvidia-ml.so.1 -> /usr/lib64/libnvidia-ml.so.545.23.08). When the directory is mounted at /var/coder/usr/lib, these absolute symlink targets don't start with the mountpoint prefix. The function previously returned nil when it encountered such a target, discarding the entire symlink chain including the original file within the mountpoint.
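To make the prefix mismatch concrete, here is a minimal sketch of such a check. The helper name insideMount is hypothetical and is not taken from the PR; only the two paths come from the description above.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// insideMount reports whether target lies at or below mountpoint.
// Hypothetical helper for illustration; the real check in
// recursiveSymlinks() may be written differently.
func insideMount(mountpoint, target string) bool {
	mountpoint = filepath.Clean(mountpoint)
	target = filepath.Clean(target)
	return target == mountpoint ||
		strings.HasPrefix(target, mountpoint+string(filepath.Separator))
}

func main() {
	const mountpoint = "/var/coder/usr/lib"

	// The symlink target as stored on the host: an absolute host path.
	fmt.Println(insideMount(mountpoint, "/usr/lib64/libnvidia-ml.so.545.23.08"))

	// The symlink itself, as seen through the mount.
	fmt.Println(insideMount(mountpoint, "/var/coder/usr/lib/libnvidia-ml.so.1"))
}
```

The first call reports that the target is outside the mountpoint even though the file it names is reachable through the mount under a different path.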

This meant CODER_ADD_GPU=true would pass through /dev/nvidia* devices but mount zero NVIDIA libraries, causing nvidia-smi to fail in the inner container.

Fix

recursiveSymlinks() now breaks out of the loop instead of returning nil, nil when a symlink target resolves outside the mountpoint. The paths already collected inside the mountpoint are retained, since they are still valid bind mount sources.

The change is a single line: return nil, nil is replaced with break.

Test

Added TestGPUs_UsrLib64Symlinks which creates a real filesystem layout simulating the /usr/lib64 scenario:

  • A real .so file
  • A symlink with an absolute target pointing outside the mountpoint
  • A relative symlink in the chain

The test verifies that all three paths are detected as GPU bind mounts.

Fixes #164

When the host uses /usr/lib64 (common on RHEL/Amazon Linux) and it is
mounted into the outer container at /var/coder/usr/lib, symlinks inside
the directory may use absolute paths referencing the original host path
(e.g. /usr/lib64/libnvidia-ml.so.545.23.08). The recursiveSymlinks
function would discard the entire symlink chain (including the original
file within the mountpoint) when it encountered a target outside the
mountpoint, returning nil.

This changes the behavior to return all paths collected so far within
the mountpoint instead of discarding them. The GPU libraries are still
valid bind mount sources at their paths within the mountpoint.

Fixes #164
Development

Successfully merging this pull request may close these issues.

GPU library auto-detection fails when host usr lib path is /usr/lib64, requiring manual CODER_MOUNTS

0 participants