This page records the most common cuBLASLt bring-up issue and the runtime behavior that matters when debugging PyTorch linear and matmul paths.
If you manually preload FakeGPU libraries but omit `libcublas`, PyTorch 2.x can fail with errors such as:

```
CUBLAS_STATUS_NOT_SUPPORTED
```

The error usually appears around `cublasLtMatmulAlgoGetHeuristic(...)` or nearby linear-layer setup.
Two conditions matter:

- PyTorch 2.x expects cuBLASLt, not just the older cuBLAS surface.
- Manual preload setups must include the FakeGPU `libcublas` library when you want FakeGPU's cuBLAS/cuBLASLt path.
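A quick way to confirm the second condition is to run a one-line matmul with `libcublas` deliberately left out of the preload list. This is a sketch only: it assumes the `./build` layout used elsewhere on this page and a CUDA-enabled PyTorch build, and the exact error text can vary.

```shell
# Broken on purpose: libcublas is omitted from LD_PRELOAD, so PyTorch's
# cuBLAS/cuBLASLt lookup has nothing to resolve against.
out=$(LD_LIBRARY_PATH=./build:${LD_LIBRARY_PATH:-} \
      LD_PRELOAD=./build/libcudart.so.12:./build/libcuda.so.1:./build/libnvidia-ml.so.1 \
      python3 -c "import torch; x = torch.randn(4, 4, device='cuda'); print(x @ x)" 2>&1 \
      || echo "matmul failed (expected when libcublas is missing)")
echo "$out"
```

Adding `./build/libcublas.so.12` back to `LD_PRELOAD` — or simply using `./fgpu` — is the fix.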
Prefer the wrapper:

```shell
./fgpu python3 your_script.py
```

It keeps the preload order correct and avoids the most common manual mistakes.
If you do preload manually, include `libcublas` explicitly. On Linux:

```shell
LD_LIBRARY_PATH=./build:$LD_LIBRARY_PATH \
LD_PRELOAD=./build/libcublas.so.12:./build/libcudart.so.12:./build/libcuda.so.1:./build/libnvidia-ml.so.1 \
python3 your_script.py
```

On macOS:

```shell
DYLD_LIBRARY_PATH=./build:$DYLD_LIBRARY_PATH \
DYLD_INSERT_LIBRARIES=./build/libcublas.dylib:./build/libcudart.dylib:./build/libcuda.dylib:./build/libnvidia-ml.dylib \
python3 your_script.py
```

| Mode | cuBLAS/cuBLASLt source |
|---|---|
| `simulate` | FakeGPU libcublas with maintained CPU-backed math for supported paths |
| `hybrid` | Real cuBLAS/cuBLASLt, while FakeGPU still virtualizes device identity and reporting |
| `passthrough` | Real cuBLAS/cuBLASLt with minimal FakeGPU interference |
The maintained CPU-simulation validation includes:

- `cublasSgemm_v2` and `cublasLtMatmul` for common matmul paths
- device-pointer mode checks
- strided batched GEMM
- batched GEMM
- several BLAS1 operations
That coverage is enough for basic PyTorch tensor, linear, and matmul smoke paths, but it does not guarantee that every advanced kernel path used by every model is already implemented.
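The covered paths can be exercised with a short PyTorch smoke script. This is a minimal sketch using only standard PyTorch APIs; it is shown on `cpu` so it runs anywhere, and under FakeGPU you would call it with `"cuda"` instead.

```python
import torch

def smoke(device: str = "cpu") -> dict:
    """Exercise the GEMM and BLAS1 paths listed above on the given device."""
    torch.manual_seed(0)
    a = torch.randn(4, 8, device=device)
    b = torch.randn(8, 3, device=device)
    out = {}
    out["matmul"] = a @ b                                 # cublasSgemm_v2 / cublasLtMatmul path
    out["linear"] = torch.nn.Linear(8, 3).to(device)(a)   # linear-layer path
    ba = torch.randn(5, 4, 8, device=device)
    bb = torch.randn(5, 8, 3, device=device)
    out["bmm"] = torch.bmm(ba, bb)                        # (strided) batched GEMM path
    out["dot"] = torch.dot(a[0], a[1])                    # BLAS1-style path
    return out

if __name__ == "__main__":
    # Under FakeGPU, run smoke("cuda"); on a plain CPU machine this still works.
    for name, t in smoke().items():
        print(f"{name}: shape={tuple(t.shape)}")
```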
- Start with `./ftest cpu_sim` to confirm the maintained FakeGPU math path works in your build.
- Use `./fgpu` before falling back to manual preload.
- If a workload only fails in `simulate`, try `hybrid` to separate fake-device concerns from fake-cuBLAS concerns.
- For framework-specific issues, compare a real-GPU run and a FakeGPU run with the same user script.
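The last comparison step can be scripted. A sketch, assuming `your_script.py` stands in for the real workload and that both commands are run from the same checkout (on different machines if needed):

```shell
# Run the same script twice and diff the logs; the first divergence usually
# points at the first FakeGPU-specific behavior. `|| true` keeps the
# comparison going even when one of the runs exits non-zero.
python3 your_script.py        > real.log 2>&1 || true   # on the real-GPU machine
./fgpu python3 your_script.py > fake.log 2>&1 || true   # FakeGPU run
diff real.log fake.log || true
```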