Get rid of nvshmem dependency for cuBLASMp integration #2661

vcherepanov-nv · 2026-02-09T21:29:50Z

Description

Starting with cuBLASMp 0.8.0, they're moving away from using nvshmem for symmetric memory, use NCCL instead.
This change adapts the to changed API.

Fixes # (issue)

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Get rid of nvshmem dependency (for cuBLASMp)
Update cuBLASMp API usage

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

for more information, see https://pre-commit.ci

greptile-apps · 2026-02-09T22:07:20Z

Greptile Overview

Greptile Summary

This PR updates TransformerEngine’s cuBLASMp integration to reflect cuBLASMp >= 0.8.0 moving symmetric-memory support from nvshmem to NCCL. It removes the nvshmem build dependency, updates CMake linkage to bring in NCCL for the cuBLASMp path, and adjusts the comm GEMM implementation and its public header to use the updated cuBLASMp APIs.

I did not find additional fix-required issues in the changed files beyond the two items already raised in prior threads (NCCL discoverability in CMake, and workspace lifetime/synchronization in ctx destroy).

Confidence Score: 2/5

This PR has known merge-blocking issues called out in prior review threads that should be resolved before merging.
Review of the changed files did not surface additional defects beyond prior threads, but the existing unresolved items are significant: CMake must reliably find/link NCCL when NVTE_WITH_CUBLASMP is enabled, and ctx destroy currently lacks a guarantee that in-flight work using ctx->workspace has completed before deregistration/free. Those can break builds or cause use-after-free in valid usage patterns.
transformer_engine/common/CMakeLists.txt, transformer_engine/common/comm_gemm/comm_gemm.cpp, transformer_engine/common/include/transformer_engine/comm_gemm.h

Important Files Changed

Filename	Overview
setup.py	Updates build configuration to remove nvshmem dependency and pass cuBLASMp-related settings; no correctness issues found in this review.
transformer_engine/common/CMakeLists.txt	Adjusts CMake options/dependencies for cuBLASMp to use NCCL instead of nvshmem; no additional issues found beyond already-tracked NCCL discovery comment.
transformer_engine/common/comm_gemm/comm_gemm.cpp	Adapts cuBLASMp comm GEMM implementation to updated API (NCCL symmetric memory); no new functional issues found beyond already-commented workspace lifetime/synchronization concern.
transformer_engine/common/include/transformer_engine/comm_gemm.h	Updates comm GEMM public API/comments to match new cuBLASMp/NCCL integration; no new issues found.

Sequence Diagram

sequenceDiagram
    participant Py as setup.py (build)
    participant CM as CMakeLists.txt
    participant CG as comm_gemm.cpp
    participant MP as cuBLASMp
    participant NC as NCCL

    Py->>CM: Configure NVTE_WITH_CUBLASMP and dependency paths
    CM->>CM: Link cuBLASMp + NCCL (nvshmem removed)
    CG->>MP: Create/Configure cuBLASMp handle
    CG->>NC: Initialize/associate NCCL-backed symmetric memory
    CG->>MP: Register/deregister workspace buffer (new API)
    CG->>MP: Enqueue cublasMpMatmul on main_stream
    CG-->>MP: Destroy ctx/handle (must ensure stream work complete)

greptile-apps

_{3 files reviewed, 4 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-09T22:07:27Z

transformer_engine/common/CMakeLists.txt

+    find_library(NCCL_LIB
+                 NAMES nccl libnccl
                 PATH_SUFFIXES lib
                 REQUIRED)
-  target_link_libraries(transformer_engine PUBLIC ${CUBLASMP_LIB} ${NVSHMEM_HOST_LIB})
+  target_link_libraries(transformer_engine PUBLIC ${NCCL_LIB} ${CUBLASMP_LIB})


NCCL library not discoverable

find_library(NCCL_LIB ...) is missing a PATHS/hint variable (unlike CUBLASMP_LIB, which uses ${CUBLASMP_DIR}). Unless NCCL is already in the default linker search paths, enabling NVTE_WITH_CUBLASMP will fail at configure time with NCCL_LIB not found. This PR should add a way to point CMake at NCCL (env var / CMake cache var) and pass it from setup.py similarly to CUBLASMP_DIR.

greptile-apps · 2026-02-09T22:07:43Z

Additional Comments (3)

transformer_engine/common/CMakeLists.txt
Undefined NVSHMEM_DIR include

In the NVTE_WITH_CUBLASMP block, target_include_directories(... ${NVSHMEM_DIR}/include) still references NVSHMEM_DIR, but this PR removed the only place that was setting it (setup.py). With NVTE_WITH_CUBLASMP=ON, CMake will now expand an unset var here, which can break configuration/build (and it also contradicts the goal of removing nvshmem for cuBLASMp).

transformer_engine/common/comm_gemm/comm_gemm.cpp
Uninitialized workspace fields

NVTECommGemmCtx has raw workspace/workspace_size fields, but nvte_comm_gemm_ctx_create()'s designated initializer never sets them. That means cublasmp_gemm() will read uninitialized ctx->workspace_size and may also treat an uninitialized ctx->workspace as non-null (leading to invalid deregister/free calls). These need explicit initialization (e.g., workspace=nullptr, workspace_size=0) in the create path.

transformer_engine/common/comm_gemm/comm_gemm.cpp
Workspace leak + missing cleanup

After switching from nvshmem_malloc/free to cublasMpMalloc/cublasMpFree + cublasMpBufferRegister, nvte_comm_gemm_ctx_destroy() now just deletes the ctx and never deregisters/frees ctx->workspace. Since workspace is a raw pointer (not RAII-managed), this leaks device memory and leaves the cuBLASMp registration state dirty for the lifetime of the process.

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

greptile-apps

_{3 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-09T22:32:25Z

transformer_engine/common/comm_gemm/comm_gemm.cpp

 void nvte_comm_gemm_ctx_destroy(NVTECommGemmCtx* ctx) {
  NVTE_API_CALL(nvte_comm_gemm_ctx_destroy);
-  nvshmemx_sync_all_on_stream(ctx->stream.get());
+  if (ctx->workspace) {
+    NVTE_CHECK_CUBLASMP(cublasMpBufferDeregister(ctx->grid_row_major.get(), ctx->workspace));
+    NVTE_CHECK_CUBLASMP(cublasMpFree(ctx->grid_col_major.get(), ctx->workspace));
+  }
  delete ctx;


Unsafe workspace free

nvte_comm_gemm_ctx_destroy can deregister/free ctx->workspace while cublasMpMatmul work on the user-provided main_stream is still in flight. cublasmp_gemm sets the cuBLASMp handle stream to main_stream (comm_gemm.cpp:389) and uses ctx->workspace in the enqueue (comm_gemm.cpp:436-439), but destroy() does not synchronize main_stream (or otherwise ensure completion) before calling cublasMpBufferDeregister/cublasMpFree.

This can become a use-after-free if a caller destroys the ctx shortly after launching a comm GEMM. Either synchronize the relevant stream(s) before freeing, or explicitly document (in comm_gemm.h) that callers must synchronize main_stream before calling destroy().

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

greptile-apps

_{4 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

vcherepanov-nv and others added 5 commits January 27, 2026 02:36

Remove nvshmem usage

5b2f7cc

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

Renamings

aadb720

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

NCCL dependency

80c2d1a

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

Check for not yet allocated workspace

7ba0971

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

fd02d57

for more information, see https://pre-commit.ci

greptile-apps bot reviewed Feb 9, 2026

View reviewed changes

Address greptile comments

706d9d1

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

greptile-apps bot reviewed Feb 9, 2026

View reviewed changes

Add a comment per greptile

a98e550

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

greptile-apps bot reviewed Feb 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get rid of nvshmem dependency for cuBLASMp integration #2661

Get rid of nvshmem dependency for cuBLASMp integration #2661

Uh oh!

vcherepanov-nv commented Feb 9, 2026

Uh oh!

greptile-apps bot commented Feb 9, 2026 •

edited

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 9, 2026

Uh oh!

greptile-apps bot commented Feb 9, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 9, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Get rid of nvshmem dependency for cuBLASMp integration #2661

Are you sure you want to change the base?

Get rid of nvshmem dependency for cuBLASMp integration #2661

Uh oh!

Conversation

vcherepanov-nv commented Feb 9, 2026

Description

Type of change

Changes

Checklist:

Uh oh!

greptile-apps bot commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Overview

Greptile Summary

Confidence Score: 2/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Feb 9, 2026

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps bot commented Feb 9, 2026 •

edited

Loading