-
Notifications
You must be signed in to change notification settings - Fork 631
Improve handling of the NVTE_CUDA_ARCHS #2665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Greptile OverviewGreptile SummaryThis PR adjusts Transformer Engine’s CUDA arch selection logic by splitting The main concern is in Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant UserEnv as User env (NVTE_CUDA_ARCHS)
participant CMake as CMake configure
participant TE as common/CMakeLists.txt
participant NVCC as nvcc
participant Gen as Generic .cu sources
participant Spec as Arch-specific .cu sources
participant PTX as util/ptx.cuh
UserEnv->>CMake: "Configure build"
CMake->>TE: "Evaluate arch selection"
TE->>TE: "Split into NVTE_GENERIC_ARCHS / NVTE_SPECIFIC_ARCHS"
alt arch is 110/120
TE->>TE: "Remove base arch from CMAKE_CUDA_ARCHITECTURES"
TE->>TE: "Append base to NVTE_GENERIC_ARCHS"
TE->>TE: "Append suffixed arch to NVTE_SPECIFIC_ARCHS"
else arch is 100/101 (current PR)
TE->>TE: "Keep base arch in CMAKE_CUDA_ARCHITECTURES"
TE->>TE: "Append 100a/101a to NVTE_SPECIFIC_ARCHS"
end
TE->>NVCC: "Compile Gen with --generate-code for NVTE_GENERIC_ARCHS"
TE->>NVCC: "Compile Spec with --generate-code for NVTE_SPECIFIC_ARCHS"
TE->>NVCC: "Compile all sources for CMAKE_CUDA_ARCHITECTURES"
NVCC->>PTX: "Define NVTE_HAS_ARCH_SPECIFIC_TARGETS for Spec"
PTX->>PTX: "NVTE_CUDA_ARCH_MATCHES: runtime check when macro defined"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 files reviewed, 2 comments
| @@ -54,31 +52,57 @@ endif() | |||
| # Check for architecture 101 (if we see this we are in toolkit <= 12.9) | |||
| list(FIND CMAKE_CUDA_ARCHITECTURES "101" arch_101_index) | |||
| if(NOT arch_101_index EQUAL -1) | |||
| list(REMOVE_ITEM CMAKE_CUDA_ARCHITECTURES "101") | |||
| list(APPEND NVTE_GENERIC_ARCHS "101") | |||
| list(APPEND NVTE_SPECIFIC_ARCHS "101a") | |||
| endif() | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arch 100/101 not removed
When "100" / "101" are present in CMAKE_CUDA_ARCHITECTURES, this block only appends 100a/101a to NVTE_SPECIFIC_ARCHS but never removes the base arch from CMAKE_CUDA_ARCHITECTURES nor adds it to NVTE_GENERIC_ARCHS. As a result, the build will still compile all sources for sm_100 / sm_101 (via CMAKE_CUDA_ARCHITECTURES) while also compiling arch-specific sources for sm_100a / sm_101a (via --generate-code), which defeats the “generic + specific” split and can cause unexpected extra build work / incorrect targeting. This differs from the 110/120 handling below, which removes the base arch and routes it into the explicit lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@greptileai That is because arches 110/120 are routed to the "f" (like 120f) variant where the entire family is covered, whereas in the case of 100, it routes to "a" variants (like 100a) - this is intentional behavior, but in order to support different Arches in the same family, we need to still compile for the regular 100 then.
| if(CMAKE_VERSION VERSION_LESS 4.0.2) | ||
| if(NOT CMAKE_CUDA_ARCHITECTURES) | ||
| message(WARNING | ||
| "CMAKE_CUDA_ARCHITECTURES is empty after replacing arch-specific targets. " | ||
| "Please upgrade to CMake 4.0.2+ for native 'f' architecture support. " | ||
| "Adding sm_75 target in addition to the specified target to avoid configuration " | ||
| "errors - this will result in longer build time, but does not affect correctness.") | ||
| set(CMAKE_CUDA_ARCHITECTURES 75) | ||
| endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fallback wipes user archs
On CMake < 4.0.2, if the arch list becomes empty you set(CMAKE_CUDA_ARCHITECTURES 75) (dropping the user’s originally requested arch entirely). Given the warning text says “add sm_75 target in addition to the specified target”, this looks like a functional mismatch: the fallback should preserve the originally selected arch(es) and append 75, not replace everything with 75.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@greptileai If the list is empty, then setting is the same as appending.
Description
Improve handling of the NVTE_CUDA_ARCHS env variable:
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: