
Conversation

@ptrendx (Member) commented Feb 9, 2026

Description

This PR fixes most compilation warnings for the PyTorch extension. The one still left is the ptxas performance advisory about the multicast.cluster modifier when compiling for architecture 120; fixing that one would most probably require a dedicated kernel for SM120.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Changed the deprecated set_is_inference call in the cuDNN frontend to set_generate_stats(false)
  • Fixed the warning about a base-class function being shadowed, rather than overridden, in the child class
  • Fixed the warning about the wrong order of the member initializers in the constructor
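
As an illustration of the third item, here is a minimal sketch of the member-initialization-order warning and its fix; the class and field names are hypothetical stand-ins, not the actual GroupedTensor members:

```cpp
#include <cstddef>

// Minimal sketch of a -Wreorder-style fix; "Grouped" and its fields are
// hypothetical stand-ins, not the real GroupedTensor members.
struct Grouped {
  // C++ initializes members in DECLARATION order, regardless of the order
  // written in the constructor's initializer list. If the list order
  // differs, compilers emit -Wreorder, and a member that reads another
  // member may see an uninitialized value.
  std::size_t num_tensors;
  std::size_t bytes_per_tensor;
  std::size_t total_bytes;

  Grouped(std::size_t n, std::size_t b)
      // Fix: keep the initializer list in declaration order.
      : num_tensors(n), bytes_per_tensor(b), total_bytes(n * b) {}
};
```

Listing `total_bytes` before `num_tensors` in the initializer list would compile but warn, because the members are still initialized in the declared order.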

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
@ptrendx ptrendx changed the title Fix the compilation warnings [pyTorch] Fix the compilation warnings Feb 9, 2026
@greptile-apps (Contributor) greptile-apps bot commented Feb 9, 2026

Greptile Summary

This PR targets compilation warnings in the PyTorch extension and cuDNN frontend fused-attention code paths.

Changes include:

  • Reordering GroupedTensor members and its constructor initializer list to match declaration order and silence -Wreorder warnings.
  • Updating cuDNN frontend SDPA attribute configuration to replace deprecated set_is_inference(false) usage with set_generate_stats(...).
  • Fixing C++ name-hiding/overload ambiguity for copy_into_buffer in the PyTorch CommOverlap* wrappers by adding using declarations and disambiguating the pybind11 binding via static_cast.
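
A minimal sketch of the name-hiding pattern behind the third bullet; the classes and signatures here are illustrative, not the actual TE wrappers:

```cpp
#include <string>

// Illustrative sketch of C++ name hiding; Base/Derived and the argument
// types are stand-ins, not the real CommOverlap classes.
struct Base {
  int copy_into_buffer(int offset) { return offset; }
};

struct Derived : Base {
  // Without this using-declaration, the overload below hides
  // Base::copy_into_buffer(int) entirely: d.copy_into_buffer(3) would not
  // compile, and compilers warn that the base function is shadowed.
  using Base::copy_into_buffer;

  std::string copy_into_buffer(const std::string& tensor, bool local_chunk) {
    return local_chunk ? tensor + ":local" : tensor;
  }
};
```

With the using-declaration in place, both the base and derived overloads participate in overload resolution, which is also what lets the pybind11 binding pick one explicitly.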

The main remaining concern is the FP8 fused-attention path, which now hardcodes set_generate_stats(true) despite having an is_training parameter, potentially changing inference behavior and performance when stats are not needed outside training.

Confidence Score: 4/5

  • This PR is likely safe to merge after confirming FP8 stats generation behavior is correct for inference.
  • Most changes are mechanical warning fixes (member initialization order, deprecated cuDNN frontend API replacement, and pybind overload disambiguation). The only material behavioral risk is set_generate_stats(true) in the FP8 SDPA attributes, which may force stats generation when is_training is false.
  • File flagged: transformer_engine/common/fused_attn/fused_attn_fp8.cu

Important Files Changed

| Filename | Overview |
| --- | --- |
| transformer_engine/common/common.h | Reordered GroupedTensor members and initializer list to match declaration order, addressing -Wreorder warnings; no functional changes apparent. |
| transformer_engine/common/fused_attn/fused_attn_f16_arbitrary_seqlen.cu | Replaced deprecated set_is_inference(false) with set_generate_stats(generate_stats) for cuDNN frontend SDPA attributes; behavior stays parameter-driven. |
| transformer_engine/common/fused_attn/fused_attn_fp8.cu | Replaced set_is_inference(false) with set_generate_stats(true) for SDPA_fp8_attributes; may unconditionally generate stats regardless of training/inference intent. |
| transformer_engine/pytorch/csrc/extensions.h | Added using declarations to unhide base-class copy_into_buffer overloads in CommOverlap wrappers, fixing C++ shadowing warnings and enabling pybind disambiguation. |
| transformer_engine/pytorch/csrc/extensions/pybind.cpp | Disambiguated pybind bindings for overloaded copy_into_buffer via an explicit static_cast to the intended (Tensor, bool) overload. |
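
The pybind.cpp change relies on a standard C++ idiom: taking the address of an overloaded member function is ambiguous, so a static_cast to the exact pointer-to-member type selects one overload. A framework-free sketch, with illustrative names in place of the real TE signatures:

```cpp
// Framework-free sketch of the overload-disambiguation idiom used in the
// pybind11 binding; "Comm" and the signatures are illustrative.
struct Comm {
  int copy_into_buffer(int offset) { return offset; }
  int copy_into_buffer(int offset, bool local_chunk) {
    return local_chunk ? -offset : offset;
  }
};

// &Comm::copy_into_buffer alone is ambiguous; the cast picks the
// (int, bool) overload, just as the binding casts to the (Tensor, bool) one.
using TwoArgFn = int (Comm::*)(int, bool);
inline TwoArgFn pick_two_arg() {
  return static_cast<TwoArgFn>(&Comm::copy_into_buffer);
}
```

In actual pybind11 code, the same cast expression would be passed directly to `m.def(...)` (or `py::overload_cast` could be used to the same effect).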

Sequence Diagram

```mermaid
sequenceDiagram
  participant Py as PyTorch Python
  participant PB as pybind11 module
  participant CO as CommOverlap/CommOverlapP2P (C++)
  participant Base as CommOverlap*Base/Core (C++)
  participant FA as FusedAttn FP16/FP8 (CUDA)
  participant FE as cuDNN Frontend Graph

  Py->>PB: instantiate CommOverlap*(...)
  PB->>CO: constructor

  Py->>PB: copy_into_buffer(tensor, local_chunk)
  PB->>CO: copy_into_buffer(const at::Tensor&, bool)
  Note over PB,CO: Binding uses static_cast to pick (Tensor,bool)
  CO->>Base: (optional) call into base/core overlap implementation

  Py->>PB: run fused attention op
  PB->>FA: fused_attn_*_fwd_impl(...)
  FA->>FE: build SDPA attributes
  Note over FE: FP16: set_generate_stats(generate_stats)
  Note over FE: FP8: set_generate_stats(true) (current PR)
  FE-->>FA: compile/execute graph
  FA-->>PB: return outputs (incl. Stats when generated)
  PB-->>Py: return tensors
```

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 1 comment

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
@ptrendx ptrendx requested a review from cyanguwa February 9, 2026 23:13
@ptrendx (Member, Author) commented Feb 9, 2026

/te-ci

@greptile-apps greptile-apps bot commented Feb 9, 2026

Additional Comments (1)

transformer_engine/common/fused_attn/fused_attn_fp8.cu
Unconditional stats generation

fused_attn_fp8_fwd_impl_v1 takes bool is_training (see signature around transformer_engine/common/fused_attn/fused_attn_fp8.cu:1654), but the new .set_generate_stats(true) here forces stats collection even when is_training == false. If the cuDNN frontend uses this flag to allocate or compute extra outputs, this changes inference behavior (and can add overhead) compared to a training-only stats path. It should likely be wired to is_training (or whatever condition controls whether Stats is actually needed) rather than hardcoded to true.
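
The suggested rewiring can be sketched with a stand-in for the attribute builder; SdpaAttrs here is hypothetical, and only the set_generate_stats name comes from the actual cuDNN frontend API:

```cpp
// Hypothetical stand-in for the cuDNN frontend SDPA attribute builder,
// used only to illustrate driving stats generation from is_training.
struct SdpaAttrs {
  bool generate_stats = false;
  SdpaAttrs& set_generate_stats(bool v) {
    generate_stats = v;
    return *this;
  }
};

// Suggested shape of the fix: wire the flag to is_training instead of
// hardcoding true, so inference skips the Stats output.
inline SdpaAttrs make_fp8_sdpa_attrs(bool is_training) {
  SdpaAttrs attrs;
  attrs.set_generate_stats(is_training);
  return attrs;
}
```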
