
feat(ascend): add scaled softmax operator #599

Draft

zhangyue207 wants to merge 1 commit into InfiniTensor:master from zhangyue207:feat/ascend-scaled-softmax

@zhangyue207 (Collaborator)

Summary

  • Add the ScaledSoftmax base operator for 2D logits/probability normalization.
  • Add the Ascend ACLNN implementation using an optional aclnnMuls followed by aclnnSoftmax (a sketch follows this list).
  • Add focused Ascend tests across shapes, scales, and floating dtypes.
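A minimal sketch of that composition, written in Python for readability (the shipped implementation is C++ over ACLNN; the multiply and torch.softmax below merely stand in for aclnnMuls and aclnnSoftmax, and the skip condition scale == 1.0 is an assumption about what "optional" means here):

```python
import torch

def scaled_softmax(x: torch.Tensor, scale: float, out: torch.Tensor) -> None:
    # "Optional aclnnMuls": skip the elementwise multiply when scale == 1.0
    # (assumed skip condition; stands in for aclnnMuls).
    scaled = x if scale == 1.0 else x * scale
    # Normalize over the last dimension, writing into the explicit output
    # (stands in for aclnnSoftmax).
    out.copy_(torch.softmax(scaled, dim=-1))
```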

Motivation

ScaledSoftmax is needed as a small inference operator for logits-style normalization on Ascend.

Closes N/A

API Alignment

  • Source checked: PyTorch torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None).
  • InfiniOps signature: scaled_softmax(input, scale, out).
  • Intentional deviation: scale is explicit because this operator models softmax(input * scale, dim=-1) and uses the InfiniOps output-explicit calling style.
  • Test coverage: tests/test_scaled_softmax.py compares against torch.nn.functional.softmax(input.to(torch.float32) * scale, dim=-1); a reference sketch follows this list.
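For concreteness, a minimal sketch of the reference semantics the tests compare against, assuming nothing beyond the comparison stated above (the actual test fixtures may differ; the float32 upcast keeps the reference numerically stable for half-precision inputs):

```python
import torch
import torch.nn.functional as F

def scaled_softmax_reference(x: torch.Tensor, scale: float) -> torch.Tensor:
    # Matches the test comparison: softmax(input.to(float32) * scale, dim=-1).
    return F.softmax(x.to(torch.float32) * scale, dim=-1)

logits = torch.randn(4, 128, dtype=torch.float16)  # hypothetical 2D logits
probs = scaled_softmax_reference(logits, scale=0.125)
assert torch.allclose(probs.sum(dim=-1), torch.ones(4))  # each row sums to 1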

Type of Change

  • [x] feat - new feature / new operator / new platform
  • [ ] fix - bug fix
  • [ ] perf - performance improvement (no behavioral change)
  • [ ] refactor - code restructuring without behavior change
  • [ ] test - adding or fixing tests only
  • [ ] docs - documentation only
  • [ ] build / ci - build system or CI configuration
  • [ ] chore - tooling, formatting, or other non-code changes
  • [ ] Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • [ ] CPU (WITH_CPU)
  • [ ] NVIDIA (WITH_NVIDIA)
  • [ ] Iluvatar (WITH_ILUVATAR)
  • [ ] MetaX (WITH_METAX)
  • [ ] Cambricon (WITH_CAMBRICON)
  • [ ] Moore (WITH_MOORE)
  • [x] Ascend (WITH_ASCEND)
  • [ ] PyTorch C++ bindings (WITH_TORCH)
  • [ ] Build system / CMake / CI
  • [x] Python bindings / user-facing API

Test Results on Supported Platforms

| Platform | Built | pytest Result | Notes / Hardware |
| --- | --- | --- | --- |
| NVIDIA | N/A | N/A | Not touched. |
| Iluvatar | N/A | N/A | Not touched. |
| MetaX | N/A | N/A | Not touched. |
| Cambricon | N/A | N/A | Not touched. |
| Moore | N/A | N/A | Not touched. |
| Ascend | Yes | 27 passed | 910B4, infiniops-pr-ascend-0512, ASCEND_VISIBLE_DEVICES=0. |
Full `pytest` output:

```
python3 -m pip install .[dev] --no-build-isolation -C cmake.define.WITH_ASCEND=ON -C cmake.define.BUILD_ASCEND_CUSTOM=OFF -C cmake.define.AUTO_DETECT_DEVICES=OFF -C cmake.define.GENERATE_PYTHON_BINDINGS=ON
pytest tests/test_scaled_softmax.py --devices ascend -v --tb=short

tests/test_scaled_softmax.py::test_scaled_softmax[...] PASSED

============================== 27 passed in 1.47s ==============================
```

Benchmark / Performance Impact

N/A

Notes for Reviewers

  • This PR intentionally excludes a CPU fallback to keep the Ascend operator change minimal.
  • The Ascend path supports contiguous float16, bfloat16, and float32 tensors (see the sketch below).
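To make the second note concrete, a hypothetical caller-side check of those constraints (illustrative only; the function name and error wording are not the actual InfiniOps validation code):

```python
import torch

_SUPPORTED_DTYPES = {torch.float16, torch.bfloat16, torch.float32}

def check_ascend_scaled_softmax_input(x: torch.Tensor) -> None:
    # Mirrors the constraints stated above: 2D, contiguous, floating dtypes.
    if x.dim() != 2:
        raise ValueError("`scaled_softmax` expects 2D logits")
    if x.dtype not in _SUPPORTED_DTYPES:
        raise TypeError(f"unsupported dtype `{x.dtype}` for `scaled_softmax`")
    if not x.is_contiguous():
        raise ValueError("`scaled_softmax` Ascend path requires contiguous input")
```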

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits.
  • Branch name follows <type>/xxx-yyyy-zzzz.
  • Each commit message follows Conventional Commits.
  • This small PR is a single squashable commit.
  • No stray merge commits from master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal.
  • No dead code, commented-out blocks, debug prints, or ownerless TODO entries.
  • No unrelated formatting churn.
  • Public API changes are intentional and tested.

General Code Hygiene

  • Comments are limited to useful interface notes.
  • Added files end with a trailing newline.
  • git diff --check passes.
  • Identifiers in comments and error messages are wrapped in backticks.
  • Comments and error messages are in English.
  • Comments and error messages follow project conventions.

C++ Specific

  • Code follows the Google C++ style used in this repository.
  • clang-format --dry-run --Werror src/base/scaled_softmax.h src/native/ascend/ops/scaled_softmax/kernel.h passes.
  • Operator parameter order is inputs, attributes, outputs.
  • No exceptions are thrown.
  • Error message wording was reviewed.
  • Kernel launcher is in kernel.h, matching the local Ascend operator pattern.
  • Initializer list order matches member declaration order.
  • Blank line rules were reviewed.
  • New operator follows the src/base/<op>.h plus src/native/ascend/ops/<op>/ pattern.
  • No raw new / delete.

Python Specific

  • ruff format --check tests/test_scaled_softmax.py passes.
  • ruff check tests/test_scaled_softmax.py passes.
  • Test comments/docstrings follow project conventions.
  • Type hints are consistent with surrounding tests.

Testing

  • Ascend focused test was run and recorded above.
  • New functionality has matching tests under tests/.
  • Tests use pytest.mark.parametrize and pytest.mark.auto_act_and_assert.
  • Default device parameterization is used.
  • N/A: No bug fix regression test is needed for this feature PR.

Build, CI, and Tooling

  • Project builds through pip install .[dev] in the Ascend test container.
  • Python bindings regenerate during the build.
  • No new backend or device auto-detection was added.
  • No new runtime dependency was added.

Documentation

  • N/A: README.md / CONTRIBUTING.md changes are not needed for this small operator addition.
  • New public operator behavior is documented in the base header and tests.
  • N/A: No user-visible breaking change.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • No third-party code was added.
  • Pointer and descriptor usage follows existing Ascend AclTensorCache patterns.
