feat(ascend): add embedding operator#598

Draft
zhangyue207 wants to merge 1 commit into InfiniTensor:master from zhangyue207:feat/ascend-embedding-operator

@zhangyue207
Collaborator

Summary

  • Add the Embedding base operator for inference-time token lookup.
  • Add the Ascend ACLNN implementation backed by aclnnEmbedding.
  • Add focused Ascend tests for 1D/2D token ids, int32/int64 indices, and float32/float16/bfloat16 weights.
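
The coverage listed above can be read as a small parameter grid. The sketch below enumerates it using only the standard library. The axis names (`ID_SHAPES`, `INDEX_DTYPES`, `WEIGHT_DTYPES`) are hypothetical labels chosen for illustration, and the assumption that the test file takes a full cross product is not confirmed by the PR text. A full cross product does match the reported count of 12 passed tests.

```python
from itertools import product

# Hypothetical axis labels; the real parametrization in tests/test_embedding.py
# is not shown in this PR description and may be organized differently.
ID_SHAPES = ("1d", "2d")
INDEX_DTYPES = ("int32", "int64")
WEIGHT_DTYPES = ("float32", "float16", "bfloat16")

# Assumed full cross product of the three axes: 2 * 2 * 3 combinations.
cases = list(product(ID_SHAPES, INDEX_DTYPES, WEIGHT_DTYPES))

for shape, index_dtype, weight_dtype in cases:
    print(f"ids={shape:<2} index={index_dtype:<5} weight={weight_dtype}")

print(len(cases))  # 12
```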

Motivation

Embedding is needed for Ascend inference workflows that map token ids to hidden states before decoder execution.

Closes N/A

API Alignment

  • Source checked: PyTorch torch.nn.functional.embedding(input, weight, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False).
  • InfiniOps signature: embedding(input_ids, weight, out).
  • Intentional deviation: this PR implements the inference subset only. Training-oriented options such as padding_idx, max_norm, scale_grad_by_freq, and sparse are intentionally out of scope.
  • Test coverage: tests/test_embedding.py compares against torch.nn.functional.embedding(input_ids.long(), weight).
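
As an illustration of the inference subset described above, here is a minimal pure-Python sketch of the lookup semantics: each token id selects one row of `weight`, and the output keeps the nesting of the ids with one extra trailing dimension. The helper name `embedding_reference` is hypothetical and is not part of InfiniOps. Unlike the `embedding(input_ids, weight, out)` signature, it returns a new list instead of writing into `out`, and its out-of-range check is an assumption made for the sketch.

```python
def embedding_reference(input_ids, weight):
    """Look up rows of `weight` by token id, preserving the nesting of `input_ids`.

    Hypothetical reference helper for illustration only. It covers the
    inference subset: no padding_idx, max_norm, scale_grad_by_freq, or sparse.
    """
    if isinstance(input_ids, int):
        # Assumed behavior for this sketch: reject ids outside the table.
        if not 0 <= input_ids < len(weight):
            raise IndexError(f"token id {input_ids} is out of range")
        # Copy the row so the caller cannot mutate the table through the result.
        return list(weight[input_ids])
    # Recurse over 1D or 2D (or deeper) nested id sequences.
    return [embedding_reference(item, weight) for item in input_ids]


# A 3-row table with hidden size 2.
weight = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]

out_1d = embedding_reference([2, 0], weight)
out_2d = embedding_reference([[0, 1], [2, 2]], weight)

print(out_1d)  # [[2.0, 2.1], [0.0, 0.1]]
print(out_2d)  # [[[0.0, 0.1], [1.0, 1.1]], [[2.0, 2.1], [2.0, 2.1]]]
```

With ids of shape `(batch, seq)` and a table of shape `(vocab, hidden)`, the result has shape `(batch, seq, hidden)`, which is the shape `out` would need in the three-argument form.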

Type of Change

  • feat - new feature / new operator / new platform
  • fix - bug fix
  • perf - performance improvement (no behavioral change)
  • refactor - code restructuring without behavior change
  • test - adding or fixing tests only
  • docs - documentation only
  • build / ci - build system or CI configuration
  • chore - tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

| Platform  | Built | pytest Result | Notes / Hardware |
|-----------|-------|---------------|------------------|
| NVIDIA    | N/A   | N/A           | Not touched. |
| Iluvatar  | N/A   | N/A           | Not touched. |
| MetaX     | N/A   | N/A           | Not touched. |
| Cambricon | N/A   | N/A           | Not touched. |
| Moore     | N/A   | N/A           | Not touched. |
| Ascend    | Yes   | 12 passed     | 910B4, infiniops-pr-ascend-0512, ASCEND_VISIBLE_DEVICES=0. |

Full `pytest` output
python3 -m pip install .[dev] --no-build-isolation -C cmake.define.WITH_ASCEND=ON -C cmake.define.BUILD_ASCEND_CUSTOM=OFF -C cmake.define.AUTO_DETECT_DEVICES=OFF -C cmake.define.GENERATE_PYTHON_BINDINGS=ON
pytest tests/test_embedding.py --devices ascend -v --tb=short

tests/test_embedding.py::test_embedding[...] PASSED

============================== 12 passed in 1.42s ==============================

Benchmark / Performance Impact

N/A

Notes for Reviewers

  • This is intentionally limited to the inference subset of PyTorch embedding.
  • The Ascend path supports float16, bfloat16, and float32 weights.

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits.
  • Branch name follows <type>/xxx-yyyy-zzzz.
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit.
  • No stray merge commits from master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal.
  • No dead code, commented-out blocks, debug prints, or ownerless TODO entries.
  • No unrelated formatting churn.
  • Public API changes are intentional and tested.

General Code Hygiene

  • Comments are limited to useful interface notes.
  • Added files end with a trailing newline.
  • git diff --check passes.
  • Identifiers in comments and error messages are wrapped in backticks.
  • Comments and error messages are in English.
  • Comments and error messages follow project conventions.

C++ Specific

  • Code follows the Google C++ style used in this repository.
  • clang-format --dry-run --Werror src/base/embedding.h src/native/ascend/ops/embedding/kernel.h passes.
  • Operator parameter order is inputs, attributes, outputs.
  • No exceptions are thrown.
  • Error message wording was reviewed.
  • Kernel launcher is in kernel.h, matching the local Ascend operator pattern.
  • Initializer list order matches member declaration order.
  • Blank line rules were reviewed.
  • New operator follows the src/base/<op>.h plus src/native/ascend/ops/<op>/ pattern.
  • No raw new / delete.

Python Specific

  • ruff format --check tests/test_embedding.py passes.
  • ruff check tests/test_embedding.py passes.
  • Test comments/docstrings follow project conventions.
  • Type hints are consistent with surrounding tests.

Testing

  • Ascend focused test was run and recorded above.
  • New functionality has matching tests under tests/.
  • Tests use pytest.mark.parametrize and pytest.mark.auto_act_and_assert.
  • Default device parameterization is used.
  • N/A: No bug fix regression test is needed for this feature PR.

Build, CI, and Tooling

  • Project builds through pip install .[dev] in the Ascend test container.
  • Python bindings regenerate during the build.
  • No new backend or device auto-detection was added.
  • No new runtime dependency was added.

Documentation

  • N/A: README.md / CONTRIBUTING.md changes are not needed for this small operator addition.
  • New public operator behavior is documented in the base header.
  • N/A: No user-visible breaking change.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • No third-party code was added.
  • Pointer and descriptor usage follows existing Ascend AclTensorCache patterns.
