feat: add rmsnorm op on cambricon impl#19

Open
bitzyz wants to merge 8 commits into feat/dev-infra from feat/dev-rmsnorm-cambricon

Conversation

@bitzyz bitzyz commented Mar 12, 2026

  1. ✅ Modified data_type.h to support low-precision data types when WITH_CAMBRICON is defined
  2. ✅ Added the Cambricon RMSNorm op in src/cambricon/rms_norm
  3. ✅ Modified CMakeLists.txt to correctly handle .mlu files (using the cncc compiler)
  4. ✅ Added a definition for NRAM_MAX_SIZE
  5. ✅ Added a definition for the sumInternal function
  6. ✅ Used the BANG macro for conditional compilation of MLU-specific code
  7. ✅ Fixed the member initialization order in base/rms_norm.h
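
For context on item 7: in C++, non-static data members are initialized in the order they are declared in the class, regardless of the order written in the constructor's initializer list. The sketch below illustrates the safe layout with a hypothetical `NormShape` struct; the struct and its member names are assumptions for illustration and are not taken from base/rms_norm.h.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an operator base class; names are illustrative only.
struct NormShape {
  // Declaration order is the initialization order: dims_, then ndim_, then last_dim_.
  std::vector<std::size_t> dims_;
  std::size_t ndim_;
  std::size_t last_dim_;

  explicit NormShape(std::vector<std::size_t> dims)
      : dims_(std::move(dims)),  // initialized first
        ndim_(dims_.size()),     // safe: dims_ is already constructed
        last_dim_(ndim_ == 0 ? 0 : dims_[ndim_ - 1]) {}  // safe: ndim_ is already set
};
```

If `ndim_` were declared before `dims_`, the same initializer list would read `dims_.size()` from a vector that has not been constructed yet. Compilers typically flag a mismatch between the two orders with `-Wreorder`.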

@bitzyz bitzyz self-assigned this Mar 12, 2026
@bitzyz bitzyz force-pushed the feat/dev-rmsnorm-cambricon branch 3 times, most recently from 7655800 to 35e2a2a Compare March 13, 2026 03:35

bitzyz commented Mar 13, 2026

image

@bitzyz bitzyz force-pushed the feat/dev-rmsnorm-cambricon branch from 482aa74 to 0a49972 Compare March 16, 2026 06:36
@bitzyz bitzyz requested a review from voltjia March 16, 2026 06:38
@bitzyz bitzyz changed the title feat: add op on cambricon impl feat: add rmsnorm op on cambricon impl Mar 16, 2026
@bitzyz bitzyz force-pushed the feat/dev-rmsnorm-cambricon branch 3 times, most recently from 1449203 to c811f43 Compare March 19, 2026 03:20
@voltjia voltjia force-pushed the feat/dev-rmsnorm-cambricon branch from c811f43 to 1d6fe71 Compare March 20, 2026 09:53

voltjia commented Mar 24, 2026

NVIDIA

Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Processing /home/huangjiacheng/InfiniOps
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=391134 sha256=8c29800c01d85af8ab99748ab120211b19d524ea6f86e835bac6ce927efde0f0
  Stored in directory: /tmp/pip-ephem-wheel-cache-fu5w4g0o/wheels/b5/f3/6e/961028f73a2712eb7b753a23bdd1baaa714e27760fd92de56c
Successfully built InfiniOps
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
Successfully installed InfiniOps-0.1.0

[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: pip install --upgrade pip
============================= test session starts ==============================
platform linux -- Python 3.10.16, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/huangjiacheng/InfiniOps
configfile: pyproject.toml
testpaths: tests
plugins: anyio-4.12.1, xdist-3.8.0, cov-7.0.0, typeguard-4.4.4
collected 3372 items

tests/test_add.py ...................................................... [  1%]
........................................................................ [  3%]
........................................................................ [  5%]
..................                                                       [  6%]
tests/test_causal_softmax.py ....................................        [  7%]
tests/test_gemm.py ..................................................... [  9%]
........................................................................ [ 11%]
........................................................................ [ 13%]
........................................................................ [ 15%]
........................................................................ [ 17%]
........................................................................ [ 19%]
........................................................................ [ 21%]
........................................................................ [ 23%]
........................................................................ [ 26%]
........................................................................ [ 28%]
........................................................................ [ 30%]
........................................................................ [ 32%]
........................................................................ [ 34%]
........................................................................ [ 36%]
........................................................................ [ 38%]
........................................................................ [ 41%]
........................................................................ [ 43%]
........................................................................ [ 45%]
........................................................................ [ 47%]
........................................................................ [ 49%]
........................................................................ [ 51%]
........................................................................ [ 53%]
........................................................................ [ 56%]
........................................................................ [ 58%]
........................................................................ [ 60%]
........................................................................ [ 62%]
........................................................................ [ 64%]
........................................................................ [ 66%]
........................................................................ [ 68%]
........................................................................ [ 70%]
........................................................................ [ 73%]
........................................................................ [ 75%]
........................................................................ [ 77%]
........................................................................ [ 79%]
........................................................................ [ 81%]
........................................................................ [ 83%]
........................................................................ [ 85%]
........................................................................ [ 88%]
........................................................................ [ 90%]
........................................................................ [ 92%]
........................................................................ [ 94%]
...................................................................      [ 96%]
tests/test_rms_norm.py ................................................. [ 97%]
.......................                                                  [ 98%]
tests/test_swiglu.py ................................................    [100%]

======================= 3372 passed in 260.96s (0:04:20) =======================

MetaX

WARNING: The directory '/root/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Processing ./.
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=715997 sha256=77db25370cafaa85afd7734b9ec3f3fa1167bdb96758f862dd99118f7505cd49
  Stored in directory: /tmp/pip-ephem-wheel-cache-9dcyxl0l/wheels/b5/f3/6e/961028f73a2712eb7b753a23bdd1baaa714e27760fd92de56c
Successfully built InfiniOps
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
Successfully installed InfiniOps-0.1.0
============================= test session starts ==============================
platform linux -- Python 3.10.20, pytest-8.4.1, pluggy-1.6.0
rootdir: /home/huangjiacheng/InfiniOps
configfile: pyproject.toml
testpaths: tests
collected 3372 items

tests/test_add.py ...................................................... [  1%]
........................................................................ [  3%]
........................................................................ [  5%]
..................                                                       [  6%]
tests/test_causal_softmax.py ....................................        [  7%]
tests/test_gemm.py ..................................................... [  9%]
........................................................................ [ 11%]
........................................................................ [ 13%]
........................................................................ [ 15%]
........................................................................ [ 17%]
........................................................................ [ 19%]
........................................................................ [ 21%]
........................................................................ [ 23%]
........................................................................ [ 26%]
........................................................................ [ 28%]
........................................................................ [ 30%]
........................................................................ [ 32%]
........................................................................ [ 34%]
........................................................................ [ 36%]
........................................................................ [ 38%]
........................................................................ [ 41%]
........................................................................ [ 43%]
........................................................................ [ 45%]
........................................................................ [ 47%]
........................................................................ [ 49%]
........................................................................ [ 51%]
........................................................................ [ 53%]
........................................................................ [ 56%]
........................................................................ [ 58%]
........................................................................ [ 60%]
........................................................................ [ 62%]
........................................................................ [ 64%]
........................................................................ [ 66%]
........................................................................ [ 68%]
........................................................................ [ 70%]
........................................................................ [ 73%]
........................................................................ [ 75%]
........................................................................ [ 77%]
........................................................................ [ 79%]
........................................................................ [ 81%]
........................................................................ [ 83%]
........................................................................ [ 85%]
........................................................................ [ 88%]
........................................................................ [ 90%]
........................................................................ [ 92%]
........................................................................ [ 94%]
...................................................................      [ 96%]
tests/test_rms_norm.py ................................................. [ 97%]
.......................                                                  [ 98%]
tests/test_swiglu.py ................................................    [100%]

======================= 3372 passed in 311.44s (0:05:11) =======================

Iluvatar

WARNING: The directory '/root/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Processing ./.
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=335559 sha256=29e0f38b6c8c073faf6d8f1cfef452b035620ddf7b297ebc96eb8e06b5a9b58a
  Stored in directory: /tmp/pip-ephem-wheel-cache-6ytj2fc3/wheels/b5/f3/6e/961028f73a2712eb7b753a23bdd1baaa714e27760fd92de56c
Successfully built InfiniOps
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
Successfully installed InfiniOps-0.1.0
============================= test session starts ==============================
platform linux -- Python 3.10.19, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/huangjiacheng/InfiniOps
configfile: pyproject.toml
testpaths: tests
plugins: anyio-4.9.0, cov-7.0.0, xdist-3.8.0
collected 3372 items

tests/test_add.py ...................................................... [  1%]
........................................................................ [  3%]
........................................................................ [  5%]
..................                                                       [  6%]
tests/test_causal_softmax.py ....................................        [  7%]
tests/test_gemm.py ..................................................... [  9%]
........................................................................ [ 11%]
........................................................................ [ 13%]
........................................................................ [ 15%]
........................................................................ [ 17%]
........................................................................ [ 19%]
........................................................................ [ 21%]
........................................................................ [ 23%]
........................................................................ [ 26%]
........................................................................ [ 28%]
........................................................................ [ 30%]
........................................................................ [ 32%]
........................................................................ [ 34%]
........................................................................ [ 36%]
........................................................................ [ 38%]
........................................................................ [ 41%]
........................................................................ [ 43%]
........................................................................ [ 45%]
........................................................................ [ 47%]
........................................................................ [ 49%]
........................................................................ [ 51%]
........................................................................ [ 53%]
........................................................................ [ 56%]
........................................................................ [ 58%]
........................................................................ [ 60%]
........................................................................ [ 62%]
........................................................................ [ 64%]
........................................................................ [ 66%]
........................................................................ [ 68%]
........................................................................ [ 70%]
........................................................................ [ 73%]
........................................................................ [ 75%]
........................................................................ [ 77%]
........................................................................ [ 79%]
........................................................................ [ 81%]
........................................................................ [ 83%]
........................................................................ [ 85%]
........................................................................ [ 88%]
........................................................................ [ 90%]
........................................................................ [ 92%]
........................................................................ [ 94%]
...................................................................      [ 96%]
tests/test_rms_norm.py ................................................. [ 97%]
.......................                                                  [ 98%]
tests/test_swiglu.py ................................................    [100%]

======================= 3372 passed in 242.95s (0:04:02) =======================

Moore

WARNING: The directory '/root/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./.
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=344617 sha256=5d5568d49ae55ed475663c754b72961730f50db84f66071a32f01f5ed0eefedb
  Stored in directory: /tmp/pip-ephem-wheel-cache-ovgmerxc/wheels/b5/f3/6e/961028f73a2712eb7b753a23bdd1baaa714e27760fd92de56c
Successfully built InfiniOps
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
Successfully installed InfiniOps-0.1.0
============================= test session starts ==============================
platform linux -- Python 3.10.20, pytest-7.2.2, pluggy-1.6.0
rootdir: /home/huangjiacheng/InfiniOps, configfile: pyproject.toml, testpaths: tests
plugins: cov-7.0.0, xdist-3.8.0, hypothesis-6.145.0, anyio-4.10.0
collected 3372 items

tests/test_add.py ...................................................... [  1%]
........................................................................ [  3%]
......................................................ssssssssssssssssss [  5%]
ssssssssssssssssss                                                       [  6%]
tests/test_causal_softmax.py ..................Fatal Python error: Segmentation fault

Current thread 0x00007fce511a0740 (most recent call first):
  File "/home/huangjiacheng/InfiniOps/tests/test_causal_softmax.py", line 43 in _causal_softmax
  File "/home/huangjiacheng/InfiniOps/tests/conftest.py", line 80 in pytest_pyfunc_call
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/python.py", line 1789 in runtest
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 167 in pytest_runtest_call
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 260 in <lambda>
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 339 in from_call
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 259 in call_runtest_hook
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 220 in call_and_report
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 131 in runtestprotocol
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 112 in pytest_runtest_protocol
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 349 in pytest_runtestloop
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 324 in _main
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 270 in wrap_session
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/config/__init__.py", line 167 in main
  File "/usr/local/lib/python3.10/dist-packages/_pytest/config/__init__.py", line 190 in console_main
  File "/usr/local/lib/python3.10/dist-packages/pytest/__main__.py", line 5 in <module>
  File "/home/huangjiacheng/.local/share/uv/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/runpy.py", line 86 in _run_code
  File "/home/huangjiacheng/.local/share/uv/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, torch_musa._MUSAC, markupsafe._speedups (total: 26)

Cambricon

WARNING: The directory '/root/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
WARNING: Ignoring invalid distribution -orch (/home/huangjiacheng/.venv/lib/python3.10/site-packages)
Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Processing ./.
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_aarch64.whl size=175482 sha256=efe67dcb8982e6996bb1b3d329a6040a44bdd3225b1876b3da75d3a5ef643abb
  Stored in directory: /tmp/pip-ephem-wheel-cache-z9s23za6/wheels/b5/f3/6e/961028f73a2712eb7b753a23bdd1baaa714e27760fd92de56c
Successfully built InfiniOps
WARNING: Ignoring invalid distribution -orch (/home/huangjiacheng/.venv/lib/python3.10/site-packages)
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
WARNING: Ignoring invalid distribution -orch (/home/huangjiacheng/.venv/lib/python3.10/site-packages)
Successfully installed InfiniOps-0.1.0
============================= test session starts ==============================
platform linux -- Python 3.10.20, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/huangjiacheng/InfiniOps
configfile: pyproject.toml
testpaths: tests
plugins: xdist-3.8.0, cov-7.0.0, hypothesis-6.135.14
collected 3300 items

tests/test_add.py ...................................................... [  1%]
..................Fatal Python error: Segmentation fault

Current thread 0x0000fffe3a5a5090 (most recent call first):
  File "/home/huangjiacheng/InfiniOps/tests/test_add.py", line 66 in _add
  File "/home/huangjiacheng/InfiniOps/tests/conftest.py", line 80 in pytest_pyfunc_call
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/python.py", line 1720 in runtest
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 179 in pytest_runtest_call
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 245 in <lambda>
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 353 in from_call
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 244 in call_and_report
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 137 in runtestprotocol
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 118 in pytest_runtest_protocol
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/main.py", line 396 in pytest_runtestloop
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/main.py", line 372 in _main
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/main.py", line 318 in wrap_session
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/main.py", line 365 in pytest_cmdline_main
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 199 in main
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 223 in console_main
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pytest/__main__.py", line 9 in <module>
  File "/home/huangjiacheng/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/runpy.py", line 86 in _run_code
  File "/home/huangjiacheng/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, torch_mlu._MLUC, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing (total: 61)

Comment on lines +34 to +61
DispatchFunc<
List<DataType::kFloat16, DataType::kBFloat16, DataType::kFloat32>,
List<Device::Type::kCambricon>>(
{static_cast<int64_t>(input.dtype()),
static_cast<int64_t>(Device::Type::kCambricon)},
0,
[&](auto input_tag) {
constexpr DataType IDT = static_cast<DataType>(ListGet<0>(input_tag));
using InputT = TypeMapType<IDT>;
DispatchFunc<
List<DataType::kFloat16, DataType::kBFloat16, DataType::kFloat32>,
List<Device::Type::kCambricon>>(
{static_cast<int64_t>(weight.dtype()),
static_cast<int64_t>(Device::Type::kCambricon)},
0,
[&](auto weight_tag) {
constexpr DataType WDT =
static_cast<DataType>(ListGet<0>(weight_tag));
using WeightT = TypeMapType<WDT>;

RmsnormUnion<InputT, WeightT>(
workspace, core_per_cluster, cluster_count, queue,
out.data(), input.data(), weight.data(), out_shape_.data(),
out_strides_.data(), input_strides_.data(), eps, ndim_);
},
"CambriconRmsNorm::operator() - weight dispatch", List<>{});
},
"CambriconRmsNorm::operator() - output dispatch", List<>{});
Collaborator

  1. Is dispatching on Device::Type::kCambricon still necessary here?
  2. There is no need to nest two DispatchFunc() calls here; a single call can dispatch over everything.
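
One way to read the second point: both runtime dtypes can be resolved by a single call that hands the callback two compile-time tags at once. The sketch below is a standalone illustration, not the project's `DispatchFunc`, whose actual signature may differ. `Tag`, `DispatchOne`, `DispatchPair`, and `PairCode` are hypothetical names, and the code assumes C++17 for fold expressions.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for the project's DataType and List.
enum class DataType : std::int64_t { kFloat16, kBFloat16, kFloat32 };

template <DataType... kValues>
struct List {};

template <DataType kValue>
using Tag = std::integral_constant<DataType, kValue>;

// Calls f(Tag<kValue>{}) for the list entry equal to `value`.
// Returns false when no entry matches.
template <typename F, DataType... kValues>
bool DispatchOne(DataType value, List<kValues...>, F&& f) {
  return ((value == kValues ? (f(Tag<kValues>{}), true) : false) || ...);
}

// Resolves two runtime dtypes and invokes f with both tags, so the
// call site needs only one dispatch call and one callback.
template <typename ListA, typename ListB, typename F>
void DispatchPair(DataType a, DataType b, F&& f, const std::string& context) {
  bool matched = false;
  DispatchOne(a, ListA{}, [&](auto a_tag) {
    matched = DispatchOne(b, ListB{}, [&](auto b_tag) { f(a_tag, b_tag); });
  });
  if (!matched) {
    throw std::runtime_error(context + ": unsupported dtype pair");
  }
}

// Example call site: encodes the resolved pair as 10 * input + weight.
int PairCode(DataType input, DataType weight) {
  using Supported =
      List<DataType::kFloat16, DataType::kBFloat16, DataType::kFloat32>;
  int code = -1;
  DispatchPair<Supported, Supported>(
      input, weight,
      [&](auto input_tag, auto weight_tag) {
        constexpr DataType kInput = decltype(input_tag)::value;
        constexpr DataType kWeight = decltype(weight_tag)::value;
        code = static_cast<int>(kInput) * 10 + static_cast<int>(kWeight);
      },
      "PairCode");
  return code;
}
```

The nesting still exists inside `DispatchPair`, but it is written once in the helper rather than repeated at every operator's call site.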

Author

Changed to a single dispatch.

Comment on lines +13 to +14
class Operator<Add, Device::Type::kCpu> : public Add,
Caster<Device::Type::kCpu> {
Collaborator

Why does this still need to inherit from Caster<Device::Type::kCpu>?

Collaborator

Otherwise, where would Cast come from?
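
The reply relies on the mixin pattern: a base class supplies a helper that the derived operator then calls unqualified. The sketch below illustrates that pattern only. The real `Caster` interface is not shown in this thread, so the `Cast` signature here is an assumption, and `DeviceType`, `Add`, and `Compute` are simplified stand-ins.

```cpp
// Simplified stand-in for Device::Type.
enum class DeviceType { kCpu, kCambricon };

// Hypothetical mixin: gives any class that inherits from it a
// device-specific Cast helper.
template <DeviceType kDevice>
struct Caster;

template <>
struct Caster<DeviceType::kCpu> {
  template <typename To, typename From>
  static To Cast(From value) {
    return static_cast<To>(value);
  }
};

// Stand-in for the device-independent operator base.
struct Add {};

template <typename Op, DeviceType kDevice>
class Operator;

// Same inheritance shape as in the diff: public Add, private Caster.
template <>
class Operator<Add, DeviceType::kCpu> : public Add, Caster<DeviceType::kCpu> {
 public:
  float Compute(int a, int b) const {
    // Cast is found through the Caster base; without that base this
    // unqualified call would not compile.
    return Cast<float>(a) + Cast<float>(b);
  }
};
```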

@bitzyz
Author

bitzyz commented Mar 24, 2026

image

@@ -0,0 +1,352 @@
#define WITH_CAMBRICON
Collaborator

This #define shouldn't be needed here.
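
A minimal sketch of the alternative, assuming the build system already passes the macro to the compiler (for example as a `-DWITH_CAMBRICON` flag): the source only tests the macro and never defines it. `kWithCambricon` and `BuildTarget` are hypothetical names used for illustration.

```cpp
// WITH_CAMBRICON is assumed to come from the build system, not from
// a #define in this file.
#ifdef WITH_CAMBRICON
inline constexpr bool kWithCambricon = true;
#else
inline constexpr bool kWithCambricon = false;
#endif

// Hypothetical helper: reports which code path this translation unit
// was compiled for.
inline const char* BuildTarget() {
  return kWithCambricon ? "cambricon" : "generic";
}
```

Keeping the definition in one place avoids a file silently enabling the Cambricon path in a build that was configured without it.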

namespace infini::ops {

template <typename T, typename Tw>
__mlu_global__ void Rmsnorm(T *output, const T *input, const Tw *weight,
Collaborator

This should be renamed to RmsNorm; the same applies to the later occurrences. The parameters should be ordered with inputs first and outputs last; the same applies to the later functions as well.
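
To illustrate the suggested naming and parameter order, here is a host-side reference for a single contiguous row. It is a sketch of the standard RMSNorm definition, `output[i] = input[i] * weight[i] / sqrt(mean(input^2) + epsilon)`, and not the MLU kernel from this PR. The flat, stride-free signature is an assumption made to keep the example short.

```cpp
#include <cmath>
#include <cstddef>

// Reference RMSNorm over one row of norm_dim_size elements.
// Inputs (input, weight, epsilon, norm_dim_size) come first and the
// output comes last.
template <typename T, typename TWeight>
void RmsNorm(const T* input, const TWeight* weight, float epsilon,
             std::size_t norm_dim_size, T* output) {
  if (norm_dim_size == 0) {
    return;  // Nothing to normalize.
  }

  // Accumulate the sum of squares in float, regardless of T.
  float sum_squares = 0.0f;
  for (std::size_t i = 0; i < norm_dim_size; ++i) {
    const float value = static_cast<float>(input[i]);
    sum_squares += value * value;
  }

  const float mean_square = sum_squares / static_cast<float>(norm_dim_size);
  const float inv_rms = 1.0f / std::sqrt(mean_square + epsilon);

  for (std::size_t i = 0; i < norm_dim_size; ++i) {
    output[i] = static_cast<T>(static_cast<float>(input[i]) * inv_rms *
                               static_cast<float>(weight[i]));
  }
}
```

A reference like this can also serve as the expected-value oracle when testing a device kernel against small inputs.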

size_t *shape, ptrdiff_t *output_strides,
ptrdiff_t *input_strides, float epsilon,
int num_dims, int norm_dim_size) {
// Calculate problem dimensions
Collaborator

This comment needs a period at the end. I suggest checking all the comments.

namespace infini::ops {

template <typename T, typename Tw>
void RmsnormUnion(void *workspace, int core_per_cluster, int cluster_count,
Collaborator

In principle, this kind of forward declaration shouldn't be needed. As on other platforms, the kernel.mlu above should be a header file; in CUDA, for example, it would be kernel.cuh. I haven't developed kernels for Cambricon, so I don't know the right extension, whether it reuses .mlu or uses something like .mluh, but in any case it should be a header file.


namespace infini::ops {

template <typename T, typename Tw>
Collaborator

The second template parameter should be renamed to something like TW, since it presumably stands for two words; alternatively, spell it out as something like TWeight. I'm not sure whether this Tw actually means TWeight; that is just an example. The kernel's implementation logic is out of scope for this review for now.

Collaborator

Is this file needed right now? It doesn't seem to be used at all. Try removing it and compiling; if it isn't used yet, leave it out for now and add it when it is actually needed.
