feat: add rmsnorm op on cambricon impl by bitzyz · Pull Request #19 · InfiniTensor/InfiniOps

bitzyz · 2026-03-12T01:48:18Z

✅ Modified data_type.h to support low-precision types under WITH_CAMBRICON
✅ Added the Cambricon RMSNorm op in src/cambricon/rms_norm
✅ Modified CMakeLists.txt to handle .mlu files correctly (compiled with cncc)
✅ Added a definition for NRAM_MAX_SIZE
✅ Added a definition for the sumInternal function
✅ Used the `__BANG__` macro for conditional compilation of MLU-specific code
✅ Fixed member initialization order in base/rms_norm.h
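The last item refers to a general C++ rule: non-static data members are initialized in the order they are declared in the class, not in the order they appear in the constructor's initializer list. The actual members of base/rms_norm.h are not shown in this thread, so the class below is a hypothetical sketch (`RmsNormMeta` and its fields are illustrative names) of the kind of bug such a fix usually addresses:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class: it only mirrors the kind of shape metadata an
// RMSNorm base operator might cache, not the real base/rms_norm.h.
class RmsNormMeta {
 public:
  explicit RmsNormMeta(std::vector<std::size_t> shape)
      // Listed in declaration order, so shape_ is fully constructed
      // before ndim_ and dim_ read from it.
      : shape_(std::move(shape)),
        ndim_(shape_.size()),
        dim_(shape_.empty() ? 0 : shape_.back()) {}

  std::size_t ndim() const { return ndim_; }
  std::size_t dim() const { return dim_; }

 private:
  // Initialization always follows THIS order. If ndim_ were declared
  // above shape_, ndim_(shape_.size()) would touch a not-yet-constructed
  // vector (undefined behavior), even with the initializer list unchanged.
  std::vector<std::size_t> shape_;
  std::size_t ndim_;
  std::size_t dim_;
};
```

GCC enables `-Wreorder` under `-Wall` to flag an initializer list whose order differs from the declaration order, and Clang accepts the same flag, which is a cheap way to catch this class of bug.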

bitzyz · 2026-03-13T06:19:08Z

voltjia · 2026-03-24T07:29:28Z

NVIDIA

Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Processing /home/huangjiacheng/InfiniOps
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=391134 sha256=8c29800c01d85af8ab99748ab120211b19d524ea6f86e835bac6ce927efde0f0
  Stored in directory: /tmp/pip-ephem-wheel-cache-fu5w4g0o/wheels/b5/f3/6e/961028f73a2712eb7b753a23bdd1baaa714e27760fd92de56c
Successfully built InfiniOps
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
Successfully installed InfiniOps-0.1.0

[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: pip install --upgrade pip
============================= test session starts ==============================
platform linux -- Python 3.10.16, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/huangjiacheng/InfiniOps
configfile: pyproject.toml
testpaths: tests
plugins: anyio-4.12.1, xdist-3.8.0, cov-7.0.0, typeguard-4.4.4
collected 3372 items

tests/test_add.py ...................................................... [  1%]
........................................................................ [  3%]
........................................................................ [  5%]
..................                                                       [  6%]
tests/test_causal_softmax.py ....................................        [  7%]
tests/test_gemm.py ..................................................... [  9%]
........................................................................ [ 11%]
........................................................................ [ 13%]
........................................................................ [ 15%]
........................................................................ [ 17%]
........................................................................ [ 19%]
........................................................................ [ 21%]
........................................................................ [ 23%]
........................................................................ [ 26%]
........................................................................ [ 28%]
........................................................................ [ 30%]
........................................................................ [ 32%]
........................................................................ [ 34%]
........................................................................ [ 36%]
........................................................................ [ 38%]
........................................................................ [ 41%]
........................................................................ [ 43%]
........................................................................ [ 45%]
........................................................................ [ 47%]
........................................................................ [ 49%]
........................................................................ [ 51%]
........................................................................ [ 53%]
........................................................................ [ 56%]
........................................................................ [ 58%]
........................................................................ [ 60%]
........................................................................ [ 62%]
........................................................................ [ 64%]
........................................................................ [ 66%]
........................................................................ [ 68%]
........................................................................ [ 70%]
........................................................................ [ 73%]
........................................................................ [ 75%]
........................................................................ [ 77%]
........................................................................ [ 79%]
........................................................................ [ 81%]
........................................................................ [ 83%]
........................................................................ [ 85%]
........................................................................ [ 88%]
........................................................................ [ 90%]
........................................................................ [ 92%]
........................................................................ [ 94%]
...................................................................      [ 96%]
tests/test_rms_norm.py ................................................. [ 97%]
.......................                                                  [ 98%]
tests/test_swiglu.py ................................................    [100%]

======================= 3372 passed in 260.96s (0:04:20) =======================

MetaX

WARNING: The directory '/root/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Processing ./.
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=715997 sha256=77db25370cafaa85afd7734b9ec3f3fa1167bdb96758f862dd99118f7505cd49
  Stored in directory: /tmp/pip-ephem-wheel-cache-9dcyxl0l/wheels/b5/f3/6e/961028f73a2712eb7b753a23bdd1baaa714e27760fd92de56c
Successfully built InfiniOps
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
Successfully installed InfiniOps-0.1.0
============================= test session starts ==============================
platform linux -- Python 3.10.20, pytest-8.4.1, pluggy-1.6.0
rootdir: /home/huangjiacheng/InfiniOps
configfile: pyproject.toml
testpaths: tests
collected 3372 items

tests/test_add.py ...................................................... [  1%]
........................................................................ [  3%]
........................................................................ [  5%]
..................                                                       [  6%]
tests/test_causal_softmax.py ....................................        [  7%]
tests/test_gemm.py ..................................................... [  9%]
........................................................................ [ 11%]
........................................................................ [ 13%]
........................................................................ [ 15%]
........................................................................ [ 17%]
........................................................................ [ 19%]
........................................................................ [ 21%]
........................................................................ [ 23%]
........................................................................ [ 26%]
........................................................................ [ 28%]
........................................................................ [ 30%]
........................................................................ [ 32%]
........................................................................ [ 34%]
........................................................................ [ 36%]
........................................................................ [ 38%]
........................................................................ [ 41%]
........................................................................ [ 43%]
........................................................................ [ 45%]
........................................................................ [ 47%]
........................................................................ [ 49%]
........................................................................ [ 51%]
........................................................................ [ 53%]
........................................................................ [ 56%]
........................................................................ [ 58%]
........................................................................ [ 60%]
........................................................................ [ 62%]
........................................................................ [ 64%]
........................................................................ [ 66%]
........................................................................ [ 68%]
........................................................................ [ 70%]
........................................................................ [ 73%]
........................................................................ [ 75%]
........................................................................ [ 77%]
........................................................................ [ 79%]
........................................................................ [ 81%]
........................................................................ [ 83%]
........................................................................ [ 85%]
........................................................................ [ 88%]
........................................................................ [ 90%]
........................................................................ [ 92%]
........................................................................ [ 94%]
...................................................................      [ 96%]
tests/test_rms_norm.py ................................................. [ 97%]
.......................                                                  [ 98%]
tests/test_swiglu.py ................................................    [100%]

======================= 3372 passed in 311.44s (0:05:11) =======================

Iluvatar

WARNING: The directory '/root/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Processing ./.
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=335559 sha256=29e0f38b6c8c073faf6d8f1cfef452b035620ddf7b297ebc96eb8e06b5a9b58a
  Stored in directory: /tmp/pip-ephem-wheel-cache-6ytj2fc3/wheels/b5/f3/6e/961028f73a2712eb7b753a23bdd1baaa714e27760fd92de56c
Successfully built InfiniOps
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
Successfully installed InfiniOps-0.1.0
============================= test session starts ==============================
platform linux -- Python 3.10.19, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/huangjiacheng/InfiniOps
configfile: pyproject.toml
testpaths: tests
plugins: anyio-4.9.0, cov-7.0.0, xdist-3.8.0
collected 3372 items

tests/test_add.py ...................................................... [  1%]
........................................................................ [  3%]
........................................................................ [  5%]
..................                                                       [  6%]
tests/test_causal_softmax.py ....................................        [  7%]
tests/test_gemm.py ..................................................... [  9%]
........................................................................ [ 11%]
........................................................................ [ 13%]
........................................................................ [ 15%]
........................................................................ [ 17%]
........................................................................ [ 19%]
........................................................................ [ 21%]
........................................................................ [ 23%]
........................................................................ [ 26%]
........................................................................ [ 28%]
........................................................................ [ 30%]
........................................................................ [ 32%]
........................................................................ [ 34%]
........................................................................ [ 36%]
........................................................................ [ 38%]
........................................................................ [ 41%]
........................................................................ [ 43%]
........................................................................ [ 45%]
........................................................................ [ 47%]
........................................................................ [ 49%]
........................................................................ [ 51%]
........................................................................ [ 53%]
........................................................................ [ 56%]
........................................................................ [ 58%]
........................................................................ [ 60%]
........................................................................ [ 62%]
........................................................................ [ 64%]
........................................................................ [ 66%]
........................................................................ [ 68%]
........................................................................ [ 70%]
........................................................................ [ 73%]
........................................................................ [ 75%]
........................................................................ [ 77%]
........................................................................ [ 79%]
........................................................................ [ 81%]
........................................................................ [ 83%]
........................................................................ [ 85%]
........................................................................ [ 88%]
........................................................................ [ 90%]
........................................................................ [ 92%]
........................................................................ [ 94%]
...................................................................      [ 96%]
tests/test_rms_norm.py ................................................. [ 97%]
.......................                                                  [ 98%]
tests/test_swiglu.py ................................................    [100%]

======================= 3372 passed in 242.95s (0:04:02) =======================

Moore

WARNING: The directory '/root/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./.
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_x86_64.whl size=344617 sha256=5d5568d49ae55ed475663c754b72961730f50db84f66071a32f01f5ed0eefedb
  Stored in directory: /tmp/pip-ephem-wheel-cache-ovgmerxc/wheels/b5/f3/6e/961028f73a2712eb7b753a23bdd1baaa714e27760fd92de56c
Successfully built InfiniOps
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
Successfully installed InfiniOps-0.1.0
============================= test session starts ==============================
platform linux -- Python 3.10.20, pytest-7.2.2, pluggy-1.6.0
rootdir: /home/huangjiacheng/InfiniOps, configfile: pyproject.toml, testpaths: tests
plugins: cov-7.0.0, xdist-3.8.0, hypothesis-6.145.0, anyio-4.10.0
collected 3372 items

tests/test_add.py ...................................................... [  1%]
........................................................................ [  3%]
......................................................ssssssssssssssssss [  5%]
ssssssssssssssssss                                                       [  6%]
tests/test_causal_softmax.py ..................Fatal Python error: Segmentation fault

Current thread 0x00007fce511a0740 (most recent call first):
  File "/home/huangjiacheng/InfiniOps/tests/test_causal_softmax.py", line 43 in _causal_softmax
  File "/home/huangjiacheng/InfiniOps/tests/conftest.py", line 80 in pytest_pyfunc_call
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/python.py", line 1789 in runtest
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 167 in pytest_runtest_call
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 260 in <lambda>
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 339 in from_call
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 259 in call_runtest_hook
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 220 in call_and_report
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 131 in runtestprotocol
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 112 in pytest_runtest_protocol
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 349 in pytest_runtestloop
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 324 in _main
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 270 in wrap_session
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 317 in pytest_cmdline_main
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/config/__init__.py", line 167 in main
  File "/usr/local/lib/python3.10/dist-packages/_pytest/config/__init__.py", line 190 in console_main
  File "/usr/local/lib/python3.10/dist-packages/pytest/__main__.py", line 5 in <module>
  File "/home/huangjiacheng/.local/share/uv/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/runpy.py", line 86 in _run_code
  File "/home/huangjiacheng/.local/share/uv/python/cpython-3.10-linux-x86_64-gnu/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, torch_musa._MUSAC, markupsafe._speedups (total: 26)

Cambricon

WARNING: The directory '/root/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
WARNING: Ignoring invalid distribution -orch (/home/huangjiacheng/.venv/lib/python3.10/site-packages)
Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Processing ./.
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: InfiniOps
  Building wheel for InfiniOps (pyproject.toml): started
  Building wheel for InfiniOps (pyproject.toml): finished with status 'done'
  Created wheel for InfiniOps: filename=infiniops-0.1.0-cp310-cp310-linux_aarch64.whl size=175482 sha256=efe67dcb8982e6996bb1b3d329a6040a44bdd3225b1876b3da75d3a5ef643abb
  Stored in directory: /tmp/pip-ephem-wheel-cache-z9s23za6/wheels/b5/f3/6e/961028f73a2712eb7b753a23bdd1baaa714e27760fd92de56c
Successfully built InfiniOps
WARNING: Ignoring invalid distribution -orch (/home/huangjiacheng/.venv/lib/python3.10/site-packages)
Installing collected packages: InfiniOps
  Attempting uninstall: InfiniOps
    Found existing installation: InfiniOps 0.1.0
    Uninstalling InfiniOps-0.1.0:
      Successfully uninstalled InfiniOps-0.1.0
WARNING: Ignoring invalid distribution -orch (/home/huangjiacheng/.venv/lib/python3.10/site-packages)
Successfully installed InfiniOps-0.1.0
============================= test session starts ==============================
platform linux -- Python 3.10.20, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/huangjiacheng/InfiniOps
configfile: pyproject.toml
testpaths: tests
plugins: xdist-3.8.0, cov-7.0.0, hypothesis-6.135.14
collected 3300 items

tests/test_add.py ...................................................... [  1%]
..................Fatal Python error: Segmentation fault

Current thread 0x0000fffe3a5a5090 (most recent call first):
  File "/home/huangjiacheng/InfiniOps/tests/test_add.py", line 66 in _add
  File "/home/huangjiacheng/InfiniOps/tests/conftest.py", line 80 in pytest_pyfunc_call
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/python.py", line 1720 in runtest
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 179 in pytest_runtest_call
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 245 in <lambda>
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 353 in from_call
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 244 in call_and_report
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 137 in runtestprotocol
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/runner.py", line 118 in pytest_runtest_protocol
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/main.py", line 396 in pytest_runtestloop
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/main.py", line 372 in _main
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/main.py", line 318 in wrap_session
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/main.py", line 365 in pytest_cmdline_main
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 199 in main
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/_pytest/config/__init__.py", line 223 in console_main
  File "/home/huangjiacheng/.venv/lib/python3.10/site-packages/pytest/__main__.py", line 9 in <module>
  File "/home/huangjiacheng/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/runpy.py", line 86 in _run_code
  File "/home/huangjiacheng/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/runpy.py", line 196 in _run_module_as_main

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, torch_mlu._MLUC, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing (total: 61)

Ziminli · 2026-03-24T07:51:16Z

src/cambricon/rmsnorm/rms_norm.h

+    DispatchFunc<
+        List<DataType::kFloat16, DataType::kBFloat16, DataType::kFloat32>,
+        List<Device::Type::kCambricon>>(
+        {static_cast<int64_t>(input.dtype()),
+         static_cast<int64_t>(Device::Type::kCambricon)},
+        0,
+        [&](auto input_tag) {
+          constexpr DataType IDT = static_cast<DataType>(ListGet<0>(input_tag));
+          using InputT = TypeMapType<IDT>;
+          DispatchFunc<
+              List<DataType::kFloat16, DataType::kBFloat16, DataType::kFloat32>,
+              List<Device::Type::kCambricon>>(
+              {static_cast<int64_t>(weight.dtype()),
+               static_cast<int64_t>(Device::Type::kCambricon)},
+              0,
+              [&](auto weight_tag) {
+                constexpr DataType WDT =
+                    static_cast<DataType>(ListGet<0>(weight_tag));
+                using WeightT = TypeMapType<WDT>;
+
+                RmsnormUnion<InputT, WeightT>(
+                    workspace, core_per_cluster, cluster_count, queue,
+                    out.data(), input.data(), weight.data(), out_shape_.data(),
+                    out_strides_.data(), input_strides_.data(), eps, ndim_);
+              },
+              "CambriconRmsNorm::operator() - weight dispatch", List<>{});
+        },
+        "CambriconRmsNorm::operator() - output dispatch", List<>{});


Is dispatching on Device::Type::kCambricon still necessary here?

There's no need to nest two DispatchFunc() calls here; a single one can dispatch all of them.

Changed to a single dispatch.
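Collapsing the two nested dispatches means switching once over the (input dtype, weight dtype) pair, instead of dispatching on one dtype and then again inside the callback. The self-contained sketch below shows the idea with hypothetical names (`DataType`, `TypeTag`, `PairKey`, `DispatchPair`, `Name`). It does not reproduce InfiniOps' actual `DispatchFunc`/`List` API, whose exact signature isn't shown in this thread, and it uses `float`/`double` in place of the PR's FP16/BF16/FP32 list.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for illustration only; InfiniOps' real DataType,
// DispatchFunc and List types are not reproduced here.
enum class DataType : int64_t { kFloat32 = 0, kFloat64 = 1 };
constexpr int64_t kNumDataTypes = 2;

template <typename T>
struct TypeTag {
  using type = T;
};

// Fold the two runtime dtypes into one key so that a single switch
// enumerates every (input, weight) combination.
constexpr int64_t PairKey(DataType input, DataType weight) {
  return static_cast<int64_t>(input) * kNumDataTypes +
         static_cast<int64_t>(weight);
}

// Single dispatch: `func` receives one type tag per dtype and is
// instantiated once for each supported pair.
template <typename Func>
auto DispatchPair(DataType input, DataType weight, Func&& func) {
  switch (PairKey(input, weight)) {
    case PairKey(DataType::kFloat32, DataType::kFloat32):
      return func(TypeTag<float>{}, TypeTag<float>{});
    case PairKey(DataType::kFloat32, DataType::kFloat64):
      return func(TypeTag<float>{}, TypeTag<double>{});
    case PairKey(DataType::kFloat64, DataType::kFloat32):
      return func(TypeTag<double>{}, TypeTag<float>{});
    case PairKey(DataType::kFloat64, DataType::kFloat64):
      return func(TypeTag<double>{}, TypeTag<double>{});
  }
  throw std::invalid_argument("unsupported (input, weight) dtype pair");
}

// Helper used by callbacks to report which C++ type was selected.
template <typename T>
std::string Name() {
  return std::is_same_v<T, float> ? "float" : "double";
}
```

With this shape the callback receives both type tags at once, so a launch like `RmsnormUnion<InputT, WeightT>(...)` sits in a single lambda, e.g. `DispatchPair(input_dtype, weight_dtype, [&](auto input_tag, auto weight_tag) { ... })`. The number of template instantiations is still the product of the two type lists; only the nesting goes away.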

Ziminli · 2026-03-24T07:55:56Z

src/cpu/add/add.h

+class Operator<Add, Device::Type::kCpu> : public Add,
+                                          Caster<Device::Type::kCpu> {


Why does this still need to inherit from Caster<Device::Type::kCpu>?

Otherwise, where would Cast come from?
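The reply's point can be shown in a minimal sketch: an unqualified call such as `Cast<float>(x)` inside a member function only resolves if `Cast` is declared in the class, one of its bases, or an enclosing namespace. The `DeviceType`, `Caster`, `Add` and `Operator` below are hypothetical stand-ins mirroring the shape of the diff, not the repository's real definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; they mirror the idea of Caster<Device::Type::kCpu>
// in the PR, not its actual definition.
enum class DeviceType { kCpu, kCambricon };

template <DeviceType kDevice>
struct Caster;

// CPU specialization: supplies a Cast helper usable by CPU operators.
template <>
struct Caster<DeviceType::kCpu> {
  template <typename Dst, typename Src>
  static Dst Cast(Src value) {
    return static_cast<Dst>(value);
  }
};

struct Add {};  // Stand-in for the device-independent operator base.

template <typename Op, DeviceType kDevice>
class Operator;

// Inheriting from the mixin is what brings Cast into scope here; dropping
// the Caster base would make the unqualified Cast call fail to compile.
template <>
class Operator<Add, DeviceType::kCpu> : public Add,
                                       Caster<DeviceType::kCpu> {
 public:
  float operator()(std::int32_t a, float b) const {
    return Cast<float>(a) + b;
  }
};
```

Because the base is listed without an access specifier in a `class` definition, it is inherited privately, so `Cast` is usable inside the operator without being exposed to its callers.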

bitzyz · 2026-03-24T09:28:15Z

voltjia · 2026-03-25T02:57:41Z

src/cambricon/rmsnorm/kernel.mlu

@@ -0,0 +1,352 @@
+#define WITH_CAMBRICON


This #define shouldn't be needed here.
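Presumably the intent is that `WITH_CAMBRICON` is set by the build (for example, a `-DWITH_CAMBRICON` compiler flag emitted from CMake) rather than hard-coded at the top of `kernel.mlu`; that reading is an assumption, since the thread doesn't say how the macro is propagated. Source files then only test the macro, as in this sketch (`kCambriconEnabled` and `BackendName` are hypothetical):

```cpp
#include <cassert>

// Sketch: source files only *test* the macro; whether it is defined is
// decided by the build (assumed here to be a -DWITH_CAMBRICON flag), so no
// #define is hard-coded in the kernel file itself.
#ifdef WITH_CAMBRICON
constexpr bool kCambriconEnabled = true;
#else
constexpr bool kCambriconEnabled = false;
#endif

// Hypothetical helper showing a compile-time choice driven by the macro.
const char* BackendName() {
  return kCambriconEnabled ? "cambricon" : "fallback";
}
```

Compiled without the flag, `BackendName()` returns `"fallback"`; adding `-DWITH_CAMBRICON` on the command line flips it without touching the source.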

voltjia · 2026-03-25T02:58:24Z

src/cambricon/rmsnorm/kernel.mlu

+namespace infini::ops {
+
+template <typename T, typename Tw>
+__mlu_global__ void Rmsnorm(T *output, const T *input, const Tw *weight,


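This should be RmsNorm, and the same goes for the ones below. The parameter order should have inputs first and outputs last; that also applies to the ones below.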
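For reference, RMSNorm normalizes each row of length n as y_i = x_i * w_i / sqrt(mean_j(x_j^2) + epsilon). The host-side C++ sketch below is a plain reference implementation, not the BANG kernel under review, written with the requested conventions: `RmsNorm` casing, inputs first and the output pointer last, and spelled-out template parameters. `RmsNormReference`, `TInput` and `TWeight` are illustrative names, and the sketch assumes a contiguous row-major layout, unlike the strided kernel in the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Host-side reference sketch of RMSNorm over the last dimension.
// Parameter order: inputs (input, weight, sizes, epsilon) first, output last.
// Assumes contiguous rows; illustrative only, not the PR's kernel signature.
template <typename TInput, typename TWeight>
void RmsNormReference(const TInput* input, const TWeight* weight,
                      std::size_t num_rows, std::size_t row_size,
                      float epsilon, TInput* output) {
  for (std::size_t row = 0; row < num_rows; ++row) {
    const TInput* x = input + row * row_size;
    TInput* y = output + row * row_size;
    // Accumulate the sum of squares in float regardless of TInput.
    float sum_sq = 0.0f;
    for (std::size_t i = 0; i < row_size; ++i) {
      float v = static_cast<float>(x[i]);
      sum_sq += v * v;
    }
    float mean_sq = sum_sq / static_cast<float>(row_size);
    float inv_rms = 1.0f / std::sqrt(mean_sq + epsilon);
    // Scale each element by 1/rms and by its per-channel weight.
    for (std::size_t i = 0; i < row_size; ++i) {
      y[i] = static_cast<TInput>(static_cast<float>(x[i]) * inv_rms *
                                 static_cast<float>(weight[i]));
    }
  }
}
```

A row of `{2, 2, 2, 2}` has mean square 4, so with `epsilon = 0` every element is scaled by 0.5 and the output equals the weight vector.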

voltjia · 2026-03-25T03:00:05Z

src/cambricon/rmsnorm/kernel.mlu

+                            size_t *shape, ptrdiff_t *output_strides,
+                            ptrdiff_t *input_strides, float epsilon,
+                            int num_dims, int norm_dim_size) {
+  // Calculate problem dimensions


This needs a period. I'd suggest checking all the comments.

voltjia · 2026-03-25T03:02:12Z

src/cambricon/rmsnorm/rms_norm.h

+namespace infini::ops {
+
+template <typename T, typename Tw>
+void RmsnormUnion(void *workspace, int core_per_cluster, int cluster_count,


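Normally this kind of forward declaration shouldn't be needed. As on the other platforms, the kernel.mlu above should be a header file; in CUDA, for example, it's kernel.cuh. I haven't developed Cambricon kernels, so I'm not sure what the suffix should be (whether to reuse .mlu or use something like .mluh), but either way it should be a header.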
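The practical difference: with only a forward declaration in rms_norm.h, the template body lives in another translation unit, so every type combination the dispatch uses has to be explicitly instantiated there or the link fails. If the definition sits in a header that rms_norm.h includes, the compiler instantiates whatever combinations are used. A single-file stand-in for that layout follows; the section comments mark what would be two separate files, and `ScaleSum`/`LaunchScaleSum` are hypothetical toy functions in plain C++, not BANG.

```cpp
#include <cassert>
#include <cstddef>

// ---- kernel header (hypothetical name, e.g. kernel.h or kernel.mluh) ----
// The full template definition is visible to every includer, so no forward
// declaration or explicit-instantiation list is needed.
template <typename TInput, typename TWeight>
TInput ScaleSum(const TInput* input, const TWeight* weight, std::size_t n) {
  TInput acc = 0;
  for (std::size_t i = 0; i < n; ++i) {
    acc += input[i] * static_cast<TInput>(weight[i]);
  }
  return acc;
}

// ---- rms_norm.h (would #include the kernel header above) ----
// Whatever (TInput, TWeight) pair a call site picks is instantiated on demand.
template <typename TInput, typename TWeight>
TInput LaunchScaleSum(const TInput* input, const TWeight* weight,
                      std::size_t n) {
  return ScaleSum<TInput, TWeight>(input, weight, n);
}
```

Calling `LaunchScaleSum` with a `float` input and `double` weights instantiates `ScaleSum<float, double>` with no extra bookkeeping in the kernel file.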

voltjia · 2026-03-25T03:03:33Z

src/cambricon/rmsnorm/kernel.mlu

+
+namespace infini::ops {
+
+template <typename T, typename Tw>


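The second parameter should be renamed to something like TW, since T and W are meant to be two separate words; otherwise, write it as something like TWeight. I'm not sure whether Tw actually stands for TWeight, though; that's just an example. The kernel's implementation logic is outside the scope of this review for now.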

voltjia · 2026-03-25T03:05:49Z

src/cambricon/caster_.h

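Is this file needed right now? It doesn't seem to be used at all. Try removing it and compiling; if it isn't used yet, leave it out for now and add it once it's actually needed.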

bitzyz self-assigned this Mar 12, 2026

bitzyz force-pushed the feat/dev-rmsnorm-cambricon branch 3 times, most recently from 7655800 to 35e2a2a on March 13, 2026 03:35

bitzyz force-pushed the feat/dev-rmsnorm-cambricon branch from 482aa74 to 0a49972 on March 16, 2026 06:36

bitzyz requested a review from voltjia March 16, 2026 06:38

bitzyz changed the title ~~feat: add op on cambricon impl~~ feat: add rmsnorm op on cambricon impl Mar 16, 2026

bitzyz force-pushed the feat/dev-rmsnorm-cambricon branch 3 times, most recently from 1449203 to c811f43 on March 19, 2026 03:20

bitzyz and others added 2 commits March 20, 2026 16:22

feat: Add RMSNorm op in cambricon backend.

0a6c187

refactor: make Cast utility to use Device::Type template parameter

1d6fe71

voltjia force-pushed the feat/dev-rmsnorm-cambricon branch from c811f43 to 1d6fe71 on March 20, 2026 09:53

voltjia and others added 5 commits March 23, 2026 17:20

refactor: add Caster mixin

aa3d2ca

refactor: rename cast** to caster**

802d44e

fix: fix the mlu naming to google c++ naming style

738e4c9

chore: format files with clang-format

b2221bd

refactor: update CUDA kernels to use Caster

de90bab

Ziminli requested changes Mar 24, 2026

fix: fix rmsnorm dispatch to use one dispatch

c913436

voltjia requested changes Mar 25, 2026

bitzyz commented Mar 12, 2026 • edited

bitzyz commented Mar 13, 2026

voltjia commented Mar 24, 2026

bitzyz commented Mar 24, 2026

3 participants
