- fill_kernel() -> aclnnop_FillScalar, GeneralOp type
- backends -> cupy.backends: git mv + _features.py disable _preflight, so ignore the Cutensor submodule
- enable most of the math API (done), aclnnop register (almost done), matmul (done), dot
- BitwiseAddScalar op register; _kernel.pyx needs an update (done); pytest
- reduction kernel, replaced by aclnnop
- concat/pad/reshape ops: numpy_to_acl_dtype -> numpy_dtype_to_acl_dtype
- triton-fusion (add a data adaptor API)
- aclBlas integration
- test builds on different OSes and on hardware with float32 support (310P?)
- testing (after enabling most of the numpy API)
- in-place ops: _kernel.pyx needs an update
- templated ascend kernel JIT
- compile customer kernel
- impl missing numpy op for ascend
- random
- FFT
- single node multiple NPU distribution test
- multi-node multiple NPU
- double datatype (float32, int64)
- sparse matrix
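The numpy_to_acl_dtype -> numpy_dtype_to_acl_dtype rename above implies a dtype-mapping helper roughly like the following. A minimal pure-Python sketch: the enum values are taken from the public aclDataType enum in CANN's acl_base.h (verify them against the CANN version in use), and the function body is illustrative, not the actual .pyx implementation.

```python
# aclDataType enum values as published in CANN's acl_base.h
# (double-check against your CANN release).
ACL_DT_UNDEFINED = -1
_NUMPY_TO_ACL = {
    'float32': 0,   # ACL_FLOAT
    'float16': 1,   # ACL_FLOAT16
    'int8': 2,      # ACL_INT8
    'int32': 3,     # ACL_INT32
    'uint8': 4,     # ACL_UINT8
    'int16': 6,     # ACL_INT16
    'uint16': 7,    # ACL_UINT16
    'uint32': 8,    # ACL_UINT32
    'int64': 9,     # ACL_INT64
    'uint64': 10,   # ACL_UINT64
    'float64': 11,  # ACL_DOUBLE
    'bool': 12,     # ACL_BOOL
}

def numpy_dtype_to_acl_dtype(dtype_name):
    """Map a NumPy dtype name to an ACL dtype enum value.

    Returns ACL_DT_UNDEFINED for dtypes CANN has no equivalent for
    (e.g. complex types), so callers can raise a clear error.
    """
    return _NUMPY_TO_ACL.get(dtype_name, ACL_DT_UNDEFINED)
```

Keeping the unmapped case explicit (rather than raising here) lets concat/pad/reshape call sites decide whether to fall back or fail.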
Reference: "Analysis and solutions for CuPy array-creation issues on the AMD ROCm platform" - GitCode blog
The CANN and triton-ascend roadmaps are unclear.
ElementwiseKernel can be brought to a level similar to CUDA's; it just takes some work.
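Part of that work is regenerating kernel source from an ElementwiseKernel-style spec for the Ascend toolchain. A toy sketch of the string expansion involved; it mirrors the shape of CuPy's interface (in_params, out_params, operation, name), but the emitted C-like source and the function name expand_elementwise are illustrative stand-ins, not CuPy's actual code generator.

```python
def expand_elementwise(in_params, out_params, operation, name):
    """Toy expansion of an elementwise kernel spec into C-like source."""
    def parse(params):
        # 'float32 x, float32 y' -> [('float32', 'x'), ('float32', 'y')]
        return [tuple(p.split()) for p in params.split(',') if p.strip()]

    args = parse(in_params) + parse(out_params)
    sig = ', '.join('%s* %s' % (t, n) for t, n in args)
    # Rewrite scalar names into indexed accesses (naive textual replace;
    # a real generator would work on a parsed expression).
    body = operation
    for _, n in args:
        body = body.replace(n, '%s[i]' % n)
    return ('void %s(%s, int n) {\n'
            '  for (int i = 0; i < n; ++i) { %s; }\n'
            '}' % (name, sig, body))

# Example spec, borrowed from CuPy's documented squared_diff example:
src = expand_elementwise('float32 x, float32 y', 'float32 z',
                         'z = (x - y) * (x - y)', 'squared_diff')
```

The point is that the front-end spec can stay CuPy-compatible while only this back-end expansion is swapped for an Ascend JIT target.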
Random numbers can be generated with numpy on the host; AsNumpy has an implementation.
rand: #include <aclnnop/aclnn_rand.h> - an operator exists.
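As a stopgap before wiring up aclnn_rand, random arrays can be generated on the host with numpy and then copied to the device, as the note above suggests. A minimal sketch; ascend_random is a hypothetical helper name, and the device upload is left as a comment since it depends on the Ascend-backed build.

```python
import numpy as np

def ascend_random(shape, dtype=np.float32, seed=None):
    """Host-side RNG fallback: generate with numpy, then upload to NPU.

    Only the numpy side is shown; on a real build the result would be
    copied to the device (e.g. via asarray on the Ascend backend).
    """
    rng = np.random.default_rng(seed)
    host = rng.random(shape, dtype=np.float64).astype(dtype)
    # return cupy.asarray(host)  # device upload on a real build
    return host
```

Seeding through default_rng keeps the fallback reproducible, which matters for the pytest work listed above.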
HIP_random = {
'name': 'random',
'required': True,
'file': [
'cupy.random._bit_generator',
('cupy.random._generator_api',
['cupy/random/cupy_distributions.cu']),
],
'include': [
'hiprand/hiprand.h',
],
'libraries': [
# Dependency from cuRAND header files
'amdhip64', # was hiprtc and hip_hcc before ROCm 3.8.0
'hiprand',
],
'check_method': build.check_hip_version,
'version_method': build.get_hip_version,
}
AsdSip also supports some FFT operators.
CuPy's multi-device support itself requires testing effort.
HCCL vs. NCCL API compatibility: at first glance they are very similar.
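To make the "very similar at a glance" observation concrete, a rough name correspondence between the two collective APIs. The function names come from the public nccl.h and HCCL headers, but argument orders and types differ in places, so each signature still needs checking before any shim is written.

```python
# Rough NCCL -> HCCL correspondence (names only; signatures differ,
# e.g. HCCL passes HcclRootInfo where NCCL passes ncclUniqueId).
NCCL_TO_HCCL = {
    'ncclGetUniqueId': 'HcclGetRootInfo',
    'ncclCommInitRank': 'HcclCommInitRootInfo',
    'ncclAllReduce': 'HcclAllReduce',
    'ncclBroadcast': 'HcclBroadcast',
    'ncclAllGather': 'HcclAllGather',
    'ncclReduceScatter': 'HcclReduceScatter',
    'ncclCommDestroy': 'HcclCommDestroy',
}
```

If the mapping holds up, cupyx.distributed could target a thin translation layer rather than a full reimplementation.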
cusparse + cusolver: aicpu, torch-npu
Reference: "Our center makes new progress in sparse matrix-multiplication operator R&D" - Computer Network Information Center, Chinese Academy of Sciences
#if defined(CUPY_USE_ASCEND)
// not sure if CANN supports a solver; revisit later
//#include "ascend/cupy_ascend_solver.h"
#include "stub/cupy_cusolver.h"  // gracefully give an error message
_numpy_to_backend_dtype(): each backend implements its own .pyx file.
sparse and other features missing on Ascend: skip backend-neutral treatment for now and put them directly under cuda/libs.
This code is mostly GPU-specific, or not core-required; split it out into the cuda backend to maintain, e.g. cupy.backends.backend._runtime.pyx
IF CUPY_CANN_VERSION <= 0:
# Provide access to constants from Python.
# TODO(kmaehashi): Deprecate aliases above so that we can just do:
# from cupy.backends.cuda.api._runtime_enum import *
# from cupy.backends.cuda.api._device_prop import *
def _export_enum():
import sys
import cupy.backends.backend.api._runtime_enum as _runtime_enum
this = sys.modules[__name__]
for key in dir(_runtime_enum):
if not key.startswith('_'):
setattr(this, key, getattr(_runtime_enum, key))
_export_enum()
ELSE:
    # in the future, add the ascend cann enum here

#ifdef CUPY_USE_HIP
// Since ROCm/HIP does not have cuTENSOR, we simply include the stubs here
// to avoid code dup.
#include "stub/cupy_cutensor.h"