Ziminli (Collaborator) requested changes on Mar 24, 2026
src/cambricon/rmsnorm/rms_norm.h (outdated)
Comment on lines +34 to +61
    DispatchFunc<
        List<DataType::kFloat16, DataType::kBFloat16, DataType::kFloat32>,
        List<Device::Type::kCambricon>>(
        {static_cast<int64_t>(input.dtype()),
         static_cast<int64_t>(Device::Type::kCambricon)},
        0,
        [&](auto input_tag) {
          constexpr DataType IDT = static_cast<DataType>(ListGet<0>(input_tag));
          using InputT = TypeMapType<IDT>;
          DispatchFunc<
              List<DataType::kFloat16, DataType::kBFloat16, DataType::kFloat32>,
              List<Device::Type::kCambricon>>(
              {static_cast<int64_t>(weight.dtype()),
               static_cast<int64_t>(Device::Type::kCambricon)},
              0,
              [&](auto weight_tag) {
                constexpr DataType WDT =
                    static_cast<DataType>(ListGet<0>(weight_tag));
                using WeightT = TypeMapType<WDT>;

                RmsnormUnion<InputT, WeightT>(
                    workspace, core_per_cluster, cluster_count, queue,
                    out.data(), input.data(), weight.data(), out_shape_.data(),
                    out_strides_.data(), input_strides_.data(), eps, ndim_);
              },
              "CambriconRmsNorm::operator() - weight dispatch", List<>{});
        },
        "CambriconRmsNorm::operator() - output dispatch", List<>{});
Collaborator
- Is dispatching on Device::Type::kCambricon still necessary here?
- There is no need to nest two DispatchFunc() calls here; a single one can dispatch everything.
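The flattening the reviewer suggests can be sketched without the repo's DispatchFunc machinery. Below is a minimal, self-contained C++ sketch (all names — Key, Launch, DispatchRmsnorm, the F16/BF16/F32 tag types — are hypothetical stand-ins, not this project's API): the two runtime dtypes are combined into one key so a single dispatch covers the full (input, weight) cartesian product instead of two nested layers.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

enum class DataType : int64_t { kFloat16 = 0, kBFloat16 = 1, kFloat32 = 2 };

// Placeholder element types; the real code maps dtypes via TypeMapType<>.
struct F16  { static constexpr const char *name = "f16"; };
struct BF16 { static constexpr const char *name = "bf16"; };
struct F32  { static constexpr const char *name = "f32"; };

// Combine both runtime dtypes into one key so a single dispatch covers
// every (input, weight) combination instead of nesting two dispatches.
constexpr int64_t Key(DataType in, DataType w) {
  return static_cast<int64_t>(in) * 3 + static_cast<int64_t>(w);
}

// Stand-in for RmsnormUnion<InputT, WeightT>(workspace, ...); it only
// reports which instantiation was selected.
template <typename InputT, typename WeightT>
std::string Launch() {
  return std::string(InputT::name) + "x" + WeightT::name;
}

std::string DispatchRmsnorm(DataType in, DataType w) {
  switch (Key(in, w)) {
    case Key(DataType::kFloat16,  DataType::kFloat16):  return Launch<F16,  F16>();
    case Key(DataType::kFloat16,  DataType::kFloat32):  return Launch<F16,  F32>();
    case Key(DataType::kBFloat16, DataType::kBFloat16): return Launch<BF16, BF16>();
    case Key(DataType::kBFloat16, DataType::kFloat32):  return Launch<BF16, F32>();
    case Key(DataType::kFloat32,  DataType::kFloat32):  return Launch<F32,  F32>();
    // ... remaining combinations elided for brevity
    default: throw std::invalid_argument("unsupported dtype pair");
  }
}
```

In the real code the same effect would presumably come from passing both dtype tags to one DispatchFunc call, which also answers the first point: if only kCambricon ever reaches this operator, the device dimension need not be dispatched at all.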
Comment on lines +13 to +14

    class Operator<Add, Device::Type::kCpu> : public Add,
                                              Caster<Device::Type::kCpu> {
Collaborator
Why does this still need to inherit from Caster<Device::Type::kCpu>?
voltjia (Author) requested changes on Mar 25, 2026
@@ -0,0 +1,352 @@

    #define WITH_CAMBRICON
    namespace infini::ops {

    template <typename T, typename Tw>
    __mlu_global__ void Rmsnorm(T *output, const T *input, const Tw *weight,
                                size_t *shape, ptrdiff_t *output_strides,
                                ptrdiff_t *input_strides, float epsilon,
                                int num_dims, int norm_dim_size) {
      // Calculate problem dimensions
    namespace infini::ops {

    template <typename T, typename Tw>
    void RmsnormUnion(void *workspace, int core_per_cluster, int cluster_count,
Collaborator
In principle this kind of forward declaration should not be needed. As on the other platforms, that kernel.mlu above should be a header file; in CUDA, for example, it is kernel.cuh. I have not developed kernels for Cambricon, so I am not sure about the extension — whether .mlu is reused or it is .mluh or something — but either way it should be a header.
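The header-file pattern the reviewer describes can be sketched in plain C++. The file name and the scalar fallback math below are illustrative only; the real RmsnormUnion launches an MLU kernel with a different signature.

```cpp
// rms_norm_kernel.h (hypothetical name): because RmsnormUnion is a
// template, its full definition must be visible at every point of
// instantiation, so it belongs in a header (kernel.cuh on CUDA, and
// presumably a .mlu/.mluh header here) rather than behind a forward
// declaration in the .cc file.
#include <cmath>
#include <cstddef>

template <typename T, typename TW>
void RmsnormUnion(T *out, const T *in, const TW *weight, std::size_t n,
                  float eps) {
  // Scalar reference math only; the real kernel runs on MLU cores.
  float sum_sq = 0.f;
  for (std::size_t i = 0; i < n; ++i)
    sum_sq += static_cast<float>(in[i]) * static_cast<float>(in[i]);
  const float inv_rms =
      1.f / std::sqrt(sum_sq / static_cast<float>(n) + eps);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<T>(static_cast<float>(in[i]) * inv_rms *
                            static_cast<float>(weight[i]));
}
```

With the definition in a header, the launcher translation unit just includes it and instantiates RmsnormUnion<InputT, WeightT> directly; no declaration needs to be repeated.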
    namespace infini::ops {

    template <typename T, typename Tw>
Collaborator
The second template parameter should be renamed to something like TW, because T and W here are really two words — or spell it out as TWeight, though I am not sure Tw actually means TWeight; that is just an example. The kernel's implementation details are out of scope for this review.
Collaborator
Is this file actually needed right now? It seems completely unused. Try removing it and recompiling; if it is not used yet, leave it out for now and add it whenever it is actually needed.