feat(hygon-gemm): add Hygon backend support for Add/Gemm by gongchensu · Pull Request #31 · InfiniTensor/InfiniOps

gongchensu · 2026-03-24T01:50:34Z

No description provided.

- Add `WITH_HYGON` build support and a Hygon `Add` backend that reuses the shared CUDA implementation.
- Detect DTK `nvcc` from the Hygon toolkit layout and auto-detect the GPU arch from `rocminfo`.
- Treat Hygon as a CUDA-like backend in shared data type, cast, and kernel helper headers.
- Skip the Hygon `gemm` example for now and ignore `build-*` temporary directories.
- Verified with `pip install -e .[dev]` and `pytest tests/test_add.py`.
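The arch auto-detection mentioned above could look roughly like the sketch below. This is not the PR's build logic: the helper names `parse_gpu_arch` and `detect_gpu_arch` are hypothetical, and it assumes `rocminfo` prints each GPU agent's architecture on a line of the form `Name: gfx...`.

```python
import re
import subprocess
from typing import Optional

# Assumption: rocminfo lists each GPU agent with a line such as
# "  Name:                    gfx906". CPU agents and ISA entries
# do not start with "gfx", so they are skipped by this pattern.
_GFX_PATTERN = re.compile(r"^\s*Name:\s*(gfx[0-9a-z]+)\s*$", re.MULTILINE)


def parse_gpu_arch(rocminfo_output: str) -> Optional[str]:
    """Return the first gfx arch name found in rocminfo output, or None."""
    match = _GFX_PATTERN.search(rocminfo_output)
    return match.group(1) if match else None


def detect_gpu_arch() -> Optional[str]:
    """Run rocminfo and parse its output; return None if it is unavailable or fails."""
    try:
        result = subprocess.run(
            ["rocminfo"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_gpu_arch(result.stdout)
```

Returning `None` instead of raising lets a build script fall back to an explicitly configured arch when `rocminfo` is missing.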

- Add a Hygon `Gemm` backend on top of the shared CUDA BLAS path.
- Use DTK-friendly compute and algo settings for fp32/fp16 gemm.
- Fall back to `cublasGemmEx` for single-batch Hygon gemm to avoid DTK crashes.
- Release Hygon cublas handles after each call and re-enable the `gemm` example.
- Verified with `pip install -e .[dev]`, `pytest tests/test_gemm.py -k cuda`, and `pytest tests/test_gemm.py`.
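The single-batch fallback can be pictured as a small dispatch rule. The sketch below is illustrative only: `select_gemm_call` is a hypothetical helper, and it assumes the shared CUDA BLAS path otherwise uses `cublasGemmStridedBatchedEx`, which the commit messages do not state.

```python
def select_gemm_call(backend: str, batch_count: int) -> str:
    """Return the name of the cuBLAS entry point to use for one gemm call."""
    if batch_count < 1:
        raise ValueError("batch_count must be at least 1")
    # On Hygon, route single-batch gemm to cublasGemmEx (the fallback
    # described above, intended to avoid DTK crashes).
    if backend == "hygon" and batch_count == 1:
        return "cublasGemmEx"
    # Assumption: every other case stays on the strided-batched call.
    return "cublasGemmStridedBatchedEx"
```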