Commit c4461d9

shijiashuai committed
docs: add index.md landing page for GitHub Pages
1 parent a4fd733 commit c4461d9

1 file changed

Lines changed: 56 additions & 0 deletions

index.md

@@ -0,0 +1,56 @@
---
layout: default
title: LLM-Speed
---

# LLM-Speed

CUDA LLM Kernel Optimization: a high-performance library of LLM inference operators, including FlashAttention (online softmax) and FP16/INT8 GEMM on Tensor Cores.

## Core Features

- **FlashAttention**: online softmax implementation with causal masking support
- **FP16 HGEMM**: half-precision matrix multiplication accelerated by Tensor Cores
- **INT8 GEMM**: quantized matrix multiplication on Tensor Cores (SM 75+)
- **Warp Primitives**: efficient warp-level reduction / scan
- **Shared memory optimization**: bank-conflict-free access patterns
- **Python bindings**: Python interface provided through pybind11

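To make the warp-level reduction / scan item more concrete, here is a minimal CPU-side sketch in Python. It only simulates the data movement that the CUDA intrinsics `__shfl_xor_sync` and `__shfl_up_sync` would perform across the 32 lanes of a warp. The function names `warp_reduce_sum` and `warp_inclusive_scan` are illustrative assumptions and are not taken from this repository.

```python
WARP_SIZE = 32  # number of lanes in a CUDA warp


def warp_reduce_sum(lanes):
    """Butterfly (XOR) reduction: after 5 steps every lane holds the total."""
    assert len(lanes) == WARP_SIZE
    vals = list(lanes)
    offset = WARP_SIZE // 2
    while offset > 0:
        # Lane i adds the value held by lane (i XOR offset), which is the
        # value __shfl_xor_sync would deliver on the GPU.
        vals = [vals[i] + vals[i ^ offset] for i in range(WARP_SIZE)]
        offset //= 2
    return vals


def warp_inclusive_scan(lanes):
    """Kogge-Stone inclusive prefix sum across the lanes of one warp."""
    assert len(lanes) == WARP_SIZE
    vals = list(lanes)
    offset = 1
    while offset < WARP_SIZE:
        # Lane i reads from lane (i - offset), as __shfl_up_sync would;
        # lanes with i < offset have no source lane and keep their value.
        vals = [vals[i] + (vals[i - offset] if i >= offset else 0)
                for i in range(WARP_SIZE)]
        offset *= 2
    return vals
```

With the inputs `1..32`, every lane of `warp_reduce_sum` ends up holding 528, and lane `k` of `warp_inclusive_scan` holds `1 + 2 + ... + (k + 1)`. Both loops finish in log2(32) = 5 steps, which is why this pattern is attractive compared with a shared-memory reduction.
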
## Operator Implementations

| Kernel | Key techniques | Architecture requirement |
|--------|----------------|--------------------------|
| Naive Attention | QK^T in shared memory | SM 70+ |
| Tiled Attention | Tiled computation + streaming softmax | SM 70+ |
| Flash Attention | Online softmax + causal masking | SM 70+ |
| HGEMM | WMMA Tensor Core (FP16→FP32) | SM 70+ |
| Tensor Core GEMM | INT8/FP16 mixed precision | SM 75+ |

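The attention rows in the table differ mainly in how softmax is computed. The following pure-Python sketch contrasts a reference version, which materializes each full score row, with a tiled version that keeps only a running maximum, a running normalizer, and a running output (the online softmax idea), and applies a causal mask. It is a CPU illustration under assumed names (`naive_attention`, `tiled_attention`, `tile`), not the repository's CUDA kernels.

```python
import math


def naive_attention(Q, K, V, causal=True):
    """Reference: build each full score row, then apply a stable softmax."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for i, q in enumerate(Q):
        # Causal mask: query i only sees keys 0..i.
        n_keys = min(i + 1, len(K)) if causal else len(K)
        scores = [scale * sum(a * b for a, b in zip(q, K[j]))
                  for j in range(n_keys)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * V[j][c] for j, w in enumerate(weights)) / total
                    for c in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, tile=2, causal=True):
    """Visit keys tile by tile; never store a full score row."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for i, q in enumerate(Q):
        m = -math.inf             # running max of the scores seen so far
        l = 0.0                   # running sum of exp(score - m)
        acc = [0.0] * len(V[0])   # running un-normalized output
        for start in range(0, len(K), tile):
            js = [j for j in range(start, min(start + tile, len(K)))
                  if not causal or j <= i]
            if not js:
                continue          # the whole tile is masked out
            scores = [scale * sum(a * b for a, b in zip(q, K[j])) for j in js]
            m_new = max(m, max(scores))
            # Rescale earlier partial results to the new running max.
            correction = math.exp(m - m_new)
            l *= correction
            acc = [a * correction for a in acc]
            for j, s in zip(js, scores):
                p = math.exp(s - m_new)
                l += p
                acc = [a + p * v for a, v in zip(acc, V[j])]
            m = m_new
        out.append([a / l for a in acc])
    return out
```

Both functions should agree up to floating-point rounding for any tile size. The tiled version needs memory proportional to the tile size rather than to the sequence length, which is the property that lets a GPU kernel keep its working set in shared memory and registers.
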
## Quick Start

```bash
# Build with CMake (the --preset option requires CMake 3.19 or newer)
cmake --preset release
cmake --build build/release -j$(nproc)

# Install the Python package
pip install -e .

# Run the tests
pytest tests/
```

43+
## 技术栈
44+
45+
| 类别 | 技术 |
46+
|------|------|
47+
| 语言 | CUDA C++17, Python |
48+
| 构建 | CMake 3.18+, setup.py (CUDAExtension) |
49+
| 绑定 | pybind11 v2.11.1 |
50+
| GPU | SM 70+ (Volta → Hopper) |
51+
| 测试 | pytest + Hypothesis |
52+
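The SM requirements listed on this page can be turned into a small lookup that reports which kernels a given GPU could run. This is a hypothetical helper sketched for illustration: the names `MIN_SM` and `available_kernels` are assumptions, and the thresholds are simply copied from the kernel table above.

```python
# Minimum compute capability per kernel, copied from the kernel table
# (70 means SM 7.0, 75 means SM 7.5).
MIN_SM = {
    "Naive Attention": 70,
    "Tiled Attention": 70,
    "Flash Attention": 70,
    "HGEMM": 70,
    "Tensor Core GEMM": 75,
}


def available_kernels(major, minor):
    """Return the kernels whose SM requirement is met by (major, minor)."""
    sm = major * 10 + minor
    return [name for name, needed in MIN_SM.items() if sm >= needed]
```

For example, `available_kernels(7, 0)` (Volta) leaves out `Tensor Core GEMM`, while `available_kernels(7, 5)` (Turing) returns all five entries. If PyTorch is installed, the `(major, minor)` pair can be obtained from `torch.cuda.get_device_capability()`.
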
## Links

- [GitHub repository](https://github.com/LessUp/llm-speed)
- [README](README.md)
