include/ck/BUILD_TIME_OPTIMIZATION.md
25 additions & 7 deletions
@@ -8,6 +8,16 @@ This document describes techniques for reducing C++ template instantiation overh
Composable Kernel relies heavily on C++ template metaprogramming to achieve GPU kernels with no runtime abstraction penalty. However, deep template instantiation can significantly impact build times. A single translation unit may trigger hundreds of thousands of template instantiations, with each instantiation adding to compile time.

+## Key Types
+
+This codebase uses compile-time types to enable zero-overhead abstractions:
+
+- `Number<N>` - compile-time integer, enables static dispatch and compile-time arithmetic
+- `Sequence<Is...>` - compile-time integer sequence, used for dimension ordering and index manipulation
+- `Tuple<Ts...>` - heterogeneous container holding different types, used for tensor descriptors and transforms
+
+These types allow the compiler to fully unroll loops, eliminate branches, and inline all operations - producing GPU kernels with no runtime abstraction cost.
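
As a rough illustration of the three types described in the added lines, here is a minimal standalone sketch. These definitions are simplified stand-ins written for this example, not Composable Kernel's actual implementations: the `At` and `Size` helpers, the `operator+` overload, and the alias to `std::tuple` are all assumptions.

```cpp
#include <cstdint>
#include <tuple>
#include <type_traits>

using index_t = std::int32_t;

// Compile-time integer: the value is part of the type, so arithmetic on
// Number<> objects is resolved entirely by the compiler.
template <index_t N>
struct Number
{
    static constexpr index_t value = N;
};

// Hypothetical helper: adding two Number<> objects yields a new type.
template <index_t A, index_t B>
constexpr Number<A + B> operator+(Number<A>, Number<B>)
{
    return {};
}

// Compile-time integer sequence, e.g. a dimension ordering.
template <index_t... Is>
struct Sequence
{
    static constexpr index_t Size() { return sizeof...(Is); }

    // Hypothetical accessor; the trailing 0 only keeps the array non-empty
    // when the pack is empty.
    static constexpr index_t At(index_t i)
    {
        constexpr index_t values[] = {Is..., 0};
        return values[i];
    }
};

// Heterogeneous container; aliased to std::tuple purely for illustration.
template <typename... Ts>
using Tuple = std::tuple<Ts...>;

// Everything below is checked at compile time; nothing runs on the device.
static_assert(std::is_same<decltype(Number<2>{} + Number<3>{}), Number<5>>::value, "");
static_assert(Sequence<2, 0, 1>::At(0) == 2, "");
```

Because each value is encoded in a type, every distinct `Number<N>` or `Sequence<Is...>` is also a distinct template instantiation, which is where the build-time cost discussed in this document comes from.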
+
## Optimization Techniques
### 1. Replace Recursive Templates with Pack Expansion
@@ -65,7 +75,7 @@ struct sequence_gen
};
```

-Note: While `std::make_integer_sequence` is the standard C++14 way to generate integer sequences, it only produces `std::integer_sequence<T, ...>`. We use `__make_integer_seq` directly because it accepts any template as its first argument, enabling this pattern where the helper class receives the index pack directly.
+Note: This document assumes C++17 or later. While `std::make_integer_sequence` (introduced in C++14) is the standard library facility for generating integer sequences, it only produces `std::integer_sequence<T, ...>`. We use `__make_integer_seq` directly because it accepts any template as its first argument, enabling this pattern where the helper class receives the index pack directly.
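
To make the note concrete, the following is a minimal sketch of the pattern it describes, not the file's actual `sequence_gen`. The names `sequence_gen_helper`, `make_helper`, `to_helper`, `Square`, and the macro `SKETCH_HAS_MAKE_INTEGER_SEQ` are hypothetical. `__make_integer_seq` is a compiler builtin (Clang and MSVC provide it), so the sketch guards it and falls back to `std::make_integer_sequence` plus one conversion step on other compilers.

```cpp
#include <type_traits>
#include <utility>

template <int... Is>
struct Sequence
{
};

// The helper receives the element type and the index pack 0..N-1 directly.
template <typename T, T... Is>
struct sequence_gen_helper
{
    // A single pack expansion applies F to every index; no recursion.
    template <typename F>
    using apply = Sequence<F{}(Is)...>;
};

#if defined(__has_builtin)
#if __has_builtin(__make_integer_seq)
#define SKETCH_HAS_MAKE_INTEGER_SEQ 1
#endif
#endif

#ifdef SKETCH_HAS_MAKE_INTEGER_SEQ
// The builtin instantiates any template of the form <class T, T...>.
template <int N>
using make_helper = __make_integer_seq<sequence_gen_helper, int, N>;
#else
// Portable fallback: std::make_integer_sequence only yields
// std::integer_sequence, so an extra conversion is needed.
template <typename Seq>
struct to_helper;

template <int... Is>
struct to_helper<std::integer_sequence<int, Is...>>
{
    using type = sequence_gen_helper<int, Is...>;
};

template <int N>
using make_helper = typename to_helper<std::make_integer_sequence<int, N>>::type;
#endif

template <int N, typename F>
struct sequence_gen
{
    using type = typename make_helper<N>::template apply<F>;
};

// Example functor (hypothetical): maps index i to i * i.
struct Square
{
    constexpr int operator()(int i) const { return i * i; }
};

static_assert(std::is_same<sequence_gen<4, Square>::type, Sequence<0, 1, 4, 9>>::value, "");
```

The fallback branch shows the extra `to_helper` specialization that the builtin makes unnecessary, which is the saving the note refers to.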
### 2. Replace Lambdas with Named Functors
@@ -153,11 +163,18 @@ Template recursion creates N template instantiations for N iterations. A constex
    // Simplified example - actual implementation handles empty sequences
    constexpr index_t values[] = {Is...};
    for(index_t i = 0; i < sizeof...(Is); ++i)
        if(values[i] == Target) return i;
-    return 0;
+    return -1; // not found
}
```
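
For reference, a self-contained variant of the search loop from this hunk might look like the sketch below. The function name `find_index` and the sentinel element used to handle empty packs are assumptions; the file's real implementation may handle the empty case differently.

```cpp
#include <cstdint>

using index_t = std::int32_t;

// Returns the position of Target within Is..., or -1 if it is absent.
// The search costs one function instantiation, not one per element.
template <index_t Target, index_t... Is>
constexpr index_t find_index()
{
    // The trailing 0 is a sentinel that keeps the array non-empty when the
    // pack is empty; the loop bound below never reaches it.
    constexpr index_t values[] = {Is..., 0};
    for(index_t i = 0; i < static_cast<index_t>(sizeof...(Is)); ++i)
    {
        if(values[i] == Target)
            return i;
    }
    return -1; // not found
}

static_assert(find_index<7, 3, 7, 9>() == 1, "");
static_assert(find_index<5, 3, 7, 9>() == -1, "");
```

Returning `-1` instead of `0` matters because `0` is also a valid position, so callers could not tell "found at index 0" from "not found". Callers need to check for `-1` before using the result as an index.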
@@ -180,14 +198,14 @@ This reduced `sequence_map_inverse` instantiations from 45 to 10 (78% reduction)
Fold expressions (C++17) can replace recursive template patterns for accumulation operations.
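
As a generic illustration of that claim, independent of the Composable Kernel helpers, the sketch below contrasts a recursive accumulation with a fold expression. The names `sum_recursive`, `sum_fold`, and `product_fold` are hypothetical.

```cpp
#include <cstdint>

using index_t = std::int32_t;

// Recursive form: summing N values instantiates N + 1 specializations.
template <index_t... Is>
struct sum_recursive;

template <>
struct sum_recursive<>
{
    static constexpr index_t value = 0;
};

template <index_t I, index_t... Rest>
struct sum_recursive<I, Rest...>
{
    static constexpr index_t value = I + sum_recursive<Rest...>::value;
};

// Fold form (C++17): one instantiation per distinct pack. The explicit
// initial value makes the empty pack well-defined.
template <index_t... Is>
constexpr index_t sum_fold()
{
    return (Is + ... + 0);
}

template <index_t... Is>
constexpr index_t product_fold()
{
    return (Is * ... * 1);
}

static_assert(sum_recursive<1, 2, 3, 4>::value == sum_fold<1, 2, 3, 4>(), "");
```

Both forms compute the same result; the difference is only in how many templates the compiler has to instantiate to get there.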

-**Before** (implicit recursion through generate_tuple and container_reduce):
+**Before** (uses helper utilities that hide template recursion: `generate_tuple` recursively constructs a tuple of N elements, and `container_reduce` recursively reduces that tuple):