You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This commit adds support for scan (parallel prefix sum) operations to cuTile,
based on the IntegerReduce branch and commit 0c9ab90.
Key changes:
- Added encode_ScanOp! to bytecode encodings for generating ScanOp bytecode
- Added encode_scan_identity_array! to reuse existing identity encoding
- Added scan intrinsic implementation using operation_identity from IntegerReduce
- Added scan() and cumsum() public APIs with proper 1-indexed to 0-indexed axis conversion
- Added comprehensive codegen tests for scan operations
- Added scankernel.jl example demonstrating CSDL scan algorithm
Features:
- Supports cumulative sum (cumsum) for float and integer types
- Supports both forward and reverse scan directions
- Reuses FloatIdentityOp and IntegerIdentityOp from IntegerReduce
- Uses operation_identity function for cleaner identity value creation
- 1-indexed axis parameter (consistent with reduce operations)
- Preserves tile shape (scan is an element-wise operation along one dimension)
Tests:
- All 142 codegen tests pass (including 6 new scan tests)
- Scankernel.jl example runs successfully with CSDL algorithm
- Clarify that it demonstrates device-side scan operation
- Add note that test might occasionally fail (race condition in phase 2 loop)
Minor comment improvements in scankernel.jl example
- Clarify that it demonstrates device-side scan operation
- Add note that test might occasionally fail (race condition in phase 2 loop)
0 commit comments