[Optimization]: Reduce branching when possible in casting.hpp#117
[Optimization]: Reduce branching when possible in casting.hpp#117zacharyvincze wants to merge 15 commits intoROCm:developfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the core casting helpers to reduce branching (especially for GPU code paths) and adjusts saturation behavior for some float→integer conversions, alongside adding a small test and extending supported type traits.
Changes:
- Refactors
ScalarSaturateCast/ScalarRangeCastlogic incasting.hppto use more branchless/min-max based clamping and special-case small integer widths. - Extends type traits support to include
long/ulongvectorized types. - Adds a new C++ test covering basic
SaturateCastbehavior and a few limit/vector cases. - Adjusts the GPU block dimensions for the Composite operator kernel launch.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
include/core/detail/casting.hpp |
Refactors saturate/range cast implementations to reduce branching and adjust clamping/rounding logic. |
include/core/detail/type_traits.hpp |
Adds long / ulong to the type-traits macro set. |
tests/roccv/cpp/src/tests/core/detail/test_saturate_cast.cpp |
Introduces a basic unit test for SaturateCast, including a couple of vectorized casts. |
src/op_composite.cpp |
Changes GPU kernel launch block dimensions for the composite operator. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if constexpr (sizeof(T) <= 2) { | ||
| // 8/16 bit integer cases. These can be represented exactly in floating point. | ||
| #ifdef __HIP_DEVICE_COMPILE__ | ||
| return static_cast<T>(rintf(fminf(fmaxf(v, minVal), maxVal))); | ||
| #else | ||
| return static_cast<T>(std::round(std::clamp(v, minVal, maxVal))); | ||
| #endif | ||
| } else { | ||
| // 32/64 bit integer cases. | ||
| #ifdef __HIP_DEVICE_COMPILE__ | ||
| U rounded = rintf(v); | ||
| #else | ||
| U rounded = std::round(v); | ||
| #endif |
There was a problem hiding this comment.
The device-side float->integer paths use float-specific intrinsics (rintf, fminf/fmaxf, __saturatef, __float2int_rn) even though U is any floating-point type. If U is double, this will downcast to float and can change rounding/saturation behavior. Consider either constraining these branches to U == float (static_assert / if constexpr) or adding double-correct implementations (rint, fmin/fmax, __double2int_rn, etc.).
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #117 +/- ##
===========================================
+ Coverage 73.51% 73.59% +0.08%
===========================================
Files 77 77
Lines 2956 2969 +13
Branches 640 635 -5
===========================================
+ Hits 2173 2185 +12
+ Misses 338 337 -1
- Partials 445 447 +2
🚀 New features to boost your workflow:
|
Review: [Optimization] Reduce branching in casting.hppKernel optimization focusing on GPU divergence: Changes:
Assessment: Needs Review - Performance optimization. Reducing branching in GPU kernels is always good for warp efficiency. The fixes for 32/64-bit integer saturation casts sound important - precision issues in type conversion can be subtle bugs. Would benefit from:
Solid optimization PR. |
Details
casting.hpp. Aims to reduce divergence on GPU kernel implementations.