[Optimization]: Reduce branching when possible in casting.hpp by zacharyvincze · Pull Request #117 · ROCm/rocCV

zacharyvincze · 2026-02-06T20:13:00Z

Details

Removes branching, where possible, from the casting helper functions in casting.hpp. This aims to reduce thread divergence in GPU kernel implementations.
Also fixes some float -> integer saturation casts, especially for 32/64-bit integer types whose limits cannot be represented exactly as 32-bit floats.
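The representability problem behind the 32/64-bit fixes can be checked at compile time. This is a minimal sketch that assumes IEEE-754 binary32 `float` and round-to-nearest integer-to-float conversion (the GCC/Clang behavior). The constant names are hypothetical and do not come from casting.hpp. The 16-bit limits convert exactly, but `INT32_MAX`, `UINT32_MAX` and `INT64_MAX` round up to the next power of two, one past the largest value of the target type.

```cpp
#include <cstdint>
#include <limits>

// 8/16-bit limits fit in float's 24-bit significand, so they convert exactly.
constexpr float kInt16MaxF = static_cast<float>(std::numeric_limits<int16_t>::max());
constexpr float kUint16MaxF = static_cast<float>(std::numeric_limits<uint16_t>::max());
static_assert(kInt16MaxF == 32767.0f, "16-bit limits are exact in float");
static_assert(kUint16MaxF == 65535.0f, "16-bit limits are exact in float");

// 32/64-bit limits do not: they round to the nearest float, which lies one past the limit.
constexpr float kInt32MaxF = static_cast<float>(std::numeric_limits<int32_t>::max());
constexpr float kUint32MaxF = static_cast<float>(std::numeric_limits<uint32_t>::max());
constexpr float kInt64MaxF = static_cast<float>(std::numeric_limits<int64_t>::max());
static_assert(kInt32MaxF == 2147483648.0f, "INT32_MAX rounds up to 2^31");
static_assert(kUint32MaxF == 4294967296.0f, "UINT32_MAX rounds up to 2^32");
static_assert(kInt64MaxF == 9223372036854775808.0f, "INT64_MAX rounds up to 2^63");

// Consequence: clamping a float to kInt32MaxF and then applying static_cast<int32_t>
// is undefined behavior for inputs at the bound, because 2^31 is outside int32_t.
```

For this reason, clamping to `static_cast<float>(max)` is only safe for the narrow types. The wider types need bounds that the float type can represent exactly.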

Copilot

Pull request overview

This PR updates the core casting helpers to reduce branching (especially for GPU code paths) and adjusts saturation behavior for some float→integer conversions, alongside adding a small test and extending supported type traits.

Changes:

Refactors ScalarSaturateCast / ScalarRangeCast logic in casting.hpp to use more branchless/min-max based clamping and special-case small integer widths.
Extends type traits support to include long/ulong vectorized types.
Adds a new C++ test covering basic SaturateCast behavior and a few limit/vector cases.
Adjusts the GPU block dimensions for the Composite operator kernel launch.
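The min/max clamping for small integer widths can be sketched on the host as follows. `SaturateSmallInt` is a hypothetical helper, not the code in casting.hpp. It uses `std::fmin`/`std::fmax` and `std::rint` so that the host matches the device-side `fminf`/`fmaxf`/`rintf` calls in the reviewed hunk. Under the default rounding mode, ties round to even, and a NaN input clamps to the lower bound because `fmax` returns its non-NaN argument. The host branch in the hunk behaves differently: `std::round` rounds halves away from zero, and `std::clamp` passes a NaN through to the integer conversion, which is undefined behavior. As a result, the host and device paths can disagree on ties and on NaN.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical host-side sketch of the `sizeof(T) <= 2` path. 8/16-bit limits are
// exact in float/double, so clamping first and rounding afterwards cannot overflow.
template <typename T, typename U>
T SaturateSmallInt(U v) {
    static_assert(std::is_integral_v<T> && sizeof(T) <= 2, "T must be an 8/16-bit integer");
    static_assert(std::is_floating_point_v<U>, "U must be a floating-point type");
    constexpr U kMin = static_cast<U>(std::numeric_limits<T>::min());
    constexpr U kMax = static_cast<U>(std::numeric_limits<T>::max());
    // fmax(NaN, kMin) returns kMin, so NaN saturates to the lower bound.
    const U clamped = std::fmin(std::fmax(v, kMin), kMax);
    // rint rounds ties to even in the default rounding mode, like device rintf.
    return static_cast<T>(std::rint(clamped));
}
```

For example, `SaturateSmallInt<uint8_t>(300.0f)` saturates to 255, and `SaturateSmallInt<uint8_t>(2.5f)` rounds to 2 (where `std::round` would give 3).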

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`include/core/detail/casting.hpp`	Refactors saturate/range cast implementations to reduce branching and adjust clamping/rounding logic.
`include/core/detail/type_traits.hpp`	Adds `long` / `ulong` to the type-traits macro set.
`tests/roccv/cpp/src/tests/core/detail/test_saturate_cast.cpp`	Introduces a basic unit test for `SaturateCast`, including a couple of vectorized casts.
`src/op_composite.cpp`	Changes GPU kernel launch block dimensions for the composite operator.



Copilot · 2026-02-06T20:19:14Z

include/core/detail/casting.hpp

+        if constexpr (sizeof(T) <= 2) {
+            // 8/16 bit integer cases. These can be represented exactly in floating point.
+#ifdef __HIP_DEVICE_COMPILE__
+            return static_cast<T>(rintf(fminf(fmaxf(v, minVal), maxVal)));
+#else
+            return static_cast<T>(std::round(std::clamp(v, minVal, maxVal)));
+#endif
+        } else {
+            // 32/64 bit integer cases.
+#ifdef __HIP_DEVICE_COMPILE__
+            U rounded = rintf(v);
+#else
+            U rounded = std::round(v);
+#endif


The device-side float->integer paths use float-specific intrinsics (rintf, fminf/fmaxf, __saturatef, __float2int_rn) even though U can be any floating-point type. If U is double, these calls downcast to float, which can change rounding and saturation behavior. Consider either constraining these branches to U == float (static_assert / if constexpr) or adding double-correct implementations (rint, fmin/fmax, __double2int_rn, etc.).
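The narrowing Copilot describes can be reproduced on the host. In this sketch, `RintFloatOnly` stands in for a float-only intrinsic such as `rintf`. The hypothetical `RintPreservingType` uses `if constexpr` to keep the float path for `float` and to use the double overload of `std::rint` otherwise, which follows the shape of Copilot's first suggestion. Neither function is from casting.hpp. On the device, the double branch would call the double-precision intrinsics instead.

```cpp
#include <cmath>
#include <type_traits>

// Stand-in for a float-only intrinsic like rintf: any double argument is narrowed to float.
inline float RintFloatOnly(float v) { return std::rint(v); }

// Hypothetical precision-preserving variant. The branch is resolved at compile time,
// so it adds no runtime divergence, and a double input is never narrowed.
template <typename U>
U RintPreservingType(U v) {
    static_assert(std::is_floating_point_v<U>, "U must be a floating-point type");
    if constexpr (std::is_same_v<U, float>) {
        return RintFloatOnly(v);
    } else {
        return std::rint(v);  // double/long double overload, no narrowing
    }
}

// 2^24 + 1 is exact in double but not in float, so the float-only path loses it.
const double kValue = 16777217.0;
const float kNarrowed = RintFloatOnly(static_cast<float>(kValue));  // 16777216.0f
const double kPreserved = RintPreservingType(kValue);               // 16777217.0
```

Because the dispatch is a compile-time `if constexpr` rather than a runtime branch, it does not work against the divergence goal of the PR.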

tests/roccv/cpp/src/tests/core/detail/test_saturate_cast.cpp

src/op_composite.cpp

codecov-commenter · 2026-02-06T20:45:49Z

Codecov Report

❌ Patch coverage is 54.54545% with 15 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
include/core/detail/casting.hpp	54.55%	12 Missing and 3 partials ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #117      +/-   ##
===========================================
+ Coverage    73.51%   73.59%   +0.08%     
===========================================
  Files           77       77              
  Lines         2956     2969      +13     
  Branches       640      635       -5     
===========================================
+ Hits          2173     2185      +12     
+ Misses         338      337       -1     
- Partials       445      447       +2
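As a cross-check of the figures above: Codecov counts partially covered lines as not covered, so coverage is hits divided by hits + misses + partials. The project totals below come directly from the report. The patch counts (33 tracked lines, 18 hit) are inferred from the reported 54.54545% and the 15 uncovered lines (12 missing plus 3 partial). The report does not state them.

```math
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
\qquad
\text{develop: } \frac{2173}{2173 + 338 + 445} = \frac{2173}{2956} \approx 73.51\%
\qquad
\text{PR \#117: } \frac{2185}{2185 + 337 + 447} = \frac{2185}{2969} \approx 73.59\%
\qquad
\text{patch: } \frac{18}{18 + 12 + 3} = \frac{18}{33} \approx 54.55\%
```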

Files with missing lines	Coverage Δ
include/core/detail/type_traits.hpp	`87.50% <ø> (ø)`
include/core/detail/casting.hpp	`78.26% <54.55%> (+2.31%)`	⬆️


simonCatBot · 2026-03-20T21:25:16Z

Review: [Optimization] Reduce branching in casting.hpp

Kernel optimization focusing on GPU divergence:

Changes:

Branch reduction in casting helper functions
Fixes for float->integer saturation casts (32/64-bit cases)
4 files changed, +129/-33 lines

Assessment: Needs Review - Performance optimization.

Reducing branching in GPU kernels generally improves warp efficiency, because when lanes of a warp diverge, both paths execute serially. The fixes for 32/64-bit integer saturation casts sound important: precision issues in type conversion can cause subtle bugs.

Would benefit from:

Performance benchmarks showing divergence reduction
Verification that precision is maintained for edge cases
Review of the saturation logic changes
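For the edge-case verification item, one option is a table of boundary inputs checked against a saturating float -> int32 conversion. `SaturateFloatToInt32` below is a hypothetical sketch, not the PR's implementation. It compares against 2^31, which `float` represents exactly, rather than against `static_cast<float>(INT32_MAX)`. It also computes the out-of-range conditions as flags that feed selects. It assumes round-half-to-even and maps NaN to 0. The PR does not state its NaN policy.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical saturating float -> int32 conversion, rounding half to even.
inline int32_t SaturateFloatToInt32(float v) {
    constexpr float kTwoPow31 = 2147483648.0f;  // exactly 2^31 in binary32
    const float r = std::rint(v);
    const bool tooHigh = r >= kTwoPow31;  // includes +inf
    const bool tooLow = r < -kTwoPow31;   // -2^31 itself is INT32_MIN and stays in range
    const bool isNan = std::isnan(r);
    // Replace out-of-range and NaN inputs with 0 before converting, so the
    // static_cast never sees a value outside int32_t (which would be UB).
    const float safe = (tooHigh | tooLow | isNan) ? 0.0f : r;
    int32_t result = static_cast<int32_t>(safe);
    // Scalar ternaries like these usually compile to selects rather than branches.
    result = tooHigh ? std::numeric_limits<int32_t>::max() : result;
    result = tooLow ? std::numeric_limits<int32_t>::min() : result;
    return result;
}

// Boundary inputs worth covering in a test, paired with the expected results.
struct EdgeCase { float input; int32_t expected; };
const EdgeCase kEdgeCases[] = {
    {2147483520.0f, 2147483520},                            // largest float below 2^31
    {2147483648.0f, std::numeric_limits<int32_t>::max()},   // float(INT32_MAX) == 2^31
    {-2147483648.0f, std::numeric_limits<int32_t>::min()},  // exactly INT32_MIN
    {-2147483904.0f, std::numeric_limits<int32_t>::min()},  // next float below -2^31
    {2.5f, 2},                                              // tie rounds to even
    {-1.5f, -2},                                            // tie rounds to even
    {std::numeric_limits<float>::infinity(), std::numeric_limits<int32_t>::max()},
    {std::numeric_limits<float>::quiet_NaN(), 0},           // assumed NaN policy
};
```

The values just inside and just outside each bound are the cases that a clamp against `static_cast<float>(INT32_MAX)` would get wrong.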

Solid optimization PR.

zacharyvincze added 3 commits January 30, 2026 10:31

Avoid branching in casting implementations

4232bcd

Add more tests for Saturate cast

77cabc7

Fix issues with float -> integer saturate casts

d887102

zacharyvincze requested review from Copilot, jeffqjiangNew and paveltc February 6, 2026 20:13

zacharyvincze self-assigned this Feb 6, 2026

zacharyvincze added enhancement New feature or request ci:precheckin labels Feb 6, 2026

Copilot started reviewing on behalf of zacharyvincze February 6, 2026 20:13

Copilot AI reviewed Feb 6, 2026


Undo changes to composite

146a1f9

zacharyvincze added 11 commits February 6, 2026 16:18

Review fixes

e9e9f0b

Add another test case for RangeCast

13a78be

Merge branch 'develop' into zv/optimization/optimize-casting-performance

f1b1571

Merge branch 'develop' into zv/optimization/optimize-casting-performance

146c95f

Merge branch 'develop' into zv/optimization/optimize-casting-performance

1cbedda

Merge branch 'develop' into zv/optimization/optimize-casting-performance

f97231c

Merge branch 'develop' into zv/optimization/optimize-casting-performance

e712805

Merge branch 'develop' into zv/optimization/optimize-casting-performance

9ae56a5

Merge branch 'develop' into zv/optimization/optimize-casting-performance

12d355b

Merge branch 'develop' into zv/optimization/optimize-casting-performance

a883238

Merge branch 'develop' into zv/optimization/optimize-casting-performance

2fcaf2f

zacharyvincze wants to merge 15 commits into ROCm:develop from zacharyvincze:zv/optimization/optimize-casting-performance

4 participants