[Optimization]: Use dynamic HIP kernel block sizes#122
zacharyvincze wants to merge 19 commits into ROCm:develop
Pull request overview
This PR introduces dynamic HIP kernel block size determination to optimize performance across different GPU architectures. It adds helper functions GetMaximumPotentialBlockSize2D and GetGridSize2D to replace hardcoded block sizes throughout the codebase.
Changes:
- Added `GetMaximumPotentialBlockSize2D` and `GetGridSize2D` helper functions in `include/core/detail/hip_utils.hpp`
- Refactored 14 operator files to use dynamic block sizes instead of hardcoded values
- Cleaned up unused includes and reorganized some headers
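As a rough illustration of the idea behind the two helpers, the sketch below splits a 1D block size into a 2D block shape and derives the matching grid with ceiling division. It is not the PR's implementation: `Dim3`, `SplitBlockSize2DSketch`, `GridSize2DSketch`, and the `preferredWidth` parameter are hypothetical names chosen for this example, and the 1D block size is passed in as a plain number. In real HIP code that number would typically come from an occupancy query such as `hipOccupancyMaxPotentialBlockSize`.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for HIP's dim3 so the sketch compiles without the HIP runtime.
struct Dim3 {
    uint32_t x, y, z;
};

// Hypothetical helper: turn a 1D block size (e.g. the value suggested by an
// occupancy query) into a 2D block. The x dimension is capped at
// preferredWidth; y takes whatever whole multiple still fits, so the total
// thread count never exceeds blockSize1D.
inline Dim3 SplitBlockSize2DSketch(uint32_t blockSize1D, uint32_t preferredWidth) {
    assert(blockSize1D > 0 && preferredWidth > 0);
    uint32_t x = preferredWidth < blockSize1D ? preferredWidth : blockSize1D;
    uint32_t y = blockSize1D / x;  // rounds down; at least 1 because x <= blockSize1D
    return Dim3{x, y, 1};
}

// Hypothetical helper: number of blocks needed to cover a width x height
// image, using ceiling division so partially filled edge blocks are included.
inline Dim3 GridSize2DSketch(uint32_t width, uint32_t height, Dim3 block) {
    assert(block.x > 0 && block.y > 0);
    return Dim3{(width + block.x - 1) / block.x,
                (height + block.y - 1) / block.y,
                1};
}
```

With a 1D block size of 1024 and a preferred width of 64, the split reproduces the (64,16) shape that many operators previously hardcoded; a smaller occupancy result such as 256 would give (64,4) instead, and the grid grows to compensate. Because edge blocks can extend past the image, kernels launched this way still need their own bounds checks.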
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| include/core/detail/hip_utils.hpp | Adds helper functions for dynamic 2D block size calculation using HIP occupancy API |
| include/kernels/host/bnd_box_host.hpp | Adds missing math_vector.hpp include |
| include/kernels/device/bnd_box_device.hpp | Adds missing math_vector.hpp include |
| src/op_warp_perspective.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_thresholding.cpp | Replaces hardcoded (64,16) block size with dynamic calculation per threshold type |
| src/op_rotate.cpp | Replaces hardcoded (32,16) block size with dynamic calculation |
| src/op_resize.cpp | Replaces hardcoded (64,16) block size with dynamic calculation; reformats function map |
| src/op_remap.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_normalize.cpp | Replaces hardcoded (32,8) block size with dynamic calculation; reformats line breaks |
| src/op_non_max_suppression.cpp | Removes unused include (no block size changes as this uses 1D blocks) |
| src/op_histogram.cpp | Removes unused includes (no block size changes) |
| src/op_gamma_contrast.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_flip.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_cvt_color.cpp | Replaces hardcoded (32,16) block size with dynamic calculation per color conversion type |
| src/op_custom_crop.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_copy_make_border.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_convert_to.cpp | Replaces hardcoded (64,16) block size with dynamic calculation; reformats function signatures |
| src/op_composite.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_bnd_box.cpp | Replaces hardcoded (32,32) block size with dynamic calculation |
| src/op_bilateral_filter.cpp | Replaces hardcoded (8,8) block size with dynamic calculation |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
Coverage Diff
| Metric | develop | #122 | +/- |
|---|---|---|---|
| Coverage | 73.51% | 73.56% | +0.05% |
| Files | 77 | 77 | |
| Lines | 2956 | 2996 | +40 |
| Branches | 640 | 642 | +2 |
| Hits | 2173 | 2204 | +31 |
| Misses | 338 | 345 | +7 |
| Partials | 445 | 447 | +2 |
@@ -1,2 +1,2 @@
/**
Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
Update the copyright year to 2026 here and in the other places it appears.
…aryvincze/rocCV into zv/optimization/dynamic-block-sizes
Review: [Optimization] Dynamic HIP kernel block sizes
Performance optimization for GPU kernel launches.
What's changed:
Assessment: APPROVED. Performance improvement with no API changes.
Already approved; this is a solid optimization. Removing hardcoded block sizes improves portability across different GPU architectures (MI series, RDNA, etc.). Querying the wavefront size at runtime is the right way to handle this. Nice cleanup of magic numbers throughout the codebase.