Allow public tuning of cub::DeviceAdjacentDifference#9218
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
important: Walkthrough This PR moves adjacent-difference tuning from internal detail namespace to public API by introducing cub::AdjacentDifferencePolicy, wiring it through policy selectors and dispatch, updating docs and benchmarks, and adding compile-time and runtime tests. important: Changes Productize AdjacentDifferencePolicy tuning API
important: Assessment against linked issues
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cub/cub/device/dispatch/dispatch_adjacent_difference.cuh (1)
347-360:⚠️ Potential issue | 🟠 Major | ⚡ Quick winimportant: Validate externally supplied policy dimensions before using them.
Now that
AdjacentDifferencePolicyis public,policy_selector(cc)can legally hand back arbitrary values here.threads_per_block <= 0oritems_per_thread <= 0will flow straight intotile_size,::cuda::ceil_div, and the triple-chevron launch config, which turns a bad public policy into divide-by-zero or invalid launch parameters. Guard these fields and fail with a clearcudaErrorInvalidValuebefore computingnum_tilesor launching the kernel.Also applies to: 430-444
🧹 Nitpick comments (1)
cub/test/catch2_test_device_adjacent_difference_env_api.cu (1)
10-10: ⚡ Quick winsuggestion: In
cub/test/catch2_test_device_adjacent_difference_env_api.cu(line 10), prefer#include <cuda/execution.tune.h>over<cuda/__execution/tune.h>since the public header already includes/forwards the private implementation that definescuda::execution::tune.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: bdcbdb88-fbf2-4d4e-a2b1-894e542c3138
📒 Files selected for processing (6)
cub/benchmarks/bench/adjacent_difference/subtract_left.cucub/cub/device/device_adjacent_difference.cuhcub/cub/device/dispatch/dispatch_adjacent_difference.cuhcub/cub/device/dispatch/tuning/tuning_adjacent_difference.cuhcub/test/catch2_test_device_adjacent_difference_env.cucub/test/catch2_test_device_adjacent_difference_env_api.cu
| //! .. literalinclude:: ../../../cub/test/catch2_test_device_adjacent_difference_env_api.cu | ||
| //! :language: c++ | ||
| //! :dedent: | ||
| //! :start-after: example-begin subtract-left-copy-policy-selector | ||
| //! :end-before: example-end subtract-left-copy-policy-selector | ||
| //! | ||
| //! .. literalinclude:: ../../../cub/test/catch2_test_device_adjacent_difference_env_api.cu | ||
| //! :language: c++ | ||
| //! :dedent: | ||
| //! :start-after: example-begin subtract-left-copy-tuning | ||
| //! :end-before: example-end subtract-left-copy-tuning |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does C++17 support designated initializers like T{ .field = value }, or is that a C++20-only feature?
💡 Result:
Designated initializers, using the syntax T{.field = value }, are a C++20-only feature [1][2][3][4]. They are not part of the C++17 standard [1]. While some compilers (such as GCC and Clang) provided designated initializers as a non-standard language extension long before C++20 [1][5][6], they were not officially supported in C++17 [1]. If you use this syntax in C++17 mode, you are relying on compiler-specific extensions rather than standard C++ [1]. Compilers often allow this code to compile by default, but it may trigger errors or warnings if you compile with strict standard compliance flags (such as -Wpedantic or -pedantic) [1][2]. In contrast, C++20 formally introduced designated initializers for aggregate types [2][3][4]. There are specific differences between the C++20 implementation and C99 designated initializers: for instance, C++20 requires that members be initialized in their declaration order, whereas C99 allows arbitrary order [3][4].
Citations:
- 1: https://stackoverflow.com/questions/69009389/how-to-get-away-with-using-designated-initializers-in-c17-or-why-is-it-seemi
- 2: https://clang.llvm.org/extra/clang-tidy/checks/modernize/use-designated-initializers.html
- 3: https://mropert.github.io/2026/01/15/designed_initializers/
- 4: https://pkglog.com/en/blog/cpp-designated-initializer/
- 5: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0329r0.pdf
- 6: https://news.ycombinator.com/item?id=48228608
important: The @par Tuning docs in cub/cub/device/device_adjacent_difference.cuh include the subtract-left-copy-policy-selector example from cub/test/catch2_test_device_adjacent_difference_env_api.cu, which uses C++20 designated initializers (return {.threads_per_block = ...}). Copy-pasting this into a C++17 project will fail under standard-conforming builds; update the snippet to use C++17-compatible aggregate initialization (e.g., positional initialization) or mirror the existing _CCCL_STD_VER >= 2020 fallback.
| template <typename T> | ||
| concept adjacent_difference_policy_selector = policy_selector<T, adjacent_difference_policy>; | ||
| concept adjacent_difference_policy_selector = policy_selector<T, AdjacentDifferencePolicy>; | ||
| #endif // _CCCL_HAS_CONCEPTS() |
There was a problem hiding this comment.
important: Constrain the public selector concept to stateless selectors.
adjacent_difference_policy_selector accepts selectors that return AdjacentDifferencePolicy, but the device path still requires ::cuda::std::is_empty_v<PolicySelector> later in DeviceAdjacentDifferenceDifferenceKernel. With the API now public, that means a stateful selector passes the front-door constraint and then fails deep in dispatch. Fold the emptiness requirement into this concept so unsupported selectors are rejected at the API boundary instead. Based on learnings, policy selectors used at the CUB device API layer are stateless.
🥳 CI Workflow Results🟩 Finished in 1h 16m: Pass: 100%/284 | Total: 3d 17h | Max: 1h 04m | Hits: 64%/328568See results here. |
DeviceAdjacentDifferencecub::DeviceAdjacentDifference
Fixes: #8567
New public entities
cub::AdjacentDifferencePolicyint threads_per_blockint items_per_threadBlockLoadAlgorithm load_algorithmCacheLoadModifier load_modifierBlockStoreAlgorithm store_algorithm