Skip to content

Control unrolling in cub::DeviceMerge[Sort] via tuning #9181

Open
bernhardmgruber wants to merge 5 commits into
NVIDIA:mainfrom
bernhardmgruber:merge_sort_unroll
Open

Control unrolling in cub::DeviceMerge[Sort] via tuning #9181
bernhardmgruber wants to merge 5 commits into
NVIDIA:mainfrom
bernhardmgruber:merge_sort_unroll

Conversation

@bernhardmgruber
Copy link
Copy Markdown
Contributor

@bernhardmgruber bernhardmgruber commented May 29, 2026

  • Test locally that setting different unroll values actually has an effect

Fixes: #9052

@bernhardmgruber bernhardmgruber requested a review from a team as a code owner May 29, 2026 09:58
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 29, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 29, 2026
int NumThreads,
int ItemsPerThread,
typename SynchronizationPolicy,
bool _Unroll = true>
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miscco can I use this spelling to denote an "internal" parameter on a public API?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if the overload without the defaulted template argument will always be valid

Comment on lines +98 to +104
// RAPIDS cuDF needs to avoid unrolling some loops in sort to prevent compile time issues
#if defined(CCCL_AVOID_SORT_UNROLL)
bool unroll = false;
#else // CCCL_AVOID_SORT_UNROLL
bool unroll = true;
#endif // CCCL_AVOID_SORT_UNROLL

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we should deprecate CCCL_AVOID_SORT_UNROLL at some point, but let's keep it for a while to ease the transition for cuDF.

Comment thread cub/cub/thread/thread_sort.cuh
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Added configurable loop unrolling control for merge sort and merge operations, enabling fine-grained performance tuning.
  • Tests

    • Added comprehensive test coverage validating merge and merge sort functionality with loop unrolling disabled.

important:

Walkthrough

This PR exposes loop-unrolling control as a boolean tuning parameter for DeviceMerge/DeviceMergeSort and threads it through policies, thread/block implementations, agent wiring, removes the old unroll macro, and adds tests for no-unroll mode.

Changes

Merge Sort Unrolling Control

Layer / File(s) Summary
Tuning policy unroll member
cub/cub/device/dispatch/tuning/tuning_merge_sort.cuh, cub/cub/device/dispatch/tuning/tuning_merge.cuh
merge_sort_policy and merge_policy gain unroll members with defaults; equality and stream output operators updated.
Thread-level odd-even sort helper
cub/cub/thread/thread_sort.cuh
Internal detail::stable_odd_even_sort<Unroll> template extracts stable odd-even sorting; public StableOddEvenSort delegates to it.
Block merge sort unroll parameter and helpers
cub/cub/block/block_merge_sort.cuh
BlockMergeSortStrategy and BlockMergeSort gain _Unroll template parameter; added detail::serial_merge<Unroll>; loop pragmas and internal calls (stable sort, merge, last-tile fill) use _Unroll.
Agent dispatch wiring
cub/cub/agent/agent_merge.cuh, cub/cub/agent/agent_merge_sort.cuh
Agent templates gain Unroll parameter and agent/block instantiations / serial-merge calls thread policy.unroll into detail::serial_merge<...>.
Remove deprecated macro
cub/cub/util_macro.cuh
Removed _CCCL_SORT_MAYBE_UNROLL() macro.
Dispatch formatting changes
cub/cub/device/dispatch/dispatch_merge.cuh
Template alias line-wrapping/indentation reformatted; no semantic changes.
Test no-unroll tuning
cub/test/catch2_test_device_merge_env.cu, cub/test/catch2_test_device_merge_sort_env.cu
Added no_unroll_tuning callable and device tests validating DeviceMerge and DeviceMergeSort behave correctly with unrolling disabled.

Assessment against linked issues

Objective Addressed Explanation
Add unroll flag to merge sort tuning policy [#9052]
Replace CCCL_AVOID_SORT_UNROLL with tuning control [#9052]
Provide standard API interface for unroll control [#9052]

Suggested reviewers

  • srinivasyadav18
  • oleksandr-pavlyk
  • elstehle

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
cub/test/catch2_test_device_merge_sort_env.cu (1)

569-578: ⚡ Quick win

suggestion: switch this test to C2H_TEST (like the other tuning tests in this section) so it exercises the launch-wrapper/env path consistently; keeping it as TEST_CASE reduces coverage for regressions in tuned dispatch wiring.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 566756bd-cfec-4d90-b2b8-0e7f6cd77579

📥 Commits

Reviewing files that changed from the base of the PR and between 2a1fb3b and b35115f.

📒 Files selected for processing (6)
  • cub/cub/agent/agent_merge_sort.cuh
  • cub/cub/block/block_merge_sort.cuh
  • cub/cub/device/dispatch/tuning/tuning_merge_sort.cuh
  • cub/cub/thread/thread_sort.cuh
  • cub/cub/util_macro.cuh
  • cub/test/catch2_test_device_merge_sort_env.cu
💤 Files with no reviewable changes (1)
  • cub/cub/util_macro.cuh

Comment thread cub/cub/block/block_merge_sort.cuh Outdated
Comment thread cub/cub/block/block_merge_sort.cuh
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
cub/cub/block/block_merge_sort.cuh (1)

110-111: 💤 Low value

suggestion: Per coding guidelines, free function calls should be fully qualified from the global namespace. Here and at lines 461, 518, the detail:: calls should use ::cub::detail::.

cub/cub/agent/agent_merge_sort.cuh (1)

558-566: 💤 Low value

suggestion: Per coding guidelines, this should be ::cub::detail::serial_merge<policy.unroll> for full qualification.

cub/cub/agent/agent_merge.cuh (1)

226-234: 💤 Low value

suggestion: Per coding guidelines, use ::cub::detail::serial_merge<Unroll> with leading ::.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d1776c12-fedb-49e8-ad1b-d6af8d9dd8bf

📥 Commits

Reviewing files that changed from the base of the PR and between 65a3115 and cfbe940.

📒 Files selected for processing (10)
  • cub/cub/agent/agent_merge.cuh
  • cub/cub/agent/agent_merge_sort.cuh
  • cub/cub/block/block_merge_sort.cuh
  • cub/cub/device/dispatch/dispatch_merge.cuh
  • cub/cub/device/dispatch/tuning/tuning_merge.cuh
  • cub/cub/device/dispatch/tuning/tuning_merge_sort.cuh
  • cub/cub/thread/thread_sort.cuh
  • cub/cub/util_macro.cuh
  • cub/test/catch2_test_device_merge_env.cu
  • cub/test/catch2_test_device_merge_sort_env.cu
💤 Files with no reviewable changes (1)
  • cub/cub/util_macro.cuh
✅ Files skipped from review due to trivial changes (1)
  • cub/cub/device/dispatch/dispatch_merge.cuh
🚧 Files skipped from review as they are similar to previous changes (3)
  • cub/test/catch2_test_device_merge_sort_env.cu
  • cub/cub/thread/thread_sort.cuh
  • cub/cub/device/dispatch/tuning/tuning_merge_sort.cuh

@bernhardmgruber bernhardmgruber changed the title Control unrolling in cub::DeviceMergeSort via tuning Control unrolling in cub::DeviceMerge[Sort] via tuning May 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

😬 CI Workflow Results

🟥 Finished in 2h 43m: Pass: 97%/285 | Total: 16d 04h | Max: 2h 33m | Hits: 12%/1261524

See results here.

@bernhardmgruber
Copy link
Copy Markdown
Contributor Author

bernhardmgruber commented Jun 1, 2026

Ah yes, GCC7 is a bit limited in the constexpr evaluator

cc_dispatch.cuh:81:55: internal compiler error: in cxx_eval_bit_field_ref, at cp/constexpr.c:2578

int NumThreads,
int ItemsPerThread,
typename SynchronizationPolicy,
bool _Unroll = true>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if the overload without the defaulted template argument will always be valid

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Expose unrolling control in cub::DeviceMergeSort to tuning

2 participants