Add tuning search based on CompileIQ by bernhardmgruber · Pull Request #9190 · NVIDIA/cccl

bernhardmgruber · 2026-05-29T16:30:30Z

This was mostly done by Claude, migrating the internal cub_tuning_evo scripts. This PR adds a simplified version that uses a single worker and runs benchmarks on a single GPU.

Running:

mkdir build_tune && cd build_tune
cmake .. --preset cub-tuning
CUDA_VISIBLE_DEVICES=0 ../benchmarks/scripts/search_iq.py -R 'cub.bench.transform.babelstream.*' -a 'T{ct}=F32'
 ctk:  13.3.33
cccl:  v3.5.0.dev-121-g575176ff50
🧬 Generation:  0/50|░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░| [elapsed: 00:00 · eta: ?]
Evaluating variant {'alg': 3, 'bif': 4, 'pref': 2, 'tpb': 768, 'unrl': 1, 'vsp2': 6}: 0.7940340093966018
Evaluating variant {'alg': 2, 'bif': 0, 'pref': 3, 'tpb': 128, 'unrl': 2, 'vsp2': 1}: 0.7870774540476999
Evaluating variant {'alg': 0, 'bif': -8, 'pref': 1, 'tpb': 640, 'unrl': 3, 'vsp2': 5}: 0.7340502424429781
Evaluating variant {'alg': 1, 'bif': -16, 'pref': 1, 'tpb': 384, 'unrl': 4, 'vsp2': 3}: 0.6930208162203446
Evaluating variant {'alg': 4, 'bif': 12, 'pref': 3, 'tpb': 1024, 'unrl': 2, 'vsp2': 2}: Build failed
Evaluating variant {'alg': 1, 'bif': -12, 'pref': 2, 'tpb': 896, 'unrl': 4, 'vsp2': 5}: 0.19958390512595525
Evaluating variant {'alg': 3, 'bif': 0, 'pref': 2, 'tpb': 128, 'unrl': 1, 'vsp2': 4}: 0.7767127690888875
Evaluating variant {'alg': 2, 'bif': 8, 'pref': 3, 'tpb': 640, 'unrl': 3, 'vsp2': 6}: 0.7872272803730863
Evaluating variant {'alg': 4, 'bif': -4, 'pref': 1, 'tpb': 768, 'unrl': 3, 'vsp2': 2}: Build failed
Evaluating variant {'alg': 0, 'bif': 16, 'pref': 1, 'tpb': 384, 'unrl': 4, 'vsp2': 1}: 0.7388724658257656
...

It's still a bit confusing: after the run, the database shows different scores than the ones printed during the search. It looks like the score reported by analyze.py is simply computed differently from the score passed to CompileIQ.

$ ../benchmarks/scripts/analyze.py --top=100 cccl_meta_bench.db
cub.bench.transform.babelstream[T{ct}=F32]:
                                          variant     score      mins     means      maxs
9    bif_16.alg_2.tpb_768.unrl_3.pref_1.vsp2_6 ()  1.026887  1.000000  1.025528  1.200000
11    bif_8.alg_2.tpb_640.unrl_3.pref_3.vsp2_6 ()  1.026887  1.000000  1.025528  1.200000
6     bif_0.alg_2.tpb_128.unrl_2.pref_3.vsp2_1 ()  1.026641  1.000000  1.025313  1.200000
10    bif_4.alg_3.tpb_768.unrl_1.pref_2.vsp2_6 ()  1.025089  1.000000  1.023932  1.200000
7     bif_0.alg_3.tpb_128.unrl_1.pref_2.vsp2_4 ()  1.013059  0.999988  1.012499  1.200000
5     bif_0.alg_1.tpb_640.unrl_2.pref_2.vsp2_1 ()  1.012436  1.000000  1.011788  1.166667
0                                         base ()  1.000000  1.000000  1.000000  1.000000
1   bif_-12.alg_0.tpb_384.unrl_1.pref_3.vsp2_2 ()  0.962345  0.600000  0.963987  1.012048
8    bif_16.alg_0.tpb_384.unrl_4.pref_1.vsp2_1 ()  0.962344  0.600000  0.963986  1.012048
4    bif_-8.alg_0.tpb_640.unrl_3.pref_1.vsp2_5 ()  0.953269  0.600000  0.955280  1.012048
3   bif_-16.alg_1.tpb_384.unrl_4.pref_1.vsp2_3 ()  0.885473  0.500000  0.859326  1.001472
2   bif_-12.alg_1.tpb_896.unrl_4.pref_2.vsp2_5 ()  0.256298  0.048387  0.249873  0.409063

copy-pr-bot · 2026-05-29T16:30:33Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-05-31T20:21:34Z

📝 Walkthrough

Summary by CodeRabbit

Release Notes

New Features
- Added CompileIQ-based evolutionary search capability for algorithm optimization alongside brute-force search options
Bug Fixes
- Improved thread-safety handling in benchmark execution for multithreaded contexts
- Enhanced SQLite storage validation and cross-thread database access support
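The storage.py change itself is not quoted in this conversation, so the following is only a minimal sketch of the described check, with `open_shared_connection` and `query_from_worker` as hypothetical helper names. It relies on two documented parts of Python's `sqlite3` module: on Python 3.11+, `sqlite3.threadsafety` reports the threading mode the SQLite library was compiled with (3 means serialized), and `check_same_thread=False` disables Python's own same-thread check. On older Pythons `threadsafety` is hard-coded to 1, so this check would always reject them.

```python
import sqlite3
import threading


def open_shared_connection(path):
    """Open a connection that may be used from several threads (hypothetical helper)."""
    # DB-API threadsafety level 3 means SQLite runs in serialized mode, so a
    # single connection can be shared between threads.
    if sqlite3.threadsafety != 3:
        raise RuntimeError(
            f"SQLite is not in serialized mode (threadsafety={sqlite3.threadsafety})"
        )
    # Disable Python's same-thread check; SQLite serializes access itself.
    return sqlite3.connect(path, check_same_thread=False)


def query_from_worker(conn, sql):
    """Run `sql` on `conn` from a separate thread and return the fetched rows."""
    rows = []
    worker = threading.Thread(target=lambda: rows.extend(conn.execute(sql).fetchall()))
    worker.start()
    worker.join()
    return rows
```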

Walkthrough

Three changes add multithreaded benchmark execution support: ProcessRunner now guards signal handler registration to the main thread, SQLiteStorage validates thread-safety and enables cross-thread connection use, and a new CompileIQSeeker orchestrator selects between brute-force and evolutionary search strategies based on problem size.
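Python only allows `signal.signal()` to be called from the main thread; from any other thread it raises `ValueError`, which is why a multithreaded search needs the ProcessRunner guard. A minimal sketch of such a guard follows. The helper name `install_handlers_if_main_thread` is hypothetical, and the choice of SIGINT and SIGTERM is an assumption, since the diff for bench.py is not shown here.

```python
import signal
import threading


def install_handlers_if_main_thread(handler):
    """Register `handler` for SIGINT/SIGTERM, but only on the main thread.

    Returns True if the handlers were installed, False if the call came from
    a worker thread (where signal.signal() would raise ValueError).
    """
    if threading.current_thread() is not threading.main_thread():
        return False
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
    return True
```

Returning a flag instead of silently skipping lets the caller decide whether a missing handler matters, for example whether child processes still need explicit cleanup when the runner lives in a worker thread.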

Changes

Multithreaded Benchmark Infrastructure and Search

Layer / File(s)	Summary
ProcessRunner signal handler main-thread guard `benchmarks/scripts/cccl/bench/bench.py`	ProcessRunner.__init__ imports threading and wraps the signal.signal() calls so they execute only on the main thread, preventing invalid signal registration in worker threads.
SQLiteStorage thread-safety validation and cross-thread support `benchmarks/scripts/cccl/bench/storage.py`	SQLiteStorage.__init__ validates that the SQLite runtime threadsafety level is serialized mode and sets check_same_thread=False when creating the connection to allow cross-thread access.
CompileIQSeeker benchmark search orchestration `benchmarks/scripts/search_iq.py`	New benchmark driver script defines search-space and pool sizing helpers, builds an objective function that evaluates bench.Bench variants and filters failed/infinite results, and introduces CompileIQSeeker class that selects brute-force or evolutionary search based on estimated expected-run count.
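The exact heuristic CompileIQSeeker uses (get_num_expected_runs) is not shown in this conversation. As a hedged sketch of the general idea only: when the full variant space is no larger than the number of runs an evolutionary search would be expected to spend, exhaustive enumeration is just as cheap and guaranteed to find the best variant. The functions `variant_space_size`, `enumerate_variants`, and `choose_strategy`, and the comparison rule itself, are hypothetical.

```python
import itertools
import math


def variant_space_size(search_space):
    """Number of distinct variants: the product of each parameter's value count."""
    return math.prod(len(values) for values in search_space.values())


def enumerate_variants(search_space):
    """Yield every variant as a {label: value} dict (the brute-force path)."""
    labels = sorted(search_space)
    for values in itertools.product(*(search_space[label] for label in labels)):
        yield dict(zip(labels, values))


def choose_strategy(search_space, expected_runs):
    """Hypothetical rule: brute force when exhaustive search costs no more
    benchmark runs than the evolutionary search is expected to need."""
    if variant_space_size(search_space) <= expected_runs:
        return "brute-force"
    return "evolutionary"
```

For example, an illustrative space `{"tpb": range(128, 1025, 128), "pref": range(1, 4)}` has 8 × 3 = 24 variants, so with a budget of 50 expected runs the sketch would pick brute force.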

Suggested reviewers

NaderAlAwar
pauleonix
elstehle


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

benchmarks/scripts/search_iq.py (1)

120-131: ⚡ Quick win

suggestion: Rename parameter or variable to clarify intent.

Line 125 passes num_rt_workloads to a parameter named num_objectives in get_num_expected_runs(). The function signature and iq_search() (line 83) use num_objectives=1, but the calculation here uses num_rt_workloads. Either rename the parameter in get_num_expected_runs() to reflect its actual usage, or clarify the relationship between RT workloads and the expected-runs heuristic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 75cb46dd-cbc5-44dd-9b6d-7dd9bc33fb9b

📥 Commits

Reviewing files that changed from the base of the PR and between ee20627 and 09ec91a.

📒 Files selected for processing (3)

benchmarks/scripts/cccl/bench/bench.py
benchmarks/scripts/cccl/bench/storage.py
benchmarks/scripts/search_iq.py

coderabbitai · 2026-05-31T20:21:37Z

+        )
+
+        if score == float("inf") or score == float("-inf"):
+            print("Infinite store")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

important: Fix typo "Infinite store" → "Infinite score".

Line 60 prints "Infinite store" but should print "Infinite score" to match the condition being checked.
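Only the infinity check is visible in this hunk. Here is a minimal sketch of an objective wrapper with the corrected message. The `evaluate` callable, its `None`-on-build-failure convention, and returning `None` for rejected variants are hypothetical, not the PR's actual interface. `math.isinf(score)` is equivalent to `score == float("inf") or score == float("-inf")`. If NaN scores should also be rejected, `not math.isfinite(score)` covers that case too.

```python
import math


def make_objective(evaluate):
    """Wrap a hypothetical `evaluate(config) -> float | None` benchmark callable.

    Returns None for variants that failed to build (evaluate returned None)
    or produced an infinite score, so the caller can skip them.
    """
    def objective(config):
        score = evaluate(config)
        if score is None:
            print(f"Evaluating variant {config}: Build failed")
            return None
        if math.isinf(score):  # same as score == inf or score == -inf
            print("Infinite score")
            return None
        print(f"Evaluating variant {config}: {score}")
        return score

    return objective
```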

bernhardmgruber · 2026-05-31T20:29:46Z

OK, here is a confusing bit. analyze.py shows:

                            variant     score      mins     means      maxs
0                           base ()  1.000000  1.000000  1.000000  1.000000
1  bif_-12.tpb_256.pref_2.vsp2_2 ()  0.999754  0.984375  0.999780  1.023256
2   bif_-4.tpb_768.pref_2.vsp2_6 ()  0.997953  0.753906  0.997823  1.166667
3    bif_0.tpb_384.pref_1.vsp2_5 ()  0.996448  0.753906  0.996489  1.166667
4   bif_12.tpb_384.pref_3.vsp2_4 ()  0.996448  0.753906  0.996489  1.166667
5   bif_16.tpb_640.pref_3.vsp2_1 ()  0.996416  0.753906  0.996455  1.166667
6    bif_8.tpb_896.pref_1.vsp2_3 ()  0.977935  0.753906  0.978535  1.023256

Yet, the scores reported to CompileIQ are (ordered by variant as the list above):

Evaluating variant {'bif': -12, 'pref': 2, 'tpb': 256, 'vsp2': 2}: 0.7747535944721688
Evaluating variant {'bif': -4, 'pref': 2, 'tpb': 768, 'vsp2': 6}: 0.7833624486057084
Evaluating variant {'bif': 0, 'pref': 1, 'tpb': 384, 'vsp2': 5}: 0.7720734255442793
Evaluating variant {'bif': 12, 'pref': 3, 'tpb': 384, 'vsp2': 4}: 0.7722660680432303
Evaluating variant {'bif': 16, 'pref': 3, 'tpb': 640, 'vsp2': 1}: 0.7722450212177555
Evaluating variant {'bif': 8, 'pref': 1, 'tpb': 896, 'vsp2': 3}: 0.7580094070089615

analyze.py lists the variants by monotonically decreasing score (highest score first). However, the CompileIQ scores for the same variants, in the same order, neither increase nor decrease monotonically. This suggests that the analysis score does not preserve the ordering of the CompileIQ score. This is either a bug in the score computation or beyond my understanding of the tuning framework.
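One way to make the mismatch concrete is to list the pairs of variants that the two scores rank in opposite order. If one score were a monotonically increasing function of the other, the list would be empty. This small check uses the numbers quoted in the comment above, keyed by each variant's `bif` value (unique within this run). analyze.py prints only six decimals, so its apparent tie between `bif` 0 and 12 may just be rounding, and ties are ignored here.

```python
from itertools import combinations

# Scores copied from the comment above, keyed by each variant's `bif` value.
analyze_score = {-12: 0.999754, -4: 0.997953, 0: 0.996448,
                 12: 0.996448, 16: 0.996416, 8: 0.977935}
compileiq_score = {-12: 0.7747535944721688, -4: 0.7833624486057084,
                   0: 0.7720734255442793, 12: 0.7722660680432303,
                   16: 0.7722450212177555, 8: 0.7580094070089615}


def discordant_pairs(a, b):
    """Return variant pairs that the two scores rank in opposite order (ties ignored)."""
    return [
        (x, y)
        for x, y in combinations(sorted(a), 2)
        if (a[x] - a[y]) * (b[x] - b[y]) < 0
    ]


print(discordant_pairs(analyze_score, compileiq_score))  # prints [(-12, -4), (0, 16)]
```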

@gevtushenko as the author of the tuning framework, I kindly ask for an explanation for this observation.

oleksandr-pavlyk · 2026-06-02T15:12:29Z

+def pool_cull_sizes(num_genes, num_objectives, variant_space_size, cull=0.75):
+    min_pool_size = 128 if variant_space_size > 10000 else 32
+    target = (2 * num_objectives) + 1
+    poolsize = int(target / (1 - cull))


I would clamp the cull value to between 0.05 and 0.95 (or some other bounds within the unit interval), or validate this argument and raise ValueError if it is not within this range.
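A sketch of pool_cull_sizes with the suggested validation, reconstructed from the diff lines quoted in these review comments. The hunks do not show the function's return statement, so returning `(poolsize, cullsize)` is an assumption. The 0.05 to 0.95 bounds are the reviewer's proposal. The nested `max(max(...), ...)` is flattened into a single `max` call, which behaves the same.

```python
def pool_cull_sizes(num_genes, num_objectives, variant_space_size, cull=0.75):
    # Reject cull fractions that would make 1 - cull zero or degenerate.
    if not 0.05 <= cull <= 0.95:
        raise ValueError(f"cull must be within [0.05, 0.95], got {cull}")
    min_pool_size = 128 if variant_space_size > 10000 else 32
    target = (2 * num_objectives) + 1
    poolsize = int(target / (1 - cull))
    poolsize = max(poolsize, min_pool_size, 2 * num_genes)
    poolsize = poolsize + (poolsize % 2)  # round up to an even size
    cullsize = int(poolsize * cull)
    cullsize = cullsize - (cullsize % 2)  # round down to an even size
    return poolsize, cullsize  # return value assumed, not shown in the hunks
```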

oleksandr-pavlyk · 2026-06-02T15:14:36Z

+    target = (2 * num_objectives) + 1
+    poolsize = int(target / (1 - cull))
+    poolsize = max(max(poolsize, min_pool_size), 2 * num_genes)
+    poolsize = poolsize if poolsize % 2 == 0 else poolsize + 1


Nit:

Suggested change

poolsize = poolsize if poolsize % 2 == 0 else poolsize + 1

poolsize = poolsize + (poolsize % 2)

Or use ((poolsize + 1) // 2) * 2.

oleksandr-pavlyk · 2026-06-02T15:15:26Z

+    poolsize = max(max(poolsize, min_pool_size), 2 * num_genes)
+    poolsize = poolsize if poolsize % 2 == 0 else poolsize + 1
+    cullsize = int(poolsize * cull)
+    cullsize = cullsize if cullsize % 2 == 0 else cullsize - 1


Nit:

Suggested change

cullsize = cullsize if cullsize % 2 == 0 else cullsize - 1

cullsize = cullsize - (cullsize % 2)

or cullsize = (cullsize // 2) * 2.
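To confirm that the suggested branch-free forms for both nits (poolsize rounded up, cullsize rounded down) match the original conditionals, this short check compares all three spellings over a range of integers, including negatives. It relies on Python's floor-mod semantics, where `n % 2` is always 0 or 1. In C or C++, `%` with a negative operand can return -1, so the same trick would not carry over unchanged.

```python
def round_up_to_even(n):
    # Reviewer's suggestion for poolsize.
    return n + (n % 2)


def round_down_to_even(n):
    # Reviewer's suggestion for cullsize.
    return n - (n % 2)


for n in range(-8, 9):
    assert round_up_to_even(n) == (n if n % 2 == 0 else n + 1) == ((n + 1) // 2) * 2
    assert round_down_to_even(n) == (n if n % 2 == 0 else n - 1) == (n // 2) * 2
```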

oleksandr-pavlyk · 2026-06-02T15:21:48Z

+    for rng in parameter_space:
+        search_space[rng.label] = ss.range(
+            start=rng.low, end=rng.high - 1, step=rng.step
+        )


The variable name rng evokes "random number generator", whereas "range" is likely intended. Since range is a Python built-in, naming the variable range would shadow it. Can we use search_range instead?

Suggested change

for rng in parameter_space:
    search_space[rng.label] = ss.range(
        start=rng.low, end=rng.high - 1, step=rng.step
    )

for search_range in parameter_space:
    search_space[search_range.label] = ss.range(
        start=search_range.low, end=search_range.high - 1, step=search_range.step
    )

oleksandr-pavlyk · 2026-06-02T15:22:30Z

+        for rng in parameter_space:
+            value = int(config[rng.label])
+            range_points.append(bench.RangePoint(rng.definition, rng.label, value))


Ditto here. I would prefer search_range over rng, please.
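Both loops with the requested rename might look like the sketch below. `SearchRange` is a hypothetical stand-in whose fields (`definition`, `label`, `low`, `high`, `step`) are taken from the diff. Because the signatures of `ss.range` and `bench.RangePoint` are not shown here, they are replaced with the built-in `range()` and plain tuples. The sketch assumes `high` is exclusive, consistent with `end=high - 1` being passed to what appears to be an inclusive `ss.range`. The sketch also calls `range()` inside the loop body, which would fail if the loop variable itself were named `range`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchRange:
    """Hypothetical stand-in for one tunable benchmark parameter."""
    definition: str  # definition name passed to the build (assumed)
    label: str       # short label, e.g. "tpb"
    low: int
    high: int        # assumed exclusive upper bound
    step: int


def build_search_space(parameter_space):
    search_space = {}
    for search_range in parameter_space:
        # range() excludes `high`, matching end=high - 1 in an inclusive API.
        search_space[search_range.label] = list(
            range(search_range.low, search_range.high, search_range.step)
        )
    return search_space


def config_to_points(parameter_space, config):
    """Decode a search config into (definition, label, value) triples."""
    points = []
    for search_range in parameter_space:
        value = int(config[search_range.label])
        points.append((search_range.definition, search_range.label, value))
    return points
```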

Add tuning search based on CompileIQ

09ec91a

github-project-automation Bot added this to CCCL May 29, 2026

github-project-automation Bot moved this to Todo in CCCL May 29, 2026

cccl-authenticator-app Bot moved this from Todo to In Progress in CCCL May 29, 2026

bernhardmgruber marked this pull request as ready for review May 31, 2026 20:15

bernhardmgruber requested a review from a team as a code owner May 31, 2026 20:15

bernhardmgruber requested a review from oleksandr-pavlyk May 31, 2026 20:16

cccl-authenticator-app Bot moved this from In Progress to In Review in CCCL May 31, 2026

coderabbitai Bot reviewed May 31, 2026


oleksandr-pavlyk reviewed Jun 2, 2026


bernhardmgruber wants to merge 1 commit into NVIDIA:main from bernhardmgruber:compile_iq
