Third round of OpenCL maintenance for 5.6 by jenshannoschwalm · Pull Request #20383 · darktable-org/darktable

jenshannoschwalm · 2026-02-23T07:09:42Z

b4ea535 introduces dt_opencl_enqueue_kernel_2d_local_args() as an equivalent to dt_opencl_enqueue_kernel_2d_args() for improved readability.
84ad991 makes use of the above macro.
1089656 contains various minor OpenCL kernel improvements, with subtle performance gains or easier maintenance through the use of available functions.
9a7631e will help to investigate performance issues related to host<->cldevice memory interactions.

This macro, with its backend dt_opencl_enqueue_kernel_2d_local_args_internal(), is used to call the kernel including locals, sizes and all kernel parameters for simplification.
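To illustrate the macro-plus-backend pattern described here, a minimal host-side sketch follows. Every name in it (sketch_call_t, SKETCH_ARG, sketch_enqueue_2d_local and so on) is hypothetical and chosen for illustration only; it does not reproduce the darktable signatures, and it merely records the parameters where a real backend would set kernel arguments and enqueue the kernel.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names are darktable's. */
#define SKETCH_MAX_ARGS 16

typedef struct sketch_call_t
{
  size_t sizes[2];  /* global work size: width, height */
  size_t local[2];  /* local work size: width, height */
  int num_args;     /* number of kernel parameters recorded */
  size_t arg_size[SKETCH_MAX_ARGS];
  const void *arg_ptr[SKETCH_MAX_ARGS];
} sketch_call_t;

/* Each kernel parameter travels as a (size, pointer) pair. */
#define SKETCH_ARG(x) sizeof(x), (const void *)&(x)
/* A size of SIZE_MAX marks the end of the parameter list. */
#define SKETCH_END SIZE_MAX, (const void *)NULL

/* Backend: stores sizes and locals, then walks the variadic (size, pointer) pairs.
   A real backend would set the kernel arguments and enqueue the kernel instead. */
static int sketch_enqueue_2d_local_internal(sketch_call_t *call,
                                            size_t w, size_t h,
                                            size_t lw, size_t lh, ...)
{
  assert(call != NULL);
  call->sizes[0] = w;
  call->sizes[1] = h;
  call->local[0] = lw;
  call->local[1] = lh;
  call->num_args = 0;

  va_list ap;
  va_start(ap, lh);
  for(;;)
  {
    const size_t size = va_arg(ap, size_t);
    const void *ptr = va_arg(ap, const void *);
    if(size == SIZE_MAX) break;             /* terminator reached */
    if(call->num_args >= SKETCH_MAX_ARGS)
    {
      va_end(ap);
      return -1;                            /* too many parameters */
    }
    call->arg_size[call->num_args] = size;
    call->arg_ptr[call->num_args] = ptr;
    call->num_args++;
  }
  va_end(ap);
  return 0;
}

/* Front-end macro: appends the terminator so callers cannot forget it.
   It expects at least one kernel parameter. */
#define sketch_enqueue_2d_local(call, w, h, lw, lh, ...) \
  sketch_enqueue_2d_local_internal(call, w, h, lw, lh, __VA_ARGS__, SKETCH_END)
```

A call then reads as one line, for example sketch_enqueue_2d_local(&call, 640, 480, 16, 16, SKETCH_ARG(scale), SKETCH_ARG(width)), which is the readability gain this kind of wrapper aims for.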

Consistently use the _args() variants for calling kernels.

1. Use CLFARRAY for calc_Y0_mask (dual blend), denoiseprofile and hazeremoval for a simpler interface.
2. Use the clipf() macro in some more places.
3. The weight function in atrous.cl gets a subtle performance boost by being inline. A fast exp() variant was also tested; there seems to be no performance gain, as the native function is equally performant.
4. Use the OpenCL mix() function instead of _interpolatef().
5. Use two macros in the rcd demosaicer.
6. Make use of dt_fast_hypot().
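For readers less familiar with the built-ins mentioned in this list, here is a plain-C sketch of what two of them compute. OpenCL defines mix(x, y, a) as x + (y - x) * a with a expected in [0, 1]. The clipf stand-in assumes a clamp to [0, 1], which is an assumption about the helper and not taken from the kernel sources; the sketch_ names are hypothetical.

```c
#include <assert.h>

/* Plain-C stand-in for OpenCL's mix(x, y, a), defined as x + (y - x) * a.
   OpenCL leaves the result undefined for a outside [0, 1]. */
static float sketch_mix(float x, float y, float a)
{
  assert(a >= 0.0f && a <= 1.0f);
  return x + (y - x) * a;
}

/* Stand-in for a clipf()-style helper, ASSUMING it clamps to [0, 1]. */
static float sketch_clipf(float v)
{
  return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}
```

Using the built-in instead of a hand-written interpolation keeps the kernel shorter and leaves the choice of instruction sequence to the OpenCL compiler.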

1. For improved debugging of OpenCL performance we want information about clmem read/write/copy actions via the -d verbose switch.
2. Due to float->double conversion issues we sometimes had bad negative timings.
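As an illustration of how a float/double mix-up can yield a negative timing, here is a small sketch. It shows one plausible mechanism only and is not a description of the actual code path that was fixed; the function names are hypothetical. A float carries 24 significant bits, so an epoch-sized timestamp in seconds is rounded to a multiple of 128 and can land after a later timestamp that stayed in double.

```c
#include <assert.h>

/* Elapsed seconds with both timestamps kept in double precision. */
static double sketch_elapsed_double(double start, double end)
{
  assert(end >= start);
  return end - start;
}

/* Same subtraction, but the start timestamp passed through a float on the way.
   For epoch-sized values the float rounding step is 128 seconds, so the
   rounded start can end up larger than the true end. */
static double sketch_elapsed_via_float(double start, double end)
{
  const float start_f = (float)start;  /* precision is lost here */
  return end - (double)start_f;
}
```

With start = 1771830628.0 and end = start + 0.5, the double version returns 0.5 while the float version comes out negative, because the float rounds the start upwards.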

The maximum number of OpenCL events that can be handled within the driver/device is not exposed in the OpenCL API. If the number of events in use exceeds device-internal resources, an error code is returned, which is handled and reported by the darktable OpenCL interface. We still try to keep event resources within sensible limits to reduce stress. The per-device dt_opencl_device_t struct now only has a flag use_events, the maximum number of events is defined as DT_OPENCL_EVENTS, and the log has been updated.
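The idea of a compile-time cap plus a per-device on/off flag can be sketched as below. This is a toy model under stated assumptions: SKETCH_MAX_EVENTS, sketch_events_t and both functions are hypothetical, the cap value is arbitrary, and the flush only resets a counter where a real implementation would wait on and release the outstanding events.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_EVENTS 4  /* hypothetical cap, not darktable's value */

typedef struct sketch_events_t
{
  int use_events;  /* mirrors the idea of a per-device on/off flag */
  int in_flight;   /* event slots currently handed out */
  int flushes;     /* how often the list had to be drained */
} sketch_events_t;

/* Drain all outstanding events; a real implementation would wait on
   and release them here. */
static void sketch_events_flush(sketch_events_t *ev)
{
  assert(ev != NULL);
  if(ev->in_flight > 0) ev->flushes++;
  ev->in_flight = 0;
}

/* Return a slot index, or -1 when events are disabled.
   The number of slots in flight never exceeds the cap. */
static int sketch_events_get_slot(sketch_events_t *ev)
{
  assert(ev != NULL);
  if(!ev->use_events) return -1;
  if(ev->in_flight >= SKETCH_MAX_EVENTS) sketch_events_flush(ev);
  return ev->in_flight++;
}
```

Keeping the cap as a single constant means no per-device tuning value has to be stored or explained, at the cost of an occasional early flush.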

The roundup magics for width & height are mainly relevant for kernels called without locals, as good values generally improve performance. Profiling these magics is simply not worth the effort; we can make a very good guess based on the maximum workgroup size for the device. Tests doing this per kernel via dt_opencl_get_kernel_work_group_size() show that the overhead decreases performance for those simple kernels, so we go the easy way. Please note: we still write the calculated data to the per-device conf for now to avoid confusing people reading that conf, but very likely we will go for a cl version bump that offers fewer options.
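The arithmetic behind such roundups can be sketched as follows. The round-up-to-multiple step is standard; the heuristic that derives the two multiples from the maximum workgroup size is purely an assumption for illustration and is not the rule used in this PR. All sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of 'multiple'. */
static size_t sketch_roundup(size_t size, size_t multiple)
{
  assert(multiple > 0);
  return ((size + multiple - 1) / multiple) * multiple;
}

/* HYPOTHETICAL heuristic: split the device's maximum workgroup size into
   two power-of-two factors that are as square as possible. */
static void sketch_pick_roundups(size_t max_workgroup, size_t *rw, size_t *rh)
{
  assert(rw != NULL && rh != NULL);
  size_t w = 1, h = 1;
  while(w * h * 2 <= max_workgroup)
  {
    if(w <= h)
      w *= 2;
    else
      h *= 2;
  }
  *rw = w;
  *rh = h;
}
```

For a device reporting a maximum workgroup size of 256 this heuristic yields 16 x 16, so a 1000 pixel wide buffer would be enqueued with a width of 1008.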

jenshannoschwalm · 2026-02-24T14:00:16Z

Two commits added
5. 95507ef simplifies the OpenCL handling conf options, no problems expected at all
6. 7d191f2 is another simplification of the per-device conf, the roundups just don't need to be exposed.

TurboGit

First pass, some comments.

What about the integration tests? Do you foresee some diff?

jenshannoschwalm · 2026-02-25T17:03:01Z

What about the integration tests? Do you foresee some diff?

Unfortunately my integration tests on the latest Fedora have significant problems, and I simply don't understand enough Python to dig into this and the libraries it depends on.

But no, I would consider any new difference a regression/bug. The PR is again pretty large, but it's principally only about the way we call OpenCL kernels, for maintenance. The last two commits don't interfere with results but might give a subtle perf gain if the user didn't tune ...

TurboGit · 2026-02-25T17:38:24Z

Unfortunately my integration tests on the latest Fedora have significant problems, and I simply don't understand enough Python to dig into this and the libraries it depends on.

No problem, I have started the regression test. I'll report back.

TurboGit · 2026-02-25T18:39:39Z

All tests are OK with the previous testsuite.

I have added a new check to find regressions in the number of differing pixels between the CPU and GPU runs. In this new testsuite we have a slight increase of the count for the following tests:

0064-demosaic-xtrans-vng
0066-demosaic-mark3
0068-rawdenoise-xtrans
0100-invert-xtrans
0145-lens-metadata-xtransIV-modversion-6
0165-demosaic-markesteijn-vng
0173-capture-dual-markesteijn

Anyway I have updated the baseline for pixel diff for now. We will be able to detect regressions now.

Fact is that we have some tests with a big pixel diff count (> 900000 on one test for example).
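A check of this kind can be sketched as below. This is not the testsuite's implementation, which is not shown in this thread; the function names, the 8-bit interleaved pixel layout and the tolerance parameter are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count pixels where any channel differs by more than 'tolerance'
   between the CPU and GPU outputs. Each pixel is counted once. */
static size_t sketch_count_diff_pixels(const uint8_t *cpu, const uint8_t *gpu,
                                       size_t npixels, size_t channels,
                                       int tolerance)
{
  assert(cpu != NULL && gpu != NULL);
  size_t count = 0;
  for(size_t p = 0; p < npixels; p++)
  {
    for(size_t c = 0; c < channels; c++)
    {
      const int d = (int)cpu[p * channels + c] - (int)gpu[p * channels + c];
      if(d > tolerance || d < -tolerance)
      {
        count++;
        break;  /* move on to the next pixel */
      }
    }
  }
  return count;
}

/* A run regresses when its count exceeds the stored baseline. */
static int sketch_is_regression(size_t count, size_t baseline)
{
  return count > baseline;
}
```

Storing the count as a baseline per test, as described above, turns an absolute number that may be large for some tests into a relative check that only fires when the count grows.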

TurboGit

Thanks! And sorry for my stupid questions :)

jenshannoschwalm added this to the 5.6 milestone Feb 23, 2026

jenshannoschwalm added the labels scope: codebase (making darktable source code easier to manage), OpenCL (Related to darktable OpenCL code), scope: debugging Feb 23, 2026

jenshannoschwalm force-pushed the opencl_maintenance_56_3 branch from 9a7631e to 78efa38 Compare February 23, 2026 15:44

jenshannoschwalm added 6 commits February 24, 2026 13:44

Introduce dt_opencl_enqueue_kernel_2d_local_args()

654de1b

This macro, with its backend dt_opencl_enqueue_kernel_2d_local_args_internal(), is used to call the kernel including locals, sizes and all kernel parameters for simplification.

OpenCL kernel calling maintenance

d367af1

Consistently use the _args() variants for calling kernels.

Additional OpenCL clmem logs

dce21aa

1. For improved debugging of OpenCL performance we want information about clmem read/write/copy actions via the -d verbose switch.
2. Due to float->double conversion issues we sometimes had bad negative timings.

jenshannoschwalm force-pushed the opencl_maintenance_56_3 branch from 78efa38 to 7d191f2 Compare February 24, 2026 13:54

jenshannoschwalm added the labels documentation: pending (a documentation work is required), release notes: pending Feb 24, 2026

TurboGit requested changes Feb 25, 2026

Comment thread data/kernels/atrous.cl

Comment thread src/common/bilateral.c

Comment thread data/kernels/demosaic_rcd.cl

TurboGit approved these changes Feb 25, 2026

TurboGit merged commit a433d9d into darktable-org:master Feb 25, 2026
9 of 10 checks passed

jenshannoschwalm deleted the opencl_maintenance_56_3 branch February 25, 2026 21:46
