Skip to content

Some OpenCL pixelpipe reorganising for efficacy#21483

Open
jenshannoschwalm wants to merge 1 commit into
darktable-org:masterfrom
jenshannoschwalm:pixelpipe_cache_validations
Open

Some OpenCL pixelpipe reorganising for efficacy#21483
jenshannoschwalm wants to merge 1 commit into
darktable-org:masterfrom
jenshannoschwalm:pixelpipe_cache_validations

Conversation

@jenshannoschwalm

Copy link
Copy Markdown
Collaborator
  1. Whenever we have to OpenCL transform colorspace for the module's input we ensure it's written back to host memory, the cacheline dsc is updated and the fast blending cache is cleared. By doing so input cacheline invalidations are not required any more.

  2. As we also want input cl_mem data to be available as a host bound cacheline if there are importance hints - we do this to increase cache hit rate for UI responsiveness - or if we later have to go into tiling mode, we do an early test for these cases and combine device to host copies with those for colorspace transforms (as in 1) for fewer costly copy calls.

  3. Activate DT_PIPE_CAS_SHUTDOWN, now we can set shutdown mode only once per pixelpipe run. Also some shutdown log improvements, we want this via -d pipe only if relevant & new, otherwise the -d verbose switch must be used.

  4. Some overhaul of _module_pipe_stop() and it's callers as being sure about write backs (1+2). For some shutdown modes we would like to restart a OpenCL pipe at the stopping module. This requires further information for the caller so we now return dt_dev_stoptest_t instead of a gboolean.

  5. Fixes:

  • The pixelpipe CPU path missed the _module_pipe_stop().
  • In pixelpipe cache code we missed setting the DT_INVALID_HASH while freeing invalid cachelines resulting in less efficient use of cachelines. We also allocate cachelines with a size aligned to 4k for slightly improved perf. Also some log improvements here.
  1. Some maintenance for less parameters and readability
  • get_output_format() doesn't need dev as parameter
  • _piece_process_hash() renamed to _bcache_hash() for clarity, removed 'module' parameter
  • _piece_fast_blend() got the 'module' parameter removed
  • introduced and use _copy_image_to_host_err() for readability.

Overall a clearly improved UI responsiveness as the pixelpipe internal cacheline invalidations have gone and we have a better cache-hit rate.

@kofa73 i know you have been very ionterested in pixelpipe code efficiacy and bugs. I would appreciate another in-depth review and test :-)
@masterpiga this should fully ensure valid cacheline :-)

@jenshannoschwalm jenshannoschwalm added this to the 5.8 milestone Jul 3, 2026
@jenshannoschwalm jenshannoschwalm requested a review from kofa73 July 3, 2026 18:26
@jenshannoschwalm jenshannoschwalm added bugfix pull request fixing a bug scope: performance doing everything the same but faster scope: codebase making darktable source code easier to manage release notes: pending OpenCL Related to darktable OpenCL code labels Jul 3, 2026
@jenshannoschwalm jenshannoschwalm changed the title Some OpenCL pixelpipe reorganize for efficacy Some OpenCL pixelpipe reorganising for efficacy Jul 3, 2026
Whenever we have to OpenCL transform colorspace for the module's input we ensure it's
written back to host memory, the cacheline dsc is updated and the fast blending cache
is cleared.
By doing so input cacheline invalidations are not required any more.

As we also want input cl_mem data to be available as a host bound cacheline if there
are importance hints - we do this to increase cache hit rate for UI responsiveness
- or if we later have to go into tiling mode, we do an early test for these cases and
combine device to host copy with those for colorspace transforms for fewer costly calls.

Activate DT_PIPE_CAS_SHUTDOWN, now we only can set shutdown mode once per pixelpipe run.
Also some shutdown log improvements, we want this via -d pipe only if relevant & new,
otherwise the -d verbose switch must be used.

Some overhaul of _module_pipe_stop() and it's callers as being sure about write backs.
For some shutdown modes we would like to restart a OpenCL pipe at the stopping module.
This requires further information for the caller so we now return dt_dev_stoptest_t
instead of a gboolean.

Fixes:
The pixelpipe CPU path missed the _module_pipe_stop()

In pixelpipe cache code we missed setting the DT_INVALID_HASH while freeing invalid
cachelines resulting in less efficient use of cachelines.
We also allocate cachelines with a size aligned to 4k for slightly improved perf.
Also some log improvements here.

Some maintenance for less parameters and readability
- get_output_format() doesn't need dev as parameter
- _piece_process_hash() renamed to _bcache_hash() for clarity, removed 'module' parameter
- _piece_fast_blend() got the 'module' parameter removed
- introduced and use _copy_image_to_host_err() for readability.

To be investigated in OpenCL pipe:
We currently request a host-bound cacheline for every module leading to lots of alloc-here
and invalidate-in-next-module if there was no write back to host cacheline.
Ideas: 1. late cacheline allocation? 2. propagate unused cacheline to next module?
@jenshannoschwalm jenshannoschwalm force-pushed the pixelpipe_cache_validations branch from 98165e5 to f52bcc4 Compare July 4, 2026 05:15
@kofa73

kofa73 commented Jul 4, 2026

Copy link
Copy Markdown
Collaborator

@jenshannoschwalm: Thanks for your trust. In Hungarian, we have a saying: "magas nekem, mint malacnak a zsiráfvályú" (it's as high (= over my head) for me as a giraffe's feeding trough is for a piglet). On my best days, I can follow the paths that the code reviewer agents find. Today, it's not one of those days: I'm really sleep deprived and have a headache. I'll send you the findings once they are done. Maybe tomorrow I'll have a chance to do more. :-(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix pull request fixing a bug OpenCL Related to darktable OpenCL code release notes: pending scope: codebase making darktable source code easier to manage scope: performance doing everything the same but faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants