Some OpenCL pixelpipe reorganising for efficacy#21483
Open
jenshannoschwalm wants to merge 1 commit into
Open
Conversation
Whenever we have to OpenCL transform colorspace for the module's input we ensure it's written back to host memory, the cacheline dsc is updated and the fast blending cache is cleared. By doing so input cacheline invalidations are not required any more. As we also want input cl_mem data to be available as a host bound cacheline if there are importance hints - we do this to increase cache hit rate for UI responsiveness - or if we later have to go into tiling mode, we do an early test for these cases and combine device to host copy with those for colorspace transforms for fewer costly calls. Activate DT_PIPE_CAS_SHUTDOWN, now we only can set shutdown mode once per pixelpipe run. Also some shutdown log improvements, we want this via -d pipe only if relevant & new, otherwise the -d verbose switch must be used. Some overhaul of _module_pipe_stop() and it's callers as being sure about write backs. For some shutdown modes we would like to restart a OpenCL pipe at the stopping module. This requires further information for the caller so we now return dt_dev_stoptest_t instead of a gboolean. Fixes: The pixelpipe CPU path missed the _module_pipe_stop() In pixelpipe cache code we missed setting the DT_INVALID_HASH while freeing invalid cachelines resulting in less efficient use of cachelines. We also allocate cachelines with a size aligned to 4k for slightly improved perf. Also some log improvements here. Some maintenance for less parameters and readability - get_output_format() doesn't need dev as parameter - _piece_process_hash() renamed to _bcache_hash() for clarity, removed 'module' parameter - _piece_fast_blend() got the 'module' parameter removed - introduced and use _copy_image_to_host_err() for readability. To be investigated in OpenCL pipe: We currently request a host-bound cacheline for every module leading to lots of alloc-here and invalidate-in-next-module if there was no write back to host cacheline. Ideas: 1. late cacheline allocation? 2. propagate unused cacheline to next module?
98165e5 to
f52bcc4
Compare
Collaborator
|
@jenshannoschwalm: Thanks for your trust. In Hungarian, we have a saying: "magas nekem, mint malacnak a zsiráfvályú" (it's as high (= over my head) for me as a giraffe's feeding trough is for a piglet). On my best days, I can follow the paths that the code reviewer agents find. Today, it's not one of those days: I'm really sleep deprived and have a headache. I'll send you the findings once they are done. Maybe tomorrow I'll have a chance to do more. :-( |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Whenever we have to OpenCL transform colorspace for the module's input we ensure it's written back to host memory, the cacheline dsc is updated and the fast blending cache is cleared. By doing so input cacheline invalidations are not required any more.
As we also want input cl_mem data to be available as a host bound cacheline if there are importance hints - we do this to increase cache hit rate for UI responsiveness - or if we later have to go into tiling mode, we do an early test for these cases and combine device to host copies with those for colorspace transforms (as in 1) for fewer costly copy calls.
Activate
DT_PIPE_CAS_SHUTDOWN, now we can set shutdown mode only once per pixelpipe run. Also some shutdown log improvements, we want this via-d pipeonly if relevant & new, otherwise the-d verboseswitch must be used.Some overhaul of
_module_pipe_stop()and it's callers as being sure about write backs (1+2). For some shutdown modes we would like to restart a OpenCL pipe at the stopping module. This requires further information for the caller so we now returndt_dev_stoptest_tinstead of a gboolean.Fixes:
_module_pipe_stop().DT_INVALID_HASHwhile freeing invalid cachelines resulting in less efficient use of cachelines. We also allocate cachelines with a size aligned to 4k for slightly improved perf. Also some log improvements here.get_output_format()doesn't need dev as parameter_piece_process_hash()renamed to_bcache_hash()for clarity, removed 'module' parameter_piece_fast_blend()got the 'module' parameter removed_copy_image_to_host_err()for readability.Overall a clearly improved UI responsiveness as the pixelpipe internal cacheline invalidations have gone and we have a better cache-hit rate.
@kofa73 i know you have been very ionterested in pixelpipe code efficiacy and bugs. I would appreciate another in-depth review and test :-)
@masterpiga this should fully ensure valid cacheline :-)