[Investigation PR]: Improving performance of image sampling in Vello Hybrid#1547
This reverts commit 4aa02e9.
Wow, thanks for digging into this! However, I'm still curious whether this precludes the other optimizations I tried. I'm wondering, have you tried, on top of strategy 2, to see what numbers you get when:
I'm still curious whether this gives even more performance, because in my experiments that clearly did also have an impact. 🤔
This didn't have much impact when I tried it on top of #2. But, I think it should be verified.
About 10-20% improvement in performance on top of #2 (edit: this is superseded by 58ea8fe)
What size is your source image here, and what size are you rendering it at? Do these numbers vary for:
Good questions. These are likely good benchmarks to add to
In #1517, there is some uncertainty about the direction for improving bilinear image sampling. I investigated bilinear image sampling to try to understand the bottlenecks. My conclusion is that the bottlenecks identified and discussed previously are slightly off the mark.
Baseline
I used #1493 as the baseline. It produced these values on my Samsung A05s.
source image
Strategy 1: Don't query texture dimensions per pixel + simplify its calculation + move per-pixel calcs to vertex
Using #1517 as a base, in 36e3f3e, we simply ensured atlas dimensions were a power of 2 (to remove a division) and moved per-pixel calculations to the vertex shader.
This yielded a 30-40% improvement across the board against the control.
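To illustrate why a power-of-2 dimension can remove a division, here is a minimal Rust sketch. The PR does not say which division was removed, so this assumes, purely for illustration, an integer divide and modulo by the atlas width. The helper names (`pow2_atlas_dim`, `split_index_pow2`, `split_index_generic`) are hypothetical and do not come from the Vello codebase.

```rust
/// Hypothetical helper: round a requested atlas dimension up to a power of two.
fn pow2_atlas_dim(requested: u32) -> u32 {
    requested.next_power_of_two()
}

/// Split a linear texel index into (x, y) with a mask and a shift.
/// Only valid when `width` is a power of two.
fn split_index_pow2(index: u32, width: u32) -> (u32, u32) {
    debug_assert!(width.is_power_of_two());
    let shift = width.trailing_zeros(); // log2(width)
    (index & (width - 1), index >> shift)
}

/// Reference version that works for any non-zero width but needs % and /.
fn split_index_generic(index: u32, width: u32) -> (u32, u32) {
    (index % width, index / width)
}
```

Both functions agree whenever the width is a power of two, so the cheaper mask-and-shift form can stand in for the divide once the atlas size is constrained that way.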
source image
Strategy 2: Remove branching
I wasn't sure why we pay the runtime cost of branching on image quality in the Vello Hybrid shader. Don't most consumers want bilinear sampling only? I wondered whether it makes sense to strip out the branching for consumers who only use bilinear sampling. This improved performance by 2x again! See b50aaee.
The idea here would be to add a feature flag to Hybrid that removes this branching from the shader at build time.
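A rough sketch of how such a flag could select the shader source on the Rust side. Everything here is an assumption for illustration: the WGSL fragments, the `sample_image`, `sample_nearest` and `sample_bilinear` names, and the `bilinear_only` feature name are hypothetical, and the real Vello Hybrid shader is structured differently.

```rust
/// Hypothetical WGSL fragment that branches on image quality at runtime.
const SAMPLE_BRANCHING: &str = "
fn sample_image(uv: vec2<f32>, quality: u32) -> vec4<f32> {
    if quality == 0u {
        return sample_nearest(uv);
    }
    return sample_bilinear(uv);
}";

/// Hypothetical WGSL fragment with the branch stripped out.
const SAMPLE_BILINEAR_ONLY: &str = "
fn sample_image(uv: vec2<f32>, quality: u32) -> vec4<f32> {
    return sample_bilinear(uv);
}";

/// Pick the fragment to splice into the shader source.
/// In a crate this could be called as
/// `sampling_wgsl(cfg!(feature = "bilinear_only"))`,
/// where the feature name is an assumption, not an existing Vello flag.
fn sampling_wgsl(bilinear_only: bool) -> &'static str {
    if bilinear_only {
        SAMPLE_BILINEAR_ONLY
    } else {
        SAMPLE_BRANCHING
    }
}
```

Because the choice is made when the shader source is assembled, the GPU never sees the quality branch in the bilinear-only build.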
source image
Note: I tried removing the `extend_mode` function from the shader and didn't see much improvement after #2.
Extra: 58ea8fe
Simply returning `final_color` saw further performance improvement (the same as if tinting logic was commented out).