[Investigation PR]: Improving performance of image sampling in Vello Hybrid by taj-p · Pull Request #1547 · linebender/vello

taj-p · 2026-03-28T20:22:04Z

In #1517, there is some uncertainty about the direction for improving bilinear image sampling. I investigated bilinear image sampling to try to understand the bottlenecks. My conclusion is that the bottlenecks identified and discussed previously are slightly off the mark.

Baseline

I used #1493 as the baseline. It produced these values on my Samsung A05s.

Benchmark	Latency
200 Rect - 200×200 - Image - Nearest	127.23 ms/f (9 iters)
200 Rect - 200×200 - Image - Bilinear	144.63 ms/f (7 iters)
200 Rect - 200×200 - Opaque Image - Nearest	120.22 ms/f (9 iters)
200 Rect - 200×200 - Opaque Image - Bilinear	143.74 ms/f (7 iters)
200 Rect - 200×200 - Opaque Image (draw_image) - Nearest	105.75 ms/f (10 iters)
200 Rect - 200×200 - Opaque Image (draw_image) - Bilinear	104.19 ms/f (10 iters)

source image

Strategy 1: Don't query texture dimensions per pixel + simplify its calculation + move per-pixel calcs to vertex

Using #1517 as a base, in 36e3f3e, we simply ensured atlas dimensions were a power of 2 (to remove a division) and moved per-pixel calculations to the vertex shader.

This yielded roughly a 30-50% improvement across the board against the control.

Benchmark	Latency	Improvement
200 Rect - 200×200 - Image - Nearest	76.45 ms/f (15 iters)	-35.9%
200 Rect - 200×200 - Image - Bilinear	71.47 ms/f (14 iters)	-48.6%
200 Rect - 200×200 - Opaque Image - Nearest	70.11 ms/f (15 iters)	-39.5%
200 Rect - 200×200 - Opaque Image - Bilinear	72.89 ms/f (15 iters)	-48.6%
200 Rect - 200×200 - Opaque Image (draw_image) - Nearest	70.07 ms/f (15 iters)	-33.9%
200 Rect - 200×200 - Opaque Image (draw_image) - Bilinear	69.81 ms/f (15 iters)	-32.7%

source image
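To illustrate why power-of-two atlas dimensions can remove a division, here is a minimal Rust sketch (CPU-side, not the actual shader code from 36e3f3e). The helper names `atlas_dim_bits`, `texel_xy`, `texel_xy_div` and `to_uv` are hypothetical; only the idea of storing the dimension as a bit count comes from the commit title. The assumption is that the removed division was a divide or modulo by the atlas dimension.

```rust
/// log2 of a power-of-two atlas dimension.
/// Hypothetical helper: the name echoes the commit title, not the real code.
fn atlas_dim_bits(dim: u32) -> u32 {
    assert!(dim.is_power_of_two(), "atlas dimension must be a power of two");
    dim.trailing_zeros()
}

/// Split a linear texel index into (x, y) with a mask and a shift.
fn texel_xy(index: u32, bits: u32) -> (u32, u32) {
    let mask = (1u32 << bits) - 1;
    (index & mask, index >> bits)
}

/// Reference version using modulo and division, for comparison.
fn texel_xy_div(index: u32, dim: u32) -> (u32, u32) {
    (index % dim, index / dim)
}

/// Normalise a texel coordinate to UV space by multiplying with the
/// reciprocal of the dimension, which is exact for powers of two.
fn to_uv(texel: f32, bits: u32) -> f32 {
    let inv = 1.0 / (1u32 << bits) as f32;
    texel * inv
}

fn main() {
    let bits = atlas_dim_bits(1024);
    // Both formulations agree for a power-of-two dimension.
    assert_eq!(texel_xy(5000, bits), texel_xy_div(5000, 1024));
    println!("bits = {bits}, uv = {}", to_uv(512.0, bits));
}
```

Moving the UV computation to the vertex stage follows the same logic: anything that is constant or linear across a quad can be computed once per vertex and interpolated, rather than recomputed for every pixel.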

Strategy 2: Remove branching

I wasn't sure why we pay the runtime cost of branching on image quality in the Vello Hybrid shader. Don't most consumers want bilinear sampling only? I wondered whether it makes sense to strip the branching out for consumers who only use bilinear sampling. This improved performance by 2x again! See b50aaee.

The idea here would be to add a feature flag to Hybrid that removes this branching from the shader at build time.
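A build-time switch like this could be sketched as follows. This is only an illustration under assumptions: the feature name `bilinear_only`, the function `build_sampler_source`, and the WGSL helpers `sample_nearest` and `sample_bilinear` are all hypothetical and are not taken from Vello Hybrid or from b50aaee.

```rust
/// Shader body when both qualities are supported: branches per pixel.
const BRANCHING_BODY: &str =
    "if quality == 0u { color = sample_nearest(uv); } else { color = sample_bilinear(uv); }";

/// Shader body when only bilinear sampling is compiled in: no branch.
const BILINEAR_ONLY_BODY: &str = "color = sample_bilinear(uv);";

/// Assemble the (hypothetical) WGSL sampling function for the chosen variant.
fn build_sampler_source(bilinear_only: bool) -> String {
    let body = if bilinear_only { BILINEAR_ONLY_BODY } else { BRANCHING_BODY };
    format!(
        "fn sample_image(uv: vec2<f32>, quality: u32) -> vec4<f32> {{\n    var color: vec4<f32>;\n    {body}\n    return color;\n}}\n"
    )
}

fn main() {
    // In a real crate this flag would come from a Cargo feature, e.g.
    // `cfg!(feature = "bilinear_only")` (hypothetical feature name).
    let bilinear_only = true;
    print!("{}", build_sampler_source(bilinear_only));
}
```

The trade-off is that consumers who enable such a flag lose nearest-neighbour sampling entirely, so it would only suit applications that never request it.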

Benchmark	Latency	Improvement
200 Rect - 200×200 - Image - Nearest	43.15 ms/f (30 iters)	-63.8%
200 Rect - 200×200 - Image - Bilinear	34.97 ms/f (23 iters)	-74.9%
200 Rect - 200×200 - Opaque Image - Nearest	34.86 ms/f (33 iters)	-69.9%
200 Rect - 200×200 - Opaque Image - Bilinear	37.35 ms/f (27 iters)	-73.7%
200 Rect - 200×200 - Opaque Image (draw_image) - Nearest	36.18 ms/f (30 iters)	-65.9%
200 Rect - 200×200 - Opaque Image (draw_image) - Bilinear	34.29 ms/f (31 iters)	-67.0%

source image

Note: I tried removing the extend_mode function from the shader and didn't see much improvement on top of Strategy 2.

Extra: `58ea8fe`

Simply returning final_color saw further performance improvement (the same as if tinting logic was commented out).

LaurenzV · 2026-03-28T20:33:05Z

Wow, thanks for digging into this! However, I'm still curious whether this precludes the other optimizations I tried. I'm wondering, have you tried, on top of strategy 2, to see what numbers you get when:

You remove extend calculations and use the transparent padding
You remove the image tinting calculations

I'm still curious whether this gives even more performance, because in my experiments those changes clearly also had an impact. 🤔
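For context on what "extend calculations" typically involve, here is a rough Rust sketch of the usual pad, repeat and reflect coordinate remaps. It is a generic illustration on continuous coordinates, not the shader's actual extend_mode function; the `Extend` enum and `extend` function are defined locally for this example.

```rust
/// Local illustration of common extend modes (hypothetical, for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Extend {
    Pad,
    Repeat,
    Reflect,
}

/// Remap a continuous coordinate `t` into the range covered by an image
/// of extent `size`, according to the extend mode.
fn extend(t: f32, size: f32, mode: Extend) -> f32 {
    match mode {
        // Clamp to the image edge.
        Extend::Pad => t.clamp(0.0, size),
        // Wrap around with period `size`.
        Extend::Repeat => t.rem_euclid(size),
        // Mirror back and forth with period `2 * size`.
        Extend::Reflect => {
            let period = 2.0 * size;
            let m = t.rem_euclid(period);
            if m < size { m } else { period - m }
        }
    }
}

fn main() {
    for mode in [Extend::Pad, Extend::Repeat, Extend::Reflect] {
        println!("{mode:?}: {}", extend(12.0, 10.0, mode));
    }
}
```

With transparent padding around each image in the atlas, out-of-range samples for the pad-free case land on transparent texels, so this remap can be skipped for images that do not need repeat or reflect.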

taj-p · 2026-03-28T20:44:51Z

> You remove extend calculations and use the transparent padding

This didn't have much impact when I tried it on top of Strategy 2, but I think it should be verified.

> You remove the image tinting calculations

About a 10-20% improvement in performance on top of Strategy 2 (edit: 58ea8fe removes this cost).

taj-p · 2026-03-28T20:49:08Z

cc @LaurenzV

58ea8fe removes the performance cost of tinting. In fact, it's faster than commenting out tinting for some suites.

edit: Unsure if related to tinting or simply a means to improve performance generally.

nicoburns · 2026-03-29T15:04:59Z

What size is your source image here, and what size are you rendering it at? Do these numbers vary for:

Downsampling vs. upsampling vs. exact size match?
Different ratios between source and output size?

taj-p · 2026-03-30T22:54:50Z

Good questions. These are likely good benchmarks to add to vello_bench2 at some point. We're rendering at device screen size, I believe.
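A benchmark matrix covering those cases could be enumerated along these lines. This is a hypothetical sketch, not vello_bench2 code: the `Scaling` enum and the `classify` and `cases` functions are invented for illustration, and the source sizes are arbitrary examples.

```rust
use std::cmp::Ordering;

/// How the output size relates to the source image size.
#[derive(Debug, PartialEq)]
enum Scaling {
    Downsample,
    Upsample,
    Exact,
}

/// Classify one axis of a draw by comparing output and source extents.
fn classify(source: u32, output: u32) -> Scaling {
    match output.cmp(&source) {
        Ordering::Less => Scaling::Downsample,
        Ordering::Greater => Scaling::Upsample,
        Ordering::Equal => Scaling::Exact,
    }
}

/// Build (source size, output/source ratio, scaling kind) for each source size.
fn cases(output: u32, sources: &[u32]) -> Vec<(u32, f32, Scaling)> {
    sources
        .iter()
        .map(|&s| (s, output as f32 / s as f32, classify(s, output)))
        .collect()
}

fn main() {
    // 200 matches the 200×200 rects used in the benchmarks above.
    for (source, ratio, kind) in cases(200, &[100, 200, 800]) {
        println!("source {source}px -> 200px: ratio {ratio}, {kind:?}");
    }
}
```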

metaeaux and others added 10 commits March 20, 2026 10:25

bilinear texture sampling with textureSample instead of textureLoad

e5a4a07

test 0.0 offset in CI

4aa02e9

Revert "test 0.0 offset in CI"

83c7261

This reverts commit 4aa02e9.

simplify

3e37929

fmt

d2ca24f

webgl: split texture_array and texture path

7d29b89

hybrid tolerance for 8-bit fixed-point on windows

2e1ac4e

Merge remote-tracking branch 'upstream/main' into harley/bilinear-texture

bd9d16c

30-40% faster via atlas_dim_bits + vertex UV

36e3f3e

Uhhhh 50% faster again?

b50aaee

This was referenced Mar 28, 2026

bilinear texture sampling with textureSample instead of textureLoad #1517

Draft

vello_hybrid: Faster bilinear image rendering #1493

Open

Remove tint cost

58ea8fe

raphlinus mentioned this pull request Apr 14, 2026

shader: implement bicubic image sampling #1557

Merged

taj-p wants to merge 11 commits into main from tajp/bilinear
