Migrate vello sparse strips renderers to use glifo by LaurenzV · Pull Request #1562 · linebender/vello

LaurenzV · 2026-04-07T11:23:32Z

Overview

This PR attempts to integrate glifo into vello_hybrid and vello_cpu. As mentioned, the git diff is a bit messy because 1) it's based on an old version of main and 2) some commits that are needed have been cherry-picked, bust should ideally be merged separately (i.e. the other currently open PRs).

Blockers

PRs I’d consider blocking are:
~~#1553~~
~~#1555~~

Semi-blocking is also #1459, but for now we could just accept that (non-cached) bitmap glyphs panic in vello_hybrid and not render them for now.

Limitations / TODOs

Since this PR by itself is already big by itself, it still some TODOs and definitely doesn’t represent the ideal end state we want to have:

glifo still has the dependency on vello_common.
While I agree that there are probably improvements that can be made to the API (see the Zulip thread), I have not investigated those yet and just implemented the minimum necessary to get tests to run. For now, core text rendering logic still lives in vello_hybrid/vello_cpu instead of an external crate or somewhere else. I also haven’t introduced any “render from sub-rect of atlas” API, instead just sticking as closely as possible to what Alex previously implemented.

Some other TODOs:

When caching is enabled, zooming is currently extremely slow. Looking at the profile, it becomes clear that this is because deallocating and reallocating cached glyphs with guillotierre becomes a huge bottleneck with many glyphs. So we either need to explore using etagere for that, or investigate whether there is something else that causes the issue in our case. See also the video:

slowness.mp4

There also seems to be some other bug where some glyphs become corrupted at certain zoom levels, but it's not easily reproducible. I am not sure yet whether this is a regression introduced in this PR or already existed previously. If I find the time I might piggy-back a fix into this PR, otherwise it will be a follow-up:

cache_bug.mp4

It seems like when caching is enabled, the glyphs have some kind of funny "jump" when panning the viewport. I suspect that the subpixel offsets are somehow applied inversely, but I need to look into it. EDIT: Nevermind, I realized that the answer is very simple, it's because we don't do any subpixel rendering in the y direction at all. Only in the x direction.

jump_bug.mp4

So overall, there are definitely more things to investigate, but as you can see the integration seems to work overall!

Changes

The biggest change is probably to resources handling. Both renderers now require the user to pass a Resources (open for better name suggestions) for certain operations (anything text-related on the scene, and the final rendering operation). In vello_cpu, this basically only contains stuff relevant for glyph caching. In vello_hybrid, this additionally now contains the image cache, which previously lived in the “Renderer”. This allows us to share resources between scene encoding and the final rendering operation. The disadvantage is that it puts the burden on the user to always pass the correct resources, but hopefully this shouldn’t be too hard to get wrong. I don’t really see any better way of handling this otherwise. Perhaps this API can in the future be extended to allow the user passing custom resources (e.g. custom texture views for sampling). In vello_cpu, we could probably get away with hiding this behind RenderContext, but it also gets a bit messy and I think having this explicit consistently is better.
The default size of an atlas on Vello CPU was changed to 4096, I hope that’s a reasonable default? Or should it be smaller?
I’ve made slight changes to how the glyph run builder API works. In the previous parley version, GlyphRunBuilder was a simple struct with a builder pattern, where you needed to manually the image cache and glyph caches to the method. In order to make it work better with the Resources struct used by both renders, I’ve made the builder generic over a backend. This way, the user simply has to call glyph_run once, passing the resources to it, and the rest then happens automatically.
The previous benchmarks had some benches for measuring specific parts of the pipeline (e.g. maintain), since those are now more hidden away it’s not easy to benchmark anymore. Therefore, I’ve changed the benchmarks to simply be “integration” benchmarks for Vello CPU.
I split up the previous GlyphCaches struct into a separate struct for outline/hint caches and atlas caching. The reason for this change is that I want it to be possible to lazily initialize glyph atlases and glyph renderers (especially in vello_cpu), such that the user doesn’t have to pay the cost for constructing those and keeping them in memory, if they don’t make use of the feature at all. This required making it easier to pass no atlas cache to all of the glyph drawing methods, and splitting those two up seemed like the easiest way. This way, you can still use the outline/hint cache even if atlas caching itself is disabled.
I’m not sure what the best way of handling this is, but obviously we want to exercise the caching path in the tests. Unfortunately, caching simply can result in quite large pixel differences, so reusing existing snapshots is not really an option, and we need dedicated ones for new ones. Right now, I went down the path of simply adding an attribute that parametrizes the test on whether glyph caching should be enabled or not. I’m not sure if the better option here is to just manually hardcode the variants for each test, or something else. Open for suggestions here!

Other Notes

Something that make things awkward are recordings. Ideally, glifo would take care of handling and re-exporting all text-related dependencies, and vello_common doesn’t have any of that. However, since the Recording type lives in vello_common and we need to have the abilities to replay glyph drawing instructions, some of it has to live in vello_common for now. In particular, I had to move the Glyph type to vello_common, in the long term this should probably move back to glifo.
To expand on that, the fact that we have recordings makes things even harder. On current main, we are able to record normal outline glyphs but will panic for COLR and bitmap glyphs. However that exact setup is unfortunately difficult to replicate now. The reason being that the version currently on main relies on the ability to lower glyph runs into strip commands directly in vello_common. However, now that logic lives in glifo, and vello_common can’t take a dependency on glifo. Therefore, for now I’ve opted to simply record glyph render commmands as they are, but not actually provide any caching.
For similar reasons, we need to duplicate the glyph run builder API in vello_common, which is also not ideal, but I wouldn't know how to deal with that better.

This reverts commit ee0dd29.

Closes #1528. # Context Nearly all gradients have a clearly defined value at all position, the only exception being very specific degenerate radial gradients. The behavior in this case is that all positions that are undefined should yield a transparent pixel. Previously, we had handling for this in the CPU shader, where at sampling time we would explicitly mask out such positions. As part of #1301, a number of performance changes were introduced for gradient handling. Among that, a trick was used that encodes an additional transparent padding pixel in the gradient LUT, allowing to handle that case much better in Vello CPU. While this is a valid approach, it has two problems: 1) It destroys the assumption that the length of gradient LUTs are a multiple of 2. While for Vello CPU this doesn't matter and it currently also doesn't matter for Vello Hybrid, it could be a very useful property to have in the future, especially if we decide to change how gradients are uploaded in Vello hybrid. 2) It requires us to cache stops based on whether the underlying gradient has potentially undefined locations, which is also unfortunate and the root cause for #1528. # Solution This PR revert that change (while still keeping in-tact the other performance changes from that PR), such that gradients LUT's are always a multiple of 2 again and don't need to know anything about undefined locations. From my benchmarks, this doesn't have any notable performance impact for the normal case. In certain cases, it seems to even slightly improve performance. While I haven't measured it, this is likely to make undefined gradients about 2x slower, but since those are very rare edge cases we don't really care about apart from for correctness, I think this is very much a worthy change. Here you can find the change in performance before vs. after this PR: ``` fine/gradient/linear/opaque_u8_neon time: [359.08 ns 360.57 ns 362.26 ns] change: [-5.0401% -4.6310% -4.2254%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low severe 3 (3.00%) high mild 5 (5.00%) high severe fine/gradient/radial/opaque_u8_neon time: [475.66 ns 476.60 ns 477.56 ns] change: [-1.3570% -1.0840% -0.8030%] (p = 0.00 < 0.05) Change within noise threshold. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe fine/gradient/radial/opaque_conical_u8_neon time: [577.07 ns 578.02 ns 579.05 ns] change: [-1.2222% -0.8150% -0.3570%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 5 (5.00%) high mild 6 (6.00%) high severe fine/gradient/sweep/opaque_u8_neon time: [847.68 ns 849.27 ns 850.89 ns] change: [-0.0226% +0.2834% +0.5911%] (p = 0.07 > 0.05) No change in performance detected. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high severe fine/gradient/extend/pad_u8_neon time: [346.25 ns 346.96 ns 347.77 ns] change: [-0.2873% +0.0627% +0.3827%] (p = 0.72 > 0.05) No change in performance detected. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low mild 3 (3.00%) high mild 1 (1.00%) high severe fine/gradient/extend/repeat_u8_neon time: [368.20 ns 369.45 ns 371.00 ns] change: [-4.5087% -4.1494% -3.8349%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe fine/gradient/extend/reflect_u8_neon time: [417.63 ns 418.73 ns 420.20 ns] change: [-6.1869% -5.5205% -4.6695%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) low mild 8 (8.00%) high mild 6 (6.00%) high severe fine/gradient/many_stops_u8_neon time: [479.07 ns 480.13 ns 481.45 ns] change: [-0.7523% -0.4523% -0.0884%] (p = 0.01 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe fine/gradient/transparent_u8_neon time: [459.30 ns 461.39 ns 463.68 ns] change: [-0.4103% +0.0785% +0.6946%] (p = 0.77 > 0.05) No change in performance detected. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 3 (3.00%) high mild 6 (6.00%) high severe ``` --------- Co-authored-by: Laurenz Stampfl <laurenz@canva.com> Co-authored-by: Alex Gemberg <gemberg@canva.com> (cherry picked from commit 8859b91)

…o default blending only

… necessary

LaurenzV · 2026-04-13T11:45:26Z

@taj-p Thanks for the great suggestion, code is now simpler indeed, let me know what you think! I haven't yet merged the two traits because conceptually they do kind of do different things, but I'm happy to revisit this in the future.

There's probably still a lot more that can be improved, but I hope this is a good start now! Feel free to make any changes you want and merge yourself, or I can do it once you are happy and approved.

LaurenzV · 2026-04-13T12:12:38Z

I also removed the COLR scene wrapper and just always wrap color glyphs in a push/pop command, which seems like the correct thing to do on both, Vello Hybrid and Vello CPU (regardless of the default_blending_problem).

taj-p

Sorry, I ran out of time this morning to finish the review. The migration looks good.

Some of the comments are addressed in #1571 (feel free to merge that PR / cherry pick / whatever you like).

There's a current high memory cost to glyph caching that we're likely going to need to address before being able to consume it. Either by device detection or improving memory efficiency. But that can definitely come later.

taj-p · 2026-04-13T20:53:24Z

+    // TODO: Similarly to wgpu, maybe this can be done in a more effective
+    // way?


100 what? 😄

Ohhh I generally write this to mean "100% - totally agree!" - as in I strongly agree.

taj-p · 2026-04-13T21:28:03Z

+                (width as u16, height as u16)
+            };
+            // TODO: Use a config optionally provided by the user!
+            self.glyph_resources = Some(GlyphAtlasResources::with_config(


Ah, so the user pays the cost of duplicate alpha and strip storage and all the other buffer allocations of a new renderer. That's quite a memory cost. Ideally, these buffers are shared between renderers, but to enable that, we would require deferring alpha/strip evaluation until render time - that is, buffer all draw commands and then, on render, execute the caching glyph pipeline first, reset and pass internal buffer allocations to the screen renderer, and then render the screen.

No action here. Just sharing thoughts on opportunities.

Hmm yeah, I think it would be tricky! We should do some measurements to determine memory impact.

taj-p · 2026-04-13T21:32:52Z

+    }
+
+    /// Upload all pending bitmaps, rasterize pending outline/COLR glyphs, etc.
+    fn sync_glyph_cache(&mut self) {


I believe this has been addressed in your updates 🙏

taj-p · 2026-04-13T21:41:54Z

+        // TODO: We need to figure something out here API-wise. At the moment, the user can
+        // theoretically rasterize the same `RenderContext` multiple times without resetting in-between.
+        // However, if glyph caching is enabled, this method call could now evict that were previously
+        // assumed to exist in `RenderContext`, meaning that if the user rasterizes the same `RenderContext`
+        // again without resetting it, some of the cached glyphs might be stale and not exist anymore.
+        resources.after_render();


Sorry, I ran out of my timebox this morning in reviewing the PR

taj-p

Considering this PR was supposed to simply factor out the vello_* logic into corresponding crates, I think I've overstepped how much improvement we should try make to glifo in this PR.

It works and it looks good!

cLGTM if the prior comments are adopted

taj-p · 2026-04-14T05:14:00Z

-    glyph::{GlyphCaches, GlyphRenderer, GlyphRunBuilder, GlyphType, PreparedGlyph},
-};
+pub(crate) const DEFAULT_GLYPH_ATLAS_SIZE: u16 = 4096;
+// Why do we need this? The reason is that the way uploaded images work in Vello Hybrid


We might be able to avoid this but let's not tackle that now. The PR is in a much better state already.

laurenz-canva and others added 30 commits March 30, 2026 08:42

Add glyph image cache

3c2854d

Move vello_cpu backend out of glifo

3d8b70d

Hook up vello cpu glifo backend

50af427

Make direct glyph rendering with vello_cpu work

f21d333

Replace vello_cpu tests

27f7b37

glifo: Add proper support for caching stroked glyphs

ee0dd29

Integrate vello_cpu into the test suite

a09b2b9

Add support for recordings

5090237

Move vello_hybrid backend out of glifo

219f298

Make it compile

791130d

Integrate vello_hybrid and fix recordings

998563f

Hook up into vello example scenes and vlelo bench

d599286

Remove old glyph stuff

11d98dc

Remove more old glyph stuff

7b00034

Put glyph type into vello_common

5aa055d

Reformat

0955081

Add tests for cached glyphs

5f59eee

Revert "glifo: Add proper support for caching stroked glyphs"

8b43ace

This reverts commit ee0dd29.

Never cache stroked glyphs

c293ce0

Support caching with vello_hybrid

8af8e54

Add scene for testing caching

e39448d

Tweaks

9b99e83

Fix examples

d37e20d

Try to fix glyph scaling issues

f53b1ec

vello_hybrid: Allow nested blending even if scene constraint is set t…

6339f5b

…o default blending only

vello_hybrid: Wrap COLR draw calls in layer

ea3a31b

Add test case

e1994c0

vello_hybrid: Actually schedule fast path strips in the highest round…

97ddc1f

… necessary

Fix coarse batches as well

7104eca

laurenz-canva added 7 commits April 13, 2026 08:26

radme

4e4a13f

Clippy

5c99b98

Move and rename renderer module

dcccf45

Trim comment length

46a63cb

Fix comment

32e7e3b

Unwrap more explicitly

6b406d8

.

2a575b7

laurenz-canva added 2 commits April 13, 2026 13:58

Clippy

608b9f5

Remove COLR wrapper

d0d49ef

taj-p reviewed Apr 13, 2026

View reviewed changes

Comment thread glifo/src/interface.rs

taj-p approved these changes Apr 14, 2026

View reviewed changes

taj-p and others added 12 commits April 14, 2026 07:45

Remove pending_glyph_uploads_scratch

a37e196

Pass rects

b3545e0

Remove CPU buffers

20f8640

Tweak messages

45684ee

Remove hybrid buffers

eb3acd2

.

faf9c7f

Move DrawSink impl

e4d393b

Add a TODO

ca16497

Add another TODO

50da997

Reformat

b782d57

Clippy

35e9aae

Panic for COLR emojis with non-default blending

f390b17

laurenz-canva added this pull request to the merge queue Apr 14, 2026

Merged via the queue into main with commit 46cc30b Apr 14, 2026
17 checks passed

laurenz-canva deleted the laurenz/parley_migration branch April 14, 2026 06:54

		// TODO: Similarly to wgpu, maybe this can be done in a more effective
		// way?

Conversation

LaurenzV commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Blockers

Limitations / TODOs

Changes

Other Notes

Uh oh!

LaurenzV commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LaurenzV commented Apr 13, 2026

Uh oh!

taj-p left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

taj-p left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

LaurenzV commented Apr 7, 2026 •

edited

Loading

LaurenzV commented Apr 13, 2026 •

edited

Loading