
Commit efe4226

committed
Task mesh benchmark publish.
1 parent 391c5bd commit efe4226

7 files changed

Lines changed: 73 additions & 17 deletions

File tree

index.html

Lines changed: 11 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -28,17 +28,26 @@
2828

2929
(#) Technical Articles
3030

31+
- **[Task and Mesh Shader Benchmark](technical/task-mesh-benchmarking/task-mesh-benchmarking.md.html)**
32+
- Performance comparison of traditional rendering vs GPU-driven mesh shaders with practical Vulkan implementations.
33+
- Benchmarks show 5.3x speedup with task+mesh shaders on high-poly geometry, plus cache behavior analysis.
34+
- Last Modified 2025/11/26
3135
- **[Vulkan Descriptor Buffers (Redux)](technical/descriptorBuffers2.md.html)**
32-
- A technical post about implementing descriptor buffers in place of the traditional descriptor pool in my renderer/engine.
33-
- A redux of the previous descriptor buffer article with improvements and clarifications.
36+
- A technical post about implementing descriptor buffers in place of the traditional descriptor pool in my renderer/engine.
37+
- A redux of the previous descriptor buffer article with improvements and clarifications.
38+
- Last Modified 2025/10/23
3439
- **[Environment Mapping in Vulkan](technical/environmentMapping.md.html)**
3540
- A brief look at the implementation of environment mapping in my renderer/engine. This includes Specular/Irradiance Cubemaps.
41+
- Last Modified 2024/06/25
3642
- **[GPU-Driven Rendering and Culling](technical/gpudrivenandculling.md.html)**
3743
- A short summary of GPU draw call generation and culling in my renderer/engine.
44+
- Last Modified 2024/06/13
3845
- **[Multi-Sample Anti-Aliasing](technical/msaa.md.html)**
3946
- A short summary of MSAA implementations in my renderer/engine and visual/performance comparisons.
47+
- Last Modified 2024/06/04
4048
- **[Vulkan Descriptor Buffers](technical/descriptorBuffers.md.html)**
4149
- A technical post about implementing descriptor buffers in place of the traditional descriptor pool in my renderer/engine.
50+
- Last Modified 2024/05/30
4251

4352
<!-- Markdeep: --><style class="fallback">body{visibility:hidden;white-space:pre;font-family:monospace}</style><script src="markdeep.min.js" charset="utf-8"></script><script src="https://morgan3d.github.io/markdeep/latest/markdeep.min.js?" charset="utf-8"></script><script>window.alreadyProcessedMarkdeep||(document.body.style.visibility="visible")</script>
4453


technical/task-mesh-benchmarking/task-mesh-benchmarking.md.html

Lines changed: 62 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -1,11 +1,19 @@
11
**_GPU-Driven Rendering Performance: Traditional vs Mesh Shaders_**
2-
**William Gunawan** - 22 November, 2025
2+
**William Gunawan** - 26 November, 2025
33

44
# Introduction
55

66
This technical article is a benchmark comparison of traditional rendering pipelines against modern GPU-driven techniques: draw indirect, mesh shaders, and task shaders.
77
The focus is on measuring the performance impact of moving culling decisions from CPU to GPU, and from per-instance to per-meshlet granularity.
88

9+
## Code Availability
10+
11+
The code for this Vulkan application is available at:
12+
- https://github.com/Williscool13/MeshTaskBenchmark
13+
14+
The profiler results and images used in this article are also readily available at:
15+
- https://github.com/Williscool13/Williscool13.github.io/tree/main/technical/task-mesh-benchmarking
16+
917
## Test Scene
1018

1119
**Geometry:** 125 Stanford Bunnies (72,378 vertices, 144,046 triangles each) arranged in a 5x5x5 grid.
@@ -76,9 +84,6 @@
7684
This configuration adds GPU-driven per-instance culling while keeping the traditional vertex pipeline.
7785
A compute shader performs frustum culling and writes draw commands for visible instances to an indirect buffer.
7886

79-
80-
### Compute Pass: Instance Culling
81-
8287
The compute shader dispatches one thread per instance to evaluate visibility. Here, we only cull on an instance-level using frustum culling.
8388

8489
```````````````````cpp
@@ -141,8 +146,6 @@
141146
A more optimized approach would batch identical models into single draws with instanceCount > 1.
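
As a rough illustration of that batching idea, the sketch below groups visible instances by model on the CPU and emits one draw per model. It is not the article's implementation: `ModelRange`, `batchDraws`, and the `sortedInstanceIds` remap table are hypothetical names, and `DrawCommand` only mirrors the field layout of `VkDrawIndexedIndirectCommand` so the snippet compiles without the Vulkan headers. A GPU-side version would need atomics or a prefix-sum pass instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the field layout of VkDrawIndexedIndirectCommand.
struct DrawCommand {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};

// Hypothetical: where one model's geometry lives inside the mega buffers.
struct ModelRange {
    uint32_t indexCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
};

// visibleModelIndices[i] is the model used by visible instance i.
// Returns one draw per model that has at least one visible instance.
// sortedInstanceIds receives the instance ids grouped by model, so each
// draw covers the slice [firstInstance, firstInstance + instanceCount).
std::vector<DrawCommand> batchDraws(const std::vector<ModelRange>& models,
                                    const std::vector<uint32_t>& visibleModelIndices,
                                    std::vector<uint32_t>& sortedInstanceIds) {
    // Pass 1: count visible instances per model.
    std::vector<uint32_t> counts(models.size(), 0);
    for (uint32_t model : visibleModelIndices) {
        counts[model]++;
    }

    // Pass 2: exclusive prefix sum gives each model its firstInstance.
    std::vector<DrawCommand> draws;
    std::vector<uint32_t> cursor(models.size(), 0);
    uint32_t first = 0;
    for (std::size_t m = 0; m < models.size(); ++m) {
        cursor[m] = first;
        if (counts[m] > 0) {
            draws.push_back(DrawCommand{models[m].indexCount, counts[m],
                                        models[m].firstIndex,
                                        models[m].vertexOffset, first});
        }
        first += counts[m];
    }

    // Pass 3: scatter instance ids into their model's slice.
    sortedInstanceIds.assign(visibleModelIndices.size(), 0);
    for (std::size_t i = 0; i < visibleModelIndices.size(); ++i) {
        sortedInstanceIds[cursor[visibleModelIndices[i]]++] = static_cast<uint32_t>(i);
    }
    return draws;
}
```

With a scene like this benchmark's (125 copies of one model), all visible bunnies would collapse into a single draw. The shader would then fetch per-instance data through the remap table, indexed by the instance index (in Vulkan GLSL, `gl_InstanceIndex` already includes `firstInstance`).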
142147

143148

144-
### Render Pass: Indirect Draw
145-
146149
The GPU reads the culled draw commands without CPU involvement. We're going all-in on GPU-Driven Rendering:
147150
```````````````````cpp
148151
vkCmdBindVertexBuffers(cmd, 0, 1, &megaVertexBuffer.handle, &vertexOffset);
@@ -332,8 +335,6 @@
332335

333336
I'm particularly proud of this implementation :)
334337

335-
### Compute Pass: Instance Culling
336-
337338

338339
```````````````````
339340
public struct MeshIndirectDrawParameters {
@@ -390,7 +391,6 @@
390391
}
391392
```````````````````
392393

393-
### Render Pass: Indirect Draw
394394
With some modification to the task shader to accommodate indirect drawing:
395395
```````````````````
396396
[shader("task")]
@@ -479,11 +479,13 @@
479479

480480
# Discussion
481481

482-
**Task and Mesh shaders provide massive gains with only meshlet-level culling**
482+
(###) **Task and Mesh shaders provide massive gains with only meshlet-level culling**
483+
483484
- Task and mesh shaders achieve a 5.3x performance increase over the traditional vertex pipeline
484485
- Processing 2,251 meshlets per instance vs full 72K vertex model
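
To put those numbers in perspective, here is a small worked calculation using the test scene's figures (125 instances, 144,046 triangles and 2,251 meshlets per bunny). The struct and function names are made up for illustration. The result averages just under 64 triangles per meshlet, which would be consistent with a 64-triangle meshlet limit, although that limit is an assumption and not something stated above.

```cpp
#include <cassert>
#include <cstdint>

// Aggregate workload figures derived from per-instance counts.
struct SceneStats {
    uint64_t totalTriangles;      // triangles submitted if nothing is culled
    uint64_t totalMeshlets;       // meshlets the task shader may test
    double   trianglesPerMeshlet; // average culling granularity
};

SceneStats computeSceneStats(uint64_t instances,
                             uint64_t trianglesPerInstance,
                             uint64_t meshletsPerInstance) {
    assert(meshletsPerInstance > 0);
    return SceneStats{
        instances * trianglesPerInstance,
        instances * meshletsPerInstance,
        static_cast<double>(trianglesPerInstance) /
            static_cast<double>(meshletsPerInstance)};
}
```

Calling `computeSceneStats(125, 144046, 2251)` gives 18,005,750 triangles and 281,375 meshlets for the whole scene. Instance-level culling can only accept or reject work in units of 144,046 triangles, while meshlet-level culling decides in units of roughly 64.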
485486

486-
**Instance-level culling shows mixed results**
487+
(###) **Instance-level culling shows mixed results**
488+
487489
- Indirect + Traditional: 46% faster than baseline (74 FPS vs 51 FPS)
488490
- Traditional rendering processes every vertex regardless of visibility, so eliminating even a few instances provides measurable savings.
489491
- There is a considerable amount of waste in traditional rendering. Lots of backface rasterization, reasonably worse cache locality, and less control overall of the geometry pipeline.
@@ -496,23 +498,68 @@
496498
The finer granularity results in better GPU occupancy and aligns the graphics pipeline with modern GPU-driven rendering techniques.
497499
More importantly, it allows precise control over which parts of a mesh actually get rendered, eliminating wasted work before it reaches the rasterizer.
498500

501+
## Profiler Analysis
502+
503+
Cache behavior shows clear differences between traditional and mesh shader approaches:
504+
505+
![Benchmark Overview](benchmarkOverview.png)
506+
507+
![Traditional Cache](traditionalCache.png)
508+
509+
![Traditional Indirect Cache](traditionalIndirectCache.png)
510+
511+
![Meshlet Cache](meshCache.png)
512+
513+
![Meshlet Indirect Cache](meshIndirectCache.png)
514+
515+
(###) **L2 Cache Performance**
516+
517+
- Traditional: 57.8% hit rate
518+
- Traditional Indirect: 57.4% hit rate
519+
- Mesh: 64.2% hit rate
520+
- Mesh Indirect: 63.9% hit rate
521+
522+
(###) **Observations**
523+
524+
Mesh shaders achieve ~11% better L2 cache hit rates.
525+
526+
This improvement likely stems from processing meshlets as independent, tightly-packed units rather than strided vertex buffers across the entire model.
527+
528+
Notably, adding indirect culling doesn't significantly hurt cache performance in either pipeline. The compute pass overhead is minimal compared to the rendering workload.
529+
530+
(###) **L1 Cache Behavior**
531+
532+
L1 cache hit rates are consistently low across all configurations (4-7%).
533+
This pattern appears in both this benchmark and my game engine, suggesting it may be related to the draw setup or memory access patterns.
534+
While this could affect absolute performance numbers, the relative comparison between pipelines remains valid.
499535

500536
## Limitations
501537

502538
This benchmark favors mesh shaders due to the high vertex count (72K vertices per bunny). The 5.3x speedup reflects ideal conditions for meshlet-level culling.
503539
Other optimizations may also disproportionately improve the performance of traditional rendering techniques, further reducing the performance gap between the 2 approaches.
504-
Geometry LOD for example, could help with would likely help traditional rendering slightly more than it does meshlet rendering.
540+
Geometry LOD, for example, would likely help traditional rendering slightly more than it helps meshlet rendering.
505541

506-
## Pratical Considerations
542+
### Practical Considerations
507543

508544
Other factors that make traditional pipelines more appealing also need to be considered:
509-
- Much better support on older GPUs. Task+Mesh is only supported on NVIDIA Turing+ (RTX 2000+), AMD RDNA2+ (RX 6000+), and Intel Arc. Traditional pipelines work on any GPU from the past decade.
545+
- Much better support on older GPUs. Task+Mesh is only supported on NVIDIA Turing+, AMD RDNA2+, and Intel Arc. Traditional pipelines work on any GPU from the past decade.
510546
- Simpler debugging and profiling. Mesh shader workloads can be harder to trace and analyze with standard GPU tools.
511547
- Traditional rendering is far more widespread, so learning material is plentiful and general developer familiarity is high.
512548
- Task + Mesh shaders aren't universally beneficial. Low-poly meshes (< 1000 triangles) may not benefit from the added complexity, while high-density photogrammetry scans and CAD models see the largest gains.
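
Given the hardware support gap, a renderer that wants both reach and speed would typically pick its geometry path at startup. The sketch below is one minimal way to do that, assuming the device's extension names have already been gathered into a list of strings (in a real application they would come from `vkEnumerateDeviceExtensionProperties`). `GeometryPath`, `supportsMeshShaders`, and `chooseGeometryPath` are hypothetical names. `VK_EXT_mesh_shader` is the cross-vendor Vulkan extension name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical: the two geometry pipelines a renderer could fall back between.
enum class GeometryPath { TaskMesh, Traditional };

// True if the device advertises the cross-vendor mesh shader extension.
bool supportsMeshShaders(const std::vector<std::string>& deviceExtensions) {
    return std::find(deviceExtensions.begin(), deviceExtensions.end(),
                     "VK_EXT_mesh_shader") != deviceExtensions.end();
}

// Prefer task+mesh shaders when available, otherwise use the vertex pipeline.
GeometryPath chooseGeometryPath(const std::vector<std::string>& deviceExtensions) {
    return supportsMeshShaders(deviceExtensions) ? GeometryPath::TaskMesh
                                                 : GeometryPath::Traditional;
}
```

The extension being listed is only the first check. An application should also query `VkPhysicalDeviceMeshShaderFeaturesEXT` and confirm that `taskShader` and `meshShader` are enabled before building the pipelines.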
513549

550+
# Conclusion
551+
552+
If you're planning on exploring modern rendering techniques for use in your game engine, you need to know the benefits and drawbacks of using them.
553+
Task and mesh shaders are great for scenes with high geometry complexity, but may not perform as well for simple scenes.
554+
Adoption rate is still low, requiring modern hardware from the user. [Vulkan GPU Info](https://vulkan.gpuinfo.org/listextensions.php) reports adoption rates at <10%, so there is still a way to go before this technique can be broadly used.
555+
If you plan on making a game engine or renderer with large reach, this technique may not be the right choice for you.
556+
557+
With all this in mind, if the circumstances are right for you, use task and mesh shaders! They're not that complicated.
558+
559+
Thanks for reading! Feel free to contact me for fun talks about graphics and game engines :)
560+
561+
(#) References
514562

515-
# Further Reading
516563
- [NVIDIA - Introduction to Mesh Shaders](https://developer.nvidia.com/blog/introduction-turing-mesh-shaders/)
517564
- [AMD - Mesh Shader Guide](https://gpuopen.com/learn/mesh_shaders/mesh_shaders-from_vertex_shader_to_mesh_shader/)
518565
- [NVIDIA - Using Mesh Shaders For Professional Graphics](https://developer.nvidia.com/blog/using-mesh-shaders-for-professional-graphics/)