Skip to content

perf(gui): implement batched UI rendering#2514

Open
githubawn wants to merge 21 commits intoTheSuperHackers:mainfrom
githubawn:perf/gui-batch
Open

perf(gui): implement batched UI rendering#2514
githubawn wants to merge 21 commits intoTheSuperHackers:mainfrom
githubawn:perf/gui-batch

Conversation

@githubawn
Copy link
Copy Markdown

@githubawn githubawn commented Mar 31, 2026

Reduces draw calls by batching 2D elements by texture state in setup2DRenderState.

Environment Scenario (-quickstart) Before After
Main Machine Skirmish Screen 477 FPS 918 FPS
M1 Macbook (Wine) Skirmish Screen 16 FPS 61 FPS
M1 Macbook (Wine) In-Game (Match Start) 22 FPS 33 FPS

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR implements batched 2D UI rendering by centralizing all render-state setup in a new setup2DRenderState helper that accumulates draw calls within beginBatch/endBatch scope, only issuing GPU Render() calls when texture or draw mode changes — yielding 2–4× FPS improvements across all tested configurations.

The implementation is architecturally sound: m_batchNeedsInit prevents spurious empty flushes, texture ref-counts are correctly balanced in both batching and non-batching paths (including the Get_Texture/Release_Ref pair in drawImage), and TheDisplay->flush() in W3DDisplayString::draw() correctly drains pending 2D geometry before the text renderer's own draw pass. As a bonus, the drawImage clipping early-exit is extended to cover the missing startX >= m_clipRegion.hi.x and startY >= m_clipRegion.hi.y bounds, fixing a pre-existing culling gap.

Confidence Score: 5/5

Safe to merge — no P0 or P1 issues found; texture ref-counting, render-state transitions, and batch lifecycle are all correct.

Thorough analysis of ref-counting in both batching and non-batching paths, render state management across mode transitions, re-entrancy guards, the m_batchNeedsInit dirty-flag semantic, and the flush-before-text ordering — all are implemented correctly. The previous thread concerns (m_batchNeedsInit initialization and texture lifetime around Release_Ref) are resolved in the current code.

No files require special attention; Generals and GeneralsMD implementations are kept in sync and are logically identical.

Important Files Changed

Filename Overview
Core/GameEngineDevice/Source/W3DDevice/GameClient/W3DView.cpp Wraps drawablePostDraw iteration in beginBatch/endBatch for the 3D-world post-draw pass
Generals/Code/GameEngine/Include/GameClient/Display.h Adds beginBatch/endBatch/flush public API and onBeginBatch/onEndBatch/onFlush protected hooks with m_isBatching field
Generals/Code/GameEngine/Source/GameClient/Display.cpp Implements base batch control with re-entrancy guards; m_isBatching initialized to FALSE
Generals/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h Declares setup2DRenderState, overrides batch hooks, adds m_batchTexture/m_batchMode/m_batchGrayscale/m_batchNeedsInit fields
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Core batching engine: setup2DRenderState centralises state transitions with correct ref counting; clipping early-exit bug fix included
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplayString.cpp Adds TheDisplay->flush() before text render to commit pending batched 2D geometry in correct draw order
Generals/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DInGameUI.cpp Wraps entire UI draw pass in beginBatch/endBatch to batch all HUD draw calls together
GeneralsMD/Code/GameEngine/Include/GameClient/Display.h Zero Hour mirror of Generals Display.h batch API additions
GeneralsMD/Code/GameEngine/Source/GameClient/Display.cpp Zero Hour mirror of Generals Display.cpp batch implementation
GeneralsMD/Code/GameEngineDevice/Include/W3DDevice/GameClient/W3DDisplay.h Zero Hour mirror: same setup2DRenderState declaration and batch member fields
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplay.cpp Zero Hour mirror of Generals implementation with identical batching logic and ref counting
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DDisplayString.cpp Zero Hour mirror: flush before text renderer
GeneralsMD/Code/GameEngineDevice/Source/W3DDevice/GameClient/W3DInGameUI.cpp Zero Hour mirror: wraps UI draw pass in beginBatch/endBatch

Sequence Diagram

sequenceDiagram
    participant G as Game Loop
    participant D as Display
    participant S as setup2DRenderState
    participant R2D as Render2DClass

    G->>D: beginBatch()
    D->>D: m_isBatching=TRUE
    D->>R2D: Reset(), m_batchNeedsInit=TRUE

    G->>S: drawImage(texA, ALPHA)
    S->>S: onFlush() — no-op (batchNeedsInit=TRUE)
    S->>R2D: Add_Ref(texA), Set_Texture(texA), Enable_Alpha(TRUE)
    S->>R2D: Add_Quad(...)

    G->>S: drawImage(texA, ALPHA) [same state]
    S->>S: fast-path return — state unchanged
    S->>R2D: Add_Quad(...)

    G->>S: drawImage(texB, ALPHA) [texture change]
    S->>R2D: Render() — flush texA batch
    S->>R2D: Reset(), m_batchNeedsInit=TRUE
    S->>S: Release_Ref(texA), Add_Ref(texB)
    S->>R2D: Set_Texture(texB)
    S->>R2D: Add_Quad(...)

    G->>D: TheDisplay->flush() [before text render]
    D->>R2D: Render() — flush texB batch
    D->>R2D: Reset()

    G->>D: endBatch()
    D->>D: onFlush() — no-op (batchNeedsInit=TRUE after last flush)
    D->>D: Release_Ref(texB), m_isBatching=FALSE
Loading

Reviews (17): Last reviewed commit: "fix generals replication" | Re-trigger Greptile

@githubawn
Copy link
Copy Markdown
Author

githubawn commented Mar 31, 2026

Significantly improves #109
Fully fixing this requires significant refactoring to Render2DSentenceClass for fonts

optimize clipping check
return statement on seperate line
centralized flush
@githubawn
Copy link
Copy Markdown
Author

@greptileai

"One P1 render-state initialization bug in onBeginBatch() should be fixed before merging to avoid potential cross-frame visual corruption."

The P1 finding could be a false positive based on a local variable analysis that omits the downstream hardware synchronization in Render2DClass::Render() (render2d.cpp:601).

Render2DClass is a self-synchronizing state machine. Its Render() call (line 628 for textures, line 694 for shaders) explicitly pushes its internal members to the DX8Wrapper on every execution. The tracking variables in W3DDisplay (m_batchTexture, etc.) are mirror-images of Render2DClass's internal members.

happy?

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 31, 2026

No, the P1 finding stands.

Your argument about Render2DClass being self-synchronizing addresses hardware state consistency, but not the C++ undefined behavior in W3DDisplay.

The issue:

  • onFlush() reads !m_batchNeedsInit (line 617 of your diff)
  • setup2DRenderState() reads !m_batchNeedsInit (line 627)
  • If uninitialized, these branches are unpredictable

Even if Render2DClass correctly synchronizes hardware state on every Render() call, W3DDisplay's control flow still depends on m_batchNeedsInit to decide:

  1. Whether to call Render() in onFlush()
  2. Whether to skip redundant state setup in setup2DRenderState()

An uninitialized bool means these decisions are based on garbage. The fact that the hardware might recover downstream doesn't make reading uninitialized memory defined behavior.

The fix is one line. Keep it.

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

I see greptile woke up on the sassy side of the motherboard.

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

Reduces draw calls by batching 2D elements by texture state in setup2DRenderState.

Environment Scenario (-quickstart) Before After
Main Machine Skirmish Screen 477 FPS 918 FPS
M1 Macbook (Wine) Skirmish Screen 16 FPS 61 FPS
M1 Macbook (Wine) In-Game (Match Start) 22 FPS 33 FPS

Another test would be to get a lot of units on screen then select them all so their unit info is being drawn.

The health bars make use of the line drawing classes which i believe are inherently affected by this change.

@githubawn
Copy link
Copy Markdown
Author

I was initially going to say that health bars render in drawViews() (outside this batch), but your comment made me realize it's only a 6-line change to wrap that too.

Jumped from 45 FPS to 56 FPS.

image

@Mauller
Copy link
Copy Markdown

Mauller commented Apr 1, 2026

I was initially going to say that health bars render in drawViews() (outside this batch), but your comment made me realize it's only a 6-line change to wrap that too.

Jumped from 45 FPS to 56 FPS.
image

Nice ~30% improvement there

@Skyaero42
Copy link
Copy Markdown

Compile errors need to be fixed before this can be reviewed

@githubawn githubawn force-pushed the perf/gui-batch branch 2 times, most recently from 33ccf35 to 5dd0e6e Compare April 6, 2026 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants