text.c: Improve the performance of text rendering #101
TheGag96 wants to merge 2 commits into devkitPro:master from
Conversation
...by doing the following:

- Create caches around `fontGetCharWidthInfo` and `fontCalcGlyphPos` for ASCII characters, because those are pretty slow. (Non-English languages do not get this benefit - perhaps an LRU cache for non-ASCII glyphs could be made to help here.)
- Instead of doing one draw call per glyph, try to batch them as much as possible.
- Because the system font is so fragmented (5 glyphs per texture), this requires collecting glyphs and sorting them before drawing batches to avoid costly texture swaps. (citro2d has the function `C2D_TextOptimize` for this exact reason.)

This lets me maintain 60 FPS on my o3DS with the 3D turned on in most but not all circumstances. Enough glyphs on screen can still cause dropped frames.
UPDATE: One more commit from another breakthrough:

…ets as combined

As it turns out, the system font texture sheets are all 128x32 pixels and adjacent in memory! We can reinterpret the memory starting at sheet 0 and describe a much bigger texture that encompasses all of the ASCII glyphs, and make our cache use that instead of the individual sheets. This will massively improve performance by reducing texture swaps within a piece of text, down to 0 if it's all English. We don't need any extra linear allocation to do this!

The coalescing will be applied to all characters / glyph sheets up until the last `glyphInfo.nSheets % 32` sheets. This means that there are more operations per glyph being done in `textGetGlyphPosFromCodePoint`, but this is probably offset by the savings from not switching textures as often. And this won't matter for English text, which has these results cached.
// As it turns out, the system font texture sheets are all 128x32 pixels and adjacent in memory! We can reinterpret
// the memory starting at sheet 0 and describe a much bigger texture that encompasses all of the ASCII glyphs and
// make our cache use that instead of the individual sheets. This will massively improve performance by reducing
// texture swaps within a piece of text, down to 0 if it's all English. We don't need any extra linear allocating to
// do this!
There are more system fonts (KR, CN, TW) besides the normal one (JP/EU/NA). Instead of blindly making an assumption, would it be possible to explicitly detect if the texture sheets are in fact adjacent in memory, and fall back to the normal way if they aren't?
So, I just wrote some code to loop over each sheet to do this, but I'm now realizing that `fontGetGlyphSheetTex` is just indexing out of a flat array, assuming a single sheet size shared between them to begin with:
```c
static inline void* fontGetGlyphSheetTex(CFNT_s* font, int sheetIndex)
{
    if (!font)
        font = fontGetSystemFont();
    TGLP_s* tglp = fontGetGlyphInfo(font);
    return &tglp->sheetData[sheetIndex*tglp->sheetSize];
}
```

It seems, then, that checking beforehand would be needless. What do you think?
Good catch. In that case, the format does guarantee that all texture sheets are adjacent in memory. (Can you tell I haven't looked at this code in many, many years? :p)
Great work. Combined with some planned upcoming changes, I think this will finally fix hbmenu rendering performance. However, I am not entirely convinced by the second commit; I think it should be revised to explicitly check whether the optimization can be performed, and fall back on the normal method otherwise.
Understandable, I can do that. So far, I have not observed this hack fail on the couple of systems I've tested, nor on Azahar with the system font available from a NAND dump.
Can you rebase your branch against latest master? There is a fix for a libctru API change in there, without which CI fails on this PR.
```c
#include "text.h"

#define NUM_ASCII_CHARS 128
#define SHEETS_PER_BIG_SHEET 32
```
Instead of hardcoding this, I think it would be a better idea to explicitly calculate it as `1024 / texSheetHeight`.