feat(ui): smooth + cheap agent streaming (adaptive reveal + block-split markdown)#2716
feat(ui): smooth + cheap agent streaming (adaptive reveal + block-split markdown)#2716KarloAldrete wants to merge 5 commits into
Conversation
Each streamed token re-parsed the entire accumulated markdown of the active agent message (react-markdown + remark-gfm) — O(n) per token, O(n^2) per message. Past ~8k chars a single token costs ~19ms of parsing, over the 16.6ms frame budget, so long answers stutter and saturate the main thread. StreamingMarkdown splits the active message into top-level blocks (fenced code kept intact). Completed blocks keep a stable string so the memoized MarkdownRenderer skips them and only the growing tail re-parses; an open code fence renders as plain text until it closes, so syntax highlighting runs once rather than per token. On a 3k-char answer this is ~10x less markdown CPU (878ms -> 88ms total; 3.5ms -> 0.35ms per token) and the per-token cost stays flat instead of growing with message length. Completed messages still render via a single full MarkdownRenderer parse, so output is byte-identical — no visual change, only less work per token. Complements PostHog#2685 (smooth token reveal): that PR eases when text appears, this one keeps rendering it cheap so the smooth reveal stays at 60fps even on long messages. Generated-By: PostHog Code Task-Id: 8e9f327d-84b1-4608-9f48-c3038dbf87ca
Prompt To Fix All With AIFix the following 3 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 3
packages/ui/src/features/editor/components/StreamingMarkdown.tsx:16-22
**`parseOpenFence` targets the first fence, not the unterminated one**
`/.exec(block)` finds the first fence marker in the block. If the tail block contains a completed fence followed by an open fence with no blank line between them — e.g. `` ```ts\ncode\n```\ntext\n```ts\npartial `` (no blank lines, so `splitMarkdownBlocks` keeps it as one block) — `before` becomes `""` and `code` absorbs everything from the first fence opener onward, including the intermediate closing fence and second opening marker. The result is garbled plain-text output during streaming (it self-corrects once the fence closes and `MarkdownRenderer` takes over, but the in-progress display is wrong).
A fix is to scan all lines with the same fence-tracking logic as `hasOpenCodeFence`, record the last fence-open line index, then slice `before` and `code` relative to that line.
### Issue 2 of 3
packages/ui/src/features/editor/components/splitMarkdownBlocks.ts:59-74
**Fence-tracking logic duplicated between the two exported functions**
`hasOpenCodeFence` re-implements the identical per-line regex match + `inFence`/`fenceChar` state machine that lives inside `splitMarkdownBlocks`. Extracting a shared helper (e.g. `parseFenceState(lines)` that returns `{ inFence, fenceChar }`) would satisfy the codebase's OnceAndOnlyOnce rule and make future spec-compliance fixes (e.g. requiring the closing fence to be at least as long as the opening fence) apply to both functions automatically.
### Issue 3 of 3
packages/ui/src/features/editor/components/splitMarkdownBlocks.test.ts:5-44
**Tests prefer parameterised style over manual loops and multi-assertion cases**
The "never drops text" case iterates over samples with a `for` loop, and the `hasOpenCodeFence` suite packs four independent assertions into one `it`. Vitest's `it.each` / `test.each` makes each sample a separate, named test entry — failures name the exact input and expected output, which saves time when something breaks. Converting both suites to parameterised form would align with the project's stated preference.
Reviews (1): Last reviewed commit: "perf(markdown): re-parse only the tail b..." | Re-trigger Greptile |
…fence, it.each - Extract a single stepFence state machine shared by splitMarkdownBlocks, hasOpenCodeFence and parseOpenFence (OnceAndOnlyOnce — no duplicated fence-tracking logic). - parseOpenFence now targets the LAST unterminated fence, so a completed fence earlier in the same block stays in `before` and renders normally instead of being swallowed as plain text mid-stream. - Parameterize the splitter tests with it.each and add parseOpenFence cases, including the completed-then-open fence edge case. Generated-By: PostHog Code Task-Id: 8e9f327d-84b1-4608-9f48-c3038dbf87ca
|
Thanks for the review! Addressed all three in 3f6bc14:
|
Smooth the reveal of streamed tokens on top of the block-split renderer. useSmoothedText eases the displayed prefix toward the accumulated text via requestAnimationFrame with a step proportional to the backlog, so it tracks a fast token stream (catching up in ~6 frames) instead of a fixed chars/sec that lags behind and then snaps. It reveals on word boundaries so partial markdown tokens never flash, and shows completed (non-streaming) messages in full immediately. Pairs with the block-split renderer: only the growing tail re-parses each frame, so the reveal holds ~60fps even on long messages — which a fixed-rate reveal over a full markdown re-parse can't. Generated-By: PostHog Code Task-Id: 8e9f327d-84b1-4608-9f48-c3038dbf87ca
Resolve AgentMessage.tsx: keep main's Box className (drops py-1) together with the streaming changes (StreamingMarkdown + adaptive reveal). Generated-By: PostHog Code Task-Id: 8e9f327d-84b1-4608-9f48-c3038dbf87ca
|
Hey @KarloAldrete, the streaming in #2685 is much smoother and aligned with the expectation I had for how this feels. Anyway to adjust that here? Thanks for your contribution! |
…review) @charlesvien preferred the steadier reveal in PostHog#2685. Switch useSmoothedText from an adaptive (backlog-proportional) rate to a constant ~120 chars/sec, character by character (dropping the word-boundary stepping), and honor prefers-reduced-motion — so the cadence reads as even typing. The block-split renderer is unchanged and still keeps the per-token markdown cost flat on long messages. Generated-By: PostHog Code Task-Id: 8e9f327d-84b1-4608-9f48-c3038dbf87ca
|
Thanks @charlesvien! Switched the reveal in c51416c to match #2685's cadence — a constant ~120 chars/sec, character by character (dropped the adaptive, backlog-proportional stepping and the word-boundary snapping that made ours feel jumpier), and it now honors The block-split renderer is unchanged, so the steady reveal stays ~60fps even on long messages (only the growing tail re-parses each frame). Let me know how it feels! |
Is the performance still decent from your testing? |
|
Yep — perf is the whole point of this PR, and the cadence change didn't touch it (the reveal rate and the block-split renderer are independent). Two angles: 1. Measured in real Chromium (Playwright, client-rendered into a real DOM, on a Ryzen 9 5900X) — total main-thread CPU to stream a full message token by token:
Full re-parse is ~O(n²) (re-renders the whole message every token); block-split is ~O(n) (only the growing tail re-parses), so the longer the answer the bigger the win — an 8k-char answer drops from ~1.4 s of main-thread CPU to ~80 ms. Default components here, so it's a lower bound (the app also re-runs syntax highlighting per token, which block-split avoids too). 2. I've been dogfooding it as my daily driver. I'm in a single conversation that's ~370K tokens deep right now and it still hasn't stuttered or felt janky — the perf issues I used to hit are gone. That's honestly why I kept iterating on it. And if the cadence still doesn't feel right to you, I'm happy to keep tuning and digging into it — just say the word. Thanks for the thoughtful review! |
Problem
Streamed agent replies have two issues:
AgentMessage→MarkdownRendererre-parses the entire accumulated markdown (react-markdown + remark-gfm) on every token: O(n) per token, O(n²) per message. Past ~8k chars a single token costs ~19 ms of parsing, over the 16.6 ms frame budget, so long answers stutter and saturate the main thread.Addresses #2517 (smooth token streaming).
Changes
Two pieces that reinforce each other.
1. Block-split rendering (
StreamingMarkdown+splitMarkdownBlocks) — the active message is split into top-level blocks (fenced code kept intact). Completed blocks keep a stable string, so the memoizedMarkdownRendererskips them and only the growing tail re-parses. An open code fence renders as plain text until it closes, so syntax highlighting runs once, not per token.Measured on a 3,000-char answer streamed in ~250 tokens (jsdom +
react-dom/server, default components — a lower bound):~10× less, and the per-token cost stays flat instead of growing with length.
2. Steady reveal (
useSmoothedText) — reveals the accumulated text at a constant ~120 chars/sec, character by character, honoringprefers-reduced-motion. Decouples paint from the bursty token arrival so the text reads as even typing.The two are complementary: the reveal makes streaming look smooth, and the block-split keeps each frame cheap so it stays ~60 fps even on long messages. Completed messages render via a single full
MarkdownRendererparse, so the final output is byte-identical.Relationship to #2685
#2685 also implements smooth token streaming. Per @charlesvien's review, this PR's reveal now matches the steadier cadence from #2685 (constant ~120 chars/sec, char by char). The distinct value here is the block-split renderer: it keeps the per-token markdown cost flat so the smooth reveal holds up on long messages — the part #2685 doesn't address. Happy to coordinate or fold these together however the team prefers.
How did you test this?
splitMarkdownBlocks.test.ts(round-trip / no text dropped, paragraph + fenced-block splitting,parseOpenFenceincl. the completed-then-open-fence edge case) anduseSmoothedText.test.ts(the purenextRevealLengtheasing + the hook: immediate on mount, gradual steady reveal, snap-on-replace, reduced-motion).pnpm --filter @posthog/ui test→ 779 pass.pnpm --filter @posthog/ui typecheckpasses; pre-commitpnpm typecheckpasses.biome checkclean on changed files;node scripts/check-host-boundaries.mjsreports no new violations.Automatic notifications
Created with PostHog Code