feat(agents-server-ui): stream model reasoning into the UI#4508
Open
kevin-dp wants to merge 2 commits into
While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 reasoning_content, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing `Thinking` shimmer heading + elapsed-time ticker. Once the reasoning settles, it collapses to `▸ Thought for 12s`; click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns).
End-to-end plumbing:
- Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic redacted blocks must round-trip back to the model), and `summary_title` (extracted at write time). New `reasoningDeltas` collection mirrors `textDeltas` for streamed content.
- Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta` / `onReasoningEnd`, parallel to text.
- Adapter: `pi-adapter.ts` routes `thinking_start` / `thinking_delta` / `thinking_end` from pi-ai. Parses a `**Title**\n\n<body>` heading once at write time (OpenAI Responses; no-op for others).
- Timeline: live `reasoning: Collection<EntityTimelineReasoningItem>` on `EntityTimelineRunRow`, content built via delta-join.
- UI: new `<ReasoningSection>` renders above items in `AgentResponseLive`. Streamdown body, click-to-expand on settle, redacted-block placeholder for opaque Anthropic payloads.
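A minimal sketch of two of the steps listed above: joining streamed deltas into one reasoning body, and splitting a leading `**Title**\n\n<body>` heading off that body. The `ReasoningDelta` shape and the names `joinDeltas` and `splitSummaryTitle` are hypothetical, chosen only for illustration; the real row schema and the PR's actual parser may differ.

```typescript
// Hypothetical delta shape: the real `reasoningDeltas` rows may use other field names.
interface ReasoningDelta {
  reasoningId: string; // which reasoning row this chunk belongs to (assumed)
  seq: number;         // position of the chunk within that row (assumed)
  text: string;        // streamed text fragment
}

// Delta-join: collect the chunks of one reasoning row and concatenate them in order.
function joinDeltas(deltas: ReasoningDelta[], reasoningId: string): string {
  return deltas
    .filter((d) => d.reasoningId === reasoningId)
    .sort((a, b) => a.seq - b.seq)
    .map((d) => d.text)
    .join("");
}

// Split a leading "**Title**\n\n<body>" heading from the text.
// Text without such a heading is returned unchanged with a null title,
// which is the "no-op for others" case.
function splitSummaryTitle(text: string): { title: string | null; body: string } {
  const match = /^\*\*([^*\n]+)\*\*\n\n([\s\S]*)$/.exec(text);
  if (!match) {
    return { title: null, body: text };
  }
  return { title: (match[1] ?? "").trim(), body: match[2] ?? "" };
}
```

Parsing the title once at write time, as the description states, means readers of the row can use `summary_title` directly rather than re-running the regex on every render.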
Contributor
❌ 3 Tests Failed:
Contributor
Electric Agents Mobile Build
Local mobile checks ran for commit The EAS Android preview build was skipped because the
Previously `withProviderPayloadDefaults` short-circuited for any
provider other than OpenAI / OpenAI-Codex, so picking Claude with a
`reasoningEffort` higher than `auto` had no effect: no `thinking`
parameter was added to the request, Anthropic ran in standard mode,
and the model emitted no `thinking_delta` events. The inbound
reasoning plumbing that landed in the same PR was correct but
unreachable from Anthropic without this.

Now: when the chosen model is an Anthropic model that supports
reasoning AND `reasoningEffort` is explicit (minimal/low/medium/high),
inject

    thinking: { type: "enabled", budget_tokens: <by effort> }

into the payload. Budgets respect the 1024-token floor in Anthropic's
docs: minimal=1024, low=2048, medium=8192, high=24576. `auto` leaves
thinking disabled so default sessions don't silently incur the extra
reasoning tokens.
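
The effort-to-budget mapping described in this commit could look roughly like the sketch below. The function name `withAnthropicThinking`, the `supportsReasoning` flag, and the `Payload` type are assumptions made for illustration; only the budget values and the `thinking` shape come from the commit message, and the real `withProviderPayloadDefaults` may be structured differently.

```typescript
type ReasoningEffort = "auto" | "minimal" | "low" | "medium" | "high";

// Budgets as listed in the commit message (all at or above the 1024-token floor).
const THINKING_BUDGETS: Record<Exclude<ReasoningEffort, "auto">, number> = {
  minimal: 1024,
  low: 2048,
  medium: 8192,
  high: 24576,
};

// Hypothetical payload type: a plain JSON-like request body.
type Payload = Record<string, unknown>;

// Returns the payload unchanged for `auto` or for models without reasoning
// support; otherwise returns a copy with the `thinking` parameter added.
function withAnthropicThinking(
  payload: Payload,
  effort: ReasoningEffort,
  supportsReasoning: boolean, // assumed capability flag, not from the PR
): Payload {
  if (!supportsReasoning || effort === "auto") {
    return payload;
  }
  return {
    ...payload,
    thinking: { type: "enabled", budget_tokens: THINKING_BUDGETS[effort] },
  };
}
```

Because Anthropic's API expects `budget_tokens` to be below `max_tokens`, a caller using the `high` budget would presumably also need a sufficiently large `max_tokens`; the commit message does not say how that is handled.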
KyleAMathews
approved these changes
Jun 4, 2026
Lovely! Could you add a screenshot of the UI to the PR body?
Summary
While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 reasoning, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing `Thinking` shimmer heading plus elapsed-time ticker. Once the reasoning settles it collapses to `▸ Thought for 12s`; click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns). UX intentionally mirrors Claude Code + OpenCode patterns.
Implementation (end-to-end)
- Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic redacted-thinking opaque payload, must round-trip back to the model verbatim), and `summary_title` (extracted at write time). New `reasoningDeltas` collection mirrors `textDeltas`. Strictly additive.
- Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta` / `onReasoningEnd`, parallel to the text path. Reasoning counter added to `OutboundIdSeed`.
- Adapter: `pi-adapter.ts` routes pi-ai's `thinking_start` / `thinking_delta` / `thinking_end` events to the bridge. Parses a `**Title**\n\n<body>` heading once at write time (OpenAI Responses; no-op for Anthropic / DeepSeek / Moonshot). Defensive: handles a late `thinking_delta` without a preceding `thinking_start`, and closes an open reasoning row on `message_end` (e.g. provider abort).
- Timeline: live `reasoning: Collection<EntityTimelineReasoningItem>` on `EntityTimelineRunRow`, content built via the same delta-join pattern as `EntityTimelineTextItem.content`.
- UI: new `<ReasoningSection>` renders above items in `AgentResponseLive`:
  - While streaming: `Streamdown` body with `ThinkingIndicator` heading + summary title + elapsed-time ticker.
  - Once settled: `▸ Thought for Ns` with click-to-expand. Closure duration snapshotted from `Date.now() - timestamp` using the same `sawStreamingRef` trick from the elapsed-time PR: accurate for in-session settles, stays a bare `Thought` for rows already settled on first mount (no real end timestamp available client-side).
  - Redacted blocks: `⊘ Reasoning redacted by provider safety filters`. The encrypted payload is still persisted server-side so the model gets it back on the next turn.
Reference
Patterns informed by reading OpenCode's reasoning implementation:
- `reasoning-start` / `reasoning-delta` / `reasoning-end`
- `ReasoningPart` storage shape including `encrypted` for Anthropic round-trip
- `reasoningSummary()` headline parser (5-line regex, OpenAI Responses only)
Test plan
- `pnpm typecheck` clean in `agents-runtime` + `agents-server-ui`
- `pnpm test outbound-bridge pi-adapter entity-timeline` in `agents-runtime` (95 passed: 18 bridge + 21 adapter + 56 timeline)
- `pnpm test` in `agents-server-ui` (66 passed)
- `pnpm -C packages/agents-runtime build`: dist artifacts emit cleanly
- `Thought for Ns` on settle
Notes
- `AgentResponse` (the non-Live path used for old scrollback sections) doesn't yet surface reasoning; historical rows recorded before this PR lack the data anyway. Follow-up if we discover sessions where this matters.
- `runtime-dsl.test.ts` 401 failures (and `dispatch-policy-routing.test.ts` 500 failures) reproduce identically on clean `main` and were not introduced by this PR.
🤖 Generated with Claude Code