feat(agents-server-ui): stream model reasoning into the UI#4508

Open
kevin-dp wants to merge 2 commits into main from kevin/reasoning-content

@kevin-dp kevin-dp commented Jun 4, 2026

Summary

While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 reasoning, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing Thinking shimmer heading plus elapsed-time ticker. Once the reasoning settles it collapses to ▸ Thought for 12s — click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns). UX intentionally mirrors Claude Code + OpenCode patterns.

Implementation (end-to-end)

  • Schema: reasoning row gains run_id, encrypted (Anthropic redacted-thinking opaque payload, must round-trip back to the model verbatim), and summary_title (extracted at write time). New reasoningDeltas collection mirrors textDeltas. Strictly additive.
  • Bridge: OutboundBridge gains onReasoningStart / onReasoningDelta / onReasoningEnd, parallel to the text path. Reasoning counter added to OutboundIdSeed.
  • Adapter: pi-adapter.ts routes pi-ai's thinking_start / thinking_delta / thinking_end events to the bridge. Parses a **Title**\n\n<body> heading once at write time (OpenAI Responses; no-op for Anthropic / DeepSeek / Moonshot). Defensive: handles a late thinking_delta without a preceding thinking_start, and closes an open reasoning row on message_end (e.g. provider abort).
  • Timeline: live reasoning field (Collection<EntityTimelineReasoningItem>) on EntityTimelineRunRow, content built via the same delta-join pattern as EntityTimelineTextItem.content.
  • UI: new <ReasoningSection> renders above items in AgentResponseLive:
    • Live: faded markdown via Streamdown with ThinkingIndicator heading + summary title + elapsed-time ticker
    • Settled: ▸ Thought for Ns with click-to-expand. The duration is snapshotted as Date.now() - timestamp when the row settles, using the same sawStreamingRef trick as the elapsed-time PR. This is accurate for rows that settle in-session; rows that are already settled on first mount show a bare Thought label, since no real end timestamp is available client-side.
    • Redacted: Anthropic safety-filter payloads render ⊘ Reasoning redacted by provider safety filters. The encrypted payload is still persisted server-side so the model gets it back on the next turn.
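
For reviewers, the defensive adapter behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in pi-adapter.ts: the `ThinkingEvent` and `ReasoningBridge` shapes and the `createReasoningRouter` name are assumptions made for the example, and closing a still-open row on a second thinking_start is an extra assumption not stated in the PR.

```typescript
// Hypothetical shapes: the real pi-ai events and OutboundBridge may differ.
type ThinkingEvent =
  | { type: "thinking_start" }
  | { type: "thinking_delta"; delta: string }
  | { type: "thinking_end" }
  | { type: "message_end" };

interface ReasoningBridge {
  onReasoningStart(): void;
  onReasoningDelta(delta: string): void;
  onReasoningEnd(): void;
}

// Returns a handler that forwards thinking events to the bridge and keeps
// the start / delta / end sequence well-formed even if the provider does not.
function createReasoningRouter(bridge: ReasoningBridge) {
  let open = false;

  return (event: ThinkingEvent): void => {
    switch (event.type) {
      case "thinking_start":
        // Assumption: a start while a row is open closes the previous row first.
        if (open) bridge.onReasoningEnd();
        bridge.onReasoningStart();
        open = true;
        break;
      case "thinking_delta":
        // Late delta without a preceding start: open a row implicitly.
        if (!open) {
          bridge.onReasoningStart();
          open = true;
        }
        bridge.onReasoningDelta(event.delta);
        break;
      case "thinking_end":
      case "message_end":
        // message_end closes a row left open, e.g. after a provider abort.
        if (open) {
          bridge.onReasoningEnd();
          open = false;
        }
        break;
    }
  };
}
```

With this shape, a stray thinking_delta followed by message_end still produces exactly one start, one delta, and one end on the bridge.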

Reference

Patterns informed by reading OpenCode's reasoning implementation:

  • 3-event streaming protocol (reasoning-start / reasoning-delta / reasoning-end)
  • ReasoningPart storage shape including encrypted for Anthropic round-trip
  • reasoningSummary() headline parser (5-line regex, OpenAI Responses only)
  • Collapsed-by-default UX with click-to-expand
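
As a rough illustration of the headline parsing mentioned above, the sketch below splits a leading `**Title**\n\n<body>` heading from reasoning text. The `splitReasoningTitle` name and its return shape are hypothetical and are not taken from OpenCode or from this PR; text without such a heading passes through unchanged, which matches the "no-op for other providers" behaviour.

```typescript
// Hypothetical helper; name and return shape are illustrative only.
interface ParsedReasoning {
  summaryTitle: string | null;
  body: string;
}

function splitReasoningTitle(text: string): ParsedReasoning {
  // A bold first line followed by a blank line: **Title**\n\n<body>
  const match = /^\*\*([^*\n]+)\*\*\r?\n\r?\n([\s\S]*)$/.exec(text);
  if (!match) {
    // No heading: leave the text untouched.
    return { summaryTitle: null, body: text };
  }
  const [, title = "", body = ""] = match;
  return { summaryTitle: title.trim(), body };
}
```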

Test plan

  • pnpm typecheck clean in agents-runtime + agents-server-ui
  • pnpm test outbound-bridge pi-adapter entity-timeline in agents-runtime (95 passed: 18 bridge + 21 adapter + 56 timeline)
  • pnpm test in agents-server-ui (66 passed)
  • pnpm -C packages/agents-runtime build — dist artifacts emit cleanly
  • Manual: prompt Anthropic Claude with extended-thinking enabled; verify streaming reasoning appears faded above the answer with elapsed ticker, then collapses to Thought for Ns on settle
  • Manual: multi-step tool-using turn; verify each step's reasoning renders as a separate collapsible row

Notes

  • Cached AgentResponse (the non-Live path used for old scrollback sections) doesn't yet surface reasoning — historical rows recorded before this PR lack the data anyway. Follow-up if we discover sessions where this matters.
  • The pre-existing runtime-dsl.test.ts 401 failures (and dispatch-policy-routing.test.ts 500 failures) reproduce identically on clean main and were not introduced by this PR.

🤖 Generated with Claude Code

While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1
reasoning_content, Moonshot K2, OpenAI Responses summaries) the agent
response now shows the reasoning text faded above the answer, with the
existing `Thinking` shimmer heading + elapsed-time ticker. Once the
reasoning settles, it collapses to `▸ Thought for 12s` — click to
expand. Multiple reasoning rows per run render independently in order
(one per LLM step in tool-using turns).

End-to-end plumbing:

- Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic
  redacted blocks must round-trip back to the model), and
  `summary_title` (extracted at write time). New `reasoningDeltas`
  collection mirrors `textDeltas` for streamed content.
- Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta`
  / `onReasoningEnd`, parallel to text.
- Adapter: `pi-adapter.ts` routes `thinking_start` / `thinking_delta` /
  `thinking_end` from pi-ai. Parses a `**Title**\n\n<body>` heading
  once at write time (OpenAI Responses; no-op for others).
- Timeline: live `reasoning: Collection<EntityTimelineReasoningItem>`
  on `EntityTimelineRunRow`, content built via delta-join.
- UI: new `<ReasoningSection>` renders above items in
  `AgentResponseLive`. Streamdown body, click-to-expand on settle,
  redacted-block placeholder for opaque Anthropic payloads.
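
A minimal sketch of the delta-join idea, for readers unfamiliar with the
pattern. The `ReasoningDelta` field names and the `joinReasoningContent`
helper are assumptions for illustration; the real `reasoningDeltas` rows
and the timeline query may look different.

```typescript
// Hypothetical row shape; field names are assumptions.
interface ReasoningDelta {
  reasoningId: string; // which reasoning row this chunk belongs to
  seq: number;         // position of the chunk within that row
  delta: string;       // streamed text fragment
}

// Rebuilds the content of one reasoning row from its streamed chunks.
function joinReasoningContent(
  deltas: readonly ReasoningDelta[],
  reasoningId: string,
): string {
  return deltas
    .filter((d) => d.reasoningId === reasoningId) // filter copies, so sort is safe
    .sort((a, b) => a.seq - b.seq)
    .map((d) => d.delta)
    .join("");
}
```

Because each row is joined from its own deltas, several reasoning rows in
one run can stream and settle independently.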

github-actions Bot commented Jun 4, 2026

Electric Agents Desktop Builds

Build artifacts for commit a2d56c3.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Passed | DMG |
| macOS Intel | Passed | DMG |
| Windows x64 | Passed | Installer |
| Linux x64 | Passed | AppImage / deb |

Workflow run

codecov Bot commented Jun 4, 2026

❌ 3 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 1064 | 3 | 1061 | 41 |
View the top 2 failed test(s) by shortest run time
test/process-wake.test.ts > processWake > applies SIGINT that arrives before the handler run controller is created
Stack Traces | 0.123s run time
TypeError: Cannot read properties of undefined (reading 'toArray')
 ❯ Module.loadOutboundIdSeed src/outbound-bridge.ts:70:46
 ❯ runAgent src/context-factory.ts:554:33
 ❯ Object.handler test/process-wake.test.ts:1068:9
 ❯ Module.processWake src/process-wake.ts:2074:9
 ❯ test/process-wake.test.ts:1090:5
test/process-wake.test.ts > processWake > aborts an active run for server-handled SIGKILL without rewriting the signal
Stack Traces | 5s run time
Error: Test timed out in 5000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".
 ❯ test/process-wake.test.ts:1184:3
View the full list of 1 ❄️ flaky test(s)
test/process-wake.test.ts > processWake > hydrates dynamic event source wake rows into the agent trigger message

Flake rate in main: 66.67% (Passed 2 times, Failed 4 times)

Stack Traces | 0.0203s run time
TypeError: Cannot read properties of undefined (reading 'toArray')
 ❯ Module.loadOutboundIdSeed src/outbound-bridge.ts:70:46
 ❯ Object.run src/context-factory.ts:608:17
 ❯ Object.handler test/process-wake.test.ts:1712:9
 ❯ Module.processWake src/process-wake.ts:2074:9
 ❯ test/process-wake.test.ts:1716:5

github-actions Bot commented Jun 4, 2026

Electric Agents Mobile Build

Local mobile checks ran for commit a2d56c3.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

Workflow run

Previously `withProviderPayloadDefaults` short-circuited for any
provider other than OpenAI / OpenAI-Codex, so picking Claude with an
explicit `reasoningEffort` (anything other than `auto`) had no effect:
no `thinking` parameter was added to the request, Anthropic ran in
standard mode, and the model emitted no `thinking_delta` events. The
inbound reasoning plumbing that landed in the same PR was correct but
unreachable from Anthropic without this change.

Now: when the chosen model is a reasoning-capable Anthropic model AND
`reasoningEffort` is explicit (minimal/low/medium/high), inject

  thinking: { type: "enabled", budget_tokens: <by effort> }

into the payload. Budgets respect the 1024-token minimum from
Anthropic's docs: minimal=1024, low=2048, medium=8192, high=24576.
`auto` leaves thinking off so default sessions don't silently incur
the extra reasoning tokens.
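
A sketch of the effort-to-budget mapping described above. The budget
values come from this commit message; the `withAnthropicThinking` helper,
its parameters, and the `Payload` type are hypothetical and do not
reproduce the real `withProviderPayloadDefaults`.

```typescript
type ReasoningEffort = "auto" | "minimal" | "low" | "medium" | "high";
type Payload = Record<string, unknown>;

interface ThinkingConfig {
  type: "enabled";
  budget_tokens: number;
}

// Budgets as listed in the commit message.
const THINKING_BUDGET_TOKENS: Record<Exclude<ReasoningEffort, "auto">, number> = {
  minimal: 1024,
  low: 2048,
  medium: 8192,
  high: 24576,
};

// Hypothetical helper: adds `thinking` only for an explicit effort on a
// reasoning-capable model, and otherwise returns the payload unchanged.
function withAnthropicThinking(
  payload: Payload,
  effort: ReasoningEffort,
  modelSupportsReasoning: boolean,
): Payload {
  if (!modelSupportsReasoning || effort === "auto") return payload;
  const thinking: ThinkingConfig = {
    type: "enabled",
    budget_tokens: THINKING_BUDGET_TOKENS[effort],
  };
  return { ...payload, thinking };
}
```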

@KyleAMathews KyleAMathews left a comment

Lovely! Could you add a screenshot of the UI to the PR body?
