feat(agents-server-ui): show per-response token usage in meta row #4502
kevin-dp wants to merge 1 commit into
Sums input/output tokens across every step of the run and renders them
next to the elapsed-time ticker (e.g. `Thinking · 12s · 1.2k ↑ 412 ↓`).
Counter updates at step boundaries — the LLM SDK only reports `usage`
at end-of-step, so within a single text stream the value stays flat;
tool-using runs see jumps as each step settles.
Token plumbing (additive, no migration):
- `StepValue` Zod + TS gains optional `input_tokens` / `output_tokens`
- `outbound-bridge.ts:onStepEnd` now persists the `tokenInput` /
`tokenOutput` values it was already receiving but dropping
- `IncludesStep` / `EntityTimelineStepItem` and the three step
`.select()` blocks surface the new fields
- The cached `agent_response` section gets a summed `tokens?: { input?,
output? }`, and the section-cache fingerprint includes per-step token
deltas so a late `onStepEnd` invalidates a stale section
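
To make the last bullet concrete, here is a minimal sketch of a cache fingerprint that folds per-step token values into the key. The `StepRow` shape, the `fingerprintSteps` name, and the key format are assumptions for illustration only, not the code in this PR.

```typescript
// Hypothetical step shape: only the fields this fingerprint needs.
interface StepRow {
  id: string;
  status: string;
  input_tokens?: number;
  output_tokens?: number;
}

// Build a string key that changes whenever a step's token counts change.
// Missing counts are encoded as "-" so "unknown" and 0 stay distinguishable.
function fingerprintSteps(steps: StepRow[]): string {
  return steps
    .map((s) => `${s.id}:${s.status}:${s.input_tokens ?? "-"}:${s.output_tokens ?? "-"}`)
    .join("|");
}

// The same step before and after a late usage write.
const before = fingerprintSteps([{ id: "s1", status: "done" }]);
const after = fingerprintSteps([
  { id: "s1", status: "done", input_tokens: 1200, output_tokens: 412 },
]);

// The keys differ, so a section cached under `before` would be rebuilt.
console.log(before !== after);
```

Encoding the raw per-step values rather than only the run total means two steps whose counts change in offsetting ways still produce a new key.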
❌ 107 Tests Failed:
samwillis left a comment
Interactive review with GPT.
Thanks for wiring this through. I traced the token usage path end-to-end:
pi-adapter message_end.usage
→ bridge.onStepEnd({ tokenInput, tokenOutput })
→ steps.update({ input_tokens, output_tokens })
→ timeline step rows
→ UI meta row.
Overall the stream write looks sound: token usage is attached to the step completion update, so it appears once at the end of a pure text generation step, and jumps at step boundaries for tool-using runs.
A few suggestions/questions:
- Can we compute the per-run total in the timeline query/view model?
Right now AgentResponseLive subscribes to run.steps and sums input_tokens / output_tokens in React, while buildAgentSection separately performs the same aggregation for materialized sections.
If TanStack DB supports this shape cleanly, I think it would be better to expose a single per-run tokens field from createEntityTimelineQuery / the includes query, e.g. via a scoped aggregate over steps for the run. Then the UI only renders run.tokens / section.tokens, and the aggregation logic lives in one layer.
Since usage only lands at step end, not token-by-token, updating the parent run/timeline row at those boundaries seems acceptable to me.
- Missing usage fields currently become real zeroes
In pi-adapter, when msg.usage exists, missing sides are coerced to 0:
    ...(usage && {
      tokenInput: usage.input ?? usage.inputTokens ?? 0,
      tokenOutput: usage.output ?? usage.outputTokens ?? 0,
    }),

Now that these values are persisted and displayed, that can make an unknown side look like a real 0. If pi-ai guarantees both input and output are always present, a small regression test would be useful. Otherwise I’d preserve undefined for missing sides and only write numeric values.
- Test coverage
I’d like to see a targeted regression test that proves token usage reaches the steps.update event, and another around the timeline/view-model total if we move the aggregation there. That would lock down the important stream contract introduced by this PR.
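
As a rough illustration of the second suggestion (preserving `undefined` for a missing side), a sketch might look like the following. The `Usage` and `StepTokens` shapes and the `extractTokens` helper are hypothetical; the field names are taken from the snippet quoted above, and whether pi-ai can actually omit one side is an open question here.

```typescript
// Hypothetical usage shape covering both field spellings read by the adapter.
interface Usage {
  input?: number;
  output?: number;
  inputTokens?: number;
  outputTokens?: number;
}

interface StepTokens {
  tokenInput?: number;
  tokenOutput?: number;
}

// Return only the sides the provider actually reported; never invent a 0.
function extractTokens(usage: Usage | undefined): StepTokens {
  const result: StepTokens = {};
  if (!usage) return result;

  const tokenInput = usage.input ?? usage.inputTokens;
  const tokenOutput = usage.output ?? usage.outputTokens;

  if (typeof tokenInput === "number") result.tokenInput = tokenInput;
  if (typeof tokenOutput === "number") result.tokenOutput = tokenOutput;
  return result;
}

// Only the input side was reported, so the output key is absent.
console.log(extractTokens({ input: 1200 }));
// A reported 0 is kept, since it is a real value rather than a default.
console.log(extractTokens({ inputTokens: 0, outputTokens: 412 }));
```

A regression test could then assert on key presence (for example `"tokenOutput" in result`) rather than on a numeric value, which separates "unknown" from "zero".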
👍 yeah just showing total tokens for the run at the bottom makes sense to me
Summary
Adds a token-usage label to the agent response meta row, e.g. `Thinking · 12s · 1.2k ↑ 412 ↓` while streaming and `✓ done in 12s · 1.2k ↑ 412 ↓ · 14:18` once settled. The counter updates at step boundaries: for a single-turn LLM call it lands once at done; for tool-using runs it jumps as each step completes. The LLM SDK only emits `usage` at end-of-step, so we can't tick smoothly between streamed tokens; the elapsed-time ticker still ticks every second alongside it.
Plumbing
The runtime already had the token data (`pi-adapter.ts:358-359` extracts `tokenInput` / `tokenOutput` from the provider's per-step `usage` payload), but the bridge silently dropped them before persistence. This PR closes that gap and surfaces them all the way to the UI:
- `StepValue` gains optional `input_tokens` / `output_tokens` columns (Zod + TS). Strictly additive: events recorded before this change still validate (both fields optional), so no migration is needed.
- `outbound-bridge.ts:onStepEnd` now persists the values it was already receiving from `pi-adapter.ts`.
- `IncludesStep` / `EntityTimelineStepItem` surface the new fields, and the three `.select()` blocks that materialize step rows include them.
- `agent_response` section grows a `tokens?: { input?, output? }` summed across the run's steps at section-build time, and the `fingerprintRun` cache key includes per-step token deltas so a late-arriving `onStepEnd` invalidates a stale cached section.
- `<TokenUsage>` component in `agents-server-ui` with `tabular-nums` so digits don't jitter, locale-aware compact formatting via `Intl.NumberFormat`. Renders next to `<ElapsedTime>` in both the live and cached meta rows.
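
For readers unfamiliar with compact number formatting, the sketch below shows roughly how a label like the one above could be produced with `Intl.NumberFormat`. The `formatTokenUsage` helper, the fixed `"en"` locale, and the one-fraction-digit cap are assumptions, not the actual `<TokenUsage>` implementation; the exact suffix and its casing come from the runtime's locale data.

```typescript
// Compact, locale-aware formatter. The locale and the fraction-digit cap are
// assumptions; the cast only keeps older TS lib targets from rejecting `notation`.
const compact = new Intl.NumberFormat("en", {
  notation: "compact",
  maximumFractionDigits: 1,
} as Intl.NumberFormatOptions);

// Hypothetical helper: renders "<input> ↑ <output> ↓", skipping an unknown side.
function formatTokenUsage(tokens: { input?: number; output?: number }): string {
  const parts: string[] = [];
  if (tokens.input !== undefined) parts.push(`${compact.format(tokens.input)} ↑`);
  if (tokens.output !== undefined) parts.push(`${compact.format(tokens.output)} ↓`);
  return parts.join(" ");
}

console.log(formatTokenUsage({ input: 1234, output: 412 }));
console.log(formatTokenUsage({ output: 412 }));
```

Constructing the formatter once at module scope avoids rebuilding it on every render, which matters for a label that re-renders alongside a one-second ticker.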
Test plan
- `pnpm typecheck` clean in `agents-runtime` + `agents-server-ui`
- `pnpm test` in `agents-server-ui` (66 passed)
- `pnpm test outbound-bridge use-chat entity-timeline` in `agents-runtime` (74 passed)
- `agents-runtime` test suite: my branch matches the same pre-existing 401 failures observed on clean `main` (unrelated permission-system breakage in the test harness, not introduced by this PR)
- […] jump at each step boundary
Notes
- […] persisted (older `steps` rows lack the columns). The `tokens` field is conditional on at least one step reporting a number, so those sections continue to render with no token row instead of "0 / 0".
- `1.2k ↑ 412 ↓` chosen for compactness in the meta row. Open to changing to `1.2k in / 412 out` or similar if the arrow direction is unclear: input goes up to the model, output comes down.
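
The conditional `tokens` field from the first note can be sketched as a single aggregation helper. This is a minimal illustration under assumed names (`StepTokens`, `RunTokens`, `sumRunTokens`), not the section-build code itself; it returns `undefined` when no step reports a number, so callers can skip the token row entirely.

```typescript
// Hypothetical per-step and per-run shapes, mirroring the field names above.
interface StepTokens {
  input_tokens?: number;
  output_tokens?: number;
}

interface RunTokens {
  input?: number;
  output?: number;
}

// Sum each side independently; a side stays undefined until some step reports it.
function sumRunTokens(steps: StepTokens[]): RunTokens | undefined {
  let input: number | undefined;
  let output: number | undefined;

  for (const step of steps) {
    if (typeof step.input_tokens === "number") input = (input ?? 0) + step.input_tokens;
    if (typeof step.output_tokens === "number") output = (output ?? 0) + step.output_tokens;
  }

  // No step reported anything: signal "no token row" instead of zeros.
  if (input === undefined && output === undefined) return undefined;

  const tokens: RunTokens = {};
  if (input !== undefined) tokens.input = input;
  if (output !== undefined) tokens.output = output;
  return tokens;
}

console.log(sumRunTokens([{}, {}]));
console.log(
  sumRunTokens([
    { input_tokens: 800, output_tokens: 300 },
    { input_tokens: 400, output_tokens: 112 },
  ]),
);
```

Keeping this in one function would also let the live and cached meta rows share the same aggregation, in line with the review suggestion to compute the total in a single layer.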
🤖 Generated with Claude Code