otel: emit reasoning_tokens on invoke_agent root span for foreground agent #319469
Merged
…agent

Reasoning/thinking tokens are already populated on every chat span by chatMLFetcher (dual-emit of gen_ai.usage.reasoning_tokens and gen_ai.usage.reasoning.output_tokens), but the invoke_agent root span built by toolCallingLoop only aggregated input/output/cache_read/cache_creation totals. As a result, App Insights queries that read parent-only token totals off the invoke_agent span were missing reasoning entirely.

Accumulate response.usage.completion_tokens_details.reasoning_tokens across all turns (the same shape chatMLFetcherTelemetry.ts already consumes for 1DS telemetry) and dual-emit it onto the span when non-zero.

ATIF / Kusto reports already read this from chat spans via transformSqliteToAtif (vscode-copilot-evaluation), so this is purely an App-Insights-side parity fix for the foreground agent.

vscode-claude has the same gap on its invoke_agent root span. Documented the limitation on _accumulateParentTokenUsage: the Anthropic SDK Usage type does not surface reasoning/thinking tokens separately, so fixing the Claude case requires bridging the count from ClaudeStreamingPassThroughEndpoint into ClaudeOTelTracker and is left as a follow-up.
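The accumulation step described above can be sketched roughly as follows. This is a minimal illustration, not the code in toolCallingLoop.ts: the TurnUsage and TurnResponse types and the sumReasoningTokens helper are hypothetical names, and only the field path usage.completion_tokens_details.reasoning_tokens is taken from the description.

```typescript
// Hypothetical shapes for illustration. Only the field path
// usage.completion_tokens_details.reasoning_tokens comes from the PR text.
interface TurnUsage {
  prompt_tokens: number;
  completion_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}

interface TurnResponse {
  usage?: TurnUsage;
}

// Sum reasoning tokens over every turn, treating missing fields as zero
// so that turns without reasoning details do not break the total.
function sumReasoningTokens(turns: readonly TurnResponse[]): number {
  let total = 0;
  for (const turn of turns) {
    total += turn.usage?.completion_tokens_details?.reasoning_tokens ?? 0;
  }
  return total;
}

const turns: TurnResponse[] = [
  {
    usage: {
      prompt_tokens: 100,
      completion_tokens: 40,
      completion_tokens_details: { reasoning_tokens: 25 },
    },
  },
  { usage: { prompt_tokens: 120, completion_tokens: 10 } }, // no reasoning details reported
  {}, // turn with no usage at all
];

console.log(sumReasoningTokens(turns)); // 25
```

Defaulting absent fields to zero keeps the total well-defined when a model or endpoint reports no reasoning details for a turn.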
Pull request overview
This PR updates Copilot’s OpenTelemetry (OTel) root invoke_agent span for the foreground agent so that it includes aggregated reasoning/thinking token totals, matching what is already emitted on per-turn chat spans. This closes a telemetry parity gap for App Insights dashboards that query only the parent/root span.
Changes:
- Accumulates response.usage.completion_tokens_details.reasoning_tokens across all turns in toolCallingLoop.ts.
- Conditionally dual-emits aggregated reasoning totals on the root span under both gen_ai.usage.reasoning_tokens (legacy) and gen_ai.usage.reasoning.output_tokens (semconv-aligned) when non-zero.
- Documents why the equivalent Claude root span cannot currently emit reasoning tokens in claudeOTelTracker.ts.
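The conditional dual-emit in the list above might look something like the sketch below. The two attribute names are the ones quoted in this PR; SpanLike, RecordingSpan, and emitReasoningTokens are hypothetical stand-ins for illustration and are not the actual span type or helper used in toolCallingLoop.ts.

```typescript
// Minimal stand-in for a span. A real OTel span exposes more members;
// only setAttribute is needed for this sketch.
interface SpanLike {
  setAttribute(key: string, value: number): void;
}

// Attribute names as quoted in the PR.
const LEGACY_KEY = 'gen_ai.usage.reasoning_tokens';
const SEMCONV_KEY = 'gen_ai.usage.reasoning.output_tokens';

// Hypothetical helper: write the aggregated total under both keys,
// and write nothing when the total is zero.
function emitReasoningTokens(span: SpanLike, total: number): void {
  if (total <= 0) {
    return;
  }
  span.setAttribute(LEGACY_KEY, total);
  span.setAttribute(SEMCONV_KEY, total);
}

// In-memory recorder used only to demonstrate the behaviour.
class RecordingSpan implements SpanLike {
  readonly attributes = new Map<string, number>();
  setAttribute(key: string, value: number): void {
    this.attributes.set(key, value);
  }
}

const withReasoning = new RecordingSpan();
emitReasoningTokens(withReasoning, 25);

const withoutReasoning = new RecordingSpan();
emitReasoningTokens(withoutReasoning, 0);

console.log(withReasoning.attributes.size, withoutReasoning.attributes.size); // 2 0
```

Skipping the attributes at zero means a query can distinguish "no reasoning reported" (attribute absent) from a positive count, at the cost of having to treat a missing attribute as zero when summing.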
Show a summary per file
| File | Description |
|---|---|
| extensions/copilot/src/extension/intents/node/toolCallingLoop.ts | Aggregates and emits reasoning token usage on the foreground agent’s invoke_agent root span. |
| extensions/copilot/src/extension/chatSessions/claude/node/claudeOTelTracker.ts | Adds inline documentation explaining the Claude reasoning-token aggregation gap and pointing to the foreground pattern. |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 1
Yoyokrazy
approved these changes
Jun 1, 2026
chatMLFetcher already sets gen_ai.usage.reasoning.output_tokens on every chat span, but the invoke_agent root span in toolCallingLoop didn't aggregate it. App Insights queries reading parent-only totals were missing reasoning.

… response.usage.completion_tokens_details.reasoning_tokens across turns.

… output_tokens, not an addition.

vscode-claude has the same gap; Anthropic SDK Usage doesn't expose thinking tokens, so it needs a bridge from ClaudeStreamingPassThroughEndpoint. Inline TODO left.