fix: ensure call_llm spans are always ended in multi-agent scenarios #4717
OiPunk wants to merge 1 commit into google:main
Conversation
Replace `tracer.start_as_current_span('call_llm')` context manager with
explicit span lifecycle management in `_call_llm_with_tracing()`.
In multi-agent setups using `transfer_to_agent`, the async generator
receives `GeneratorExit` after the sub-agent completes execution. At
that point, the OTel context manager's `finally` block calls
`context.detach(token)` which raises `ValueError` because the
contextvars token became stale during the async context switch. This
exception prevents `span.end()` from ever being reached, so the span
is never exported to trace backends.
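The stale-token failure described above can be reproduced with the standard library alone. In this minimal sketch, `current_span` is a hypothetical stand-in for OTel's internal context variable; resetting a token from a different `Context` (which is what happens when the async generator resumes after the context switch) raises the same `ValueError`:

```python
import contextvars

# Stand-in for OTel's internal context variable (hypothetical name).
current_span = contextvars.ContextVar("current_span", default=None)

def attach_in_other_context():
    # Simulate the async generator's frame running in a different Context:
    # the token is created inside a copied Context, so resetting it from
    # the original Context is invalid.
    return current_span.set("parent-span")

token = contextvars.copy_context().run(attach_in_other_context)

try:
    current_span.reset(token)   # analogous to OTel's detach() in its finally block
except ValueError as e:
    print("detach failed:", e)  # "... was created in a different Context!"
```

In the real code path this `ValueError` escapes before `span.end()` runs, which is why the span never reaches the exporter.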
The fix uses `tracer.start_span()` plus manual `context.attach()`/
`context.detach()` inside a `try/finally` whose cleanup wraps `detach()`
in a `try/except ValueError` and then always calls `span.end()`.
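The shape of that fix can be sketched with stdlib stand-ins (here `Span`, `spans`, and `call_llm_with_tracing` are hypothetical names for illustration, not the actual ADK code):

```python
import contextvars

# Stand-in for OTel's context variable and span bookkeeping (hypothetical).
_current_span = contextvars.ContextVar("current_span", default=None)
spans = []  # records created spans so the outcome can be inspected

class Span:
    def __init__(self, name):
        self.name, self.ended = name, False
    def end(self):
        self.ended = True

def call_llm_with_tracing():
    span = Span("call_llm")             # stands in for tracer.start_span('call_llm')
    spans.append(span)
    token = _current_span.set(span)     # stands in for context.attach(...)
    try:
        yield "llm response"
    finally:
        try:
            _current_span.reset(token)  # stands in for context.detach(token)
        except ValueError:
            pass                        # token went stale across the async switch
        span.end()                      # now always reached, even on GeneratorExit
```

Closing the generator early (as `transfer_to_agent` does) now still ends the span: `GeneratorExit` triggers the `finally` block, the stale-token `ValueError` is swallowed, and `span.end()` runs.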
Fixes google#4715
Code Review
This pull request addresses a bug where OpenTelemetry spans for call_llm were not being exported in multi-agent scenarios involving transfer_to_agent. The root cause was correctly identified as an unhandled ValueError during context detachment in an async generator, which prevented the span from being properly ended. The fix replaces the tracer.start_as_current_span context manager with explicit span lifecycle management, wrapping the context detachment in a try/except block to ensure span.end() is always called. The changes are correct and effectively resolve the issue. The corresponding test updates are also appropriate to ensure the fix is validated.
Summary
Fixes #4715
In multi-agent setups using `transfer_to_agent`, the `call_llm` tracing spans for parent agents are created but never exported to OpenTelemetry backends. This happens because `_call_llm_with_tracing()` in `base_llm_flow.py` uses `tracer.start_as_current_span('call_llm')` as a context manager around an async generator that yields responses. When the LLM returns `transfer_to_agent`, the sub-agent runs (potentially for 10+ seconds), then the async generator is closed, raising `GeneratorExit`. Inside the OTel context manager's `finally` block, `context.detach(token)` raises `ValueError` (the contextvars token is stale after the async context switch), which prevents `span.end()` from ever being called. Spans that are never ended are never exported.

This is the same root cause as #501 and #1670 (previously fixed in `base_agent.py`), but `base_llm_flow.py` was not updated with the same fix pattern.

Changes
- `src/google/adk/flows/llm_flows/base_llm_flow.py`: Replace the `tracer.start_as_current_span('call_llm')` context manager with explicit span lifecycle management (`tracer.start_span()` plus manual `context.attach()`/`context.detach()`) wrapped in a `try/finally` that catches the `ValueError` from `detach()` and always calls `span.end()`.
- `tests/unittests/telemetry/test_functional.py`: Update the `span_exporter` test fixture to also monkeypatch `start_span` (in addition to the existing `start_as_current_span`), so that the `call_llm` spans created by the new code path are captured by the in-memory span exporter.

Test Plan
- Telemetry tests (`tests/unittests/telemetry/`)
- LLM flow tests (`tests/unittests/flows/llm_flows/`)
- The `test_tracer_start_as_current_span` functional test validates that `call_llm` spans are correctly exported
- `test_exception_preserves_attributes` confirms span attributes are preserved on errors
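The fixture change matters because spans now enter through two different tracer methods. A minimal sketch of the idea, using hypothetical `Tracer`, `InMemoryExporter`, and `patch_tracer` stand-ins rather than the real OTel SDK or the actual ADK fixture:

```python
import contextlib

class InMemoryExporter:
    """Hypothetical stand-in for the in-memory span exporter."""
    def __init__(self):
        self.spans = []

class Tracer:
    """Hypothetical tracer with both span entry points."""
    def start_span(self, name):              # used by the new explicit code path
        return name
    @contextlib.contextmanager
    def start_as_current_span(self, name):   # used by the original code path
        yield name

def patch_tracer(tracer, exporter):
    """Monkeypatch BOTH entry points so every span reaches the exporter."""
    real_start = tracer.start_span
    real_current = tracer.start_as_current_span

    def start_span(name):
        span = real_start(name)
        exporter.spans.append(span)
        return span

    @contextlib.contextmanager
    def start_as_current_span(name):
        with real_current(name) as span:
            exporter.spans.append(span)
            yield span

    tracer.start_span = start_span
    tracer.start_as_current_span = start_as_current_span
```

Patching only `start_as_current_span` (as the fixture did before this PR) would silently drop every span created by the new `start_span()` code path, which is exactly why the fixture had to be extended alongside the fix.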