
fix: ensure call_llm spans are always ended in multi-agent scenarios#4717

Open
OiPunk wants to merge 1 commit into google:main from OiPunk:codex/adk-4715-fix-call-llm-span-loss

Conversation


@OiPunk OiPunk commented Mar 5, 2026

Summary

Fixes #4715

In multi-agent setups using transfer_to_agent, the call_llm tracing spans for parent agents are created but never exported to OpenTelemetry backends. This happens because _call_llm_with_tracing() in base_llm_flow.py uses tracer.start_as_current_span('call_llm') as a context manager around an async generator that yields responses. When the LLM returns transfer_to_agent, the sub-agent runs (potentially for 10+ seconds), then the async generator is closed, raising GeneratorExit. Inside the OTel context manager's finally block, context.detach(token) raises ValueError (the contextvars token is stale after the async context switch), which prevents span.end() from ever being called. Spans that are never ended are never exported.

This is the same root cause as #501 and #1670 (previously fixed in base_agent.py), but base_llm_flow.py was not updated with the same fix pattern.
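The failure mechanism can be reproduced with plain `contextvars`, independent of ADK or the OpenTelemetry SDK: a token created in one context cannot be reset from a different context, which is exactly what OTel's `context.detach()` runs into after the async switch. This is a minimal sketch; the variable and function names are illustrative, not from the ADK source.

```python
import asyncio
import contextvars

# Stand-in for OTel's current-context variable; the name is illustrative.
current_span = contextvars.ContextVar("current_span", default=None)

async def main() -> str:
  # The token is created in this task's context...
  token = current_span.set("call_llm")

  async def detach_elsewhere(tok) -> str:
    # ...but asyncio.create_task runs the child in a *copy* of the
    # context, so resetting the token there raises ValueError — the
    # same error OTel's context.detach() hits with a stale token.
    try:
      current_span.reset(tok)
      return "detached"
    except ValueError as exc:
      return f"ValueError: {exc}"

  return await asyncio.create_task(detach_elsewhere(token))

print(asyncio.run(main()))
# → "ValueError: <Token ...> was created in a different Context"
```

In the real bug the `ValueError` escapes inside the context manager's `finally` block, so the code after it — including `span.end()` — never runs.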

Changes

  • src/google/adk/flows/llm_flows/base_llm_flow.py: Replace tracer.start_as_current_span('call_llm') context manager with explicit span lifecycle management (tracer.start_span() + manual context.attach()/context.detach()) wrapped in a try/finally that catches the ValueError from detach() and always calls span.end().

  • tests/unittests/telemetry/test_functional.py: Update the span_exporter test fixture to also monkeypatch start_span (in addition to the existing start_as_current_span), so that the call_llm spans created by the new code path are properly captured by the in-memory span exporter.
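The second bullet matters because a recorder that only intercepts `start_as_current_span` never sees spans created via `start_span`. A minimal illustration with a fake tracer (the `FakeTracer`/`patch_tracer` names are made up for this sketch, not taken from the ADK test suite):

```python
from contextlib import contextmanager

class FakeSpan:
  def __init__(self, name):
    self.name = name
  def end(self):
    pass

class FakeTracer:
  def start_span(self, name):
    return FakeSpan(name)

  @contextmanager
  def start_as_current_span(self, name):
    span = FakeSpan(name)
    try:
      yield span
    finally:
      span.end()

def patch_tracer(tracer, recorded):
  # Mirrors the fixture change: wrap *both* span-creation entry points,
  # otherwise spans from the new start_span() code path go unrecorded.
  orig_start = tracer.start_span
  orig_current = tracer.start_as_current_span

  def start_span(name):
    recorded.append(name)
    return orig_start(name)

  @contextmanager
  def start_as_current_span(name):
    recorded.append(name)
    with orig_current(name) as span:
      yield span

  tracer.start_span = start_span
  tracer.start_as_current_span = start_as_current_span

recorded = []
tracer = FakeTracer()
patch_tracer(tracer, recorded)
tracer.start_span("call_llm").end()
with tracer.start_as_current_span("invoke_agent"):
  pass
print(recorded)  # ['call_llm', 'invoke_agent']
```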

Test Plan

  • All 56 telemetry tests pass (tests/unittests/telemetry/)
  • All 357 llm_flows tests pass (tests/unittests/flows/llm_flows/)
  • test_tracer_start_as_current_span functional test validates that call_llm spans are correctly exported
  • test_exception_preserves_attributes confirms span attributes are preserved on errors

Replace `tracer.start_as_current_span('call_llm')` context manager with
explicit span lifecycle management in `_call_llm_with_tracing()`.

In multi-agent setups using `transfer_to_agent`, the async generator
receives `GeneratorExit` after the sub-agent completes execution. At
that point, the OTel context manager's `finally` block calls
`context.detach(token)` which raises `ValueError` because the
contextvars token became stale during the async context switch. This
exception prevents `span.end()` from ever being reached, so the span
is never exported to trace backends.

The fix uses `tracer.start_span()` + manual `context.attach()`/
`context.detach()` with a `try/finally` that catches the `ValueError`
from `detach()` and always calls `span.end()`.

Fixes google#4715
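The shape of the fix can be sketched with plain `contextvars` standing in for the OTel context API. The toy `Span` class and all names below are illustrative, not the actual ADK code:

```python
import contextvars

current_span = contextvars.ContextVar("current_span", default=None)

class Span:
  """Toy span; the real code uses an OTel SDK span."""
  def __init__(self, name):
    self.name = name
    self.ended = False
  def end(self):
    self.ended = True

def start_call_llm():
  # tracer.start_span() + manual attach, instead of start_as_current_span().
  span = Span("call_llm")
  token = current_span.set(span)
  return span, token

def finish_call_llm(span, token):
  try:
    current_span.reset(token)  # may raise if the token went stale
  except ValueError:
    pass  # stale token after an async context switch: ignore
  finally:
    span.end()  # always reached, so the span is always exported

# Simulate the multi-agent failure: finish in a *copy* of the context,
# where the token is stale and reset() raises ValueError.
span, token = start_call_llm()
contextvars.copy_context().run(finish_call_llm, span, token)
print(span.ended)  # True — the span ends despite the stale token
```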
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in OpenTelemetry tracing for multi-agent setups, specifically ensuring that call_llm spans are reliably ended and exported. The change transitions from context manager-based span handling to explicit lifecycle management, preventing tracing data loss caused by asynchronous context switches and generator exits. This significantly improves the observability of multi-agent interactions by providing complete tracing information.

Highlights

  • OpenTelemetry Span Export Fix: Resolved an issue where call_llm tracing spans were not being exported in multi-agent scenarios due to ValueError during context.detach() when an async generator received GeneratorExit.
  • Explicit Span Management: Replaced the tracer.start_as_current_span context manager with explicit tracer.start_span, otel_context.attach, otel_context.detach, and span.end() calls, wrapped in a try/finally block to guarantee span closure.
  • Test Fixture Update: Updated the span_exporter test fixture to monkeypatch tracer.start_span in addition to tracer.start_as_current_span to correctly capture spans in telemetry tests.


Changelog
  • src/google/adk/flows/llm_flows/base_llm_flow.py
    • Imported opentelemetry.context and opentelemetry.trace modules.
    • Refactored _call_llm_with_tracing to use explicit OpenTelemetry span management instead of a context manager.
    • Implemented a try/finally block to ensure span.end() is always called, even if otel_context.detach() raises a ValueError.
  • tests/unittests/telemetry/test_functional.py
    • Modified the span_exporter test fixture to monkeypatch tracer.start_span for proper span capture.

@adk-bot added the `tracing` label ([Component] This issue is related to OpenTelemetry tracing) on Mar 5, 2026

@gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug where OpenTelemetry spans for call_llm were not being exported in multi-agent scenarios involving transfer_to_agent. The root cause was correctly identified as an unhandled ValueError during context detachment in an async generator, which prevented the span from being properly ended. The fix replaces the tracer.start_as_current_span context manager with explicit span lifecycle management, wrapping the context detachment in a try/except block to ensure span.end() is always called. The changes are correct and effectively resolve the issue. The corresponding test updates are also appropriate to ensure the fix is validated.



Development

Successfully merging this pull request may close these issues.

[Bug] call_llm spans not exported in multi-agent setups due to GeneratorExit breaking span.end() in _call_llm_async

2 participants