
Conversation

@VedantMadane commented Jan 19, 2026

This is a follow-up to #4218 (auto-closed by bot), addressing the same race in LLM callback handling without holding a global lock across the network call.

What changed

  • Stop mutating LiteLLM's global callback lists for per-request callbacks.
  • Pass callbacks via the request params ("callbacks") and continue to invoke token-usage callbacks from CrewAI response handlers (sketched below).
  • Make test_llm_callback_replacement deterministic by mocking litellm.completion, removing the sleep-based heisenbug (see the test sketch after the "Why" section).
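A minimal sketch of the per-request approach, assuming a simplified `LLM.call`; the class shape and the `effective_callbacks` name follow this PR's description, and the exact CrewAI signatures may differ:

```python
import litellm

class LLM:
    def __init__(self, model: str, callbacks: list | None = None):
        self.model = model
        self.callbacks = callbacks or []

    def call(self, messages: list[dict], callbacks: list | None = None) -> str:
        # Request-scoped callbacks win; otherwise fall back to the instance list.
        # Global state (litellm.callbacks, litellm.success_callback) is never touched.
        effective_callbacks = callbacks if callbacks is not None else self.callbacks
        params = {
            "model": self.model,
            "messages": messages,
            "callbacks": effective_callbacks,  # scoped to this request only
        }
        response = litellm.completion(**params)
        return response["choices"][0]["message"]["content"]
```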

Why

The approach in #4218 used a class-level lock held across the entire LLM request, which can serialize all concurrent agent calls. This PR keeps concurrency while still ensuring callback isolation.
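For contrast, a rough sketch of the rejected pattern (illustrative only; the actual #4218 code may differ). Because the lock stays held while the network call is in flight, every concurrent agent call queues behind it:

```python
import threading
import litellm

_callback_lock = threading.Lock()  # class-level lock from the rejected approach

def call_locked(model, messages, callbacks):
    with _callback_lock:
        # Globals are mutated safely, but the lock is held across network I/O,
        # so all concurrent requests serialize on this critical section.
        litellm.success_callback = callbacks
        return litellm.completion(model=model, messages=messages)
```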

Fixes #4214.
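A hedged sketch of what the deterministic test can look like with litellm.completion mocked out (the real test in the repo may assert more; `LLM` here is the sketch class above, and patching "litellm.completion" assumes the call resolves the module attribute at call time, as that sketch does):

```python
from unittest.mock import patch

def test_llm_callback_replacement():
    recorded = []

    def fake_completion(**kwargs):
        # No network, no sleeps: just record the callbacks each request carried.
        recorded.append(kwargs.get("callbacks"))
        return {"choices": [{"message": {"content": "ok"}}]}

    cb_a, cb_b = object(), object()
    llm = LLM(model="gpt-4o-mini")

    with patch("litellm.completion", side_effect=fake_completion):
        llm.call([{"role": "user", "content": "hi"}], callbacks=[cb_a])
        llm.call([{"role": "user", "content": "hi"}], callbacks=[cb_b])

    # Each call saw exactly the callbacks passed to it -- no leakage between calls.
    assert recorded == [[cb_a], [cb_b]]
```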


Note

Ensures LLM callback isolation during concurrent calls by avoiding mutation of LiteLLM global callback lists.

  • Passes callbacks per request via params["callbacks"] in LLM.call/acall and streaming/non-streaming handlers; removes ad-hoc set_callbacks usage during calls
  • Uses effective_callbacks (request callbacks or instance-level) without touching global state; forwards them to streaming/non-streaming paths
  • Keeps env-driven global callbacks via set_env_callbacks() unchanged
  • Adds tests/llms/test_concurrency.py validating thread-safe isolation; updates test_llm_callback_replacement to mock litellm.completion for determinism
  • Minor: normalizes message handling for o1 models; preserves token usage logging via callbacks/usage info

Written by Cursor Bugbot for commit 31fdc55.

@VedantMadane (Author) commented:

Not covered in this PR description:

  1. Lock-scoping alternative (save the previous global callbacks, set new ones, perform the request, then restore) and why we avoided it.
  2. Context-local callback isolation using contextvars or thread-local dispatch (sketched just after this list).
  3. A true concurrency regression test (multi-thread or async) that proves no cross-contamination under parallel calls (a test sketch closes this comment).
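A minimal sketch of option 2, assuming the same simplified call shape as above; the variable name and wrapper function are hypothetical:

```python
import contextvars
import litellm

# Each thread/async task sees its own value; nothing leaks across contexts.
_request_callbacks: contextvars.ContextVar[tuple] = contextvars.ContextVar(
    "request_callbacks", default=()
)

def call_with_callbacks(model, messages, callbacks):
    token = _request_callbacks.set(tuple(callbacks))
    try:
        return litellm.completion(
            model=model,
            messages=messages,
            callbacks=list(_request_callbacks.get()),
        )
    finally:
        _request_callbacks.reset(token)  # restore the previous context value
```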

If you prefer, I can add a follow-up commit that documents these options or adds a concurrency-focused test.
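A rough shape for that test, again written against the sketch `LLM` class above (names hypothetical):

```python
import threading
from unittest.mock import patch

def test_no_callback_cross_contamination():
    recorded = []
    record_lock = threading.Lock()

    def fake_completion(**kwargs):
        with record_lock:
            recorded.append(kwargs.get("callbacks"))
        return {"choices": [{"message": {"content": "ok"}}]}

    markers = [object() for _ in range(8)]

    def worker(marker):
        LLM(model="gpt-4o-mini").call(
            [{"role": "user", "content": "hi"}], callbacks=[marker]
        )

    with patch("litellm.completion", side_effect=fake_completion):
        threads = [threading.Thread(target=worker, args=(m,)) for m in markers]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    # Every request carried exactly its own marker, never another thread's.
    assert sorted(id(cbs[0]) for cbs in recorded) == sorted(id(m) for m in markers)
```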



Development

Successfully merging this pull request may close these issues.

[BUG] Hidden race condition in LLM callback system causing test failures
