
Conversation

@VedantMadane commented Jan 19, 2026

This is a follow-up to #4218 (auto-closed by bot), addressing the same race in LLM callback handling without holding a global lock across the network call.

What changed

  • Stop mutating LiteLLM's global callback lists for per-request callbacks.
  • Pass callbacks via the request params ("callbacks") and continue to invoke token-usage callbacks from CrewAI response handlers (sketched below).
  • Make test_llm_callback_replacement deterministic by mocking litellm.completion, removing the sleep-based heisenbug (see the test sketch after the "Why" section).
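A minimal sketch of the per-request approach, assuming a simplified `LLM.call`; the class shape and the `effective_callbacks` name follow this PR's description, and the exact CrewAI signatures may differ:

```python
import litellm

class LLM:
    def __init__(self, model: str, callbacks: list | None = None):
        self.model = model
        self.callbacks = callbacks or []

    def call(self, messages: list[dict], callbacks: list | None = None) -> str:
        # Request-scoped callbacks win; otherwise fall back to the instance list.
        # Global state (litellm.callbacks, litellm.success_callback) is never touched.
        effective_callbacks = callbacks if callbacks is not None else self.callbacks
        params = {
            "model": self.model,
            "messages": messages,
            "callbacks": effective_callbacks,  # scoped to this request only
        }
        response = litellm.completion(**params)
        return response["choices"][0]["message"]["content"]
```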

Why

The approach in #4218 used a class-level lock held across the entire LLM request, which can serialize all concurrent agent calls. This PR keeps concurrency while still ensuring callback isolation.
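For contrast, a rough sketch of the rejected pattern (illustrative only; the actual #4218 code may differ). Because the lock stays held while the network call is in flight, every concurrent agent call queues behind it:

```python
import threading
import litellm

_callback_lock = threading.Lock()  # class-level lock from the rejected approach

def call_locked(model, messages, callbacks):
    with _callback_lock:
        # Globals are mutated safely, but the lock is held across network I/O,
        # so all concurrent requests serialize on this critical section.
        litellm.success_callback = callbacks
        return litellm.completion(model=model, messages=messages)
```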

Fixes #4214.
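A hedged sketch of what the deterministic test can look like with litellm.completion mocked out (the real test in the repo may assert more; `LLM` here is the sketch class above, and patching "litellm.completion" assumes the call resolves the module attribute at call time, as that sketch does):

```python
from unittest.mock import patch

def test_llm_callback_replacement():
    recorded = []

    def fake_completion(**kwargs):
        # No network, no sleeps: just record the callbacks each request carried.
        recorded.append(kwargs.get("callbacks"))
        return {"choices": [{"message": {"content": "ok"}}]}

    cb_a, cb_b = object(), object()
    llm = LLM(model="gpt-4o-mini")

    with patch("litellm.completion", side_effect=fake_completion):
        llm.call([{"role": "user", "content": "hi"}], callbacks=[cb_a])
        llm.call([{"role": "user", "content": "hi"}], callbacks=[cb_b])

    # Each call saw exactly the callbacks passed to it -- no leakage between calls.
    assert recorded == [[cb_a], [cb_b]]
```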


Note

Ensures LLM callback isolation during concurrent calls by avoiding mutation of LiteLLM global callback lists.

  • Passes callbacks per request via params["callbacks"] in LLM.call/acall and streaming/non-streaming handlers; removes ad-hoc set_callbacks usage during calls
  • Uses effective_callbacks (request callbacks or instance-level) without touching global state; forwards them to streaming/non-streaming paths
  • Keeps env-driven global callbacks via set_env_callbacks() unchanged
  • Adds tests/llms/test_concurrency.py validating thread-safe isolation; updates test_llm_callback_replacement to mock litellm.completion for determinism
  • Minor: normalizes message handling for o1 models; preserves token usage logging via callbacks/usage info

Written by Cursor Bugbot for commit 31fdc55.

@VedantMadane (Author) commented:

Not covered in this PR description:

  1. Lock-scoping alternative (save the previous global callbacks, set new ones, perform the request, then restore) and why we avoided it.
  2. Context-local callback isolation using contextvars or thread-local dispatch (sketched just after this list).
  3. A true concurrency regression test (multi-thread or async) that proves no cross-contamination under parallel calls (a test sketch closes this comment).
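A minimal sketch of option 2, assuming the same simplified call shape as above; the variable name and wrapper function are hypothetical:

```python
import contextvars
import litellm

# Each thread/async task sees its own value; nothing leaks across contexts.
_request_callbacks: contextvars.ContextVar[tuple] = contextvars.ContextVar(
    "request_callbacks", default=()
)

def call_with_callbacks(model, messages, callbacks):
    token = _request_callbacks.set(tuple(callbacks))
    try:
        return litellm.completion(
            model=model,
            messages=messages,
            callbacks=list(_request_callbacks.get()),
        )
    finally:
        _request_callbacks.reset(token)  # restore the previous context value
```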

If you prefer, I can add a follow-up commit that documents these options or adds a concurrency-focused test.
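A rough shape for that test, again written against the sketch `LLM` class above (names hypothetical):

```python
import threading
from unittest.mock import patch

def test_no_callback_cross_contamination():
    recorded = []
    record_lock = threading.Lock()

    def fake_completion(**kwargs):
        with record_lock:
            recorded.append(kwargs.get("callbacks"))
        return {"choices": [{"message": {"content": "ok"}}]}

    markers = [object() for _ in range(8)]

    def worker(marker):
        LLM(model="gpt-4o-mini").call(
            [{"role": "user", "content": "hi"}], callbacks=[marker]
        )

    with patch("litellm.completion", side_effect=fake_completion):
        threads = [threading.Thread(target=worker, args=(m,)) for m in markers]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    # Every request carried exactly its own marker, never another thread's.
    assert sorted(id(cbs[0]) for cbs in recorded) == sorted(id(m) for m in markers)
```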



Development

Successfully merging this pull request may close these issues.

[BUG] Hidden race condition in LLM callback system causing test failures
