Skip to content

fix(llm): make set_callbacks thread-safe for concurrent crews#6163

Open
hp-8 wants to merge 1 commit into
crewAIInc:mainfrom
hp-8:fix/set-callbacks-thread-safety
Open

fix(llm): make set_callbacks thread-safe for concurrent crews#6163
hp-8 wants to merge 1 commit into
crewAIInc:mainfrom
hp-8:fix/set-callbacks-thread-safety

Conversation

@hp-8

@hp-8 hp-8 commented Jun 14, 2026

Copy link
Copy Markdown

What

Make LLM.set_callbacks (and set_env_callbacks) thread-safe so concurrent crews don't crash during setup.

Why

When several crews are launched with kickoff_async() and asyncio.gather(), each ends up in asyncio.to_thread(), so kickoff() runs on real OS threads concurrently. During LLM setup they all mutate litellm's process-global success_callback / _async_success_callback lists.

The old code did a read-modify-write with list.remove():

for callback in litellm.success_callback[:]:
    if type(callback) in callback_types:
        litellm.success_callback.remove(callback)

Two threads can snapshot the same entry, both call .remove(), and the second raises:

ValueError: list.remove(x): x not in list

Reported in #2939.

Fix

  • Add a module-level _CALLBACK_LOCK and guard the critical sections of both set_callbacks and set_env_callbacks.
  • Replace the .remove() loop with in-place slice assignment that filters by type. It mutates the same global list object (litellm keeps a reference), is atomic under the GIL, and can never raise ValueError even if another thread already removed the entry.

Behaviour is unchanged for the single-threaded path: the same callbacks are removed.

Test

Added test_set_callbacks_is_thread_safe, which hammers set_callbacks from 24 threads with a lowered sys.setswitchinterval to force interleaving. It fails on main (raises the ValueError) and passes with this fix.

uv run --package crewai pytest lib/crewai/tests/test_llm.py::test_set_callbacks_is_thread_safe

Ruff + strict mypy pass on the changed files.

Fixes #2939


Disclosure: this PR was prepared with the help of an AI coding assistant (Claude Code), so it carries the llm-generated label as required by CONTRIBUTING.md. (If I lack permission to set the label, please add it.)

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced thread safety for LLM callback management to prevent race conditions when callbacks are updated concurrently across multiple threads.

Concurrent crews launched via kickoff_async() dispatch to threads that all
mutate litellm's process-global success_callback lists during setup. The
read-modify-write used list.remove(), which races and raises
"ValueError: list.remove(x): x not in list" when two threads remove the
same entry.

Guard both set_callbacks and set_env_callbacks with a module-level lock and
replace the .remove() loop with in-place slice assignment, which is atomic
under the GIL and cannot raise. Add a regression test that reproduces the
race under a lowered thread-switch interval.

Fixes the crash reported in discussion crewAIInc#2939.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Harsh Patadia <patadiaharsh.8@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c1a28fc3-8c99-4dd4-9bfe-8047beee6f4a

📥 Commits

Reviewing files that changed from the base of the PR and between bb477f8 and be22811.

📒 Files selected for processing (2)
  • lib/crewai/src/crewai/llm.py
  • lib/crewai/tests/test_llm.py

📝 Walkthrough

Walkthrough

A module-level threading.Lock (_CALLBACK_LOCK) is introduced in llm.py to protect mutations of the process-global LiteLLM callback lists. LLM.set_callbacks() is rewritten to use lock-protected slice assignment with type-based filtering instead of per-element .remove() calls, and LLM.set_env_callbacks() wraps its assignment block in the same lock. A new regression test exercises both methods under high thread contention.

Changes

Thread-safe LiteLLM callback management

Layer / File(s) Summary
Lock definition and thread-safe callback updates
lib/crewai/src/crewai/llm.py
Adds threading import and module-level _CALLBACK_LOCK. Rewrites set_callbacks() to acquire the lock and update litellm.success_callback / litellm._async_success_callback via slice assignment with type-based filtering, replacing the prior .remove() loop. Wraps set_env_callbacks() callback assignment in the same lock.
Concurrency regression test
lib/crewai/tests/test_llm.py
Adds sys and threading imports. Adds test_set_callbacks_is_thread_safe, which spawns multiple threads coordinated by a threading.Barrier with a reduced sys.setswitchinterval to force frequent context switches, calling LLM.set_callbacks repeatedly and asserting no exception is raised.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 A race in the callbacks, oh what a fright,
Threads tugging at lists in the dead of the night!
Now a lock stands guard with a click and a clack,
Slice-assignment replaces the old .remove() hack.
No ValueError shall wake me from my burrow — hooray! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: making set_callbacks thread-safe for concurrent crews, which is the core objective of this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant