fix(reasoning): detect short-form "READY" when concluding a plan#6206
fix(reasoning): detect short-form "READY" when concluding a plan#6206i-anubhav-anand wants to merge 1 commit into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughReplaces hardcoded substring checks for agent readiness with a new ChangesReadiness Detection Refactor
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/crewai/src/crewai/utilities/reasoning_handler.py`:
- Around line 596-602: The full-sentence READY match check currently scans the
entire response string, allowing the text "READY: I am ready to execute the
task." appearing anywhere in the response body or examples to immediately return
True and bypass the conclusion line evaluation. Move the full-sentence READY
match check to only apply to the conclusion variable (the final stripped line
being evaluated in the reversed loop), not to the entire response, so that
intermediate text in the response does not override the actual final conclusion
determination. This ensures that if the final conclusion line is "NOT READY",
the function will correctly return False instead of being short-circuited by an
earlier mention of the full sentence elsewhere in the response.
In `@lib/crewai/tests/utilities/test_structured_planning.py`:
- Around line 732-735: The test suite for AgentReasoning._check_ready is missing
a regression test for the case where a full READY sentence appears in the body
or middle of the response, but the final conclusion should still be NOT READY.
Add a new test method (similar to test_full_sentence_not_ready) that creates a
response containing the full READY sentence in the body followed by a NOT READY
conclusion, then verify that AgentReasoning._check_ready correctly returns False
for this case.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: af060d2d-276f-44d6-aff9-9e5ffae393af
📒 Files selected for processing (2)
lib/crewai/src/crewai/utilities/reasoning_handler.pylib/crewai/tests/utilities/test_structured_planning.py
| if "READY: I am ready to execute the task." in response: | ||
| return True | ||
| for line in reversed(response.splitlines()): | ||
| conclusion = line.strip().strip("*`#_-.!:> ").upper() | ||
| if not conclusion: | ||
| continue | ||
| return conclusion == "READY" |
There was a problem hiding this comment.
Restrict the full-sentence READY match to the conclusion line.
Line 596 scans the entire response, so a body/example line containing READY: I am ready to execute the task. overrides a final NOT READY conclusion and stops refinement too early.
🐛 Proposed fix
- if "READY: I am ready to execute the task." in response:
- return True
for line in reversed(response.splitlines()):
conclusion = line.strip().strip("*`#_-.!:> ").upper()
if not conclusion:
continue
- return conclusion == "READY"
+ return conclusion in {
+ "READY",
+ "READY: I AM READY TO EXECUTE THE TASK",
+ }
return False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/crewai/src/crewai/utilities/reasoning_handler.py` around lines 596 - 602,
The full-sentence READY match check currently scans the entire response string,
allowing the text "READY: I am ready to execute the task." appearing anywhere in
the response body or examples to immediately return True and bypass the
conclusion line evaluation. Move the full-sentence READY match check to only
apply to the conclusion variable (the final stripped line being evaluated in the
reversed loop), not to the entire response, so that intermediate text in the
response does not override the actual final conclusion determination. This
ensures that if the final conclusion line is "NOT READY", the function will
correctly return False instead of being short-circuited by an earlier mention of
the full sentence elsewhere in the response.
| def test_full_sentence_not_ready(self): | ||
| """Full-sentence NOT READY is correctly treated as not ready.""" | ||
| response = "My plan.\n\nNOT READY: I need to refine because the tools are unclear." | ||
| assert AgentReasoning._check_ready(response) is False |
There was a problem hiding this comment.
Add a regression for full-sentence READY in the body.
This suite misses the case that currently fails: a body/example line contains the full READY sentence, but the final conclusion is NOT READY.
🧪 Proposed test
def test_full_sentence_not_ready(self):
"""Full-sentence NOT READY is correctly treated as not ready."""
response = "My plan.\n\nNOT READY: I need to refine because the tools are unclear."
assert AgentReasoning._check_ready(response) is False
+
+ def test_full_sentence_ready_in_body_with_not_ready_conclusion(self):
+ """Full-sentence READY in the body must not override final NOT READY."""
+ response = (
+ "The prompt says to answer READY: I am ready to execute the task.\n\n"
+ "NOT READY"
+ )
+ assert AgentReasoning._check_ready(response) is False📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_full_sentence_not_ready(self): | |
| """Full-sentence NOT READY is correctly treated as not ready.""" | |
| response = "My plan.\n\nNOT READY: I need to refine because the tools are unclear." | |
| assert AgentReasoning._check_ready(response) is False | |
| def test_full_sentence_not_ready(self): | |
| """Full-sentence NOT READY is correctly treated as not ready.""" | |
| response = "My plan.\n\nNOT READY: I need to refine because the tools are unclear." | |
| assert AgentReasoning._check_ready(response) is False | |
| def test_full_sentence_ready_in_body_with_not_ready_conclusion(self): | |
| """Full-sentence READY in the body must not override final NOT READY.""" | |
| response = ( | |
| "The prompt says to answer READY: I am ready to execute the task.\n\n" | |
| "NOT READY" | |
| ) | |
| assert AgentReasoning._check_ready(response) is False |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/crewai/tests/utilities/test_structured_planning.py` around lines 732 -
735, The test suite for AgentReasoning._check_ready is missing a regression test
for the case where a full READY sentence appears in the body or middle of the
response, but the final conclusion should still be NOT READY. Add a new test
method (similar to test_full_sentence_not_ready) that creates a response
containing the full READY sentence in the body followed by a NOT READY
conclusion, then verify that AgentReasoning._check_ready correctly returns False
for this case.
The refine-plan and planning prompts instruct the model to conclude with a bare "READY" or "NOT READY", but the readiness parser only recognised the full sentence "READY: I am ready to execute the task." (emitted by the initial-plan prompt). A model replying with the short form was therefore read as "not ready", so the agent kept re-planning — indefinitely when max_attempts is None (crewAIInc#6204). Add an AgentReasoning._check_ready helper that accepts both the full-sentence form and the short "READY"/"NOT READY" form on the last meaningful line, stripping markdown emphasis/punctuation and skipping separator lines, while never matching "NOT READY". Used by _parse_planning_response and the two _call_with_function fallback paths.
4f33415 to
119d246
Compare
Summary
Fixes #6204. When an agent uses reasoning/planning, the executor asks the model to conclude its plan with a readiness signal and re-plans while "not ready". The model frequently replies with a bare
READY, but the parser only recognised the full sentenceREADY: I am ready to execute the task.— so a ready model was read as not ready and the agent kept refining the plan (indefinitely whenmax_attempts is None).The mismatch is between the prompts and the parser:
translations/en.json)reasoning.create_plan_prompt"READY: I am ready to execute the task."(full sentence)reasoning.refine_plan_promptConclude with READY or NOT READY.(bare token)planning.create_plan_promptAfter your plan, state READY or NOT READY.(bare token)planning.refine_plan_promptConclude with READY or NOT READY as before.(bare token)The readiness check only matched the full sentence:
So three of the four prompts could never produce a "ready" verdict. In
_refine_plan_if_needed,while not ready and (max_attempts is None or attempt < max_attempts)then loops — wasted refine calls, or no termination at all with the defaultmax_attempts=None.Fix
Add
AgentReasoning._check_ready, used by_parse_planning_responseand the two_call_with_functionfallback paths. It accepts both forms and inspects only the last meaningful line for the short form, so a plan body that merely mentions the word "ready" isn't misread, andNOT READYis never matched:Surrounding markdown emphasis/punctuation (
**READY**,## READY,READY.) is stripped and pure-separator lines (---) are skipped, since real model output (the report used an Ollama GLM model) is rarely a clean bare token.Test plan
New
TestCheckReady(15 cases) intests/utilities/test_structured_planning.pycovering full-sentence and short forms, markdown/heading/separator variants, and the critical negatives (NOT READY,ALREADY DONE, body-only "ready").Fail-before / pass-after (source stashed, tests kept):
origin/main):13 failed, 2 passed— e.g.test_parse_planning_response_short_readyfailsassert False is True.15 passed.Full file:
38 passed; the 3 remaining failures are pre-existing optional-provider import errors (azure/anthropic/gemininative providers not installed) unrelated to this change.ruff checkclean.Summary by CodeRabbit
readybehavior.