fix(llm): avoid Groq schema-tool failures during guideline generation#264
Conversation
Groq gpt-oss can report response-schema support through LiteLLM while still failing when the model returns normal text instead of a required tool call. Guideline generation and guideline consolidation now keep Groq on the existing JSON prompt-and-parse path while leaving structured outputs enabled for providers that handle them reliably. Constraint: Groq openai/gpt-oss-120b may reject schema/tool-backed responses when no tool call is produced Constraint: Upstream renamed tip generation paths to guideline generation during rebase Rejected: Add a Groq SDK client | LiteLLM already fronts all generation paths and the failure is limited to schema routing Rejected: Disable structured outputs globally | OpenAI-compatible providers still benefit from strict schema mode Confidence: high Scope-risk: narrow Directive: Do not re-enable response_format for Groq guideline generation without live testing groq/openai/gpt-oss-120b tool/schema behavior Tested: uv run pytest tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py -v Tested: uv run ruff check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py Tested: uv run ruff format --check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py
📝 WalkthroughWalkthroughThe PR disables constrained decoding (JSON-schema validation and structured response formats) for the Groq LLM provider in both the clustering and guidelines modules, implementing a fallback to prompt-based JSON format markers and response cleaning instead of structured outputs. ChangesDisable Constrained Decoding for Groq Provider
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
altk_evolve/llm/guidelines/clustering.py (1)
144-144: ⚡ Quick winCentralize the Groq detection predicate used for
is_groq.
The exactis_groqlogic is duplicated inaltk_evolve/llm/guidelines/clustering.py(line 144) andaltk_evolve/llm/guidelines/guidelines.py(line 174):is_groq = llm_settings.custom_llm_provider == "groq" or llm_settings.guidelines_model.startswith("groq/")Extract this into a small shared helper (e.g., on
llm_settingsor a shared util) so both call sites stay consistent if Groq naming conventions change.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@altk_evolve/llm/guidelines/clustering.py` at line 144, Extract the duplicated Groq-detection logic into a single helper and replace both inline checks with it: add a method or function (e.g., llm_settings.is_groq() or util function is_groq_provider(llm_settings)) that returns the boolean computed by the existing predicate, implement it defensively (handle None and use startswith safely), then update the usages that currently set is_groq = llm_settings.custom_llm_provider == "groq" or llm_settings.guidelines_model.startswith("groq/") in clustering.py and guidelines.py to call the new helper and adjust imports accordingly.altk_evolve/llm/guidelines/guidelines.py (2)
184-184: 💤 Low valueRemove redundant
bool()wrapper.The
bool()wrapper is unnecessary since all operands already evaluate to boolean values. The expressionnot is_groq and supports_response_format and response_schema_enabledalready yields a bool.♻️ Simplified expression
- constrained_decoding_supported = bool(not is_groq and supports_response_format and response_schema_enabled) + constrained_decoding_supported = not is_groq and supports_response_format and response_schema_enabled🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@altk_evolve/llm/guidelines/guidelines.py` at line 184, The assignment to constrained_decoding_supported wraps a boolean expression in an unnecessary bool() call; remove the redundant bool() wrapper and assign the expression directly (constrained_decoding_supported = not is_groq and supports_response_format and response_schema_enabled) in the guidelines.py location where constrained_decoding_supported is defined to keep the code concise and equivalent.
174-174: ⚡ Quick winMake Groq detection case-insensitive (avoid false negatives); no
Nonerisk from settings
llm_settings.guidelines_modelis declared asstrinaltk_evolve/config/llm.pywith adefault_factorythat always returns a string, and there’s no evidence in code/tests setting it toNone, so.startswith()shouldn’t hit anAttributeErrorin normal operation.is_groqchecks are case-sensitive (== "groq"andstartswith("groq/")). IfEVOLVE_CUSTOM_LLM_PROVIDER/EVOLVE_GUIDELINES_MODELare set with different casing (e.g.,Groq), Groq detection will fail andconstrained_decoding_supportedwill flip behavior. Same pattern exists inaltk_evolve/llm/guidelines/clustering.py.🛡️ Proposed fix with None guard and case-insensitive check
- is_groq = llm_settings.custom_llm_provider == "groq" or llm_settings.guidelines_model.startswith("groq/") + is_groq = ( + (llm_settings.custom_llm_provider or "").lower() == "groq" + or (llm_settings.guidelines_model or "").lower().startswith("groq/") + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@altk_evolve/llm/guidelines/guidelines.py` at line 174, The Groq-provider detection is case-sensitive and can miss variants like "Groq" and similar; update the logic that computes is_groq (using llm_settings.custom_llm_provider and llm_settings.guidelines_model) to perform a case-insensitive comparison and guard against None by lowercasing values (e.g., compare custom_llm_provider.lower() == "groq" and guidelines_model.lower().startswith("groq/")). Apply the same change in the analogous check in clustering.py so constrained_decoding_supported uses the case-insensitive test and is robust if either setting is unexpectedly falsy.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@altk_evolve/llm/guidelines/clustering.py`:
- Line 144: Extract the duplicated Groq-detection logic into a single helper and
replace both inline checks with it: add a method or function (e.g.,
llm_settings.is_groq() or util function is_groq_provider(llm_settings)) that
returns the boolean computed by the existing predicate, implement it defensively
(handle None and use startswith safely), then update the usages that currently
set is_groq = llm_settings.custom_llm_provider == "groq" or
llm_settings.guidelines_model.startswith("groq/") in clustering.py and
guidelines.py to call the new helper and adjust imports accordingly.
In `@altk_evolve/llm/guidelines/guidelines.py`:
- Line 184: The assignment to constrained_decoding_supported wraps a boolean
expression in an unnecessary bool() call; remove the redundant bool() wrapper
and assign the expression directly (constrained_decoding_supported = not is_groq
and supports_response_format and response_schema_enabled) in the guidelines.py
location where constrained_decoding_supported is defined to keep the code
concise and equivalent.
- Line 174: The Groq-provider detection is case-sensitive and can miss variants
like "Groq" and similar; update the logic that computes is_groq (using
llm_settings.custom_llm_provider and llm_settings.guidelines_model) to perform a
case-insensitive comparison and guard against None by lowercasing values (e.g.,
compare custom_llm_provider.lower() == "groq" and
guidelines_model.lower().startswith("groq/")). Apply the same change in the
analogous check in clustering.py so constrained_decoding_supported uses the
case-insensitive test and is robust if either setting is unexpectedly falsy.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0bb80a9e-b620-4ee2-8b9f-e7e1a135b19f
📒 Files selected for processing (4)
altk_evolve/llm/guidelines/clustering.pyaltk_evolve/llm/guidelines/guidelines.pytests/unit/test_combine_guidelines.pytests/unit/test_guidelines.py
Groq gpt-oss can report response-schema support through LiteLLM while still failing when the model returns normal text instead of a required tool call. Guideline generation and guideline consolidation now keep Groq on the existing JSON prompt-and-parse path while leaving structured outputs enabled for providers that handle them reliably.
Constraint: Groq openai/gpt-oss-120b may reject schema/tool-backed responses when no tool call is produced
Constraint: Upstream renamed tip generation paths to guideline generation during rebase
Rejected: Add a Groq SDK client | LiteLLM already fronts all generation paths and the failure is limited to schema routing
Rejected: Disable structured outputs globally | OpenAI-compatible providers still benefit from strict schema mode
Confidence: high
Scope-risk: narrow
Directive: Do not re-enable response_format for Groq guideline generation without live testing groq/openai/gpt-oss-120b tool/schema behavior
Tested: uv run pytest tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py -v
Tested: uv run ruff check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py
Tested: uv run ruff format --check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py
Summary by CodeRabbit
Bug Fixes
Tests