Skip to content

fix(llm): avoid Groq schema-tool failures during guideline generation#264

Merged
gaodan-fang merged 1 commit into
mainfrom
fix-groq-schema-tool-guidelines
Jun 2, 2026
Merged

fix(llm): avoid Groq schema-tool failures during guideline generation#264
gaodan-fang merged 1 commit into
mainfrom
fix-groq-schema-tool-guidelines

Conversation

@gaodan-fang
Copy link
Copy Markdown
Collaborator

@gaodan-fang gaodan-fang commented Jun 2, 2026

Groq gpt-oss can report response-schema support through LiteLLM while still failing when the model returns normal text instead of a required tool call. Guideline generation and guideline consolidation now keep Groq on the existing JSON prompt-and-parse path while leaving structured outputs enabled for providers that handle them reliably.

Constraint: Groq openai/gpt-oss-120b may reject schema/tool-backed responses when no tool call is produced
Constraint: Upstream renamed tip generation paths to guideline generation during rebase
Rejected: Add a Groq SDK client | LiteLLM already fronts all generation paths and the failure is limited to schema routing
Rejected: Disable structured outputs globally | OpenAI-compatible providers still benefit from strict schema mode
Confidence: high
Scope-risk: narrow
Directive: Do not re-enable response_format for Groq guideline generation without live testing groq/openai/gpt-oss-120b tool/schema behavior
Tested: uv run pytest tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py -v
Tested: uv run ruff check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py
Tested: uv run ruff format --check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py

Summary by CodeRabbit

  • Bug Fixes

    • Improved compatibility with Groq LLM provider by automatically adjusting the guideline generation approach to work with Groq's constraints.
  • Tests

    • Added test coverage for Groq-specific behavior to ensure proper guideline generation across different LLM providers.

Groq gpt-oss can report response-schema support through LiteLLM while still failing when the model returns normal text instead of a required tool call. Guideline generation and guideline consolidation now keep Groq on the existing JSON prompt-and-parse path while leaving structured outputs enabled for providers that handle them reliably.

Constraint: Groq openai/gpt-oss-120b may reject schema/tool-backed responses when no tool call is produced
Constraint: Upstream renamed tip generation paths to guideline generation during rebase
Rejected: Add a Groq SDK client | LiteLLM already fronts all generation paths and the failure is limited to schema routing
Rejected: Disable structured outputs globally | OpenAI-compatible providers still benefit from strict schema mode
Confidence: high
Scope-risk: narrow
Directive: Do not re-enable response_format for Groq guideline generation without live testing groq/openai/gpt-oss-120b tool/schema behavior
Tested: uv run pytest tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py -v
Tested: uv run ruff check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py
Tested: uv run ruff format --check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

The PR disables constrained decoding (JSON-schema validation and structured response formats) for the Groq LLM provider in both the clustering and guidelines modules, implementing a fallback to prompt-based JSON format markers and response cleaning instead of structured outputs.

Changes

Disable Constrained Decoding for Groq Provider

Layer / File(s) Summary
Groq provider detection in clustering
altk_evolve/llm/guidelines/clustering.py
combine_cluster() detects Groq via custom_llm_provider or guidelines_model prefix and adds not is_groq to constrained_decoding_supported condition, disabling structured response handling for Groq.
Test coverage for clustering Groq behavior
tests/unit/test_combine_guidelines.py
Imports are updated to access clustering llm_settings. Existing structured-output test is patched with Groq settings. New test verifies that Groq uses "Output Format (JSON)" in prompts and does not receive response_format parameter.
Groq provider detection in guidelines
altk_evolve/llm/guidelines/guidelines.py
generate_guidelines() detects Groq via custom_llm_provider or guidelines_model prefix and adds not is_groq to constrained_decoding_supported condition, mirroring the clustering implementation.
Test coverage for guidelines Groq behavior
tests/unit/test_guidelines.py
Imports expanded to include mocking utilities. Helper _mock_completion_response() fabricates mock completions with JSON content. New test verifies generate_guidelines() omits response_format for Groq and includes "Output Format (JSON)" in prompt messages.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A Groq, a Groq, constrained you are not!
We've taught both functions to forget what they got—
No schemas, no structures, just JSON so plain,
Prompt markers instead shall forever remain.
With tests as our witness, the fallback holds true! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(llm): avoid Groq schema-tool failures during guideline generation' directly and clearly describes the main change: disabling constrained decoding for Groq to prevent schema failures.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-groq-schema-tool-guidelines

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
altk_evolve/llm/guidelines/clustering.py (1)

144-144: ⚡ Quick win

Centralize the Groq detection predicate used for is_groq.
The exact is_groq logic is duplicated in altk_evolve/llm/guidelines/clustering.py (line 144) and altk_evolve/llm/guidelines/guidelines.py (line 174):

is_groq = llm_settings.custom_llm_provider == "groq" or llm_settings.guidelines_model.startswith("groq/")

Extract this into a small shared helper (e.g., on llm_settings or a shared util) so both call sites stay consistent if Groq naming conventions change.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@altk_evolve/llm/guidelines/clustering.py` at line 144, Extract the duplicated
Groq-detection logic into a single helper and replace both inline checks with
it: add a method or function (e.g., llm_settings.is_groq() or util function
is_groq_provider(llm_settings)) that returns the boolean computed by the
existing predicate, implement it defensively (handle None and use startswith
safely), then update the usages that currently set is_groq =
llm_settings.custom_llm_provider == "groq" or
llm_settings.guidelines_model.startswith("groq/") in clustering.py and
guidelines.py to call the new helper and adjust imports accordingly.
altk_evolve/llm/guidelines/guidelines.py (2)

184-184: 💤 Low value

Remove redundant bool() wrapper.

The bool() wrapper is unnecessary since all operands already evaluate to boolean values. The expression not is_groq and supports_response_format and response_schema_enabled already yields a bool.

♻️ Simplified expression
-    constrained_decoding_supported = bool(not is_groq and supports_response_format and response_schema_enabled)
+    constrained_decoding_supported = not is_groq and supports_response_format and response_schema_enabled
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@altk_evolve/llm/guidelines/guidelines.py` at line 184, The assignment to
constrained_decoding_supported wraps a boolean expression in an unnecessary
bool() call; remove the redundant bool() wrapper and assign the expression
directly (constrained_decoding_supported = not is_groq and
supports_response_format and response_schema_enabled) in the guidelines.py
location where constrained_decoding_supported is defined to keep the code
concise and equivalent.

174-174: ⚡ Quick win

Make Groq detection case-insensitive (avoid false negatives); no None risk from settings

  • llm_settings.guidelines_model is declared as str in altk_evolve/config/llm.py with a default_factory that always returns a string, and there’s no evidence in code/tests setting it to None, so .startswith() shouldn’t hit an AttributeError in normal operation.
  • is_groq checks are case-sensitive (== "groq" and startswith("groq/")). If EVOLVE_CUSTOM_LLM_PROVIDER / EVOLVE_GUIDELINES_MODEL are set with different casing (e.g., Groq), Groq detection will fail and constrained_decoding_supported will flip behavior. Same pattern exists in altk_evolve/llm/guidelines/clustering.py.
🛡️ Proposed fix with None guard and case-insensitive check
-    is_groq = llm_settings.custom_llm_provider == "groq" or llm_settings.guidelines_model.startswith("groq/")
+    is_groq = (
+        (llm_settings.custom_llm_provider or "").lower() == "groq"
+        or (llm_settings.guidelines_model or "").lower().startswith("groq/")
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@altk_evolve/llm/guidelines/guidelines.py` at line 174, The Groq-provider
detection is case-sensitive and can miss variants like "Groq" and similar;
update the logic that computes is_groq (using llm_settings.custom_llm_provider
and llm_settings.guidelines_model) to perform a case-insensitive comparison and
guard against None by lowercasing values (e.g., compare
custom_llm_provider.lower() == "groq" and
guidelines_model.lower().startswith("groq/")). Apply the same change in the
analogous check in clustering.py so constrained_decoding_supported uses the
case-insensitive test and is robust if either setting is unexpectedly falsy.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@altk_evolve/llm/guidelines/clustering.py`:
- Line 144: Extract the duplicated Groq-detection logic into a single helper and
replace both inline checks with it: add a method or function (e.g.,
llm_settings.is_groq() or util function is_groq_provider(llm_settings)) that
returns the boolean computed by the existing predicate, implement it defensively
(handle None and use startswith safely), then update the usages that currently
set is_groq = llm_settings.custom_llm_provider == "groq" or
llm_settings.guidelines_model.startswith("groq/") in clustering.py and
guidelines.py to call the new helper and adjust imports accordingly.

In `@altk_evolve/llm/guidelines/guidelines.py`:
- Line 184: The assignment to constrained_decoding_supported wraps a boolean
expression in an unnecessary bool() call; remove the redundant bool() wrapper
and assign the expression directly (constrained_decoding_supported = not is_groq
and supports_response_format and response_schema_enabled) in the guidelines.py
location where constrained_decoding_supported is defined to keep the code
concise and equivalent.
- Line 174: The Groq-provider detection is case-sensitive and can miss variants
like "Groq" and similar; update the logic that computes is_groq (using
llm_settings.custom_llm_provider and llm_settings.guidelines_model) to perform a
case-insensitive comparison and guard against None by lowercasing values (e.g.,
compare custom_llm_provider.lower() == "groq" and
guidelines_model.lower().startswith("groq/")). Apply the same change in the
analogous check in clustering.py so constrained_decoding_supported uses the
case-insensitive test and is robust if either setting is unexpectedly falsy.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0bb80a9e-b620-4ee2-8b9f-e7e1a135b19f

📥 Commits

Reviewing files that changed from the base of the PR and between 606e906 and a69ffa0.

📒 Files selected for processing (4)
  • altk_evolve/llm/guidelines/clustering.py
  • altk_evolve/llm/guidelines/guidelines.py
  • tests/unit/test_combine_guidelines.py
  • tests/unit/test_guidelines.py

@gaodan-fang gaodan-fang merged commit 5b7bc7c into main Jun 2, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant