fix(llm): avoid Groq schema-tool failures during guideline generation by gaodan-fang · Pull Request #264 · AgentToolkit/altk-evolve

gaodan-fang · 2026-06-02T03:02:49Z

Groq gpt-oss can report response-schema support through LiteLLM while still failing when the model returns normal text instead of a required tool call. Guideline generation and guideline consolidation now keep Groq on the existing JSON prompt-and-parse path while leaving structured outputs enabled for providers that handle them reliably.

Constraint: Groq openai/gpt-oss-120b may reject schema/tool-backed responses when no tool call is produced
Constraint: Upstream renamed tip generation paths to guideline generation during rebase
Rejected: Add a Groq SDK client | LiteLLM already fronts all generation paths and the failure is limited to schema routing
Rejected: Disable structured outputs globally | OpenAI-compatible providers still benefit from strict schema mode
Confidence: high
Scope-risk: narrow
Directive: Do not re-enable response_format for Groq guideline generation without live testing groq/openai/gpt-oss-120b tool/schema behavior
Tested: uv run pytest tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py -v
Tested: uv run ruff check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py
Tested: uv run ruff format --check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py

Summary by CodeRabbit

Bug Fixes
- Improved compatibility with Groq LLM provider by automatically adjusting the guideline generation approach to work with Groq's constraints.
Tests
- Added test coverage for Groq-specific behavior to ensure proper guideline generation across different LLM providers.

Groq gpt-oss can report response-schema support through LiteLLM while still failing when the model returns normal text instead of a required tool call. Guideline generation and guideline consolidation now keep Groq on the existing JSON prompt-and-parse path while leaving structured outputs enabled for providers that handle them reliably. Constraint: Groq openai/gpt-oss-120b may reject schema/tool-backed responses when no tool call is produced Constraint: Upstream renamed tip generation paths to guideline generation during rebase Rejected: Add a Groq SDK client | LiteLLM already fronts all generation paths and the failure is limited to schema routing Rejected: Disable structured outputs globally | OpenAI-compatible providers still benefit from strict schema mode Confidence: high Scope-risk: narrow Directive: Do not re-enable response_format for Groq guideline generation without live testing groq/openai/gpt-oss-120b tool/schema behavior Tested: uv run pytest tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py -v Tested: uv run ruff check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py Tested: uv run ruff format --check altk_evolve/llm/guidelines/guidelines.py altk_evolve/llm/guidelines/clustering.py tests/unit/test_guidelines.py tests/unit/test_combine_guidelines.py

coderabbitai · 2026-06-02T03:03:02Z

📝 Walkthrough

Walkthrough

The PR disables constrained decoding (JSON-schema validation and structured response formats) for the Groq LLM provider in both the clustering and guidelines modules, implementing a fallback to prompt-based JSON format markers and response cleaning instead of structured outputs.

Changes

Disable Constrained Decoding for Groq Provider

Layer / File(s)	Summary
Groq provider detection in clustering `altk_evolve/llm/guidelines/clustering.py`	`combine_cluster()` detects Groq via `custom_llm_provider` or `guidelines_model` prefix and adds `not is_groq` to `constrained_decoding_supported` condition, disabling structured response handling for Groq.
Test coverage for clustering Groq behavior `tests/unit/test_combine_guidelines.py`	Imports are updated to access clustering `llm_settings`. Existing structured-output test is patched with Groq settings. New test verifies that Groq uses "Output Format (JSON)" in prompts and does not receive `response_format` parameter.
Groq provider detection in guidelines `altk_evolve/llm/guidelines/guidelines.py`	`generate_guidelines()` detects Groq via `custom_llm_provider` or `guidelines_model` prefix and adds `not is_groq` to `constrained_decoding_supported` condition, mirroring the clustering implementation.
Test coverage for guidelines Groq behavior `tests/unit/test_guidelines.py`	Imports expanded to include mocking utilities. Helper `_mock_completion_response()` fabricates mock completions with JSON content. New test verifies `generate_guidelines()` omits `response_format` for Groq and includes "Output Format (JSON)" in prompt messages.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A Groq, a Groq, constrained you are not!
We've taught both functions to forget what they got—
No schemas, no structures, just JSON so plain,
Prompt markers instead shall forever remain.
With tests as our witness, the fallback holds true! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(llm): avoid Groq schema-tool failures during guideline generation' directly and clearly describes the main change: disabling constrained decoding for Groq to prevent schema failures.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-groq-schema-tool-guidelines

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (3)

altk_evolve/llm/guidelines/clustering.py (1)
144-144: ⚡ Quick win

Centralize the Groq detection predicate used for is_groq.
The exact is_groq logic is duplicated in altk_evolve/llm/guidelines/clustering.py (line 144) and altk_evolve/llm/guidelines/guidelines.py (line 174):
is_groq = llm_settings.custom_llm_provider == "groq" or llm_settings.guidelines_model.startswith("groq/")
Extract this into a small shared helper (e.g., on llm_settings or a shared util) so both call sites stay consistent if Groq naming conventions change.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@altk_evolve/llm/guidelines/clustering.py` at line 144, Extract the duplicated
Groq-detection logic into a single helper and replace both inline checks with
it: add a method or function (e.g., llm_settings.is_groq() or util function
is_groq_provider(llm_settings)) that returns the boolean computed by the
existing predicate, implement it defensively (handle None and use startswith
safely), then update the usages that currently set is_groq =
llm_settings.custom_llm_provider == "groq" or
llm_settings.guidelines_model.startswith("groq/") in clustering.py and
guidelines.py to call the new helper and adjust imports accordingly.
altk_evolve/llm/guidelines/guidelines.py (2)
184-184: 💤 Low value

Remove redundant bool() wrapper.

The bool() wrapper is unnecessary since all operands already evaluate to boolean values. The expression not is_groq and supports_response_format and response_schema_enabled already yields a bool.
♻️ Simplified expression
-    constrained_decoding_supported = bool(not is_groq and supports_response_format and response_schema_enabled)
+    constrained_decoding_supported = not is_groq and supports_response_format and response_schema_enabled
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@altk_evolve/llm/guidelines/guidelines.py` at line 184, The assignment to
constrained_decoding_supported wraps a boolean expression in an unnecessary
bool() call; remove the redundant bool() wrapper and assign the expression
directly (constrained_decoding_supported = not is_groq and
supports_response_format and response_schema_enabled) in the guidelines.py
location where constrained_decoding_supported is defined to keep the code
concise and equivalent.
174-174: ⚡ Quick win

Make Groq detection case-insensitive (avoid false negatives); no None risk from settings

llm_settings.guidelines_model is declared as str in altk_evolve/config/llm.py with a default_factory that always returns a string, and there’s no evidence in code/tests setting it to None, so .startswith() shouldn’t hit an AttributeError in normal operation.

is_groq checks are case-sensitive (== "groq" and startswith("groq/")). If EVOLVE_CUSTOM_LLM_PROVIDER / EVOLVE_GUIDELINES_MODEL are set with different casing (e.g., Groq), Groq detection will fail and constrained_decoding_supported will flip behavior. Same pattern exists in altk_evolve/llm/guidelines/clustering.py.
🛡️ Proposed fix with None guard and case-insensitive check
-    is_groq = llm_settings.custom_llm_provider == "groq" or llm_settings.guidelines_model.startswith("groq/")
+    is_groq = (
+        (llm_settings.custom_llm_provider or "").lower() == "groq"
+        or (llm_settings.guidelines_model or "").lower().startswith("groq/")
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@altk_evolve/llm/guidelines/guidelines.py` at line 174, The Groq-provider
detection is case-sensitive and can miss variants like "Groq" and similar;
update the logic that computes is_groq (using llm_settings.custom_llm_provider
and llm_settings.guidelines_model) to perform a case-insensitive comparison and
guard against None by lowercasing values (e.g., compare
custom_llm_provider.lower() == "groq" and
guidelines_model.lower().startswith("groq/")). Apply the same change in the
analogous check in clustering.py so constrained_decoding_supported uses the
case-insensitive test and is robust if either setting is unexpectedly falsy.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@altk_evolve/llm/guidelines/clustering.py`:
- Line 144: Extract the duplicated Groq-detection logic into a single helper and
replace both inline checks with it: add a method or function (e.g.,
llm_settings.is_groq() or util function is_groq_provider(llm_settings)) that
returns the boolean computed by the existing predicate, implement it defensively
(handle None and use startswith safely), then update the usages that currently
set is_groq = llm_settings.custom_llm_provider == "groq" or
llm_settings.guidelines_model.startswith("groq/") in clustering.py and
guidelines.py to call the new helper and adjust imports accordingly.

In `@altk_evolve/llm/guidelines/guidelines.py`:
- Line 184: The assignment to constrained_decoding_supported wraps a boolean
expression in an unnecessary bool() call; remove the redundant bool() wrapper
and assign the expression directly (constrained_decoding_supported = not is_groq
and supports_response_format and response_schema_enabled) in the guidelines.py
location where constrained_decoding_supported is defined to keep the code
concise and equivalent.
- Line 174: The Groq-provider detection is case-sensitive and can miss variants
like "Groq" and similar; update the logic that computes is_groq (using
llm_settings.custom_llm_provider and llm_settings.guidelines_model) to perform a
case-insensitive comparison and guard against None by lowercasing values (e.g.,
compare custom_llm_provider.lower() == "groq" and
guidelines_model.lower().startswith("groq/")). Apply the same change in the
analogous check in clustering.py so constrained_decoding_supported uses the
case-insensitive test and is robust if either setting is unexpectedly falsy.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0bb80a9e-b620-4ee2-8b9f-e7e1a135b19f

📥 Commits

Reviewing files that changed from the base of the PR and between 606e906 and a69ffa0.

📒 Files selected for processing (4)

altk_evolve/llm/guidelines/clustering.py
altk_evolve/llm/guidelines/guidelines.py
tests/unit/test_combine_guidelines.py
tests/unit/test_guidelines.py

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

gaodan-fang merged commit 5b7bc7c into main Jun 2, 2026
17 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(llm): avoid Groq schema-tool failures during guideline generation#264

fix(llm): avoid Groq schema-tool failures during guideline generation#264
gaodan-fang merged 1 commit into
mainfrom
fix-groq-schema-tool-guidelines

gaodan-fang commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gaodan-fang commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

gaodan-fang commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading