Normalize eval enable_thinking sampling args #682

Open
peter941221 wants to merge 3 commits into PrimeIntellect-ai:main from peter941221:fix/eval-enable-thinking-compat

peter941221 commented May 25, 2026

Summary

  • normalize local prime eval run --sampling-args payloads that use top-level enable_thinking
  • rewrite that flag into extra_body.chat_template_kwargs.enable_thinking before handing off to verifiers
  • add a regression test that checks the rewritten command-line payload
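
The rewrite described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in verifiers_bridge.py: the function name normalize_sampling_args is hypothetical, and letting an explicitly nested value win over the top-level shorthand is an assumption about how conflicts might be resolved.

```python
import copy
from typing import Any


def normalize_sampling_args(sampling_args: dict[str, Any]) -> dict[str, Any]:
    """Move a top-level enable_thinking flag under extra_body.chat_template_kwargs.

    Hypothetical helper for illustration only; the input dict is not mutated.
    """
    if "enable_thinking" not in sampling_args:
        return sampling_args  # nothing to rewrite

    normalized = copy.deepcopy(sampling_args)
    flag = normalized.pop("enable_thinking")

    # Create the nested containers only when the caller has not supplied them.
    extra_body = normalized.setdefault("extra_body", {})
    template_kwargs = extra_body.setdefault("chat_template_kwargs", {})

    # Assumption: a value the user already nested explicitly takes precedence.
    template_kwargs.setdefault("enable_thinking", flag)
    return normalized


print(normalize_sampling_args({"enable_thinking": False, "temperature": 0.2}))
# → {'temperature': 0.2, 'extra_body': {'chat_template_kwargs': {'enable_thinking': False}}}
```

Payloads that already use the nested form, or that omit the flag entirely, pass through unchanged in this sketch.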

Why

Users who follow the documented --sampling-args '{"enable_thinking": false}' form currently hit "AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking'" during local eval runs. The hosted and training paths already treat enable_thinking as a first-class sampling control, so the local eval bridge should honor the same user-facing contract.

Testing

  • python -m py_compile packages/prime/src/prime_cli/verifiers_bridge.py packages/prime/tests/test_eval_billing.py
  • uv run ruff check packages/prime/src/prime_cli/verifiers_bridge.py packages/prime/tests/test_eval_billing.py
  • uv run pytest packages/prime/tests/test_eval_billing.py -q

Note

Low Risk
CLI-only sampling-arg normalization and short-lived temp config files; behavior is additive for users who already pass nested extra_body, with regression tests.

Overview
Local prime eval run now accepts top-level enable_thinking in --sampling-args JSON and in eval TOML sampling_args, rewriting it to extra_body.chat_template_kwargs.enable_thinking before verifiers runs so inference no longer rejects an unknown keyword.

CLI passthrough rewrites every --sampling-args occurrence; config-driven runs write a temporary TOML when needed and delete it in a finally block. Tests cover CLI rewrite (including multiple flags), config rewrite, and temp-file cleanup on failure.
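
As a rough sketch of the two mechanisms just described, the snippet below rewrites every --sampling-args occurrence in an argument list and runs a callback against a temporary config file that is removed in a finally block. Both function names are hypothetical, the --sampling-args=VALUE form is an assumption about which spellings the CLI accepts, and the normalize callable stands in for whatever rewrite the bridge actually applies.

```python
import json
import os
import tempfile
from typing import Callable


def rewrite_sampling_args_flags(
    argv: list[str], normalize: Callable[[dict], dict]
) -> list[str]:
    """Apply `normalize` to the JSON value of every --sampling-args flag."""
    out: list[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--sampling-args" and i + 1 < len(argv):
            # Space-separated form: the JSON payload is the next token.
            payload = normalize(json.loads(argv[i + 1]))
            out += [arg, json.dumps(payload)]
            i += 2
        elif arg.startswith("--sampling-args="):
            # Assumed '=' form: the JSON payload follows the first '='.
            payload = normalize(json.loads(arg.split("=", 1)[1]))
            out.append("--sampling-args=" + json.dumps(payload))
            i += 1
        else:
            out.append(arg)
            i += 1
    return out


def run_with_temp_config(config_text: str, run: Callable[[str], None]) -> None:
    """Write config_text to a temp .toml file, call run(path), always clean up."""
    fd, path = tempfile.mkstemp(suffix=".toml")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(config_text)
        run(path)
    finally:
        # Runs on success and on failure, so no temp file is left behind.
        os.unlink(path)
```

In this sketch a payload that is not valid JSON raises json.JSONDecodeError rather than being passed through; whether the real bridge fails early or defers to verifiers is not stated in the PR.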

Reviewed by Cursor Bugbot for commit 3576156.

peter941221 marked this pull request as ready for review May 26, 2026 10:41

cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 6281eeb.

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated
Comment thread packages/prime/src/prime_cli/verifiers_bridge.py

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6281eeb0dd

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py