Normalize eval enable_thinking sampling args by peter941221 · Pull Request #682 · PrimeIntellect-ai/prime

peter941221 · 2026-05-25T00:57:08Z

Summary

Normalize local prime eval run --sampling-args payloads that use a top-level enable_thinking key.
Rewrite that key into extra_body.chat_template_kwargs.enable_thinking before handing off to verifiers.
Add a regression test that checks the rewritten command-line payload.
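
As a rough illustration of the rewrite described above, the following is a minimal sketch and not the code from this PR. The function name normalize_sampling_args is hypothetical, and the choice to let an already-nested extra_body value win over the top-level flag is an assumption made for the example.

```python
import copy
import json


def normalize_sampling_args(sampling_args: dict) -> dict:
    """Move a top-level enable_thinking flag under extra_body.chat_template_kwargs.

    Hypothetical helper for illustration; the PR's actual function may differ.
    """
    # Work on a copy so the caller's dict is left untouched.
    normalized = copy.deepcopy(sampling_args)
    if "enable_thinking" not in normalized:
        return normalized

    flag = normalized.pop("enable_thinking")
    extra_body = normalized.setdefault("extra_body", {})
    template_kwargs = extra_body.setdefault("chat_template_kwargs", {})
    # Assumption: an explicitly nested value takes precedence over the top-level one.
    template_kwargs.setdefault("enable_thinking", flag)
    return normalized


if __name__ == "__main__":
    payload = {"enable_thinking": False, "temperature": 0.2}
    print(json.dumps(normalize_sampling_args(payload)))
```

Payloads that do not contain the top-level key pass through unchanged, which keeps the rewrite additive for users who already nest the flag themselves.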

Why

Users who follow the documented --sampling-args '{"enable_thinking": false}' form currently hit the error "AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking'" during local eval runs. Hosted and training paths already treat enable_thinking as a first-class sampling control, so the local eval bridge should honor the same user-facing contract.

Testing

python -m py_compile packages/prime/src/prime_cli/verifiers_bridge.py packages/prime/tests/test_eval_billing.py
uv run ruff check packages/prime/src/prime_cli/verifiers_bridge.py packages/prime/tests/test_eval_billing.py
uv run pytest packages/prime/tests/test_eval_billing.py -q

Note

Low Risk
CLI-only sampling-arg normalization and short-lived temp config files; behavior is additive for users who already pass nested extra_body, with regression tests.

Overview
Local prime eval run now accepts top-level enable_thinking in --sampling-args JSON and in eval TOML sampling_args, rewriting it to extra_body.chat_template_kwargs.enable_thinking before verifiers runs so inference no longer rejects an unknown keyword.

CLI passthrough rewrites every --sampling-args occurrence; config-driven runs write a temporary TOML when needed and delete it in a finally block. Tests cover CLI rewrite (including multiple flags), config rewrite, and temp-file cleanup on failure.
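
The two behaviors in this paragraph can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation in verifiers_bridge.py: the names rewrite_sampling_args and run_with_temp_config are hypothetical, only the space-separated "--sampling-args VALUE" form is handled, and the temporary config is treated as opaque text rather than serialized TOML.

```python
import json
import os
import tempfile
from typing import Callable


def _normalize(payload):
    """Nest a top-level enable_thinking flag (hypothetical helper)."""
    if not isinstance(payload, dict) or "enable_thinking" not in payload:
        return payload
    flag = payload.pop("enable_thinking")
    kwargs = payload.setdefault("extra_body", {}).setdefault("chat_template_kwargs", {})
    # Assumption: an explicitly nested value wins over the top-level flag.
    kwargs.setdefault("enable_thinking", flag)
    return payload


def rewrite_sampling_args(argv: list[str]) -> list[str]:
    """Rewrite the JSON value after every --sampling-args occurrence."""
    rewritten = list(argv)
    for index, token in enumerate(rewritten[:-1]):
        if token == "--sampling-args":
            payload = json.loads(rewritten[index + 1])
            rewritten[index + 1] = json.dumps(_normalize(payload))
    return rewritten


def run_with_temp_config(config_text: str, run: Callable[[str], None]) -> None:
    """Write a temporary config file, run the callback, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".toml")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(config_text)
        run(path)
    finally:
        # Cleanup happens even when run() raises.
        os.remove(path)


if __name__ == "__main__":
    args = ["--sampling-args", '{"enable_thinking": false}']
    print(rewrite_sampling_args(args))
```

Putting the deletion in a finally block means a failed eval run does not leave rewritten config files behind, which is the cleanup-on-failure case the tests are described as covering.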

Reviewed by Cursor Bugbot for commit 3576156. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 6281eeb.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6281eeb0dd

peter941221 added 2 commits May 25, 2026 08:56

Normalize eval enable_thinking sampling args

f60111f

Normalize config eval enable_thinking args

6281eeb

peter941221 marked this pull request as ready for review May 26, 2026 10:41

peter941221 requested review from JannikSt, JohannesHa, burnpiro, d42me, kcoopermiller and willccbb as code owners May 26, 2026 10:41

cursor Bot reviewed May 26, 2026

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py Outdated

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py

chatgpt-codex-connector Bot reviewed May 26, 2026

Comment thread packages/prime/src/prime_cli/verifiers_bridge.py

Handle all eval sampling args rewrites

3576156

peter941221 wants to merge 3 commits into PrimeIntellect-ai:main from peter941221:fix/eval-enable-thinking-compat
