Rework v1 harness and taskset config classes #1392

Open
xeophon wants to merge 38 commits into main from codex/remove-v1-config-classes

Conversation

Member

@xeophon xeophon commented May 15, 2026

Summary

  • rebuild v1 configs as strict Pydantic models with explicit nested defaults (taskset: MyTasksetConfig = MyTasksetConfig(), harness: MyHarnessConfig = MyHarnessConfig())
  • make v1 taskset/harness construction config-only and fresh by default: public constructors/loaders take config=None instead of constructing config objects in function signatures
  • bind taskset/harness config types through generics (Taskset[MyTasksetConfig], Harness[MyHarnessConfig]) instead of a public config_type hook
  • keep configs serializable by using primitives, containers, config models, and import-ref strings only
  • keep runtime defaults on taskset/harness classes (_default_rewards, _default_program, _default_toolsets, etc.) so users do not have to write source/reward/program paths in every config
  • centralize repeated runtime-owner setup for tasksets/harnesses: config coercion, default precedence, toolset merging, lifecycle handler merging, and add_* mutation APIs now share one internal path
  • centralize packaged command harness setup in CommandHarness, so OpenCode/MiniSWEAgent/Pi/Terminus2 share command-program and sandbox wiring
  • add vf.Env.config(...), vf.Env.loader(...), and vf.Env.from_config(...) for the common typed env wiring paths, and use them in examples and simple v1 envs
  • update TOML/env loading so raw config mappings validate through the annotated EnvConfig child config types at loader boundaries
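The generic binding mentioned in the summary (Taskset[MyTasksetConfig], Harness[MyHarnessConfig]) can be illustrated with a standard-library sketch. This is not the verifiers implementation: BaseConfig, ConfigBound, and the example classes below are stand-ins written for this illustration, and the only technique shown is reading the generic argument from __orig_bases__ inside __init_subclass__.

```python
from typing import Generic, TypeVar, get_args, get_origin


class BaseConfig:
    """Hypothetical stand-in for a config base class."""


ConfigT = TypeVar("ConfigT", bound=BaseConfig)


class ConfigBound(Generic[ConfigT]):
    # Resolved config class; subclasses inherit it unless they rebind it.
    _config_cls: type[BaseConfig] = BaseConfig

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for base in getattr(cls, "__orig_bases__", ()):
            origin = get_origin(base)
            if not (isinstance(origin, type) and issubclass(origin, ConfigBound)):
                continue
            (arg,) = get_args(base)
            if isinstance(arg, TypeVar):
                continue  # still generic, nothing to bind yet
            if not (isinstance(arg, type) and issubclass(arg, BaseConfig)):
                raise TypeError(f"{arg!r} is not a BaseConfig subclass")
            cls._config_cls = arg


class SplitConfig(BaseConfig):
    split = "train"


class SplitTaskset(ConfigBound[SplitConfig]):
    pass


class DerivedTaskset(SplitTaskset):
    pass  # inherits the binding from SplitTaskset
```

With this shape the config type is declared once in the class header, so no separate public config_type hook is needed to look it up.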

Concrete env shape

Config classes expose the knobs users change; taskset/harness classes own the implementation. vf.Env.config(...) builds the typed envelope from those classes, and vf.Env.loader(...) gives the package its load_environment entrypoint.

import verifiers as vf


@vf.reward(weight=1.0)
async def exact_answer(task, state) -> float:
    return float(task["answer"] in str(state.get("completion") or ""))


class ReverseTasksetConfig(vf.TasksetConfig):
    split: str = "train"


class ReverseTaskset(vf.Taskset[ReverseTasksetConfig]):
    _default_rewards = (exact_answer,)

    def rows(self) -> list[dict[str, object]]:
        rows = [
            {
                "prompt": [{"role": "user", "content": "Reverse abc."}],
                "answer": "cba",
                "split": "train",
            },
            {
                "prompt": [{"role": "user", "content": "Reverse prime."}],
                "answer": "emirp",
                "split": "test",
            },
        ]
        return [row for row in rows if row["split"] == self.config.split]


class ReverseHarnessConfig(vf.HarnessConfig):
    max_turns: int = 2


class ReverseHarness(vf.Harness[ReverseHarnessConfig]):
    pass


ReverseEnvConfig = vf.Env.config(taskset=ReverseTaskset, harness=ReverseHarness)
load_environment = vf.Env.loader(
    taskset=ReverseTaskset,
    harness=ReverseHarness,
    env_config=ReverseEnvConfig,
)

For hand-written loaders, keep config defaults fresh and route through the typed envelope:

def load_environment(config: ReverseEnvConfig | None = None) -> vf.Env:
    return vf.Env.from_config(
        config,
        taskset=ReverseTaskset,
        harness=ReverseHarness,
        env_config=ReverseEnvConfig,
    )

Building configs in Python

# Use defaults.
default_config = ReverseEnvConfig()

# Override typed child configs directly.
eval_config = ReverseEnvConfig(
    taskset=ReverseTasksetConfig(split="test"),
    harness=ReverseHarnessConfig(max_turns=4),
)

# Validate raw mappings at the config-loader boundary.
raw_config = ReverseEnvConfig.from_config(
    {
        "taskset": {"split": "test"},
        "harness": {"max_turns": 4},
    }
)

env = load_environment(raw_config)
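To make the "validate raw mappings at the boundary" step concrete, here is a standard-library approximation of a strict from_config. The real configs are Pydantic models; StrictConfig, TasksetCfg, HarnessCfg, and EnvCfg below are hypothetical, and this sketch only rejects unknown keys and coerces nested mappings. It does not type-check primitive values the way Pydantic would.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field, fields


@dataclass
class StrictConfig:
    @classmethod
    def from_config(cls, raw=None):
        if raw is None:
            return cls()  # fresh defaults
        if isinstance(raw, cls):
            return raw  # already typed
        if not isinstance(raw, Mapping):
            raise TypeError(f"expected a mapping or {cls.__name__}")
        field_types = {f.name: f.type for f in fields(cls)}
        unknown = sorted(set(raw) - set(field_types))
        if unknown:
            raise ValueError(f"unknown config keys: {unknown}")
        kwargs = {}
        for name, value in raw.items():
            target = field_types[name]
            # Nested config fields are validated by their own class.
            if isinstance(target, type) and issubclass(target, StrictConfig):
                value = target.from_config(value)
            kwargs[name] = value
        return cls(**kwargs)


@dataclass
class TasksetCfg(StrictConfig):
    split: str = "train"


@dataclass
class HarnessCfg(StrictConfig):
    max_turns: int = 2


@dataclass
class EnvCfg(StrictConfig):
    taskset: TasksetCfg = field(default_factory=TasksetCfg)
    harness: HarnessCfg = field(default_factory=HarnessCfg)


partial = EnvCfg.from_config({"taskset": {"split": "test"}})
```

Here partial.taskset is a TasksetCfg with split="test", while partial.harness keeps its defaults because the mapping never mentioned it.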

TOML override shape

Users only write the fields they want to override. The environment implementation remains in Python.

[[eval]]
id = "reverse-v1"
model = "openai/gpt-5.4-mini"
sampling_args = { max_tokens = 1024 }

[eval.taskset]
split = "test"

[eval.harness]
max_turns = 4

Hosted training/RL uses the same child sections:

model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
rollouts_per_example = 8

[[env]]
id = "primeintellect/reverse-v1"

[env.taskset]
split = "train"

[env.harness]
max_turns = 2

Review follow-up

  • route legacy v1=True wrapper branches through typed v1 env configs instead of old mirrored kwargs
  • normalize raw mapping show/hide strings as single tool names
  • preserve full config aliases in source-loader kwargs and avoid passing positional-only parameters by keyword
  • register file-loaded modules in sys.modules where import-ref config defaults need them during tests
  • bind taskset/harness config types through generics and share that inference internally
  • move default source/reward/program/toolset behavior out of config fields and onto taskset/harness classes
  • replace repeated loader boilerplate with vf.Env.from_config(...) / vf.Env.loader(...) where construction is standard
  • remove signature-time config construction and enforce config=None with an updated v1 semgrep policy
  • factor shared taskset/harness lifecycle setup into RuntimeOwnerMixin
  • factor packaged CLI harness runtime setup into CommandHarness

Validation

  • uv run --frozen ruff check --fix .
  • uv run --frozen ruff format
  • uv run --frozen pytest tests/test_v1_config_extension.py tests/test_v1_harbor_cli.py tests/test_v1_runtime_lifecycle.py tests/test_v1_taskset_bindings.py tests/test_v1_bfcl.py tests/test_mcp_search_env.py tests/test_langchain_deep_agents_wikispeedia.py tests/test_v1_rlm_swe.py (225 passed)
  • uv run --frozen pre-commit run --all-files
  • push hooks: ruff, format, Semgrep v1 policy, AGENTS sync, ty

Notes

  • Backwards compatibility is intentionally not preserved for the old mirrored v1 constructor kwargs.
  • Config objects are the serialization boundary. Runtime APIs such as add_reward(...), class defaults such as _default_rewards, and methods such as rows() may still use live Python objects because they are not serialized config.
  • pyproject.toml and uv.lock are intentionally untouched by the latest refactor pass.
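The serialization boundary depends on being able to turn an import-ref string back into a live object at runtime. A minimal sketch of that step follows, assuming a "module:attribute" format. The exact ref syntax used by verifiers is not stated in this PR, so both the format and the resolve_import_ref name are assumptions.

```python
import importlib
from typing import Any


def resolve_import_ref(ref: str) -> Any:
    """Resolve a 'module:attr.path' string (assumed format) to an object."""
    module_name, sep, attr_path = ref.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {ref!r}")
    obj = importlib.import_module(module_name)
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# The config stores only the string; the callable is looked up on demand.
sqrt = resolve_import_ref("math:sqrt")
```

Because the config holds a string rather than the function itself, it can be dumped to TOML or JSON and reloaded in another process.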

Note

Medium Risk
Medium risk because it changes the public v1 environment loading/constructor surface (config defaults, loader signatures, and class-based defaults), which can break downstream environments that still pass kwargs or build config objects in function signatures.

Overview
Standardizes the v1 authoring/loading pattern around typed config envelopes: environments now define TasksetConfig/HarnessConfig/EnvConfig defaults explicitly, implement tasksets/harnesses as Taskset[Config] and Harness[Config], and construct envs via vf.Env.from_config(...) or vf.Env.loader(...) rather than bespoke load_taskset/load_harness wrappers.

Updates multiple bundled environments (e.g., alphabet_sort_v1, math_python_v1, wiki_search_v1, reverse_text_v1, BFCL, MCP search, nested harness, etc.) to move runtime defaults onto class attributes (_default_source, _default_rewards, _default_program, _default_toolsets, etc.), tighten config validation/derivation (including rejecting unknown v1 wrapper kwargs), and adjust reward/toolset wiring to be config-serializable.

Docs/examples/tests are rewritten to match the new API (including Env.config usage and explicit nested config defaults), Semgrep adds a new rule forbidding config-object construction in function signatures, and the project adds prime-pydantic-config[toml] as a dependency.

Reviewed by Cursor Bugbot for commit eddd017.

Note

Require config objects for Taskset and Harness construction in v1

  • Taskset and Harness constructors now accept only a single typed config object (TasksetConfig/HarnessConfig); all previous per-argument overrides are removed.
  • Introduces CallableConfig and SignalConfig for declaring callables and scoring entries declaratively in config, replacing direct callable/object injection.
  • All built-in environments (alphabet_sort, math_python, wiki_search, reverse_text, bfcl, harbor, etc.) are updated to define TasksetConfig/HarnessConfig/EnvConfig subclasses with defaults, and their loader functions now accept typed config objects.
  • All harness subclasses (OpenCode, MiniSWEAgent, Pi, RLM, Terminus2) are refactored to use _configure_runtime for post-construction runtime setup instead of constructor kwargs.
  • Config classes now enforce serializable values via validate_serializable_value; callables and PathLike objects are rejected at construction time and must be provided as import-ref strings.
  • Risk: any code constructing Taskset, Harness, or harness subclasses with keyword arguments will break; configs containing callables or Path objects will raise TypeError at validation time.
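The serializable-value rule described above can be sketched as a recursive check. This is an illustration of the idea only, not the library's validate_serializable_value: the function name check_serializable, the accepted types, and the error messages are all assumptions.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def check_serializable(value: object, path: str = "config") -> None:
    """Raise TypeError for values that cannot live in a plain config."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return
    if isinstance(value, os.PathLike):
        raise TypeError(f"{path}: PathLike is not allowed; pass a string")
    if callable(value):
        raise TypeError(f"{path}: callables must be import-ref strings")
    if isinstance(value, Mapping):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError(f"{path}: mapping keys must be strings")
            check_serializable(item, f"{path}.{key}")
        return
    if isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            check_serializable(item, f"{path}[{index}]")
        return
    raise TypeError(f"{path}: unsupported type {type(value).__name__}")


def rejects(value: object) -> bool:
    # Small helper for the examples below.
    try:
        check_serializable(value)
    except TypeError:
        return True
    return False


ok_config = {"split": "test", "weights": [1, 2.0], "reward": "my_pkg:score"}
path_rejected = rejects({"source": Path("data.jsonl")})
callable_rejected = rejects({"reward": len})
```

Failing at construction time, with the offending field path in the message, is what makes the TypeError mentioned in the risk note actionable for downstream environments.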

Changes since #1392 opened

  • Modified constructors of MiniSWEAgent, OpenCode, Pi, RLM, and Terminus2 harnesses to defer program override handling [00be9b2]
  • Added base_harness_config utility function to verifiers.v1.packages.harnesses.command module [00be9b2]
  • Changed prepare_typed_env_config utility to use from_config instead of model_validate for config coercion [00be9b2]
  • Added test coverage for environment loading config coercion and packaged command harness program override behavior [00be9b2]
  • Converted Taskset and Harness base classes to generic classes [2c0691b]
  • Migrated taskset implementations to use generic type binding [2c0691b]
  • Migrated harness implementations to use generic type binding [2c0691b]
  • Replaced dynamic config type resolution with direct class references in config coercion [2c0691b]
  • Added tests validating generic config binding behavior [2c0691b]
  • Updated test utilities and fixtures to use generic type binding [2c0691b]
  • Updated documentation and code examples to demonstrate generic-based config binding [2c0691b]
  • Refactored Harness.__init_subclass__ and Taskset.__init_subclass__ to extract generic type arguments from __orig_bases__, inherit _config_cls from base classes with defaults to HarnessConfig and TasksetConfig respectively, and validate that the resolved config class is a subclass of the appropriate base config type, raising TypeError if validation fails [25857e6]
  • Changed Harness.__init__ and Taskset.__init__ constructors to always coerce config through type(self)._config_cls.from_config(config), removing the previous heuristic that would switch to type(config) when _config_cls was the base config class [25857e6]
  • Reworked make_taskset and make_harness test helpers to normalize input config through TasksetConfig.from_config and HarnessConfig.from_config respectively, build data dictionaries from base_config.model_dump(exclude_none=True) overlaid with runtime values, and instantiate using type(base_config).from_config(config_value(data)) instead of dynamically selecting config classes and calling model_validate [25857e6]
  • Removed config_data utility helper from test infrastructure [25857e6]
  • Added explicit type annotation to local variable in verifiers.v1.harness.Harness.__init__ method [0d70fe0]
  • Implemented class-level default mechanism in v1 framework base classes [264ca51]
  • Modified config serialization to preserve child class defaults [264ca51]
  • Migrated environment modules to use Taskset and Harness subclasses with class-level defaults [264ca51]
  • Refactored toolset configuration from embedded mappings to scalar fields with auto-addition [264ca51]
  • Removed dependency injection from reward functions [264ca51]
  • Added validation to reject unsupported configuration parameters [264ca51]
  • Updated documentation and examples to demonstrate class-level defaults pattern [264ca51]
  • Added test coverage for config extension and default behavior [264ca51]
  • Moved program configuration from config class to harness class [e8a0bff]
  • Modified toolset.Toolset.__init__ constructor to resolve config objects into actual instances [58105f6]
  • Introduced local BaseConfig class with Pydantic extra field validation [58105f6]
  • Moved default rewards from EnvTasksetConfig field to EnvTaskset class attribute [58105f6]
  • Updated pydantic-config dependency specification [58105f6]
  • Refactored tag extraction in lcs_reward_func to use module-level constant [58105f6]
  • Added test for inline toolset object reference resolution [58105f6]
  • Changed harness instantiation class in documentation example from vf.Harness to ReplayHarness [fa30096]
  • Refactored WikiSearchTasksetConfig initialization to use direct field assignment instead of explicit toolsets mapping, and added conditional logic to wiki_search_v1.load_taskset to only register default 'wiki' toolset when toolsets field is not present in config [fdab488]
  • Replaced local BaseConfig class implementation with BaseConfig imported from pydantic_config package in verifiers v1 config module [fdab488]
  • Changed pydantic-config dependency from version-constrained package to Git repository source [fdab488]
  • Added test utilities and test cases for WikiSearch v1 taskset behavior including stub installation for external dependencies and validation of default and explicit toolset registration [fdab488]
  • Modified user resolution logic in Harness.__init__ and Taskset.__init__ constructors to honor non-None default user values specified on config models [c418244]
  • Added tests to verify that default users specified in config models are properly activated in Taskset and Harness instances [c418244]
  • Introduced ConfigBound[ConfigT] base class and refactored Taskset and Harness to inherit from it [a0210fd]
  • Added Env.from_config classmethod for standardized environment construction from config and class references [a0210fd]
  • Removed load_taskset and load_harness functions from all environment modules and reimplemented load_environment to delegate to Env.from_config [a0210fd]
  • Updated documentation, examples, and scaffolding templates to use Env.from_config construction pattern [a0210fd]
  • Reworked test cases to validate generic config binding inference and Env.from_config construction behavior [a0210fd]
  • Changed taskset and harness instantiation pattern to allow direct construction without requiring separate config classes or a rows method [f157d46]
  • Removed base config import and field overlay functionality from configuration system [f157d46]
  • Consolidated test validation of generic config binding, type inference, and Env.from_config behavior [f157d46]
  • Refactored test stub infrastructure to use a generic kwargs-accepting stub class [f157d46]
  • Consolidated wiki search test validation of default and explicit toolset behavior [f157d46]
  • Added factory methods Env.config and Env.loader to the Env class in verifiers.v1.env module [dba9f3f]
  • Extended Env.from_config classmethod to accept mapping-based configs and widened taskset and harness type signatures [dba9f3f]
  • Replaced environment loader implementations across multiple environment modules to use vf.Env.loader factory [dba9f3f]
  • Replaced or introduced EnvConfig classes using vf.Env.config factory in environment modules [dba9f3f]
  • Removed generic type parameters from Taskset and Harness subclasses across multiple environment modules [dba9f3f]
  • Updated test cases to pass mapping configs instead of config object instances to environment loaders [dba9f3f]
  • Removed utility functions config_model_mapping and omit_none from verifiers.v1 modules [dba9f3f]
  • Removed tau2-related test cases and helper stubs from test suite [dba9f3f]
  • Modified default resolution logic in Harness.__init__ and Taskset.__init__ to prioritize config class defaults over class-level defaults [3d82213]
  • Refactored judge configuration in wiki_search_v1 from task-level to factory-level by introducing judge_reward_factory and updating load_taskset [3d82213]
  • Added tests verifying config class defaults take precedence over class-level defaults for both taskset and harness configurations [3d82213]
  • Extended test_wiki_search_v1_default_and_explicit_toolsets to verify default reward presence and task row structure [3d82213]
  • Changed model_dump calls in environments.bfcl_v3.load_environment to pass exclude_none=True instead of exclude_unset=True when converting base_taskset_config and base_harness_config to dictionaries for validation into BFCLTasksetConfig and BFCLHarnessConfig [ec0d15d]
  • Refactored verifiers.v1.harness.Harness.__init__ to accept an optional config parameter instead of instantiating a default HarnessConfig at function definition time [ec0d15d]
  • Introduced RuntimeOwnerMixin class in verifiers.v1.utils.runtime_owner_utils and refactored Taskset and Harness initialization to use mixin-based configuration [11dfe5d]
  • Changed loader function and constructor signatures across the framework to accept Config | None (defaulting to None) instead of constructing config objects as default values, and updated Env.from_config and Env.loader to accept an env_config type parameter [11dfe5d]
  • Removed public add_metric, add_reward, add_advantage, add_toolset, add_stop, add_setup, add_update, and add_cleanup methods from Taskset and Harness classes [11dfe5d]
  • Added explicit_config_data and resolved_config_data functions to verifiers.v1.utils.config_utils and updated config data extraction logic [11dfe5d]
  • Introduced generic CommandHarness base class in verifiers.v1.packages.harnesses.command and refactored specific command-based harnesses to extend it [11dfe5d]
  • Implemented _configure_from_config hooks in environment-specific tasksets and harnesses to add default toolsets and rewards when not explicitly provided in config [11dfe5d]
  • Refactored HarborTaskset to read runtime values directly from self.config instead of mirrored instance attributes [11dfe5d]
  • Updated all documentation files and examples to reflect new loader signatures with Optional config parameters defaulting to None and explicit env_config parameter in vf.Env.from_config [11dfe5d]
  • Replaced verifiers-v1-loaders-require-config Semgrep rule with verifiers-v1-no-config-object-defaults rule in .semgrep/verifiers.yml [11dfe5d]
  • Removed CallableConfigEntry type alias and updated references to use CallableEntry directly [11dfe5d]
  • Updated test imports and helpers in tests/test_v1_harbor_cli.py to reference renamed default constants from verifiers.v1.packages.harnesses.configs [11dfe5d]
  • Simplified environment loaders to directly reference taskset and harness classes instead of wrapper functions [11dfe5d]
  • Removed load_taskset loader functions across environments and updated callers to directly instantiate taskset classes [4e0caee]
  • Moved default runtime owner attribute initializations from Harness and Taskset classes to RuntimeOwnerMixin [4e0caee]
  • Refactored CommandHarness to remove hook methods and change runtime configuration [4e0caee]
  • Replaced config_data and model_config_data wrapper functions with their explicit counterparts [4e0caee]
  • Changed harness utility function signatures to require explicit parameters instead of defaults [4e0caee]
  • Refactored RLM harness initialization and build script to use direct constant references [4e0caee]
  • Removed helper functions and changed default data sources in mcp_search_env [4e0caee]
  • Removed CallableConfigEntry type alias and its dependency [4e0caee]
  • Updated test assertions to work with direct taskset instantiation and removed loader function monkeypatching [4e0caee]
  • Modified verifiers.v1.env.Env.config classmethod to validate that at least one configuration type can be inferred or provided [b38f1f2]
  • Added test coverage for configuration type validation when using plain builder functions without _config_cls attributes [b38f1f2]
  • Added task_names property to HarborTaskset class [9684c24]
  • Added cpu_cores property to HarborTaskset class [9684c24]
  • Modified Env.config classmethod to conditionally require taskset and harness fields based on whether their respective config classes declare required fields [d5cef6e]
  • Added test test_env_config_allows_required_child_configs to verify conditional requirement behavior for nested config fields [d5cef6e]
  • Changed configuration object instantiation in bfcl_v3.load_environment to use explicit_config_data() instead of model_dump(exclude_none=True) [2580825]
  • Added test assertions to verify rewards field is not in model_fields_set and validates resolved reward name [2580825]
  • Migrated from pydantic-config to prime-pydantic-config package dependency [b61b2cf]
  • Reworked vf.Harness construction to use vf.HarnessConfig passed via a 'config' parameter instead of passing program and sandbox parameters directly [41ce81c]
  • Changed program bindings and channel definitions to reference callable functions via 'fn' string identifiers [41ce81c]
  • Updated TasksetConfig examples to declare objects and bindings for answer extractor [41ce81c]
  • Added 'index' object entries and bindings to search toolset TOML examples [41ce81c]
  • Added child tool example defining a Toolset with object factory and binding for a child Harness [41ce81c]
  • Added generic type parameters to base class declarations for vf.Taskset and vf.Harness [33be5ca]
  • Implemented automatic derivation of system_prompt in MathPythonTasksetConfig from harness.pip_install_packages [33be5ca]
  • Added tests verifying MathPython v1 environment system_prompt derivation behavior [33be5ca]
  • Added validation to enforce that the write parameter in Toolset.__init__ must be a boolean value, raising a TypeError with the message 'Toolset write must be a boolean.' when a non-boolean value is provided [4a0b0eb]
  • Added runtime type validation for the write parameter in verifiers.v1.toolset.toolset_from_mapping [4d82131]
  • Added teardown handler configuration support to LifecycleConfig class and RuntimeOwnerMixin mixin [ce2dc61]
  • Added test coverage for teardown handler configuration and execution in v1 harness and taskset [ce2dc61]
  • Replaced concrete vf.Taskset and vf.Harness classes with generic parameterized classes vf.Taskset[Config] and vf.Harness[Config] and introduced typed config subclasses pattern requiring TasksetConfig, HarnessConfig, and MyEnvConfig subclasses bound via class definitions like class MyTaskset(vf.Taskset[MyTasksetConfig]) [ffdb48f]
  • Replaced load_taskset and load_harness loader functions with load_environment(config: MyEnvConfig | None = None) -> vf.Env entrypoint pattern using vf.Env.from_config(...) or vf.Env.loader(...) for environment construction [ffdb48f]
  • Introduced TasksetConfig.objects and TasksetConfig.bindings pattern for shared extractor and factory import references in config classes [ffdb48f]
  • Updated evaluation override patterns to distinguish between legacy v0 constructor kwargs and v1 config-based overrides via config.taskset and config.harness [ffdb48f]
  • Changed the MyEnvConfig parameter in the example function from a default instance to an optional parameter, and modified the vf.Env.from_config call to accept env_config=MyEnvConfig as an explicit argument alongside the existing taskset=MyTaskset argument [eddd017]

Macroscope summarized a1c64f8.

Comment thread verifiers/v1/config.py
@xeophon xeophon force-pushed the codex/remove-v1-config-classes branch from 39cffcd to 29c5099 Compare May 15, 2026 13:49
Comment thread environments/hello_rlm_v1/hello_rlm_v1.py Outdated
@xeophon xeophon force-pushed the codex/remove-v1-config-classes branch from 29c5099 to 1aac155 Compare May 15, 2026 14:02
Comment thread verifiers/v1/config.py
Comment thread verifiers/v1/taskset.py Outdated
@xeophon xeophon force-pushed the codex/remove-v1-config-classes branch from 1aac155 to 8f4c124 Compare May 15, 2026 14:19
Comment thread verifiers/v1/packages/harnesses/rlm.py
@xeophon xeophon force-pushed the codex/remove-v1-config-classes branch from 8f4c124 to e6a8fd4 Compare May 15, 2026 17:09
@xeophon xeophon force-pushed the codex/remove-v1-config-classes branch from e6a8fd4 to 43e463f Compare May 15, 2026 20:25
@xeophon xeophon changed the title Remove v1 harness and taskset config classes Rework v1 harness and taskset config classes May 15, 2026
Comment thread environments/math_python/math_python_v1.py Outdated
Comment thread verifiers/v1/utils/taskset_utils.py Outdated
Comment thread verifiers/v1/utils/taskset_utils.py
Comment thread verifiers/v1/toolset.py
@xeophon xeophon marked this pull request as ready for review May 16, 2026 18:53
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78e218e570

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pyproject.toml Outdated
Comment thread pyproject.toml Outdated
Comment thread environments/math_python/math_python.py
macroscopeapp Bot commented May 16, 2026

Approvability

Verdict: Needs human review

Diff is too large for automated approval analysis. A human reviewer should evaluate this PR.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6fbba3e059

Comment thread verifiers/v1/config.py Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19d5ef6884

Comment thread verifiers/v1/taskset.py Outdated
Comment thread docs/overview.md Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00f39d0f0e

Comment thread verifiers/v1/packages/harnesses/opencode.py Outdated
Comment thread verifiers/utils/env_utils.py Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f8f8f1c3b8

Comment thread verifiers/v1/config.py Outdated
mikasenghaas previously approved these changes May 17, 2026
Member

@mikasenghaas mikasenghaas left a comment

🧑‍🍳🧑‍🍳

Comment thread assets/lab/environments/AGENTS.md Outdated
Comment thread assets/lab/environments/AGENTS.md
Comment thread environments/alphabet_sort/alphabet_sort_v1.py Outdated
Comment thread environments/hello_group_reward_v1/hello_group_reward_v1.py Outdated
Comment thread environments/bfcl_v3/bfcl_v3.py
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec0d15d9dc

Comment thread verifiers/v1/taskset.py Outdated
Comment thread tests/test_envs.py
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9684c24c48

Comment thread verifiers/v1/env.py Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5cef6e1bc

Comment thread environments/bfcl_v3/bfcl_v3.py
Comment thread docs/byo-harness.md
Comment thread environments/dspy_rlm/dspy_rlm.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41ce81c5cd

Comment thread environments/math_python/math_python_v1.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33be5ca1ac

Comment thread verifiers/v1/toolset.py Outdated
chatgpt-codex-connector[bot]

This comment was marked as resolved.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce2dc6162c

extra_config_specs: list[str] | None = None
install_python: bool = True
system_prompt: PromptInput | None = None
sandbox: SandboxConfig | None = SandboxConfig()

P2: Preserve command harness sandbox defaults

When these command harness configs are constructed with defaults, CommandHarness.sandbox_value() only falls back to True when config.sandbox is None; this SandboxConfig() value is instead merged over DEFAULT_COMMAND_SANDBOX, so MiniSWEAgent/Pi/Terminus2 default runs now inherit the generic sandbox timeout_minutes=60 instead of the command harness default 120. Long-running agent tasks that previously had the packaged 2-hour sandbox budget can be terminated after 1 hour unless users explicitly override the sandbox.
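A minimal sketch of the precedence described in this comment, using hypothetical stand-ins rather than the actual verifiers classes. `SandboxConfig` here is a plain dataclass, `resolve_sandbox` is an illustrative helper (the real body of `CommandHarness.sandbox_value()` is not shown in this thread), and the merge is assumed to be a field-by-field overlay of the config onto `DEFAULT_COMMAND_SANDBOX`. The 60 and 120 minute values are the ones cited in the comment.

```python
from dataclasses import asdict, dataclass

# Hypothetical stand-in for the generic sandbox config (default cited as 60 minutes).
@dataclass
class SandboxConfig:
    timeout_minutes: int = 60


# Hypothetical stand-in for the packaged command harness default (cited as 120 minutes).
DEFAULT_COMMAND_SANDBOX = {"timeout_minutes": 120}


def resolve_sandbox(config_sandbox):
    """Illustrative merge: the harness default applies only when the config is None."""
    if config_sandbox is None:
        return dict(DEFAULT_COMMAND_SANDBOX)
    # A config instance, even a default-constructed one, overlays every field it carries.
    return {**DEFAULT_COMMAND_SANDBOX, **asdict(config_sandbox)}


unset = resolve_sandbox(None)                      # harness default survives
defaulted = resolve_sandbox(SandboxConfig())       # generic default wins
explicit = resolve_sandbox(SandboxConfig(timeout_minutes=30))
```

Under these assumptions, a default-constructed `SandboxConfig()` replaces the 120-minute budget with 60, and only `None` preserves the harness value, which is the behavior change the comment flags.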

Member Author

i think this is fine, sandbox > harness

Comment thread docs/environments.md
Comment thread README.md Outdated

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit eddd017.



def load_train_rows(num_train_examples: int):
    return load_rows("train", num_train_examples)

Source functions require parameters but have no defaults

Medium Severity

load_train_rows(num_train_examples: int) and load_eval_rows(num_eval_examples: int) have required parameters with no defaults. When used as _default_source / _default_eval_source, they're called via rows_from_source which injects matching config fields as kwargs. This works when the config has those fields. However, the parameter num_train_examples has no default, so if the config field name ever diverges or the source is called outside the config injection path, it will raise a TypeError. The old code used lambdas that closed over config values directly, making the coupling explicit. This applies identically to both dspy_rlm.py and openai_agents_env.py.
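The coupling described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `rows_from_source` below is a hypothetical reimplementation that matches config field names to parameter names via `inspect.signature`, `load_rows` is a toy loader, and `load_train_rows_with_default` is an invented variant showing one possible mitigation.

```python
import inspect


def rows_from_source(source, config: dict):
    """Hypothetical sketch: pass only the config fields whose names match parameters."""
    params = inspect.signature(source).parameters
    kwargs = {name: config[name] for name in params if name in config}
    return source(**kwargs)


def load_rows(split: str, n: int):
    # Toy loader standing in for the real dataset access.
    return [{"split": split, "index": i} for i in range(n)]


def load_train_rows(num_train_examples: int):
    # Required parameter: works only if the config has a field with this exact name.
    return load_rows("train", num_train_examples)


def load_train_rows_with_default(num_train_examples: int = 2):
    # Invented variant: a default keeps the call valid if the names diverge.
    return load_rows("train", num_train_examples)


matched = rows_from_source(load_train_rows, {"num_train_examples": 3})

try:
    rows_from_source(load_train_rows, {"num_examples": 3})  # field name diverged
    diverged_error = None
except TypeError as exc:
    diverged_error = exc

fallback = rows_from_source(load_train_rows_with_default, {"num_examples": 3})
```

In this sketch the mismatched field name is silently dropped and the failure surfaces as a `TypeError` at call time. A default avoids the exception but also hides the mismatch, so whether it is the right fix depends on how strict the injection path is meant to be.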
