Rework v1 harness and taskset config classes by xeophon · Pull Request #1392 · PrimeIntellect-ai/verifiers

xeophon · 2026-05-15T13:23:43Z

Summary

rebuild v1 configs as strict Pydantic models with explicit nested defaults (taskset: MyTasksetConfig = MyTasksetConfig(), harness: MyHarnessConfig = MyHarnessConfig())
make v1 taskset/harness construction config-only and fresh by default: public constructors/loaders take config=None instead of constructing config objects in function signatures
bind taskset/harness config types through generics (Taskset[MyTasksetConfig], Harness[MyHarnessConfig]) instead of a public config_type hook
keep configs serializable by using primitives, containers, config models, and import-ref strings only
keep runtime defaults on taskset/harness classes (_default_rewards, _default_program, _default_toolsets, etc.) so users do not have to write source/reward/program paths in every config
centralize repeated runtime-owner setup for tasksets/harnesses: config coercion, default precedence, toolset merging, lifecycle handler merging, and add_* mutation APIs now share one internal path
centralize packaged command harness setup in CommandHarness, so OpenCode/MiniSWEAgent/Pi/Terminus2 share command-program and sandbox wiring
add vf.Env.config(...), vf.Env.loader(...), and vf.Env.from_config(...) for the common typed env wiring paths, and use them in examples and simple v1 envs
update TOML/env loading so raw config mappings validate through the annotated EnvConfig child config types at loader boundaries
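
The default-precedence rule mentioned above can be illustrated with a small standard-library sketch. It is not the verifiers implementation: `TasksetConfigSketch` and `TasksetSketch` are hypothetical stand-ins, and the rule shown (a value set on the config wins, otherwise the class-level default applies) is an assumption drawn from the summary text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TasksetConfigSketch:
    # None means "the user did not set this field".
    rewards: tuple[str, ...] | None = None


class TasksetSketch:
    # Class-level fallback, analogous in spirit to _default_rewards.
    _default_rewards: tuple[str, ...] = ("exact_answer",)

    def __init__(self, config: TasksetConfigSketch | None = None) -> None:
        # Build a fresh config per instance instead of sharing one default object.
        self.config = config if config is not None else TasksetConfigSketch()
        # A value on the config wins; otherwise fall back to the class default.
        if self.config.rewards is not None:
            self.rewards = self.config.rewards
        else:
            self.rewards = type(self)._default_rewards


default_taskset = TasksetSketch()
custom_taskset = TasksetSketch(TasksetConfigSketch(rewards=("length_penalty",)))
```

Here `default_taskset.rewards` comes from the class attribute, while `custom_taskset.rewards` comes from the config, so users only write a field when they want to override it.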

Concrete env shape

Config classes expose the knobs users change; taskset/harness classes own the implementation. vf.Env.config(...) builds the typed envelope from those classes, and vf.Env.loader(...) gives the package its load_environment entrypoint.

import verifiers as vf


@vf.reward(weight=1.0)
async def exact_answer(task, state) -> float:
    return float(task["answer"] in str(state.get("completion") or ""))


class ReverseTasksetConfig(vf.TasksetConfig):
    split: str = "train"


class ReverseTaskset(vf.Taskset[ReverseTasksetConfig]):
    _default_rewards = (exact_answer,)

    def rows(self) -> list[dict[str, object]]:
        rows = [
            {
                "prompt": [{"role": "user", "content": "Reverse abc."}],
                "answer": "cba",
                "split": "train",
            },
            {
                "prompt": [{"role": "user", "content": "Reverse prime."}],
                "answer": "emirp",
                "split": "test",
            },
        ]
        return [row for row in rows if row["split"] == self.config.split]


class ReverseHarnessConfig(vf.HarnessConfig):
    max_turns: int = 2


class ReverseHarness(vf.Harness[ReverseHarnessConfig]):
    pass


ReverseEnvConfig = vf.Env.config(taskset=ReverseTaskset, harness=ReverseHarness)
load_environment = vf.Env.loader(
    taskset=ReverseTaskset,
    harness=ReverseHarness,
    env_config=ReverseEnvConfig,
)

For hand-written loaders, keep config defaults fresh and route through the typed envelope:

def load_environment(config: ReverseEnvConfig | None = None) -> vf.Env:
    return vf.Env.from_config(
        config,
        taskset=ReverseTaskset,
        harness=ReverseHarness,
        env_config=ReverseEnvConfig,
    )
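
Why `config=None` matters can be shown without verifiers at all. The sketch below uses a hypothetical `LoaderConfigSketch` dataclass to contrast a default built in the function signature (created once, at definition time, and shared by every call) with a default built inside the function body.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LoaderConfigSketch:
    tags: list[str] = field(default_factory=list)


def load_shared(config: LoaderConfigSketch = LoaderConfigSketch()) -> LoaderConfigSketch:
    # Anti-pattern: the default object is built once, when the function is defined.
    config.tags.append("loaded")
    return config


def load_fresh(config: LoaderConfigSketch | None = None) -> LoaderConfigSketch:
    # Preferred: build a new config on every call that omits one.
    if config is None:
        config = LoaderConfigSketch()
    config.tags.append("loaded")
    return config


shared_first = load_shared()
shared_second = load_shared()
fresh_first = load_fresh()
fresh_second = load_fresh()
```

After these calls, `shared_first` and `shared_second` are the same object and carry two `"loaded"` tags, while each `load_fresh()` result is independent.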

Building configs in Python

# Use defaults.
default_config = ReverseEnvConfig()

# Override typed child configs directly.
eval_config = ReverseEnvConfig(
    taskset=ReverseTasksetConfig(split="test"),
    harness=ReverseHarnessConfig(max_turns=4),
)

# Validate raw mappings at the config-loader boundary.
raw_config = ReverseEnvConfig.from_config(
    {
        "taskset": {"split": "test"},
        "harness": {"max_turns": 4},
    }
)

env = load_environment(raw_config)
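
As a rough picture of what validating a raw mapping at the loader boundary involves, here is a standard-library sketch. It uses dataclasses rather than Pydantic, and `strict_build` and `env_from_mapping` are hypothetical helpers, not verifiers APIs; the point is only that unknown keys fail fast and omitted sections keep their defaults.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class TasksetCfg:
    split: str = "train"


@dataclass(frozen=True)
class HarnessCfg:
    max_turns: int = 2


@dataclass(frozen=True)
class EnvCfg:
    taskset: TasksetCfg = field(default_factory=TasksetCfg)
    harness: HarnessCfg = field(default_factory=HarnessCfg)


def strict_build(cls: type, raw: Mapping[str, Any]) -> Any:
    """Build a dataclass from a mapping, rejecting keys the class does not declare."""
    known = {f.name for f in fields(cls)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown fields for {cls.__name__}: {sorted(unknown)}")
    return cls(**raw)


def env_from_mapping(raw: Mapping[str, Any]) -> EnvCfg:
    """Coerce a raw mapping into the typed envelope, one child section at a time."""
    unknown = set(raw) - {"taskset", "harness"}
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    return EnvCfg(
        taskset=strict_build(TasksetCfg, raw.get("taskset", {})),
        harness=strict_build(HarnessCfg, raw.get("harness", {})),
    )


partial = env_from_mapping({"taskset": {"split": "test"}})
```

`partial.taskset.split` is `"test"` while `partial.harness.max_turns` keeps its default, and a misspelled key such as `{"taskset": {"splt": "test"}}` raises instead of being silently ignored.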

TOML override shape

Users only write the fields they want to override. The environment implementation remains in Python.

[[eval]]
id = "reverse-v1"
model = "openai/gpt-5.4-mini"
sampling_args = { max_tokens = 1024 }

[eval.taskset]
split = "test"

[eval.harness]
max_turns = 4

Hosted training/RL uses the same child sections:

model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
rollouts_per_example = 8

[[env]]
id = "primeintellect/reverse-v1"

[env.taskset]
split = "train"

[env.harness]
max_turns = 2

Review follow-up

route legacy v1=True wrapper branches through typed v1 env configs instead of old mirrored kwargs
normalize raw mapping show/hide strings as single tool names
preserve full config aliases in source-loader kwargs and avoid passing positional-only parameters by keyword
register file-loaded modules in sys.modules where import-ref config defaults need them during tests
bind taskset/harness config types through generics and share that inference internally
move default source/reward/program/toolset behavior out of config fields and onto taskset/harness classes
replace repeated loader boilerplate with vf.Env.from_config(...) / vf.Env.loader(...) where construction is standard
remove signature-time config construction and enforce config=None with an updated v1 semgrep policy
factor shared taskset/harness lifecycle setup into RuntimeOwnerMixin
factor packaged CLI harness runtime setup into CommandHarness

Validation

uv run --frozen ruff check --fix .
uv run --frozen ruff format
uv run --frozen pytest tests/test_v1_config_extension.py tests/test_v1_harbor_cli.py tests/test_v1_runtime_lifecycle.py tests/test_v1_taskset_bindings.py tests/test_v1_bfcl.py tests/test_mcp_search_env.py tests/test_langchain_deep_agents_wikispeedia.py tests/test_v1_rlm_swe.py (225 passed)
uv run --frozen pre-commit run --all-files
push hooks: ruff, format, Semgrep v1 policy, AGENTS sync, ty

Notes

Backwards compatibility is intentionally not preserved for the old mirrored v1 constructor kwargs.
Config objects are the serialization boundary. Runtime APIs such as add_reward(...), class defaults such as _default_rewards, and methods such as rows() may still use live Python objects because they are not serialized config.
pyproject.toml and uv.lock are intentionally untouched by the latest refactor pass.
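
The idea of keeping configs serializable by storing import-ref strings instead of live objects can be sketched as follows. The `module:attribute` format and the `resolve_import_ref` helper are assumptions for illustration; the exact string format verifiers accepts is not stated in this PR description.

```python
import importlib
import json
from typing import Any


def resolve_import_ref(ref: str) -> Any:
    """Resolve a 'module:attribute' string to the object it names (assumed format)."""
    module_name, sep, attr_path = ref.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {ref!r}")
    obj: Any = importlib.import_module(module_name)
    # Walk dotted attribute paths such as "JSONEncoder.default".
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# The config stores only the string; the callable is looked up at runtime.
config_value = "json:dumps"
resolved = resolve_import_ref(config_value)
```

The string survives a round trip through TOML or JSON, whereas the function object it resolves to would not.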

Note

Medium Risk
Medium risk because it changes the public v1 environment loading/constructor surface (config defaults, loader signatures, and class-based defaults), which can break downstream environments that still pass kwargs or build config objects in function signatures.

Overview
Standardizes the v1 authoring/loading pattern around typed config envelopes: environments now define TasksetConfig/HarnessConfig/EnvConfig defaults explicitly, implement tasksets/harnesses as Taskset[Config] and Harness[Config], and construct envs via vf.Env.from_config(...) or vf.Env.loader(...) rather than bespoke load_taskset/load_harness wrappers.

Updates multiple bundled environments (e.g., alphabet_sort_v1, math_python_v1, wiki_search_v1, reverse_text_v1, BFCL, MCP search, nested harness, etc.) to move runtime defaults onto class attributes (_default_source, _default_rewards, _default_program, _default_toolsets, etc.), tighten config validation/derivation (including rejecting unknown v1 wrapper kwargs), and adjust reward/toolset wiring to be config-serializable.

Docs/examples/tests are rewritten to match the new API (including Env.config usage and explicit nested config defaults), Semgrep adds a new rule forbidding config-object construction in function signatures, and the project adds prime-pydantic-config[toml] as a dependency.

^{Reviewed by Cursor Bugbot for commit eddd017. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Require config objects for Taskset and Harness construction in v1

Taskset and Harness constructors now accept only a single typed config object (TasksetConfig/HarnessConfig); all previous per-argument overrides are removed.
Introduces CallableConfig and SignalConfig for declaring callables and scoring entries declaratively in config, replacing direct callable/object injection.
All built-in environments (alphabet_sort, math_python, wiki_search, reverse_text, bfcl, harbor, etc.) are updated to define TasksetConfig/HarnessConfig/EnvConfig subclasses with defaults, and their loader functions now accept typed config objects.
All harness subclasses (OpenCode, MiniSWEAgent, Pi, RLM, Terminus2) are refactored to use _configure_runtime for post-construction runtime setup instead of constructor kwargs.
Config classes now enforce serializable values via validate_serializable_value; callables and PathLike objects are rejected at construction time and must be provided as import-ref strings.
Risk: any code constructing Taskset, Harness, or harness subclasses with keyword arguments will break; configs containing callables or Path objects will raise TypeError at validation time.
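
A check of the kind described above might look like the sketch below. `check_serializable` is a hypothetical stand-in, not the library's `validate_serializable_value`; it assumes that only primitives, string-keyed dicts, lists, and tuples are allowed, and it raises `TypeError` for callables and `PathLike` objects as the summary describes.

```python
import os
from pathlib import Path
from typing import Any


def check_serializable(value: Any, path: str = "config") -> None:
    """Raise TypeError unless value is built from primitives and plain containers."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return
    if isinstance(value, os.PathLike):
        raise TypeError(f"{path}: PathLike values are not allowed; use a string")
    if callable(value):
        raise TypeError(f"{path}: callables are not allowed; use an import-ref string")
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError(f"{path}: dict keys must be strings")
            check_serializable(item, f"{path}.{key}")
        return
    if isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            check_serializable(item, f"{path}[{index}]")
        return
    raise TypeError(f"{path}: unsupported type {type(value).__name__}")


# Passes silently: only primitives and containers.
check_serializable({"split": "test", "limits": [1, 2.5, None]})
```

Passing `{"root": Path("data")}` or `{"reward": len}` to the same function raises `TypeError` with the offending path in the message.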

Changes since #1392 opened

Modified constructors of MiniSWEAgent, OpenCode, Pi, RLM, and Terminus2 harnesses to defer program override handling [00be9b2]
Added base_harness_config utility function to verifiers.v1.packages.harnesses.command module [00be9b2]
Changed prepare_typed_env_config utility to use from_config instead of model_validate for config coercion [00be9b2]
Added test coverage for environment loading config coercion and packaged command harness program override behavior [00be9b2]
Converted Taskset and Harness base classes to generic classes [2c0691b]
Migrated taskset implementations to use generic type binding [2c0691b]
Migrated harness implementations to use generic type binding [2c0691b]
Replaced dynamic config type resolution with direct class references in config coercion [2c0691b]
Added tests validating generic config binding behavior [2c0691b]
Updated test utilities and fixtures to use generic type binding [2c0691b]
Updated documentation and code examples to demonstrate generic-based config binding [2c0691b]
Refactored Harness.__init_subclass__ and Taskset.__init_subclass__ to extract generic type arguments from __orig_bases__, inherit _config_cls from base classes with defaults to HarnessConfig and TasksetConfig respectively, and validate that the resolved config class is a subclass of the appropriate base config type, raising TypeError if validation fails [25857e6]
Changed Harness.__init__ and Taskset.__init__ constructors to always coerce config through type(self)._config_cls.from_config(config), removing the previous heuristic that would switch to type(config) when _config_cls was the base config class [25857e6]
Reworked make_taskset and make_harness test helpers to normalize input config through TasksetConfig.from_config and HarnessConfig.from_config respectively, build data dictionaries from base_config.model_dump(exclude_none=True) overlaid with runtime values, and instantiate using type(base_config).from_config(config_value(data)) instead of dynamically selecting config classes and calling model_validate [25857e6]
Removed config_data utility helper from test infrastructure [25857e6]
Added explicit type annotation to local variable in verifiers.v1.harness.Harness.__init__ method [0d70fe0]
Implemented class-level default mechanism in v1 framework base classes [264ca51]
Modified config serialization to preserve child class defaults [264ca51]
Migrated environment modules to use Taskset and Harness subclasses with class-level defaults [264ca51]
Refactored toolset configuration from embedded mappings to scalar fields with auto-addition [264ca51]
Removed dependency injection from reward functions [264ca51]
Added validation to reject unsupported configuration parameters [264ca51]
Updated documentation and examples to demonstrate class-level defaults pattern [264ca51]
Added test coverage for config extension and default behavior [264ca51]
Moved program configuration from config class to harness class [e8a0bff]
Modified toolset.Toolset.__init__ constructor to resolve config objects into actual instances [58105f6]
Introduced local BaseConfig class with Pydantic extra field validation [58105f6]
Moved default rewards from EnvTasksetConfig field to EnvTaskset class attribute [58105f6]
Updated pydantic-config dependency specification [58105f6]
Refactored tag extraction in lcs_reward_func to use module-level constant [58105f6]
Added test for inline toolset object reference resolution [58105f6]
Changed harness instantiation class in documentation example from vf.Harness to ReplayHarness [fa30096]
Refactored WikiSearchTasksetConfig initialization to use direct field assignment instead of explicit toolsets mapping, and added conditional logic to wiki_search_v1.load_taskset to only register default 'wiki' toolset when toolsets field is not present in config [fdab488]
Replaced local BaseConfig class implementation with BaseConfig imported from pydantic_config package in verifiers v1 config module [fdab488]
Changed pydantic-config dependency from version-constrained package to Git repository source [fdab488]
Added test utilities and test cases for WikiSearch v1 taskset behavior including stub installation for external dependencies and validation of default and explicit toolset registration [fdab488]
Modified user resolution logic in Harness.__init__ and Taskset.__init__ constructors to honor non-None default user values specified on config models [c418244]
Added tests to verify that default users specified in config models are properly activated in Taskset and Harness instances [c418244]
Introduced ConfigBound[ConfigT] base class and refactored Taskset and Harness to inherit from it [a0210fd]
Added Env.from_config classmethod for standardized environment construction from config and class references [a0210fd]
Removed load_taskset and load_harness functions from all environment modules and reimplemented load_environment to delegate to Env.from_config [a0210fd]
Updated documentation, examples, and scaffolding templates to use Env.from_config construction pattern [a0210fd]
Reworked test cases to validate generic config binding inference and Env.from_config construction behavior [a0210fd]
Changed taskset and harness instantiation pattern to allow direct construction without requiring separate config classes or a rows method [f157d46]
Removed base config import and field overlay functionality from configuration system [f157d46]
Consolidated test validation of generic config binding, type inference, and Env.from_config behavior [f157d46]
Refactored test stub infrastructure to use a generic kwargs-accepting stub class [f157d46]
Consolidated wiki search test validation of default and explicit toolset behavior [f157d46]
Added factory methods Env.config and Env.loader to the Env class in verifiers.v1.env module [dba9f3f]
Extended Env.from_config classmethod to accept mapping-based configs and widened taskset and harness type signatures [dba9f3f]
Replaced environment loader implementations across multiple environment modules to use vf.Env.loader factory [dba9f3f]
Replaced or introduced EnvConfig classes using vf.Env.config factory in environment modules [dba9f3f]
Removed generic type parameters from Taskset and Harness subclasses across multiple environment modules [dba9f3f]
Updated test cases to pass mapping configs instead of config object instances to environment loaders [dba9f3f]
Removed utility functions config_model_mapping and omit_none from verifiers.v1 modules [dba9f3f]
Removed tau2-related test cases and helper stubs from test suite [dba9f3f]
Modified default resolution logic in Harness.__init__ and Taskset.__init__ to prioritize config class defaults over class-level defaults [3d82213]
Refactored judge configuration in wiki_search_v1 from task-level to factory-level by introducing judge_reward_factory and updating load_taskset [3d82213]
Added tests verifying config class defaults take precedence over class-level defaults for both taskset and harness configurations [3d82213]
Extended test_wiki_search_v1_default_and_explicit_toolsets to verify default reward presence and task row structure [3d82213]
Changed model_dump calls in environments.bfcl_v3.load_environment to pass exclude_none=True instead of exclude_unset=True when converting base_taskset_config and base_harness_config to dictionaries for validation into BFCLTasksetConfig and BFCLHarnessConfig [ec0d15d]
Refactored verifiers.v1.harness.Harness.__init__ to accept an optional config parameter instead of instantiating a default HarnessConfig at function definition time [ec0d15d]
Introduced RuntimeOwnerMixin class in verifiers.v1.utils.runtime_owner_utils and refactored Taskset and Harness initialization to use mixin-based configuration [11dfe5d]
Changed loader function and constructor signatures across the framework to accept Config | None (defaulting to None) instead of constructing config objects as default values, and updated Env.from_config and Env.loader to accept env_config type parameter [11dfe5d]
Removed public add_metric, add_reward, add_advantage, add_toolset, add_stop, add_setup, add_update, and add_cleanup methods from Taskset and Harness classes [11dfe5d]
Added explicit_config_data and resolved_config_data functions to verifiers.v1.utils.config_utils and updated config data extraction logic [11dfe5d]
Introduced generic CommandHarness base class in verifiers.v1.packages.harnesses.command and refactored specific command-based harnesses to extend it [11dfe5d]
Implemented _configure_from_config hooks in environment-specific tasksets and harnesses to add default toolsets and rewards when not explicitly provided in config [11dfe5d]
Refactored HarborTaskset to read runtime values directly from self.config instead of mirrored instance attributes [11dfe5d]
Updated all documentation files and examples to reflect new loader signatures with Optional config parameters defaulting to None and explicit env_config parameter in vf.Env.from_config [11dfe5d]
Replaced verifiers-v1-loaders-require-config Semgrep rule with verifiers-v1-no-config-object-defaults rule in .semgrep/verifiers.yml [11dfe5d]
Removed CallableConfigEntry type alias and updated references to use CallableEntry directly [11dfe5d]
Updated test imports and helpers in tests/test_v1_harbor_cli.py to reference renamed default constants from verifiers.v1.packages.harnesses.configs [11dfe5d]
Simplified environment loaders to directly reference taskset and harness classes instead of wrapper functions [11dfe5d]
Removed load_taskset loader functions across environments and updated callers to directly instantiate taskset classes [4e0caee]
Moved default runtime owner attribute initializations from Harness and Taskset classes to RuntimeOwnerMixin [4e0caee]
Refactored CommandHarness to remove hook methods and change runtime configuration [4e0caee]
Replaced config_data and model_config_data wrapper functions with their explicit counterparts [4e0caee]
Changed harness utility function signatures to require explicit parameters instead of defaults [4e0caee]
Refactored RLM harness initialization and build script to use direct constant references [4e0caee]
Removed helper functions and changed default data sources in mcp_search_env [4e0caee]
Removed CallableConfigEntry type alias and its dependency [4e0caee]
Updated test assertions to work with direct taskset instantiation and removed loader function monkeypatching [4e0caee]
Modified verifiers.v1.env.Env.config classmethod to validate that at least one configuration type can be inferred or provided [b38f1f2]
Added test coverage for configuration type validation when using plain builder functions without _config_cls attributes [b38f1f2]
Added task_names property to HarborTaskset class [9684c24]
Added cpu_cores property to HarborTaskset class [9684c24]
Modified Env.config classmethod to conditionally require taskset and harness fields based on whether their respective config classes declare required fields [d5cef6e]
Added test test_env_config_allows_required_child_configs to verify conditional requirement behavior for nested config fields [d5cef6e]
Changed configuration object instantiation in bfcl_v3.load_environment to use explicit_config_data() instead of model_dump(exclude_none=True) [2580825]
Added test assertions to verify rewards field is not in model_fields_set and validates resolved reward name [2580825]
Migrated from pydantic-config to prime-pydantic-config package dependency [b61b2cf]
Reworked vf.Harness construction to use vf.HarnessConfig passed via a 'config' parameter instead of passing program and sandbox parameters directly [41ce81c]
Changed program bindings and channel definitions to reference callable functions via 'fn' string identifiers [41ce81c]
Updated TasksetConfig examples to declare objects and bindings for answer extractor [41ce81c]
Added 'index' object entries and bindings to search toolset TOML examples [41ce81c]
Added child tool example defining a Toolset with object factory and binding for a child Harness [41ce81c]
Added generic type parameters to base class declarations for vf.Taskset and vf.Harness [33be5ca]
Implemented automatic derivation of system_prompt in MathPythonTasksetConfig from harness.pip_install_packages [33be5ca]
Added tests verifying MathPython v1 environment system_prompt derivation behavior [33be5ca]
Added validation to enforce that the write parameter in Toolset.__init__ must be a boolean value, raising a TypeError with the message 'Toolset write must be a boolean.' when a non-boolean value is provided [4a0b0eb]
Added runtime type validation for the write parameter in verifiers.v1.toolset.toolset_from_mapping [4d82131]
Added teardown handler configuration support to LifecycleConfig class and RuntimeOwnerMixin mixin [ce2dc61]
Added test coverage for teardown handler configuration and execution in v1 harness and taskset [ce2dc61]
Replaced concrete vf.Taskset and vf.Harness classes with generic parameterized classes vf.Taskset[Config] and vf.Harness[Config] and introduced typed config subclasses pattern requiring TasksetConfig, HarnessConfig, and MyEnvConfig subclasses bound via class definitions like class MyTaskset(vf.Taskset[MyTasksetConfig]) [ffdb48f]
Replaced load_taskset and load_harness loader functions with load_environment(config: MyEnvConfig | None = None) -> vf.Env entrypoint pattern using vf.Env.from_config(...) or vf.Env.loader(...) for environment construction [ffdb48f]
Introduced TasksetConfig.objects and TasksetConfig.bindings pattern for shared extractor and factory import references in config classes [ffdb48f]
Updated evaluation override patterns to distinguish between legacy v0 constructor kwargs and v1 config-based overrides via config.taskset and config.harness [ffdb48f]
Changed the MyEnvConfig parameter in the example function from a default instance to an optional parameter, and modified the vf.Env.from_config call to accept env_config=MyEnvConfig as an explicit argument alongside the existing taskset=MyTaskset argument [eddd017]
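
Several entries above describe inferring the config class from the generic argument in `__init_subclass__` via `__orig_bases__`. The following is a minimal sketch of that technique using only `typing`; `BaseConfig`, `ConfigBoundSketch`, `MyCfg`, `MyOwner`, and `Child` are hypothetical names, and the real `ConfigBound` class may differ in detail.

```python
from __future__ import annotations

from typing import Generic, TypeVar, get_args, get_origin


class BaseConfig:
    """Stand-in for a config base class."""


ConfigT = TypeVar("ConfigT", bound=BaseConfig)


class ConfigBoundSketch(Generic[ConfigT]):
    # Inherited by subclasses that do not bind a generic argument themselves.
    _config_cls: type = BaseConfig

    def __init_subclass__(cls, **kwargs: object) -> None:
        super().__init_subclass__(**kwargs)
        # Look only at this class's own bases, not inherited __orig_bases__.
        for base in cls.__dict__.get("__orig_bases__", ()):
            origin = get_origin(base)
            if isinstance(origin, type) and issubclass(origin, ConfigBoundSketch):
                (arg,) = get_args(base)
                if isinstance(arg, TypeVar):
                    continue  # still generic, nothing to bind yet
                if not (isinstance(arg, type) and issubclass(arg, BaseConfig)):
                    raise TypeError(f"{cls.__name__}: {arg!r} is not a config class")
                cls._config_cls = arg

    def __init__(self, config: BaseConfig | None = None) -> None:
        # Fresh default per instance, built from the bound config class.
        self.config = config if config is not None else type(self)._config_cls()


class MyCfg(BaseConfig):
    pass


class MyOwner(ConfigBoundSketch[MyCfg]):
    pass


class Child(MyOwner):
    pass
```

`MyOwner._config_cls` resolves to `MyCfg`, `Child` inherits that binding without repeating the generic argument, and `class Bad(ConfigBoundSketch[int])` raises `TypeError` at class definition time.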

^{Macroscope summarized a1c64f8.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78e218e570

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

macroscopeapp · 2026-05-16T19:02:04Z

Approvability

Verdict: Needs human review

Diff is too large for automated approval analysis. A human reviewer should evaluate this PR.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6fbba3e059

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19d5ef6884

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00f39d0f0e

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f8f8f1c3b8

mikasenghaas

🧑‍🍳🧑‍🍳

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec0d15d9dc

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9684c24c48

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5cef6e1bc

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41ce81c5cd

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33be5ca1ac

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce2dc6162c

chatgpt-codex-connector · 2026-05-19T09:24:26Z

+    extra_config_specs: list[str] | None = None
+    install_python: bool = True
+    system_prompt: PromptInput | None = None
+    sandbox: SandboxConfig | None = SandboxConfig()


Preserve command harness sandbox defaults

When these command harness configs are constructed with defaults, CommandHarness.sandbox_value() only falls back to True when config.sandbox is None; this SandboxConfig() value is instead merged over DEFAULT_COMMAND_SANDBOX, so MiniSWEAgent/Pi/Terminus2 default runs now inherit the generic sandbox timeout_minutes=60 instead of the command harness default 120. Long-running agent tasks that previously had the packaged 2-hour sandbox budget can be terminated after 1 hour unless users explicitly override the sandbox.

i think this is fine, sandbox > harness
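
The effect discussed in this thread can be reproduced with plain dictionaries. This is a simplified model, not the `CommandHarness.sandbox_value()` code: the two default mappings are hypothetical and use only the timeout values quoted in the comment, and the `None` branch is modeled as "use the command defaults unchanged".

```python
from __future__ import annotations

# Hypothetical defaults, using the numbers quoted in the review comment.
GENERIC_SANDBOX_DEFAULTS = {"timeout_minutes": 60}
COMMAND_SANDBOX_DEFAULTS = {"timeout_minutes": 120}


def effective_sandbox(config_sandbox: dict[str, int] | None) -> dict[str, int]:
    """Merge a config-provided sandbox mapping over the command harness defaults."""
    if config_sandbox is None:
        # No sandbox on the config: the command defaults apply as-is.
        return dict(COMMAND_SANDBOX_DEFAULTS)
    # Any value present on the config, including a generic default, wins.
    return {**COMMAND_SANDBOX_DEFAULTS, **config_sandbox}


with_none = effective_sandbox(None)
with_default_object = effective_sandbox(dict(GENERIC_SANDBOX_DEFAULTS))
```

With `None` the timeout stays at 120 minutes; with a default-constructed sandbox mapping it drops to 60, which is the behavior the reply accepts as "sandbox > harness".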

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Reviewed by Cursor Bugbot for commit eddd017. Configure here.}

cursor · 2026-05-19T09:51:28Z



+def load_train_rows(num_train_examples: int):
+    return load_rows("train", num_train_examples)


Source functions require parameters but have no defaults

Medium Severity

load_train_rows(num_train_examples: int) and load_eval_rows(num_eval_examples: int) have required parameters with no defaults. When used as _default_source / _default_eval_source, they're called via rows_from_source which injects matching config fields as kwargs. This works when the config has those fields. However, the parameter num_train_examples has no default, so if the config field name ever diverges or the source is called outside the config injection path, it will raise a TypeError. The old code used lambdas that closed over config values directly, making the coupling explicit. This applies identically to both dspy_rlm.py and openai_agents_env.py.

Additional Locations (1)

environments/openai_agents_env/openai_agents_env.py#L75-L77
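
The coupling this comment describes, where a source function receives config fields whose names match its parameters, can be sketched with `inspect.signature`. `call_with_matching_fields` is a hypothetical helper written for illustration; it is not the `rows_from_source` implementation, and the row values are placeholders.

```python
import inspect
from typing import Any, Callable, Mapping


def call_with_matching_fields(
    source: Callable[..., Any], config_fields: Mapping[str, Any]
) -> Any:
    """Call source with only the config fields that its signature names."""
    params = inspect.signature(source).parameters
    kwargs = {name: config_fields[name] for name in params if name in config_fields}
    return source(**kwargs)


def load_train_rows(num_train_examples: int) -> list[int]:
    # Placeholder rows; the real function loads dataset rows.
    return list(range(num_train_examples))


# Field names match, so the value is injected and extra fields are ignored.
rows = call_with_matching_fields(
    load_train_rows, {"num_train_examples": 3, "split": "train"}
)
```

If the config field were renamed (for example to `num_examples`), no keyword would be injected and the call would fail with `TypeError` because `num_train_examples` has no default, which is the failure mode the comment warns about.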

^{Reviewed by Cursor Bugbot for commit eddd017. Configure here.}

macroscopeapp Bot reviewed May 15, 2026

Comment thread verifiers/v1/config.py

xeophon force-pushed the codex/remove-v1-config-classes branch from 39cffcd to 29c5099 Compare May 15, 2026 13:49

macroscopeapp Bot reviewed May 15, 2026

Comment thread environments/hello_rlm_v1/hello_rlm_v1.py Outdated

xeophon force-pushed the codex/remove-v1-config-classes branch from 29c5099 to 1aac155 Compare May 15, 2026 14:02

macroscopeapp Bot reviewed May 15, 2026

Comment thread verifiers/v1/config.py

Comment thread verifiers/v1/taskset.py Outdated

xeophon force-pushed the codex/remove-v1-config-classes branch from 1aac155 to 8f4c124 Compare May 15, 2026 14:19

macroscopeapp Bot reviewed May 15, 2026

Comment thread verifiers/v1/packages/harnesses/rlm.py

xeophon force-pushed the codex/remove-v1-config-classes branch from 8f4c124 to e6a8fd4 Compare May 15, 2026 17:09

Refactor v1 configs to config-only

43e463f

xeophon force-pushed the codex/remove-v1-config-classes branch from e6a8fd4 to 43e463f Compare May 15, 2026 20:25

xeophon changed the title ~~Remove v1 harness and taskset config classes~~ Rework v1 harness and taskset config classes May 15, 2026

macroscopeapp Bot reviewed May 15, 2026

Comment thread environments/math_python/math_python_v1.py Outdated

Comment thread verifiers/v1/utils/taskset_utils.py Outdated

Comment thread verifiers/v1/utils/taskset_utils.py

Comment thread verifiers/v1/toolset.py

Address v1 config PR comments

78e218e

xeophon marked this pull request as ready for review May 16, 2026 18:53

chatgpt-codex-connector Bot reviewed May 16, 2026

Comment thread pyproject.toml Outdated

cursor Bot reviewed May 16, 2026

Comment thread pyproject.toml Outdated

Comment thread environments/math_python/math_python.py

Reject unknown kwargs in v1 wrappers

6fbba3e

chatgpt-codex-connector Bot reviewed May 17, 2026

Comment thread verifiers/v1/config.py Outdated

Keep v1 scoring config runtime-compatible

19d5ef6

chatgpt-codex-connector Bot reviewed May 17, 2026

Comment thread verifiers/v1/taskset.py Outdated

xeophon added 2 commits May 17, 2026 15:43

Update v1 init templates to config-only

542c047

Remove init template tests

00f39d0

cursor Bot reviewed May 17, 2026

Comment thread docs/overview.md Outdated

chatgpt-codex-connector Bot reviewed May 17, 2026

Comment thread verifiers/v1/packages/harnesses/opencode.py Outdated

Comment thread verifiers/utils/env_utils.py Outdated

xeophon added 3 commits May 17, 2026 16:00

Use taskset rows in v1 docs examples

7b5afe8

Restore init template tests

a1c64f8

Remove init template tests

f8f8f1c

chatgpt-codex-connector Bot reviewed May 17, 2026

Comment thread verifiers/v1/config.py Outdated

mikasenghaas previously approved these changes May 17, 2026

Comment thread assets/lab/environments/AGENTS.md Outdated

Comment thread assets/lab/environments/AGENTS.md

Comment thread environments/alphabet_sort/alphabet_sort_v1.py Outdated

Comment thread environments/hello_group_reward_v1/hello_group_reward_v1.py Outdated

cursor Bot reviewed May 18, 2026

Comment thread environments/bfcl_v3/bfcl_v3.py

xeophon mentioned this pull request May 18, 2026

VER-105 Add generic v1 Harbor Dockerfile environment setup support #1407

Closed

Address small v1 review comments

ec0d15d

chatgpt-codex-connector Bot reviewed May 18, 2026

Comment thread verifiers/v1/taskset.py Outdated

Slim v1 config runtime wiring

11dfe5d

cursor Bot reviewed May 18, 2026

Comment thread tests/test_envs.py

xeophon added 3 commits May 18, 2026 15:52

Remove stale v1 config wiring

4e0caee

Require config type for plain v1 env builders

b38f1f2

Restore Harbor config accessors

9684c24

chatgpt-codex-connector Bot reviewed May 18, 2026

Comment thread verifiers/v1/env.py Outdated

Allow required v1 env child configs

d5cef6e

chatgpt-codex-connector Bot reviewed May 18, 2026

Comment thread environments/bfcl_v3/bfcl_v3.py

xeophon added 2 commits May 18, 2026 19:56

Preserve BFCL default rewards when cloning configs

2580825

Use prime pydantic config package

b61b2cf

cursor Bot reviewed May 19, 2026

Comment thread docs/byo-harness.md

Fix v1 docs hidden binding examples

41ce81c

cursor Bot reviewed May 19, 2026

Comment thread environments/dspy_rlm/dspy_rlm.py

chatgpt-codex-connector Bot reviewed May 19, 2026

Comment thread environments/math_python/math_python_v1.py Outdated

Address v1 config review comments

33be5ca

chatgpt-codex-connector Bot reviewed May 19, 2026

Comment thread verifiers/v1/toolset.py Outdated

xeophon added 2 commits May 19, 2026 10:58

Validate inline toolset write values

4a0b0eb

Validate mapped toolset write field

4d82131

Run configured owner teardowns

ce2dc61

chatgpt-codex-connector Bot reviewed May 19, 2026

cursor Bot reviewed May 19, 2026

Comment thread docs/environments.md

Update environment skills for v1 config API

ffdb48f

cursor Bot reviewed May 19, 2026

Comment thread README.md Outdated

Fix README v1 config default

eddd017

cursor Bot reviewed May 19, 2026



def load_train_rows(num_train_examples: int):
    return load_rows("train", num_train_examples)
