Skip to content

Add NeMo Gym integration#1396

Open
cdreetz wants to merge 3 commits into
mainfrom
nemo-gym-integration
Open

Add NeMo Gym integration#1396
cdreetz wants to merge 3 commits into
mainfrom
nemo-gym-integration

Conversation

@cdreetz
Copy link
Copy Markdown
Collaborator

@cdreetz cdreetz commented May 16, 2026

Summary

Adds Verifiers v1 support for running PyPI nemo-gym environments through NeMoGymTaskset and NeMoGymHarness.

What changed

  • Adds NeMo Gym taskset/harness integration.
  • Loads packaged NeMo Gym config/data by nemo_env.
  • Routes NeMo Gym policy model calls through the active Verifiers rollout endpoint.
  • Registers the Verifiers proxy as NeMo Gym's policy_model without spawning an extra custom model-server package/process.
  • Adds environments/nemo_gym_env as a runnable example.
  • Adds the optional verifiers[nemogym] dependency extra.

Validation

  • uv run ruff check verifiers/v1/packages/harnesses/nemo_gym.py tests/test_v1_nemo_gym_harness.py verifiers/__init__.py verifiers/v1/packages/harnesses/__init__.py pyproject.toml
  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run pytest -p pytest_asyncio.plugin tests/test_v1_nemo_gym_harness.py
  • uv build --out-dir /tmp/nemo-vf-clean-pr-build
  • prime eval run nemo-gym-env ... -r 100 -c 20 on example_single_tool_call: reward 1.0, 100/100 completed
  • prime eval run nemo-gym-env ... -r 100 -c 20 on example_session_state_mgmt: reward 1.0, 100/100 completed

Note

Medium Risk
Introduces a new v1 harness/taskset plus an HTTP proxy and global-process lifecycle management for nemo-gym, which could impact rollout execution and dependency resolution (including new uv conflict rules).

Overview
Adds NeMo Gym integration for Verifiers v1 via new NeMoGymTaskset/NeMoGymHarness exports, allowing rollouts to run packaged nemo-gym JSONL tasks and configs.

The harness starts/tears down a persistent NeMo Gym server stack and injects a local OpenAI Responses-compatible proxy (NeMoGymModelProxy) that routes each rollout’s model calls back through the active Verifiers endpoint (including support for concurrent rollouts via per-rollout model routing).

Includes a runnable example environment nemo-gym-env, a new optional dependency extra verifiers[nemogym] with uv conflict constraints, expanded OpenAI Responses usage serialization (token detail fields), and a small endpoint parsing tweak to treat role="developer" as a system message; adds comprehensive proxy/runner unit tests.

Reviewed by Cursor Bugbot for commit 2caa939. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Add NeMo Gym integration with NeMoGymHarness, NeMoGymTaskset, and a local model proxy

  • Adds NeMoGymHarness and NeMoGymTaskset to run NeMo Gym tasks within the Verifiers harness lifecycle, including dataset ingestion, task normalization, and result mapping (completion, reward, metrics) back onto State.
  • Introduces PersistentNeMoGymRunner that starts a long-lived NeMo Gym server stack once per harness and routes each rollout through a per-rollout proxy model, avoiding repeated process startup overhead.
  • Adds NeMoGymModelProxy, an aiohttp-based local proxy exposing /v1 endpoints that dynamically routes OpenAI Responses API requests to the correct upstream server per active rollout using per-route api_key/model mappings.
  • Skips the NeMo Gym policy model process via skip_nemo_gym_policy_model_process and substitutes the Verifiers proxy endpoint instead, so the external LLM provider is used directly.
  • Adds a new nemo-gym-env environment package under environments/nemo_gym_env with a load_environment factory and a pyproject.toml with default eval config.
  • Risk: NeMo Gym integration requires Python ≥ 3.12 and conflicts with openenv and dev extras due to dependency incompatibilities.

Macroscope summarized 2caa939.

Comment thread verifiers/v1/packages/harnesses/__init__.py
@macroscopeapp
Copy link
Copy Markdown

macroscopeapp Bot commented May 16, 2026

Approvability

Verdict: Needs human review

Unable to check for correctness in 2caa939. This PR introduces a new NeMo Gym integration with substantial new code (~1000+ lines) including async proxy infrastructure and lifecycle management. Additionally, there is an unresolved review comment identifying potential concurrency issues with shared module-level globals that could cause race conditions across event loops.

You can customize Macroscope's approvability policy. Learn more.

Comment thread verifiers/v1/packages/harnesses/nemo_gym.py
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 2caa939. Configure here.

PROXY_MODEL_NAME = "verifiers-nemo-gym-proxy"
_NEMO_GYM_GLOBALS_LOCK = asyncio.Lock()
_NEMO_GYM_ACTIVE_RUNNERS = 0
_NEMO_GYM_OWNS_AIOHTTP_CLIENT = False
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrent runners share mutable globals without per-loop safety

Medium Severity

_NEMO_GYM_ACTIVE_RUNNERS and _NEMO_GYM_OWNS_AIOHTTP_CLIENT are module-level mutable globals modified via global declarations in both _ensure_started and teardown. While _NEMO_GYM_GLOBALS_LOCK guards some accesses, _ensure_started reads and writes _NEMO_GYM_ACTIVE_RUNNERS and _NEMO_GYM_OWNS_AIOHTTP_CLIENT in its error handler (lines 316–318) inside the lock, but teardown acquires the same lock separately. If the module is imported in a different event loop or process (e.g. via Ray workers), the module-level asyncio.Lock won't provide cross-loop protection, letting two runners corrupt the shared counter or double-close the aiohttp client.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 2caa939. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant