
fix: distribute code_gen to Ray workers via working_dir#1000

Draft
agronskiy wants to merge 1 commit into main from agronskiy/gym-ray-evaluator-bringup


agronskiy commented Apr 3, 2026

The idea

We want to pass Gym's contents to the Ray head's object store so that they can be distributed to the workers.

Summary

  • Replace py_executable=sys.executable in check_correctness_remote's runtime_env with a working_dir pointing to the Gym root.
  • py_executable pointed into the eval container's venv: an absolute path that doesn't exist on the vLLM deployment worker nodes, so all Ray tasks fell back to the head node and nothing was distributed.
  • With working_dir, Ray zips the Gym source and distributes it to all workers; pip: [".[dev]"] installs nemo-gym from that distributed source (no internet needed), and PYTHONPATH keeps lcb_integration importable, since it has no pyproject.toml.
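A minimal sketch of the runtime_env shape described above, assuming Ray's documented runtime_env keys (working_dir, excludes, pip, env_vars). The helper name build_check_correctness_runtime_env and its parameters are hypothetical; the exclude list mirrors the patterns visible in the diff, and the PYTHONPATH value is left to the caller because the actual lcb_integration location isn't shown here.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Sequence

# Patterns copied from the diff context; Ray treats them like .gitignore entries.
DEFAULT_EXCLUDES = ["**/.venv", "**/venv", "docs/", "resources/*.png"]


def build_check_correctness_runtime_env(
    gym_root: Path,
    extras: Sequence[str] = ("dev",),
    pythonpath: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: a runtime_env that ships the Gym source to every node.

    It replaces {"py_executable": sys.executable}, a path that only exists
    inside the eval container.
    """
    # ".[dev]" as in the PR; pass extras=() for a plain "." install.
    pip_spec = f".[{','.join(extras)}]" if extras else "."
    runtime_env: Dict[str, Any] = {
        # Ray zips this directory, uploads it, and unpacks it on each node.
        "working_dir": str(gym_root),
        "excludes": list(DEFAULT_EXCLUDES),
        # Per the PR, this installs nemo-gym from the distributed source.
        "pip": [pip_spec],
    }
    if pythonpath is not None:
        # Assumption: the caller supplies the path that makes lcb_integration importable.
        runtime_env["env_vars"] = {"PYTHONPATH": pythonpath}
    return runtime_env
```

The resulting dict would be passed to Ray via @ray.remote(runtime_env=...) or check_correctness_remote.options(runtime_env=...); which of the two the PR uses isn't shown here.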

…_executable

py_executable pointed into the eval container's venv which doesn't exist on
vLLM deployment worker nodes, causing all check_correctness_remote tasks to
fall back to the head node with no distribution.

Replace with Ray working_dir (Gym root) + pip install from distributed source
so workers in the deployment container get lcb_integration and nemo_gym without
requiring a pre-installed Gym venv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
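To confirm that tasks are actually spread across nodes rather than all landing on the head, one option is to have each task report the node it ran on and tally the results. A sketch, assuming each task returns ray.get_runtime_context().get_node_id() and the driver (running on the head) records its own node ID the same way; summarize_placement is a hypothetical helper, not part of Gym or Ray.

```python
from collections import Counter
from typing import Dict, Iterable, Union


def summarize_placement(
    node_ids: Iterable[str], head_node_id: str
) -> Dict[str, Union[int, float]]:
    """Tally which nodes tasks ran on and how many of them ended up on the head."""
    counts = Counter(node_ids)
    total = sum(counts.values())
    on_head = counts.get(head_node_id, 0)
    return {
        "tasks": total,
        "nodes_used": len(counts),
        # 1.0 reproduces the symptom described above: everything on the head node.
        "head_fraction": on_head / total if total else 0.0,
    }
```

A head_fraction close to 1.0 with nodes_used == 1 would match the behavior this fix targets.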

copy-pr-bot bot commented Apr 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

"**/.venv",
"**/venv",
"docs/",
"resources/*.png",
Contributor

Should this also exclude *.jsonl files, caches, results, and maybe other test-related artifacts?
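One way to decide what else to exclude is to estimate what the working_dir upload would contain. The sketch below is only an approximation: it uses Python's fnmatch rather than Ray's gitignore-style matcher, so a pattern such as "cache/" only matches at the top level and "*" can cross directory boundaries. The helper names are hypothetical, and the "*.jsonl" pattern in the example is the reviewer's suggestion, not part of the PR.

```python
import fnmatch
import os
from pathlib import Path
from typing import List, Sequence, Tuple


def _matches(rel: str, patterns: Sequence[str]) -> bool:
    """Approximate gitignore-style matching with fnmatch (not Ray's actual matcher)."""
    for raw in patterns:
        pat = raw.rstrip("/")
        # Also try "./rel" so that "**/name" matches a top-level "name" too.
        if fnmatch.fnmatchcase(rel, pat) or fnmatch.fnmatchcase("./" + rel, pat):
            return True
    return False


def estimate_upload(root: Path, patterns: Sequence[str]) -> Tuple[int, List[str]]:
    """Return (total bytes, kept relative paths) for files not matched by patterns."""
    total = 0
    kept: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        prefix = "" if rel_dir == "." else rel_dir.replace(os.sep, "/") + "/"
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not _matches(prefix + d, patterns)]
        for name in filenames:
            rel = prefix + name
            if _matches(rel, patterns):
                continue
            total += os.path.getsize(os.path.join(dirpath, name))
            kept.append(rel)
    return total, sorted(kept)
```

Running it on the Gym root with and without the extra patterns gives a rough sense of how much each one shrinks the package Ray has to zip and upload.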

"docs/",
"resources/*.png",
],
"pip": [".[dev]"],
Contributor

Probably doesn't need the dev extra, right?

Contributor

Could adding a pip install here make things slower (on the first task, I guess), since before it just used sys.executable?
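One way to answer this empirically is to time the first call against later ones. Ray's runtime_env documentation describes caching installed pip environments and downloaded working_dir packages on each node, so if that holds here, the install cost should show up mainly on the first task per node. The helpers below are hypothetical and framework-agnostic.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Wall-clock seconds for each of `repeats` sequential calls to fn."""
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def first_call_overhead(durations: List[float]) -> float:
    """How much slower the first call was than the median of the remaining calls."""
    if len(durations) < 2:
        raise ValueError("need at least two timings")
    return durations[0] - statistics.median(durations[1:])
```

For example, fn could be lambda: ray.get(check_correctness_remote.remote(...)), with the task's real arguments substituted for the ellipsis; comparing the overhead with the old py_executable setup would show what the pip install actually costs.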
