Add benchmark validation script; tighten Ray worker env and add graceful shutdown#3

Open
ydjplus wants to merge 1 commit into main from codex/investigate-thread-resource-leak-in-training-script-yuay2y

ydjplus commented Apr 23, 2026

Motivation

  • Add a standalone batch benchmark validator for Search-R1 checkpoints to run validation/generation outside training loops.
  • Make the Ray worker runtime more robust by capping BLAS/OpenMP thread fanout, so that a host running many Ray workers does not exhaust its thread limit.
  • Release Ray actors and placement groups on a best-effort basis after training or validation so that they do not leak.

Description

  • Add benchmark_validate.py which implements a validation-only flow using RayPPOTrainer, an on-device RewardManager, LLMGenerationManager support for search/non-search modes, progress logging with tqdm, and per-datasource metric aggregation.
  • Introduce a shared _RAY_WORKER_ENV_VARS dict and pass it to ray.init(runtime_env={'env_vars': ...}) in main_ppo.py and main_ppo_format.py, replacing the inline env dict.
  • Wrap the trainer.fit() calls in main_ppo.py and main_ppo_format.py in try/finally so that trainer.shutdown() runs on exit, including when training raises.
  • In verl/trainer/ppo/ray_trainer.py, add import logging, an _is_shutdown flag, and a shutdown() method that attempts to kill worker actors with ray.kill() and to remove Ray placement groups with remove_placement_group, freeing their resources.

Testing

  • No automated tests were run as part of this change.

Codex Task
