Add benchmark validation script; tighten Ray worker env and add graceful shutdown by ydjplus · Pull Request #3 · ydjplus/Search-R1

ydjplus · 2026-04-23T11:32:28Z

Add a standalone batch benchmark validator for Search-R1 checkpoints so validation/generation can run outside the training loop.
Make the Ray worker runtime more robust by limiting BLAS/OpenMP thread fan-out, so that many concurrent Ray workers do not exhaust the host's thread limit.
Release Ray actors and placement groups on a best-effort basis after training/validation to avoid leaking resources.

Add benchmark_validate.py, which implements a validation-only flow using RayPPOTrainer, an on-device RewardManager, LLMGenerationManager in both search and non-search modes, tqdm progress logging, and per-data-source metric aggregation.
Introduce a shared _RAY_WORKER_ENV_VARS dict and use it in main_ppo.py and main_ppo_format.py as ray.init(runtime_env={'env_vars': ...}), replacing the inline env dict.
Wrap the trainer.fit() calls in main_ppo.py and main_ppo_format.py in try/finally so that trainer.shutdown() runs on exit.
In verl/trainer/ppo/ray_trainer.py, add import logging, an _is_shutdown flag, and a shutdown() method that attempts to ray.kill() the worker actors and remove the Ray placement groups via remove_placement_group to free resources.
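The shutdown pattern described above can be sketched as follows. This is an illustration, not the RayPPOTrainer code: the class `TrainerWithShutdown`, the `run` helper and the attribute names are hypothetical, and the kill/remove functions are injected so the sketch runs without Ray. In a real trainer they would be `ray.kill` and `ray.util.remove_placement_group`.

```python
import logging
from typing import Any, Callable, Iterable

logger = logging.getLogger(__name__)


class TrainerWithShutdown:
    """Hypothetical stand-in for the shutdown logic added to RayPPOTrainer.

    kill_actor / remove_pg would be ray.kill and
    ray.util.remove_placement_group in a Ray deployment.
    """

    def __init__(
        self,
        actors: Iterable[Any],
        placement_groups: Iterable[Any],
        kill_actor: Callable[[Any], None],
        remove_pg: Callable[[Any], None],
    ) -> None:
        self.actors = list(actors)
        self.placement_groups = list(placement_groups)
        self._kill_actor = kill_actor
        self._remove_pg = remove_pg
        self._is_shutdown = False

    def fit(self) -> None:
        """Placeholder for the training loop."""

    def shutdown(self) -> None:
        """Best-effort release of actors and placement groups; safe to call twice."""
        if self._is_shutdown:
            return
        # Set the flag first so a later call (e.g. from a finally block) is a no-op.
        self._is_shutdown = True
        for actor in self.actors:
            try:
                self._kill_actor(actor)
            except Exception:  # best effort: log and keep releasing the rest
                logger.warning("Failed to kill actor %r", actor, exc_info=True)
        for pg in self.placement_groups:
            try:
                self._remove_pg(pg)
            except Exception:
                logger.warning("Failed to remove placement group %r", pg, exc_info=True)


def run(trainer: TrainerWithShutdown) -> None:
    """Entry-point pattern: resources are released even if fit() raises."""
    try:
        trainer.fit()
    finally:
        trainer.shutdown()
```

Catching exceptions per item keeps one already-dead actor from blocking the release of the remaining actors and placement groups, and the flag makes repeated calls harmless.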

Add tqdm progress output for benchmark validation
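One way per-data-source aggregation with tqdm progress might look. This is a hedged sketch, not the code in benchmark_validate.py: it assumes each validation batch yields a list of data-source names and a parallel list of scalar scores, and the `val/test_score/<source>` key format is borrowed from verl-style validation metrics as an assumption. The import falls back to a plain iterator when tqdm is not installed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

try:
    from tqdm import tqdm
except ImportError:  # fall back to the bare iterable if tqdm is missing
    def tqdm(iterable, **kwargs):
        return iterable


def aggregate_by_data_source(
    batches: Iterable[Tuple[Sequence[str], Sequence[float]]],
) -> Dict[str, float]:
    """Average scores per data source across all validation batches.

    Each batch is assumed to be (data_sources, scores) with equal lengths.
    The metric key prefix is a hypothetical naming choice.
    """
    scores_by_source: Dict[str, List[float]] = defaultdict(list)
    for data_sources, scores in tqdm(batches, desc="benchmark validation"):
        if len(data_sources) != len(scores):
            raise ValueError("data_sources and scores must have the same length")
        for source, score in zip(data_sources, scores):
            scores_by_source[source].append(float(score))
    return {
        f"val/test_score/{source}": sum(vals) / len(vals)
        for source, vals in sorted(scores_by_source.items())
    }
```

For example, batches `(["nq", "hotpotqa"], [1.0, 0.0])` and `(["nq"], [0.0])` yield a mean of 0.5 for `nq` and 0.0 for `hotpotqa`, so each benchmark is reported separately rather than blended into one score.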

c5b6471

ydjplus added the codex label Apr 23, 2026 — with ChatGPT Codex Connector
