add rl benchmark#189

Open
liguilong256 wants to merge 12 commits into main from guilong/add_rl_benchmark

Conversation

@liguilong256
Collaborator

Description

Add an RL benchmark and log the evaluation results to wandb. Running python run_benchmark.py produces the final report, including a Benchmark Overview, Algorithm Performance, Plots, and a Stability Analysis.

Type of change

  • Add RL benchmark
  • Log the evaluation results to wandb
  • Add GRPO to the push_cube environment

Checklist

  • I have run the black . command to format the code base.
  • I have made corresponding changes to the documentation.
  • I have added tests that prove my fix is effective or that my feature works.
  • Dependencies have been updated, if applicable.

@liguilong256 liguilong256 marked this pull request as draft March 19, 2026 02:28
@yuecideng yuecideng self-requested a review April 1, 2026 03:59
@yuecideng yuecideng marked this pull request as ready for review April 1, 2026 09:28
Copilot AI review requested due to automatic review settings April 1, 2026 09:28
Contributor

Copilot AI left a comment


Pull request overview

Adds a new RL benchmarking framework (tasks/suites/algorithms specs + runner/runtime + reporting/plots) and integrates trainer-side history collection to support generating benchmark artifacts via python run_benchmark.py.

Changes:

  • Introduces benchmark/ module with config loading, training/eval runtime, aggregation/leaderboard metrics, SVG plot generation, and markdown reporting.
  • Adds CLI entrypoint (run_benchmark.py) and benchmark suites/tasks/algorithm specs (cart_pole + push_cube; ppo + grpo).
  • Extends Trainer to record train/eval histories, checkpoint path, and log additional eval metrics (incl. success rate).

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/benchmark/test_reporting.py Adds unit test coverage for markdown report generation sections/content.
tests/benchmark/test_plots.py Adds unit test coverage for SVG plot artifact generation.
tests/benchmark/test_metrics.py Adds unit tests for stability/threshold metrics and aggregation.
tests/benchmark/test_leaderboard.py Adds unit tests for leaderboard ranking/fields.
run_benchmark.py Adds top-level script entrypoint delegating to benchmark runner CLI.
embodichain/agents/rl/utils/trainer.py Records histories, returns structured summary, adds eval success/metrics logging, adjusts rollout buffer handling.
configs/agents/rl/push_cube/train_config.json Updates push_cube PPO training config (eval/save freq, wandb, video path).
configs/agents/rl/push_cube/train_config_grpo.json Adds GRPO training config for push_cube.
configs/agents/rl/push_cube/gym_config.json Adjusts push_cube rewards and object physical parameters.
benchmark/tasks/push_cube.yaml Adds benchmark task spec for push_cube.
benchmark/tasks/cart_pole.yaml Adds benchmark task spec for cart_pole.
benchmark/tasks/init.py Adds package marker for benchmark tasks.
benchmark/suites/smoke.yaml Adds a small smoke benchmark suite spec.
benchmark/suites/default.yaml Adds default benchmark suite spec.
benchmark/suites/init.py Adds package marker for benchmark suites.
benchmark/scripts/run_benchmark.py Adds benchmark CLI implementation (arg parsing + orchestration).
benchmark/scripts/init.py Adds package marker for benchmark scripts.
benchmark/runtime.py Implements training/evaluation runtime helpers used by the runner.
benchmark/runner.py Coordinates jobs, caching, evaluation, aggregation, plots, and report generation.
benchmark/reporting.py Generates markdown report and leaderboard markdown artifacts.
benchmark/plots.py Generates SVG line/bar plots for eval histories and leaderboard.
benchmark/metrics.py Implements aggregation, stability, threshold, and leaderboard scoring utilities.
benchmark/config.py Loads YAML specs and merges configs.
benchmark/algorithms/ppo.yaml Adds PPO algorithm spec for benchmark runs.
benchmark/algorithms/grpo.yaml Adds GRPO algorithm spec for benchmark runs.
benchmark/algorithms/init.py Adds package marker for benchmark algorithms.
benchmark/init.py Exposes BenchmarkRunner from the benchmark package.
Comments suppressed due to low confidence (1)

embodichain/agents/rl/utils/trainer.py:29

  • There are duplicate/unused imports at the top of this file: Dict is imported twice and logger is imported but no longer used. Please remove duplicates/unused imports to keep the module clean and avoid failing lint/static checks.
from typing import Any, Dict
import time
import numpy as np
import torch
import wandb
from embodichain.utils import logger
from torch.utils.tensorboard import SummaryWriter
from collections import deque
from tensordict import TensorDict
from typing import Dict



Comment on lines 155 to +169
@@ -161,6 +166,7 @@ def train(self, total_timesteps: int):
self._eval_once(num_episodes=self.num_eval_episodes)
if self.global_step % self.save_freq == 0:
self.save_checkpoint()
return self.get_summary()

Copilot AI Apr 1, 2026


Trainer.train() now prints/logs unconditionally on every rank. In distributed runs this will spam stdout and can break logging assumptions; the rest of the codebase generally restricts user-facing logs to rank 0 (e.g., embodichain/agents/rl/train.py logs only when rank == 0). Consider gating the start-of-training message behind self.rank == 0 (or using the existing logger with the same guard).
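The suggested gating can be sketched as below. This is a minimal illustration, not the repository's actual `Trainer`; the `self.rank` attribute and the message text are assumptions.

```python
class Trainer:
    """Minimal sketch of gating console output behind rank 0,
    following the rank == 0 convention the reviewer cites from train.py."""

    def __init__(self, rank: int = 0):
        self.rank = rank

    def train(self, total_timesteps: int) -> None:
        # Only rank 0 emits the user-facing start-of-training message;
        # other ranks stay silent so distributed runs do not spam stdout.
        if self.rank == 0:
            print(f"[train] starting for {total_timesteps} timesteps")
        # ... training loop elided ...
```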

Collaborator Author


Fixed, thanks.

Comment on lines 289 to 334
@@ -299,9 +318,6 @@ def _log_train(self, losses: Dict[str, float]):
self.global_step,
)
# console
sps = self.global_step / max(1e-6, time.time() - self.start_time)
avgR = np.mean(self.ret_window) if len(self.ret_window) > 0 else float("nan")
avgL = np.mean(self.len_window) if len(self.len_window) > 0 else float("nan")
print(
f"[train] step={self.global_step} sps={sps:.0f} avgReward(100)={avgR:.3f} avgLength(100)={avgL:.1f}"
)
@@ -317,7 +333,7 @@ def _log_train(self, losses: Dict[str, float]):
wandb.log(log_dict, step=self.global_step)


Copilot AI Apr 1, 2026


_log_train() no longer returns early when rank != 0, but wandb.init(...) is only called on rank 0 in embodichain/agents/rl/train.py. This means non-zero ranks will still execute wandb.log(...) and print console lines, which can raise runtime errors (wandb not initialized) and produce duplicated metrics. Please restore the rank != 0 early-return (or guard wandb/console logging with self.rank == 0).
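The early-return the reviewer asks to restore looks roughly like this. The class and attribute names mirror the comment but are assumptions about the real code; the list stands in for `wandb.log(...)`.

```python
class TrainerLogging:
    """Sketch of the rank != 0 early return in _log_train():
    only rank 0 (where wandb.init() ran) touches wandb or the console."""

    def __init__(self, rank: int = 0):
        self.rank = rank
        self.wandb_calls = []  # stand-in for wandb.log(...)

    def _log_train(self, losses: dict) -> None:
        if self.rank != 0:
            return  # wandb is not initialized on non-zero ranks; skip logging
        self.wandb_calls.append(dict(losses))
```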

Collaborator Author


Fixed, thanks.

Comment on lines +471 to +488
@@ -422,4 +483,19 @@ def save_checkpoint(self):
},
path,
)
self.latest_checkpoint_path = path
print(f"Checkpoint saved: {path}")
return path

Copilot AI Apr 1, 2026


save_checkpoint() no longer checks rank != 0. In distributed training, embodichain/agents/rl/train.py only creates checkpoint_dir on rank 0, so non-zero ranks can fail with FileNotFoundError, and multiple ranks may race writing the same checkpoint path (corruption/partial writes). Please ensure only rank 0 saves (or give per-rank filenames + create dirs on all ranks + synchronize).
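A rank-0-only save could be sketched as follows; the class shape and the byte-write stand-in for `torch.save` are illustrative assumptions, not the project's code.

```python
import os

class CheckpointingTrainer:
    """Sketch of rank-0-only checkpoint saving to avoid FileNotFoundError
    on ranks without checkpoint_dir and races on a shared checkpoint path."""

    def __init__(self, rank: int, checkpoint_dir: str):
        self.rank = rank
        self.checkpoint_dir = checkpoint_dir
        self.latest_checkpoint_path = None

    def save_checkpoint(self):
        # Only rank 0 writes; other ranks return without touching disk.
        if self.rank != 0:
            return None
        os.makedirs(self.checkpoint_dir, exist_ok=True)
        path = os.path.join(self.checkpoint_dir, "ckpt.pt")
        with open(path, "wb") as f:
            f.write(b"state")  # stand-in for torch.save({...}, path)
        self.latest_checkpoint_path = path
        return path
```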

Collaborator Author


Fixed, thanks.

Comment on lines +206 to 213
rollout = self.buffer.start_rollout()
if hasattr(self.env, "set_rollout_buffer"):
self.env.set_rollout_buffer(rollout)
rollout = self.collector.collect(
num_steps=self.buffer_size,
rollout=self.buffer.start_rollout(),
rollout=rollout,
on_step_callback=on_step,
)

Copilot AI Apr 1, 2026


Trainer._collect_rollout() calls env.set_rollout_buffer(rollout) before invoking SyncCollector.collect(...), but SyncCollector.collect(...) already does this when supported. This is redundant and makes it easier for these two code paths to diverge. Consider removing the extra call here and letting the collector own rollout-buffer attachment.
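Letting the collector own buffer attachment might look like this sketch; the class and method names follow the review comment but the bodies are illustrative, not the real `SyncCollector`.

```python
class SyncCollectorSketch:
    """Sketch: the collector owns rollout-buffer attachment, so callers
    like Trainer._collect_rollout() no longer pre-attach the buffer."""

    def collect(self, env, rollout, num_steps):
        # Attach exactly once, here, when the env supports it.
        if hasattr(env, "set_rollout_buffer"):
            env.set_rollout_buffer(rollout)
        # ... step the env num_steps times, filling `rollout` ...
        return rollout


class FakeEnv:
    """Test double that counts how often the buffer is attached."""

    def __init__(self):
        self.attach_calls = 0

    def set_rollout_buffer(self, rollout):
        self.attach_calls += 1
```

With this split, the trainer passes `self.buffer.start_rollout()` straight to `collect(...)` and the two code paths cannot diverge.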

Collaborator Author


Fixed, thanks.

Comment on lines +259 to +281
policy = build_policy_from_env(policy_block, env, device)
algo = build_algo(algo_block["name"], algo_block["cfg"], policy, device)

events_dict = trainer_cfg.get("events", {})
trainer = Trainer(
policy=policy,
env=env,
algorithm=algo,
buffer_size=int(trainer_cfg.get("buffer_size", 2048)),
batch_size=int(algo_block["cfg"]["batch_size"]),
writer=writer,
eval_freq=int(trainer_cfg.get("eval_freq", 0)) if enable_eval else 0,
save_freq=int(trainer_cfg.get("save_freq", 0)) or 10**18,
checkpoint_dir=str(checkpoint_dir),
exp_name=str(trainer_cfg.get("exp_name", "benchmark_run")),
use_wandb=False,
eval_env=eval_env,
event_cfg=_parse_event_cfg(events_dict.get("train", {})),
eval_event_cfg=_parse_event_cfg(events_dict.get("eval", {}))
if enable_eval
else {},
num_eval_episodes=int(trainer_cfg.get("num_eval_episodes", 5)),
)

Copilot AI Apr 1, 2026


The PR description says benchmark evaluation results are logged to Weights & Biases, but the benchmark runtime forces use_wandb=False when constructing Trainer, which prevents any of the new wandb logging in Trainer from running during benchmark execution. If wandb logging is intended for run_benchmark.py, consider wiring a CLI/config flag through and/or honoring trainer.use_wandb from the suite/task config instead of hard-disabling it here.

Comment on lines +76 to +84
cfg["trainer"]["iterations"] = int(self.protocol["iterations"])
cfg["trainer"]["buffer_size"] = int(self.protocol["buffer_size"])
cfg["trainer"]["num_envs"] = int(self.protocol["num_envs"])
cfg["trainer"]["num_eval_envs"] = int(self.protocol["num_eval_envs"])
cfg["trainer"]["device"] = str(self.protocol["device"])
cfg["trainer"]["headless"] = bool(self.protocol["headless"])
cfg["trainer"]["save_freq"] = int(self.protocol["save_interval"])
cfg["trainer"]["use_wandb"] = False
return cfg

Copilot AI Apr 1, 2026


BenchmarkRunner.build_run_config() hard-sets cfg["trainer"]["use_wandb"] = False, which makes it impossible to enable wandb logging for benchmark runs even if requested via task/suite configs. If wandb logging is a goal of this PR, consider making this configurable (e.g., via CLI flag) rather than forcing it off.
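One way to make the flag configurable is to read it from the protocol/suite config with a safe default, as in this sketch. The function signature and the `use_wandb` key name are assumptions for illustration.

```python
def build_run_config(cfg: dict, protocol: dict) -> dict:
    """Sketch: honor an opt-in use_wandb flag from the suite/protocol
    instead of hard-coding cfg["trainer"]["use_wandb"] = False."""
    trainer = cfg.setdefault("trainer", {})
    trainer["iterations"] = int(protocol["iterations"])
    # Default stays off for benchmark runs; a suite entry or CLI flag opts in.
    trainer["use_wandb"] = bool(protocol.get("use_wandb", False))
    return cfg
```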

