Conversation
This PR refactors the render pipeline and implements Daphne's render changes on the 3.0 branch. The render logic is now independent of drivenet.c and other extra steps; it is called directly through a common Evaluator class (for self-play/human-replay) and through separate classes for WOSAC and safe_eval (invoked directly from pufferl.py during training). The codebase now reads human-replay/self-play render requirements from the eval section of drive.ini (previously the render section). For rendering during code testing, we can use drive.c as before, or run `puffer render puffer_drive` (a replacement for visualize.c), which uses render() in pufferl.py (configs read from the render section of drive.ini). We can now also render three modes: the top-down view (now called sim_state), the agent view (now called persp), and a new mode, bev (a dynamic top-down view of the ego's observations).
```c
0,
0, // Y is up
1  // Z is depth
};
```
flagging: is this intentional?
Mm, mostly not. This just gives the camera a default position to look at. Keeping the target at (0,0) automatically centers the view around the map center.
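To illustrate the point above, here is a minimal sketch of a camera whose default target sits at the origin, which centers the view on the map center. The struct names and position offset here are hypothetical stand-ins (the real render code presumably uses its own camera types, e.g. raylib's Camera3D):

```c
// Hypothetical stand-in types; the real renderer has its own camera struct.
typedef struct { float x, y, z; } Vec3;

typedef struct {
    Vec3 position;
    Vec3 target;
} Camera;

// Default camera: target at the origin so the view is centered on the map
// center; the position offset along the depth axis is an arbitrary example.
Camera default_camera(void) {
    Camera cam;
    cam.position = (Vec3){0.0f, 0.0f, 50.0f}; // pulled back along the depth axis
    cam.target   = (Vec3){0.0f, 0.0f, 0.0f};  // (0,0): centered on the map
    return cam;
}
```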
```python
if num_maps > len(os.listdir(map_dir)):
    num_maps = len(os.listdir(map_dir))
bin_files = sorted(f for f in os.listdir(map_dir) if f.endswith(".bin"))
```
I don't think you need this. Render code can just use policy directly
Yes, correct: puffer render handles the policy directly. I guess this part is for the map_files instead.
```python
    print(f"Error rendering {map_name}: {result.stderr}")
except subprocess.TimeoutExpired:
    print(f"Timeout rendering {map_name}: exceeded 600 seconds")
# Fall back to CPU if the configured device is unavailable (e.g. no CUDA on this machine)
```
Redundant, is taken care of by rollout() in the Evaluator class
I see. Calling the evaluator is one option for this (probably a pretty clean one). As of now, puffer render calls driver.render() directly without going through the evaluator.
Sure, there are different ways to implement this; what you did looks fine. There is still a bit of redundancy, but overall this looks good to me.
```c
obs[13] = fminf(SPEED_LIMIT / MAX_SPEED, 1.0f);
obs[14] = lane_center_dist;
obs[15] = ego_entity->metrics_array[LANE_ANGLE_IDX];
obs[16] = ego_entity->type / 3.0f;
```
I don't terribly mind this, but this should probably be in a different PR?
Yeah, I added that in my branch; I think it was accidentally copied.
Xvfb
Pull request overview
This PR updates the PufferDrive 3.0 rendering/evaluation workflow by moving rendering into the in-process C render pipeline (ffmpeg/headless support) and routing training-time evaluation/render logging through a new Python Evaluator, removing prior subprocess/C-binary rendering paths.
Changes:
- Removes legacy subprocess-based human replay eval and C-binary video rendering helpers from `pufferlib/utils.py`.
- Adds a new `Evaluator` for self-play/human-replay eval with optional multi-view rendering and wandb video logging.
- Extends Drive's C/Python bindings to support render modes, view selection, video filename suffixing, and per-env logging helpers.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| pufferlib/utils.py | Removes deprecated subprocess human-replay eval and old C-binary video rendering wrappers. |
| pufferlib/pufferl.py | Wires training eval into the new Evaluator; adds an in-process render() command path and export-path CLI support. |
| pufferlib/ocean/env_config.h | Adds render_mode to the INI config struct (used by Drive demo/config parsing). |
| pufferlib/ocean/env_binding.h | Updates render binding signatures and adds env/vec helpers (video suffix, scenario IDs, env_log). |
| pufferlib/ocean/drive/drive.py | Adds RenderView, render/view controls, scenario IDs, per-env logging, and render suffix plumbing. |
| pufferlib/ocean/drive/drive.h | Implements headless rendering via ffmpeg piping, view modes, and rendering-related state in the C env. |
| pufferlib/ocean/drive/drive.c | Updates demo/CLI to use new view modes and traces flag. |
| pufferlib/ocean/drive/binding.c | Plumbs render_mode into C env init and fixes some cleanup paths. |
| pufferlib/ocean/benchmark/evaluator.py | Introduces a new Evaluator supporting eval rollouts, render selection, and wandb video logging. |
| pufferlib/config/ocean/drive.ini | Updates Drive config to enable headless render mode and adds new eval/render configuration knobs. |
```c
typedef struct {
    int render_mode;
    int action_type;
```
render_mode was added to env_init_config, but the INI handler() never matches/consumes an env.render_mode key. With the new render_mode = 1 entry in drive.ini, ini_parse() will stop at that unknown key and leave all subsequent fields (e.g. reward randomization bounds) at defaults, breaking the C demo() configuration. Add a MATCH("env", "render_mode") case (and map values 0/1) so parsing succeeds.
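A minimal sketch of the missing case, following inih's handler conventions (the handler receives `(user, section, name, value)` and returns nonzero when a key is consumed; a zero return makes `ini_parse()` report an error at that line). The struct fields and `MATCH` macro below are stand-ins for the real ones in env_config.h:

```c
#include <string.h>
#include <stdlib.h>

// Stand-in for the real env_init_config in env_config.h.
typedef struct {
    int render_mode;
    int action_type;
} env_init_config;

#define MATCH(s, n) (strcmp(section, (s)) == 0 && strcmp(name, (n)) == 0)

// inih-style handler: return 1 when the key is consumed so parsing continues.
static int handler(void *user, const char *section, const char *name,
                   const char *value) {
    env_init_config *cfg = (env_init_config *)user;
    if (MATCH("env", "render_mode")) {
        cfg->render_mode = atoi(value);  // 0 = windowed, 1 = headless (assumed mapping)
    } else if (MATCH("env", "action_type")) {
        cfg->action_type = atoi(value);
    } else {
        return 0;  // unknown key: ini_parse() stops here and reports an error
    }
    return 1;
}
```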
```python
self.evaluator.hr_env = load_env("puffer_drive", self.evaluator.hr_eval_config)
self.evaluator.rollout(self.uncompiled_policy, mode="human_replay")
self.evaluator.hr_env.close()
self.evaluator.log_videos(eval_mode="human_replay", epoch=self.epoch)
if self_play_eval:
    self.evaluator.sp_env = load_env("puffer_drive", self.evaluator.sp_eval_config)
```
This hard-codes the env name to puffer_drive, which will break training/eval if PuffeRL is used with any other env_name. Use the current env (e.g. self.config["env"] or a stored env_name) instead of a constant here.
Suggested change:
```diff
-self.evaluator.hr_env = load_env("puffer_drive", self.evaluator.hr_eval_config)
+self.evaluator.hr_env = load_env(self.config["env"], self.evaluator.hr_eval_config)
 self.evaluator.rollout(self.uncompiled_policy, mode="human_replay")
 self.evaluator.hr_env.close()
 self.evaluator.log_videos(eval_mode="human_replay", epoch=self.epoch)
 if self_play_eval:
-    self.evaluator.sp_env = load_env("puffer_drive", self.evaluator.sp_eval_config)
+    self.evaluator.sp_env = load_env(self.config["env"], self.evaluator.sp_eval_config)
```
```python
if self.config["eval"]["wosac_realism_eval"]:
    pufferlib.utils.run_wosac_eval_in_subprocess(self.config, self.logger, self.global_step)
```
wosac_realism_eval is no longer gated by eval_interval (previously it only ran every N eval intervals or at the end). As written, it will run on every checkpoint save when enabled, which is likely a large and unintended slowdown. Restore the interval condition (and/or share the same eval scheduling block as the new Evaluator).
```c
// Aggregate this env's per-agent logs (same as vec_log but for one env)
// Note: breaks horribly if you don't use floats
Log aggregate = {0};
int num_keys = sizeof(Log) / sizeof(float);
for (int j = 0; j < num_keys; j++) {
    ((float *)&aggregate)[j] += ((float *)&env->log)[j];
}

PyObject *dict = PyDict_New();
if (aggregate.n == 0.0f) {
    return dict;
}

// Average across agents in env
float n = aggregate.n;
for (int i = 0; i < num_keys; i++) {
    ((float *)&aggregate)[i] /= n;
}
aggregate.n = (float)env->active_agent_count;
```
env_log returns averaged metrics but never clears env->log afterward. Unlike vec_log, repeated calls will re-report cumulative data and can skew per-env selection (and make it impossible to interpret logs as per-interval). Consider resetting env->log after producing a non-empty dict (mirroring vec_log), and/or honoring the num_agents argument as a minimum-data threshold.
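One way to mirror vec_log's reset, sketched under the assumption that Log is a flat struct of floats with the count `n` first (the real struct in drive.h has more fields, and the real function also builds the Python dict):

```c
#include <string.h>

// Hypothetical minimal Log; the real struct in drive.h has more fields.
typedef struct {
    float n;      // number of agent-samples accumulated
    float score;
} Log;

// Average the accumulated per-agent log into *out, then clear the source so
// repeated calls report per-interval data instead of cumulative totals
// (mirroring vec_log). Returns 0 when nothing has been accumulated.
static int env_log_take(Log *log, Log *out) {
    if (log->n == 0.0f) return 0;
    int num_keys = sizeof(Log) / sizeof(float);
    float n = log->n;
    for (int i = 0; i < num_keys; i++)
        ((float *)out)[i] = ((float *)log)[i] / n;
    memset(log, 0, sizeof(Log));  // the reset vec_log does and env_log misses
    return 1;
}
```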
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This PR adds and implements the new render technique in the 3.0 format. The render pipeline is now independent of `visualize.c` and `drivenet.c`. Rendering for testing purposes can now use `puffer render`. Self-play, human replay, and safe_evals now work as intended.