
Implementing New Renders in 3.0 #371

Open
Aditya-Gupta26 wants to merge 13 commits into 3.0 from aditya/lean_renders

Conversation

@Aditya-Gupta26

This PR adds and implements a new render technique in the 3.0 format. The render pipeline no longer depends on visualize.c and drivenet.c.

Rendering for testing purposes can now use puffer render().

Self play, human replay, and safe_evals now work as intended.

@Aditya-Gupta26
Author

This PR refactors the render pipeline and implements Daphne's render changes on the 3.0 branch. The render logic is now independent of drivenet.c and other extra steps, and is invoked directly through a common Evaluator class (for self play/human replay) and separate classes for WOSAC and safe_eval (called directly from pufferl.py during training).

The codebase now reads human-replay/self-play render requirements from the eval section in drive.ini (rather than the render section, as before).

For rendering during code testing, we can use drive.c as before, or run puffer render puffer_drive (a replacement for visualize.c), which uses render() in pufferl.py (configs are read from the render section in drive.ini).

We can now also render three modes: the top-down view (now called sim_state), the agent view (now called persp), and a new mode, bev (a dynamic top-down view of the ego's observations).
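To make the config split concrete, here is a hypothetical sketch of the two drive.ini sections described above. The section names follow the text, but every key name and value below is an assumption for illustration, not the file's actual contents:

```ini
; Hypothetical sketch only -- actual drive.ini keys may differ.

[eval]
; human-replay / self-play render requirements, read during training
render = 1
render_view = sim_state   ; one of: sim_state, persp, bev

[render]
; used by `puffer render puffer_drive` during code testing
render_mode = 1
render_view = bev
```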

0,
0, // Y is up
1 // Z is depth
};

flagging: is this intentional?

Author

Mm, mostly not. This just gives the camera a default position to look at. Keeping the target at (0,0) automatically centers the view (around the map center).


if num_maps > len(os.listdir(map_dir)):
    num_maps = len(os.listdir(map_dir))
bin_files = sorted(f for f in os.listdir(map_dir) if f.endswith(".bin"))
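The intent of this excerpt can be sketched as a small standalone helper (the function name and signature here are illustrative, not from the PR):

```python
import os

def select_map_files(map_dir, num_maps):
    # Clamp the requested count to what is actually on disk, then
    # return the first num_maps .bin map files in sorted order.
    bin_files = sorted(f for f in os.listdir(map_dir) if f.endswith(".bin"))
    return bin_files[:min(num_maps, len(bin_files))]
```

Sorting before slicing keeps the selection deterministic regardless of filesystem listing order.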

I don't think you need this. The render code can just use the policy directly.

Author

Yes, correct: puffer render handles the policy directly. I guess this is for the map_files instead.

    print(f"Error rendering {map_name}: {result.stderr}")
except subprocess.TimeoutExpired:
    print(f"Timeout rendering {map_name}: exceeded 600 seconds")
# Fall back to CPU if the configured device is unavailable (e.g. no CUDA on this machine)

Redundant, is taken care of by rollout() in the Evaluator class

Author

I see. Calling the evaluator is one option (probably a pretty clean one). As of now, puffer render calls driver.render() directly without going through the evaluator.


Sure, there are different ways to implement this, and what you did looks fine. There is still a bit of redundancy, but overall this looks good to me.

obs[13] = fminf(SPEED_LIMIT / MAX_SPEED, 1.0f);
obs[14] = lane_center_dist;
obs[15] = ego_entity->metrics_array[LANE_ANGLE_IDX];
obs[16] = ego_entity->type / 3.0f;

I don't terribly mind this, but this should probably be in a different PR?


Yeah, I added that in my branch; I think it was accidentally copied over.

@eugenevinitsky

eugenevinitsky commented Mar 28, 2026

Xvfb pkill -9 will break multi-job nodes

system("pkill -9 Xvfb") kills all Xvfb processes owned by the current user. Two training jobs with rendering on the same node will kill each other's Xvfb.

Fix: use a per-process display number and remove the pkill entirely:

client->xvfb_display_num = 100 + (getpid() % 900);
char display_str[16];
snprintf(display_str, sizeof(display_str), ":%d", client->xvfb_display_num);

Each process manages only its own Xvfb via client->xvfb_pid and cleans it up in close_client() with kill(client->xvfb_pid, SIGTERM).
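The display-number scheme can be sketched in Python (the actual fix is in C as shown above; the function name below is mine, and the spawn/cleanup steps are described only in comments):

```python
import os

def xvfb_display_for_pid(pid):
    # Map a pid into :100..:999 so each process gets its own display
    # and the low display numbers of any real X sessions are avoided.
    return f":{100 + (pid % 900)}"

# Each process would then start only its own server, e.g.
#   proc = subprocess.Popen(["Xvfb", xvfb_display_for_pid(os.getpid()), ...])
# and terminate that one child with proc.send_signal(signal.SIGTERM)
# in its cleanup path -- never a global pkill that hits siblings.
```

Since pids on one node are unique at any moment, two concurrent jobs cannot pick the same display.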

Copilot AI review requested due to automatic review settings March 30, 2026 00:01

Copilot AI left a comment


Pull request overview

This PR updates the PufferDrive 3.0 rendering/evaluation workflow by moving rendering into the in-process C render pipeline (ffmpeg/headless support) and routing training-time evaluation/render logging through a new Python Evaluator, removing prior subprocess/C-binary rendering paths.

Changes:

  • Removes legacy subprocess-based human replay eval and C-binary video rendering helpers from pufferlib/utils.py.
  • Adds a new Evaluator for self-play/human-replay eval with optional multi-view rendering and wandb video logging.
  • Extends Drive’s C/Python bindings to support render modes, view selection, video filename suffixing, and per-env logging helpers.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Summary per file:

pufferlib/utils.py: Removes deprecated subprocess human-replay eval and old C-binary video rendering wrappers.
pufferlib/pufferl.py: Wires training eval into the new Evaluator; adds an in-process render() command path and export-path CLI support.
pufferlib/ocean/env_config.h: Adds render_mode to the INI config struct (used by Drive demo/config parsing).
pufferlib/ocean/env_binding.h: Updates render binding signatures and adds env/vec helpers (video suffix, scenario IDs, env_log).
pufferlib/ocean/drive/drive.py: Adds RenderView, render/view controls, scenario IDs, per-env logging, and render suffix plumbing.
pufferlib/ocean/drive/drive.h: Implements headless rendering via ffmpeg piping, view modes, and rendering-related state in the C env.
pufferlib/ocean/drive/drive.c: Updates demo/CLI to use new view modes and traces flag.
pufferlib/ocean/drive/binding.c: Plumbs render_mode into C env init and fixes some cleanup paths.
pufferlib/ocean/benchmark/evaluator.py: Introduces a new Evaluator supporting eval rollouts, render selection, and wandb video logging.
pufferlib/config/ocean/drive.ini: Updates Drive config to enable headless render mode and adds new eval/render configuration knobs.


Comment on lines 10 to 12
typedef struct {
    int render_mode;
    int action_type;

Copilot AI Mar 30, 2026


render_mode was added to env_init_config, but the INI handler() never matches/consumes an env.render_mode key. With the new render_mode = 1 entry in drive.ini, ini_parse() will stop at that unknown key and leave all subsequent fields (e.g. reward randomization bounds) at defaults, breaking the C demo() configuration. Add a MATCH("env", "render_mode") case (and map values 0/1) so parsing succeeds.

Comment on lines +516 to +521
self.evaluator.hr_env = load_env("puffer_drive", self.evaluator.hr_eval_config)
self.evaluator.rollout(self.uncompiled_policy, mode="human_replay")
self.evaluator.hr_env.close()
self.evaluator.log_videos(eval_mode="human_replay", epoch=self.epoch)
if self_play_eval:
self.evaluator.sp_env = load_env("puffer_drive", self.evaluator.sp_eval_config)

Copilot AI Mar 30, 2026


This hard-codes the env name to puffer_drive, which will break training/eval if PuffeRL is used with any other env_name. Use the current env (e.g. self.config["env"] or a stored env_name) instead of a constant here.

Suggested change:

-self.evaluator.hr_env = load_env("puffer_drive", self.evaluator.hr_eval_config)
+self.evaluator.hr_env = load_env(self.config["env"], self.evaluator.hr_eval_config)
 self.evaluator.rollout(self.uncompiled_policy, mode="human_replay")
 self.evaluator.hr_env.close()
 self.evaluator.log_videos(eval_mode="human_replay", epoch=self.epoch)
 if self_play_eval:
-    self.evaluator.sp_env = load_env("puffer_drive", self.evaluator.sp_eval_config)
+    self.evaluator.sp_env = load_env(self.config["env"], self.evaluator.sp_eval_config)

Comment on lines +530 to +531
if self.config["eval"]["wosac_realism_eval"]:
pufferlib.utils.run_wosac_eval_in_subprocess(self.config, self.logger, self.global_step)

Copilot AI Mar 30, 2026


wosac_realism_eval is no longer gated by eval_interval (previously it only ran every N eval intervals or at the end). As written, it will run on every checkpoint save when enabled, which is likely a large and unintended slowdown. Restore the interval condition (and/or share the same eval scheduling block as the new Evaluator).

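The interval gating this comment asks to restore could be sketched as follows (function and parameter names are hypothetical, not taken from pufferl.py):

```python
def should_run_wosac(epoch, eval_interval, is_final):
    # Run the expensive WOSAC realism eval only every eval_interval
    # epochs, or once at the end of training, rather than on every
    # checkpoint save.
    if is_final:
        return True
    return eval_interval > 0 and epoch % eval_interval == 0
```

Sharing this predicate with the Evaluator's scheduling block would keep the two eval paths on the same cadence.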
Comment on lines +705 to +723
// Aggregate this env's per-agent logs (same as vec_log but for one env)
// Note: breaks horribly if you don't use floats
Log aggregate = {0};
int num_keys = sizeof(Log) / sizeof(float);
for (int j = 0; j < num_keys; j++) {
    ((float *)&aggregate)[j] += ((float *)&env->log)[j];
}

PyObject *dict = PyDict_New();
if (aggregate.n == 0.0f) {
    return dict;
}

// Average across agents in env
float n = aggregate.n;
for (int i = 0; i < num_keys; i++) {
    ((float *)&aggregate)[i] /= n;
}
aggregate.n = (float)env->active_agent_count;

Copilot AI Mar 30, 2026


env_log returns averaged metrics but never clears env->log afterward. Unlike vec_log, repeated calls will re-report cumulative data and can skew per-env selection (and make it impossible to interpret logs as per-interval). Consider resetting env->log after producing a non-empty dict (mirroring vec_log), and/or honoring the num_agents argument as a minimum-data threshold.

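The aggregate-then-reset semantics the comment suggests can be sketched in Python (a sketch of the proposed behavior only, not the C binding itself; the dict-based log is a stand-in for the Log struct):

```python
def drain_env_log(log):
    # Average the accumulated per-agent sums by the sample count "n",
    # then clear the log so the next call reports only the new
    # interval (mirroring vec_log's reset behavior).
    n = log.get("n", 0.0)
    if n == 0.0:
        return {}
    out = {k: v / n for k, v in log.items()}
    log.clear()
    return out
```

With the clear in place, a second call before any new data arrives returns an empty dict instead of re-reporting cumulative totals.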
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@daphne-cornelisse left a comment


LGTM


4 participants