
Implementing New Renders in 3.0 #371

Open
Aditya-Gupta26 wants to merge 13 commits into 3.0 from aditya/lean_renders

Conversation

@Aditya-Gupta26

This PR adds and implements a new render technique in the 3.0 format. The render pipeline no longer depends on visualize.c and drivenet.c.

Rendering for testing purposes can now use puffer render().

Self play, human replay, and safe_evals now work as intended.

@Aditya-Gupta26
Author

This PR refactors the render pipeline and implements Daphne's render changes on the 3.0 branch. The render logic is now independent of drivenet.c and other extra steps, and is invoked directly through a common Evaluator class (for self play/human replay) and separate classes for WOSAC and safe_eval (called directly from pufferl.py during training).

The codebase now reads human-replay/self-play render requirements from the eval section in drive.ini (rather than the render section, as before).

For rendering during code testing, we can use drive.c as before, or run puffer render puffer_drive (a replacement for visualize.c), which uses render() in pufferl.py (configs are read from the render section in drive.ini).

We can now also render three modes: the top-down view (now called sim_state), the agent view (now called persp), and a new mode, bev (a dynamic top-down view of the ego's observations).
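To make the config split concrete, here is a hypothetical sketch of the two drive.ini sections described above. The section names follow the text, but every key name and value below is an assumption for illustration, not the file's actual contents:

```ini
; Hypothetical sketch only -- actual drive.ini keys may differ.

[eval]
; human-replay / self-play render requirements, read during training
render = 1
render_view = sim_state   ; one of: sim_state, persp, bev

[render]
; used by `puffer render puffer_drive` during code testing
render_mode = 1
render_view = bev
```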

0,
0, // Y is up
1 // Z is depth
};

flagging: is this intentional?

Author

Mm, mostly not. This just gives the camera a default position to look at. Keeping the target at (0,0) automatically centers the view (around the map center).


if num_maps > len(os.listdir(map_dir)):
    num_maps = len(os.listdir(map_dir))
bin_files = sorted(f for f in os.listdir(map_dir) if f.endswith(".bin"))
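The intent of this excerpt can be sketched as a small standalone helper (the function name and signature here are illustrative, not from the PR):

```python
import os

def select_map_files(map_dir, num_maps):
    # Clamp the requested count to what is actually on disk, then
    # return the first num_maps .bin map files in sorted order.
    bin_files = sorted(f for f in os.listdir(map_dir) if f.endswith(".bin"))
    return bin_files[:min(num_maps, len(bin_files))]
```

Sorting before slicing keeps the selection deterministic regardless of filesystem listing order.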

I don't think you need this. The render code can just use the policy directly.

Author

Yes, correct: puffer render handles the policy directly. I guess this is for the map_files instead.

    print(f"Error rendering {map_name}: {result.stderr}")
except subprocess.TimeoutExpired:
    print(f"Timeout rendering {map_name}: exceeded 600 seconds")
# Fall back to CPU if the configured device is unavailable (e.g. no CUDA on this machine)

Redundant, is taken care of by rollout() in the Evaluator class

Author

I see. Calling the evaluator is one option (probably a pretty clean one). As of now, puffer render calls driver.render() directly without going through the evaluator.


Sure, there are different ways to implement this, and what you did looks fine. There is still a bit of redundancy, but overall this looks good to me.

obs[13] = fminf(SPEED_LIMIT / MAX_SPEED, 1.0f);
obs[14] = lane_center_dist;
obs[15] = ego_entity->metrics_array[LANE_ANGLE_IDX];
obs[16] = ego_entity->type / 3.0f;

I don't terribly mind this, but this should probably be in a different PR?


Yeah, I added that in my branch; I think it was accidentally copied over.

@eugenevinitsky

eugenevinitsky commented Mar 28, 2026

Xvfb pkill -9 will break multi-job nodes

system("pkill -9 Xvfb") kills all Xvfb processes owned by the current user. Two training jobs with rendering on the same node will kill each other's Xvfb.

Fix: use a per-process display number and remove the pkill entirely:

client->xvfb_display_num = 100 + (getpid() % 900);
char display_str[16];
snprintf(display_str, sizeof(display_str), ":%d", client->xvfb_display_num);

Each process manages only its own Xvfb via client->xvfb_pid and cleans it up in close_client() with kill(client->xvfb_pid, SIGTERM).
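The display-number scheme can be sketched in Python (the actual fix is in C as shown above; the function name below is mine, and the spawn/cleanup steps are described only in comments):

```python
import os

def xvfb_display_for_pid(pid):
    # Map a pid into :100..:999 so each process gets its own display
    # and the low display numbers of any real X sessions are avoided.
    return f":{100 + (pid % 900)}"

# Each process would then start only its own server, e.g.
#   proc = subprocess.Popen(["Xvfb", xvfb_display_for_pid(os.getpid()), ...])
# and terminate that one child with proc.send_signal(signal.SIGTERM)
# in its cleanup path -- never a global pkill that hits siblings.
```

Since pids on one node are unique at any moment, two concurrent jobs cannot pick the same display.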

Copilot AI review requested due to automatic review settings March 30, 2026 00:01

Copilot AI left a comment


Pull request overview

This PR updates the PufferDrive 3.0 rendering/evaluation workflow by moving rendering into the in-process C render pipeline (ffmpeg/headless support) and routing training-time evaluation/render logging through a new Python Evaluator, removing prior subprocess/C-binary rendering paths.

Changes:

  • Removes legacy subprocess-based human replay eval and C-binary video rendering helpers from pufferlib/utils.py.
  • Adds a new Evaluator for self-play/human-replay eval with optional multi-view rendering and wandb video logging.
  • Extends Drive’s C/Python bindings to support render modes, view selection, video filename suffixing, and per-env logging helpers.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Summary per file:

pufferlib/utils.py: Removes deprecated subprocess human-replay eval and old C-binary video rendering wrappers.
pufferlib/pufferl.py: Wires training eval into the new Evaluator; adds an in-process render() command path and export-path CLI support.
pufferlib/ocean/env_config.h: Adds render_mode to the INI config struct (used by Drive demo/config parsing).
pufferlib/ocean/env_binding.h: Updates render binding signatures and adds env/vec helpers (video suffix, scenario IDs, env_log).
pufferlib/ocean/drive/drive.py: Adds RenderView, render/view controls, scenario IDs, per-env logging, and render suffix plumbing.
pufferlib/ocean/drive/drive.h: Implements headless rendering via ffmpeg piping, view modes, and rendering-related state in the C env.
pufferlib/ocean/drive/drive.c: Updates demo/CLI to use new view modes and traces flag.
pufferlib/ocean/drive/binding.c: Plumbs render_mode into C env init and fixes some cleanup paths.
pufferlib/ocean/benchmark/evaluator.py: Introduces a new Evaluator supporting eval rollouts, render selection, and wandb video logging.
pufferlib/config/ocean/drive.ini: Updates Drive config to enable headless render mode and adds new eval/render configuration knobs.


Comment on lines 10 to 12
typedef struct {
    int render_mode;
    int action_type;

Copilot AI Mar 30, 2026


render_mode was added to env_init_config, but the INI handler() never matches/consumes an env.render_mode key. With the new render_mode = 1 entry in drive.ini, ini_parse() will stop at that unknown key and leave all subsequent fields (e.g. reward randomization bounds) at defaults, breaking the C demo() configuration. Add a MATCH("env", "render_mode") case (and map values 0/1) so parsing succeeds.

Comment on lines +516 to +521
self.evaluator.hr_env = load_env("puffer_drive", self.evaluator.hr_eval_config)
self.evaluator.rollout(self.uncompiled_policy, mode="human_replay")
self.evaluator.hr_env.close()
self.evaluator.log_videos(eval_mode="human_replay", epoch=self.epoch)
if self_play_eval:
self.evaluator.sp_env = load_env("puffer_drive", self.evaluator.sp_eval_config)

Copilot AI Mar 30, 2026


This hard-codes the env name to puffer_drive, which will break training/eval if PuffeRL is used with any other env_name. Use the current env (e.g. self.config["env"] or a stored env_name) instead of a constant here.

Suggested change:

-self.evaluator.hr_env = load_env("puffer_drive", self.evaluator.hr_eval_config)
+self.evaluator.hr_env = load_env(self.config["env"], self.evaluator.hr_eval_config)
 self.evaluator.rollout(self.uncompiled_policy, mode="human_replay")
 self.evaluator.hr_env.close()
 self.evaluator.log_videos(eval_mode="human_replay", epoch=self.epoch)
 if self_play_eval:
-    self.evaluator.sp_env = load_env("puffer_drive", self.evaluator.sp_eval_config)
+    self.evaluator.sp_env = load_env(self.config["env"], self.evaluator.sp_eval_config)

Comment on lines +530 to +531
if self.config["eval"]["wosac_realism_eval"]:
pufferlib.utils.run_wosac_eval_in_subprocess(self.config, self.logger, self.global_step)

Copilot AI Mar 30, 2026


wosac_realism_eval is no longer gated by eval_interval (previously it only ran every N eval intervals or at the end). As written, it will run on every checkpoint save when enabled, which is likely a large and unintended slowdown. Restore the interval condition (and/or share the same eval scheduling block as the new Evaluator).

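The interval gating this comment asks to restore could be sketched as follows (function and parameter names are hypothetical, not taken from pufferl.py):

```python
def should_run_wosac(epoch, eval_interval, is_final):
    # Run the expensive WOSAC realism eval only every eval_interval
    # epochs, or once at the end of training, rather than on every
    # checkpoint save.
    if is_final:
        return True
    return eval_interval > 0 and epoch % eval_interval == 0
```

Sharing this predicate with the Evaluator's scheduling block would keep the two eval paths on the same cadence.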
Comment on lines +705 to +723
// Aggregate this env's per-agent logs (same as vec_log but for one env)
// Note: breaks horribly if you don't use floats
Log aggregate = {0};
int num_keys = sizeof(Log) / sizeof(float);
for (int j = 0; j < num_keys; j++) {
    ((float *)&aggregate)[j] += ((float *)&env->log)[j];
}

PyObject *dict = PyDict_New();
if (aggregate.n == 0.0f) {
    return dict;
}

// Average across agents in env
float n = aggregate.n;
for (int i = 0; i < num_keys; i++) {
    ((float *)&aggregate)[i] /= n;
}
aggregate.n = (float)env->active_agent_count;

Copilot AI Mar 30, 2026


env_log returns averaged metrics but never clears env->log afterward. Unlike vec_log, repeated calls will re-report cumulative data and can skew per-env selection (and make it impossible to interpret logs as per-interval). Consider resetting env->log after producing a non-empty dict (mirroring vec_log), and/or honoring the num_agents argument as a minimum-data threshold.

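The aggregate-then-reset semantics the comment suggests can be sketched in Python (a sketch of the proposed behavior only, not the C binding itself; the dict-based log is a stand-in for the Log struct):

```python
def drain_env_log(log):
    # Average the accumulated per-agent sums by the sample count "n",
    # then clear the log so the next call reports only the new
    # interval (mirroring vec_log's reset behavior).
    n = log.get("n", 0.0)
    if n == 0.0:
        return {}
    out = {k: v / n for k, v in log.items()}
    log.clear()
    return out
```

With the clear in place, a second call before any new data arrives returns an empty dict instead of re-reporting cumulative totals.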
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@daphne-cornelisse left a comment


LGTM


4 participants