[WIP DO NOT MERGE OR REVIEW] Add trajectory rollout and rendering infrastructure#364

Open
eugenevinitsky wants to merge 5 commits into ricky/actions_trajectory_3.0 from ev/add_render

Conversation

@eugenevinitsky

Summary

  • Add predicted trajectory visualization to the renderer
  • Policy outputs 20 timesteps of actions; the C code rolls them out through the classic dynamics model and draws them as sky-blue circles/lines
  • Top-down orthographic camera with mouse drag panning and scroll zoom
  • vec_set_trajectory binding: takes discrete action indices, rolls out through classic dynamics, stores x/y for rendering
  • set_predicted_trajectories() Python method slices actions per sub-env via agent_offsets
  • Fix: re-enter BeginMode3D after draw_scene (which calls EndMode3D internally) so trajectories render in world coordinates
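The per-sub-env slicing that `set_predicted_trajectories()` performs via `agent_offsets` can be sketched in Python. This is an illustrative assumption, not the actual drive.py implementation: `slice_trajectories` and the offsets-with-final-sentinel layout are hypothetical.

```python
import numpy as np

def slice_trajectories(traj_actions, agent_offsets):
    """Slice a [total_agents, traj_len] action array into per-sub-env chunks.

    agent_offsets is assumed to hold the first agent index of each sub-env,
    plus a final sentinel equal to the total agent count.
    """
    chunks = []
    for env_id in range(len(agent_offsets) - 1):
        start, end = agent_offsets[env_id], agent_offsets[env_id + 1]
        # The C binding expects a contiguous int32 buffer per sub-env
        chunks.append(np.ascontiguousarray(traj_actions[start:end], dtype=np.int32))
    return chunks

# Example: 5 agents split across two sub-envs (3 + 2), 20-step trajectories
chunks = slice_trajectories(np.zeros((5, 20)), agent_offsets=[0, 3, 5])
```

Each chunk would then be handed to the binding for its sub-env, e.g. via `vec_set_trajectory`.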

How to use

```python
# Get actor output from torch policy
logits = policy(obs)  # [num_agents, 20 * 91]
traj_actions = logits.reshape(num_agents, 20, 91).argmax(dim=-1).numpy()

# Set trajectories (C rolls out dynamics internally)
env.set_predicted_trajectories(traj_actions)

# Render with trajectories visible
env.render()
```

Test plan

  • Verified trajectory coordinates match agent positions via debug prints
  • Verified classic dynamics rollout matches move_dynamics in drive.h
  • Visual test on Town01 with model_puffer_drive_006000.pt
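As a rough illustration of what such a rollout looks like, here is a generic kinematic sketch in Python. The actual `move_dynamics` in drive.h is not reproduced here; it may use a different steering model and integration scheme, so treat this only as a shape of the computation:

```python
import math

def rollout_classic(x, y, heading, speed, actions, dt=0.1):
    """Hedged sketch of a classic-dynamics rollout: integrate one
    (acceleration, steering) pair per timestep and record x/y points."""
    xs, ys = [], []
    for acceleration, steering in actions:
        speed += acceleration * dt
        heading += steering * dt          # simplified: heading rate ~ steering
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        xs.append(x)
        ys.append(y)
    return xs, ys

# Zero accel/steer at 5 m/s for 20 steps -> a straight 10 m segment
xs, ys = rollout_classic(0.0, 0.0, 0.0, 5.0, [(0.0, 0.0)] * 20)
```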

🤖 Generated with Claude Code

eugenevinitsky and others added 3 commits March 23, 2026 22:06
- Update DriveNet to match Ricky's actions_trajectory policy:
  - Add past_actions_encoder (Linear→ReLU→Linear, 40→64→64)
  - 4-way concat (ego, road, partner, past_actions) → shared_embedding
  - Actor outputs trajectory_len × action_size logits
- Roll out predicted trajectory using classic dynamics model
  after each forward pass, store in predicted_trajectory_x/y
- Draw predicted trajectories as sky-blue lines in renderTopDownView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add predicted_traj_x/y fields to Drive struct
- Add vec_set_trajectory binding: takes action indices, rolls out
  through classic dynamics in C, stores x/y positions for rendering
- c_render draws predicted trajectories as sky-blue lines/spheres
- Add headless ffmpeg rendering functions (start/stop/write_render_pipe)
- Python set_predicted_trajectories() method on Drive class

Usage: after getting actor output from torch policy, extract argmax
actions for each trajectory step, pass to env.set_predicted_trajectories(),
then call env.render().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
draw_scene() calls EndMode3D() internally, so trajectory drawing was
happening in 2D screen space instead of world coordinates. Now re-enters
BeginMode3D with the same camera after draw_scene returns.

Also:
- Top-down orthographic camera centered on map
- Mouse drag to pan, scroll to zoom
- Per-sub-env trajectory slicing via agent_offsets
- rlDisableDepthTest so trajectories render on top

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 24, 2026 06:19

Copilot AI left a comment


Pull request overview

Adds infrastructure to visualize predicted action rollouts as trajectories in the Drive renderer, including Python→C bindings to set per-agent trajectory action sequences and new top-down camera controls for panning/zooming.

Changes:

  • Add vec_set_trajectory binding to roll out discrete actions through classic dynamics and store x/y points for rendering.
  • Add Python Drive.set_predicted_trajectories() helper and extend Drive.render() to support selecting env_id.
  • Extend C rendering/inference paths to draw predicted trajectories and (separately) add trajectory buffers to DriveNet/top-down visualization.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.

File summary

  • pufferlib/ocean/env_binding.h — Adds the vec_set_trajectory C-extension method to roll out discrete actions into trajectory points.
  • pufferlib/ocean/drive/visualize.c — Extends top-down visualization to draw predicted trajectories and updates the forward() call signature usage.
  • pufferlib/ocean/drive/drivenet.h — Changes the network output shape to include trajectory-length action logits, rolls out predicted trajectories for rendering, and adds a past-actions encoder path.
  • pufferlib/ocean/drive/drive.py — Adds set_predicted_trajectories() and updates render() to accept env_id.
  • pufferlib/ocean/drive/drive.h — Adds predicted-trajectory buffers to Drive, updates c_render() with a top-down ortho camera and trajectory drawing, and adds ffmpeg pipe helpers.


Comment on lines +3934 to +3961
```c
    pid_t pid = fork();
    if (pid == 0) { // child: run ffmpeg
        close(pipefd[1]);
        dup2(pipefd[0], STDIN_FILENO);
        close(pipefd[0]);
        for (int fd = 3; fd < 256; fd++)
            close(fd);
        char size_str[64];
        snprintf(size_str, sizeof(size_str), "%dx%d", width, height);
        execlp("ffmpeg", "ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "rgba", "-s", size_str, "-r", "30", "-i", "-",
               "-c:v", "libx264", "-pix_fmt", "yuv420p", "-preset", "ultrafast", "-crf", "23", "-loglevel", "error",
               "/tmp/pufferdrive_render.mp4", NULL);
        fprintf(stderr, "execlp ffmpeg failed\n");
        _exit(1);
    }
    close(pipefd[0]);
    env->render_pid = pid;
    env->render_pipe_fd = pipefd[1];
}

void stop_render_pipe(Drive *env) {
    if (env->render_pipe_fd > 0) {
        close(env->render_pipe_fd);
        env->render_pipe_fd = 0;
    }
    if (env->render_pid > 0) {
        waitpid(env->render_pid, NULL, 0);
        env->render_pid = 0;
```

Copilot AI Mar 24, 2026


start_render_pipe/stop_render_pipe use fork()/waitpid() but (1) fork() failure (pid == -1) isn’t handled, and (2) waitpid() requires <sys/wait.h>, which isn’t included in drive.h today (only <unistd.h>). This can cause build failures or leaving zombie ffmpeg processes on fork/exec errors. Please handle pid==-1 and add the required include(s) (or move these helpers to a .c file that includes them).

Comment on lines +4024 to +4030
```c
// First: draw GREEN circles at every agent position to verify coordinate alignment
for (int i = 0; i < env->active_agent_count; i++) {
    int agent_idx = env->active_agent_indices[i];
    Agent *agent = &env->agents[agent_idx];
    DrawCircle3D((Vector3){agent->sim_x, agent->sim_y, agent->sim_z + 0.1f},
                 5.0f, (Vector3){0, 0, 1}, 0.0f, GREEN);
}
```

Copilot AI Mar 24, 2026


c_render currently always draws large GREEN debug circles at every agent position. This looks like temporary verification instrumentation and will clutter normal renders / videos and adds per-frame draw cost. Please guard this behind a debug flag (e.g., env->render_debug) or remove it once coordinate alignment is confirmed.

Suggested change

```diff
-// First: draw GREEN circles at every agent position to verify coordinate alignment
-for (int i = 0; i < env->active_agent_count; i++) {
-    int agent_idx = env->active_agent_indices[i];
-    Agent *agent = &env->agents[agent_idx];
-    DrawCircle3D((Vector3){agent->sim_x, agent->sim_y, agent->sim_z + 0.1f},
-                 5.0f, (Vector3){0, 0, 1}, 0.0f, GREEN);
-}
```

}

```diff
-void forward(DriveNet *net, float *observations, int *actions) {
+void forward(DriveNet *net, Drive *env, float *observations, int *actions) {
```

Copilot AI Mar 24, 2026


Changing forward() to take an additional Drive *env parameter requires updating all call sites; currently pufferlib/ocean/drive/drive.c still calls forward(net, observations, actions) and will not compile. Please update remaining callers (or provide a backward-compatible wrapper, e.g., forward(net, NULL, observations, actions)).

Suggested change

```diff
-void forward(DriveNet *net, Drive *env, float *observations, int *actions) {
+static inline void forward_3(DriveNet *net, float *observations, int *actions) {
+    forward_with_env(net, NULL, observations, actions);
+}
+#define GET_FORWARDFUNC(_1,_2,_3,_4,NAME,...) NAME
+#define forward(...) GET_FORWARDFUNC(__VA_ARGS__, forward_with_env, forward_3)(__VA_ARGS__)
+void forward_with_env(DriveNet *net, Drive *env, float *observations, int *actions) {
```

Comment on lines +231 to +233
```c
// Process past actions trajectory
for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
    net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = observations[past_actions_offset + i];
```

Copilot AI Mar 24, 2026


forward() now assumes each agent's observation includes an extra PAST_ACTIONS_DIM appended (obs_stride includes it and past_actions_offset is read), but Drive currently allocates/fills observations with max_obs = ego + partner + road only (see drive.h allocate()/compute_observations). This makes observations[past_actions_offset + i] read out-of-bounds and can corrupt memory. Please either (a) update the environment observation layout/allocation to include these dims, or (b) remove/guard the past-actions read path unless the buffer is known to contain it.

Suggested change

```diff
-// Process past actions trajectory
-for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
-    net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = observations[past_actions_offset + i];
+// Process past actions trajectory (if present in the observation buffer)
+int agent_obs_end = (b + 1) * obs_stride;
+int past_actions_end = past_actions_offset + PAST_ACTIONS_DIM;
+int has_past_actions = past_actions_end <= agent_obs_end;
+if (has_past_actions) {
+    for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
+        net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = observations[past_actions_offset + i];
+    }
+} else {
+    // If the buffer does not contain past actions, fall back to zeros
+    for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
+        net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = 0.0f;
+    }
+}
```
Comment on lines +324 to +360
```c
// Roll out predicted trajectory using classic dynamics
if (env != NULL) {
    int traj_len = net->trajectory_len;
    int action_size = net->action_size;
    int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);

    for (int b = 0; b < net->num_agents; b++) {
        int agent_idx = env->active_agent_indices[b];
        Agent *agent = &env->agents[agent_idx];

        // Start from current agent state
        float x = agent->sim_x;
        float y = agent->sim_y;
        float heading = agent->sim_heading;
        float vx = agent->sim_vx;
        float vy = agent->sim_vy;

        float *actor_out = &net->actor->output[b * traj_len * action_size];

        for (int t = 0; t < traj_len; t++) {
            // Find argmax action for this timestep
            float *logits = &actor_out[t * action_size];
            int best_action = 0;
            float best_val = logits[0];
            for (int a = 1; a < action_size; a++) {
                if (logits[a] > best_val) {
                    best_val = logits[a];
                    best_action = a;
                }
            }

            // Decode action into acceleration + steering (classic model)
            int accel_idx = best_action / num_steer;
            int steer_idx = best_action % num_steer;
            float acceleration = ACCELERATION_VALUES[accel_idx];
            float steering = STEERING_VALUES[steer_idx];
```


Copilot AI Mar 24, 2026


The trajectory rollout section always decodes actions using classic ACCELERATION_VALUES/STEERING_VALUES, even when the env/net is configured for JERK dynamics. That will produce incorrect trajectories (and makes this code path implicitly classic-only). Please gate this rollout on env->dynamics_model == CLASSIC (or implement the corresponding JERK rollout using JERK_LONG/JERK_LAT) so rendering stays consistent across dynamics models.

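The argmax-decode step above splits a flat action index into acceleration and steering bins. A Python mirror of that decode, with a bounds check added; the 7 × 13 factorization of the 91-action space is an assumption for illustration, not a fact from the PR:

```python
def decode_classic_action(action_val, num_accel, num_steer):
    """Decode a flat discrete action index into (accel_idx, steer_idx),
    mirroring accel_idx = action / num_steer, steer_idx = action % num_steer
    from the C snippet above, with range validation added."""
    if not (0 <= action_val < num_accel * num_steer):
        raise ValueError(f"action index {action_val} out of range")
    return action_val // num_steer, action_val % num_steer

# Assuming 7 acceleration bins x 13 steering bins = 91 discrete actions
accel_idx, steer_idx = decode_classic_action(90, num_accel=7, num_steer=13)
```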

```diff
-    def render(self):
-        binding.vec_render(self.c_envs, 0)
+    def render(self, env_id=0):
```

Copilot AI Mar 24, 2026


render(env_id) forwards env_id straight into binding.vec_render, which does not bounds-check env_id before indexing vec->envs[env_id]. Passing an invalid env_id can segfault the process. Please add a Python-side check (0 <= env_id < self.num_envs) and raise a ValueError early (or add the bounds check in vec_render for defense-in-depth).

Suggested change

```diff
-    def render(self, env_id=0):
+    def render(self, env_id=0):
+        if not (0 <= env_id < self.num_envs):
+            raise ValueError(f"env_id ({env_id}) must be in range [0, {self.num_envs})")
```

Comment on lines +546 to +559
```c
static PyObject *vec_set_trajectory(PyObject *self, PyObject *args) {
    if (PyTuple_Size(args) != 4) {
        PyErr_SetString(PyExc_TypeError, "vec_set_trajectory requires 4 args: vec_env, env_id, action_traj, traj_len");
        return NULL;
    }
    VecEnv *vec = (VecEnv *)PyLong_AsVoidPtr(PyTuple_GetItem(args, 0));
    int env_id = PyLong_AsLong(PyTuple_GetItem(args, 1));
    PyArrayObject *action_arr = (PyArrayObject *)PyTuple_GetItem(args, 2);
    int traj_len = PyLong_AsLong(PyTuple_GetItem(args, 3));

    Env *env = vec->envs[env_id];
    int num_agents = env->active_agent_count;
    int *actions = (int *)PyArray_DATA(action_arr);
    int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);
```

Copilot AI Mar 24, 2026


vec_set_trajectory casts args directly (vec handle, env_id, action_traj) without using unpack_vecenv, type-checking, or bounds-checking. An invalid handle, out-of-range env_id, non-numpy action_traj, non-contiguous array, or wrong dtype/size can lead to segfaults via vec->envs[env_id] and PyArray_DATA. Please validate vec != NULL, 0 <= env_id < vec->num_envs, and traj_len > 0, and coerce/verify action_traj as a contiguous NPY_INT32 array with length == active_agent_count * traj_len (raising Python exceptions on mismatch).

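A Python-side version of the validation this comment asks for might look like the following sketch. Function and parameter names (`validated_traj_array`, `num_actions`) are illustrative, not part of the actual bindings; the real checks could equally live inside the C extension:

```python
import numpy as np

def validated_traj_array(traj_actions, num_agents, traj_len, num_actions):
    """Coerce and validate a trajectory-action array before handing it to a
    C binding such as vec_set_trajectory."""
    arr = np.ascontiguousarray(traj_actions, dtype=np.int32)
    if arr.shape != (num_agents, traj_len):
        raise ValueError(f"expected shape {(num_agents, traj_len)}, got {arr.shape}")
    if arr.size and (arr.min() < 0 or arr.max() >= num_actions):
        raise ValueError("action index out of range")
    return arr

# A well-formed 2-agent, 2-step array passes through unchanged
arr = validated_traj_array([[0, 1], [2, 3]], num_agents=2, traj_len=2, num_actions=91)
```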
Comment on lines +579 to +584
```c
for (int t = 0; t < traj_len; t++) {
    int action_val = actions[i * traj_len + t];
    int accel_idx = action_val / num_steer;
    int steer_idx = action_val % num_steer;
    float acceleration = ACCELERATION_VALUES[accel_idx];
    float steering = STEERING_VALUES[steer_idx];
```

Copilot AI Mar 24, 2026


action_val is used to index ACCELERATION_VALUES and STEERING_VALUES without validating bounds. If Python passes an out-of-range action index (or a negative value), accel_idx/steer_idx can go out of bounds and read invalid memory. Please validate 0 <= action_val < (num_accel * num_steer) (and handle invalid inputs by clamping or raising a Python ValueError).

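One way to guarantee in-range indices on the Python side is to clamp before calling the binding. A sketch: `clamp_actions` is hypothetical, and raising a ValueError (as the comment suggests) is the stricter alternative to silent clamping:

```python
import numpy as np

def clamp_actions(actions, num_accel, num_steer):
    """Clamp flat action indices into [0, num_accel * num_steer) so the
    C-side ACCELERATION_VALUES/STEERING_VALUES lookups cannot go out of bounds."""
    hi = num_accel * num_steer - 1
    return np.clip(np.asarray(actions, dtype=np.int32), 0, hi)

# -1 and 91 fall outside a 7 x 13 = 91-action space and get clamped
clamped = clamp_actions([-1, 0, 45, 91], num_accel=7, num_steer=13)
```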
Comment on lines 145 to 147
```c
// Draw agent trajs
if (trajectories) {
    for (int i = 0; i < frame_count; i++) {
```

Copilot AI Mar 24, 2026


draw_scene() ends the 3D mode internally (see drive.h), but this function still calls EndMode3D() after draw_scene(). That double-EndMode3D breaks raylib render state and can cause the newly added predicted-trajectory drawing to render inconsistently. Please remove the redundant EndMode3D in this function (or refactor draw_scene to not manage 3D mode).

Comment on lines +355 to +358
```c
// Predicted trajectory visualization (set from Python, drawn by c_render)
float *predicted_traj_x; // [num_agents * traj_len]
float *predicted_traj_y; // [num_agents * traj_len]
int predicted_traj_len;  // steps per agent (0 = no trajectory to draw)
```

Copilot AI Mar 24, 2026


predicted_traj_x/predicted_traj_y are heap-allocated from Python (vec_set_trajectory) but are not freed anywhere in Drive teardown (c_close/free_allocated). This introduces a persistent leak across env lifetimes and can accumulate if trajectories are updated/reset frequently. Please free these buffers (and reset predicted_traj_len/pointers) in c_close/free_allocated (and/or when resetting the env).

@eugenevinitsky eugenevinitsky changed the title Add trajectory rollout and rendering infrastructure [WIP DO NOT MERGE OR REVIEW] Add trajectory rollout and rendering infrastructure Mar 24, 2026
eugenevinitsky and others added 2 commits March 24, 2026 02:40
- Remove green debug circles from c_render (leftover instrumentation)
- Fix comment: mouse drag uses left button, not right
- Free predicted_traj_x/y in c_close (memory leak)
- Fix forward() call sites in drive.c to use new 4-arg signature
- Remove double EndMode3D in visualize.c renderTopDownView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ajectories

- Eval loop now handles trajectory policies: concatenates past actions buffer
  to observations, uses sample_logits_action_sequence, and calls
  set_predicted_trajectories for visualization
- Default actions_trajectory_length changed from 80 to 20 in torch.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>