[WIP DO NOT MERGE OR REVIEW] Add trajectory rollout and rendering infrastructure#364

Open
eugenevinitsky wants to merge 5 commits into ricky/actions_trajectory_3.0 from ev/add_render

Conversation

@eugenevinitsky

Summary

  • Add predicted trajectory visualization to the renderer
  • Policy outputs 20 timesteps of actions; the C code rolls them out through the classic dynamics model and draws them as sky-blue circles/lines
  • Top-down orthographic camera with mouse drag panning and scroll zoom
  • vec_set_trajectory binding: takes discrete action indices, rolls out through classic dynamics, stores x/y for rendering
  • set_predicted_trajectories() Python method slices actions per sub-env via agent_offsets
  • Fix: re-enter BeginMode3D after draw_scene (which calls EndMode3D internally) so trajectories render in world coordinates
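The per-sub-env slicing that `set_predicted_trajectories()` performs via `agent_offsets` can be sketched in Python. This is an illustrative assumption, not the actual drive.py implementation: `slice_trajectories` and the offsets-with-final-sentinel layout are hypothetical.

```python
import numpy as np

def slice_trajectories(traj_actions, agent_offsets):
    """Slice a [total_agents, traj_len] action array into per-sub-env chunks.

    agent_offsets is assumed to hold the first agent index of each sub-env,
    plus a final sentinel equal to the total agent count.
    """
    chunks = []
    for env_id in range(len(agent_offsets) - 1):
        start, end = agent_offsets[env_id], agent_offsets[env_id + 1]
        # The C binding expects a contiguous int32 buffer per sub-env
        chunks.append(np.ascontiguousarray(traj_actions[start:end], dtype=np.int32))
    return chunks

# Example: 5 agents split across two sub-envs (3 + 2), 20-step trajectories
chunks = slice_trajectories(np.zeros((5, 20)), agent_offsets=[0, 3, 5])
```

Each chunk would then be handed to the binding for its sub-env, e.g. via `vec_set_trajectory`.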

How to use

```python
# Get actor output from torch policy
logits = policy(obs)  # [num_agents, 20 * 91]
traj_actions = logits.reshape(num_agents, 20, 91).argmax(dim=-1).numpy()

# Set trajectories (C rolls out dynamics internally)
env.set_predicted_trajectories(traj_actions)

# Render with trajectories visible
env.render()
```

Test plan

  • Verified trajectory coordinates match agent positions via debug prints
  • Verified classic dynamics rollout matches move_dynamics in drive.h
  • Visual test on Town01 with model_puffer_drive_006000.pt
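As a rough illustration of what such a rollout looks like, here is a generic kinematic sketch in Python. The actual `move_dynamics` in drive.h is not reproduced here; it may use a different steering model and integration scheme, so treat this only as a shape of the computation:

```python
import math

def rollout_classic(x, y, heading, speed, actions, dt=0.1):
    """Hedged sketch of a classic-dynamics rollout: integrate one
    (acceleration, steering) pair per timestep and record x/y points."""
    xs, ys = [], []
    for acceleration, steering in actions:
        speed += acceleration * dt
        heading += steering * dt          # simplified: heading rate ~ steering
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        xs.append(x)
        ys.append(y)
    return xs, ys

# Zero accel/steer at 5 m/s for 20 steps -> a straight 10 m segment
xs, ys = rollout_classic(0.0, 0.0, 0.0, 5.0, [(0.0, 0.0)] * 20)
```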

🤖 Generated with Claude Code

eugenevinitsky and others added 3 commits March 23, 2026 22:06
- Update DriveNet to match Ricky's actions_trajectory policy:
  - Add past_actions_encoder (Linear→ReLU→Linear, 40→64→64)
  - 4-way concat (ego, road, partner, past_actions) → shared_embedding
  - Actor outputs trajectory_len × action_size logits
- Roll out predicted trajectory using classic dynamics model
  after each forward pass, store in predicted_trajectory_x/y
- Draw predicted trajectories as sky-blue lines in renderTopDownView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add predicted_traj_x/y fields to Drive struct
- Add vec_set_trajectory binding: takes action indices, rolls out
  through classic dynamics in C, stores x/y positions for rendering
- c_render draws predicted trajectories as sky-blue lines/spheres
- Add headless ffmpeg rendering functions (start/stop/write_render_pipe)
- Python set_predicted_trajectories() method on Drive class

Usage: after getting actor output from torch policy, extract argmax
actions for each trajectory step, pass to env.set_predicted_trajectories(),
then call env.render().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
draw_scene() calls EndMode3D() internally, so trajectory drawing was
happening in 2D screen space instead of world coordinates. Now re-enters
BeginMode3D with the same camera after draw_scene returns.

Also:
- Top-down orthographic camera centered on map
- Mouse drag to pan, scroll to zoom
- Per-sub-env trajectory slicing via agent_offsets
- rlDisableDepthTest so trajectories render on top

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 24, 2026 06:19

Copilot AI left a comment


Pull request overview

Adds infrastructure to visualize predicted action rollouts as trajectories in the Drive renderer, including Python→C bindings to set per-agent trajectory action sequences and new top-down camera controls for panning/zooming.

Changes:

  • Add vec_set_trajectory binding to roll out discrete actions through classic dynamics and store x/y points for rendering.
  • Add Python Drive.set_predicted_trajectories() helper and extend Drive.render() to support selecting env_id.
  • Extend C rendering/inference paths to draw predicted trajectories and (separately) add trajectory buffers to DriveNet/top-down visualization.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.

File summary

  • pufferlib/ocean/env_binding.h — Adds the vec_set_trajectory C-extension method to roll out discrete actions into trajectory points.
  • pufferlib/ocean/drive/visualize.c — Extends top-down visualization to draw predicted trajectories and updates the forward() call signature usage.
  • pufferlib/ocean/drive/drivenet.h — Changes the network output shape to include trajectory-length action logits, rolls out predicted trajectories for rendering, and adds a past-actions encoder path.
  • pufferlib/ocean/drive/drive.py — Adds set_predicted_trajectories() and updates render() to accept env_id.
  • pufferlib/ocean/drive/drive.h — Adds predicted-trajectory buffers to Drive, updates c_render() with a top-down ortho camera and trajectory drawing, and adds ffmpeg pipe helpers.


Comment on lines +3934 to +3961
```c
    pid_t pid = fork();
    if (pid == 0) { // child: run ffmpeg
        close(pipefd[1]);
        dup2(pipefd[0], STDIN_FILENO);
        close(pipefd[0]);
        for (int fd = 3; fd < 256; fd++)
            close(fd);
        char size_str[64];
        snprintf(size_str, sizeof(size_str), "%dx%d", width, height);
        execlp("ffmpeg", "ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "rgba", "-s", size_str, "-r", "30", "-i", "-",
               "-c:v", "libx264", "-pix_fmt", "yuv420p", "-preset", "ultrafast", "-crf", "23", "-loglevel", "error",
               "/tmp/pufferdrive_render.mp4", NULL);
        fprintf(stderr, "execlp ffmpeg failed\n");
        _exit(1);
    }
    close(pipefd[0]);
    env->render_pid = pid;
    env->render_pipe_fd = pipefd[1];
}

void stop_render_pipe(Drive *env) {
    if (env->render_pipe_fd > 0) {
        close(env->render_pipe_fd);
        env->render_pipe_fd = 0;
    }
    if (env->render_pid > 0) {
        waitpid(env->render_pid, NULL, 0);
        env->render_pid = 0;
```

Copilot AI Mar 24, 2026


start_render_pipe/stop_render_pipe use fork()/waitpid() but (1) fork() failure (pid == -1) isn’t handled, and (2) waitpid() requires <sys/wait.h>, which isn’t included in drive.h today (only <unistd.h>). This can cause build failures or leaving zombie ffmpeg processes on fork/exec errors. Please handle pid==-1 and add the required include(s) (or move these helpers to a .c file that includes them).

Comment on lines +4024 to +4030
```c
// First: draw GREEN circles at every agent position to verify coordinate alignment
for (int i = 0; i < env->active_agent_count; i++) {
    int agent_idx = env->active_agent_indices[i];
    Agent *agent = &env->agents[agent_idx];
    DrawCircle3D((Vector3){agent->sim_x, agent->sim_y, agent->sim_z + 0.1f},
                 5.0f, (Vector3){0, 0, 1}, 0.0f, GREEN);
}
```

Copilot AI Mar 24, 2026


c_render currently always draws large GREEN debug circles at every agent position. This looks like temporary verification instrumentation and will clutter normal renders / videos and adds per-frame draw cost. Please guard this behind a debug flag (e.g., env->render_debug) or remove it once coordinate alignment is confirmed.

Suggested change

```diff
-// First: draw GREEN circles at every agent position to verify coordinate alignment
-for (int i = 0; i < env->active_agent_count; i++) {
-    int agent_idx = env->active_agent_indices[i];
-    Agent *agent = &env->agents[agent_idx];
-    DrawCircle3D((Vector3){agent->sim_x, agent->sim_y, agent->sim_z + 0.1f},
-                 5.0f, (Vector3){0, 0, 1}, 0.0f, GREEN);
-}
```

}

```diff
-void forward(DriveNet *net, float *observations, int *actions) {
+void forward(DriveNet *net, Drive *env, float *observations, int *actions) {
```

Copilot AI Mar 24, 2026


Changing forward() to take an additional Drive *env parameter requires updating all call sites; currently pufferlib/ocean/drive/drive.c still calls forward(net, observations, actions) and will not compile. Please update remaining callers (or provide a backward-compatible wrapper, e.g., forward(net, NULL, observations, actions)).

Suggested change

```diff
-void forward(DriveNet *net, Drive *env, float *observations, int *actions) {
+static inline void forward_3(DriveNet *net, float *observations, int *actions) {
+    forward_with_env(net, NULL, observations, actions);
+}
+#define GET_FORWARDFUNC(_1,_2,_3,_4,NAME,...) NAME
+#define forward(...) GET_FORWARDFUNC(__VA_ARGS__, forward_with_env, forward_3)(__VA_ARGS__)
+void forward_with_env(DriveNet *net, Drive *env, float *observations, int *actions) {
```

Comment on lines +231 to +233
```c
// Process past actions trajectory
for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
    net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = observations[past_actions_offset + i];
```

Copilot AI Mar 24, 2026


forward() now assumes each agent's observation includes an extra PAST_ACTIONS_DIM appended (obs_stride includes it and past_actions_offset is read), but Drive currently allocates/fills observations with max_obs = ego + partner + road only (see drive.h allocate()/compute_observations). This makes observations[past_actions_offset + i] read out-of-bounds and can corrupt memory. Please either (a) update the environment observation layout/allocation to include these dims, or (b) remove/guard the past-actions read path unless the buffer is known to contain it.

Suggested change

```diff
-// Process past actions trajectory
-for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
-    net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = observations[past_actions_offset + i];
+// Process past actions trajectory (if present in the observation buffer)
+int agent_obs_end = (b + 1) * obs_stride;
+int past_actions_end = past_actions_offset + PAST_ACTIONS_DIM;
+int has_past_actions = past_actions_end <= agent_obs_end;
+if (has_past_actions) {
+    for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
+        net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = observations[past_actions_offset + i];
+    }
+} else {
+    // If the buffer does not contain past actions, fall back to zeros
+    for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
+        net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = 0.0f;
+    }
+}
```
Comment on lines +324 to +360
```c
// Roll out predicted trajectory using classic dynamics
if (env != NULL) {
    int traj_len = net->trajectory_len;
    int action_size = net->action_size;
    int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);

    for (int b = 0; b < net->num_agents; b++) {
        int agent_idx = env->active_agent_indices[b];
        Agent *agent = &env->agents[agent_idx];

        // Start from current agent state
        float x = agent->sim_x;
        float y = agent->sim_y;
        float heading = agent->sim_heading;
        float vx = agent->sim_vx;
        float vy = agent->sim_vy;

        float *actor_out = &net->actor->output[b * traj_len * action_size];

        for (int t = 0; t < traj_len; t++) {
            // Find argmax action for this timestep
            float *logits = &actor_out[t * action_size];
            int best_action = 0;
            float best_val = logits[0];
            for (int a = 1; a < action_size; a++) {
                if (logits[a] > best_val) {
                    best_val = logits[a];
                    best_action = a;
                }
            }

            // Decode action into acceleration + steering (classic model)
            int accel_idx = best_action / num_steer;
            int steer_idx = best_action % num_steer;
            float acceleration = ACCELERATION_VALUES[accel_idx];
            float steering = STEERING_VALUES[steer_idx];
```


Copilot AI Mar 24, 2026


The trajectory rollout section always decodes actions using classic ACCELERATION_VALUES/STEERING_VALUES, even when the env/net is configured for JERK dynamics. That will produce incorrect trajectories (and makes this code path implicitly classic-only). Please gate this rollout on env->dynamics_model == CLASSIC (or implement the corresponding JERK rollout using JERK_LONG/JERK_LAT) so rendering stays consistent across dynamics models.

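The argmax-decode step above splits a flat action index into acceleration and steering bins. A Python mirror of that decode, with a bounds check added; the 7 × 13 factorization of the 91-action space is an assumption for illustration, not a fact from the PR:

```python
def decode_classic_action(action_val, num_accel, num_steer):
    """Decode a flat discrete action index into (accel_idx, steer_idx),
    mirroring accel_idx = action / num_steer, steer_idx = action % num_steer
    from the C snippet above, with range validation added."""
    if not (0 <= action_val < num_accel * num_steer):
        raise ValueError(f"action index {action_val} out of range")
    return action_val // num_steer, action_val % num_steer

# Assuming 7 acceleration bins x 13 steering bins = 91 discrete actions
accel_idx, steer_idx = decode_classic_action(90, num_accel=7, num_steer=13)
```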

```diff
-    def render(self):
-        binding.vec_render(self.c_envs, 0)
+    def render(self, env_id=0):
```

Copilot AI Mar 24, 2026


render(env_id) forwards env_id straight into binding.vec_render, which does not bounds-check env_id before indexing vec->envs[env_id]. Passing an invalid env_id can segfault the process. Please add a Python-side check (0 <= env_id < self.num_envs) and raise a ValueError early (or add the bounds check in vec_render for defense-in-depth).

Suggested change

```diff
-    def render(self, env_id=0):
+    def render(self, env_id=0):
+        if not (0 <= env_id < self.num_envs):
+            raise ValueError(f"env_id ({env_id}) must be in range [0, {self.num_envs})")
```

Comment on lines +546 to +559
```c
static PyObject *vec_set_trajectory(PyObject *self, PyObject *args) {
    if (PyTuple_Size(args) != 4) {
        PyErr_SetString(PyExc_TypeError, "vec_set_trajectory requires 4 args: vec_env, env_id, action_traj, traj_len");
        return NULL;
    }
    VecEnv *vec = (VecEnv *)PyLong_AsVoidPtr(PyTuple_GetItem(args, 0));
    int env_id = PyLong_AsLong(PyTuple_GetItem(args, 1));
    PyArrayObject *action_arr = (PyArrayObject *)PyTuple_GetItem(args, 2);
    int traj_len = PyLong_AsLong(PyTuple_GetItem(args, 3));

    Env *env = vec->envs[env_id];
    int num_agents = env->active_agent_count;
    int *actions = (int *)PyArray_DATA(action_arr);
    int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);
```

Copilot AI Mar 24, 2026


vec_set_trajectory casts args directly (vec handle, env_id, action_traj) without using unpack_vecenv, type-checking, or bounds-checking. An invalid handle, out-of-range env_id, non-numpy action_traj, non-contiguous array, or wrong dtype/size can lead to segfaults via vec->envs[env_id] and PyArray_DATA. Please validate vec != NULL, 0 <= env_id < vec->num_envs, and traj_len > 0, and coerce/verify action_traj as a contiguous NPY_INT32 array with length == active_agent_count * traj_len (raising Python exceptions on mismatch).

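A Python-side version of the validation this comment asks for might look like the following sketch. Function and parameter names (`validated_traj_array`, `num_actions`) are illustrative, not part of the actual bindings; the real checks could equally live inside the C extension:

```python
import numpy as np

def validated_traj_array(traj_actions, num_agents, traj_len, num_actions):
    """Coerce and validate a trajectory-action array before handing it to a
    C binding such as vec_set_trajectory."""
    arr = np.ascontiguousarray(traj_actions, dtype=np.int32)
    if arr.shape != (num_agents, traj_len):
        raise ValueError(f"expected shape {(num_agents, traj_len)}, got {arr.shape}")
    if arr.size and (arr.min() < 0 or arr.max() >= num_actions):
        raise ValueError("action index out of range")
    return arr

# A well-formed 2-agent, 2-step array passes through unchanged
arr = validated_traj_array([[0, 1], [2, 3]], num_agents=2, traj_len=2, num_actions=91)
```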
Comment on lines +579 to +584
```c
for (int t = 0; t < traj_len; t++) {
    int action_val = actions[i * traj_len + t];
    int accel_idx = action_val / num_steer;
    int steer_idx = action_val % num_steer;
    float acceleration = ACCELERATION_VALUES[accel_idx];
    float steering = STEERING_VALUES[steer_idx];
```

Copilot AI Mar 24, 2026


action_val is used to index ACCELERATION_VALUES and STEERING_VALUES without validating bounds. If Python passes an out-of-range action index (or a negative value), accel_idx/steer_idx can go out of bounds and read invalid memory. Please validate 0 <= action_val < (num_accel * num_steer) (and handle invalid inputs by clamping or raising a Python ValueError).

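One way to guarantee in-range indices on the Python side is to clamp before calling the binding. A sketch: `clamp_actions` is hypothetical, and raising a ValueError (as the comment suggests) is the stricter alternative to silent clamping:

```python
import numpy as np

def clamp_actions(actions, num_accel, num_steer):
    """Clamp flat action indices into [0, num_accel * num_steer) so the
    C-side ACCELERATION_VALUES/STEERING_VALUES lookups cannot go out of bounds."""
    hi = num_accel * num_steer - 1
    return np.clip(np.asarray(actions, dtype=np.int32), 0, hi)

# -1 and 91 fall outside a 7 x 13 = 91-action space and get clamped
clamped = clamp_actions([-1, 0, 45, 91], num_accel=7, num_steer=13)
```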
Comment on lines 145 to 147
```c
// Draw agent trajs
if (trajectories) {
    for (int i = 0; i < frame_count; i++) {
```

Copilot AI Mar 24, 2026


draw_scene() ends the 3D mode internally (see drive.h), but this function still calls EndMode3D() after draw_scene(). That double-EndMode3D breaks raylib render state and can cause the newly added predicted-trajectory drawing to render inconsistently. Please remove the redundant EndMode3D in this function (or refactor draw_scene to not manage 3D mode).

Comment on lines +355 to +358
```c
// Predicted trajectory visualization (set from Python, drawn by c_render)
float *predicted_traj_x; // [num_agents * traj_len]
float *predicted_traj_y; // [num_agents * traj_len]
int predicted_traj_len;  // steps per agent (0 = no trajectory to draw)
```

Copilot AI Mar 24, 2026


predicted_traj_x/predicted_traj_y are heap-allocated from Python (vec_set_trajectory) but are not freed anywhere in Drive teardown (c_close/free_allocated). This introduces a persistent leak across env lifetimes and can accumulate if trajectories are updated/reset frequently. Please free these buffers (and reset predicted_traj_len/pointers) in c_close/free_allocated (and/or when resetting the env).

@eugenevinitsky eugenevinitsky changed the title Add trajectory rollout and rendering infrastructure [WIP DO NOT MERGE OR REVIEW] Add trajectory rollout and rendering infrastructure Mar 24, 2026
eugenevinitsky and others added 2 commits March 24, 2026 02:40
- Remove green debug circles from c_render (leftover instrumentation)
- Fix comment: mouse drag uses left button, not right
- Free predicted_traj_x/y in c_close (memory leak)
- Fix forward() call sites in drive.c to use new 4-arg signature
- Remove double EndMode3D in visualize.c renderTopDownView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ajectories

- Eval loop now handles trajectory policies: concatenates past actions buffer
  to observations, uses sample_logits_action_sequence, and calls
  set_predicted_trajectories for visualization
- Default actions_trajectory_length changed from 80 to 20 in torch.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>