[WIP DO NOT MERGE OR REVIEW] Add trajectory rollout and rendering infrastructure #364
eugenevinitsky wants to merge 5 commits into ricky/actions_trajectory_3.0 from
Conversation
Update DriveNet to match Ricky's actions_trajectory policy:
- Add past_actions_encoder (Linear→ReLU→Linear, 40→64→64)
- 4-way concat (ego, road, partner, past_actions) → shared_embedding
- Actor outputs trajectory_len × action_size logits
- Roll out predicted trajectory using classic dynamics model after each forward pass; store in predicted_trajectory_x/y
- Draw predicted trajectories as sky-blue lines in renderTopDownView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add predicted_traj_x/y fields to Drive struct
- Add vec_set_trajectory binding: takes action indices, rolls out through classic dynamics in C, stores x/y positions for rendering
- c_render draws predicted trajectories as sky-blue lines/spheres
- Add headless ffmpeg rendering functions (start/stop/write_render_pipe)
- Python set_predicted_trajectories() method on Drive class

Usage: after getting actor output from the torch policy, extract argmax actions for each trajectory step, pass them to env.set_predicted_trajectories(), then call env.render().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
draw_scene() calls EndMode3D() internally, so trajectory drawing was happening in 2D screen space instead of world coordinates. Now re-enters BeginMode3D with the same camera after draw_scene returns.

Also:
- Top-down orthographic camera centered on map
- Mouse drag to pan, scroll to zoom
- Per-sub-env trajectory slicing via agent_offsets
- rlDisableDepthTest so trajectories render on top

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Adds infrastructure to visualize predicted action rollouts as trajectories in the Drive renderer, including Python→C bindings to set per-agent trajectory action sequences and new top-down camera controls for panning/zooming.
Changes:
- Add `vec_set_trajectory` binding to roll out discrete actions through classic dynamics and store x/y points for rendering.
- Add Python `Drive.set_predicted_trajectories()` helper and extend `Drive.render()` to support selecting `env_id`.
- Extend C rendering/inference paths to draw predicted trajectories and (separately) add trajectory buffers to DriveNet/top-down visualization.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `pufferlib/ocean/env_binding.h` | Adds `vec_set_trajectory` C-extension method to roll out discrete actions into trajectory points. |
| `pufferlib/ocean/drive/visualize.c` | Extends top-down visualization to draw predicted trajectories and updates `forward()` call signature usage. |
| `pufferlib/ocean/drive/drivenet.h` | Changes network output shape to include trajectory-length action logits and rolls out predicted trajectories for rendering; adds past-actions encoder path. |
| `pufferlib/ocean/drive/drive.py` | Adds `set_predicted_trajectories()` and updates `render()` to accept `env_id`. |
| `pufferlib/ocean/drive/drive.h` | Adds predicted-trajectory buffers to `Drive` and updates `c_render()` with a top-down ortho camera + trajectory drawing; adds ffmpeg pipe helpers. |
```c
    pid_t pid = fork();
    if (pid == 0) { // child: run ffmpeg
        close(pipefd[1]);
        dup2(pipefd[0], STDIN_FILENO);
        close(pipefd[0]);
        for (int fd = 3; fd < 256; fd++)
            close(fd);
        char size_str[64];
        snprintf(size_str, sizeof(size_str), "%dx%d", width, height);
        execlp("ffmpeg", "ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "rgba", "-s", size_str, "-r", "30", "-i", "-",
               "-c:v", "libx264", "-pix_fmt", "yuv420p", "-preset", "ultrafast", "-crf", "23", "-loglevel", "error",
               "/tmp/pufferdrive_render.mp4", NULL);
        fprintf(stderr, "execlp ffmpeg failed\n");
        _exit(1);
    }
    close(pipefd[0]);
    env->render_pid = pid;
    env->render_pipe_fd = pipefd[1];
}

void stop_render_pipe(Drive *env) {
    if (env->render_pipe_fd > 0) {
        close(env->render_pipe_fd);
        env->render_pipe_fd = 0;
    }
    if (env->render_pid > 0) {
        waitpid(env->render_pid, NULL, 0);
        env->render_pid = 0;
```
`start_render_pipe`/`stop_render_pipe` use `fork()`/`waitpid()`, but (1) `fork()` failure (`pid == -1`) isn't handled, and (2) `waitpid()` requires `<sys/wait.h>`, which isn't included in drive.h today (only `<unistd.h>`). This can cause build failures or leave zombie ffmpeg processes on fork/exec errors. Please handle `pid == -1` and add the required include(s) (or move these helpers to a .c file that includes them).
pufferlib/ocean/drive/drive.h (Outdated)
```c
// First: draw GREEN circles at every agent position to verify coordinate alignment
for (int i = 0; i < env->active_agent_count; i++) {
    int agent_idx = env->active_agent_indices[i];
    Agent *agent = &env->agents[agent_idx];
    DrawCircle3D((Vector3){agent->sim_x, agent->sim_y, agent->sim_z + 0.1f},
                 5.0f, (Vector3){0, 0, 1}, 0.0f, GREEN);
}
```
c_render currently always draws large GREEN debug circles at every agent position. This looks like temporary verification instrumentation and will clutter normal renders / videos and adds per-frame draw cost. Please guard this behind a debug flag (e.g., env->render_debug) or remove it once coordinate alignment is confirmed.
```diff
 }

-void forward(DriveNet *net, float *observations, int *actions) {
+void forward(DriveNet *net, Drive *env, float *observations, int *actions) {
```
Changing forward() to take an additional Drive *env parameter requires updating all call sites; currently pufferlib/ocean/drive/drive.c still calls forward(net, observations, actions) and will not compile. Please update remaining callers (or provide a backward-compatible wrapper, e.g., forward(net, NULL, observations, actions)).
Suggested change:

```diff
-void forward(DriveNet *net, Drive *env, float *observations, int *actions) {
+static inline void forward_3(DriveNet *net, float *observations, int *actions) {
+    forward_with_env(net, NULL, observations, actions);
+}
+#define GET_FORWARDFUNC(_1,_2,_3,_4,NAME,...) NAME
+#define forward(...) GET_FORWARDFUNC(__VA_ARGS__, forward_with_env, forward_3)(__VA_ARGS__)
+void forward_with_env(DriveNet *net, Drive *env, float *observations, int *actions) {
```
```c
// Process past actions trajectory
for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
    net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = observations[past_actions_offset + i];
```
forward() now assumes each agent's observation includes an extra PAST_ACTIONS_DIM appended (obs_stride includes it and past_actions_offset is read), but Drive currently allocates/fills observations with max_obs = ego + partner + road only (see drive.h allocate()/compute_observations). This makes observations[past_actions_offset + i] read out-of-bounds and can corrupt memory. Please either (a) update the environment observation layout/allocation to include these dims, or (b) remove/guard the past-actions read path unless the buffer is known to contain it.
Suggested change:

```diff
-// Process past actions trajectory
-for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
-    net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = observations[past_actions_offset + i];
+// Process past actions trajectory (if present in the observation buffer)
+int agent_obs_end = (b + 1) * obs_stride;
+int past_actions_end = past_actions_offset + PAST_ACTIONS_DIM;
+int has_past_actions = past_actions_end <= agent_obs_end;
+if (has_past_actions) {
+    for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
+        net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = observations[past_actions_offset + i];
+    }
+} else {
+    // If the buffer does not contain past actions, fall back to zeros
+    for (int i = 0; i < PAST_ACTIONS_DIM; i++) {
+        net->obs_past_actions[b * PAST_ACTIONS_DIM + i] = 0.0f;
+    }
```
```c
// Roll out predicted trajectory using classic dynamics
if (env != NULL) {
    int traj_len = net->trajectory_len;
    int action_size = net->action_size;
    int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);

    for (int b = 0; b < net->num_agents; b++) {
        int agent_idx = env->active_agent_indices[b];
        Agent *agent = &env->agents[agent_idx];

        // Start from current agent state
        float x = agent->sim_x;
        float y = agent->sim_y;
        float heading = agent->sim_heading;
        float vx = agent->sim_vx;
        float vy = agent->sim_vy;

        float *actor_out = &net->actor->output[b * traj_len * action_size];

        for (int t = 0; t < traj_len; t++) {
            // Find argmax action for this timestep
            float *logits = &actor_out[t * action_size];
            int best_action = 0;
            float best_val = logits[0];
            for (int a = 1; a < action_size; a++) {
                if (logits[a] > best_val) {
                    best_val = logits[a];
                    best_action = a;
                }
            }

            // Decode action into acceleration + steering (classic model)
            int accel_idx = best_action / num_steer;
            int steer_idx = best_action % num_steer;
            float acceleration = ACCELERATION_VALUES[accel_idx];
            float steering = STEERING_VALUES[steer_idx];
```
The trajectory rollout section always decodes actions using classic ACCELERATION_VALUES/STEERING_VALUES, even when the env/net is configured for JERK dynamics. That will produce incorrect trajectories (and makes this code path implicitly classic-only). Please gate this rollout on env->dynamics_model == CLASSIC (or implement the corresponding JERK rollout using JERK_LONG/JERK_LAT) so rendering stays consistent across dynamics models.
```diff
-    def render(self):
-        binding.vec_render(self.c_envs, 0)
+    def render(self, env_id=0):
```
render(env_id) forwards env_id straight into binding.vec_render, which does not bounds-check env_id before indexing vec->envs[env_id]. Passing an invalid env_id can segfault the process. Please add a Python-side check (0 <= env_id < self.num_envs) and raise a ValueError early (or add the bounds check in vec_render for defense-in-depth).
Suggested change:

```python
def render(self, env_id=0):
    if not (0 <= env_id < self.num_envs):
        raise ValueError(f"env_id ({env_id}) must be in range [0, {self.num_envs})")
```
```c
static PyObject *vec_set_trajectory(PyObject *self, PyObject *args) {
    if (PyTuple_Size(args) != 4) {
        PyErr_SetString(PyExc_TypeError, "vec_set_trajectory requires 4 args: vec_env, env_id, action_traj, traj_len");
        return NULL;
    }
    VecEnv *vec = (VecEnv *)PyLong_AsVoidPtr(PyTuple_GetItem(args, 0));
    int env_id = PyLong_AsLong(PyTuple_GetItem(args, 1));
    PyArrayObject *action_arr = (PyArrayObject *)PyTuple_GetItem(args, 2);
    int traj_len = PyLong_AsLong(PyTuple_GetItem(args, 3));

    Env *env = vec->envs[env_id];
    int num_agents = env->active_agent_count;
    int *actions = (int *)PyArray_DATA(action_arr);
    int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);
```
vec_set_trajectory casts args directly (vec handle, env_id, action_traj) without using unpack_vecenv, type checking, or bounds checking. An invalid handle, out-of-range env_id, non-numpy action_traj, non-contiguous array, or wrong dtype/size can lead to segfaults via vec->envs[env_id] and PyArray_DATA. Please validate vec != NULL, 0 <= env_id < vec->num_envs, traj_len > 0, and coerce/verify action_traj as a contiguous NPY_INT32 array with length == active_agent_count * traj_len (raising Python exceptions on mismatch).
```c
for (int t = 0; t < traj_len; t++) {
    int action_val = actions[i * traj_len + t];
    int accel_idx = action_val / num_steer;
    int steer_idx = action_val % num_steer;
    float acceleration = ACCELERATION_VALUES[accel_idx];
    float steering = STEERING_VALUES[steer_idx];
```
action_val is used to index ACCELERATION_VALUES and STEERING_VALUES without validating bounds. If Python passes an out-of-range action index (or a negative value), accel_idx/steer_idx can go out of bounds and read invalid memory. Please validate 0 <= action_val < (num_accel * num_steer) (and handle invalid inputs by clamping or raising a Python ValueError).
```c
// Draw agent trajs
if (trajectories) {
    for (int i = 0; i < frame_count; i++) {
```
draw_scene() ends the 3D mode internally (see drive.h), but this function still calls EndMode3D() after draw_scene(). That double-EndMode3D breaks raylib render state and can cause the newly added predicted-trajectory drawing to render inconsistently. Please remove the redundant EndMode3D in this function (or refactor draw_scene to not manage 3D mode).
```c
// Predicted trajectory visualization (set from Python, drawn by c_render)
float *predicted_traj_x;   // [num_agents * traj_len]
float *predicted_traj_y;   // [num_agents * traj_len]
int predicted_traj_len;    // steps per agent (0 = no trajectory to draw)
```
predicted_traj_x/predicted_traj_y are heap-allocated from Python (vec_set_trajectory) but are not freed anywhere in Drive teardown (c_close/free_allocated). This introduces a persistent leak across env lifetimes and can accumulate if trajectories are updated/reset frequently. Please free these buffers (and reset predicted_traj_len/pointers) in c_close/free_allocated (and/or when resetting the env).
- Remove green debug circles from c_render (leftover instrumentation)
- Fix comment: mouse drag uses left button, not right
- Free predicted_traj_x/y in c_close (memory leak)
- Fix forward() call sites in drive.c to use new 4-arg signature
- Remove double EndMode3D in visualize.c renderTopDownView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ajectories
- Eval loop now handles trajectory policies: concatenates past actions buffer to observations, uses sample_logits_action_sequence, and calls set_predicted_trajectories for visualization
- Default actions_trajectory_length changed from 80 to 20 in torch.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- `vec_set_trajectory` binding: takes discrete action indices, rolls out through classic dynamics, stores x/y for rendering
- `set_predicted_trajectories()` Python method slices actions per sub-env via agent_offsets
- Re-enters `BeginMode3D` after `draw_scene` (which calls `EndMode3D` internally) so trajectories render in world coordinates

How to use

Test plan
- `move_dynamics` in drive.h

🤖 Generated with Claude Code