Conversation
…sable functions

ONNX export:
- `--fake-trajectory`: outputs [B, steps, 3] (x, y, heading) via jerk dynamics
- `--render`: custom eval loop with trajectory visualization in raylib
- DDP `module.` prefix stripping in checkpoint loading

Dynamics refactor:
- Extract `classic_dynamics_step` and `jerk_dynamics_step` as pure functions taking DynState + action → DynState
- `move_dynamics` and `vec_set_trajectory` both use the same functions
- No more duplicated physics code

Trajectory rendering:
- `vec_set_trajectory` supports both classic and jerk dynamics
- `c_render` draws predicted trajectories as red lines/circles
- `set_predicted_trajectories` Python method on the Drive class
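The `module.` prefix stripping mentioned above can be sketched as follows (a minimal illustration of the DDP-checkpoint convention, not the script's actual code; the key names are made up):

```python
# DistributedDataParallel saves parameters under a "module." prefix;
# strip it so the state dict loads into the bare (non-DDP) model.
def strip_ddp_prefix(state_dict: dict) -> dict:
    return {
        (k[len("module."):] if k.startswith("module.") else k): v
        for k, v in state_dict.items()
    }

# e.g. {"module.encoder.weight": w, "bias": b} -> {"encoder.weight": w, "bias": b}
```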
…ectory

Repeating a jerk action 80 times causes runaway acceleration (jerk controls the rate of change of acceleration). The rollout now applies the action on step 0 and zero jerk for the remaining steps, holding the resulting acceleration without further accumulation.
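The effect can be illustrated with a toy integration (illustrative only; the jerk value and `dt` below are assumptions, not the env's constants):

```python
# Why repeating a jerk action accumulates acceleration, and how
# applying it only on step 0 (zero jerk afterwards) holds it steady.
def rollout_accel(jerk, steps, dt=0.1, repeat_action=True):
    accel = 0.0
    for step in range(steps):
        j = jerk if (repeat_action or step == 0) else 0.0
        accel += j * dt  # jerk integrates into acceleration each step
    return accel

# Repeating jerk=5 for 80 steps integrates to runaway acceleration (40.0),
# while applying it only on step 0 holds the resulting acceleration (0.5).
runaway = rollout_accel(5.0, 80, repeat_action=True)
held = rollout_accel(5.0, 80, repeat_action=False)
```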
`--render` exports `.bin` weights, builds `./visualize`, and runs it to produce mp4 videos. Trajectory rendering in `c_render` is limited to the ego vehicle only.
- `drivenet.h`: roll out trajectory after the forward pass using the reusable dynamics step functions, store in `predicted_traj_x/y`
- `visualize.c`: draw ego trajectory as sky-blue lines in `renderTopDownView`, re-enter `BeginMode3D` after `draw_scene` returns
- `export_onnx.py`: generate a temp INI with gigaflow-matching reward coefficients pinned (min=max) for consistent evaluation rendering
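As a rough Python analogue of rolling out a trajectory with a reusable dynamics step function (the kinematic-bicycle step, constants, and `dt` below are assumptions for illustration, not the project's C implementation):

```python
import math

# One kinematic-bicycle integration step: pure function of state + action.
def bicycle_step(x, y, heading, speed, steering, accel, wheelbase=3.0, dt=0.1):
    speed += accel * dt
    heading += (speed / wheelbase) * math.tan(steering) * dt
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt
    return x, y, heading, speed

# Roll the same step function forward to produce a predicted trajectory.
def rollout(x, y, heading, speed, steering, accel, steps=80):
    traj = []
    for _ in range(steps):
        x, y, heading, speed = bicycle_step(x, y, heading, speed, steering, accel)
        traj.append((x, y, heading))
    return traj
```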
- `export_onnx.py`: `--fake-trajectory`, `--render` with gigaflow coefficients
- `onnx_inference.py`: `DrivePolicyONNX` class + verification against PyTorch
- Refactored dynamics into reusable `classic_dynamics_step`/`jerk_dynamics_step`
- Trajectory rendering in the visualize binary (ego-only, sky-blue)
- Fixed `visualize --config` passthrough (was hardcoded in eval_gif)
Pull request overview
This PR updates the PufferDrive ONNX export + visualization pipeline by adding optional trajectory outputs, a rendering path for exported weights, and a standalone ONNX inference/verification utility. It also refactors/centralizes dynamics stepping so both simulation and visualization can share the same kinematics logic.
Changes:
- Add `scripts/onnx_inference.py` to build observations, run ONNX inference, and verify ONNX vs PyTorch outputs on real env observations.
- Extend `scripts/export_onnx.py` with `--fake-trajectory` (extra ONNX output) and `--render` (export `.bin` + run `visualize`).
- Add reusable "pure" dynamics step functions and predicted-trajectory plumbing for rendering (Python binding + C rendering + drivenet forward).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| scripts/onnx_inference.py | New ONNX inference + env-observation round-trip and ONNX-vs-PyTorch verification utility. |
| scripts/export_onnx.py | Adds trajectory output option and an optional post-export rendering workflow; improves checkpoint key stripping. |
| pufferlib/ocean/env_binding.h | Adds a new Python-exposed entrypoint to push predicted trajectories into the C env for rendering. |
| pufferlib/ocean/drive/visualize.c | Enables --config passthrough and draws predicted trajectories during video rendering. |
| pufferlib/ocean/drive/drivenet.h | Adds predicted-trajectory buffers and rolls them out after action selection. |
| pufferlib/ocean/drive/drive.py | Adds a Python helper to set predicted trajectories across sub-envs for rendering. |
| pufferlib/ocean/drive/drive.h | Adds predicted trajectory storage to Drive, factors dynamics into reusable step functions, and draws trajectories in c_render. |
| pufferlib/ocean/drive/drive.c | Updates calls to the new forward(net, env, ...) signature. |
```python
else:
    from pufferlib.ocean.drive.drive import ACCELERATION_VALUES, STEERING_VALUES

    num_steer = 13
    accel_idx = action // num_steer
    steer_idx = action % num_steer
    return {
        "acceleration": float(ACCELERATION_VALUES[accel_idx]),
        "steering": float(STEERING_VALUES[steer_idx]),
        "accel_idx": accel_idx,
        "steer_idx": steer_idx,
    }
```
decode_action() (classic branch) imports ACCELERATION_VALUES/STEERING_VALUES from pufferlib.ocean.drive.drive, but those symbols are not defined in drive.py (they exist only in the C header). This will raise an ImportError at runtime when dynamics_model="classic". Consider either hardcoding the arrays here, exposing them from the binding module, or decoding via known sizes + config/constants that exist in Python.
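The first suggested fix (hardcoding the arrays) could look roughly like this; the grid sizes and value ranges below are placeholders for illustration, not the project's real discretization:

```python
import numpy as np

# Hypothetical hardcoded discretization grids, defined locally instead of
# importing ACCELERATION_VALUES/STEERING_VALUES from drive.py (where they
# don't exist). The sizes/bounds are assumptions, not the env's constants.
NUM_ACCEL, NUM_STEER = 7, 13
ACCELERATION_VALUES = np.linspace(-4.0, 4.0, NUM_ACCEL)
STEERING_VALUES = np.linspace(-0.7, 0.7, NUM_STEER)

def decode_action(action: int) -> dict:
    # Flat discrete action = accel_idx * NUM_STEER + steer_idx.
    accel_idx, steer_idx = divmod(action, NUM_STEER)
    return {
        "acceleration": float(ACCELERATION_VALUES[accel_idx]),
        "steering": float(STEERING_VALUES[steer_idx]),
        "accel_idx": accel_idx,
        "steer_idx": steer_idx,
    }
```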
```python
a_long = ego_state.get("a_long", 0)
obs[0, 8] = a_long / 15.0 if a_long < 0 else a_long / 4.0
obs[0, 9] = ego_state.get("a_lat", 0) / 4.0
obs[0, 10] = 1.0 if ego_state.get("respawned", False) else 0.0
obs[0, 11] = ego_state.get("goal_speed_min", 0)
obs[0, 12] = ego_state.get("goal_speed_max", 0)
obs[0, 13] = min(SPEED_LIMIT / MAX_SPEED, 1.0)
obs[0, 14] = ego_state.get("lane_center_dist", 0)
obs[0, 15] = ego_state.get("lane_angle", 1.0)
```
build_observation() documents/supports an ego_state["speed_limit"], but the observation always uses the module-level constant SPEED_LIMIT (ignoring any provided value). Either use the provided ego_state["speed_limit"] (and normalize it), or remove/clarify the speed_limit field in the docstring and in parse_env_obs_to_ego_state() to avoid misleading round-trip verification.
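A minimal sketch of the first option (using the caller-provided value), assuming `MAX_SPEED` and `SPEED_LIMIT` are module constants with the illustrative values below:

```python
# Assumed module-level constants for illustration only.
MAX_SPEED = 30.0     # normalizer
SPEED_LIMIT = 20.0   # default when the caller provides no override

def speed_limit_feature(ego_state: dict) -> float:
    # Prefer the provided ego_state["speed_limit"], fall back to the constant.
    limit = ego_state.get("speed_limit", SPEED_LIMIT)
    return min(limit / MAX_SPEED, 1.0)
```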
```python
# obs[9] = a_lat / JERK_LAT[2]
a_lat = observation[:, 9] * 4.0  # JERK_LAT[2] = 4
# obs[5] = sim_length / MAX_VEH_LEN
wheelbase = observation[:, 5] * 5.5  # MAX_VEH_LEN approximate

# logits[0] for single-head multi-discrete (jerk: MultiDiscrete([12]))
logits_tensor = logits[0] if isinstance(logits, (tuple, list)) else logits
trajectory = self.rollout(logits_tensor, speed, a_long, a_lat, steering_angle, wheelbase)
return logits, value, new_h, new_c, trajectory
```
The fake-trajectory rollout derives wheelbase from observation[:, 5] using a hardcoded * 5.5, but observation[5] is normalized vehicle length (MAX_VEH_LEN=6 in C) and the environment sets wheelbase = 0.6 * length. This makes the exported trajectory inconsistent with the env dynamics. Consider computing wheelbase as 0.6 * (observation[:,5] * MAX_VEH_LEN) (or otherwise matching the env’s wheelbase logic/constants).
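The suggested computation as a standalone sketch (MAX_VEH_LEN=6 is taken from the comment above; the 0.6 factor mirrors the env's stated wheelbase convention):

```python
MAX_VEH_LEN = 6.0  # value stated in the review comment

def wheelbase_from_obs(obs_len_norm: float) -> float:
    # obs[5] stores length normalized by MAX_VEH_LEN; undo that first.
    length = obs_len_norm * MAX_VEH_LEN
    # Env convention per the comment: wheelbase = 0.6 * vehicle length.
    return 0.6 * length
```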
```python
    "--output-topdown", output_video,
]
print(f"Running: {' '.join(cmd)}")
subprocess.run(cmd)
```
--render path runs subprocess.run(cmd) without check=True, so failures in the visualize binary (missing deps, nonzero exit, etc.) won’t fail the export script and may silently produce no/partial output. Consider using check=True (and/or capturing stdout/stderr) so CI/users reliably detect render failures.
Suggested change:

```diff
-subprocess.run(cmd)
+subprocess.run(cmd, check=True)
```
```c
VecEnv *vec = (VecEnv *)PyLong_AsVoidPtr(PyTuple_GetItem(args, 0));
int env_id = PyLong_AsLong(PyTuple_GetItem(args, 1));
PyArrayObject *action_arr = (PyArrayObject *)PyTuple_GetItem(args, 2);
int traj_len = PyLong_AsLong(PyTuple_GetItem(args, 3));

Env *env = vec->envs[env_id];
int num_agents = env->active_agent_count;
int *actions = (int *)PyArray_DATA(action_arr);
int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);

free(env->predicted_traj_x);
free(env->predicted_traj_y);
int total = num_agents * traj_len;
env->predicted_traj_x = (float *)calloc(total, sizeof(float));
env->predicted_traj_y = (float *)calloc(total, sizeof(float));
env->predicted_traj_len = traj_len;
```
vec_set_trajectory directly casts args[0] to VecEnv* and args[2] to PyArrayObject* without using unpack_vecenv()/type checks, and it does not validate env_id bounds or that action_arr is contiguous int32 with expected length num_agents * traj_len. A malformed call could segfault or read out-of-bounds. Consider reusing unpack_vecenv(), validating 0 <= env_id < vec->num_envs, and checking PyArray_Check, dtype, contiguity, and size before accessing PyArray_DATA.
Suggested change:

```diff
-VecEnv *vec = (VecEnv *)PyLong_AsVoidPtr(PyTuple_GetItem(args, 0));
-int env_id = PyLong_AsLong(PyTuple_GetItem(args, 1));
-PyArrayObject *action_arr = (PyArrayObject *)PyTuple_GetItem(args, 2);
-int traj_len = PyLong_AsLong(PyTuple_GetItem(args, 3));
-Env *env = vec->envs[env_id];
-int num_agents = env->active_agent_count;
-int *actions = (int *)PyArray_DATA(action_arr);
-int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);
-free(env->predicted_traj_x);
-free(env->predicted_traj_y);
-int total = num_agents * traj_len;
-env->predicted_traj_x = (float *)calloc(total, sizeof(float));
-env->predicted_traj_y = (float *)calloc(total, sizeof(float));
-env->predicted_traj_len = traj_len;
+// Reuse unpack_vecenv for consistency and safety.
+VecEnv *vec = unpack_vecenv(args);
+if (!vec) {
+    // unpack_vecenv is expected to set an appropriate Python exception.
+    return NULL;
+}
+PyObject *env_id_obj = PyTuple_GetItem(args, 1);
+if (!env_id_obj) {
+    return NULL;
+}
+long env_id = PyLong_AsLong(env_id_obj);
+if (PyErr_Occurred()) {
+    return NULL;
+}
+if (env_id < 0 || env_id >= vec->num_envs) {
+    PyErr_SetString(PyExc_IndexError, "env_id out of range");
+    return NULL;
+}
+PyObject *traj_obj = PyTuple_GetItem(args, 2);
+if (!traj_obj) {
+    return NULL;
+}
+if (!PyArray_Check(traj_obj)) {
+    PyErr_SetString(PyExc_TypeError, "action_traj must be a NumPy array");
+    return NULL;
+}
+PyArrayObject *action_arr = (PyArrayObject *)traj_obj;
+if (PyArray_TYPE(action_arr) != NPY_INT32) {
+    PyErr_SetString(PyExc_TypeError, "action_traj must have dtype int32");
+    return NULL;
+}
+if (!PyArray_ISCONTIGUOUS(action_arr)) {
+    PyErr_SetString(PyExc_ValueError, "action_traj must be a contiguous array");
+    return NULL;
+}
+PyObject *traj_len_obj = PyTuple_GetItem(args, 3);
+if (!traj_len_obj) {
+    return NULL;
+}
+long traj_len = PyLong_AsLong(traj_len_obj);
+if (PyErr_Occurred()) {
+    return NULL;
+}
+if (traj_len < 0) {
+    PyErr_SetString(PyExc_ValueError, "traj_len must be non-negative");
+    return NULL;
+}
+Env *env = vec->envs[env_id];
+int num_agents = env->active_agent_count;
+npy_intp expected_total = (npy_intp)num_agents * (npy_intp)traj_len;
+if (PyArray_SIZE(action_arr) != expected_total) {
+    PyErr_SetString(PyExc_ValueError, "action_traj size does not match num_agents * traj_len");
+    return NULL;
+}
+int *actions = (int *)PyArray_DATA(action_arr);
+int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);
+free(env->predicted_traj_x);
+free(env->predicted_traj_y);
+int total = num_agents * (int)traj_len;
+env->predicted_traj_x = (float *)calloc(total, sizeof(float));
+env->predicted_traj_y = (float *)calloc(total, sizeof(float));
+env->predicted_traj_len = (int)traj_len;
```
```c
DriveNet *net = init_drivenet(weights, env.active_agent_count, env.dynamics_model, env.reward_conditioning);

int frame_count = env.episode_length > 0 ? env.episode_length : TRAJECTORY_LENGTH_DEFAULT;
printf("episode_length=%d, frame_count=%d\n", env.episode_length, frame_count);
```
The unconditional printf("episode_length=%d, frame_count=%d\n", ...) adds noisy debug output to visualize runs and can clutter logs/automation. Consider removing it or gating it behind a verbose/debug flag.
Suggested change:

```diff
-printf("episode_length=%d, frame_count=%d\n", env.episode_length, frame_count);
+const char *visualize_debug = getenv("PUFFER_VISUALIZE_DEBUG");
+bool debug_logging = visualize_debug && visualize_debug[0] != '\0' && visualize_debug[0] != '0';
+if (debug_logging) {
+    printf("episode_length=%d, frame_count=%d\n", env.episode_length, frame_count);
+}
```
Summary
Improvements to ONNX export pipeline for PufferDrive.
New features
- `--fake-trajectory`: adds trajectory output [B, 80, 3] (x, y, heading) via jerk dynamics rollout
- `--render`: exports `.bin` weights, builds the visualize binary, renders mp4 video with gigaflow coefficients
- `scripts/onnx_inference.py`: inference utility with observation construction + verification

Infrastructure fixes
- `classic_dynamics_step`/`jerk_dynamics_step` pure functions
- `visualize --config` passthrough (eval_gif was ignoring the flag)
- `module.` prefix stripping in checkpoint loading

Verification
- `--verify` flag runs both ONNX and PyTorch on real env observations