
WIP: improve ONNX outputs#382

Open
eugenevinitsky wants to merge 5 commits into 3.0 from
ev/onnx-fake-trajectory

Conversation

@eugenevinitsky

Summary

Improvements to ONNX export pipeline for PufferDrive.

New features

  • --fake-trajectory: adds trajectory output [B, 80, 3] (x, y, heading) via jerk dynamics rollout
  • --render: exports .bin weights, builds visualize binary, renders mp4 video with gigaflow coefficients
  • scripts/onnx_inference.py: inference utility with observation construction + verification

Infrastructure fixes

  • Refactored dynamics into reusable classic_dynamics_step / jerk_dynamics_step pure functions
  • Fixed visualize --config passthrough (eval_gif was ignoring the flag)
  • Trajectory rendering in visualize binary (ego-only, sky-blue lines)
  • Strip the DDP "module." prefix from state-dict keys when loading checkpoints

Verification

  • --verify flag runs both ONNX and PyTorch on real env observations
  • Checks observation normalization round-trip (parse → rebuild → diff)
  • Checks model output match (logits, value, action)
  • All tests pass with max diff < 1e-5
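The verification step above can be sketched as a simple max-abs-diff comparison; the arrays here are numpy stand-ins for what onnxruntime and PyTorch would actually return:

```python
import numpy as np

def verify_outputs(onnx_out, torch_out, atol=1e-5):
    """Compare two dicts of arrays; report the max abs diff per output."""
    diffs = {}
    for key in onnx_out:
        diffs[key] = float(np.max(np.abs(onnx_out[key] - torch_out[key])))
    ok = all(d < atol for d in diffs.values())
    return ok, diffs

# Stand-in outputs; the real script gets these from onnxruntime and torch.
onnx_out = {"logits": np.zeros((1, 12)), "value": np.array([0.5])}
torch_out = {"logits": np.zeros((1, 12)), "value": np.array([0.5])}
ok, diffs = verify_outputs(onnx_out, torch_out)
```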

…sable functions

ONNX export:
- --fake-trajectory: outputs [B, steps, 3] (x, y, heading) via jerk dynamics
- --render: custom eval loop with trajectory visualization in raylib
- Strip the DDP "module." prefix from state-dict keys when loading checkpoints

Dynamics refactor:
- Extract classic_dynamics_step and jerk_dynamics_step as pure functions
  taking DynState + action → DynState
- move_dynamics and vec_set_trajectory both use the same functions
- No more duplicated physics code
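Under this refactor, a pure step function takes a state plus an action and returns a new state. A minimal Python sketch of a kinematic-bicycle step (field names and constants are illustrative, not the C implementation):

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class DynState:
    x: float
    y: float
    heading: float
    speed: float

def classic_dynamics_step(s: DynState, accel: float, steering: float,
                          wheelbase: float = 2.8, dt: float = 0.1) -> DynState:
    """Pure step: DynState + action -> DynState, no shared mutable state."""
    speed = s.speed + accel * dt
    heading = s.heading + (speed / wheelbase) * math.tan(steering) * dt
    return DynState(
        x=s.x + speed * math.cos(heading) * dt,
        y=s.y + speed * math.sin(heading) * dt,
        heading=heading,
        speed=speed,
    )
```

Because the function is pure, both the simulator step and the trajectory rollout can call it without duplicating physics code.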

Trajectory rendering:
- vec_set_trajectory supports both classic and jerk dynamics
- c_render draws predicted trajectories as red lines/circles
- set_predicted_trajectories Python method on Drive class
…ectory

Repeating a jerk action 80 times causes runaway acceleration (jerk
controls rate of change of acceleration). Now applies the action on
step 0 and zero-jerk for remaining steps, maintaining the resulting
acceleration without further accumulation.
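The fix can be sketched as a 1-D rollout: jerk is applied once, then held at zero, so acceleration settles instead of accumulating (a deliberately simplified stand-in for the real dynamics):

```python
def rollout_jerk(jerk: float, steps: int = 80, dt: float = 0.1):
    """Apply the chosen jerk on step 0 only; zero jerk afterwards.

    Repeating the jerk every step would grow acceleration linearly and
    speed quadratically -- the runaway described above.
    """
    accel, speed = 0.0, 0.0
    speeds = []
    for step in range(steps):
        j = jerk if step == 0 else 0.0
        accel += j * dt          # jerk integrates into acceleration
        speed += accel * dt      # acceleration integrates into speed
        speeds.append(speed)
    return accel, speeds
```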
--render exports .bin weights, builds ./visualize, and runs it to
produce mp4 videos. Trajectory rendering in c_render limited to
ego vehicle only.
- drivenet.h: roll out trajectory after forward pass using reusable
  dynamics step functions, store in predicted_traj_x/y
- visualize.c: draw ego trajectory as sky-blue lines in renderTopDownView,
  re-enter BeginMode3D after draw_scene returns
- export_onnx.py: generate temp INI with gigaflow-matching reward
  coefficients pinned (min=max) for consistent evaluation rendering
- export_onnx.py: --fake-trajectory, --render with gigaflow coefficients
- onnx_inference.py: DrivePolicyONNX class + verification against PyTorch
- Refactored dynamics into reusable classic_dynamics_step/jerk_dynamics_step
- Trajectory rendering in visualize binary (ego-only, sky-blue)
- Fixed visualize --config passthrough (was hardcoded in eval_gif)
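The min=max coefficient pinning mentioned above could look like the sketch below; the section and key names are assumptions, not the project's actual INI schema:

```python
import configparser
import os
import tempfile

def write_pinned_ini(coeffs: dict, section: str = "reward") -> str:
    """Write a temp INI pinning each reward coefficient by setting min == max,
    so randomized coefficient sampling collapses to a fixed value."""
    cfg = configparser.ConfigParser()
    cfg[section] = {}
    for name, value in coeffs.items():
        cfg[section][f"{name}_min"] = str(value)
        cfg[section][f"{name}_max"] = str(value)
    fd, path = tempfile.mkstemp(suffix=".ini")
    with os.fdopen(fd, "w") as f:
        cfg.write(f)
    return path
```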
Copilot AI review requested due to automatic review settings March 30, 2026 22:16

Copilot AI left a comment


Pull request overview

This PR updates the PufferDrive ONNX export + visualization pipeline by adding optional trajectory outputs, a rendering path for exported weights, and a standalone ONNX inference/verification utility. It also refactors/centralizes dynamics stepping so both simulation and visualization can share the same kinematics logic.

Changes:

  • Add scripts/onnx_inference.py to build observations, run ONNX inference, and verify ONNX vs PyTorch outputs on real env observations.
  • Extend scripts/export_onnx.py with --fake-trajectory (extra ONNX output) and --render (export .bin + run visualize).
  • Add reusable “pure” dynamics step functions and predicted-trajectory plumbing for rendering (Python binding + C rendering + drivenet forward).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
scripts/onnx_inference.py New ONNX inference + env-observation round-trip and ONNX-vs-PyTorch verification utility.
scripts/export_onnx.py Adds trajectory output option and an optional post-export rendering workflow; improves checkpoint key stripping.
pufferlib/ocean/env_binding.h Adds a new Python-exposed entrypoint to push predicted trajectories into the C env for rendering.
pufferlib/ocean/drive/visualize.c Enables --config passthrough and draws predicted trajectories during video rendering.
pufferlib/ocean/drive/drivenet.h Adds predicted-trajectory buffers and rolls them out after action selection.
pufferlib/ocean/drive/drive.py Adds a Python helper to set predicted trajectories across sub-envs for rendering.
pufferlib/ocean/drive/drive.h Adds predicted trajectory storage to Drive, factors dynamics into reusable step functions, and draws trajectories in c_render.
pufferlib/ocean/drive/drive.c Updates calls to the new forward(net, env, ...) signature.


Comment on lines +285 to +296
else:
    from pufferlib.ocean.drive.drive import ACCELERATION_VALUES, STEERING_VALUES

    num_steer = 13
    accel_idx = action // num_steer
    steer_idx = action % num_steer
    return {
        "acceleration": float(ACCELERATION_VALUES[accel_idx]),
        "steering": float(STEERING_VALUES[steer_idx]),
        "accel_idx": accel_idx,
        "steer_idx": steer_idx,
    }

Copilot AI Mar 30, 2026


decode_action() (classic branch) imports ACCELERATION_VALUES/STEERING_VALUES from pufferlib.ocean.drive.drive, but those symbols are not defined in drive.py (they exist only in the C header). This will raise an ImportError at runtime when dynamics_model="classic". Consider either hardcoding the arrays here, exposing them from the binding module, or decoding via known sizes + config/constants that exist in Python.
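One of the suggested fixes, hardcoding the arrays, might look like this sketch; the grid values are illustrative placeholders, not the real constants from the C header:

```python
# Hypothetical discretization grids; the real ones live in the C header.
ACCELERATION_VALUES = [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0]
STEERING_VALUES = [round(-0.6 + i * 0.1, 1) for i in range(13)]  # 13 bins

def decode_action(action: int) -> dict:
    """Split a flattened classic-dynamics action into accel/steer indices."""
    num_steer = len(STEERING_VALUES)
    accel_idx, steer_idx = divmod(action, num_steer)
    return {
        "acceleration": float(ACCELERATION_VALUES[accel_idx]),
        "steering": float(STEERING_VALUES[steer_idx]),
        "accel_idx": accel_idx,
        "steer_idx": steer_idx,
    }
```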

Comment on lines +189 to +197
a_long = ego_state.get("a_long", 0)
obs[0, 8] = a_long / 15.0 if a_long < 0 else a_long / 4.0
obs[0, 9] = ego_state.get("a_lat", 0) / 4.0
obs[0, 10] = 1.0 if ego_state.get("respawned", False) else 0.0
obs[0, 11] = ego_state.get("goal_speed_min", 0)
obs[0, 12] = ego_state.get("goal_speed_max", 0)
obs[0, 13] = min(SPEED_LIMIT / MAX_SPEED, 1.0)
obs[0, 14] = ego_state.get("lane_center_dist", 0)
obs[0, 15] = ego_state.get("lane_angle", 1.0)

Copilot AI Mar 30, 2026


build_observation() documents/supports an ego_state["speed_limit"], but the observation always uses the module-level constant SPEED_LIMIT (ignoring any provided value). Either use the provided ego_state["speed_limit"] (and normalize it), or remove/clarify the speed_limit field in the docstring and in parse_env_obs_to_ego_state() to avoid misleading round-trip verification.
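A sketch of the suggested fix, preferring the caller-provided value over the module constant (the names and constants here are assumptions):

```python
MAX_SPEED = 30.0           # illustrative cap; the env defines its own constant
DEFAULT_SPEED_LIMIT = 15.0 # stand-in for the module-level SPEED_LIMIT

def normalized_speed_limit(ego_state: dict) -> float:
    """Use the caller-provided speed_limit when present, else the default,
    so the parse -> rebuild round-trip sees the same value it produced."""
    limit = ego_state.get("speed_limit", DEFAULT_SPEED_LIMIT)
    return min(limit / MAX_SPEED, 1.0)
```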

Comment on lines +174 to +182
# obs[9] = a_lat / JERK_LAT[2]
a_lat = observation[:, 9] * 4.0 # JERK_LAT[2] = 4
# obs[5] = sim_length / MAX_VEH_LEN
wheelbase = observation[:, 5] * 5.5 # MAX_VEH_LEN approximate

# logits[0] for single-head multi-discrete (jerk: MultiDiscrete([12]))
logits_tensor = logits[0] if isinstance(logits, (tuple, list)) else logits
trajectory = self.rollout(logits_tensor, speed, a_long, a_lat, steering_angle, wheelbase)
return logits, value, new_h, new_c, trajectory

Copilot AI Mar 30, 2026


The fake-trajectory rollout derives wheelbase from observation[:, 5] using a hardcoded * 5.5, but observation[5] is normalized vehicle length (MAX_VEH_LEN=6 in C) and the environment sets wheelbase = 0.6 * length. This makes the exported trajectory inconsistent with the env dynamics. Consider computing wheelbase as 0.6 * (observation[:,5] * MAX_VEH_LEN) (or otherwise matching the env’s wheelbase logic/constants).
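The suggested correction, as a small sketch using the constants cited in the comment:

```python
MAX_VEH_LEN = 6.0  # the C constant cited in the review comment

def wheelbase_from_obs(obs5: float) -> float:
    """Denormalize vehicle length, then apply the env's wheelbase rule."""
    length = obs5 * MAX_VEH_LEN   # obs[5] stores length / MAX_VEH_LEN
    return 0.6 * length           # env sets wheelbase = 0.6 * length
```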

"--output-topdown", output_video,
]
print(f"Running: {' '.join(cmd)}")
subprocess.run(cmd)

Copilot AI Mar 30, 2026


--render path runs subprocess.run(cmd) without check=True, so failures in the visualize binary (missing deps, nonzero exit, etc.) won’t fail the export script and may silently produce no/partial output. Consider using check=True (and/or capturing stdout/stderr) so CI/users reliably detect render failures.

Suggested change
- subprocess.run(cmd)
+ subprocess.run(cmd, check=True)

Comment on lines +538 to +553
VecEnv *vec = (VecEnv *)PyLong_AsVoidPtr(PyTuple_GetItem(args, 0));
int env_id = PyLong_AsLong(PyTuple_GetItem(args, 1));
PyArrayObject *action_arr = (PyArrayObject *)PyTuple_GetItem(args, 2);
int traj_len = PyLong_AsLong(PyTuple_GetItem(args, 3));

Env *env = vec->envs[env_id];
int num_agents = env->active_agent_count;
int *actions = (int *)PyArray_DATA(action_arr);
int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);

free(env->predicted_traj_x);
free(env->predicted_traj_y);
int total = num_agents * traj_len;
env->predicted_traj_x = (float *)calloc(total, sizeof(float));
env->predicted_traj_y = (float *)calloc(total, sizeof(float));
env->predicted_traj_len = traj_len;

Copilot AI Mar 30, 2026


vec_set_trajectory directly casts args[0] to VecEnv* and args[2] to PyArrayObject* without using unpack_vecenv()/type checks, and it does not validate env_id bounds or that action_arr is contiguous int32 with expected length num_agents * traj_len. A malformed call could segfault or read out-of-bounds. Consider reusing unpack_vecenv(), validating 0 <= env_id < vec->num_envs, and checking PyArray_Check, dtype, contiguity, and size before accessing PyArray_DATA.

Suggested change
VecEnv *vec = (VecEnv *)PyLong_AsVoidPtr(PyTuple_GetItem(args, 0));
int env_id = PyLong_AsLong(PyTuple_GetItem(args, 1));
PyArrayObject *action_arr = (PyArrayObject *)PyTuple_GetItem(args, 2);
int traj_len = PyLong_AsLong(PyTuple_GetItem(args, 3));
Env *env = vec->envs[env_id];
int num_agents = env->active_agent_count;
int *actions = (int *)PyArray_DATA(action_arr);
int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);
free(env->predicted_traj_x);
free(env->predicted_traj_y);
int total = num_agents * traj_len;
env->predicted_traj_x = (float *)calloc(total, sizeof(float));
env->predicted_traj_y = (float *)calloc(total, sizeof(float));
env->predicted_traj_len = traj_len;

with:

// Reuse unpack_vecenv for consistency and safety.
VecEnv *vec = unpack_vecenv(args);
if (!vec) {
// unpack_vecenv is expected to set an appropriate Python exception.
return NULL;
}
PyObject *env_id_obj = PyTuple_GetItem(args, 1);
if (!env_id_obj) {
return NULL;
}
long env_id = PyLong_AsLong(env_id_obj);
if (PyErr_Occurred()) {
return NULL;
}
if (env_id < 0 || env_id >= vec->num_envs) {
PyErr_SetString(PyExc_IndexError, "env_id out of range");
return NULL;
}
PyObject *traj_obj = PyTuple_GetItem(args, 2);
if (!traj_obj) {
return NULL;
}
if (!PyArray_Check(traj_obj)) {
PyErr_SetString(PyExc_TypeError, "action_traj must be a NumPy array");
return NULL;
}
PyArrayObject *action_arr = (PyArrayObject *)traj_obj;
if (PyArray_TYPE(action_arr) != NPY_INT32) {
PyErr_SetString(PyExc_TypeError, "action_traj must have dtype int32");
return NULL;
}
if (!PyArray_ISCONTIGUOUS(action_arr)) {
PyErr_SetString(PyExc_ValueError, "action_traj must be a contiguous array");
return NULL;
}
PyObject *traj_len_obj = PyTuple_GetItem(args, 3);
if (!traj_len_obj) {
return NULL;
}
long traj_len = PyLong_AsLong(traj_len_obj);
if (PyErr_Occurred()) {
return NULL;
}
if (traj_len < 0) {
PyErr_SetString(PyExc_ValueError, "traj_len must be non-negative");
return NULL;
}
Env *env = vec->envs[env_id];
int num_agents = env->active_agent_count;
npy_intp expected_total = (npy_intp)num_agents * (npy_intp)traj_len;
if (PyArray_SIZE(action_arr) != expected_total) {
PyErr_SetString(PyExc_ValueError, "action_traj size does not match num_agents * traj_len");
return NULL;
}
int *actions = (int *)PyArray_DATA(action_arr);
int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);
free(env->predicted_traj_x);
free(env->predicted_traj_y);
int total = num_agents * (int)traj_len;
env->predicted_traj_x = (float *)calloc(total, sizeof(float));
env->predicted_traj_y = (float *)calloc(total, sizeof(float));
env->predicted_traj_len = (int)traj_len;

DriveNet *net = init_drivenet(weights, env.active_agent_count, env.dynamics_model, env.reward_conditioning);

int frame_count = env.episode_length > 0 ? env.episode_length : TRAJECTORY_LENGTH_DEFAULT;
printf("episode_length=%d, frame_count=%d\n", env.episode_length, frame_count);

Copilot AI Mar 30, 2026


The unconditional printf("episode_length=%d, frame_count=%d\n", ...) adds noisy debug output to visualize runs and can clutter logs/automation. Consider removing it or gating it behind a verbose/debug flag.

Suggested change
- printf("episode_length=%d, frame_count=%d\n", env.episode_length, frame_count);
+ const char *visualize_debug = getenv("PUFFER_VISUALIZE_DEBUG");
+ bool debug_logging = visualize_debug && visualize_debug[0] != '\0' && visualize_debug[0] != '0';
+ if (debug_logging) {
+     printf("episode_length=%d, frame_count=%d\n", env.episode_length, frame_count);
+ }
