
WIP: improve ONNX outputs#382

Open
eugenevinitsky wants to merge 5 commits into 3.0 from
ev/onnx-fake-trajectory

Conversation

@eugenevinitsky

Summary

Improvements to ONNX export pipeline for PufferDrive.

New features

  • --fake-trajectory: adds trajectory output [B, 80, 3] (x, y, heading) via jerk dynamics rollout
  • --render: exports .bin weights, builds visualize binary, renders mp4 video with gigaflow coefficients
  • scripts/onnx_inference.py: inference utility with observation construction + verification

Infrastructure fixes

  • Refactored dynamics into reusable classic_dynamics_step / jerk_dynamics_step pure functions
  • Fixed visualize --config passthrough (eval_gif was ignoring the flag)
  • Trajectory rendering in visualize binary (ego-only, sky-blue lines)
  • Strip the DDP "module." prefix from state-dict keys when loading checkpoints

Verification

  • --verify flag runs both ONNX and PyTorch on real env observations
  • Checks observation normalization round-trip (parse → rebuild → diff)
  • Checks model output match (logits, value, action)
  • All tests pass with max diff < 1e-5
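The verification step above can be sketched as a simple max-abs-diff comparison; the arrays here are numpy stand-ins for what onnxruntime and PyTorch would actually return:

```python
import numpy as np

def verify_outputs(onnx_out, torch_out, atol=1e-5):
    """Compare two dicts of arrays; report the max abs diff per output."""
    diffs = {}
    for key in onnx_out:
        diffs[key] = float(np.max(np.abs(onnx_out[key] - torch_out[key])))
    ok = all(d < atol for d in diffs.values())
    return ok, diffs

# Stand-in outputs; the real script gets these from onnxruntime and torch.
onnx_out = {"logits": np.zeros((1, 12)), "value": np.array([0.5])}
torch_out = {"logits": np.zeros((1, 12)), "value": np.array([0.5])}
ok, diffs = verify_outputs(onnx_out, torch_out)
```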

…sable functions

ONNX export:
- --fake-trajectory: outputs [B, steps, 3] (x, y, heading) via jerk dynamics
- --render: custom eval loop with trajectory visualization in raylib
- Strip the DDP "module." prefix from state-dict keys when loading checkpoints

Dynamics refactor:
- Extract classic_dynamics_step and jerk_dynamics_step as pure functions
  taking DynState + action → DynState
- move_dynamics and vec_set_trajectory both use the same functions
- No more duplicated physics code
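Under this refactor, a pure step function takes a state plus an action and returns a new state. A minimal Python sketch of a kinematic-bicycle step (field names and constants are illustrative, not the C implementation):

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class DynState:
    x: float
    y: float
    heading: float
    speed: float

def classic_dynamics_step(s: DynState, accel: float, steering: float,
                          wheelbase: float = 2.8, dt: float = 0.1) -> DynState:
    """Pure step: DynState + action -> DynState, no shared mutable state."""
    speed = s.speed + accel * dt
    heading = s.heading + (speed / wheelbase) * math.tan(steering) * dt
    return DynState(
        x=s.x + speed * math.cos(heading) * dt,
        y=s.y + speed * math.sin(heading) * dt,
        heading=heading,
        speed=speed,
    )
```

Because the function is pure, both the simulator step and the trajectory rollout can call it without duplicating physics code.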

Trajectory rendering:
- vec_set_trajectory supports both classic and jerk dynamics
- c_render draws predicted trajectories as red lines/circles
- set_predicted_trajectories Python method on Drive class
…ectory

Repeating a jerk action 80 times causes runaway acceleration (jerk
controls rate of change of acceleration). Now applies the action on
step 0 and zero-jerk for remaining steps, maintaining the resulting
acceleration without further accumulation.
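The fix can be sketched as a 1-D rollout: jerk is applied once, then held at zero, so acceleration settles instead of accumulating (a deliberately simplified stand-in for the real dynamics):

```python
def rollout_jerk(jerk: float, steps: int = 80, dt: float = 0.1):
    """Apply the chosen jerk on step 0 only; zero jerk afterwards.

    Repeating the jerk every step would grow acceleration linearly and
    speed quadratically -- the runaway described above.
    """
    accel, speed = 0.0, 0.0
    speeds = []
    for step in range(steps):
        j = jerk if step == 0 else 0.0
        accel += j * dt          # jerk integrates into acceleration
        speed += accel * dt      # acceleration integrates into speed
        speeds.append(speed)
    return accel, speeds
```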
--render exports .bin weights, builds ./visualize, and runs it to
produce mp4 videos. Trajectory rendering in c_render limited to
ego vehicle only.
- drivenet.h: roll out trajectory after forward pass using reusable
  dynamics step functions, store in predicted_traj_x/y
- visualize.c: draw ego trajectory as sky-blue lines in renderTopDownView,
  re-enter BeginMode3D after draw_scene returns
- export_onnx.py: generate temp INI with gigaflow-matching reward
  coefficients pinned (min=max) for consistent evaluation rendering
- export_onnx.py: --fake-trajectory, --render with gigaflow coefficients
- onnx_inference.py: DrivePolicyONNX class + verification against PyTorch
- Refactored dynamics into reusable classic_dynamics_step/jerk_dynamics_step
- Trajectory rendering in visualize binary (ego-only, sky-blue)
- Fixed visualize --config passthrough (was hardcoded in eval_gif)
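The min=max coefficient pinning mentioned above could look like the sketch below; the section and key names are assumptions, not the project's actual INI schema:

```python
import configparser
import os
import tempfile

def write_pinned_ini(coeffs: dict, section: str = "reward") -> str:
    """Write a temp INI pinning each reward coefficient by setting min == max,
    so randomized coefficient sampling collapses to a fixed value."""
    cfg = configparser.ConfigParser()
    cfg[section] = {}
    for name, value in coeffs.items():
        cfg[section][f"{name}_min"] = str(value)
        cfg[section][f"{name}_max"] = str(value)
    fd, path = tempfile.mkstemp(suffix=".ini")
    with os.fdopen(fd, "w") as f:
        cfg.write(f)
    return path
```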
Copilot AI review requested due to automatic review settings March 30, 2026 22:16

Copilot AI left a comment


Pull request overview

This PR updates the PufferDrive ONNX export + visualization pipeline by adding optional trajectory outputs, a rendering path for exported weights, and a standalone ONNX inference/verification utility. It also refactors/centralizes dynamics stepping so both simulation and visualization can share the same kinematics logic.

Changes:

  • Add scripts/onnx_inference.py to build observations, run ONNX inference, and verify ONNX vs PyTorch outputs on real env observations.
  • Extend scripts/export_onnx.py with --fake-trajectory (extra ONNX output) and --render (export .bin + run visualize).
  • Add reusable “pure” dynamics step functions and predicted-trajectory plumbing for rendering (Python binding + C rendering + drivenet forward).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
scripts/onnx_inference.py New ONNX inference + env-observation round-trip and ONNX-vs-PyTorch verification utility.
scripts/export_onnx.py Adds trajectory output option and an optional post-export rendering workflow; improves checkpoint key stripping.
pufferlib/ocean/env_binding.h Adds a new Python-exposed entrypoint to push predicted trajectories into the C env for rendering.
pufferlib/ocean/drive/visualize.c Enables --config passthrough and draws predicted trajectories during video rendering.
pufferlib/ocean/drive/drivenet.h Adds predicted-trajectory buffers and rolls them out after action selection.
pufferlib/ocean/drive/drive.py Adds a Python helper to set predicted trajectories across sub-envs for rendering.
pufferlib/ocean/drive/drive.h Adds predicted trajectory storage to Drive, factors dynamics into reusable step functions, and draws trajectories in c_render.
pufferlib/ocean/drive/drive.c Updates calls to the new forward(net, env, ...) signature.


Comment on lines +285 to +296
else:
    from pufferlib.ocean.drive.drive import ACCELERATION_VALUES, STEERING_VALUES

    num_steer = 13
    accel_idx = action // num_steer
    steer_idx = action % num_steer
    return {
        "acceleration": float(ACCELERATION_VALUES[accel_idx]),
        "steering": float(STEERING_VALUES[steer_idx]),
        "accel_idx": accel_idx,
        "steer_idx": steer_idx,
    }

Copilot AI Mar 30, 2026


decode_action() (classic branch) imports ACCELERATION_VALUES/STEERING_VALUES from pufferlib.ocean.drive.drive, but those symbols are not defined in drive.py (they exist only in the C header). This will raise an ImportError at runtime when dynamics_model="classic". Consider either hardcoding the arrays here, exposing them from the binding module, or decoding via known sizes + config/constants that exist in Python.
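One of the suggested fixes, hardcoding the arrays, might look like this sketch; the grid values are illustrative placeholders, not the real constants from the C header:

```python
# Hypothetical discretization grids; the real ones live in the C header.
ACCELERATION_VALUES = [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0]
STEERING_VALUES = [round(-0.6 + i * 0.1, 1) for i in range(13)]  # 13 bins

def decode_action(action: int) -> dict:
    """Split a flattened classic-dynamics action into accel/steer indices."""
    num_steer = len(STEERING_VALUES)
    accel_idx, steer_idx = divmod(action, num_steer)
    return {
        "acceleration": float(ACCELERATION_VALUES[accel_idx]),
        "steering": float(STEERING_VALUES[steer_idx]),
        "accel_idx": accel_idx,
        "steer_idx": steer_idx,
    }
```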

Comment on lines +189 to +197
a_long = ego_state.get("a_long", 0)
obs[0, 8] = a_long / 15.0 if a_long < 0 else a_long / 4.0
obs[0, 9] = ego_state.get("a_lat", 0) / 4.0
obs[0, 10] = 1.0 if ego_state.get("respawned", False) else 0.0
obs[0, 11] = ego_state.get("goal_speed_min", 0)
obs[0, 12] = ego_state.get("goal_speed_max", 0)
obs[0, 13] = min(SPEED_LIMIT / MAX_SPEED, 1.0)
obs[0, 14] = ego_state.get("lane_center_dist", 0)
obs[0, 15] = ego_state.get("lane_angle", 1.0)

Copilot AI Mar 30, 2026


build_observation() documents/supports an ego_state["speed_limit"], but the observation always uses the module-level constant SPEED_LIMIT (ignoring any provided value). Either use the provided ego_state["speed_limit"] (and normalize it), or remove/clarify the speed_limit field in the docstring and in parse_env_obs_to_ego_state() to avoid misleading round-trip verification.
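A sketch of the suggested fix, preferring the caller-provided value over the module constant (the names and constants here are assumptions):

```python
MAX_SPEED = 30.0           # illustrative cap; the env defines its own constant
DEFAULT_SPEED_LIMIT = 15.0 # stand-in for the module-level SPEED_LIMIT

def normalized_speed_limit(ego_state: dict) -> float:
    """Use the caller-provided speed_limit when present, else the default,
    so the parse -> rebuild round-trip sees the same value it produced."""
    limit = ego_state.get("speed_limit", DEFAULT_SPEED_LIMIT)
    return min(limit / MAX_SPEED, 1.0)
```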

Comment on lines +174 to +182
# obs[9] = a_lat / JERK_LAT[2]
a_lat = observation[:, 9] * 4.0 # JERK_LAT[2] = 4
# obs[5] = sim_length / MAX_VEH_LEN
wheelbase = observation[:, 5] * 5.5 # MAX_VEH_LEN approximate

# logits[0] for single-head multi-discrete (jerk: MultiDiscrete([12]))
logits_tensor = logits[0] if isinstance(logits, (tuple, list)) else logits
trajectory = self.rollout(logits_tensor, speed, a_long, a_lat, steering_angle, wheelbase)
return logits, value, new_h, new_c, trajectory

Copilot AI Mar 30, 2026


The fake-trajectory rollout derives wheelbase from observation[:, 5] using a hardcoded * 5.5, but observation[5] is normalized vehicle length (MAX_VEH_LEN=6 in C) and the environment sets wheelbase = 0.6 * length. This makes the exported trajectory inconsistent with the env dynamics. Consider computing wheelbase as 0.6 * (observation[:,5] * MAX_VEH_LEN) (or otherwise matching the env’s wheelbase logic/constants).
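The suggested correction, as a small sketch using the constants cited in the comment:

```python
MAX_VEH_LEN = 6.0  # the C constant cited in the review comment

def wheelbase_from_obs(obs5: float) -> float:
    """Denormalize vehicle length, then apply the env's wheelbase rule."""
    length = obs5 * MAX_VEH_LEN   # obs[5] stores length / MAX_VEH_LEN
    return 0.6 * length           # env sets wheelbase = 0.6 * length
```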

"--output-topdown", output_video,
]
print(f"Running: {' '.join(cmd)}")
subprocess.run(cmd)

Copilot AI Mar 30, 2026


--render path runs subprocess.run(cmd) without check=True, so failures in the visualize binary (missing deps, nonzero exit, etc.) won’t fail the export script and may silently produce no/partial output. Consider using check=True (and/or capturing stdout/stderr) so CI/users reliably detect render failures.

Suggested change
- subprocess.run(cmd)
+ subprocess.run(cmd, check=True)

Comment on lines +538 to +553
VecEnv *vec = (VecEnv *)PyLong_AsVoidPtr(PyTuple_GetItem(args, 0));
int env_id = PyLong_AsLong(PyTuple_GetItem(args, 1));
PyArrayObject *action_arr = (PyArrayObject *)PyTuple_GetItem(args, 2);
int traj_len = PyLong_AsLong(PyTuple_GetItem(args, 3));

Env *env = vec->envs[env_id];
int num_agents = env->active_agent_count;
int *actions = (int *)PyArray_DATA(action_arr);
int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);

free(env->predicted_traj_x);
free(env->predicted_traj_y);
int total = num_agents * traj_len;
env->predicted_traj_x = (float *)calloc(total, sizeof(float));
env->predicted_traj_y = (float *)calloc(total, sizeof(float));
env->predicted_traj_len = traj_len;

Copilot AI Mar 30, 2026


vec_set_trajectory directly casts args[0] to VecEnv* and args[2] to PyArrayObject* without using unpack_vecenv()/type checks, and it does not validate env_id bounds or that action_arr is contiguous int32 with expected length num_agents * traj_len. A malformed call could segfault or read out-of-bounds. Consider reusing unpack_vecenv(), validating 0 <= env_id < vec->num_envs, and checking PyArray_Check, dtype, contiguity, and size before accessing PyArray_DATA.

Suggested change
VecEnv *vec = (VecEnv *)PyLong_AsVoidPtr(PyTuple_GetItem(args, 0));
int env_id = PyLong_AsLong(PyTuple_GetItem(args, 1));
PyArrayObject *action_arr = (PyArrayObject *)PyTuple_GetItem(args, 2);
int traj_len = PyLong_AsLong(PyTuple_GetItem(args, 3));
Env *env = vec->envs[env_id];
int num_agents = env->active_agent_count;
int *actions = (int *)PyArray_DATA(action_arr);
int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);
free(env->predicted_traj_x);
free(env->predicted_traj_y);
int total = num_agents * traj_len;
env->predicted_traj_x = (float *)calloc(total, sizeof(float));
env->predicted_traj_y = (float *)calloc(total, sizeof(float));
env->predicted_traj_len = traj_len;

with:

// Reuse unpack_vecenv for consistency and safety.
VecEnv *vec = unpack_vecenv(args);
if (!vec) {
// unpack_vecenv is expected to set an appropriate Python exception.
return NULL;
}
PyObject *env_id_obj = PyTuple_GetItem(args, 1);
if (!env_id_obj) {
return NULL;
}
long env_id = PyLong_AsLong(env_id_obj);
if (PyErr_Occurred()) {
return NULL;
}
if (env_id < 0 || env_id >= vec->num_envs) {
PyErr_SetString(PyExc_IndexError, "env_id out of range");
return NULL;
}
PyObject *traj_obj = PyTuple_GetItem(args, 2);
if (!traj_obj) {
return NULL;
}
if (!PyArray_Check(traj_obj)) {
PyErr_SetString(PyExc_TypeError, "action_traj must be a NumPy array");
return NULL;
}
PyArrayObject *action_arr = (PyArrayObject *)traj_obj;
if (PyArray_TYPE(action_arr) != NPY_INT32) {
PyErr_SetString(PyExc_TypeError, "action_traj must have dtype int32");
return NULL;
}
if (!PyArray_ISCONTIGUOUS(action_arr)) {
PyErr_SetString(PyExc_ValueError, "action_traj must be a contiguous array");
return NULL;
}
PyObject *traj_len_obj = PyTuple_GetItem(args, 3);
if (!traj_len_obj) {
return NULL;
}
long traj_len = PyLong_AsLong(traj_len_obj);
if (PyErr_Occurred()) {
return NULL;
}
if (traj_len < 0) {
PyErr_SetString(PyExc_ValueError, "traj_len must be non-negative");
return NULL;
}
Env *env = vec->envs[env_id];
int num_agents = env->active_agent_count;
npy_intp expected_total = (npy_intp)num_agents * (npy_intp)traj_len;
if (PyArray_SIZE(action_arr) != expected_total) {
PyErr_SetString(PyExc_ValueError, "action_traj size does not match num_agents * traj_len");
return NULL;
}
int *actions = (int *)PyArray_DATA(action_arr);
int num_steer = sizeof(STEERING_VALUES) / sizeof(STEERING_VALUES[0]);
free(env->predicted_traj_x);
free(env->predicted_traj_y);
int total = num_agents * (int)traj_len;
env->predicted_traj_x = (float *)calloc(total, sizeof(float));
env->predicted_traj_y = (float *)calloc(total, sizeof(float));
env->predicted_traj_len = (int)traj_len;

DriveNet *net = init_drivenet(weights, env.active_agent_count, env.dynamics_model, env.reward_conditioning);

int frame_count = env.episode_length > 0 ? env.episode_length : TRAJECTORY_LENGTH_DEFAULT;
printf("episode_length=%d, frame_count=%d\n", env.episode_length, frame_count);

Copilot AI Mar 30, 2026


The unconditional printf("episode_length=%d, frame_count=%d\n", ...) adds noisy debug output to visualize runs and can clutter logs/automation. Consider removing it or gating it behind a verbose/debug flag.

Suggested change
- printf("episode_length=%d, frame_count=%d\n", env.episode_length, frame_count);
+ const char *visualize_debug = getenv("PUFFER_VISUALIZE_DEBUG");
+ bool debug_logging = visualize_debug && visualize_debug[0] != '\0' && visualize_debug[0] != '0';
+ if (debug_logging) {
+     printf("episode_length=%d, frame_count=%d\n", env.episode_length, frame_count);
+ }
