WIP: merge turbostream DO NOT MERGE by eugenevinitsky · Pull Request #384 · Emerge-Lab/PufferDrive

eugenevinitsky · 2026-03-31T02:02:11Z

Summary

Merges turbostream (vcha/turbostream) features into 2.0
Traffic lights, stop signs, route/waypoint system
Updated observation space, transition-level PPO, symlog rewards
New Carla map binaries, MLOps tooling, notebooks, eval scripts
submit_cluster.py from 3.0
Maps will need rebuilding for production use

Status

Builds locally (macOS)
Builds on cluster (Linux + CUDA)
Training runs with 4 workers / 1 map
Training runs at full scale (16 workers / 8 maps) — testing
Rendering pipeline verified

🤖 Generated with Claude Code

Brings in turbostream features: traffic lights, stop signs, route/waypoint system, updated observation space, transition-level PPO, symlog rewards, new Carla map binaries, MLOps tooling, notebooks, eval scripts, and submit_cluster.py from 3.0. Maps will need rebuilding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Merges the turbostream drivability/benchmarking work into the 2.0 codebase, updating the Drive environment interfaces, evaluation tooling, and MLOps workflow to support multi-scenario eval, richer observations, and new rendering paths.

Changes:

Refactors PufferDrive env bindings/config and expands observation/reward/control features (traffic lights, stop signs, waypoint routing, reward conditioning).
Updates training/eval utilities (WOSAC + human replay subprocess eval, deterministic sampling option, rendering video pipeline).
Adds MLOps tooling + Docker images + GitHub workflows and debugging notebooks.

Reviewed changes

Copilot reviewed 51 out of 78 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
pufferlib/vector.py	Adjusts seeding, env lifecycle, and adds get_state for Serial
pufferlib/utils.py	Adds human replay eval subprocess + C-rendered video logging
pufferlib/sweep.py	Adds categorical sweep parameter space
pufferlib/spaces.py	Drops gym spaces support; uses gymnasium spaces directly
pufferlib/pytorch.py	Adds deterministic sampling path for eval
pufferlib/ocean/torch.py	Major Drive policy/backbone refactor (new obs layout, optional split actor/critic)
pufferlib/ocean/environment.py	Simplifies ocean env factory mapping to drive-only
pufferlib/ocean/env_config.h	Extends INI parsing/config struct for new features + helper loader
pufferlib/ocean/env_binding.h	Updates Python/C binding API (masks, render, log/get, trajectory APIs)
pufferlib/ocean/drive/visualize.c	Adds standalone headless visualizer that records mp4 via ffmpeg
pufferlib/ocean/drive/drivenet.h	Updates C inference net constants/layout for new obs sizes
pufferlib/ocean/drive/drive.c	Refactors demo/perf test + loads INI via shared loader
pufferlib/ocean/drive/datatypes.h	Adds shared datatypes/constants for rewards/metrics/map/traffic controls
pufferlib/ocean/drive/README.md	Documents Drive env behavior and metric aggregation
pufferlib/ocean/benchmark/wosac.ini	Updates WOSAC weights/config to 2025 challenge
pufferlib/ocean/benchmark/visual_sanity_check.py	Updates WOSAC sanity script env setup
pufferlib/ocean/benchmark/metrics_sanity_check.py	Uses policy-based trajectory collection instead of random baseline
pufferlib/ocean/benchmark/metrics.py	Renames internal variable for clarity (scenario_length)
pufferlib/ocean/benchmark/README.md	Adds WOSAC benchmark documentation
pufferlib/config/ocean/drive.ini	Updates env/train/eval/policy defaults for turbostream merge
pufferlib/config/default.ini	Updates default optimizer/determinism + BPTT/ppo granularity params
pufferlib/init.py	Removes resources symlink creation at import time
notebooks/01_observations.ipynb	Adds observation pipeline debug notebook
notebooks/02_rewards.ipynb	Adds reward signals debug notebook
notebooks/03_metrics.ipynb	Adds episode metrics/logging debug notebook
mlops/run_training.sh	Adds GPU preflight + background GCS sync + torchrun entrypoint
mlops/dockerfile/launch.dockerfile	Adds training image Dockerfile that compiles extensions
mlops/dockerfile/build.dockerfile	Adds base CUDA/Python deps image build
mlops/dockerfile/.dockerignore	Adds dockerignore optimized for training builds
mlops/configs/smoke-tests.yaml	Adds CI/merge smoke-test training config matrix
mlops/configs/experiments.yaml	Adds experiment/run batch launch config template
mlops/README.md	Documents Vertex AI training workflow and CLI usage
data_utils/carla/process_carla_roads.py	Adds Carla OpenDRIVE → JSON road extraction utility
README.md	Rewrites top-level docs for new train/eval flows + config overview
.pre-commit-config.yaml	Updates clang-format hook config
.github/workflows/utest.yml	Retargets unit test workflow to main branch
.github/workflows/training-test.yml	Adds post-merge training smoke test workflow
.github/workflows/train-ci.yml	Retargets training CI workflow to main branch
.github/workflows/render-ci.yml	Adds headless render CI test
.github/workflows/pre-commit.yml	Adds pre-commit GitHub Actions workflow
.github/workflows/perf-ci.yml	Retargets perf CI workflow to main branch

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-31T02:04:25Z

+    if (PyTuple_Size(args) != 7) {
+        PyErr_SetString(PyExc_TypeError, "get_ground_truth_trajectories requires 7 arguments");


The argument count check and indexing are inconsistent: a tuple of size 7 has valid indices 0..6, but the code unconditionally reads item 7. This can cause an out-of-bounds access and segfault. Fix by requiring 8 arguments (and keeping index 7), or by reading scenario_id_arr from index 6 and updating the rest of the expected signature accordingly.

Suggested change

if (PyTuple_Size(args) != 7) {

PyErr_SetString(PyExc_TypeError, "get_ground_truth_trajectories requires 7 arguments");

if (PyTuple_Size(args) != 8) {

PyErr_SetString(PyExc_TypeError, "get_ground_truth_trajectories requires 8 arguments");

Copilot · 2026-03-31T02:04:25Z

-    PyObject *is_vehicle_arr = PyTuple_GetItem(args, 7);
-    PyObject *is_track_to_predict_arr = PyTuple_GetItem(args, 8);
-    PyObject *scenario_id_arr = PyTuple_GetItem(args, 9);
+    PyObject *scenario_id_arr = PyTuple_GetItem(args, 7);


The argument count check and indexing are inconsistent: a tuple of size 7 has valid indices 0..6, but the code unconditionally reads item 7. This can cause an out-of-bounds access and segfault. Fix by requiring 8 arguments (and keeping index 7), or by reading scenario_id_arr from index 6 and updating the rest of the expected signature accordingly.

Copilot · 2026-03-31T02:04:26Z

-        Env *env = vec->envs[i];
-        for (int j = 0; j < num_keys; j++) {
-            ((float *)&env->log)[j] = 0.0f;
+        if (env->log.n == 0) {


In this early-return path, list is leaked (and dict is returned without cleaning up the unused list). This will accumulate reference leaks during eval-mode polling. A concrete fix is to Py_DECREF(list) before returning, and also ensure any pre-allocated objects in this branch are either used or decref'd consistently.

Suggested change

if (env->log.n == 0) {

if (env->log.n == 0) {

Py_DECREF(list);

Copilot · 2026-03-31T02:04:26Z

+    try:
+        # Create output directory for videos
+        video_output_dir = os.path.join(model_dir, "videos")
+        os.makedirs(video_output_dir, exist_ok=True)
+
+        # Copy the binary weights to the expected location
+        expected_weights_path = "resources/drive/puffer_drive_weights.bin"
+        os.makedirs(os.path.dirname(expected_weights_path), exist_ok=True)
+        shutil.copy2(bin_path, expected_weights_path)


expected_weights_path is defined inside the try block but referenced in finally. If an exception occurs before assignment (e.g., in mkdir/model_dir logic), finally will raise UnboundLocalError and mask the original failure. Define expected_weights_path before the try (e.g., set it to the intended path or None) and guard the cleanup accordingly.

Copilot · 2026-03-31T02:04:26Z

+    finally:
+        # Clean up bin weights file
+        if os.path.exists(expected_weights_path):
+            os.remove(expected_weights_path)


expected_weights_path is defined inside the try block but referenced in finally. If an exception occurs before assignment (e.g., in mkdir/model_dir logic), finally will raise UnboundLocalError and mask the original failure. Define expected_weights_path before the try (e.g., set it to the intended path or None) and guard the cleanup accordingly.

Copilot · 2026-03-31T02:04:26Z

+            if (IsKeyDown(KEY_LEFT) || IsKeyDown(KEY_A)) {
+                actions[env.human_agent_idx][1] += steer_delta;
+                // Cap steering to minimum of 0
+                if (actions[env.human_agent_idx][1] < 0) {
+                    actions[env.human_agent_idx][1] = 0;
                }
-                if (IsKeyDown(KEY_RIGHT) || IsKeyDown(KEY_D)) {
-                    jerk_lat_idx = 0; // right turn (-4.0)
+            }
+            if (IsKeyDown(KEY_RIGHT) || IsKeyDown(KEY_D)) {
+                actions[env.human_agent_idx][1] -= steer_delta;
+                // Cap steering to maximum of 12
+                if (actions[env.human_agent_idx][1] > 12) {
+                    actions[env.human_agent_idx][1] = 12;
                }
-
-                // Encode into single integer: action = jerk_long_idx * 3 + jerk_lat_idx
-                actions[env.human_agent_idx] = jerk_long_idx * 3 + jerk_lat_idx;
+            }


The steering bounds checks are inverted relative to the operations: after adding steer_delta (left), the value can exceed the max and should be capped at 12, not checked for < 0; after subtracting (right), the value can go below 0 and should be capped at 0, not checked for > 12. As written, out-of-range steering indices can be produced.

Copilot · 2026-03-31T02:04:27Z

        )

+    def get_state(self):
+        return [env.get_state() for env in self.envs][0]


This builds a full list of per-env states but only returns the first element. If the intention is to expose the driver/first env state, avoid the list allocation and call self.envs[0].get_state() directly.

Suggested change

return [env.get_state() for env in self.envs][0]

return self.envs[0].get_state()

Copilot · 2026-03-31T02:04:27Z

@@ -0,0 +1,108 @@
+# PufferDrive
+
+This readme contains several important assumptions and definions about the `PufferDrive` environment.


Fix typo: 'definions' → 'definitions'.

Suggested change

This readme contains several important assumptions and definions about the `PufferDrive` environment.

This readme contains several important assumptions and definitions about the `PufferDrive` environment.

Copilot · 2026-03-31T02:04:27Z

@@ -26,4 +26,3 @@ repos:
  rev: 'v21.1.5'
  hooks:
  -   id: clang-format


The prior exclude pattern for clang-format was removed, which means clang-format will now run on file types/paths that were previously excluded (e.g., large generated/build or non-C/C++ directories). If those directories still exist in the repo, reinstating an exclude (or narrowing files:) will avoid unnecessary work and reduce the chance of formatting unintended assets.

Suggested change

- id: clang-format

- id: clang-format

files: \.(c|cc|cpp|cxx|h|hh|hpp|hxx)$

Was always 0, making avg_distance_per_infraction always 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fixes render_videos call failing with 'module has no attribute utils'. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ormat When render_map is not set, discovers .bin files from the env's map_dir. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…put paths - Pass bin_path directly to visualize via --policy-name instead of copying to a relative symlinked path that doesn't resolve in code isolation - Use absolute paths for output mp4s in the video output dir - Print stderr on render failure for debugging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…de to drivenet.h Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Default num_controllable_agents to 32 instead of -1 (caused calloc with huge size, producing 0 agents) - Pass simulation_mode, num_target_waypoints, min/max_waypoint_spacing, and traffic_light_behavior from config to Drive struct Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Symlog compresses large penalties (e.g. collision -3.5 → -1.5), weakening the learning signal for collision avoidance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- New drivenet.h matching PyTorch gigaflow encoder architecture (6 encoders + 4-layer backbone, no LSTM) - Fix double-rotation bug in mode-0 wireframe (heading applied twice) - Add traffic light and road edge rendering to 3D view - Color-code collision types (orange=vehicle, red=offroad, yellow=red light) - Store render_corners on Agent struct from compute_metrics - Add debug config printing to visualize.c - Add render_maps.py script for static map visualization Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ror) ini_parse treats return 0 as parse error. The handler was returning 0 for non-[env] sections like [base], [train], causing all config values to be zero. This made observations 18 floats instead of 874, causing heap-buffer-overflow in drivenet forward pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…sections Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

120s was too short for rendering 1024 frames with 32 agents, causing truncated videos (moov atom not found). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Visualize can exit non-zero (ASAN, signals) but still produce valid videos. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eugenevinitsky and others added 6 commits March 30, 2026 20:55

Add cluster configs from 3.0, update map_dir for turbostream

99d6d47

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add MPLCONFIGDIR to scratch cache exports in submit_cluster.py

e03c848

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add --wandb-name argument support to pufferl.py

b0d6294

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add visualize.c from turbostream (was missed in initial merge)

db84939

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix visualize.c struct fields to match turbostream drive.h

bbd8d77

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 31, 2026 02:02

Copilot AI reviewed Mar 31, 2026

View reviewed changes

eugenevinitsky changed the title ~~WIP: merge turbostream~~ WIP: merge turbostream DO NOT MERGE Mar 31, 2026

eugenevinitsky and others added 18 commits March 30, 2026 22:49

Fix distance_since_spawn: accumulate sim_speed * dt each step

fc54f7c

Was always 0, making avg_distance_per_infraction always 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add missing import pufferlib.utils in pufferl.py

6d4cd71

Fixes render_videos call failing with 'module has no attribute utils'. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Auto-discover map files for rendering instead of using map_%03d.bin f…

4806ad5

…ormat When render_map is not set, discovers .bin files from the env's map_dir. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Point render_map at actual Carla Town01 binary

91aa2c1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix formatting (ruff) and relax matplotlib pin

41cba0f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Restrict clang-format hook to C/C++ files only

b1e529d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix stale expected_weights_path ref, add visualize stderr logging

dbb7c43

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix visualize build: add include guards to drive.h, add drive.h inclu…

51f9722

…de to drivenet.h Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Make symlog reward transform optional, off by default

6d99aa8

Symlog compresses large penalties (e.g. collision -3.5 → -1.5), weakening the learning signal for collision avoidance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Warn on unknown keys in [env] section, skip other sections silently

897997b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Build visualize with fast mode (no ASAN), fix ini parser for non-env …

5104a9d

…sections Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Increase render timeout from 120s to 600s

6e40e7b

120s was too short for rendering 1024 frames with 32 agents, causing truncated videos (moov atom not found). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Check if video files exist regardless of exit code for wandb logging

e84c9a6

Visualize can exit non-zero (ASAN, signals) but still produce valid videos. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Always log visualize exit code and output on non-zero return

c3f3082

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eugenevinitsky closed this May 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: merge turbostream DO NOT MERGE#384

WIP: merge turbostream DO NOT MERGE#384
eugenevinitsky wants to merge 24 commits into
2.0from
2.0_v2

eugenevinitsky commented Mar 31, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if (PyTuple_Size(args) != 7) {
		PyErr_SetString(PyExc_TypeError, "get_ground_truth_trajectories requires 7 arguments");

	if (env->log.n == 0) {
	if (env->log.n == 0) {
	Py_DECREF(list);

	return [env.get_state() for env in self.envs][0]
	return self.envs[0].get_state()

		@@ -0,0 +1,108 @@
		# PufferDrive

		This readme contains several important assumptions and definions about the `PufferDrive` environment.

	- id: clang-format
	- id: clang-format
	files: \.(c\|cc\|cpp\|cxx\|h\|hh\|hpp\|hxx)$

Conversation

eugenevinitsky commented Mar 31, 2026

Summary

Status

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants