Skip to content

WIP: merge turbostream DO NOT MERGE#384

Open
eugenevinitsky wants to merge 16 commits into2.0from
2.0_v2
Open

WIP: merge turbostream DO NOT MERGE#384
eugenevinitsky wants to merge 16 commits into2.0from
2.0_v2

Conversation

@eugenevinitsky
Copy link
Copy Markdown

Summary

  • Merges turbostream (vcha/turbostream) features into 2.0
  • Traffic lights, stop signs, route/waypoint system
  • Updated observation space, transition-level PPO, symlog rewards
  • New Carla map binaries, MLOps tooling, notebooks, eval scripts
  • submit_cluster.py from 3.0
  • Maps will need rebuilding for production use

Status

  • Builds locally (macOS)
  • Builds on cluster (Linux + CUDA)
  • Training runs with 4 workers / 1 map
  • Training runs at full scale (16 workers / 8 maps) — testing
  • Rendering pipeline verified

🤖 Generated with Claude Code

eugenevinitsky and others added 6 commits March 30, 2026 20:55
Brings in turbostream features: traffic lights, stop signs, route/waypoint
system, updated observation space, transition-level PPO, symlog rewards,
new Carla map binaries, MLOps tooling, notebooks, eval scripts, and
submit_cluster.py from 3.0. Maps will need rebuilding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 31, 2026 02:02
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Merges the turbostream drivability/benchmarking work into the 2.0 codebase, updating the Drive environment interfaces, evaluation tooling, and MLOps workflow to support multi-scenario eval, richer observations, and new rendering paths.

Changes:

  • Refactors PufferDrive env bindings/config and expands observation/reward/control features (traffic lights, stop signs, waypoint routing, reward conditioning).
  • Updates training/eval utilities (WOSAC + human replay subprocess eval, deterministic sampling option, rendering video pipeline).
  • Adds MLOps tooling + Docker images + GitHub workflows and debugging notebooks.

Reviewed changes

Copilot reviewed 51 out of 78 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
pufferlib/vector.py Adjusts seeding, env lifecycle, and adds get_state for Serial
pufferlib/utils.py Adds human replay eval subprocess + C-rendered video logging
pufferlib/sweep.py Adds categorical sweep parameter space
pufferlib/spaces.py Drops gym spaces support; uses gymnasium spaces directly
pufferlib/pytorch.py Adds deterministic sampling path for eval
pufferlib/ocean/torch.py Major Drive policy/backbone refactor (new obs layout, optional split actor/critic)
pufferlib/ocean/environment.py Simplifies ocean env factory mapping to drive-only
pufferlib/ocean/env_config.h Extends INI parsing/config struct for new features + helper loader
pufferlib/ocean/env_binding.h Updates Python/C binding API (masks, render, log/get, trajectory APIs)
pufferlib/ocean/drive/visualize.c Adds standalone headless visualizer that records mp4 via ffmpeg
pufferlib/ocean/drive/drivenet.h Updates C inference net constants/layout for new obs sizes
pufferlib/ocean/drive/drive.c Refactors demo/perf test + loads INI via shared loader
pufferlib/ocean/drive/datatypes.h Adds shared datatypes/constants for rewards/metrics/map/traffic controls
pufferlib/ocean/drive/README.md Documents Drive env behavior and metric aggregation
pufferlib/ocean/benchmark/wosac.ini Updates WOSAC weights/config to 2025 challenge
pufferlib/ocean/benchmark/visual_sanity_check.py Updates WOSAC sanity script env setup
pufferlib/ocean/benchmark/metrics_sanity_check.py Uses policy-based trajectory collection instead of random baseline
pufferlib/ocean/benchmark/metrics.py Renames internal variable for clarity (scenario_length)
pufferlib/ocean/benchmark/README.md Adds WOSAC benchmark documentation
pufferlib/config/ocean/drive.ini Updates env/train/eval/policy defaults for turbostream merge
pufferlib/config/default.ini Updates default optimizer/determinism + BPTT/ppo granularity params
pufferlib/init.py Removes resources symlink creation at import time
notebooks/01_observations.ipynb Adds observation pipeline debug notebook
notebooks/02_rewards.ipynb Adds reward signals debug notebook
notebooks/03_metrics.ipynb Adds episode metrics/logging debug notebook
mlops/run_training.sh Adds GPU preflight + background GCS sync + torchrun entrypoint
mlops/dockerfile/launch.dockerfile Adds training image Dockerfile that compiles extensions
mlops/dockerfile/build.dockerfile Adds base CUDA/Python deps image build
mlops/dockerfile/.dockerignore Adds dockerignore optimized for training builds
mlops/configs/smoke-tests.yaml Adds CI/merge smoke-test training config matrix
mlops/configs/experiments.yaml Adds experiment/run batch launch config template
mlops/README.md Documents Vertex AI training workflow and CLI usage
data_utils/carla/process_carla_roads.py Adds Carla OpenDRIVE → JSON road extraction utility
README.md Rewrites top-level docs for new train/eval flows + config overview
.pre-commit-config.yaml Updates clang-format hook config
.github/workflows/utest.yml Retargets unit test workflow to main branch
.github/workflows/training-test.yml Adds post-merge training smoke test workflow
.github/workflows/train-ci.yml Retargets training CI workflow to main branch
.github/workflows/render-ci.yml Adds headless render CI test
.github/workflows/pre-commit.yml Adds pre-commit GitHub Actions workflow
.github/workflows/perf-ci.yml Retargets perf CI workflow to main branch

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +820 to +821
if (PyTuple_Size(args) != 7) {
PyErr_SetString(PyExc_TypeError, "get_ground_truth_trajectories requires 7 arguments");
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument count check and indexing are inconsistent: a tuple of size 7 has valid indices 0..6, but the code unconditionally reads item 7. This can cause an out-of-bounds access and segfault. Fix by requiring 8 arguments (and keeping index 7), or by reading scenario_id_arr from index 6 and updating the rest of the expected signature accordingly.

Suggested change
if (PyTuple_Size(args) != 7) {
PyErr_SetString(PyExc_TypeError, "get_ground_truth_trajectories requires 7 arguments");
if (PyTuple_Size(args) != 8) {
PyErr_SetString(PyExc_TypeError, "get_ground_truth_trajectories requires 8 arguments");

Copilot uses AI. Check for mistakes.
PyObject *is_vehicle_arr = PyTuple_GetItem(args, 7);
PyObject *is_track_to_predict_arr = PyTuple_GetItem(args, 8);
PyObject *scenario_id_arr = PyTuple_GetItem(args, 9);
PyObject *scenario_id_arr = PyTuple_GetItem(args, 7);
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument count check and indexing are inconsistent: a tuple of size 7 has valid indices 0..6, but the code unconditionally reads item 7. This can cause an out-of-bounds access and segfault. Fix by requiring 8 arguments (and keeping index 7), or by reading scenario_id_arr from index 6 and updating the rest of the expected signature accordingly.

Copilot uses AI. Check for mistakes.
Env *env = vec->envs[i];
for (int j = 0; j < num_keys; j++) {
((float *)&env->log)[j] = 0.0f;
if (env->log.n == 0) {
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this early-return path, list is leaked (and dict is returned without cleaning up the unused list). This will accumulate reference leaks during eval-mode polling. A concrete fix is to Py_DECREF(list) before returning, and also ensure any pre-allocated objects in this branch are either used or decref'd consistently.

Suggested change
if (env->log.n == 0) {
if (env->log.n == 0) {
Py_DECREF(list);

Copilot uses AI. Check for mistakes.
Comment on lines +191 to +199
try:
# Create output directory for videos
video_output_dir = os.path.join(model_dir, "videos")
os.makedirs(video_output_dir, exist_ok=True)

# Copy the binary weights to the expected location
expected_weights_path = "resources/drive/puffer_drive_weights.bin"
os.makedirs(os.path.dirname(expected_weights_path), exist_ok=True)
shutil.copy2(bin_path, expected_weights_path)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected_weights_path is defined inside the try block but referenced in finally. If an exception occurs before assignment (e.g., in mkdir/model_dir logic), finally will raise UnboundLocalError and mask the original failure. Define expected_weights_path before the try (e.g., set it to the intended path or None) and guard the cleanup accordingly.

Copilot uses AI. Check for mistakes.
Comment on lines +320 to +323
finally:
# Clean up bin weights file
if os.path.exists(expected_weights_path):
os.remove(expected_weights_path)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected_weights_path is defined inside the try block but referenced in finally. If an exception occurs before assignment (e.g., in mkdir/model_dir logic), finally will raise UnboundLocalError and mask the original failure. Define expected_weights_path before the try (e.g., set it to the intended path or None) and guard the cleanup accordingly.

Copilot uses AI. Check for mistakes.
Comment on lines +88 to +101
if (IsKeyDown(KEY_LEFT) || IsKeyDown(KEY_A)) {
actions[env.human_agent_idx][1] += steer_delta;
// Cap steering to minimum of 0
if (actions[env.human_agent_idx][1] < 0) {
actions[env.human_agent_idx][1] = 0;
}
if (IsKeyDown(KEY_RIGHT) || IsKeyDown(KEY_D)) {
jerk_lat_idx = 0; // right turn (-4.0)
}
if (IsKeyDown(KEY_RIGHT) || IsKeyDown(KEY_D)) {
actions[env.human_agent_idx][1] -= steer_delta;
// Cap steering to maximum of 12
if (actions[env.human_agent_idx][1] > 12) {
actions[env.human_agent_idx][1] = 12;
}

// Encode into single integer: action = jerk_long_idx * 3 + jerk_lat_idx
actions[env.human_agent_idx] = jerk_long_idx * 3 + jerk_lat_idx;
}
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The steering bounds checks are inverted relative to the operations: after adding steer_delta (left), the value can exceed the max and should be capped at 12, not checked for < 0; after subtracting (right), the value can go below 0 and should be capped at 0, not checked for > 12. As written, out-of-range steering indices can be produced.

Copilot uses AI. Check for mistakes.
)

def get_state(self):
return [env.get_state() for env in self.envs][0]
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This builds a full list of per-env states but only returns the first element. If the intention is to expose the driver/first env state, avoid the list allocation and call self.envs[0].get_state() directly.

Suggested change
return [env.get_state() for env in self.envs][0]
return self.envs[0].get_state()

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,108 @@
# PufferDrive

This readme contains several important assumptions and definions about the `PufferDrive` environment.
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix typo: 'definions' → 'definitions'.

Suggested change
This readme contains several important assumptions and definions about the `PufferDrive` environment.
This readme contains several important assumptions and definitions about the `PufferDrive` environment.

Copilot uses AI. Check for mistakes.
@@ -26,4 +26,3 @@ repos:
rev: 'v21.1.5'
hooks:
- id: clang-format
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prior exclude pattern for clang-format was removed, which means clang-format will now run on file types/paths that were previously excluded (e.g., large generated/build or non-C/C++ directories). If those directories still exist in the repo, reinstating an exclude (or narrowing files:) will avoid unnecessary work and reduce the chance of formatting unintended assets.

Suggested change
- id: clang-format
- id: clang-format
files: \.(c|cc|cpp|cxx|h|hh|hpp|hxx)$

Copilot uses AI. Check for mistakes.
@eugenevinitsky eugenevinitsky changed the title WIP: merge turbostream WIP: merge turbostream DO NOT MERGE Mar 31, 2026
eugenevinitsky and others added 10 commits March 30, 2026 22:49
Was always 0, making avg_distance_per_infraction always 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes render_videos call failing with 'module has no attribute utils'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ormat

When render_map is not set, discovers .bin files from the env's map_dir.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…put paths

- Pass bin_path directly to visualize via --policy-name instead of copying
  to a relative symlinked path that doesn't resolve in code isolation
- Use absolute paths for output mp4s in the video output dir
- Print stderr on render failure for debugging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…de to drivenet.h

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Default num_controllable_agents to 32 instead of -1 (caused calloc
  with huge size, producing 0 agents)
- Pass simulation_mode, num_target_waypoints, min/max_waypoint_spacing,
  and traffic_light_behavior from config to Drive struct

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants