Conversation
Brings in turbostream features: traffic lights, stop signs, route/waypoint system, updated observation space, transition-level PPO, symlog rewards, new Carla map binaries, MLOps tooling, notebooks, eval scripts, and submit_cluster.py from 3.0. Maps will need rebuilding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Merges the turbostream drivability/benchmarking work into the 2.0 codebase, updating the Drive environment interfaces, evaluation tooling, and MLOps workflow to support multi-scenario eval, richer observations, and new rendering paths.
Changes:
- Refactors PufferDrive env bindings/config and expands observation/reward/control features (traffic lights, stop signs, waypoint routing, reward conditioning).
- Updates training/eval utilities (WOSAC + human replay subprocess eval, deterministic sampling option, rendering video pipeline).
- Adds MLOps tooling + Docker images + GitHub workflows and debugging notebooks.
Reviewed changes
Copilot reviewed 51 out of 78 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| pufferlib/vector.py | Adjusts seeding, env lifecycle, and adds get_state for Serial |
| pufferlib/utils.py | Adds human replay eval subprocess + C-rendered video logging |
| pufferlib/sweep.py | Adds categorical sweep parameter space |
| pufferlib/spaces.py | Drops gym spaces support; uses gymnasium spaces directly |
| pufferlib/pytorch.py | Adds deterministic sampling path for eval |
| pufferlib/ocean/torch.py | Major Drive policy/backbone refactor (new obs layout, optional split actor/critic) |
| pufferlib/ocean/environment.py | Simplifies ocean env factory mapping to drive-only |
| pufferlib/ocean/env_config.h | Extends INI parsing/config struct for new features + helper loader |
| pufferlib/ocean/env_binding.h | Updates Python/C binding API (masks, render, log/get, trajectory APIs) |
| pufferlib/ocean/drive/visualize.c | Adds standalone headless visualizer that records mp4 via ffmpeg |
| pufferlib/ocean/drive/drivenet.h | Updates C inference net constants/layout for new obs sizes |
| pufferlib/ocean/drive/drive.c | Refactors demo/perf test + loads INI via shared loader |
| pufferlib/ocean/drive/datatypes.h | Adds shared datatypes/constants for rewards/metrics/map/traffic controls |
| pufferlib/ocean/drive/README.md | Documents Drive env behavior and metric aggregation |
| pufferlib/ocean/benchmark/wosac.ini | Updates WOSAC weights/config to 2025 challenge |
| pufferlib/ocean/benchmark/visual_sanity_check.py | Updates WOSAC sanity script env setup |
| pufferlib/ocean/benchmark/metrics_sanity_check.py | Uses policy-based trajectory collection instead of random baseline |
| pufferlib/ocean/benchmark/metrics.py | Renames internal variable for clarity (scenario_length) |
| pufferlib/ocean/benchmark/README.md | Adds WOSAC benchmark documentation |
| pufferlib/config/ocean/drive.ini | Updates env/train/eval/policy defaults for turbostream merge |
| pufferlib/config/default.ini | Updates default optimizer/determinism + BPTT/ppo granularity params |
| pufferlib/init.py | Removes resources symlink creation at import time |
| notebooks/01_observations.ipynb | Adds observation pipeline debug notebook |
| notebooks/02_rewards.ipynb | Adds reward signals debug notebook |
| notebooks/03_metrics.ipynb | Adds episode metrics/logging debug notebook |
| mlops/run_training.sh | Adds GPU preflight + background GCS sync + torchrun entrypoint |
| mlops/dockerfile/launch.dockerfile | Adds training image Dockerfile that compiles extensions |
| mlops/dockerfile/build.dockerfile | Adds base CUDA/Python deps image build |
| mlops/dockerfile/.dockerignore | Adds dockerignore optimized for training builds |
| mlops/configs/smoke-tests.yaml | Adds CI/merge smoke-test training config matrix |
| mlops/configs/experiments.yaml | Adds experiment/run batch launch config template |
| mlops/README.md | Documents Vertex AI training workflow and CLI usage |
| data_utils/carla/process_carla_roads.py | Adds Carla OpenDRIVE → JSON road extraction utility |
| README.md | Rewrites top-level docs for new train/eval flows + config overview |
| .pre-commit-config.yaml | Updates clang-format hook config |
| .github/workflows/utest.yml | Retargets unit test workflow to main branch |
| .github/workflows/training-test.yml | Adds post-merge training smoke test workflow |
| .github/workflows/train-ci.yml | Retargets training CI workflow to main branch |
| .github/workflows/render-ci.yml | Adds headless render CI test |
| .github/workflows/pre-commit.yml | Adds pre-commit GitHub Actions workflow |
| .github/workflows/perf-ci.yml | Retargets perf CI workflow to main branch |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (PyTuple_Size(args) != 7) { | ||
| PyErr_SetString(PyExc_TypeError, "get_ground_truth_trajectories requires 7 arguments"); |
There was a problem hiding this comment.
The argument count check and indexing are inconsistent: a tuple of size 7 has valid indices 0..6, but the code unconditionally reads item 7. This can cause an out-of-bounds access and segfault. Fix by requiring 8 arguments (and keeping index 7), or by reading scenario_id_arr from index 6 and updating the rest of the expected signature accordingly.
| if (PyTuple_Size(args) != 7) { | |
| PyErr_SetString(PyExc_TypeError, "get_ground_truth_trajectories requires 7 arguments"); | |
| if (PyTuple_Size(args) != 8) { | |
| PyErr_SetString(PyExc_TypeError, "get_ground_truth_trajectories requires 8 arguments"); |
| PyObject *is_vehicle_arr = PyTuple_GetItem(args, 7); | ||
| PyObject *is_track_to_predict_arr = PyTuple_GetItem(args, 8); | ||
| PyObject *scenario_id_arr = PyTuple_GetItem(args, 9); | ||
| PyObject *scenario_id_arr = PyTuple_GetItem(args, 7); |
There was a problem hiding this comment.
The argument count check and indexing are inconsistent: a tuple of size 7 has valid indices 0..6, but the code unconditionally reads item 7. This can cause an out-of-bounds access and segfault. Fix by requiring 8 arguments (and keeping index 7), or by reading scenario_id_arr from index 6 and updating the rest of the expected signature accordingly.
| Env *env = vec->envs[i]; | ||
| for (int j = 0; j < num_keys; j++) { | ||
| ((float *)&env->log)[j] = 0.0f; | ||
| if (env->log.n == 0) { |
There was a problem hiding this comment.
In this early-return path, list is leaked (and dict is returned without cleaning up the unused list). This will accumulate reference leaks during eval-mode polling. A concrete fix is to Py_DECREF(list) before returning, and also ensure any pre-allocated objects in this branch are either used or decref'd consistently.
| if (env->log.n == 0) { | |
| if (env->log.n == 0) { | |
| Py_DECREF(list); |
pufferlib/utils.py
Outdated
| try: | ||
| # Create output directory for videos | ||
| video_output_dir = os.path.join(model_dir, "videos") | ||
| os.makedirs(video_output_dir, exist_ok=True) | ||
|
|
||
| # Copy the binary weights to the expected location | ||
| expected_weights_path = "resources/drive/puffer_drive_weights.bin" | ||
| os.makedirs(os.path.dirname(expected_weights_path), exist_ok=True) | ||
| shutil.copy2(bin_path, expected_weights_path) |
There was a problem hiding this comment.
expected_weights_path is defined inside the try block but referenced in finally. If an exception occurs before assignment (e.g., in mkdir/model_dir logic), finally will raise UnboundLocalError and mask the original failure. Define expected_weights_path before the try (e.g., set it to the intended path or None) and guard the cleanup accordingly.
pufferlib/utils.py
Outdated
| finally: | ||
| # Clean up bin weights file | ||
| if os.path.exists(expected_weights_path): | ||
| os.remove(expected_weights_path) |
There was a problem hiding this comment.
expected_weights_path is defined inside the try block but referenced in finally. If an exception occurs before assignment (e.g., in mkdir/model_dir logic), finally will raise UnboundLocalError and mask the original failure. Define expected_weights_path before the try (e.g., set it to the intended path or None) and guard the cleanup accordingly.
| if (IsKeyDown(KEY_LEFT) || IsKeyDown(KEY_A)) { | ||
| actions[env.human_agent_idx][1] += steer_delta; | ||
| // Cap steering to minimum of 0 | ||
| if (actions[env.human_agent_idx][1] < 0) { | ||
| actions[env.human_agent_idx][1] = 0; | ||
| } | ||
| if (IsKeyDown(KEY_RIGHT) || IsKeyDown(KEY_D)) { | ||
| jerk_lat_idx = 0; // right turn (-4.0) | ||
| } | ||
| if (IsKeyDown(KEY_RIGHT) || IsKeyDown(KEY_D)) { | ||
| actions[env.human_agent_idx][1] -= steer_delta; | ||
| // Cap steering to maximum of 12 | ||
| if (actions[env.human_agent_idx][1] > 12) { | ||
| actions[env.human_agent_idx][1] = 12; | ||
| } | ||
|
|
||
| // Encode into single integer: action = jerk_long_idx * 3 + jerk_lat_idx | ||
| actions[env.human_agent_idx] = jerk_long_idx * 3 + jerk_lat_idx; | ||
| } |
There was a problem hiding this comment.
The steering bounds checks are inverted relative to the operations: after adding steer_delta (left), the value can exceed the max and should be capped at 12, not checked for < 0; after subtracting (right), the value can go below 0 and should be capped at 0, not checked for > 12. As written, out-of-range steering indices can be produced.
| ) | ||
|
|
||
| def get_state(self): | ||
| return [env.get_state() for env in self.envs][0] |
There was a problem hiding this comment.
This builds a full list of per-env states but only returns the first element. If the intention is to expose the driver/first env state, avoid the list allocation and call self.envs[0].get_state() directly.
| return [env.get_state() for env in self.envs][0] | |
| return self.envs[0].get_state() |
| @@ -0,0 +1,108 @@ | |||
| # PufferDrive | |||
|
|
|||
| This readme contains several important assumptions and definions about the `PufferDrive` environment. | |||
There was a problem hiding this comment.
Fix typo: 'definions' → 'definitions'.
| This readme contains several important assumptions and definions about the `PufferDrive` environment. | |
| This readme contains several important assumptions and definitions about the `PufferDrive` environment. |
| @@ -26,4 +26,3 @@ repos: | |||
| rev: 'v21.1.5' | |||
| hooks: | |||
| - id: clang-format | |||
There was a problem hiding this comment.
The prior exclude pattern for clang-format was removed, which means clang-format will now run on file types/paths that were previously excluded (e.g., large generated/build or non-C/C++ directories). If those directories still exist in the repo, reinstating an exclude (or narrowing files:) will avoid unnecessary work and reduce the chance of formatting unintended assets.
| - id: clang-format | |
| - id: clang-format | |
| files: \.(c|cc|cpp|cxx|h|hh|hpp|hxx)$ |
Was always 0, making avg_distance_per_infraction always 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes render_videos call failing with 'module has no attribute utils'. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ormat When render_map is not set, discovers .bin files from the env's map_dir. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…put paths - Pass bin_path directly to visualize via --policy-name instead of copying to a relative symlinked path that doesn't resolve in code isolation - Use absolute paths for output mp4s in the video output dir - Print stderr on render failure for debugging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…de to drivenet.h Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Default num_controllable_agents to 32 instead of -1 (caused calloc with huge size, producing 0 agents) - Pass simulation_mode, num_target_waypoints, min/max_waypoint_spacing, and traffic_light_behavior from config to Drive struct Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
vcha/turbostream) features into 2.0submit_cluster.pyfrom 3.0Status
🤖 Generated with Claude Code