clip speed at limit, remove overspeed reward, move advantages out of loop #378

eugenevinitsky wants to merge 10 commits into `3.0`
Conversation
- `clipSpeed` now takes a `maxSpeed` parameter instead of using `MAX_SPEED` (100 m/s)
- Classic dynamics clips at `SPEED_LIMIT` (20 m/s)
- Jerk dynamics uses `SPEED_LIMIT` instead of hardcoded `20.0f`
- Remove the overspeed reward penalty (was ~1.0/step, dwarfing all other rewards)

Speed is now physically clamped, so agents can't overspeed.
The GOAL_STOP branch used assignment (=) instead of accumulation (+=) for rewards[i], discarding all other reward components (collision, lane, velocity) computed earlier in the same step. episode_return already used +=, so it diverged from the actual training reward.
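The assignment-vs-accumulation bug can be illustrated with a minimal Python sketch (the actual code is C; function and variable names here are purely illustrative):

```python
def step_rewards_buggy(collision_r, lane_r, velocity_r, goal_r):
    # Earlier reward components are accumulated...
    reward = collision_r + lane_r + velocity_r
    # ...then the GOAL_STOP branch overwrites them with plain assignment,
    # discarding collision/lane/velocity terms for this step.
    reward = goal_r
    return reward

def step_rewards_fixed(collision_r, lane_r, velocity_r, goal_r):
    reward = collision_r + lane_r + velocity_r
    # Accumulate instead of assign, so the training reward matches
    # what episode_return (which already used +=) was tracking.
    reward += goal_r
    return reward
```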
Previously required respawn_timestep != -1, which is only set after the first respawn. The agent never received a reward for reaching its initial goal. Now uses reward_goal for the first goal and reward_goal_post_respawn for subsequent goals.
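A sketch of the corrected gating, per the description above (Python stand-in; the helper name is hypothetical, only `respawn_timestep`, `reward_goal`, and `reward_goal_post_respawn` come from the source):

```python
def goal_reward(respawn_timestep, reward_goal, reward_goal_post_respawn):
    # Before the fix, any goal reward required respawn_timestep != -1,
    # which is only set after the first respawn, so the initial goal
    # paid nothing.
    if respawn_timestep == -1:
        return reward_goal              # first goal of the episode
    return reward_goal_post_respawn     # goals reached after a respawn
```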
Pull request overview
This PR adjusts Drive environment dynamics and rewards so agent speed is physically clamped to the configured speed limit, and removes the now-redundant overspeed reward term.
Changes:
- Update `clipSpeed` to accept a `maxSpeed` parameter and clamp classic dynamics at `SPEED_LIMIT` (20 m/s) instead of `MAX_SPEED` (100 m/s).
- Replace a hardcoded jerk dynamics speed cap (`20.0f`) with the `SPEED_LIMIT` constant.
- Remove the overspeed reward/penalty term in `c_step`, since speed is clamped in dynamics.
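The parameterized clamp described above can be sketched as follows (a Python stand-in for the C helper; the `[0, max_speed]` range is an assumption, the constants come from the PR description):

```python
SPEED_LIMIT = 20.0   # m/s, per the PR description
MAX_SPEED = 100.0    # m/s, the old hardcoded cap

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def clip_speed(speed, max_speed):
    # clipSpeed now takes the cap as a parameter, so classic dynamics
    # can pass SPEED_LIMIT instead of implicitly using MAX_SPEED.
    return clip(speed, 0.0, max_speed)
```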
```diff
  if (signed_v * v_new < 0) {
      v_new = 0.0f;
  } else {
-     v_new = clip(v_new, -2.0f, 20.0f);
+     v_new = clip(v_new, -2.0f, SPEED_LIMIT);
  }
```
In JERK dynamics, v_new is clipped to SPEED_LIMIT, but the subsequent displacement uses d = 0.5f * (v_new + signed_v) * env->dt. Since signed_v comes from the current state (which can be > SPEED_LIMIT right after set_start_position()), the agent can still move farther than the speed limit would allow for a step even though the stored velocity is clamped. Consider also clamping signed_v (or using a clamped signed_v in the d computation) when v_new is clipped so the per-step displacement is physically consistent with the speed limit.
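The suggested fix could look like this sketch (Python stand-in for the C step; variable names follow the comment above, while `dt` handling and the `-2.0` lower bound are taken from the diff shown earlier):

```python
SPEED_LIMIT = 20.0

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def jerk_step_displacement(signed_v, v_new, dt):
    v_new = clip(v_new, -2.0, SPEED_LIMIT)
    # Also clamp the current velocity used in the trapezoidal
    # displacement, so d can never exceed what SPEED_LIMIT allows
    # for one step even if the stored state started overspeed.
    signed_v_clamped = clip(signed_v, -2.0, SPEED_LIMIT)
    d = 0.5 * (v_new + signed_v_clamped) * dt
    return v_new, d
```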
…atch

Previously computed after training using partially-overwritten values and last-minibatch advantages. Now captured on the first minibatch iteration when values are still pristine and advantages use ratio=1, matching SB3's approach. No extra advantage computation needed.
After GOAL_RESPAWN, current_goal_reached stayed 1, so the !current_goal_reached check at the goal reward gate always failed. Combined with bug #5, GOAL_RESPAWN agents got zero goal reward for the entire episode. Also reset score_for_current_goal.
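A minimal sketch of the reset described above (plain Python with a dict stand-in for the agent state; field names follow the comment):

```python
def on_goal_respawn(agent):
    # Reset per-goal state so the !current_goal_reached gate can
    # pass again for the next goal instead of failing all episode.
    agent["current_goal_reached"] = 0
    agent["score_for_current_goal"] = 0.0
```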
…ved agents

Stopped/removed agents can't act, so their metrics pollute logged statistics (e.g. lane center rate, comfort violations, avg speed). Now skips compute_agent_metrics and all reward/metric accumulation for these agents, zeroing their reward.
…values

Previously computed after training using partially-overwritten values and last-minibatch advantages. Now computed once before the training loop using original value predictions and full-buffer advantages with ratio=1, matching SB3's approach.
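The corrected flow can be sketched in plain Python (standard GAE with an implicit importance ratio of 1, as in SB3; array handling is simplified and the hyperparameter values are illustrative):

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    # Standard GAE backward pass; no V-trace importance weighting,
    # i.e. ratio fixed at 1.
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    next_value = last_value
    for t in reversed(range(n)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages

# Fixed structure: compute once on the pristine value predictions,
# then reuse the same advantages for every minibatch:
#
#   advantages = compute_gae(rewards, values, dones, last_value)
#   for minibatch in minibatches:
#       ...  # no GAE recomputation inside the loop
```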
Force-pushed from 7e22de8 to f7904da
GAE was recomputed every minibatch for V-trace importance weighting, but the result was immediately overwritten (dead code). Now computed once before the loop like SB3, saving total_minibatches-1 redundant GAE passes. Also removes unused self.ratio and self.importance tensors.
wandb runs to evaluate: https://wandb.ai/emerge_/mar-29-bugfix-comparison?nw=nwusereugenevinitsky
`self.values[idx] = newvalue` overwrote the stored value predictions during training; it had been added for V-trace (now removed). This broke value function clipping (the clipping window moved with each update instead of staying anchored to the original predictions) and corrupted explained variance. Values are now frozen during training, as in SB3.
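Why freezing matters can be seen in a PPO-style clipped value loss sketch (plain Python, names illustrative): the clip window must anchor to the rollout-time prediction, which the per-epoch overwrite destroyed.

```python
def clipped_value_loss(new_value, old_value, ret, clip_coef=0.2):
    # old_value must be the frozen rollout-time prediction. If it is
    # overwritten each epoch (self.values[idx] = newvalue), the clip
    # window [old - coef, old + coef] drifts along with training
    # instead of bounding the update.
    clipped = old_value + max(-clip_coef, min(clip_coef, new_value - old_value))
    # PPO takes the max of the unclipped and clipped squared errors.
    return max((new_value - ret) ** 2, (clipped - ret) ** 2)
```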
Summary
- `clipSpeed` now takes a configurable max speed parameter
- Classic dynamics clips at `SPEED_LIMIT` (20 m/s) instead of `MAX_SPEED` (100 m/s)
- Jerk dynamics uses the `SPEED_LIMIT` constant instead of hardcoded `20.0f`