Randomize agent positions on respawn in variable agent mode by eugenevinitsky · Pull Request #376 · Emerge-Lab/PufferDrive

eugenevinitsky · 2026-03-29T03:04:07Z

Summary

Previously agents always reset to their initial spawn position. Now in INIT_VARIABLE_AGENT_NUMBER mode, both mid-episode respawns and full episode resets pick a new random collision-free position on a drivable lane via length-weighted sampling.

Changes (all in drive.h)

Add randomize_agent_position() function: picks a random drivable lane point, checks for collisions, sets heading/velocity from lane geometry
Call it from respawn_agent() and c_reset() when in variable agent mode
Sample new goals after randomizing position (not before)
Don't overwrite sampled goals with stale init_goal in variable agent mode
Fix collision check and velocity restoration in respawn
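The length-weighted sampling mentioned in the summary can be sketched as follows. This is a minimal illustration and not the code in drive.h: `pick_lane_weighted` and its parameters are hypothetical names, and the caller passes in a uniform random number `u` in [0, 1) so the function itself stays deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from drive.h): choose a lane index with probability
 * proportional to its length. `u` is a uniform random number in [0, 1).
 * Returns -1 when there is nothing to sample from. */
int pick_lane_weighted(const float *lane_lengths, int num_lanes, float u) {
    if (lane_lengths == NULL || num_lanes <= 0) return -1;

    float total = 0.0f;
    for (int i = 0; i < num_lanes; i++) total += lane_lengths[i];
    if (total <= 0.0f) return -1;

    /* Walk the cumulative length until it passes the target offset. */
    float target = u * total;
    float cumulative = 0.0f;
    for (int i = 0; i < num_lanes; i++) {
        cumulative += lane_lengths[i];
        if (target < cumulative) return i;
    }
    return num_lanes - 1; /* guard against floating-point round-off */
}
```

With lane lengths of 1, 3 and 6, the third lane is chosen for 60% of the `u` range, so longer lanes receive proportionally more spawns and zero-length lanes are never selected.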

Previously agents always reset to their initial spawn position. Now in INIT_VARIABLE_AGENT_NUMBER mode, both mid-episode respawns and full episode resets pick a new random collision-free position on a drivable lane via length-weighted sampling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

After moving an agent to a new random position, must also sample a new goal relative to that position. Previously reset_goal_positions would restore the original init goal, which could be far from the new spawn. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…gent mode c_reset's GOAL_GENERATE_NEW block was restoring init_goal_x/y/z after sample_new_goal had already set fresh goals relative to the new position. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Use proper OBB collision check (check_spawn_collision) instead of rough distance approximation in randomize_agent_position
- Restore original log_velocity on respawn for non-variable-agent modes instead of zeroing it (preserves data-driven replay behavior)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
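To illustrate why an oriented-bounding-box (OBB) check is stricter than a centre-distance approximation, here is a minimal separating-axis sketch for two oriented rectangles. It is not the implementation of check_spawn_collision: the `Box2D` struct, `abs_f32`, `projected_radius` and `obb_overlap` are hypothetical names, and the heading is assumed to be stored as a unit vector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D oriented box, not the Agent struct from drive.h. */
typedef struct {
    float cx, cy;      /* center */
    float hx, hy;      /* unit heading vector (cos and sin of the yaw) */
    float half_length; /* extent along the heading */
    float half_width;  /* extent across the heading */
} Box2D;

float abs_f32(float v) { return v < 0.0f ? -v : v; }

/* Half-width of the box's shadow when projected onto the unit axis (ax, ay). */
float projected_radius(const Box2D *b, float ax, float ay) {
    return b->half_length * abs_f32(ax * b->hx + ay * b->hy) +
           b->half_width * abs_f32(-ax * b->hy + ay * b->hx);
}

/* Separating axis test: the boxes overlap unless one of the four face
 * normals separates their projections. Touching boxes count as overlapping. */
int obb_overlap(const Box2D *a, const Box2D *b) {
    assert(a != NULL && b != NULL);
    const Box2D *boxes[2] = {a, b};
    float dx = b->cx - a->cx, dy = b->cy - a->cy;
    for (int k = 0; k < 2; k++) {
        float axes[2][2] = {{boxes[k]->hx, boxes[k]->hy},
                            {-boxes[k]->hy, boxes[k]->hx}};
        for (int j = 0; j < 2; j++) {
            float ax = axes[j][0], ay = axes[j][1];
            float center_gap = abs_f32(dx * ax + dy * ay);
            if (center_gap > projected_radius(a, ax, ay) + projected_radius(b, ax, ay))
                return 0; /* found a separating axis */
        }
    }
    return 1;
}
```

Two parallel 4.5 m by 2 m vehicles whose centres are 2.5 m apart laterally do not overlap under this test, although a circle-based distance check using the vehicle half-diagonal would reject that spawn.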

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Updates the Ocean Drive simulator’s variable-agent initialization mode so agents no longer always reset/respawn to their original spawn point, instead selecting a new random collision-free lane position and then sampling goals from that new pose.

Changes:

Add randomize_agent_position() to sample a length-weighted random lane point and validate collision/offroad.
Use randomized positions on respawn_agent() and c_reset() when init_mode == INIT_VARIABLE_AGENT_NUMBER.
Prevent variable-agent resets from overwriting newly-sampled goals with stale init_goal_*.


Copilot · 2026-03-29T03:08:25Z

+    // Pre-compute drivable lanes
+    int drivable_lanes[env->num_roads];
+    float lane_lengths[env->num_roads];
+    int num_drivable = 0;
+    float total_lane_length = 0.0f;
+    for (int i = 0; i < env->num_roads; i++) {
+        if (env->road_elements[i].type == ROAD_LANE && env->road_elements[i].polyline_length > 0.0f) {
+            drivable_lanes[num_drivable] = i;
+            lane_lengths[num_drivable] = env->road_elements[i].polyline_length;
+            total_lane_length += lane_lengths[num_drivable];
+            num_drivable++;
+        }
+    }
+
+    if (num_drivable == 0)


randomize_agent_position allocates drivable_lanes/lane_lengths as VLAs sized by env->num_roads on the stack and recomputes drivable lanes each call. Since env->num_roads can be large and this function may run frequently (respawns/resets), this can cause excessive stack usage and unnecessary work. Prefer reusing env->num_drivable, env->drivable_lane_indices, env->drivable_lane_lengths, and env->total_drivable_lane_length computed in compute_drivable_lane_points() (or allocate once on the heap) instead of per-call stack arrays.

Suggested change

-    // Pre-compute drivable lanes
-    int drivable_lanes[env->num_roads];
-    float lane_lengths[env->num_roads];
-    int num_drivable = 0;
-    float total_lane_length = 0.0f;
-    for (int i = 0; i < env->num_roads; i++) {
-        if (env->road_elements[i].type == ROAD_LANE && env->road_elements[i].polyline_length > 0.0f) {
-            drivable_lanes[num_drivable] = i;
-            lane_lengths[num_drivable] = env->road_elements[i].polyline_length;
-            total_lane_length += lane_lengths[num_drivable];
-            num_drivable++;
-        }
-    }
-    if (num_drivable == 0)
+    // Use pre-computed drivable lanes from the environment to avoid per-call VLAs and recomputation.
+    int num_drivable = env->num_drivable;
+    float total_lane_length = env->total_drivable_lane_length;
+    int *drivable_lanes = env->drivable_lane_indices;
+    float *lane_lengths = env->drivable_lane_lengths;
+    if (num_drivable <= 0 || total_lane_length <= 0.0f || drivable_lanes == NULL || lane_lengths == NULL)
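The alternative raised in the comment, allocating the lane arrays once on the heap, might look like the sketch below. `LaneCache`, `lane_cache_build` and `lane_cache_free` are hypothetical names and not fields or functions of the actual environment; the `is_lane` flag array stands in for the `type == ROAD_LANE` test in the diff.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cache, built once per map instead of once per respawn. */
typedef struct {
    int *indices;       /* indices of drivable lanes in the road array */
    float *lengths;     /* matching polyline lengths */
    int count;
    float total_length;
} LaneCache;

void lane_cache_free(LaneCache *cache) {
    free(cache->indices);
    free(cache->lengths);
    *cache = (LaneCache){0};
}

/* is_lane[i] != 0 marks road element i as a lane; road_lengths[i] is its
 * polyline length. Returns the number of cached lanes, or -1 if allocation fails. */
int lane_cache_build(LaneCache *cache, const int *is_lane,
                     const float *road_lengths, int num_roads) {
    assert(cache != NULL && num_roads >= 0);
    *cache = (LaneCache){0};
    if (num_roads == 0) return 0;

    cache->indices = malloc((size_t)num_roads * sizeof *cache->indices);
    cache->lengths = malloc((size_t)num_roads * sizeof *cache->lengths);
    if (cache->indices == NULL || cache->lengths == NULL) {
        lane_cache_free(cache);
        return -1;
    }
    for (int i = 0; i < num_roads; i++) {
        if (is_lane[i] && road_lengths[i] > 0.0f) {
            cache->indices[cache->count] = i;
            cache->lengths[cache->count] = road_lengths[i];
            cache->total_length += road_lengths[i];
            cache->count++;
        }
    }
    return cache->count;
}
```

Building the cache at map-load time keeps per-respawn stack usage constant regardless of how many road elements the map contains.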

Copilot · 2026-03-29T03:08:25Z

+        for (int x = 0; x < env->active_agent_count; x++) {
+            int agent_idx = env->active_agent_indices[x];
+            randomize_agent_position(env, agent_idx);
+        }


In c_reset() variable-agent mode, the return value of randomize_agent_position() is ignored. If it fails to find a valid spawn, the agent will keep its prior episode position (which could be offroad/colliding), and the subsequent sample_new_goal() will be based on that stale state. Please check the return value and apply a deterministic fallback (e.g., retry with different constraints, respawn to log_trajectory_*[0], or mark/remove the agent) so resets are reliable.
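One way to act on the return value is a bounded retry followed by a deterministic fallback pose. The sketch below assumes a sampler that returns 1 on success and 0 on failure; `Pose`, `SpawnSampler`, `spawn_with_fallback` and `flaky_sampler` are hypothetical names used only for illustration, and the real signature of randomize_agent_position() may differ.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float x, y, heading;
} Pose;

/* Hypothetical sampler contract: returns 1 and fills *out on success,
 * 0 if no valid spawn was found. */
typedef int (*SpawnSampler)(void *ctx, Pose *out);

/* Try the random sampler up to max_attempts times. If every attempt fails,
 * write the deterministic fallback pose instead. Returns 1 if a random pose
 * was used and 0 if the fallback was used. */
int spawn_with_fallback(SpawnSampler sample, void *ctx, int max_attempts,
                        Pose fallback, Pose *out) {
    assert(sample != NULL && out != NULL);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (sample(ctx, out)) return 1;
    }
    *out = fallback;
    return 0;
}

/* Demonstration sampler: fails until the counter behind ctx reaches zero. */
int flaky_sampler(void *ctx, Pose *out) {
    int *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return 0;
    }
    out->x = 10.0f;
    out->y = 20.0f;
    out->heading = 0.5f;
    return 1;
}
```

In the reset path, the fallback pose could be the first logged trajectory point, which is one of the options the comment lists; either way the agent never keeps its stale end-of-episode position.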

Copilot · 2026-03-29T03:08:26Z

+        for (int x = 0; x < env->active_agent_count; x++) {
+            int agent_idx = env->active_agent_indices[x];
+            randomize_agent_position(env, agent_idx);
+        }


In c_reset() variable-agent mode, positions are randomized but per-agent dynamics state (e.g., sim_vx/sim_vy, speed, accelerations, steering) is never reset. This means agents can start a new episode with leftover velocities from the previous episode even though their pose changed. Reset the dynamics fields after randomize_agent_position() (similar to respawn_agent()’s variable-mode branch) to avoid carry-over behavior and immediate post-reset collisions.

Suggested change

-        }
+        }
+        // Reset dynamics state after positions have been randomized to avoid carry-over
+        for (int x = 0; x < env->active_agent_count; x++) {
+            int agent_idx = env->active_agent_indices[x];
+            Agent *agent = &env->agents[agent_idx];
+            agent->sim_vx = 0.0f;
+            agent->sim_vy = 0.0f;
+            agent->sim_speed = 0.0f;
+            agent->sim_ax = 0.0f;
+            agent->sim_ay = 0.0f;
+            agent->steering = 0.0f;
+        }

Copilot · 2026-03-29T03:08:26Z

+        // Sample new goals relative to new positions
+        for (int x = 0; x < env->active_agent_count; x++) {
+            int agent_idx = env->active_agent_indices[x];
+            sample_new_goal(env, agent_idx);
+        }


sample_new_goal() increments agent->goals_sampled_this_episode, but c_reset() later unconditionally sets goals_sampled_this_episode = 1.0f for all agents. In variable-agent mode this means the initial goal sampling done here will be double-counted or discarded depending on ordering. Consider moving goal sampling to after the per-agent reset loop (or resetting goals_sampled_this_episode appropriately before sampling) so the counter reflects the actual number of sampled goals.
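The second option in the comment, resetting the counter before sampling, can be sketched with a toy agent. `ToyAgent`, `toy_sample_goal` and `toy_reset` are hypothetical stand-ins, not the structures in drive.h; only the field name goals_sampled_this_episode is taken from the review comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-agent state discussed above. */
typedef struct {
    float goal_x, goal_y;
    float goals_sampled_this_episode;
} ToyAgent;

/* Mirrors the described behaviour: every sampled goal bumps the counter. */
void toy_sample_goal(ToyAgent *agent, float goal_x, float goal_y) {
    assert(agent != NULL);
    agent->goal_x = goal_x;
    agent->goal_y = goal_y;
    agent->goals_sampled_this_episode += 1.0f;
}

/* Clear the carry-over first, then sample, so the initial goal of the new
 * episode is counted exactly once. */
void toy_reset(ToyAgent *agent, float goal_x, float goal_y) {
    assert(agent != NULL);
    agent->goals_sampled_this_episode = 0.0f;
    toy_sample_goal(agent, goal_x, goal_y);
}
```

With this ordering the counter equals 1 immediately after reset and no later unconditional assignment is needed to correct it.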

Copilot · 2026-03-29T03:08:26Z

+    if (env->init_mode == INIT_VARIABLE_AGENT_NUMBER) {
+        // Randomize all agent positions on reset
+        for (int x = 0; x < env->active_agent_count; x++) {
+            int agent_idx = env->active_agent_indices[x];
+            randomize_agent_position(env, agent_idx);
+        }
+        // Sample new goals relative to new positions
+        for (int x = 0; x < env->active_agent_count; x++) {
+            int agent_idx = env->active_agent_indices[x];
+            sample_new_goal(env, agent_idx);
+        }


This PR changes reset/respawn semantics in INIT_VARIABLE_AGENT_NUMBER (new random collision-free spawn on both c_reset() and respawn_agent()), but there doesn’t appear to be an automated regression test covering this behavior (e.g., positions change across resets in the same env instance and respawns remain collision-free). Adding a Drive test would help prevent future regressions, similar to existing spawn-diversity tests in tests/test_rand_seed_bug.py.

Previously randomize-on-respawn was gated on init_mode == INIT_VARIABLE_AGENT_NUMBER. Now it's a separate config flag (randomize_respawn, default 0) so it can be toggled independently. Includes test in tests/test_randomize_respawn.py (requires cluster).

save_map_binary wrote 16 bytes of scenario_id before sdc_track_index, but load_map_binary in C starts reading at sdc_track_index with no scenario_id. This misaligned all subsequent reads, causing segfaults when loading converted maps. Also fixes Town10HD conversion failure (scenario_id was an int, not str).
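The misalignment described in this commit message can be reproduced in miniature. The sketch below is an illustration only: `write_header` and `read_sdc_track_index` are hypothetical functions, and the only details taken from the message are the 16-byte scenario_id and the fact that the reader begins at sdc_track_index (assumed here to be a 32-bit integer).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCENARIO_ID_BYTES 16

/* Hypothetical writer: optionally emits a zero-padded 16-byte scenario id,
 * then the 32-bit sdc_track_index. Returns the number of bytes written. */
size_t write_header(uint8_t *buf, const char *scenario_id,
                    int32_t sdc_track_index, int include_scenario_id) {
    assert(buf != NULL);
    size_t offset = 0;
    if (include_scenario_id) {
        size_t n = strlen(scenario_id);
        if (n > SCENARIO_ID_BYTES) n = SCENARIO_ID_BYTES;
        memset(buf, 0, SCENARIO_ID_BYTES);
        memcpy(buf, scenario_id, n);
        offset += SCENARIO_ID_BYTES;
    }
    memcpy(buf + offset, &sdc_track_index, sizeof sdc_track_index);
    return offset + sizeof sdc_track_index;
}

/* Hypothetical reader that, like the loader described above, expects
 * sdc_track_index to be the first field in the buffer. */
int32_t read_sdc_track_index(const uint8_t *buf) {
    int32_t value;
    memcpy(&value, buf, sizeof value);
    return value;
}
```

When the writer includes the id, the reader interprets the first four id bytes as the track index, and every later field is shifted by 16 bytes as well. Writer and reader have to agree on the field list for the load to succeed.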

…lean # Conflicts: # pufferlib/ocean/drive/drive.py

eugenevinitsky and others added 5 commits March 28, 2026 23:03

Fix: don't overwrite sampled goals with stale init_goal in variable a…

7dd6431


Fix clang-format on randomize_agent_position

4d29124


Copilot AI review requested due to automatic review settings March 29, 2026 03:04

Copilot started reviewing on behalf of eugenevinitsky March 29, 2026 03:04

Copilot AI reviewed Mar 29, 2026


eugenevinitsky added 3 commits March 28, 2026 23:10

Default randomize_respawn to 1 (enabled)

b1bd6b4

eugenevinitsky requested review from Wea3el and mpragnay April 6, 2026 14:31

Merge remote-tracking branch 'origin/3.0' into ev/randomize-respawn-c…

28c5f05


eugenevinitsky wants to merge 9 commits into 3.0 from ev/randomize-respawn-clean


2 participants

