Randomize agent positions on respawn in variable agent mode#376
Randomize agent positions on respawn in variable agent mode#376eugenevinitsky wants to merge 8 commits into3.0from
Conversation
Previously agents always reset to their initial spawn position. Now in INIT_VARIABLE_AGENT_NUMBER mode, both mid-episode respawns and full episode resets pick a new random collision-free position on a drivable lane via length-weighted sampling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After moving an agent to a new random position, must also sample a new goal relative to that position. Previously reset_goal_positions would restore the original init goal, which could be far from the new spawn. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gent mode c_reset's GOAL_GENERATE_NEW block was restoring init_goal_x/y/z after sample_new_goal had already set fresh goals relative to the new position. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use proper OBB collision check (check_spawn_collision) instead of rough distance approximation in randomize_agent_position - Restore original log_velocity on respawn for non-variable-agent modes instead of zeroing it (preserves data-driven replay behavior) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Updates the Ocean Drive simulator’s variable-agent initialization mode so agents no longer always reset/respawn to their original spawn point, instead selecting a new random collision-free lane position and then sampling goals from that new pose.
Changes:
- Add
randomize_agent_position()to sample a length-weighted random lane point and validate collision/offroad. - Use randomized positions on
respawn_agent()andc_reset()wheninit_mode == INIT_VARIABLE_AGENT_NUMBER. - Prevent variable-agent resets from overwriting newly-sampled goals with stale
init_goal_*.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Pre-compute drivable lanes | ||
| int drivable_lanes[env->num_roads]; | ||
| float lane_lengths[env->num_roads]; | ||
| int num_drivable = 0; | ||
| float total_lane_length = 0.0f; | ||
| for (int i = 0; i < env->num_roads; i++) { | ||
| if (env->road_elements[i].type == ROAD_LANE && env->road_elements[i].polyline_length > 0.0f) { | ||
| drivable_lanes[num_drivable] = i; | ||
| lane_lengths[num_drivable] = env->road_elements[i].polyline_length; | ||
| total_lane_length += lane_lengths[num_drivable]; | ||
| num_drivable++; | ||
| } | ||
| } | ||
|
|
||
| if (num_drivable == 0) |
There was a problem hiding this comment.
randomize_agent_position allocates drivable_lanes/lane_lengths as VLAs sized by env->num_roads on the stack and recomputes drivable lanes each call. Since env->num_roads can be large and this function may run frequently (respawns/resets), this can cause excessive stack usage and unnecessary work. Prefer reusing env->num_drivable, env->drivable_lane_indices, env->drivable_lane_lengths, and env->total_drivable_lane_length computed in compute_drivable_lane_points() (or allocate once on the heap) instead of per-call stack arrays.
| // Pre-compute drivable lanes | |
| int drivable_lanes[env->num_roads]; | |
| float lane_lengths[env->num_roads]; | |
| int num_drivable = 0; | |
| float total_lane_length = 0.0f; | |
| for (int i = 0; i < env->num_roads; i++) { | |
| if (env->road_elements[i].type == ROAD_LANE && env->road_elements[i].polyline_length > 0.0f) { | |
| drivable_lanes[num_drivable] = i; | |
| lane_lengths[num_drivable] = env->road_elements[i].polyline_length; | |
| total_lane_length += lane_lengths[num_drivable]; | |
| num_drivable++; | |
| } | |
| } | |
| if (num_drivable == 0) | |
| // Use pre-computed drivable lanes from the environment to avoid per-call VLAs and recomputation. | |
| int num_drivable = env->num_drivable; | |
| float total_lane_length = env->total_drivable_lane_length; | |
| int *drivable_lanes = env->drivable_lane_indices; | |
| float *lane_lengths = env->drivable_lane_lengths; | |
| if (num_drivable <= 0 || total_lane_length <= 0.0f || drivable_lanes == NULL || lane_lengths == NULL) |
| for (int x = 0; x < env->active_agent_count; x++) { | ||
| int agent_idx = env->active_agent_indices[x]; | ||
| randomize_agent_position(env, agent_idx); | ||
| } |
There was a problem hiding this comment.
In c_reset() variable-agent mode, the return value of randomize_agent_position() is ignored. If it fails to find a valid spawn, the agent will keep its prior episode position (which could be offroad/colliding), and the subsequent sample_new_goal() will be based on that stale state. Please check the return value and apply a deterministic fallback (e.g., retry with different constraints, respawn to log_trajectory_*[0], or mark/remove the agent) so resets are reliable.
| for (int x = 0; x < env->active_agent_count; x++) { | ||
| int agent_idx = env->active_agent_indices[x]; | ||
| randomize_agent_position(env, agent_idx); | ||
| } |
There was a problem hiding this comment.
In c_reset() variable-agent mode, positions are randomized but per-agent dynamics state (e.g., sim_vx/sim_vy, speed, accelerations, steering) is never reset. This means agents can start a new episode with leftover velocities from the previous episode even though their pose changed. Reset the dynamics fields after randomize_agent_position() (similar to respawn_agent()’s variable-mode branch) to avoid carry-over behavior and immediate post-reset collisions.
| } | |
| } | |
| // Reset dynamics state after positions have been randomized to avoid carry-over | |
| for (int x = 0; x < env->active_agent_count; x++) { | |
| int agent_idx = env->active_agent_indices[x]; | |
| Agent *agent = &env->agents[agent_idx]; | |
| agent->sim_vx = 0.0f; | |
| agent->sim_vy = 0.0f; | |
| agent->sim_speed = 0.0f; | |
| agent->sim_ax = 0.0f; | |
| agent->sim_ay = 0.0f; | |
| agent->steering = 0.0f; | |
| } |
| // Sample new goals relative to new positions | ||
| for (int x = 0; x < env->active_agent_count; x++) { | ||
| int agent_idx = env->active_agent_indices[x]; | ||
| sample_new_goal(env, agent_idx); | ||
| } |
There was a problem hiding this comment.
sample_new_goal() increments agent->goals_sampled_this_episode, but c_reset() later unconditionally sets goals_sampled_this_episode = 1.0f for all agents. In variable-agent mode this means the initial goal sampling done here will be double-counted or discarded depending on ordering. Consider moving goal sampling to after the per-agent reset loop (or resetting goals_sampled_this_episode appropriately before sampling) so the counter reflects the actual number of sampled goals.
pufferlib/ocean/drive/drive.h
Outdated
| if (env->init_mode == INIT_VARIABLE_AGENT_NUMBER) { | ||
| // Randomize all agent positions on reset | ||
| for (int x = 0; x < env->active_agent_count; x++) { | ||
| int agent_idx = env->active_agent_indices[x]; | ||
| randomize_agent_position(env, agent_idx); | ||
| } | ||
| // Sample new goals relative to new positions | ||
| for (int x = 0; x < env->active_agent_count; x++) { | ||
| int agent_idx = env->active_agent_indices[x]; | ||
| sample_new_goal(env, agent_idx); | ||
| } |
There was a problem hiding this comment.
This PR changes reset/respawn semantics in INIT_VARIABLE_AGENT_NUMBER (new random collision-free spawn on both c_reset() and respawn_agent()), but there doesn’t appear to be an automated regression test covering this behavior (e.g., positions change across resets in the same env instance and respawns remain collision-free). Adding a Drive test would help prevent future regressions, similar to existing spawn-diversity tests in tests/test_rand_seed_bug.py.
Previously randomize-on-respawn was gated on init_mode == INIT_VARIABLE_AGENT_NUMBER. Now it's a separate config flag (randomize_respawn, default 0) so it can be toggled independently. Includes test in tests/test_randomize_respawn.py (requires cluster).
save_map_binary wrote 16 bytes of scenario_id before sdc_track_index, but load_map_binary in C starts reading at sdc_track_index with no scenario_id. This misaligned all subsequent reads, causing segfaults when loading converted maps. Also fixes Town10HD conversion failure (scenario_id was an int, not str).
Summary
Previously agents always reset to their initial spawn position. Now in
INIT_VARIABLE_AGENT_NUMBERmode, both mid-episode respawns and full episode resets pick a new random collision-free position on a drivable lane via length-weighted sampling.Changes (all in drive.h)
randomize_agent_position()function: picks a random drivable lane point, checks for collisions, sets heading/velocity from lane geometryrespawn_agent()andc_reset()when in variable agent modeinit_goalin variable agent mode