
Randomize agent positions on respawn in variable agent mode #376

Open

eugenevinitsky wants to merge 8 commits into 3.0 from ev/randomize-respawn-clean

Conversation

@eugenevinitsky eugenevinitsky commented Mar 29, 2026

Summary

Previously agents always reset to their initial spawn position. Now in INIT_VARIABLE_AGENT_NUMBER mode, both mid-episode respawns and full episode resets pick a new random collision-free position on a drivable lane via length-weighted sampling.

Changes (all in drive.h)

  • Add randomize_agent_position() function: picks a random drivable lane point, checks for collisions, sets heading/velocity from lane geometry
  • Call it from respawn_agent() and c_reset() when in variable agent mode
  • Sample new goals after randomizing position (not before)
  • Don't overwrite sampled goals with stale init_goal in variable agent mode
  • Fix collision check and velocity restoration in respawn

Eugene Vinitsky and others added 5 commits March 28, 2026 23:03
Previously agents always reset to their initial spawn position. Now in
INIT_VARIABLE_AGENT_NUMBER mode, both mid-episode respawns and full
episode resets pick a new random collision-free position on a drivable
lane via length-weighted sampling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After moving an agent to a new random position, must also sample a new
goal relative to that position. Previously reset_goal_positions would
restore the original init goal, which could be far from the new spawn.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gent mode

c_reset's GOAL_GENERATE_NEW block was restoring init_goal_x/y/z after
sample_new_goal had already set fresh goals relative to the new position.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use proper OBB collision check (check_spawn_collision) instead of
  rough distance approximation in randomize_agent_position
- Restore original log_velocity on respawn for non-variable-agent modes
  instead of zeroing it (preserves data-driven replay behavior)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 29, 2026 03:04

Copilot AI left a comment


Pull request overview

Updates the Ocean Drive simulator’s variable-agent initialization mode so agents no longer always reset/respawn to their original spawn point, instead selecting a new random collision-free lane position and then sampling goals from that new pose.

Changes:

  • Add randomize_agent_position() to sample a length-weighted random lane point and validate collision/offroad.
  • Use randomized positions on respawn_agent() and c_reset() when init_mode == INIT_VARIABLE_AGENT_NUMBER.
  • Prevent variable-agent resets from overwriting newly-sampled goals with stale init_goal_*.


Comment on lines +2832 to +2846
    // Pre-compute drivable lanes
    int drivable_lanes[env->num_roads];
    float lane_lengths[env->num_roads];
    int num_drivable = 0;
    float total_lane_length = 0.0f;
    for (int i = 0; i < env->num_roads; i++) {
        if (env->road_elements[i].type == ROAD_LANE && env->road_elements[i].polyline_length > 0.0f) {
            drivable_lanes[num_drivable] = i;
            lane_lengths[num_drivable] = env->road_elements[i].polyline_length;
            total_lane_length += lane_lengths[num_drivable];
            num_drivable++;
        }
    }

    if (num_drivable == 0)

Copilot AI Mar 29, 2026


randomize_agent_position allocates drivable_lanes/lane_lengths as VLAs sized by env->num_roads on the stack and recomputes drivable lanes each call. Since env->num_roads can be large and this function may run frequently (respawns/resets), this can cause excessive stack usage and unnecessary work. Prefer reusing env->num_drivable, env->drivable_lane_indices, env->drivable_lane_lengths, and env->total_drivable_lane_length computed in compute_drivable_lane_points() (or allocate once on the heap) instead of per-call stack arrays.

Suggested change

Removed:

    // Pre-compute drivable lanes
    int drivable_lanes[env->num_roads];
    float lane_lengths[env->num_roads];
    int num_drivable = 0;
    float total_lane_length = 0.0f;
    for (int i = 0; i < env->num_roads; i++) {
        if (env->road_elements[i].type == ROAD_LANE && env->road_elements[i].polyline_length > 0.0f) {
            drivable_lanes[num_drivable] = i;
            lane_lengths[num_drivable] = env->road_elements[i].polyline_length;
            total_lane_length += lane_lengths[num_drivable];
            num_drivable++;
        }
    }
    if (num_drivable == 0)

Added:

    // Use pre-computed drivable lanes from the environment to avoid per-call VLAs and recomputation.
    int num_drivable = env->num_drivable;
    float total_lane_length = env->total_drivable_lane_length;
    int *drivable_lanes = env->drivable_lane_indices;
    float *lane_lengths = env->drivable_lane_lengths;
    if (num_drivable <= 0 || total_lane_length <= 0.0f || drivable_lanes == NULL || lane_lengths == NULL)

Comment on lines +3191 to +3194
    for (int x = 0; x < env->active_agent_count; x++) {
        int agent_idx = env->active_agent_indices[x];
        randomize_agent_position(env, agent_idx);
    }

Copilot AI Mar 29, 2026


In c_reset() variable-agent mode, the return value of randomize_agent_position() is ignored. If it fails to find a valid spawn, the agent will keep its prior episode position (which could be offroad/colliding), and the subsequent sample_new_goal() will be based on that stale state. Please check the return value and apply a deterministic fallback (e.g., retry with different constraints, respawn to log_trajectory_*[0], or mark/remove the agent) so resets are reliable.

    for (int x = 0; x < env->active_agent_count; x++) {
        int agent_idx = env->active_agent_indices[x];
        randomize_agent_position(env, agent_idx);
    }

Copilot AI Mar 29, 2026


In c_reset() variable-agent mode, positions are randomized but per-agent dynamics state (e.g., sim_vx/sim_vy, speed, accelerations, steering) is never reset. This means agents can start a new episode with leftover velocities from the previous episode even though their pose changed. Reset the dynamics fields after randomize_agent_position() (similar to respawn_agent()’s variable-mode branch) to avoid carry-over behavior and immediate post-reset collisions.

Suggested change

    }
    }
    // Reset dynamics state after positions have been randomized to avoid carry-over
    for (int x = 0; x < env->active_agent_count; x++) {
        int agent_idx = env->active_agent_indices[x];
        Agent *agent = &env->agents[agent_idx];
        agent->sim_vx = 0.0f;
        agent->sim_vy = 0.0f;
        agent->sim_speed = 0.0f;
        agent->sim_ax = 0.0f;
        agent->sim_ay = 0.0f;
        agent->steering = 0.0f;
    }

Comment on lines +3195 to +3199
    // Sample new goals relative to new positions
    for (int x = 0; x < env->active_agent_count; x++) {
        int agent_idx = env->active_agent_indices[x];
        sample_new_goal(env, agent_idx);
    }

Copilot AI Mar 29, 2026


sample_new_goal() increments agent->goals_sampled_this_episode, but c_reset() later unconditionally sets goals_sampled_this_episode = 1.0f for all agents. In variable-agent mode this means the initial goal sampling done here will be double-counted or discarded depending on ordering. Consider moving goal sampling to after the per-agent reset loop (or resetting goals_sampled_this_episode appropriately before sampling) so the counter reflects the actual number of sampled goals.

Comment on lines +3189 to +3199
    if (env->init_mode == INIT_VARIABLE_AGENT_NUMBER) {
        // Randomize all agent positions on reset
        for (int x = 0; x < env->active_agent_count; x++) {
            int agent_idx = env->active_agent_indices[x];
            randomize_agent_position(env, agent_idx);
        }
        // Sample new goals relative to new positions
        for (int x = 0; x < env->active_agent_count; x++) {
            int agent_idx = env->active_agent_indices[x];
            sample_new_goal(env, agent_idx);
        }

Copilot AI Mar 29, 2026


This PR changes reset/respawn semantics in INIT_VARIABLE_AGENT_NUMBER (new random collision-free spawn on both c_reset() and respawn_agent()), but there doesn’t appear to be an automated regression test covering this behavior (e.g., positions change across resets in the same env instance and respawns remain collision-free). Adding a Drive test would help prevent future regressions, similar to existing spawn-diversity tests in tests/test_rand_seed_bug.py.

Previously randomize-on-respawn was gated on init_mode ==
INIT_VARIABLE_AGENT_NUMBER. Now it's a separate config flag
(randomize_respawn, default 0) so it can be toggled independently.

Includes test in tests/test_randomize_respawn.py (requires cluster).
save_map_binary wrote 16 bytes of scenario_id before sdc_track_index,
but load_map_binary in C starts reading at sdc_track_index with no
scenario_id. This misaligned all subsequent reads, causing segfaults
when loading converted maps.

Also fixes Town10HD conversion failure (scenario_id was an int, not str).