Mohit/cleanup apr25 by m2kulkarni · Pull Request #36 · Emerge-Lab/Adaptive_Driving_Agent

m2kulkarni · 2026-05-15T21:01:02Z

No description provided.

* Make sure we can overwrite goal_behavior from the Python side, plus other minor improvements.
* Fix stop goal behavior bug.
* Make goal radius configurable for WOSAC eval.
* Reset to defaults + cleanup.
* Minor
* Minor
* Incorporate feedback.

Accel is being cut in half for no reason

* Add control mode.
* Fix error message.

Remove halved accel

* Fix incorrect obs dim in draw_agent_obs
* Update drive.h

---------

Co-authored-by: Daphne Cornelisse <cor.daphne@gmail.com>

…erge-Lab#104)

* Make joint action space; currently uses MultiDiscrete and should be replaced with Discrete.
* Fix shape mismatch in logits.
* Minor
* Revert: Puffer doesn't like Discrete
* Minor
* Make action dim conditional on dynamics model.

---------

Co-authored-by: Daphne Cornelisse <cor.daphne@gmail.com>

* Replace default learning rate and ent_coef.
* Minor
* Round.

* Quick integration of WOSAC eval during training; will clean up tomorrow.
* Refactor eval code into separate util functions.
* Refactor code to support more eval modes.
* Add human replay evaluation mode.
* Address comments.
* Fix args and add to README.
* Improve and simplify code.
* Minor.
* Reset to default ini settings.

* Add Python test for ini file parsing
  - Check values from default.ini
  - Check values from drive.ini
  - Additional checks for comment handling
* Add C test for ini file parsing
  - Add CMake project to configure, build and test
  - Test value parsing
  - Test comment format
  - Add comments for (un)expected results
* FIX: Solve all memory errors in tests
  - Compile with ASan
* Remove unprinted messages
* Add utest to the CI
  - Ini parsing tests
  - Update comments to clarify intent
* Update tests/ini_parser/ini_tester.c
  - Use if/else chains instead of independent ifs for the check conditions
  - Speed up parsing (exit as soon as a match is found)
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/ini_parser/ini_tester.c
  - Fix mismatched assignment
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* FIX: Move num_map to the top level of testing

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
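The Python-side checks listed above (values plus comment handling) can be sketched with the standard library's `configparser`. This is only an illustration: the section and key names (`env`, `num_agents`, `goal_radius`, `map_dir`) and the use of `;` and `#` as inline comment prefixes are assumptions, not necessarily what default.ini, drive.ini, or the project's own parser use.

```python
import configparser

# Illustrative ini text; section and key names are hypothetical.
SAMPLE_INI = """
[env]
# full-line comment
num_agents = 64        ; inline comment after a value
goal_radius = 2.0      # inline comment in the other style
map_dir = resources/drive/binaries
"""


def parse_ini(text: str) -> configparser.ConfigParser:
    # Inline comments are only stripped when inline_comment_prefixes is set;
    # by default "64 ; note" would be returned as the whole value.
    parser = configparser.ConfigParser(inline_comment_prefixes=(";", "#"))
    parser.read_string(text)
    return parser


if __name__ == "__main__":
    cfg = parse_ini(SAMPLE_INI)
    assert cfg.getint("env", "num_agents") == 64
    assert cfg.getfloat("env", "goal_radius") == 2.0
    assert cfg.get("env", "map_dir") == "resources/drive/binaries"
    print("ini checks passed")
```

Testing the default parser alongside the configured one is a cheap way to catch the case where a comment silently ends up inside a value.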

…-Lab#138)

* Adding interaction features. Notes:
  - Need to add safeguards to load each map only once.
  - Might be slow if we increase num_agents per scenario; next step will be torch.
  I added some tests to check that the distance and TTC computations are correct, and metrics_sanity_check looks okay. I'll keep making some plots to validate it.
* Added the additive smoothing logic for the Bernoulli estimate. Ref in original code:

      message BernoulliEstimate {
        // Additive smoothing to apply to the underlying 2-bins histogram, to avoid
        // infinite values for empty bins.
        optional float additive_smoothing_pseudocount = 4 [default = 0.001];
      }

* Small cleanup of estimators.py
* Towards map-based realism metrics. First step: extract the map from the vecenv.
* Second step: map features (signed distance to road edges). A bunch of small tests in test_map_metric_features.py ensure this does what it is supposed to do:

      python -m pufferlib.ocean.benchmark.test_map_metrics

  Next steps should be straightforward. Will need to check at some point whether doing this in numpy is too slow.
* Map-based features. This works and passes all the tests; I would still like to make additional checks with the renderer, just in case. With this, we have the whole set of WOSAC metrics (except for traffic lights), and we might also have the same issue as the original WOSAC code: it is slow. Next step would be to transition from numpy to torch.
* Added a visual sanity check: plot random trajectories and indicate when WOSAC sees an offroad or a collision.

      python pufferlib/ocean/benchmark/visual_sanity_check.py

* Update WOSAC control mode and ids.
* Eval mask for tracks_to_predict agents.
* Replacing numpy with torch for the computation of interaction and map metrics. It makes the computation much faster, and all the tests pass. I didn't switch kinematics to torch because it was already fast, but I might make the change for consistency.
* Precommit
* Resolve small comments.
* More descriptive error message when going OOM.

---------

Co-authored-by: WaelDLZ <wawa@CRE1-W60060.vnet.valeo.com>
Co-authored-by: Waël Doulazmi <wawa@10-20-1-143.dynapool.wireless.nyu.edu>
Co-authored-by: Waël Doulazmi <wawa@Waels-MacBook-Air.local>
Co-authored-by: Daphne Cornelisse <cor.daphne@gmail.com>
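For intuition, the smoothing described in the BernoulliEstimate comment can be sketched as: add the pseudocount to both bins of the 2-bin histogram before normalising, so an empty bin still gets a small positive probability and its log-likelihood stays finite. This is a minimal sketch assuming plain Python booleans as samples; `bernoulli_estimate` and `log_likelihood` are hypothetical names, and this is not the WOSAC or estimators.py implementation.

```python
import math
from typing import Sequence


def bernoulli_estimate(samples: Sequence[bool], pseudocount: float = 0.001) -> float:
    """Smoothed P(True) from a 2-bin (False/True) histogram of samples."""
    n_true = sum(1 for s in samples if s)
    n_false = len(samples) - n_true
    # Add the pseudocount to both bins so neither bin is ever empty.
    return (n_true + pseudocount) / (n_true + n_false + 2 * pseudocount)


def log_likelihood(observed: bool, p_true: float) -> float:
    """Log-probability of one observed outcome under the estimate."""
    return math.log(p_true if observed else 1.0 - p_true)


if __name__ == "__main__":
    # 10 simulated rollouts in which the agent never goes offroad.
    p = bernoulli_estimate([False] * 10)
    # Unsmoothed, p would be 0 and log(0) would raise a ValueError.
    print(p, log_likelihood(True, p))
```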

Co-authored-by: Pragnay Mandavilli <pm3881@gr052.hpc.nyu.edu>

* Add option for targeted experiments.
* Rename for clarity.
* Minor
* Remove tag
* Add to help message and make a deepcopy of args to prevent state pollution.

…merge-Lab#146)

* Small optimizations to use less memory in interaction_features.py. They mostly consist of using in-place operations and deleting unused variables. Code passes the tests. Next steps:
  - Clean up the .cpu().numpy() in the TTC computation.
  - Memory optimization for map_features as well.
* Add future todo.

---------

Co-authored-by: Waël Doulazmi <waeldoulazmi@gmail.com>

…rent dataset (Emerge-Lab#151)

* Support train/test split with datasets.
* Switch defaults.
* Minor.
* Typo.
* More robust way of parsing the path.

* Load the sprites inside eval-gif()
* Color consistency.
* Pedestrian and cyclist 3D models
* Minor.

---------

Co-authored-by: Spencer Cheng <spenccheng@gmail.com>

* Multiprocessing and progress bar
* Cleanup

* Test
* Edit.
* Edit.

* Get rid of magic numbers in torch net.
* Stop recording agent view once the agent reaches its first goal. Respawning vids look confusing.
* Add missing models for headless rendering.
* Fix bbox rotation bug in render function.
* Remove magic numbers. Define constants once in drive.h and read from there.

Co-authored-by: Daphne Cornelisse <cor.daphne@gmail.com>

…merge-Lab#165)

* Get rid of magic numbers in torch net.
* Stop recording agent view once the agent reaches its first goal. Respawning vids look confusing.
* Add missing models for headless rendering.
* Fix bbox rotation bug in render function.
* Remove magic numbers. Define constants once in drive.h and read from there.
* Remove all magic numbers in drivenet.h
* Clean up more magic numbers.
* Minor
* Minor.

…uiv test

The previous version did `int(pos.item()) % horizon` to get a Python int for cache-slot indexing. On the GPU ego policy under torch.compile, this caused a Dynamo graph break and a CUDA sync on every step. Refactored to keep `slot` as a 1-element long tensor throughout:

- Slot computed as `(pos % horizon).long()` (no .item()).
- Position embedding pulled with `index_select(1, slot_t)`.
- Cache write uses `index_copy_(2, slot_t, k)` instead of `[..., slot, :] =`.
- Attention mask built from `arange <= slot_t` instead of a slot-keyed dict.

Verified: torch.compile of forward_eval on CUDA bf16 completes 5 calls with no graph breaks logged.

Also adds tests/test_co_player_equivalence.py: an equivalence check on the *real* puffer_drive_b6big5j1.pt co-player checkpoint over 200 streaming steps with multiple horizon wraps + per-row reset, on both CPU and GPU, in fp32 and bf16. Production paths (CPU fp32 for the co-player, CUDA bf16 for the ego) match legacy to ~1e-5 and ~0.16 max-logit diff, respectively, both well within RL noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
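The slot arithmetic above amounts to a ring buffer of length `horizon`. The sketch below uses plain Python lists in place of tensors, so it illustrates only the indexing, not the torch.compile behaviour; `RingKVCache` and its methods are hypothetical names. The validity mask covers the pre-wrap case with `i <= slot` (mirroring `arange <= slot_t`); treating every slot as valid once `pos >= horizon` is an added assumption about how a wrapped cache is read.

```python
from typing import List, Optional, Tuple


class RingKVCache:
    """Toy ring-buffer cache; floats stand in for per-step K/V tensors."""

    def __init__(self, horizon: int):
        self.horizon = horizon
        self.pos = 0  # tokens written since the last reset
        self.keys: List[Optional[float]] = [None] * horizon

    def step(self, k: float) -> Tuple[int, List[bool]]:
        slot = self.pos % self.horizon  # like (pos % horizon).long(), no .item()
        self.keys[slot] = k             # like index_copy_(2, slot_t, k)
        # Pre-wrap: only slots 0..slot are filled (the arange <= slot_t mask).
        # Post-wrap (assumption): every slot holds a real past token.
        wrapped = self.pos >= self.horizon
        mask = [wrapped or i <= slot for i in range(self.horizon)]
        self.pos += 1
        return slot, mask

    def reset(self) -> None:
        self.pos = 0
        self.keys = [None] * self.horizon


if __name__ == "__main__":
    cache = RingKVCache(horizon=4)
    for t in range(6):
        slot, mask = cache.step(float(t))
        print(t, slot, mask)
```

After six steps with `horizon=4`, steps 4 and 5 overwrite slots 0 and 1, which is the "multiple horizon wraps" case the equivalence test exercises.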


A sweep on a 1-GPU test mirroring the production puffer_adaptive_drive command (KV-cache fix on, MM=400, MAXMB=36400) showed:

- nwork=16 (old default): 70.9K SPS (eval 44s, train 5s)
- nwork=32 (new default): 90.1K SPS (eval 1m 9s, train 7s) -> +27%

Eval scales sublinearly because of CPU/memory-bandwidth contention, but the gain is real and requires no algorithmic change. nwork=48 needed --train.cpu-offload True (the ~33 GB obs buffer doesn't fit on a 5090); cpu_offload has a CPU/GPU index-device bug in the adaptive code path, so 48 isn't reachable yet.

For the typical multi-job pattern (4 simultaneous 1-GPU runs), 4 jobs x 32 workers = 128 cores; the box has 224, so this is still well within budget.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
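The headline numbers in the sweep can be double-checked with a few lines of arithmetic. This only reuses the figures quoted above; nothing is re-measured.

```python
# Figures copied from the sweep above; nothing here is re-measured.
old_sps, new_sps = 70.9e3, 90.1e3
speedup = new_sps / old_sps - 1.0
print(f"SPS gain from nwork=16 -> 32: {speedup:+.1%}")  # -> +27.1%

jobs, workers_per_job, cores = 4, 32, 224
used = jobs * workers_per_job
print(f"{used} of {cores} cores in use ({used / cores:.0%})")  # -> 128 of 224 cores in use (57%)
```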



… goal-behavior

When --load-model-path points at a checkpoint that has a sibling trainer_state.pt, restore the optimizer state, epoch, and global_step, and advance the cosine LR scheduler to the saved position. When --load-id is also passed, the wandb logger now reuses that run id (resume="allow" was already the default in WandbLogger), so the resumed training appends to the original wandb history instead of starting a fresh run.

Also pin --env.goal-behavior 2 in the eval-time render path so the human_replay videos match the eval metric semantics (stop-on-goal, not respawn).

torch.load needs weights_only=False because trainer_state.pt holds the optimizer state dict (with class refs), not just tensors; otherwise PyTorch 2.6's default weights_only=True silently rejects the load and the resume falls back to a cold start.

Two new launcher scripts (nuplan_transformer_local_resume.sh and the k=3 variant) automate the resume flow: snapshot the latest checkpoint of each run, pass --load-id + --load-model-path, and re-spawn the 8 killed runs in fresh tmux sessions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
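Why advancing the scheduler matters: a cosine schedule's learning rate is a function of the step count, so restoring `global_step` but restarting the scheduler at step 0 would jump the LR back to its peak. A minimal sketch, assuming a plain cosine decay from `base_lr` to `min_lr` over `total_steps` with no warmup (the same closed form as PyTorch's `CosineAnnealingLR`); the project's actual scheduler shape and parameters may differ, and `cosine_lr` is a hypothetical helper.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Closed-form cosine decay from base_lr at step 0 to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps, base_lr = 1000, 3e-4
    saved_step = 500  # e.g. global_step restored from trainer_state.pt
    print("scheduler restarted:", cosine_lr(0, total_steps, base_lr))           # peak LR
    print("scheduler advanced: ", cosine_lr(saved_step, total_steps, base_lr))  # ~half of peak
```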

With cpu_offload=True, the observations buffer is allocated on CPU (with pin_memory) so it doesn't eat GPU VRAM. The training-time minibatch sampling then crashed at:

    RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

because the prio-sampled `idx` tensor lives on the training device (GPU) but is used to fancy-index the CPU obs tensor. Move the index to CPU for the gather, then ship the gathered minibatch back to the device with a pinned-memory async copy. The path is gated on cpu_offload, so the non-offload default is bit-identical.

Unlocks --vec.num-workers > 16 for k=3 adaptive training (was OOM-bound on the GPU obs buffer; with offload, obs lives in 500 GB of RAM).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Move the per-step co-player Transformer forward pass out of the forked worker subprocesses (where it is stuck on a single CPU thread) and onto the main GPU process. Workers now skip get_co_player_actions and just read the co-player slots of the shared-memory `actions` buffer that the main process fills before each vec_step.

For k=3 adaptive runs this lifts the env-stepping bottleneck (single worker = 273 steps × ~50-100 ms CPU forward = ~14-27 s of inference per worker per epoch). On GPU with the existing KV cache, the same forward is ~5-10 ms, so the env-stepping section becomes essentially free.

Wiring:
* adaptive.ini: new opt-in flag `external_co_player_actions`. Default False, so existing per-worker CPU runs are bit-identical.
* vector.py: when the flag and `co_player_enabled` are set, allocate a `co_player_conditioning` SHM buffer (per-worker x co-players x conditioning_dim), expose the GPU co-player policy on `vecenv`, and skip the parent-process single-thread torch lockdown (workers no longer run torch).
* drive.py: env step() skips the local CPU forward; reset() and the scenario boundary still resample conditioning and write it to SHM (main reads it before each forward). State management is now owned entirely by main.
* pufferl.py: PuffeRL.__init__ moves the co-player to the training device and sets up per-worker state dicts (lazily allocated by forward_eval to avoid a dtype mismatch with autocast). evaluate() gains _fill_external_co_player_actions, which extracts co-player obs from the recv batch, concatenates the SHM-resident conditioning (matching drive.py:_add_co_player_conditioning exactly), runs forward_eval on GPU, and writes argmax actions into vecenv.actions[worker_id] before send.
* models.py: defensive cast in _prime_kv_cache so a dtype mismatch between the K/V cache and the prime-time layer output (which can appear under autocast/mixed precision) doesn't crash with `Index put requires the source and destination dtypes match`.

Smoke run on GPU 6 (puffer_adaptive_drive, k=2, nw=2, b6big5j1 co-player ckpt with all-conditioning) advances cleanly through 3+ epochs with no errors; SPS goes from 17K to 23K.

Known limitations / follow-ups:
* batch_size > 1 in vec is not yet supported (raises NotImplemented). Production runs use batch_size=1, so this is fine, but the loop needs a small generalization for completeness.
* Numerical equivalence to the per-worker CPU path has not yet been verified end-to-end on a long run; the smoke run matches by inspection of losses (no NaN, sensible policy_loss / kl).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
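The centralized flow in _fill_external_co_player_actions can be pictured as: gather co-player observations from every worker, append that worker's conditioning vector, run one batched forward, take the argmax, and scatter the actions back into each worker's co-player slots. The sketch uses nested lists and a stand-in `policy` callable; `fill_co_player_actions`, the slot layout, and the shapes are assumptions for illustration, not the code in pufferl.py.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def fill_co_player_actions(
    obs: List[List[Vector]],           # [worker][agent] -> observation
    conditioning: List[List[Vector]],  # [worker][co_player] -> conditioning vector
    co_player_slots: List[int],        # agent indices driven by the co-player
    policy: Callable[[List[Vector]], List[Vector]],  # batched inputs -> logits
    actions: List[List[int]],          # [worker][agent], updated in place
) -> None:
    batch: List[Vector] = []
    index = []
    for w, worker_obs in enumerate(obs):
        for j, agent in enumerate(co_player_slots):
            # Append the worker's conditioning to the co-player's observation.
            batch.append(worker_obs[agent] + conditioning[w][j])
            index.append((w, agent))
    logits = policy(batch)  # a single batched forward for every worker
    for (w, agent), row in zip(index, logits):
        actions[w][agent] = argmax(row)  # ego slots are left untouched


if __name__ == "__main__":
    def toy_policy(batch: List[Vector]) -> List[Vector]:
        # Stand-in for forward_eval: puts the peak logit at int(sum(x)) % 3.
        out = []
        for x in batch:
            row = [0.0, 0.0, 0.0]
            row[int(sum(x)) % 3] = 1.0
            out.append(row)
        return out

    obs = [[[0.0], [1.0], [2.0]], [[1.0], [1.0], [1.0]]]  # 2 workers x 3 agents
    cond = [[[0.0], [1.0]], [[2.0], [0.0]]]               # 2 co-players per worker
    actions = [[-1, -1, -1], [-1, -1, -1]]
    fill_co_player_actions(obs, cond, [1, 2], toy_policy, actions)
    print(actions)  # [[-1, 1, 0], [-1, 0, 1]]
```

Batching across workers is what turns many single-threaded CPU forwards into one GPU call per step.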

Adds trajectory_length and output_subdir kwargs to process_all_maps / load_map / save_map_binary so we can regenerate nuplan binaries at the full 201-frame length (the underlying JSONs go up to 201 frames; the old 91-frame default truncates ~55% of the data on average) without overwriting the existing 91-frame nuplan/ binaries. Defaults preserved at 91 so all existing code paths and binaries keep working unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The aggregate ada_delta_score was hovering at ~0 across all 4 k=3 trained checkpoints, but most nuplan scenes are easy (ego trivially succeeds in scenario 0), so the signal in the few hard cases gets diluted. This adds a per-(rollout, agent, scenario) success log so downstream analysis can compute conditional rates like P(succeed s_k | failed s_0); the actual in-context-adaptation signal lives there.

Pipeline:
- HumanReplayEvaluator now tracks per-agent goal-reach via the +1 reward spike (in stop-on-goal eval, dones are not set per agent). Dumps a flat list of records to wandb metrics under per_agent_success_log.
- The RECOVERY_CACHE_RESET_PER_SCENARIO=1 env var resets the policy's K/V cache at every scenario boundary: the control variant that lets us separate "is the cache helping?" from "is the current obs alone enough?"
- scripts/eval_recovery_{all,control}.sh launch the 4 k=3 checkpoints in parallel on GPUs 4-7. recovery_compare.py renders adaptive vs. control side by side with all conditional rates.

Result on the 4 current k=3 checkpoints (300 rollouts × 99 agents):
- The cache contributes a ~+1% lift on P(succeed s_k | failed s_0): small but positive across all 4, statistically significant (~3σ) on 2/4.
- The bulk of policy capacity is single-scenario observation; cross-scenario context provides marginal signal. Map rotation per scenario (TODO) should force the policy to lean on the cache more.

Also includes:
- scripts/coplayers/nuplan_transformer_local_201.sh: launcher for new co-players trained on the regenerated 201-frame nuplan binaries.
- Resume scripts updated to point at /workspace/ADA (post-merge of the coplayer-to-gpu branch) and use the optimized flags where appropriate.
- TODO_paper.md tracks pending work (resume k=3, map rotation, etc.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
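A conditional rate such as P(succeed s_k | failed s_0) can be computed from the flat per-(rollout, agent, scenario) log by grouping records per (rollout, agent) and conditioning on the scenario-0 outcome. This is a sketch assuming each record is a dict with `rollout`, `agent`, `scenario`, and boolean `success` keys; the actual field names in per_agent_success_log, and how recovery_compare.py aggregates them, may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def conditional_success(records: Iterable[dict], k: int) -> Optional[float]:
    """Estimate P(success in scenario k | failure in scenario 0)."""
    outcomes: Dict[Tuple[int, int], Dict[int, bool]] = defaultdict(dict)
    for r in records:
        outcomes[(r["rollout"], r["agent"])][r["scenario"]] = bool(r["success"])

    # Keep only (rollout, agent) pairs that failed s_0 and have an s_k outcome.
    failed_s0 = [o for o in outcomes.values() if o.get(0) is False and k in o]
    if not failed_s0:
        return None  # undefined: no agent failed scenario 0
    return sum(o[k] for o in failed_s0) / len(failed_s0)


if __name__ == "__main__":
    log = [
        {"rollout": 0, "agent": 0, "scenario": 0, "success": False},
        {"rollout": 0, "agent": 0, "scenario": 1, "success": True},
        {"rollout": 0, "agent": 1, "scenario": 0, "success": False},
        {"rollout": 0, "agent": 1, "scenario": 1, "success": False},
        {"rollout": 1, "agent": 0, "scenario": 0, "success": True},
        {"rollout": 1, "agent": 0, "scenario": 1, "success": True},
    ]
    print(conditional_success(log, k=1))  # 0.5: one of the two s_0 failures recovers
```

Conditioning on the s_0 failure is what removes the easy scenes that dilute the aggregate score.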

Per-scenario map rotation: at each scenario boundary the env hard-reinits with fresh map_ids while leaving the ego policy's K/V cache (held in main) intact. This forces the policy to use cross-scenario context, because the current scene is genuinely new each scenario.

The render path needed C-side help to survive the in-step vec_close + revec:
- New c_donate_client / c_adopt_client (env_binding.h, drive.h) stash env[0]->client in a global before vec_close so the raylib state and ffmpeg pipe survive the swap.
- The render env sets _render_keep_client_on_swap=True; reinit calls donate before vec_close and adopt after revec.
- Reinit's env_init now passes render_mode=self._render_mode_int so the new env stays in HEADLESS and write_frame_to_pipe keeps firing past the first scenario.

Without this, render segfaulted at the boundary (raylib's CloseWindow -> InitWindow cycle is not safe under xvfb) or stopped writing frames at frame 91 (render_mode silently reset to OFF).

Launchers:
- nuplan_transformer_local_k3_maprand.sh: 4 k=3 runs vs. old 91-frame coplayers with map_rand on. Default GPUs 4-7.
- nuplan_transformer_local_k2_201.sh: 4 k=2 runs vs. new 201-frame coplayers with map_rand on. Default GPUs 0-3, nw=32.

The 4 k=2/201/maprand runs that were started today (wandb runs 1f7gi2r3, 99vg079m, i8qy2lc8, szzu12a2) were trained on this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Experiments enabled by this commit (in approximate order of run):
- City-adapt log-replay training: train on US (Boston+Pittsburgh+Vegas, 5072 scenes), eval on Singapore (329 held-out scenes). co_player_enabled off; scenarios continue on the same map (map_rand off). Launcher: scripts/adaptive/nuplan_transformer_local_k2_201_city.sh.
- Entropy-sweep coplayer training: 5 partners trained on the 201-frame nuPlan data with entropy_ub ∈ {0.05, 0.10, 0.20, 0.50, 1.00} at fixed discount_lb=0.4. Gives a partner pool spanning near-deterministic to very stochastic. Launcher: scripts/coplayers/nuplan_transformer_local_201_entropy_sweep.sh.
- Diverse-partner adaptive training: pair ego with one entropy-sweep partner and sample partner conditioning per episode from the partner's wide trained range. map_rand is off so the ego batch row → agent identity mapping stays stable across the boundary; the cache encodes the partner type from s_0 and applies it in s_1. Launcher: scripts/adaptive/nuplan_transformer_local_k2_201_diverse.sh.

Architecture / env additions:
- drive.py: condition_rand_per_scenario flag (default False). When True, the partner conditioning vector is re-sampled at every scenario boundary; the partner POLICY weights are unchanged, but its conditioning input changes. Defined as a separate axis from map_rand (the two resamplings happen via different code paths). Currently unused by the diverse launcher (per-episode sampling already gives the right semantics for adaptation), but kept for future curriculum work.
- adaptive.ini: exposes condition_rand_per_scenario as a CLI flag.
- utils.py: render_videos forces external_co_player_actions=False on the render env (centralized inference is only set up for the trainer's workers, not the standalone render env, so co-players froze without this).
- drive.h: render_mode propagation in env reinit + per-agent colors distinguishing ego (magenta) from co-player (blue), so debug renders show which policy controls which car.
- models.py: probe_attention support. When state["_probe_attention"] is True, forward_eval computes attention weights via an explicit softmax (instead of the SDPA functional, which doesn't return weights) and appends per-(layer, slot) tensors to state["_attn_weights"]. Gated, so there is no overhead in production.

Analysis tooling:
- scripts/probe_attention.py: load checkpoint + env, run a probed rollout, dump (layer, head, query_step, key_position) attention tensors as npz.
- scripts/visualize_attention.py: render heatmaps and a cross-scenario attention-mass time series from probe outputs.
- scripts/counterfactual_cache.py: paired rollouts with the cache preserved vs. zeroed at the scenario boundary; reports paired s_1 lift and conditional recovery P(succ s_1 | fail s_0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
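Because `torch.nn.functional.scaled_dot_product_attention` returns only the attended output, probing needs the weights computed explicitly as softmax(q·k/√d) with masked keys excluded. The sketch below does this for a single query and head in plain Python, and also computes a cross-scenario attention mass (the share of weight on keys written in earlier scenarios), which is one plausible reading of what visualize_attention.py plots. The function names and the `key_scenario` bookkeeping are assumptions.

```python
import math
from typing import List, Sequence


def attention_weights(
    q: Sequence[float], keys: Sequence[Sequence[float]], mask: Sequence[bool]
) -> List[float]:
    """softmax(q . k / sqrt(d)) over unmasked keys; masked keys get weight 0.

    At least one key must be unmasked.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [
        sum(a * b for a, b in zip(q, k)) * scale if m else -math.inf
        for k, m in zip(keys, mask)
    ]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [0.0 if s == -math.inf else math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_scenario_mass(
    weights: Sequence[float], key_scenario: Sequence[int], current_scenario: int
) -> float:
    """Share of attention placed on keys written during earlier scenarios."""
    return sum(w for w, s in zip(weights, key_scenario) if s < current_scenario)


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    w = attention_weights(q, keys, mask=[True, True, True])
    print(w, cross_scenario_mass(w, key_scenario=[0, 0, 1], current_scenario=1))
```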

In-process curriculum that anneals the partner's entropy_weight_ub through 4 stages (5% → 20% → 50% → 100% of the user-passed final value), each lasting 30 episodes per worker. Stash sampled entropy stats per resample and drain them to wandb at the next report_interval; this gives a per-episode trace of the partner-conditioning distribution actually fed to the partner policy.

Launcher: 4-run ablation (curriculum vs. no curriculum × {final_ub=0.5, 1.0}) using matched partners from the entropy sweep (n48teqjs for e=1.0, 6rauydj2 for e=0.5).
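The stage schedule above reduces to a lookup on the per-worker episode counter. A minimal sketch, assuming stages advance strictly every 30 episodes and the final stage is held afterwards; `curriculum_stage` and `curriculum_entropy_ub` are hypothetical names, not the drive.py code.

```python
STAGE_FRACTIONS = (0.05, 0.20, 0.50, 1.00)  # fraction of the final entropy_weight_ub
EPISODES_PER_STAGE = 30  # per worker


def curriculum_stage(episode: int) -> int:
    """Stage index for a worker's episode counter; the last stage is held."""
    return min(episode // EPISODES_PER_STAGE, len(STAGE_FRACTIONS) - 1)


def curriculum_entropy_ub(episode: int, final_ub: float) -> float:
    return STAGE_FRACTIONS[curriculum_stage(episode)] * final_ub


if __name__ == "__main__":
    for ep in (0, 29, 30, 60, 80, 90, 500):
        print(ep, curriculum_stage(ep), curriculum_entropy_ub(ep, final_ub=0.5))
```

Under this schedule, a counter seeded at 80 falls in stage 2 (50% of the final value), which is why the episode count has to survive a resume for the curriculum to continue where it left off.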

Curriculum stages cut the ego K/V cache at within-episode scenario boundaries:
- Stage 0 (k_eff=1): cut every boundary → 4 independent scenarios
- Stage 1 (k_eff=2): cut the middle boundary → 2 clean k=2 chunks
- Stage 2 (k_eff=K_max): no cuts → full cross-scenario context

Reset mechanism: at boundaries to cut, drive.py sets truncations[ego_ids]=1 and terminals[ego_ids]=1. pufferl drops the cache via done_mask=t+d at eval, and create_episode_mask blocks cross-boundary attention during training, so eval and train see the same effective context. Trade-off: PPO treats the cut as an episode end for advantage estimation and slightly under-credits cross-segment value; the effect shrinks to zero at stage 2 (no cuts).

Stage formula: cut at a boundary iff current_scenario % k_eff == 0 (within-episode only). With K_max=4 and stages k_eff ∈ {1, 2, 4}, this gives uniform splits.

Launcher: 2 runs at k_max=4, horizon=804, partner n48teqjs at e_ub=1.0; they differ only in k_eff_curriculum_enabled. Memory sizing: mb_mult=50 keeps minibatch_size=40200 (same as k=2/100). Per-run RAM ≈ 2× k=2, so these can't run alongside the active k=2 entropy ablation.
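The stage formula can be checked directly. In this sketch, `current_scenario` is assumed to be the index of the scenario being entered at a boundary (1..K_max-1; index 0 is the episode start and is excluded as "within-episode only"). The minibatch line assumes minibatch_size = horizon × mb_mult, which matches 804 × 50 = 40200 = 402 × 100 but is an inference, not a documented formula. `cut_boundaries` and `segments` are hypothetical helpers.

```python
from typing import List


def cut_boundaries(k_max: int, k_eff: int) -> List[int]:
    """Scenario indices at whose start the ego K/V cache is reset."""
    return [s for s in range(1, k_max) if s % k_eff == 0]


def segments(k_max: int, k_eff: int) -> List[List[int]]:
    """Group scenario indices into chunks that share cache context."""
    cuts = set(cut_boundaries(k_max, k_eff))
    chunks: List[List[int]] = [[0]]
    for s in range(1, k_max):
        if s in cuts:
            chunks.append([s])  # cache cut: start a fresh context
        else:
            chunks[-1].append(s)
    return chunks


if __name__ == "__main__":
    for k_eff in (1, 2, 4):
        print(k_eff, segments(4, k_eff))
    # Assumed relation: minibatch_size = horizon * mb_mult.
    print(804 * 50, 402 * 100)
```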

The per-worker episode counter that drives the entropy curriculum is not part of the model checkpoint, so a naive resume of a curriculum run restarts the curriculum from stage 0 (entropy_ub = 5% of final), even if the original run was already in stage 2 or 3. The new kwarg lets the resume launcher seed the counter to the original run's ending episode count so the curriculum picks up at the right stage.

Resume launcher (scripts/adaptive/nuplan_transformer_local_k2_201_curriculum_resume.sh) restores the 3 killed runs from the entropy_ub ablation:
- exp1 nocurr_e1.0 from epoch 60 (no curriculum)
- exp2 curr_e0.5 from epoch 80 (curriculum, episodes_start=80 → resumes in stage 2)
- exp3 nocurr_e0.5 from epoch 130 (no curriculum)

exp0 curr_e1.0 (uufybjgm) finished cleanly and is not resumed. The default LR is preserved: pufferl already restores the optimizer state and global_step from the sibling trainer_state.pt (pufferl.py:280), so the cosine LR resumes mid-schedule, not at peak.

pufferl.evaluate() created a fresh state dict every step in the rollout loop and pulled `transformer_context` / `transformer_position` from persistent storage, but NEVER persisted `k_cache` / `v_cache`. The model wrote them into the local state dict at the end of each forward_eval, but pufferl threw them away.

Effect: on every rollout step the model saw `state.get("k_cache") is None`, triggered the `need_alloc` branch, allocated fresh empty K/V tensors, AND reset `pos` to 0. So at every step the transformer attended only to the current observation; the cache was never used.

Meanwhile, the training pass (full-sequence forward) DID apply attention across all timesteps. So the policy's gradient was computed assuming it had used past context, but at action-selection time it never did. This is a massive train/eval mismatch: the policy weights were pulled toward "use context", but the policy never actually had context at decision time.

This silently broke every in-context-learning experiment in the project (k_eff curriculum, entropy curriculum, oracle, partner-pool diversity, gamma sweep); none of them could ever have worked, because the cache they were trying to teach the policy to use was empty at inference.

The eval pipeline (HumanReplayEvaluator), render rollouts (drive/rollout.py), and analysis scripts (probe_attention.py) all persist state correctly, so they showed the policy under "with cache" conditions; but the policy had been trained against "no cache" rollouts, so its weights couldn't actually leverage the context they were now being given.

Fix: add `transformer_k_cache` and `transformer_v_cache` persistent dicts alongside the existing transformer_context/position. Pull them into the per-step state, write them back after forward_eval, and zero the rows on an episode boundary (so a fresh episode doesn't inherit prior K/V).

Verified with /tmp/test_kv_cache_persistence.py: with persisted state, pos increments 1→2→3→…, and past slots fill with non-zero K/V values. Without persistence (the old behavior), pos stays at 1 and past slots are always zero.
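The bug and the fix come down to whether the per-step state dict is rebuilt from scratch or round-tripped through storage that outlives the step. A toy sketch of both behaviours follows; `toy_forward_eval`, the `persistent` dict layout, and the list-based cache are stand-ins, not the pufferl or models.py code, and episode-boundary zeroing is omitted for brevity.

```python
from typing import Dict, List

HORIZON = 8  # toy cache length


def toy_forward_eval(obs: float, state: Dict) -> float:
    """Stand-in for forward_eval: a missing cache triggers allocation at pos 0."""
    if state.get("k_cache") is None:  # the need_alloc branch
        state["k_cache"] = [0.0] * HORIZON
        state["pos"] = 0
    state["k_cache"][state["pos"] % HORIZON] = obs  # write this step's key
    state["pos"] += 1
    return sum(state["k_cache"])  # placeholder for attention over the cache


def rollout(observations: List[float], persist: bool) -> List[int]:
    """Return pos after each step, with or without persisting the cache."""
    persistent = {"k_cache": None, "pos": 0}  # outlives individual steps
    positions = []
    for obs in observations:
        state: Dict = {}  # fresh per-step dict, as in the rollout loop
        if persist:
            state.update(persistent)  # pull the cache into the step state
        toy_forward_eval(obs, state)
        if persist:
            persistent.update(k_cache=state["k_cache"], pos=state["pos"])  # write back
        positions.append(state["pos"])
    return positions


if __name__ == "__main__":
    print("old (no persistence):", rollout([1.0, 2.0, 3.0], persist=False))  # [1, 1, 1]
    print("fixed (persisted):   ", rollout([1.0, 2.0, 3.0], persist=True))   # [1, 2, 3]
```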

With the K/V cache now persisted across rollout steps, pos=-1 caused the first post-reset step to write to slot horizon-1 instead of slot 0 (since (-1) % horizon == horizon-1). The cache zeroing handled the correctness of the attention values, but it left the new episode's first token in a slot that the rest of the episode would never naturally overwrite via index_copy_ at slot=pos%horizon. It is cleaner to mirror the need_alloc/first-call branch: pos=0 → slot=0 → write the new episode's first token to slot 0, as a fresh run would.
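Python's `%` (like `torch.remainder`, which `%` maps to on tensors) returns a result with the sign of the divisor, so a reset value of -1 lands in the last slot rather than slot 0. A tiny sketch with an illustrative horizon of 402; `first_slot_after_reset` is a hypothetical helper.

```python
HORIZON = 402  # illustrative cache length


def first_slot_after_reset(reset_pos: int, horizon: int = HORIZON) -> int:
    """Slot written by the first step after a reset, i.e. pos % horizon."""
    return reset_pos % horizon


print(first_slot_after_reset(-1))  # 401: the last slot (old reset value)
print(first_slot_after_reset(0))   # 0: same slot as a fresh run (new reset value)
```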

Learnable PE (nn.Parameter, initialized to zeros + std=0.02 noise) requires the model to learn temporal structure from gradients. With our sparse adaptation reward, that structure was probably never learned well: at the start of training the PE is essentially zero, and the model sees a "bag of timesteps".

Sinusoidal PE encodes absolute position via sin/cos at multiple frequencies. The model has temporal structure available from step zero of training, without waiting for gradients to teach it.

Implementation: register_buffer (non-trainable, no grad). Same shape (1, horizon, hidden_size) as before, accessed via the same get_positional_embedding helper, so both the forward (training) and forward_eval (rollout) paths work unchanged.

Note: existing checkpoints with the trained learnable PE param will need strict=False loading to migrate to this version. Fresh runs start with the sinusoidal buffer immediately.
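For reference, the standard sinusoidal encoding from the original Transformer paper sets PE[p, 2i] = sin(p / 10000^(2i/d)) and PE[p, 2i+1] = cos(p / 10000^(2i/d)). The commit does not spell out its exact variant, so treat the base of 10000 and the interleaved sin/cos layout as assumptions. The sketch builds a (horizon, hidden_size) table in plain Python; models.py would hold the equivalent values as a (1, horizon, hidden_size) buffer.

```python
import math
from typing import List


def sinusoidal_pe(horizon: int, hidden_size: int, base: float = 10000.0) -> List[List[float]]:
    """Return a horizon x hidden_size table of fixed (non-learned) encodings."""
    if hidden_size % 2:
        raise ValueError("interleaved sin/cos needs an even hidden_size")
    table = []
    for pos in range(horizon):
        row = []
        for i in range(0, hidden_size, 2):  # i plays the role of 2i in the formula
            angle = pos / base ** (i / hidden_size)
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


if __name__ == "__main__":
    pe = sinusoidal_pe(horizon=4, hidden_size=8)
    for row in pe:
        print([round(v, 3) for v in row])
```

Unlike a zero-initialised parameter, every row is distinct from the first forward pass, which is the property the commit relies on.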

Default eval (50 rollouts on full nuplan_201) showed ada_delta_score ~0 across all configs. Inspection of the underlying trajectory data: 54% of nuplan_201 maps have zero SDC-vehicle interaction, diluting the average.

nuplan_hard is the top 10% (540 maps) by SDC interaction-step count, defined purely from the recorded human trajectories in data/nuplan_gpudrive/nuplan/, with no policy involvement.

On 4lm6kkh7 epoch 40 with 200 rollouts, ada_delta_score moves from ±0.005 (full set) to +0.222 ± 0.18 on nuplan_hard. Collisions in s_1 roughly halve vs. s_0; episode_return nearly doubles. The model adapts; the metric was diluted.

Also adds a reward_only_last_scenario flag to Drive: it zeros rewards in scenarios 0..k-2 and is used to test whether reward shape (rather than architecture) is the bottleneck for cross-scenario adaptation.

Files:
- scripts/score_maps_interaction.py (compute hardness per map)
- scripts/build_nuplan_hard.py (symlink top-K maps into a new dir)
- scripts/adaptive/nuplan_transformer_local_k2_201_lastscen.sh
- pufferlib/ocean/drive/drive.py (+reward_only_last_scenario)
- pufferlib/config/ocean/adaptive.ini (expose new kwarg)
- notes/nuplan_hard.md (recreation + eval recipe + caveats)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
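The selection step in build_nuplan_hard.py amounts to ranking maps by a hardness score and symlinking the top fraction into a new directory. A self-contained sketch using only the standard library; the `.bin` file names, the score dict, and `build_hard_subset` are assumptions, and the real script's ranking and tie-breaking may differ.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def build_hard_subset(
    src_dir: Path, dst_dir: Path, scores: Dict[str, float], top_frac: float = 0.10
) -> List[str]:
    """Symlink the highest-scoring map files from src_dir into dst_dir."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    k = max(1, int(len(ranked) * top_frac))  # e.g. 10% of 5400 maps -> 540
    chosen = ranked[:k]
    dst_dir.mkdir(parents=True, exist_ok=True)
    for name in chosen:
        link = dst_dir / name
        if not link.is_symlink():
            os.symlink((src_dir / name).resolve(), link)
    return chosen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "nuplan"
        src.mkdir()
        # Fake maps whose "interaction-step count" is just their index.
        scores = {}
        for i in range(20):
            name = f"map_{i:03d}.bin"
            (src / name).write_bytes(b"")
            scores[name] = float(i)
        print(build_hard_subset(src, Path(tmp) / "nuplan_hard", scores))
```

Symlinking rather than copying keeps the hard subset in sync with the source binaries and costs no extra disk.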

- partner_sweep: 5 ego runs against the entropy-conditioned partners (miku2puk, 2e029h15, m2ygolog, 6rauydj2, n48teqjs) at γ=0.995, lane_align=0.025, evaluated on nuplan_hard. Holds everything else fixed, so partner entropy is the only varying dimension when comparing ada_delta.
- gamma_sweep: γ ∈ {0.99, 0.995, 0.999} × 2 partners. The 0.995 winner (4lm6kkh7) is the basis for all later runs.
- adam_test, lr_test, prio_test, prio_clip_test, prio_ent_test: ablation launchers for optimizer/LR/priority-sampling/clip-coef investigations into the high clipfrac.
- oracle_g0995: oracle-conditioned ego at γ=0.995 (oracle is incompatible with human_replay eval; kept for completeness).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

daphne-cornelisse and others added 30 commits November 11, 2025 13:35

Goal behavior fixes (Emerge-Lab#124)

a6ee4a4


Update drive.h

c75b549


Add mode to only control the self-driving car (SDC) (Emerge-Lab#130)

d130cad


Merge pull request Emerge-Lab#129 from Emerge-Lab/eugenevinitsky-patch-1

c26b245


Fix incorrect obs dim in draw_agent_obs (Emerge-Lab#109)

fecbb2d


Replace default ent_coef and learning_rate hparams (Emerge-Lab#134)

97dcb3d


Add new weights binary with joint action space. (Emerge-Lab#136)

040d39d

Fix missing arg (Emerge-Lab#141)

c697f17

Multi map render support to wandb (Emerge-Lab#143)

f8021df


Add mode for controlled experiments (Emerge-Lab#144)

87033d0


Fix broken link

8690940

Data processing script that works decently. (Emerge-Lab#150)

9d6a311

Pass map_dir to the env via .ini and enable evaluation on a different dataset (Emerge-Lab#151)

99060ba

Add sprites in headless rendering (Emerge-Lab#152)

6eaea31


Faster file processing (Emerge-Lab#153)

a6af21c


Add link to small clean eval dataset

6ce4879

Fix link typo

0eab9bd

Gif for readme (Emerge-Lab#155)

225ef99


Fix link?

f44573e

Fix vertical spaces.

b11d5e1

Update README.md

9c8b017

WIP changes (Emerge-Lab#156)

95ceedd


Release note

9d249b9

mohitmk01 and others added 30 commits April 25, 2026 20:57

render fix

81a3207

Merge branch 'mohit/cleanup-apr25' of github.com:Emerge-Lab/Adaptive_Driving_Agent into mohit/cleanup-apr25

3df8ea3

commentsed and render

0a8a580

removed comments

621ed4f

nothing

0f36823

Merge branch 'mohit/cleanup-apr25' of github.com:Emerge-Lab/Adaptive_Driving_Agent into mohit/cleanup-apr25

22070dd

fixed evaluations such that we now run num-rollouts and also do goal behavior stop

1db628d

fixes

01a3d36

precommit

b2ecee2

latest 2nd may

f3d04e4

did i finally fix things? mask still there

5d9bbb4

cleanup debug

d3b59a3

removed some comments

4dd8a01

m2kulkarni wants to merge 159 commits into main from mohit/cleanup-apr25

m2kulkarni commented May 15, 2026


13 participants
