All workers started with tick=0 and hit resample_frequency at the same step, causing every worker to reset maps simultaneously. This produced a large correlated shift in the environment distribution within a single PPO update, destabilizing training and causing spiky losses. Fix: on first reset, randomize each worker's initial tick to a value in [0, resample_frequency). The first episode per worker will be shorter than usual, but from then on workers resample at different times, spreading the environment distribution shift across updates.
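The effect of staggering can be sketched numerically. The worker count and resample frequency below are illustrative values, not taken from the PR:

```python
import numpy as np

RESAMPLE_FREQUENCY = 100  # illustrative, not from the PR
NUM_WORKERS = 8

# Before the fix: every worker starts at tick=0, so all of them hit the
# resample threshold on the same global step.
synced_first_resample = [RESAMPLE_FREQUENCY] * NUM_WORKERS
print(len(set(synced_first_resample)))  # 1 distinct resample step

# After the fix: each worker draws a random initial tick, so the step at
# which it first resamples (its distance to the threshold) is spread out.
rng = np.random.default_rng(0)
initial_ticks = rng.integers(RESAMPLE_FREQUENCY, size=NUM_WORKERS)
staggered_first_resample = RESAMPLE_FREQUENCY - initial_ticks
print(len(set(staggered_first_resample.tolist())))  # several distinct steps
```

With staggered offsets the map resets fall on different global steps, so each PPO update sees at most a small fraction of workers on freshly resampled maps.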
Pull request overview
This PR aims to prevent all Drive workers from resampling maps on the same global step by staggering each worker’s initial tick, reducing correlated environment shifts that can destabilize PPO training.
Changes:
- Initialize a per-environment RNG (`np.random.default_rng(seed)`) to drive deterministic staggering.
- On `reset()`, set `tick` to a random offset in `[0, resample_frequency)` (or `0` if resampling is disabled).
```python
if self.resample_frequency > 0:
    self.tick = int(self._rng.integers(self.resample_frequency))
else:
    self.tick = 0
```
This change introduces new behavior (staggering map resampling via randomized initial tick) that is central to training stability, but there’s no automated test asserting workers/env instances start with different ticks when resample_frequency > 0. Adding a small pytest regression test (e.g., create multiple Drive instances with different seeds and assert their initial tick values aren’t all equal, and/or that only the first reset staggers if that’s the intended behavior) would help prevent accidental re-synchronization regressions.
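A regression test along the suggested lines could look like the following sketch. The real `Drive` constructor signature is not shown in this thread, so `FakeDrive` is a hypothetical stand-in that reproduces only the tick-staggering logic from the diff:

```python
import numpy as np


class FakeDrive:
    """Hypothetical stand-in for the real Drive env, reproducing only
    the tick-staggering logic from the diff."""

    def __init__(self, seed, resample_frequency=100):
        self.resample_frequency = resample_frequency
        self._rng = np.random.default_rng(seed)
        self.tick = 0

    def reset(self):
        if self.resample_frequency > 0:
            self.tick = int(self._rng.integers(self.resample_frequency))
        else:
            self.tick = 0


def test_initial_ticks_are_staggered():
    envs = [FakeDrive(seed=s) for s in range(8)]
    for env in envs:
        env.reset()
    ticks = [env.tick for env in envs]
    # Differently-seeded workers should not all share one initial tick.
    assert len(set(ticks)) > 1


def test_resampling_disabled_keeps_tick_zero():
    env = FakeDrive(seed=0, resample_frequency=0)
    env.reset()
    assert env.tick == 0
```

Swapping `FakeDrive` for the actual `Drive` class (with whatever constructor arguments it requires) would turn this into the regression test proposed above.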
```python
# Stagger initial tick so workers don't all resample maps at the same step.
# The first episode will be shorter than resample_frequency, but this
# desynchronizes resets across workers for the rest of training.
if self.resample_frequency > 0:
    self.tick = int(self._rng.integers(self.resample_frequency))
```
reset() now randomizes self.tick on every reset, but the PR description says to stagger only on the first reset per worker. If reset() is called at every episode boundary, this will repeatedly shorten the post-reset interval until the next resample (and may change training dynamics beyond the intended one-time desync). Consider gating this with a flag (e.g., only stagger once, then reset tick to 0 on subsequent resets).
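One way to gate the stagger to the first reset only is a boolean flag, as in this sketch (class and attribute names are hypothetical, not from the PR):

```python
import numpy as np


class StaggeredResetSketch:
    """Sketch of a one-time stagger: random offset on the first reset,
    tick=0 on every subsequent reset (names are hypothetical)."""

    def __init__(self, seed, resample_frequency):
        self.resample_frequency = resample_frequency
        self._rng = np.random.default_rng(seed)
        self._staggered = False
        self.tick = 0

    def reset(self):
        if self.resample_frequency > 0 and not self._staggered:
            # One-time desync: random offset on the very first reset.
            self.tick = int(self._rng.integers(self.resample_frequency))
            self._staggered = True
        else:
            # Later resets keep the full resample interval.
            self.tick = 0
```

After the first reset the workers stay desynchronized (their resample steps remain offset), so repeated randomization is not needed to preserve the stagger.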
```python
self.resample_frequency = resample_frequency
self._rng = np.random.default_rng(seed)
self.dynamics_model = dynamics_model
```
self._rng is seeded from the constructor argument seed, but reset(seed=...) also accepts a seed and is used by PufferLib to control per-run determinism. Because self.tick staggering draws from self._rng, changing the reset() seed will not affect the stagger offset, which can be surprising for reproducibility. Consider deriving the stagger RNG from the reset() seed (or re-seeding _rng on reset) so reset(seed=...) fully controls stochastic reset behavior.
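Deriving the stagger from the `reset()` seed could look like this sketch (the helper name and signature are hypothetical; the real env would fold this into its `reset()`):

```python
import numpy as np


def stagger_offset(reset_seed, resample_frequency):
    """Derive the stagger offset from reset()'s seed so that
    reset(seed=...) fully controls stochastic reset behavior
    (hypothetical helper, not from the PR)."""
    if resample_frequency <= 0:
        return 0
    rng = np.random.default_rng(reset_seed)
    return int(rng.integers(resample_frequency))


# The same reset seed always yields the same offset, restoring
# per-run determinism under PufferLib's seeding protocol.
print(stagger_offset(123, 100) == stagger_offset(123, 100))  # True
```

Distinct per-worker seeds would still stagger the workers, while a fixed seed set makes the whole stagger pattern reproducible across runs.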
Summary
- All workers started with `tick=0` and hit `resample_frequency` at the same step, causing every worker to reset maps simultaneously.
- Fix: on `reset()`, randomize each worker's initial tick to `[0, resample_frequency)` using a seeded RNG.

Test plan