Refactor/torch autocast encapsulate global state#7946
Open
nathon-lee wants to merge 15 commits into deepspeedai:master from
Conversation
This reverts commit ff88670. Co-authored-by: nathon-lee <248585198+nathon-lee@users.noreply.github.com>
Revert "fix: update 1 file reformatted." (ff88670)
This reverts commit b90aee5.
Revert accidental Muon optimizer code re-introduction from copilot PRs
Signed-off-by: nathon-lee <leejianwoo@gmail.com>
tohtana
reviewed
Apr 3, 2026
Collaborator
tohtana
left a comment
Hi @nathon-lee, thank you for opening this PR!
_autocast_state is still global and doesn't seem to support different configs for multiple engines. Did I misunderstand something?
Contributor
Author
Signed-off-by: nathon-lee <leejianwoo@gmail.com>
Contributor
Author
@copilot review
refactor: replace bare global vars in torch_autocast with _AutocastState
TORCH_AUTOCAST_INITIALIZED and TORCH_AUTOCAST_DTYPE were module-level globals mutated via `global` statements inside init_autocast_params(). This pattern is fragile: it is invisible to type checkers, prevents isolation between multiple engine instances, and makes the state harder to reset in tests.

Replace them with a private _AutocastState dataclass instance, _autocast_state. The public API (is_autocast_initialized, get_autocast_dtype) is unchanged, so no call sites are affected.
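The refactor described above might look roughly like the following sketch. This is not the actual DeepSpeed code; the field names and the use of a plain string in place of torch.dtype are assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class _AutocastState:
    """Replaces the former TORCH_AUTOCAST_INITIALIZED / TORCH_AUTOCAST_DTYPE globals."""
    initialized: bool = False
    dtype: Optional[Any] = None  # torch.dtype in the real module


_autocast_state = _AutocastState()


def init_autocast_params(dtype: Any) -> None:
    # Mutating the dataclass instance needs no `global` statement, is visible
    # to type checkers, and lets tests reset state by replacing the instance.
    _autocast_state.initialized = True
    _autocast_state.dtype = dtype


def is_autocast_initialized() -> bool:
    return _autocast_state.initialized


def get_autocast_dtype() -> Optional[Any]:
    return _autocast_state.dtype
```

Because the getters keep their signatures, call sites need no changes; only the internal storage moves from bare globals into the dataclass.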
fix: store autocast state per-engine to support multiple engine configs
Previously, _autocast_state was a module-level singleton in torch_autocast.py. When a second DeepSpeed engine called init_autocast_params(), it would overwrite the first engine's dtype and initialized state, making it impossible to run two engines with different autocast configurations concurrently.

Fix by attaching _AutocastState directly to the engine instance (engine._autocast_state). Update is_autocast_initialized() and get_autocast_dtype() to accept an engine argument. For ZeRO optimizers (which hold no engine reference), switch from the global state query to the per-parameter has_comm_dtype() check; parameters are already stamped by their own engine inside init_autocast_params(), so isolation is automatic.