Hi there, thanks for the amazing work! I found that expert parallelism is not compatible with the distributed optimizer in the forked version of Megatron-LM here:
https://github.com/stanford-futuredata/Megatron-LM/blob/85f95aef3b648075fe6f291c86714fdcbd9cd1f5/megatron/arguments.py#L352-L356
But there's no such validation in the open PR to Megatron-LM: NVIDIA/Megatron-LM#288
Does that mean the assertion is redundant and the current version of megablocks is compatible with the distributed optimizer under expert parallelism?
Thanks very much.
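For context, my reading of the linked lines is that they amount to a guard roughly like the sketch below. The argument names (`moe_expert_model_parallelism`, `use_distributed_optimizer`) are my assumptions about the fork's flags, and the comment is my own guess at the rationale, not something stated in the code:

```python
from argparse import Namespace

def validate_moe_args(args):
    """Hypothetical paraphrase of the validation at the linked arguments.py lines."""
    # Assumed flag names; the actual fork may spell these differently.
    if getattr(args, 'moe_expert_model_parallelism', False):
        # Presumably disallowed because expert parallelism places different
        # expert weights on different data-parallel ranks, while the
        # distributed optimizer assumes parameters are replicated across the
        # data-parallel group when sharding optimizer state.
        assert not args.use_distributed_optimizer, \
            'Expert model parallelism is not supported with the distributed optimizer.'

# Example: this combination would trip the assertion.
args = Namespace(moe_expert_model_parallelism=True, use_distributed_optimizer=True)
validate_moe_args(args)
```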