minGPT example does not verify the GPU count across all nodes #1395

@Michaelvll

Description

if not verify_min_gpu_count(min_gpus=_min_gpu_count):

The line above checks the GPU count only for the current process, so a run on 2 nodes with 1 GPU per node fails, i.e. the Slurm launcher script fails:

https://github.com/pytorch/examples/blob/main/distributed/minGPT-ddp/mingpt/slurm/sbatch_run.sh#L19
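One possible direction, as a rough sketch rather than a proposed patch: compare `min_gpus` against the job-wide GPU total instead of the per-process count. The helper names below (`total_gpu_count`, `verify_min_gpu_count_across_nodes`) are hypothetical and not part of the example. The sketch assumes every node has the same number of GPUs and that the node count can be read from `SLURM_NNODES`, which Slurm sets inside a job; outside Slurm it falls back to a single node.

```python
import os


def total_gpu_count(local_gpus: int, num_nodes: int) -> int:
    """Estimate the job-wide GPU count.

    Assumption: every node exposes the same number of GPUs.
    """
    return local_gpus * num_nodes


def verify_min_gpu_count_across_nodes(local_gpus: int, min_gpus: int = 2, env=None) -> bool:
    """Hypothetical check: compare min_gpus with the total across nodes.

    local_gpus is the GPU count seen by the current process.
    env defaults to os.environ; it is a parameter only to make testing easy.
    """
    env = os.environ if env is None else env
    # SLURM_NNODES is set by Slurm inside a job; assume one node otherwise.
    num_nodes = int(env.get("SLURM_NNODES", "1"))
    return total_gpu_count(local_gpus, num_nodes) >= min_gpus
```

In the example this would be called with something like `verify_min_gpu_count_across_nodes(torch.cuda.device_count(), min_gpus=_min_gpu_count)`. With 2 nodes and 1 GPU each the check passes, while a single node with 1 GPU still fails as before.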
