Hello,
Thanks for sharing the PYtorch code for embedding training.
If we look at thepytorch_xvectors/pytorch_run.sh,
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1
train_xent.py exp/xvector_nnet_1a/egs/
If we look at the above line,it seems like you are training the DNN on using single GPU. Is it possible to train using multiple gpus?
Further if we look at the train_utils.py script,
def prepareModel(args):
elif args.trainingMode == 'init':
net.to(device)
net = torch.nn.parallel.DistributedDataParallel(net,
device_ids=[0],
output_device=0)
if torch.cuda.device_count() > 1:
print("Using ", torch.cuda.device_count(), "GPUs!")
net = nn.DataParallel(net)
Why we are using both torch.nn.parallel.DistributedDataParallel and net = nn.DataParallel(net) ?
When I tried to train, it's training using single GPU. How it needs to modified to train on multiple gpus?
I look forward to hearing from you.
Thanks.
K. Ahilan
Hello,
Thanks for sharing the PYtorch code for embedding training.
If we look at thepytorch_xvectors/pytorch_run.sh,
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1
train_xent.py exp/xvector_nnet_1a/egs/
If we look at the above line,it seems like you are training the DNN on using single GPU. Is it possible to train using multiple gpus?
Further if we look at the train_utils.py script,
def prepareModel(args):
elif args.trainingMode == 'init':
net.to(device)
net = torch.nn.parallel.DistributedDataParallel(net,
device_ids=[0],
output_device=0)
if torch.cuda.device_count() > 1:
print("Using ", torch.cuda.device_count(), "GPUs!")
net = nn.DataParallel(net)
Why we are using both torch.nn.parallel.DistributedDataParallel and net = nn.DataParallel(net) ?
When I tried to train, it's training using single GPU. How it needs to modified to train on multiple gpus?
I look forward to hearing from you.
Thanks.
K. Ahilan