
Notes on Single-Machine Multi-GPU Training

Posted on 2021-05-12

Training with multiple GPUs on a single machine

1. DataParallel

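Requires very little code, but performs relatively poorly. DataParallel runs a single process that drives all GPUs from multiple threads, replicates the model on every forward pass, and gathers the outputs on the first GPU. As a result it is usually slower than DistributedDataParallel and places extra load on GPU 0.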

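```python
import os

import torch
import torchvision

# Must be set before the first CUDA call, otherwise it has no effect
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"

USE_CUDA = torch.cuda.is_available()
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # Batches are split along dim 0:
    # [30, ...] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
device = torch.device("cuda:0" if USE_CUDA else "cpu")

# Newer torchvision versions prefer weights=torchvision.models.VGG19_BN_Weights.DEFAULT
model = torchvision.models.vgg19_bn(pretrained=True)

# device_ids are indices among the visible GPUs (0, 1, 2 here)
model = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
model.to(device)
```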

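Note: `device_ids` must not list more GPUs than `os.environ["CUDA_VISIBLE_DEVICES"]` exposes, and normally the two counts are equal. Only the count matters; the IDs need not match one-to-one, because CUDA renumbers the visible GPUs starting from 0 (with `CUDA_VISIBLE_DEVICES="2,3"`, `device_ids=[0, 1]` refers to physical GPUs 2 and 3). In addition, `batch_size` should be no smaller than the number of GPUs: each batch is split along dim 0, so a smaller batch leaves some GPUs with no data.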
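Both rules in the note can be illustrated without a GPU. The sketch below uses two hypothetical helpers (not PyTorch APIs): `visible_to_physical` maps the indices in `device_ids` back to physical GPU ids under `CUDA_VISIBLE_DEVICES`, and `dim0_chunk_sizes` mimics how a batch is cut along dim 0. It assumes DataParallel splits the batch the way `torch.chunk` does, giving each GPU `ceil(batch_size / num_gpus)` rows except possibly the last.

```python
import math


def visible_to_physical(cuda_visible_devices, device_ids):
    """Hypothetical helper: map logical device_ids to physical GPU ids.

    CUDA renumbers the GPUs listed in CUDA_VISIBLE_DEVICES as 0, 1, 2, ...
    so device_ids index into that list rather than naming physical GPUs.
    """
    visible = [int(x) for x in cuda_visible_devices.split(",")]
    if max(device_ids) >= len(visible):
        raise ValueError("device_ids refers to a GPU that is not visible")
    return [visible[i] for i in device_ids]


def dim0_chunk_sizes(batch_size, num_gpus):
    """Hypothetical helper: rows each GPU receives when splitting along dim 0.

    Assumes torch.chunk-style splitting, so a batch smaller than the number
    of GPUs yields fewer chunks than GPUs and some GPUs stay idle.
    """
    step = math.ceil(batch_size / num_gpus)
    return [min(step, batch_size - start) for start in range(0, batch_size, step)]


print(visible_to_physical("2,3", [0, 1]))  # [2, 3]
print(dim0_chunk_sizes(30, 3))             # [10, 10, 10]
print(dim0_chunk_sizes(10, 3))             # [4, 4, 2]
print(dim0_chunk_sizes(2, 3))              # [1, 1]: the third GPU gets nothing
```

The uneven `[4, 4, 2]` split is one reason a batch size that divides evenly by the GPU count keeps the load balanced.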

The remaining work is to replace every **.cuda()** call from the single-GPU version with **.to(device)**:

- model, e.g. `model.to(device)`
- input, e.g. `input = input.to(device)`
- target, e.g. `target = torch.from_numpy(np.array(target)).long().to(device)`
- `nn.CrossEntropyLoss()`, e.g. `criterion = nn.CrossEntropyLoss().to(device)`
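Putting these replacements together, one training step might look like the minimal sketch below. It is a hypothetical toy setup rather than the author's code: a small `nn.Linear` stands in for `vgg19_bn`, random tensors stand in for a real data loader, and `train_step` is an illustrative name. It falls back to the CPU when no GPU is present and wraps the model in DataParallel only when more than one GPU is visible.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical toy setup: nn.Linear replaces vgg19_bn, random data replaces a real loader
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 3)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # default device_ids: all visible GPUs
model = model.to(device)            # was: model.cuda()

criterion = nn.CrossEntropyLoss().to(device)  # was: nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def train_step(inputs, target):
    inputs = inputs.to(device)  # was: inputs.cuda()
    target = torch.from_numpy(np.array(target)).long().to(device)  # was: .long().cuda()

    optimizer.zero_grad()
    output = model(inputs)  # with DataParallel, outputs are gathered on cuda:0
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    return loss.item()


loss = train_step(torch.randn(30, 8), [0, 1, 2] * 10)
print(f"loss = {loss:.4f}")
```

Because the outputs are gathered on `cuda:0`, the target and the loss must live on that same device, which is why `device` is `cuda:0` rather than any other GPU.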

Reference: the official PyTorch DataParallel() tutorial

2. DistributedDataParallel

Reference: the official PyTorch DistributedDataParallel() tutorial