I have long been a TensorFlow user and recently started using PyTorch. As a first experiment, I implemented a simple classification task with both libraries.
However, PyTorch turned out to be much slower than TensorFlow: PyTorch took 42 minutes, while TensorFlow finished in only 11 minutes. I followed the official PyTorch tutorial and made only minor modifications.
Can anyone share some advice on this?
Below is a summary of what I tried.
Environment: Colab Pro+
Dataset: CIFAR-10
Classifier: VGG16
Optimizer: Adam
Loss function: cross-entropy
Batch size: 32
PyTorch
Code:
import torch, torchvision
from torch import nn
from torchvision import transforms, models
from tqdm import tqdm
import time, copy

trans = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor(),])
data = {phase: torchvision.datasets.CIFAR10('./', train=(phase=='train'), transform=trans, download=True)
        for phase in ['train', 'test']}
dataloaders = {phase: torch.utils.data.DataLoader(data[phase], batch_size=32, shuffle=True)
               for phase in ['train', 'test']}

def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=5):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        # Each epoch has a training and validation phase
        for phase in ['train', 'test']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            for inputs, labels in tqdm(iter(dataloaders[phase])):
                inputs = inputs.to(device)
                labels = labels.to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / len(dataloaders[phase])
            epoch_acc = running_corrects.double() / len(dataloaders[phase])
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'test' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.vgg16(pretrained=False)
model = model.to(device)
model = train_model(model=model,
                    criterion=nn.CrossEntropyLoss(),
                    optimizer=torch.optim.Adam(model.parameters(), lr=0.001),
                    dataloaders=dataloaders,
                    device=device,
                    )
Result:
Epoch 0/4
----------
  0%|          | 0/1563 [00:00<?, ?it/s]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
100%|██████████| 1563/1563 [07:50<00:00, 3.32it/s]
train Loss: 75.5199 Acc: 3.2809
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.7274 Acc: 3.1949
Epoch 1/4
----------
100%|██████████| 1563/1563 [07:50<00:00, 3.33it/s]
train Loss: 73.8162 Acc: 3.2514
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.6114 Acc: 3.1949
Epoch 2/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7741 Acc: 3.1369
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.5873 Acc: 3.1949
Epoch 3/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7493 Acc: 3.1331
100%|██████████| 313/313 [00:38<00:00, 8.12it/s]
test Loss: 73.6191 Acc: 3.1949
Epoch 4/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7289 Acc: 3.1939
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.5955 Acc: 3.1949
Training complete in 42m 22s
Best val Acc: 3.194888
TensorFlow
Code:
import tensorflow_datasets as tfds
from tensorflow.keras import applications, models
import tensorflow as tf
import time

ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])

def resize(ip):
    image = ip['image']
    label = ip['label']
    image = tf.image.resize(image, (224, 224))
    image = tf.expand_dims(image, 0)
    label = tf.one_hot(label, 10)
    label = tf.expand_dims(label, 0)
    return (image, label)

ds_train_ = ds_train.map(resize)
ds_test_ = ds_test.map(resize)

model = applications.vgg16.VGG16(input_shape=(224, 224, 3), weights=None, classes=10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

batch_size = 32
since = time.time()
history = model.fit(ds_train_,
                    batch_size=batch_size,
                    steps_per_epoch=len(ds_train) // batch_size,
                    epochs=5,
                    validation_steps=len(ds_test),
                    validation_data=ds_test_,
                    shuffle=True,
                    )

time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
Result:
Epoch 1/5
1562/1562 [==============================] - 125s 69ms/step - loss: 36.9022 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 2/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3031 - accuracy: 0.1005 - val_loss: 2.3033 - val_accuracy: 0.1000
Epoch 3/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3035 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 4/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3038 - accuracy: 0.1024 - val_loss: 2.3030 - val_accuracy: 0.1000
Epoch 5/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3028 - accuracy: 0.1024 - val_loss: 2.3033 - val_accuracy: 0.1000
Training complete in 11m 23s
Answer:
This is because, in your TensorFlow code, the data pipeline feeds the model a "batch" of a single image per step, not a batch of 32 images.
Passing batch_size to model.fit does not actually control the batch size when the input is a dataset. The number of steps per epoch in the log only looks right because you also passed steps_per_epoch to model.fit.
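One quick way to see this, as a minimal sketch that reuses the resize mapping from your code, is to inspect the element spec of the mapped dataset: each element carries a leading dimension of 1 added by tf.expand_dims, so model.fit treats every single image as its own batch.

import tensorflow as tf
import tensorflow_datasets as tfds

ds_train = tfds.load('cifar10', split='train')

def resize(ip):
    # same mapping as in the question: tf.expand_dims adds a leading dim of 1
    image = tf.expand_dims(tf.image.resize(ip['image'], (224, 224)), 0)
    label = tf.expand_dims(tf.one_hot(ip['label'], 10), 0)
    return (image, label)

print(ds_train.map(resize).element_spec)
# Roughly: (TensorSpec(shape=(1, 224, 224, 3), ...), TensorSpec(shape=(1, 10), ...))
# i.e. each "batch" seen by model.fit contains exactly one image.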
To set the batch size correctly:
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])

def resize(ip):
    image = ip['image']
    label = ip['label']
    image = tf.image.resize(image, (224, 224))
    label = tf.one_hot(label, 10)
    return (image, label)

train_size = len(ds_train)
test_size = len(ds_test)

ds_train_ = ds_train.shuffle(train_size).batch(32).map(resize)
ds_test_ = ds_test.shuffle(test_size).batch(32).map(resize)
And the model.fit call:

history = model.fit(ds_train_, epochs=1, validation_data=ds_test_)
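To confirm the fix, you can pull a single batch from the corrected pipeline and check its shape (a small sketch, assuming the ds_train_ defined above):

for images, labels in ds_train_.take(1):
    print(images.shape, labels.shape)
# Expected: (32, 224, 224, 3) (32, 10)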
With this fixed, TensorFlow's speed is similar to PyTorch's. On my machine, PyTorch takes roughly 27 minutes per epoch and TensorFlow roughly 24 minutes per epoch.
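If either side still feels slow, standard input-pipeline tuning applies to both frameworks. As a hedged sketch (exact gains are machine-dependent, and this reuses the names defined above), you could parallelize and prefetch the tf.data pipeline and give the PyTorch DataLoader background workers:

# tf.data: parallel map + prefetch (assumes ds_train and resize from above)
ds_train_ = (ds_train.shuffle(len(ds_train))
                     .batch(32)
                     .map(resize, num_parallel_calls=tf.data.AUTOTUNE)
                     .prefetch(tf.data.AUTOTUNE))

# PyTorch: load batches in worker processes and pin host memory for faster H2D copies
train_loader = torch.utils.data.DataLoader(data['train'], batch_size=32,
                                           shuffle=True, num_workers=2,
                                           pin_memory=True)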
According to NVIDIA's benchmarks, PyTorch and TensorFlow show similar speed in most popular deep learning applications with realistic datasets and problem sizes. (Reference: https://developer.nvidia.com/deep-learning-performance-training-inference)