I have long been a TensorFlow user and recently started using PyTorch. As a first experiment, I implemented a simple classification task with both libraries.
However, PyTorch turned out to be much slower than TensorFlow: PyTorch took 42 minutes, while TensorFlow finished in only 11 minutes. I followed the official PyTorch tutorial and made only minor modifications.
Can anyone share some advice on this?
Below is a summary of what I tried.
Environment: Colab Pro+
Dataset: CIFAR-10
Classifier: VGG16
Optimizer: Adam
Loss function: cross-entropy
Batch size: 32
PyTorch
Code:
import torch, torchvision
from torch import nn
from torchvision import transforms, models
from tqdm import tqdm
import time, copy

trans = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor(),])
data = {phase: torchvision.datasets.CIFAR10('./', train=(phase=='train'), transform=trans, download=True)
        for phase in ['train', 'test']}
dataloaders = {phase: torch.utils.data.DataLoader(data[phase], batch_size=32, shuffle=True)
               for phase in ['train', 'test']}

def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=5):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        # Each epoch has a training and validation phase
        for phase in ['train', 'test']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            for inputs, labels in tqdm(iter(dataloaders[phase])):
                inputs = inputs.to(device)
                labels = labels.to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss / len(dataloaders[phase])
            epoch_acc = running_corrects.double() / len(dataloaders[phase])
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'test' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.vgg16(pretrained=False)
model = model.to(device)
model = train_model(model=model,
                    criterion=nn.CrossEntropyLoss(),
                    optimizer=torch.optim.Adam(model.parameters(), lr=0.001),
                    dataloaders=dataloaders,
                    device=device,
                    )
Result:
Epoch 0/4
----------
  0%|          | 0/1563 [00:00<?, ?it/s]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
100%|██████████| 1563/1563 [07:50<00:00, 3.32it/s]
train Loss: 75.5199 Acc: 3.2809
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.7274 Acc: 3.1949
Epoch 1/4
----------
100%|██████████| 1563/1563 [07:50<00:00, 3.33it/s]
train Loss: 73.8162 Acc: 3.2514
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.6114 Acc: 3.1949
Epoch 2/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7741 Acc: 3.1369
100%|██████████| 313/313 [00:38<00:00, 8.11it/s]
test Loss: 73.5873 Acc: 3.1949
Epoch 3/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7493 Acc: 3.1331
100%|██████████| 313/313 [00:38<00:00, 8.12it/s]
test Loss: 73.6191 Acc: 3.1949
Epoch 4/4
----------
100%|██████████| 1563/1563 [07:49<00:00, 3.33it/s]
train Loss: 73.7289 Acc: 3.1939
100%|██████████| 313/313 [00:38<00:00, 8.13it/s]
test Loss: 73.5955 Acc: 3.1949
Training complete in 42m 22s
Best val Acc: 3.194888
TensorFlow
Code:
import tensorflow_datasets as tfds
from tensorflow.keras import applications, models
import tensorflow as tf
import time

ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])

def resize(ip):
    image = ip['image']
    label = ip['label']
    image = tf.image.resize(image, (224, 224))
    image = tf.expand_dims(image, 0)
    label = tf.one_hot(label, 10)
    label = tf.expand_dims(label, 0)
    return (image, label)

ds_train_ = ds_train.map(resize)
ds_test_ = ds_test.map(resize)

model = applications.vgg16.VGG16(input_shape=(224, 224, 3), weights=None, classes=10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

batch_size = 32
since = time.time()
history = model.fit(ds_train_,
                    batch_size=batch_size,
                    steps_per_epoch=len(ds_train) // batch_size,
                    epochs=5,
                    validation_steps=len(ds_test),
                    validation_data=ds_test_,
                    shuffle=True,
                    )

time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
Result:
Epoch 1/5
1562/1562 [==============================] - 125s 69ms/step - loss: 36.9022 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 2/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3031 - accuracy: 0.1005 - val_loss: 2.3033 - val_accuracy: 0.1000
Epoch 3/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3035 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 4/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3038 - accuracy: 0.1024 - val_loss: 2.3030 - val_accuracy: 0.1000
Epoch 5/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3028 - accuracy: 0.1024 - val_loss: 2.3033 - val_accuracy: 0.1000
Training complete in 11m 23s
Answer:
This is because, in your TensorFlow code, the data pipeline feeds the model a "batch" of a single image per step, not a batch of 32 images.
Passing batch_size to model.fit does not actually control the batch size when the input is a dataset. The number of steps per epoch in the log only looks right because you also passed steps_per_epoch to model.fit.
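One quick way to see this, as a minimal sketch that reuses the resize mapping from your code, is to inspect the element spec of the mapped dataset: each element carries a leading dimension of 1 added by tf.expand_dims, so model.fit treats every single image as its own batch.

import tensorflow as tf
import tensorflow_datasets as tfds

ds_train = tfds.load('cifar10', split='train')

def resize(ip):
    # same mapping as in the question: tf.expand_dims adds a leading dim of 1
    image = tf.expand_dims(tf.image.resize(ip['image'], (224, 224)), 0)
    label = tf.expand_dims(tf.one_hot(ip['label'], 10), 0)
    return (image, label)

print(ds_train.map(resize).element_spec)
# Roughly: (TensorSpec(shape=(1, 224, 224, 3), ...), TensorSpec(shape=(1, 10), ...))
# i.e. each "batch" seen by model.fit contains exactly one image.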
To set the batch size correctly:
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])

def resize(ip):
    image = ip['image']
    label = ip['label']
    image = tf.image.resize(image, (224, 224))
    label = tf.one_hot(label, 10)
    return (image, label)

train_size = len(ds_train)
test_size = len(ds_test)

ds_train_ = ds_train.shuffle(train_size).batch(32).map(resize)
ds_test_ = ds_test.shuffle(test_size).batch(32).map(resize)
And the model.fit call:

history = model.fit(ds_train_, epochs=1, validation_data=ds_test_)
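To confirm the fix, you can pull a single batch from the corrected pipeline and check its shape (a small sketch, assuming the ds_train_ defined above):

for images, labels in ds_train_.take(1):
    print(images.shape, labels.shape)
# Expected: (32, 224, 224, 3) (32, 10)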
With this fixed, TensorFlow's speed is similar to PyTorch's. On my machine, PyTorch takes roughly 27 minutes per epoch and TensorFlow roughly 24 minutes per epoch.
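If either side still feels slow, standard input-pipeline tuning applies to both frameworks. As a hedged sketch (exact gains are machine-dependent, and this reuses the names defined above), you could parallelize and prefetch the tf.data pipeline and give the PyTorch DataLoader background workers:

# tf.data: parallel map + prefetch (assumes ds_train and resize from above)
ds_train_ = (ds_train.shuffle(len(ds_train))
                     .batch(32)
                     .map(resize, num_parallel_calls=tf.data.AUTOTUNE)
                     .prefetch(tf.data.AUTOTUNE))

# PyTorch: load batches in worker processes and pin host memory for faster H2D copies
train_loader = torch.utils.data.DataLoader(data['train'], batch_size=32,
                                           shuffle=True, num_workers=2,
                                           pin_memory=True)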
According to NVIDIA's benchmarks, PyTorch and TensorFlow show similar speed in most popular deep learning applications with realistic datasets and problem sizes. (Reference: https://developer.nvidia.com/deep-learning-performance-training-inference)