Classification with PyTorch is much slower than TensorFlow: 42 minutes vs. 11 minutes

I have long been a TensorFlow user and recently started using PyTorch. As an exercise, I implemented a simple classification task with both libraries.
However, PyTorch turned out to be much slower than TensorFlow: PyTorch took 42 minutes, while TensorFlow needed only 11. I based my code on the official PyTorch tutorial and made only minor modifications.

Can anyone offer any advice on this?

Here is a summary of what I tried.

Environment: Colab Pro+
Dataset: CIFAR-10
Classifier: VGG16
Optimizer: Adam
Loss function: cross-entropy
Batch size: 32

PyTorch
Code:

import torch, torchvision
from torch import nn
from torchvision import transforms, models
from tqdm import tqdm
import time, copy

trans = transforms.Compose([transforms.Resize((224, 224)),
                            transforms.ToTensor(),])

data = {phase: torchvision.datasets.CIFAR10('./', train=(phase == 'train'), transform=trans, download=True)
        for phase in ['train', 'test']}
dataloaders = {phase: torch.utils.data.DataLoader(data[phase], batch_size=32, shuffle=True)
               for phase in ['train', 'test']}

def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=5):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)
        # Each epoch has a training and validation phase
        for phase in ['train', 'test']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            for inputs, labels in tqdm(iter(dataloaders[phase])):
                inputs = inputs.to(device)
                labels = labels.to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            # note: these divide by the number of batches, not the number of
            # samples, which is why "Acc" in the log below can exceed 1
            epoch_loss = running_loss / len(dataloaders[phase])
            epoch_acc = running_corrects.double() / len(dataloaders[phase])
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'test' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = models.vgg16(pretrained=False)
model = model.to(device)
model = train_model(model=model,
                    criterion=nn.CrossEntropyLoss(),
                    optimizer=torch.optim.Adam(model.parameters(), lr=0.001),
                    dataloaders=dataloaders,
                    device=device,
                    )

Result:

Epoch 0/4
----------
  0%|          | 0/1563 [00:00<?, ?it/s]
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
100%|██████████| 1563/1563 [07:50<00:00,  3.32it/s]
train Loss: 75.5199 Acc: 3.2809
100%|██████████| 313/313 [00:38<00:00,  8.11it/s]
test Loss: 73.7274 Acc: 3.1949
Epoch 1/4
----------
100%|██████████| 1563/1563 [07:50<00:00,  3.33it/s]
train Loss: 73.8162 Acc: 3.2514
100%|██████████| 313/313 [00:38<00:00,  8.13it/s]
test Loss: 73.6114 Acc: 3.1949
Epoch 2/4
----------
100%|██████████| 1563/1563 [07:49<00:00,  3.33it/s]
train Loss: 73.7741 Acc: 3.1369
100%|██████████| 313/313 [00:38<00:00,  8.11it/s]
test Loss: 73.5873 Acc: 3.1949
Epoch 3/4
----------
100%|██████████| 1563/1563 [07:49<00:00,  3.33it/s]
train Loss: 73.7493 Acc: 3.1331
100%|██████████| 313/313 [00:38<00:00,  8.12it/s]
test Loss: 73.6191 Acc: 3.1949
Epoch 4/4
----------
100%|██████████| 1563/1563 [07:49<00:00,  3.33it/s]
train Loss: 73.7289 Acc: 3.1939
100%|██████████| 313/313 [00:38<00:00,  8.13it/s]
test Loss: 73.5955 Acc: 3.1949
Training complete in 42m 22s
Best val Acc: 3.194888

TensorFlow
Code:

import tensorflow_datasets as tfds
from tensorflow.keras import applications, models
import tensorflow as tf
import time

ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])

def resize(ip):
    image = ip['image']
    label = ip['label']
    image = tf.image.resize(image, (224, 224))
    image = tf.expand_dims(image, 0)
    label = tf.one_hot(label, 10)
    label = tf.expand_dims(label, 0)
    return (image, label)

ds_train_ = ds_train.map(resize)
ds_test_ = ds_test.map(resize)

model = applications.vgg16.VGG16(input_shape=(224, 224, 3), weights=None, classes=10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

batch_size = 32
since = time.time()
history = model.fit(ds_train_,
                    batch_size=batch_size,
                    steps_per_epoch=len(ds_train) // batch_size,
                    epochs=5,
                    validation_steps=len(ds_test),
                    validation_data=ds_test_,
                    shuffle=True,)
time_elapsed = time.time() - since
print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))

Result:

Epoch 1/5
1562/1562 [==============================] - 125s 69ms/step - loss: 36.9022 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 2/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3031 - accuracy: 0.1005 - val_loss: 2.3033 - val_accuracy: 0.1000
Epoch 3/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3035 - accuracy: 0.1069 - val_loss: 2.3031 - val_accuracy: 0.1000
Epoch 4/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3038 - accuracy: 0.1024 - val_loss: 2.3030 - val_accuracy: 0.1000
Epoch 5/5
1562/1562 [==============================] - 129s 83ms/step - loss: 2.3028 - accuracy: 0.1024 - val_loss: 2.3033 - val_accuracy: 0.1000
Training complete in 11m 23s

Answer:

This is because the data pipeline in your TensorFlow code feeds the model a single image per step, not a batch of 32 images.

The batch_size passed to model.fit does not actually control the batch size when the data comes in the form of a dataset; the dataset itself must be batched. The steps-per-epoch count in your log only looked right because you also passed steps_per_epoch to model.fit.
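
One quick way to see this (a check of my own, not from the original post) is to print each pipeline's element_spec: because of the tf.expand_dims calls, your pipeline emits elements of shape (1, 224, 224, 3), and Keras treats every dataset element as one batch.

print(ds_train_.element_spec)
# your pipeline:    (TensorSpec(shape=(1, 224, 224, 3), dtype=tf.float32, name=None),
#                    TensorSpec(shape=(1, 10), dtype=tf.float32, name=None))
# after .batch(32): (TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None),
#                    TensorSpec(shape=(None, 10), dtype=tf.float32, name=None))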

To set the batch size correctly:

ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'])

def resize(ip):
    image = ip['image']
    label = ip['label']
    image = tf.image.resize(image, (224, 224))
    label = tf.one_hot(label, 10)
    return (image, label)

train_size = len(ds_train)
test_size = len(ds_test)
ds_train_ = ds_train.shuffle(train_size).batch(32).map(resize)
ds_test_ = ds_test.shuffle(test_size).batch(32).map(resize)

And the model.fit call:

history = model.fit(ds_train_,
                    epochs=1,
                    validation_data=ds_test_)
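
A note on the design: batching before mapping lets tf.image.resize operate on whole batches of 32 images at a time instead of one image per call. If you want to squeeze a little more out of the input pipeline, you could also add prefetching (my own addition, not part of the original answer; tf.data.AUTOTUNE needs TF 2.4+):

# Optional sketch: overlap input preprocessing with training.
# AUTOTUNE lets tf.data choose the prefetch buffer size dynamically.
ds_train_ = ds_train.shuffle(train_size).batch(32).map(resize).prefetch(tf.data.AUTOTUNE)
ds_test_ = ds_test.shuffle(test_size).batch(32).map(resize).prefetch(tf.data.AUTOTUNE)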

After fixing this, TensorFlow's speed is similar to PyTorch's. On my machine, PyTorch took about 27 minutes per epoch, while TensorFlow took about 24 minutes per epoch.
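
If the remaining gap matters, a common PyTorch-side tweak (my suggestion, not something measured in this answer) is to move image loading into DataLoader worker processes and pin host memory for faster host-to-GPU copies:

# Hedged sketch: parallel data loading for the PyTorch pipeline above.
# num_workers=2 is a guess for Colab; tune it for your machine.
dataloaders = {phase: torch.utils.data.DataLoader(data[phase], batch_size=32,
                                                  shuffle=True, num_workers=2,
                                                  pin_memory=True)
               for phase in ['train', 'test']}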

According to NVIDIA's benchmarks, PyTorch and TensorFlow show similar speed in most popular deep learning applications with realistic datasets and problem sizes. (Reference: https://developer.nvidia.com/deep-learning-performance-training-inference)
