While working through the Fashion MNIST exercise on Udacity, I noticed that my implementation produces a much larger loss than the solution shared by the Udacity team. As far as I can tell, the only difference in my answer is how the neural network is defined; everything else is the same. I cannot figure out what causes this huge difference in loss.
Code 1: My solution:
import torch.nn as nn
from torch import optim

images, labels = next(iter(trainloader))

model = nn.Sequential(nn.Linear(784, 256),
                      nn.ReLU(),
                      nn.Linear(256, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))

# Flatten images
optimizer = optim.Adam(model.parameters(), lr=0.003)
criterion = nn.NLLLoss()

for i in range(10):
    running_loss = 0
    for images, labels in trainloader:
        images = images.view(images.shape[0], -1)
        output = model.forward(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        print(f"Training loss: {running_loss}")

# Loss is coming around 4000
Code 2: Official solution:
from torch import nn, optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        return x

model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

epochs = 5
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        log_ps = model(images)
        loss = criterion(log_ps, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        print(f"Training loss: {running_loss}")

# Loss is coming around 200
Is there an explanation for this huge difference in loss?
Answer:
You forgot to zero out/clear the gradients in your implementation. PyTorch accumulates gradients in each parameter's .grad attribute across backward() calls, so without clearing them every optimizer.step() applies the sum of all gradients computed so far. That is, you are missing:
optimizer.zero_grad()
In other words, simply do:
for i in range(10):
    running_loss = 0
    for images, labels in trainloader:
        images = images.view(images.shape[0], -1)
        output = model.forward(images)
        loss = criterion(output, labels)
        # This is the line you were missing!
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    else:
        print(f"Training loss: {running_loss}")
and you should be good to go!
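If it helps, here is a minimal sketch (using a tiny made-up parameter, not the Fashion MNIST model above) that demonstrates the accumulation behavior behind this fix:

import torch

# A single trainable parameter with a trivial loss: loss = 2 * w,
# so the gradient of the loss w.r.t. w is always 2.
w = torch.tensor([1.0], requires_grad=True)

for step in range(3):
    loss = (2 * w).sum()
    loss.backward()
    # Without w.grad.zero_() (or optimizer.zero_grad()), the stored
    # gradient keeps growing across iterations: 2.0, 4.0, 6.0, ...
    print(step, w.grad.item())

Because of this accumulation, each optimizer.step() in your loop was taken with an ever-growing gradient sum, which is why your running loss ends up around 4000 while the official solution's stays around 200.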