I am a student and a beginner in Python and PyTorch. I have a very basic neural network that runs into the above runtime error. Here is the code to reproduce the error:
```python
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np  # needed for np.arange / np.random.shuffle below

# Ensure reproducibility
torch.manual_seed(0)

# Data generation
x = torch.randn((100, 1), requires_grad=True)
y = 1 + 2 * x + 0.3 * torch.randn(100, 1)

# Shuffle the indices
idx = np.arange(100)
np.random.shuffle(idx)

# Use the first 70 random indices for training
train_idx = idx[:70]
# Use the remaining indices for validation
val_idx = idx[70:]

# Generate the train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]


class OurFirstNeuralNetwork(nn.Module):
    def __init__(self):
        super(OurFirstNeuralNetwork, self).__init__()
        # Here we "define" our neural network architecture
        self.fc1 = nn.Linear(1, 5)
        self.non_linearity_fc1 = nn.ReLU()
        self.fc2 = nn.Linear(5, 1)
        # self.non_linearity_fc2 = nn.ReLU()

    def forward(self, x):
        # The forward pass: here we define how activations "flow" between neurons.
        # We've already discussed the "sum" and "transformation" steps of the forward pass.
        sum_fc1 = self.fc1(x)
        transformation_fc1 = self.non_linearity_fc1(sum_fc1)
        sum_fc2 = self.fc2(transformation_fc1)
        # transformation_fc2 = self.non_linearity_fc2(sum_fc2)
        # sum_fc2 is also the output of our model, which marks the end of our forward pass.
        return sum_fc2


# Instantiate the model and train
model = OurFirstNeuralNetwork()
print(model)
print(model.state_dict())

n_epochs = 1000
loss_fn = nn.MSELoss(reduction='mean')
optimizer = optim.Adam(model.parameters())

for epoch in range(n_epochs):
    model.train()
    optimizer.zero_grad()
    prediction = model(x_train)
    loss = loss_fn(y_train, prediction)
    print(epoch, loss)
    loss.backward(retain_graph=True)
    optimizer.step()

print(model.state_dict())
```
Everything is pretty basic and standard, and it works fine.
However, when I remove the `retain_graph=True` argument, it throws the runtime error. From reading various forums, I understand that this happens because the graph is discarded after the first iteration, but I have seen many tutorials and blogs where plain `loss.backward()` is the recommended way, not least because it saves memory. I cannot conceptually understand why it does not work for me.
Any help would be appreciated, and I apologize if my question is not in the expected format. I am open to feedback and will add more details or rephrase the question to make it easier to understand. Thanks in advance!
Answer:
You need to add `optimizer.zero_grad()` after `optimizer.step()` to zero out the gradients.
Why is this needed?
When you call `loss.backward()`, PyTorch computes the gradients of the parameters and accumulates them into each parameter's `.grad` attribute. When you call `optimizer.step()`, the parameters are updated using their `.grad` attributes, i.e. `parameter = parameter - lr * parameter.grad`.
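As a minimal sketch (not the code from the question) of what these two steps do for plain SGD, using a hypothetical scalar parameter `w` and learning rate `lr`:

```python
import torch

# A single scalar parameter, purely to illustrate .grad and the update rule.
w = torch.tensor(1.0, requires_grad=True)
loss = (2 * w - 3) ** 2           # some scalar loss that depends on w

loss.backward()                   # fills w.grad with d(loss)/dw
print(w.grad)                     # tensor(-4.)

lr = 0.1                          # hypothetical learning rate
with torch.no_grad():
    w -= lr * w.grad              # roughly what optimizer.step() does for plain SGD
    w.grad.zero_()                # what optimizer.zero_grad() does for this parameter
```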
Since you did not clear the gradients and called `backward` a second time, it would compute dl/d(updated param), which requires backpropagating through the `parameter.grad` of the first pass. The computation graph for those gradients is not stored when doing backward, which is why you have to pass `retain_graph=True` to get rid of the error. However, that is not what we want for updating the parameters. Instead, we want to clear the gradients and start with a fresh computation graph each iteration, so you need to zero the gradients with a `.zero_grad()` call.
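For reference, here is a minimal sketch of the training loop with the gradients cleared every iteration and no `retain_graph`, reusing `model`, `loss_fn`, `optimizer` and the training data from the question (and assuming the inputs are created without `requires_grad`, so a fresh graph is built from scratch on every pass):

```python
for epoch in range(n_epochs):
    model.train()
    prediction = model(x_train)        # forward pass builds a new graph
    loss = loss_fn(prediction, y_train)
    loss.backward()                    # backward through this iteration's graph only
    optimizer.step()                   # update parameters from the fresh .grad values
    optimizer.zero_grad()              # clear gradients before the next iteration
    print(epoch, loss.item())
```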