I am a student and a beginner in Python and PyTorch. I have a very basic neural network that runs into the above runtime error. Here is the code to reproduce the error:
```python
import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
import numpy as np  # needed for np.arange / np.random.shuffle below

# Ensure reproducibility
torch.manual_seed(0)

# Data generation
x = torch.randn((100, 1), requires_grad=True)
y = 1 + 2 * x + 0.3 * torch.randn(100, 1)

# Shuffle the indices
idx = np.arange(100)
np.random.shuffle(idx)

# Use the first 70 random indices for training
train_idx = idx[:70]
# Use the remaining indices for validation
val_idx = idx[70:]

# Generate the train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]


class OurFirstNeuralNetwork(nn.Module):
    def __init__(self):
        super(OurFirstNeuralNetwork, self).__init__()
        # Here we "define" our neural network architecture
        self.fc1 = nn.Linear(1, 5)
        self.non_linearity_fc1 = nn.ReLU()
        self.fc2 = nn.Linear(5, 1)
        # self.non_linearity_fc2 = nn.ReLU()

    def forward(self, x):
        # The forward pass: here we define how activations "flow" between neurons.
        # We've already discussed the "sum" and "transformation" steps of the forward pass.
        sum_fc1 = self.fc1(x)
        transformation_fc1 = self.non_linearity_fc1(sum_fc1)
        sum_fc2 = self.fc2(transformation_fc1)
        # transformation_fc2 = self.non_linearity_fc2(sum_fc2)
        # sum_fc2 is also the output of our model, which marks the end of our forward pass.
        return sum_fc2


# Instantiate the model and train
model = OurFirstNeuralNetwork()
print(model)
print(model.state_dict())

n_epochs = 1000
loss_fn = nn.MSELoss(reduction='mean')
optimizer = optim.Adam(model.parameters())

for epoch in range(n_epochs):
    model.train()
    optimizer.zero_grad()
    prediction = model(x_train)
    loss = loss_fn(y_train, prediction)
    print(epoch, loss)
    loss.backward(retain_graph=True)
    optimizer.step()

print(model.state_dict())
```
Everything is pretty basic and standard, and it works fine.
However, when I remove the `retain_graph=True` argument, it throws the runtime error. From reading various forums, I understand that this happens because the graph is discarded after the first iteration, but I have seen many tutorials and blogs where plain `loss.backward()` is the recommended way, not least because it saves memory. I cannot conceptually understand why it does not work for me.
Any help would be appreciated, and I apologize if my question is not in the expected format. I am open to feedback and will add more details or rephrase the question to make it easier to understand. Thanks in advance!
Answer:
You need to add `optimizer.zero_grad()` after `optimizer.step()` to zero out the gradients.
Why is this needed?
When you call `loss.backward()`, PyTorch computes the gradients of the parameters and accumulates them into each parameter's `.grad` attribute. When you call `optimizer.step()`, the parameters are updated using their `.grad` attributes, i.e. `parameter = parameter - lr * parameter.grad`.
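As a minimal sketch (not the code from the question) of what these two steps do for plain SGD, using a hypothetical scalar parameter `w` and learning rate `lr`:

```python
import torch

# A single scalar parameter, purely to illustrate .grad and the update rule.
w = torch.tensor(1.0, requires_grad=True)
loss = (2 * w - 3) ** 2           # some scalar loss that depends on w

loss.backward()                   # fills w.grad with d(loss)/dw
print(w.grad)                     # tensor(-4.)

lr = 0.1                          # hypothetical learning rate
with torch.no_grad():
    w -= lr * w.grad              # roughly what optimizer.step() does for plain SGD
    w.grad.zero_()                # what optimizer.zero_grad() does for this parameter
```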
Since you did not clear the gradients and called `backward` a second time, it would compute dl/d(updated param), which requires backpropagating through the `parameter.grad` of the first pass. The computation graph for those gradients is not stored when doing backward, which is why you have to pass `retain_graph=True` to get rid of the error. However, that is not what we want for updating the parameters. Instead, we want to clear the gradients and start with a fresh computation graph each iteration, so you need to zero the gradients with a `.zero_grad()` call.
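For reference, here is a minimal sketch of the training loop with the gradients cleared every iteration and no `retain_graph`, reusing `model`, `loss_fn`, `optimizer` and the training data from the question (and assuming the inputs are created without `requires_grad`, so a fresh graph is built from scratch on every pass):

```python
for epoch in range(n_epochs):
    model.train()
    prediction = model(x_train)        # forward pass builds a new graph
    loss = loss_fn(prediction, y_train)
    loss.backward()                    # backward through this iteration's graph only
    optimizer.step()                   # update parameters from the fresh .grad values
    optimizer.zero_grad()              # clear gradients before the next iteration
    print(epoch, loss.item())
```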