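I want to work with the Fashion_MNIST data, and I would like to see the output gradient of the mean squared sum between the first and second layers.
My code is as follows: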
# import the necessary libs
import numpy as np
import torch
import time

# Loading the Fashion-MNIST dataset
from torchvision import datasets, transforms

# Get GPU Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
torch.cuda.get_device_name(0)

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))
                                ])

# Download and load the training data
trainset = datasets.FashionMNIST('MNIST_data/', download = True, train = True, transform = transform)
testset = datasets.FashionMNIST('MNIST_data/', download = True, train = False, transform = transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 128, shuffle = True, num_workers=4)
testloader = torch.utils.data.DataLoader(testset, batch_size = 128, shuffle = True, num_workers=4)

# Examine a sample
dataiter = iter(trainloader)
images, labels = dataiter.next()

# Define the network architecture
from torch import nn, optim
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 10),
                      nn.LogSoftmax(dim = 1)
                      )
model.to(device)

# Define the loss
criterion = nn.MSELoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr = 0.001)

# Define the epochs
epochs = 5
train_losses, test_losses = [], []
squared_sum = []

# start = time.time()
for e in range(epochs):
    running_loss = 0

    for images, labels in trainloader:
        # Flatten Fashion-MNIST images into a 784 long vector
        images = images.to(device)
        labels = labels.to(device)
        images = images.view(images.shape[0], -1)

        optimizer.zero_grad()

        output = model[0].forward(images)
        loss = criterion(output[0], labels.float())
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    else:
        print(running_loss)
        test_loss = 0
        accuracy = 0

        # Turn off gradients for validation, saves memory and computation
        with torch.no_grad():
            # Set the model to evaluation mode
            model.eval()

            # Validation pass
            for images, labels in testloader:
                images = images.to(device)
                labels = labels.to(device)
                images = images.view(images.shape[0], -1)
                ps = model(images[0])
                test_loss += criterion(ps, labels)
                top_p, top_class = ps.topk(1, dim = 1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor))

        model.train()

        print("Epoch: {}/{}..".format(e+1, epochs),
              "Training loss: {:.3f}..".format(running_loss/len(trainloader)),
              "Test loss: {:.3f}..".format(test_loss/len(testloader)),
              "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))
The part I am interested in is this:
for e in range(epochs):
    running_loss = 0

    for images, labels in trainloader:
        # Flatten Fashion-MNIST images into a 784 long vector
        images = images.to(device)
        labels = labels.to(device)
        images = images.view(images.shape[0], -1)

        optimizer.zero_grad()

        output = model[0].forward(images)
        loss = criterion(output[0], labels.float())
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
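Here I use model[0] (which should be the first layer, nn.Linear(784, 128)), because I want to get only the mean squared error between the first and second layers.
When I run this code, I get the following error: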
RuntimeError: The size of tensor a (128) must match the size of tensor b (96) at non-singleton dimension 0
What do I need to do to run this code correctly and get the MSELoss?
Answer:
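The error comes from the interaction between the number of samples in the dataset and the batch size.
In more detail, the Fashion-MNIST training set contains 60,000 samples and your current batch_size is 128, so one epoch takes 60000/128 = 468.75, i.e. 469 iterations. That is where the problem lies: the first 468 iterations each receive 128 samples, but the last one contains only 60000 - 468*128 = 96 samples. Since output[0] always has 128 elements (the width of the first layer), it matches the 128 labels of a full batch only by coincidence and cannot be compared with the 96 labels of the last batch.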
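The batch arithmetic above can be checked with a few lines of plain Python. This is only a sketch of the calculation; the search range for alternative batch sizes (64 to 128) is an arbitrary assumption chosen for illustration.

```python
# Batch arithmetic for the Fashion-MNIST training set (60,000 samples).
num_samples = 60_000
batch_size = 128

# divmod gives the number of full batches and the size of the leftover batch.
full_batches, last_batch = divmod(num_samples, batch_size)
num_iterations = full_batches + (1 if last_batch else 0)

print(full_batches)    # 468
print(last_batch)      # 96
print(num_iterations)  # 469

# Batch sizes that leave no short final batch.
# The range 64..128 is a hypothetical search window, not a recommendation.
even_sizes = [b for b in range(64, 129) if num_samples % b == 0]
print(even_sizes)
```

Any value in even_sizes splits the 60,000 training samples into equally sized batches; 128 is not among them, which is why the final batch comes up short.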
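To fix this, I think you need to choose a batch_size that divides the dataset evenly and set the number of neurons in the first layer to the same value; 96 works because 60000/96 = 625 exactly.
I think the following should work for computing the loss: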
trainloader = torch.utils.data.DataLoader(trainset, batch_size = 96, shuffle = True, num_workers=0)
testloader = torch.utils.data.DataLoader(testset, batch_size = 96, shuffle = True, num_workers=0)

model = nn.Sequential(nn.Linear(784, 96),
                      nn.ReLU(),
                      nn.Linear(96, 10),
                      nn.LogSoftmax(dim = 1)
                      )
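If you would rather not tie the layer width to the batch size, two other options may be worth trying. The sketch below is not the code from the question: it uses 1,000 random samples as a stand-in for the flattened Fashion-MNIST images (an assumption, to avoid the download), and it compares the output of the full model with one-hot encoded labels, which is one possible way to make both arguments of MSELoss have the same shape. The first option passes drop_last=True to the DataLoader so the short final batch is skipped; the second keeps every batch and relies on the loss being independent of the batch size.

```python
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for flattened Fashion-MNIST (assumption: 1000 random samples).
fake_images = torch.randn(1000, 784)
fake_labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(fake_images, fake_labels)

# Option 1: drop the short final batch so every batch has exactly 128 samples.
loader = DataLoader(dataset, batch_size=128, shuffle=True, drop_last=True)
batch_sizes = [images.shape[0] for images, _ in loader]
print(batch_sizes)

# Option 2: make the loss shapes independent of the batch size.
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 10),
                      nn.LogSoftmax(dim=1))
criterion = nn.MSELoss()

# Without drop_last, the final batch holds 1000 - 7 * 128 = 104 samples.
full_loader = DataLoader(dataset, batch_size=128, shuffle=False)
losses = []
for images, labels in full_loader:
    output = model(images)                              # shape: (batch, 10)
    target = F.one_hot(labels, num_classes=10).float()  # shape: (batch, 10)
    loss = criterion(output, target)                    # shapes always agree
    losses.append(loss.item())
last_batch_size = images.shape[0]
print(len(losses), last_batch_size)
```

With drop_last=True a few samples are skipped in each epoch, which is usually acceptable when shuffle=True because different samples are dropped each time. Whether MSE against one-hot targets is the quantity you actually want is a separate question; the point here is only that both tensors passed to the criterion share the shape (batch, 10).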