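I'm trying to build a neural network on top of the embeddings output by a pretrained model. Specifically: I have saved the base model's logits to disk, where each example is an array of shape (512,) (originally corresponding to an image) with an associated label (0 or 1). This is what I'm doing so far:
Here are my model definition and training loop. For now it is just a single linear layer, only to make sure it works. However, when I run this script, the loss starts at about 0.4 rather than the roughly 0.7 that is standard for binary classification. Can anyone see what I'm doing wrong?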
    import torch.nn as nn
    import torch.optim as optim
    from transformers.modeling_outputs import SequenceClassifierOutput


    class ClassNet(nn.Module):
        def __init__(self, num_labels=2):
            super().__init__()
            self.num_labels = num_labels
            self.classifier = nn.Linear(512, num_labels) if num_labels > 0 else nn.Identity()

        def forward(self, inputs, labels=None):
            logits = self.classifier(inputs)
            # Compute the loss here only when labels are passed in
            loss = None
            if labels is not None:
                loss_fct = nn.CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            return SequenceClassifierOutput(loss=loss, logits=logits)


    model = ClassNet()
    optimizer = optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-3)  # L2 regularization
    loss_fct = nn.CrossEntropyLoss()

    # train_loader is defined elsewhere (not shown)
    for epoch in range(2):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(train_loader, 0):
            # get the inputs; data is a list of [inputs, labels]
            # data['embeddings'] -> torch.Size([1, 512])
            # data['labels'] -> torch.Size([1])
            inputs, labels = data['embeddings'], data['labels']

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = model(inputs)
            loss = loss_fct(outputs.logits.squeeze(1), labels.squeeze())
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1:  # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0
An example of printing outputs.logits.squeeze(1) and labels.squeeze():
    # outputs.logits.squeeze(1)
    tensor([[-0.2214,  0.2187],
            [ 0.3838, -0.3608],
            [ 0.9043, -0.9065],
            [-0.3324,  0.4836],
            [ 0.6775, -0.5908],
            [-0.8017,  0.9044],
            [ 0.6669, -0.6488],
            [ 0.4253, -0.5357],
            [-1.1670,  1.1966],
            [-0.0630, -0.1150],
            [ 0.6025, -0.4755],
            [ 1.8047, -1.7424],
            [-1.5618,  1.5331],
            [ 0.0802, -0.3321],
            [-0.2813,  0.1259],
            [ 1.3357, -1.2737]], grad_fn=<SqueezeBackward1>)

    # labels.squeeze()
    tensor([1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0])

    # loss
    tensor(0.4512, grad_fn=<NllLossBackward>)
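As a sanity check, the printed loss can be reproduced by hand from the logits and labels above. The following is only a minimal sketch in plain Python (no PyTorch required); the helper name `mean_cross_entropy` is made up for illustration, and it assumes the default mean reduction of nn.CrossEntropyLoss.

```python
import math

def mean_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch (hypothetical helper)."""
    total = 0.0
    for row, label in zip(logits, labels):
        # log-sum-exp with the max subtracted for numerical stability
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        # per-example loss is -log softmax(row)[label]
        total += log_z - row[label]
    return total / len(labels)

# Values copied from the printed tensors above
logits = [
    [-0.2214, 0.2187], [0.3838, -0.3608], [0.9043, -0.9065], [-0.3324, 0.4836],
    [0.6775, -0.5908], [-0.8017, 0.9044], [0.6669, -0.6488], [0.4253, -0.5357],
    [-1.1670, 1.1966], [-0.0630, -0.1150], [0.6025, -0.4755], [1.8047, -1.7424],
    [-1.5618, 1.5331], [0.0802, -0.3321], [-0.2813, 0.1259], [1.3357, -1.2737],
]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]

loss = mean_cross_entropy(logits, labels)
print(round(loss, 4))
```

The result should come out close to the printed 0.4512, which indicates that the loss value is at least consistent with the logits shown for that batch.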
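Answer:
You are only printing from the second iteration onward. The code above effectively prints at every step where i = 2000k + 1, but i starts at 0: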
    if i % 2000 == 1:  # print every 2000 mini-batches
        print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 2000))
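That is, one gradient descent step has already taken place by the time of the first print. That may be enough to bring the initial loss down from -log(1/2) ≈ 0.69 to the ~0.45 you are seeing.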
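To make the off-by-one concrete, here is a small sketch in plain Python (no model involved) that lists which iterations trigger the print under the original condition and under one possible alternative, `(i + 1) % 2000 == 0`. The alternative condition and the helper name `print_steps` are my own suggestions for illustration, not part of the original code. The sketch also evaluates the -log(1/2) baseline.

```python
import math

def print_steps(num_iters, condition):
    """Return the iteration indices i at which condition(i) is true (hypothetical helper)."""
    return [i for i in range(num_iters) if condition(i)]

# Original condition: fires at i = 2000k + 1
original = print_steps(6000, lambda i: i % 2000 == 1)

# Possible alternative: fires once 2000 losses have been accumulated
alternative = print_steps(6000, lambda i: (i + 1) % 2000 == 0)

# Loss of a two-class classifier that assigns probability 1/2 to each class
baseline = -math.log(1 / 2)

print(original)            # [1, 2001, 4001]
print(alternative)         # [1999, 3999, 5999]
print(round(baseline, 4))  # 0.6931
```

Note that with the original condition the first print happens at i = 1, when running_loss holds only two losses but is still divided by 2000. One way to see the actual starting loss would be to print loss.item() at i == 0, since that value is computed before the first optimizer.step().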