I'm building a ResNet-18 classification model for the Stanford Cars dataset using transfer learning. I would like to implement label smoothing to penalize overconfident predictions and improve generalization. TensorFlow has a simple keyword argument in CrossEntropyLoss. Has anyone built a similar function for PyTorch that I could plug in and use?
Answer:
The generalization ability and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels this way prevents the network from becoming over-confident, and label smoothing has been used in many state-of-the-art models, including image classification, language translation, and speech recognition.
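As a minimal sketch of that weighted average (the smoothing factor eps, class count, and target index below are illustrative assumptions, not values from this post): with smoothing factor eps and K classes, each hard one-hot target y becomes (1 - eps) * y + eps / K.

import torch
import torch.nn.functional as F

# Illustrative values: 5 classes, smoothing factor eps = 0.1, true class = 2
eps, K = 0.1, 5
hard = F.one_hot(torch.tensor([2]), num_classes=K).float()

# Soft target = weighted average of the hard target and the uniform distribution
soft = (1.0 - eps) * hard + eps / K
print(soft)  # tensor([[0.0200, 0.0200, 0.9200, 0.0200, 0.0200]])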
Label smoothing is already implemented in TensorFlow's cross-entropy loss functions, including BinaryCrossentropy and CategoricalCrossentropy. But currently, there is no official implementation of label smoothing in PyTorch. However, there is active discussion about it, and hopefully an official package will be provided soon. Here is the discussion thread: Issue #7455.
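For reference, a minimal sketch of the TensorFlow keyword argument mentioned above (the smoothing value and tensors are illustrative):

import tensorflow as tf

# CategoricalCrossentropy accepts a label_smoothing keyword argument
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)

y_true = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]    # one-hot targets
y_pred = [[0.05, 0.9, 0.05], [0.1, 0.2, 0.7]]  # predicted probabilities
print(loss_fn(y_true, y_pred).numpy())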
Here we will bring some of the best label smoothing (LS) implementations contributed by PyTorch practitioners. Basically, there are multiple ways to implement LS. Please refer to this specific discussion; one is here, and another is here. Below we provide implementations in two distinct ways, each with two versions; four in total.
Option 1: CrossEntropyLossWithProbs
This way, it accepts a one-hot target vector. The user must manually smooth their target vector, which can be done within a with torch.no_grad() scope, as it temporarily sets all of the requires_grad flags to false.
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss


class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight=None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)

        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)

        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))
Additionally, we've added an assertion check on self.smoothing and loss weighting support to this implementation.
Shital already posted the answer here. We point out that this implementation is similar to Devin Yang's implementation above. However, here we mention his code, with the coding syntax slightly minimized.
class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets: torch.Tensor, n_classes: int, smoothing=0.0):
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                                  .fill_(smoothing / (n_classes - 1)) \
                                  .scatter_(1, targets.data.unsqueeze(1), 1. - smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1
        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)

        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)

        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))
Check
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss

if __name__ == "__main__":
    # 1. Devin Yang
    crit = LabelSmoothingLoss(classes=5, smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # 2. Shital Shah
    crit = SmoothCrossEntropyLoss(smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.4178)
tensor(1.4178)
Option 2: LabelSmoothingCrossEntropyLoss
With this, it accepts the target vector, and the user does not have to smooth the target vector manually; instead, the built-in module takes care of the label smoothing. This allows us to implement label smoothing in terms of F.nll_loss.
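Before the implementations, a quick sanity sketch of that identity (the eps value and tensors below are illustrative assumptions): for soft targets (1 - eps) * one_hot + eps / K, the cross-entropy splits into (1 - eps) * F.nll_loss plus eps times the mean negative log-probability over classes, which is exactly what the modules below compute.

import torch
import torch.nn.functional as F

eps, K = 0.3, 5                    # illustrative smoothing factor and class count
logits = torch.randn(3, K)
target = torch.tensor([2, 1, 0])
log_p = F.log_softmax(logits, dim=-1)

# Direct: cross-entropy against explicitly smoothed soft targets
soft = torch.full_like(log_p, eps / K).scatter_(1, target.unsqueeze(1), 1 - eps + eps / K)
direct = (-soft * log_p).sum(dim=-1).mean()

# Decomposed: (1 - eps) * NLL + eps * uniform term
decomposed = (1 - eps) * F.nll_loss(log_p, target) + eps * (-log_p.mean(dim=-1)).mean()
print(torch.allclose(direct, decomposed))  # True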
(a). Wangleiofficial: Source – (AFAIK), original poster
(b). Datasaurus: Source – added weighting support
Further, we've slightly minimized the coding to make it more compact.
class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1, reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing = smoothing
        self.reduction = reduction
        self.weight = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1

        if self.weight is not None:
            self.weight = self.weight.to(preds.device)

        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)
class LabelSmoothing(nn.Module):
    """NLL loss with label smoothing.
    """
    def __init__(self, smoothing=0.0):
        """Constructor for the LabelSmoothing module.
        :param smoothing: label smoothing factor
        """
        super(LabelSmoothing, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing

    def forward(self, x, target):
        logprobs = torch.nn.functional.log_softmax(x, dim=-1)
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
        nll_loss = nll_loss.squeeze(1)
        smooth_loss = -logprobs.mean(dim=-1)
        loss = self.confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()
Check
if __name__ == "__main__":
    # Wangleiofficial
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # NVIDIA
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.3883)
tensor(1.3883)
Update: Officially added
torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=- 100, reduce=None, reduction='mean', label_smoothing=0.0)
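With this (available since PyTorch 1.10), label smoothing becomes a single keyword argument; a minimal usage sketch (the smoothing value and tensors are illustrative):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(3, 5)        # (batch, num_classes) raw scores
target = torch.tensor([2, 1, 0])  # hard class indices
loss = criterion(logits, target)
print(loss)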