I'm building a ResNet-18 classification model for the Stanford Cars dataset using transfer learning. I would like to implement label smoothing to penalize overconfident predictions and improve generalization. TensorFlow has a simple keyword argument in CrossEntropyLoss. Has anyone built a similar function for PyTorch that I could plug in and use?
Answer:
The generalization ability and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels this way prevents the network from becoming over-confident, and label smoothing has been used in many state-of-the-art models, including image classification, language translation, and speech recognition.
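As a minimal sketch of that weighted average (the smoothing factor eps, class count, and target index below are illustrative assumptions, not values from this post): with smoothing factor eps and K classes, each hard one-hot target y becomes (1 - eps) * y + eps / K.

import torch
import torch.nn.functional as F

# Illustrative values: 5 classes, smoothing factor eps = 0.1, true class = 2
eps, K = 0.1, 5
hard = F.one_hot(torch.tensor([2]), num_classes=K).float()

# Soft target = weighted average of the hard target and the uniform distribution
soft = (1.0 - eps) * hard + eps / K
print(soft)  # tensor([[0.0200, 0.0200, 0.9200, 0.0200, 0.0200]])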
Label smoothing is already implemented in TensorFlow's cross-entropy loss functions, including BinaryCrossentropy and CategoricalCrossentropy. But currently, there is no official implementation of label smoothing in PyTorch. However, there is active discussion about it, and hopefully an official package will be provided soon. Here is the discussion thread: Issue #7455.
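For reference, a minimal sketch of the TensorFlow keyword argument mentioned above (the smoothing value and tensors are illustrative):

import tensorflow as tf

# CategoricalCrossentropy accepts a label_smoothing keyword argument
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)

y_true = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]    # one-hot targets
y_pred = [[0.05, 0.9, 0.05], [0.1, 0.2, 0.7]]  # predicted probabilities
print(loss_fn(y_true, y_pred).numpy())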
Here we will bring some of the best label smoothing (LS) implementations contributed by PyTorch practitioners. Basically, there are multiple ways to implement LS. Please refer to this specific discussion; one is here, and another is here. Below we provide implementations in two distinct ways, each with two versions; four in total.
Option 1: CrossEntropyLossWithProbs
This way, it accepts a one-hot target vector. The user must manually smooth their target vector, which can be done within a with torch.no_grad() scope, as it temporarily sets all of the requires_grad flags to false.
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss


class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight=None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)

        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)

        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))
Additionally, we've added an assertion check on self.smoothing and loss weighting support to this implementation.
Shital already posted the answer here. We point out that this implementation is similar to Devin Yang's implementation above. However, here we mention his code, with the coding syntax slightly minimized.
class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets: torch.Tensor, n_classes: int, smoothing=0.0):
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                                  .fill_(smoothing / (n_classes - 1)) \
                                  .scatter_(1, targets.data.unsqueeze(1), 1. - smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1
        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)

        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)

        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))
Check
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss

if __name__ == "__main__":
    # 1. Devin Yang
    crit = LabelSmoothingLoss(classes=5, smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # 2. Shital Shah
    crit = SmoothCrossEntropyLoss(smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.4178)
tensor(1.4178)
Option 2: LabelSmoothingCrossEntropyLoss
With this, it accepts the target vector, and the user does not have to smooth the target vector manually; instead, the built-in module takes care of the label smoothing. This allows us to implement label smoothing in terms of F.nll_loss.
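Before the implementations, a quick sanity sketch of that identity (the eps value and tensors below are illustrative assumptions): for soft targets (1 - eps) * one_hot + eps / K, the cross-entropy splits into (1 - eps) * F.nll_loss plus eps times the mean negative log-probability over classes, which is exactly what the modules below compute.

import torch
import torch.nn.functional as F

eps, K = 0.3, 5                    # illustrative smoothing factor and class count
logits = torch.randn(3, K)
target = torch.tensor([2, 1, 0])
log_p = F.log_softmax(logits, dim=-1)

# Direct: cross-entropy against explicitly smoothed soft targets
soft = torch.full_like(log_p, eps / K).scatter_(1, target.unsqueeze(1), 1 - eps + eps / K)
direct = (-soft * log_p).sum(dim=-1).mean()

# Decomposed: (1 - eps) * NLL + eps * uniform term
decomposed = (1 - eps) * F.nll_loss(log_p, target) + eps * (-log_p.mean(dim=-1)).mean()
print(torch.allclose(direct, decomposed))  # True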
(a). Wangleiofficial: Source – (AFAIK), original poster
(b). Datasaurus: Source – added weighting support
Further, we've slightly minimized the coding to make it more compact.
class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1, reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing = smoothing
        self.reduction = reduction
        self.weight = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1

        if self.weight is not None:
            self.weight = self.weight.to(preds.device)

        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)
class LabelSmoothing(nn.Module):
    """NLL loss with label smoothing.
    """
    def __init__(self, smoothing=0.0):
        """Constructor for the LabelSmoothing module.
        :param smoothing: label smoothing factor
        """
        super(LabelSmoothing, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing

    def forward(self, x, target):
        logprobs = torch.nn.functional.log_softmax(x, dim=-1)
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
        nll_loss = nll_loss.squeeze(1)
        smooth_loss = -logprobs.mean(dim=-1)
        loss = self.confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()
Check
if __name__ == "__main__":
    # Wangleiofficial
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # NVIDIA
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1],
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.3883)
tensor(1.3883)
Update: Officially added
torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=- 100, reduce=None, reduction='mean', label_smoothing=0.0)
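With this (available since PyTorch 1.10), label smoothing becomes a single keyword argument; a minimal usage sketch (the smoothing value and tensors are illustrative):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(3, 5)        # (batch, num_classes) raw scores
target = torch.tensor([2, 1, 0])  # hard class indices
loss = criterion(logits, target)
print(loss)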