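Stochastic depth was first introduced in this paper. In short, it is similar to dropout, but instead of dropping individual nodes it randomly drops entire residual blocks (the skip-connection structures described in the ResNet paper) during training, so that only their shortcut connection remains.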
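For reference, here is a sketch of the stochastic depth paper's rule for a single residual block. During training, the residual branch f is kept with a survival probability and otherwise dropped, which leaves only the identity shortcut. At test time, f is always used but scaled by that probability. This is a minimal sketch, not the paper's code. The class name ResidualBlockWithStochasticDepth and the parameter survival_prob are illustrative.

```python
import torch
import torch.nn as nn


class ResidualBlockWithStochasticDepth(nn.Module):
    """Hypothetical block: relu(b * f(x) + x) with b ~ Bernoulli(survival_prob)."""

    def __init__(self, branch: nn.Module, survival_prob: float = 0.8):
        super().__init__()
        self.branch = branch  # residual branch f; must keep the input shape
        self.survival_prob = survival_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # One Bernoulli draw per forward pass decides whether f is used
            if torch.rand(()).item() < self.survival_prob:
                return torch.relu(self.branch(x) + x)
            return torch.relu(x)  # branch dropped: only the identity shortcut remains
        # Test time: always use f, scaled by its survival probability
        return torch.relu(self.survival_prob * self.branch(x) + x)


torch.manual_seed(0)
block = ResidualBlockWithStochasticDepth(nn.Linear(4, 4), survival_prob=0.8)
x = torch.randn(2, 4)

block.eval()
print(block(x).shape)  # torch.Size([2, 4])
```

The wrapper discussed in the answer below is coarser: it replaces a whole block (or a whole group of blocks) with the identity, rather than dropping only the residual branch.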

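My question is: is there a quick and simple way to add stochastic depth to a PyTorch model for transfer learning, the way dropout can be added by simply putting torch.nn.Dropout(p) in the classifier block?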

Answer:

Basic stochastic depth

Yes, something like this is easy to implement:

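import torch
class StochasticDepth(torch.nn.Module):
    def __init__(self, module: torch.nn.Module, p: float = 0.5):
        super().__init__()
        if not 0 < p < 1:
            raise ValueError(
                "Stochastic Depth p has to be between 0 and 1 but got {}".format(p)
            )
        self.module: torch.nn.Module = module
        self.p: float = p  # probability of skipping the wrapped block during training
    def forward(self, inputs):
        # Training: skip the block (return the input unchanged) with probability p.
        # The sample must be compared with p; a bare uniform sample is almost always truthy.
        if self.training and torch.rand(()).item() < self.p:
            return inputs
        # Otherwise, and always in eval mode, run the wrapped block
        return self.module(inputs)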
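The skip decision is meant to be a Bernoulli draw with probability p. A uniform sample on its own is truthy unless it is exactly 0.0, so it has to be compared with p. This sketch checks the empirical skip rate, assuming p is the probability of skipping. The helper should_skip is hypothetical.

```python
import torch


def should_skip(p: float) -> bool:
    """Bernoulli draw: True (skip the wrapped block) with probability p."""
    return torch.rand(()).item() < p


torch.manual_seed(0)

# Correct test: compare the uniform sample with p
trials = 10_000
rate = sum(should_skip(0.3) for _ in range(trials)) / trials
print(f"p=0.3 -> observed skip rate {rate:.3f}")

# A bare sample, as in `if self._sampler.uniform_():`, is truthy unless it is exactly 0.0
bare_truthy = sum(bool(torch.empty(1).uniform_()) for _ in range(1000))
print(f"bare sample truthy in {bare_truthy} / 1000 draws")
```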

Note the following:

- inputs must have the same shape as self.module(inputs)
- you can pass any block to this wrapper (see below)
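Because a skipped block is replaced by its input, the wrapped block must map inputs to outputs of the same shape. Here is a minimal check to run once on a sample batch before wrapping. The helper preserves_shape is hypothetical.

```python
import torch
import torch.nn as nn


def preserves_shape(module: nn.Module, sample: torch.Tensor) -> bool:
    """True if module(sample) has the same shape as sample, so the identity skip fits."""
    was_training = module.training
    module.eval()  # avoid updating BatchNorm running stats during the check
    with torch.no_grad():
        out = module(sample)
    module.train(was_training)
    return out.shape == sample.shape


same = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
down = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
x = torch.randn(1, 64, 56, 56)
print(preserves_shape(same, x))  # True
print(preserves_shape(down, x))  # False: channels 64 -> 128, spatial 56 -> 28
```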

Usage example:

layer = StochasticDepth(
    torch.nn.Sequential(
        torch.nn.Linear(10, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 10),
        torch.nn.ReLU(),
    ),
    p=0.5,
)
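This self-contained sketch runs the usage example above in both modes. In training mode, each call either returns the input unchanged or runs the block. In evaluation mode, the block always runs. The wrapper is redefined here under the hypothetical name StochasticDepthSketch, with the skip test written as a comparison against p and without any output rescaling.

```python
import torch
import torch.nn as nn


class StochasticDepthSketch(nn.Module):
    def __init__(self, module: nn.Module, p: float = 0.5):
        super().__init__()
        if not 0 < p < 1:
            raise ValueError(f"p must be in (0, 1), got {p}")
        self.module = module
        self.p = p  # probability of skipping the block in training mode

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()).item() < self.p:
            return inputs  # skipped: identity
        return self.module(inputs)


torch.manual_seed(0)
layer = StochasticDepthSketch(
    nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 10), nn.ReLU()),
    p=0.5,
)
x = torch.randn(4, 10)

layer.train()
outputs = [layer(x) for _ in range(10)]
skipped = sum(torch.equal(out, x) for out in outputs)
print(f"skipped {skipped} of 10 training calls")

layer.eval()
print(torch.equal(layer(x), layer.module(x)))  # True: evaluation always runs the block
```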

Adding it to an existing model

First, print the model you want to modify and analyze its weights and outputs.
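Besides print(model), you can record each child module's input and output shapes with forward hooks, which shows which blocks preserve the shape. Here is a sketch on a small stand-in model. The toy model and the helper record_shapes are hypothetical; for a ResNet you would pass an image-sized batch such as torch.randn(1, 3, 224, 224).

```python
import torch
import torch.nn as nn


def record_shapes(model: nn.Module, sample: torch.Tensor) -> dict:
    """Map each direct child's name to (input_shape, output_shape) for one forward pass."""
    shapes = {}
    handles = []
    for name, child in model.named_children():
        # The default argument binds the current name inside the loop
        def hook(mod, inputs, output, name=name):
            shapes[name] = (tuple(inputs[0].shape), tuple(output.shape))

        handles.append(child.register_forward_hook(hook))
    was_training = model.training
    model.eval()  # do not update BatchNorm statistics while inspecting
    with torch.no_grad():
        model(sample)
    model.train(was_training)
    for handle in handles:
        handle.remove()  # leave the model without hooks afterwards
    return shapes


model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.Conv2d(16, 16, 3, padding=1),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),
)
shapes = record_shapes(model, torch.randn(1, 3, 32, 32))
for name, (inp, out) in shapes.items():
    print(name, inp, "->", out)
```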

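The easiest way to apply this module (for Conv{1,2,3}d layers) is to look for blocks where:

- the number of in_channels and out_channels within the block is the same
- if in_channels and out_channels differ, some form of projection is needed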
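For a block whose first layer is a Conv2d, you can check the conditions above from that layer's attributes. A 1x1 convolution with a matching stride is one simple projection. In this sketch, make_projection is a hypothetical helper. It assumes the block's only shape changes come from the first convolution's channel count and stride, which holds for padded convolutions like those in ResNet.

```python
from typing import Optional

import torch
import torch.nn as nn


def make_projection(first_conv: nn.Conv2d) -> Optional[nn.Module]:
    """Hypothetical helper: None if the identity skip fits, else a 1x1 conv projection."""
    if first_conv.in_channels == first_conv.out_channels and first_conv.stride == (1, 1):
        return None  # same channels and no downsampling: identity is enough
    # 1x1 conv that matches the block's channel change and stride
    return nn.Conv2d(
        first_conv.in_channels,
        first_conv.out_channels,
        kernel_size=1,
        stride=first_conv.stride,
        bias=False,
    )


same = nn.Conv2d(64, 64, kernel_size=3, padding=1)
down = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
print(make_projection(same))  # None
print(make_projection(down))
```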

Stochastic depth with projection

A version of StochasticDepth with a projection:

from typing import Optional
import torch
class StochasticDepth(torch.nn.Module):
    def __init__(
        self,
        module: torch.nn.Module,
        p: float = 0.5,
        projection: Optional[torch.nn.Module] = None,
    ):
        super().__init__()
        if not 0 < p < 1:
            raise ValueError(
                "Stochastic Depth p has to be between 0 and 1 but got {}".format(p)
            )
        self.module: torch.nn.Module = module
        self.p: float = p  # probability of skipping the wrapped block during training
        # Used only when the block is skipped, to match the block's output shape
        self.projection: Optional[torch.nn.Module] = projection
    def forward(self, inputs):
        if self.training and torch.rand(()).item() < self.p:
            # Skipped: projection if the block changes the shape, identity otherwise
            if self.projection is not None:
                return self.projection(inputs)
            return inputs
        return self.module(inputs)

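In a ResNet block, the projection could be, for example, Conv2d(256, 512, kernel_size=1, stride=2). It increases the number of channels and, with stride=2, halves the spatial size, as described in the original paper.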
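Such a projection lines up with the block's output because of the standard Conv2d output-size formula (dilation 1): out = floor((in + 2·padding − kernel_size) / stride) + 1. The same formula shows that a 1x1, stride-2 convolution and a 3x3, stride-2, padding-1 convolution give the same spatial size. Here is a quick check, using a 14x14 map as an example input (what ResNet-18's layer4 receives for a 224x224 image). The helper conv_out_size is hypothetical.

```python
import torch
import torch.nn as nn


def conv_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    # Conv2d output size with dilation 1: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1


print(conv_out_size(14, kernel=1, stride=2))             # 7
print(conv_out_size(14, kernel=3, stride=2, padding=1))  # 7

x = torch.randn(1, 256, 14, 14)
projection = nn.Conv2d(256, 512, kernel_size=1, stride=2)
print(projection(x).shape)  # torch.Size([1, 512, 7, 7])
```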

Applying stochastic depth

If you print torchvision.models.resnet18(), you will see repeated blocks like this:

(layer2): Sequential(
  (0): BasicBlock(
    (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (downsample): Sequential(
      (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (1): BasicBlock(
    (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)

Each layer is one of the larger (colored) blocks that you may want to skip at random. For resnet18, wrapping each layer looks like this:

import torch
import torchvision
# StochasticDepth here is the version with projection defined above
model = torchvision.models.resnet18()
model.layer1 = StochasticDepth(model.layer1)
model.layer2 = StochasticDepth(
    model.layer2,
    projection=torch.nn.Conv2d(
        64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
    ),
)
model.layer3 = StochasticDepth(
    model.layer3,
    projection=torch.nn.Conv2d(
        128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
    ),
)
model.layer4 = StochasticDepth(
    model.layer4,
    projection=torch.nn.Conv2d(
        256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
    ),
)

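- The first layer keeps the number of channels unchanged, so no projection is needed
- The second, third and fourth layers increase the number of channels and shrink the feature map through their stride, so a simple projection is used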

You can modify any part of a neural network this way; just remember to check that the shapes match.
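One way to check the projections without building the full network is to push a dummy tensor through each one. Compare the result against the stage output shapes ResNet-18 produces for a 224x224 input: layer2 gives 128x28x28, layer3 gives 256x14x14, and layer4 gives 512x7x7. This is a sketch; with the real model you could instead compare against model.layerN(x).shape directly.

```python
import torch
import torch.nn as nn

# (input shape of the stage, expected output shape) for ResNet-18 with a 224x224 image
stages = {
    "layer2": ((1, 64, 56, 56), (1, 128, 28, 28)),
    "layer3": ((1, 128, 28, 28), (1, 256, 14, 14)),
    "layer4": ((1, 256, 14, 14), (1, 512, 7, 7)),
}

for name, (in_shape, expected) in stages.items():
    in_ch, out_ch = in_shape[1], expected[1]
    # Same projection as in the answer: 3x3 conv, stride 2, padding 1, no bias
    projection = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1, bias=False)
    with torch.no_grad():
        out = projection(torch.randn(in_shape))
    assert tuple(out.shape) == expected, (name, tuple(out.shape), expected)
    print(name, "ok", tuple(out.shape))
```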

Simpler projection

You can also tie the weights to the first convolutional layer of the given block and use that module as the projection, like this:

import torchvision
model = torchvision.models.resnet18()
model.layer1 = StochasticDepth(model.layer1)
# Reuse the first conv of each layer's first block as the projection (shared weights)
model.layer2 = StochasticDepth(model.layer2, projection=model.layer2[0].conv1)
model.layer3 = StochasticDepth(model.layer3, projection=model.layer3[0].conv1)
model.layer4 = StochasticDepth(model.layer4, projection=model.layer4[0].conv1)
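Passing model.layer2[0].conv1 as the projection registers the same module object twice, so both paths update a single set of weights. A small stand-in shows that PyTorch counts the shared parameter only once. The Wrapper class here is hypothetical.

```python
import torch.nn as nn


class Wrapper(nn.Module):
    def __init__(self, block: nn.Module, projection: nn.Module):
        super().__init__()
        self.module = block
        self.projection = projection


block = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1, bias=False), nn.ReLU())
wrapper = Wrapper(block, projection=block[0])  # tie the projection to the first conv

print(wrapper.projection.weight is wrapper.module[0].weight)  # True: one tensor, two paths
print(len(list(wrapper.parameters())))  # 1: parameters() skips duplicates
```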

Pros:

- the weights are not randomly initialized
- easier to write

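Cons:

- the weights are tied, so one layer has to perform two tasks:
  - being the first layer in the block (not dropped)
  - being the only layer in the block (dropped)
- this may not give very good results, since the layer is responsible for conflicting tasks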

You could also copy this module instead of sharing its weights, which is probably the best approach.
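Copying the layer keeps its initial (for example pretrained) values without tying the weights. Here is a sketch using copy.deepcopy on a stand-in convolution; in the ResNet case this would be copy.deepcopy(model.layer2[0].conv1).

```python
import copy

import torch
import torch.nn as nn

conv1 = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)
projection = copy.deepcopy(conv1)  # same initial values, separate parameters

same_at_start = torch.equal(projection.weight, conv1.weight)
print(same_at_start)  # True: starts from the same weights
print(projection.weight is conv1.weight)  # False: separate tensors

with torch.no_grad():
    projection.weight.add_(1.0)  # simulate a training update on the copy only
print(torch.equal(projection.weight, conv1.weight))  # False: the original is unaffected
```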
