I'm building a fully convolutional autoencoder that takes 3 channels as input and outputs 2 channels (input: LAB, output: AB). Since the output should be the same size as the input, I'm using fully convolutional layers.
Here is the code:
import torch.nn as nn


class AE(nn.Module):
    def __init__(self):
        super(AE, self).__init__()
        self.encoder = nn.Sequential(
            # conv 1
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 2
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 3
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 4
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 5
            nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            # conv 6
            nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            # conv 7
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            # conv 8
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            # conv 9
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            # conv 10 out
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=64, out_channels=2, kernel_size=5, stride=1, padding=1),
            nn.Softmax()  # multi-class classification
            # TODO softmax deprecated
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
Expected output tensor size: torch.Size([1, 2, 199, 253])
Actual output tensor size: torch.Size([1, 2, 190, 238])
My main problem is how to combine Conv2d with MaxPool2d, and how to set the correct parameter values in ConvTranspose2d. So I handle them separately: I use Upsample to undo the MaxPool2d layers and ConvTranspose2d only to undo the Conv2d layers. But there is still some asymmetry left, and I really don't know why.
Thanks for your help!
Answer:
There are two problems.
First, insufficient padding: with kernel_size=5, every unpadded convolution shrinks the image by 4 pixels (2 on each side), so you need padding=2 everywhere, not just 1.
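You can verify this directly from the formula out = floor((in + 2*padding - kernel_size) / stride) + 1. A minimal sketch with a dummy 32x32 input (the shapes here are illustrative, not from the original post):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)
# padding=1 only compensates 2 of the 4 lost pixels, so each conv shrinks H and W by 2
print(nn.Conv2d(3, 8, kernel_size=5, padding=1)(x).shape)  # torch.Size([1, 8, 30, 30])
# padding=2 fully compensates, so the spatial size is preserved
print(nn.Conv2d(3, 8, kernel_size=5, padding=2)(x).shape)  # torch.Size([1, 8, 32, 32])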
Second, the "uneven" input size. What I mean is that once your convolutions are properly padded, you're left with the downsampling operations, each of which tries to halve the image resolution at that point. When halving fails (the dimension is odd), they just return a smaller result (integer division discards the remainder). Since your network has 4 consecutive 2x downsampling operations, you need the input's H and W dimensions to be multiples of 2^4 = 16; then the output really will have the same shape as the input.
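To see where your asymmetry comes from, here is a minimal sketch that pools your actual 199x253 input once and upsamples it back; the odd pixel is lost in the integer division and never comes back:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 199, 253)
pooled = nn.MaxPool2d(kernel_size=2, stride=2)(x)  # floor(199/2)=99, floor(253/2)=126
print(pooled.shape)                                # torch.Size([1, 1, 99, 126])
restored = nn.Upsample(scale_factor=2)(pooled)     # 99*2=198, 126*2=252
print(restored.shape)                              # torch.Size([1, 1, 198, 252])

Here is a full example with the padding fixed and a 16-divisible input: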
import torch
import torch.nn as nn


class AE(nn.Module):
    def __init__(self):
        super(AE, self).__init__()
        self.encoder = nn.Sequential(
            # conv 1
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 2
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 3
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 4
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 5
            nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            # conv 6
            nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            # conv 7
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            # conv 8
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            # conv 9
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            # conv 10 out
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=64, out_channels=2, kernel_size=5, stride=1, padding=2),
            # multi-class classification over the channel dim; passing dim=1
            # explicitly also silences the deprecation warning from your TODO
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x


input = torch.randn(1, 3, 6*16, 7*16)
output = AE()(input)
print(input.shape)   # torch.Size([1, 3, 96, 112])
print(output.shape)  # torch.Size([1, 2, 96, 112])
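If you need to keep arbitrary input sizes such as 199x253, one option is to pad H and W up to the next multiple of 16 before the forward pass and crop the output back afterwards. A sketch using a hypothetical helper pad_to_multiple (not part of the model above) and the AE class just defined:

import torch
import torch.nn.functional as F

def pad_to_multiple(x, multiple=16):
    # Hypothetical helper: zero-pad the bottom/right of H and W
    # up to the next multiple, returning the original size for cropping.
    h, w = x.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    # F.pad pads the last two dims as (left, right, top, bottom)
    return F.pad(x, (0, pad_w, 0, pad_h)), (h, w)

model = AE()  # the class defined above
x = torch.randn(1, 3, 199, 253)
x_padded, (h, w) = pad_to_multiple(x)  # 199x253 -> 208x256
y = model(x_padded)[..., :h, :w]       # crop back to the original resolution
print(y.shape)                         # torch.Size([1, 2, 199, 253])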