I'm building a fully convolutional autoencoder that takes 3 channels as input and outputs 2 channels (input: LAB, output: AB). Since the output should be the same size as the input, I'm using fully convolutional layers.
Here is the code:
import torch.nn as nn


class AE(nn.Module):
    def __init__(self):
        super(AE, self).__init__()
        self.encoder = nn.Sequential(
            # conv 1
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 2
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 3
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 4
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 5
            nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            # conv 6
            nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            # conv 7
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            # conv 8
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            # conv 9
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            # conv 10 out
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=64, out_channels=2, kernel_size=5, stride=1, padding=1),
            nn.Softmax()  # multi-class classification
            # TODO softmax deprecated
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
Expected output tensor size: torch.Size([1, 2, 199, 253])
Actual output tensor size: torch.Size([1, 2, 190, 238])
My main problem is how to combine Conv2d with MaxPool2d, and how to set the correct parameter values in ConvTranspose2d. So I handle them separately: I use Upsample to undo the MaxPool2d layers and ConvTranspose2d only to undo the Conv2d layers. But there is still some asymmetry left, and I really don't know why.
Thanks for your help!
Answer:
There are two problems.
First, insufficient padding: with kernel_size=5, every unpadded convolution shrinks the image by 4 pixels (2 on each side), so you need padding=2 everywhere, not just 1.
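You can verify this directly from the formula out = floor((in + 2*padding - kernel_size) / stride) + 1. A minimal sketch with a dummy 32x32 input (the shapes here are illustrative, not from the original post):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)
# padding=1 only compensates 2 of the 4 lost pixels, so each conv shrinks H and W by 2
print(nn.Conv2d(3, 8, kernel_size=5, padding=1)(x).shape)  # torch.Size([1, 8, 30, 30])
# padding=2 fully compensates, so the spatial size is preserved
print(nn.Conv2d(3, 8, kernel_size=5, padding=2)(x).shape)  # torch.Size([1, 8, 32, 32])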
Second, the "uneven" input size. What I mean is that once your convolutions are properly padded, you're left with the downsampling operations, each of which tries to halve the image resolution at that point. When halving fails (the dimension is odd), they just return a smaller result (integer division discards the remainder). Since your network has 4 consecutive 2x downsampling operations, you need the input's H and W dimensions to be multiples of 2^4 = 16; then the output really will have the same shape as the input.
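To see where your asymmetry comes from, here is a minimal sketch that pools your actual 199x253 input once and upsamples it back; the odd pixel is lost in the integer division and never comes back:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 199, 253)
pooled = nn.MaxPool2d(kernel_size=2, stride=2)(x)  # floor(199/2)=99, floor(253/2)=126
print(pooled.shape)                                # torch.Size([1, 1, 99, 126])
restored = nn.Upsample(scale_factor=2)(pooled)     # 99*2=198, 126*2=252
print(restored.shape)                              # torch.Size([1, 1, 198, 252])

Here is a full example with the padding fixed and a 16-divisible input: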
import torch
import torch.nn as nn


class AE(nn.Module):
    def __init__(self):
        super(AE, self).__init__()
        self.encoder = nn.Sequential(
            # conv 1
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 2
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 3
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 4
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 5
            nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            # conv 6
            nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            # conv 7
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            # conv 8
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            # conv 9
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            # conv 10 out
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=64, out_channels=2, kernel_size=5, stride=1, padding=2),
            # multi-class classification over the channel dim; passing dim=1
            # explicitly also silences the deprecation warning from your TODO
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x


input = torch.randn(1, 3, 6*16, 7*16)
output = AE()(input)
print(input.shape)   # torch.Size([1, 3, 96, 112])
print(output.shape)  # torch.Size([1, 2, 96, 112])
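If you need to keep arbitrary input sizes such as 199x253, one option is to pad H and W up to the next multiple of 16 before the forward pass and crop the output back afterwards. A sketch using a hypothetical helper pad_to_multiple (not part of the model above) and the AE class just defined:

import torch
import torch.nn.functional as F

def pad_to_multiple(x, multiple=16):
    # Hypothetical helper: zero-pad the bottom/right of H and W
    # up to the next multiple, returning the original size for cropping.
    h, w = x.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    # F.pad pads the last two dims as (left, right, top, bottom)
    return F.pad(x, (0, pad_w, 0, pad_h)), (h, w)

model = AE()  # the class defined above
x = torch.randn(1, 3, 199, 253)
x_padded, (h, w) = pad_to_multiple(x)  # 199x253 -> 208x256
y = model(x_padded)[..., :h, :w]       # crop back to the original resolution
print(y.shape)                         # torch.Size([1, 2, 199, 253])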