Why is my fully convolutional autoencoder not symmetric?

I am developing a fully convolutional autoencoder that takes 3 channels as input and outputs 2 channels (input: LAB, output: AB). Because the output should be the same size as the input, I am using fully convolutional layers.

Here is the code:

import torch.nn as nn

class AE(nn.Module):
    def __init__(self):
        super(AE, self).__init__()
        self.encoder = nn.Sequential(
            # conv 1
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 2
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 3
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 4
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 5
            nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            # conv 6
            nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            # conv 7
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            # conv 8
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            # conv 9
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=5, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            # conv 10 out
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=64, out_channels=2, kernel_size=5, stride=1, padding=1),
            nn.Softmax()    # multi-class classification
            # TODO softmax deprecated
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

The output tensor should have size: torch.Size([1, 2, 199, 253])

The output tensor actually has size: torch.Size([1, 2, 190, 238])

My main problem is how to combine Conv2d with MaxPool2d, and how to set the correct parameter values in ConvTranspose2d. So I handled them separately: I use Upsample to undo the MaxPool2d layers, and ConvTranspose2d only to undo the Conv2d layers. But I still end up with some asymmetry, and I really don't know why.

Thanks for your help!


Answer:

There are two problems.

The first is insufficient padding: with kernel_size=5, each application of the convolution shrinks the image by 4 pixels (2 on each side), so you need padding=2 everywhere, not 1.
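To see this concretely (a minimal sketch of my own, not part of the original answer), the standard output-size formula out = floor((in + 2*padding - kernel_size) / stride) + 1 shows why padding=2 is the "same" padding for a 5x5 kernel:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)

# padding=1: out = 64 + 2*1 - 5 + 1 = 62, i.e. 2 pixels lost per dimension
print(nn.Conv2d(3, 8, kernel_size=5, stride=1, padding=1)(x).shape)  # [1, 8, 62, 62]

# padding=2 = (kernel_size - 1) / 2: out = 64 + 2*2 - 5 + 1 = 64, size preserved
print(nn.Conv2d(3, 8, kernel_size=5, stride=1, padding=2)(x).shape)  # [1, 8, 64, 64]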

The second is the "uneven" input size. What I mean is: once your convolutions are properly padded, you are left with the downsampling operations, each of which tries to halve the image resolution. When that fails (the size is odd), they simply return a smaller result, because integer division discards the remainder. Since your network has 4 consecutive 2x downsampling operations, the H and W dimensions of your input need to be multiples of 2^4 = 16. Only then will you actually get an output of the same shape. Here is an example:

import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self):
        super(AE, self).__init__()
        self.encoder = nn.Sequential(
            # conv 1
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 2
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 3
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 4
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # conv 5
            nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            # conv 6
            nn.ConvTranspose2d(in_channels=1024, out_channels=512, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            # conv 7
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            # conv 8
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=256, out_channels=128, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            # conv 9
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            # conv 10 out
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.ConvTranspose2d(in_channels=64, out_channels=2, kernel_size=5, stride=1, padding=2),
            nn.Softmax()    # multi-class classification
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

input = torch.randn(1, 3, 6*16, 7*16)
output = AE()(input)
print(input.shape)
print(output.shape)
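To connect this back to the shapes in the question, here is a sketch of my own (not part of the original answer) that traces the spatial size through the original padding=1 network: each conv loses 2 pixels, each pool floor-divides by 2, each transposed conv adds 2 pixels back, and each Upsample doubles, but the remainders dropped by the floor divisions are never recovered:

def trace(size):
    # encoder: conv 1-4 (each loses 2 pixels), each followed by 2x max-pooling
    for _ in range(4):
        size = (size - 2) // 2   # Conv2d(k=5, p=1) then MaxPool2d(2, 2)
    size -= 2                    # conv 5
    # decoder: conv 6, then 4x (bilinear upsample, transposed conv)
    size += 2                    # ConvTranspose2d(k=5, p=1) adds 2 pixels
    for _ in range(4):
        size = size * 2 + 2      # Upsample(scale_factor=2) then ConvTranspose2d
    return size

print(trace(199), trace(253))    # 190 238 -- exactly the observed output size

With padding=2 the +/-2 terms vanish, but the floor divisions remain; that is why H and W must be multiples of 2^4 = 16 for the output shape to match the input.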
