How can EfficientNets be used for transfer learning on grayscale images?

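My question is mostly about how the algorithm works. I have successfully integrated and trained EfficientNet on grayscale images, and now I want to understand why it works.

The key point here is grayscale and its single channel. When I set channels=1, the algorithm does not work because, if I understand correctly, it was designed for 3-channel images. When I set channels=3, it works perfectly.

So my question is: when I set channels=3 and feed the model preprocessed images with channels=1, why does it still work?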

Code for EfficientNetB5

import tensorflow as tf
from tensorflow.keras.layers import Flatten, Input
from tensorflow.keras.models import Sequential

# Variable assignment
num_classes = 9
img_height = 84
img_width = 112
channels = 3
batch_size = 32

# Create the input layer
new_input = Input(shape=(img_height, img_width, channels),
                  name='image_input')

# Download and use EfficientNetB5
tmp = tf.keras.applications.EfficientNetB5(include_top=False,
                                           weights='imagenet',
                                           input_tensor=new_input,
                                           pooling='max')

model = Sequential()
model.add(tmp)  # add EfficientNetB5
model.add(Flatten())
...

Code for the grayscale preprocessing

from tensorflow.keras.preprocessing.image import ImageDataGenerator

data_generator = ImageDataGenerator(
        validation_split=0.2)

train_generator = data_generator.flow_from_directory(
        train_path,
        target_size=(img_height, img_width),
        batch_size=batch_size,
        color_mode="grayscale",  # images are loaded with a single channel
        class_mode="categorical",
        subset="training")

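Answer:

I looked into what happens when you feed grayscale images to an EfficientNet model that has a three-channel input. These are the first layers of EfficientNet B5, built with an input shape of (128, 128, 3):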

Layer (type)                      Output Shape           Param #  Connected to
=====================================================================================================
input_7 (InputLayer)              [(None, 128, 128, 3)]  0        []
rescaling_7 (Rescaling)           (None, 128, 128, 3)    0        ['input_7[0][0]']
normalization_13 (Normalization)  (None, 128, 128, 3)    7        ['rescaling_7[0][0]']
tf.math.truediv_4 (TFOpLambda)    (None, 128, 128, 3)    0        ['normalization_13[0][0]']
stem_conv_pad (ZeroPadding2D)     (None, 129, 129, 3)    0        ['tf.math.truediv_4[0][0]']

And these are the output shapes of those layers when the model is given a grayscale image:

input_7            (128, 128, 1)
rescaling_7        (128, 128, 1)
normalization_13   (128, 128, 3)
tf.math.truediv_4  (128, 128, 3)
stem_conv_pad      (129, 129, 3)

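As you can see, the number of channels in the output tensor goes from 1 to 3 at the normalization_13 layer, so let's look at what this layer actually does. The Normalization layer applies the following operation to the input tensor:

(input_tensor - self.mean) / sqrt(self.var)  # see https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization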

The number of channels changes at the subtraction. In fact, self.mean looks like this:

<tf.Tensor: shape=(1, 1, 1, 3), dtype=float32, numpy=array([[[[0.485, 0.456, 0.406]]]], dtype=float32)>

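So self.mean has three channels, and when a one-channel tensor is subtracted from (or by) a three-channel tensor, the one-channel tensor is broadcast across the channel axis. The output looks like this:

[firstTensor - secondTensorFirstChannel, firstTensor - secondTensorSecondChannel, firstTensor - secondTensorThirdChannel]

That is how the magic happens, and that is why the model can accept grayscale images as input!

I have verified this with EfficientNet B5 and EfficientNet B2V2. They differ in how the Normalization layer is declared, but the process is the same. I assume the same holds for the other EfficientNet models.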
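The broadcasting step can be reproduced outside the model. The following is a minimal NumPy sketch, not the layer's actual implementation: the mean values are the ones printed in the answer, while the variance is a placeholder of ones chosen purely for illustration (it is an assumption, not a value read from the model). It checks that a single-channel batch comes out with three channels, and that the result matches explicitly copying the gray channel three times before normalizing.

```python
import numpy as np

# A batch of 2 random "grayscale" images: (batch, height, width, 1).
rng = np.random.default_rng(0)
gray = rng.random((2, 128, 128, 1), dtype=np.float32)

# Per-channel mean as shown in the answer, shape (1, 1, 1, 3).
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 1, 1, 3)

# ASSUMPTION: placeholder variance of ones, only for illustration.
var = np.ones((1, 1, 1, 3), dtype=np.float32)

# Same arithmetic as the Normalization formula. The size-1 channel axis
# of `gray` is broadcast against the size-3 channel axis of `mean`.
normalized = (gray - mean) / np.sqrt(var)
print(gray.shape, "->", normalized.shape)

# Output channel c is the gray image minus the mean of channel c.
for c in range(3):
    expected = (gray[..., 0] - mean[0, 0, 0, c]) / np.sqrt(var[0, 0, 0, c])
    assert np.allclose(normalized[..., c], expected)

# Explicitly replicating the gray channel first gives the same result.
gray_rgb = np.repeat(gray, 3, axis=-1)
explicit = (gray_rgb - mean) / np.sqrt(var)
assert np.allclose(normalized, explicit)
```

In other words, under these assumptions the implicit behaviour is equivalent to turning the grayscale image into a three-channel image with identical channels before normalization.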

I hope this explanation is clear enough!
