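I am trying to reproduce the logic from this paper. The logic can be summarized by the following diagram:
To highlight my problem:
- I have a 256×256 input image. It is passed through a DenseNet (working example below).
- The same image is split into 4 equal, non-overlapping 128×128 patches. Each patch is also passed through the DenseNet, and the results are averaged.
Working code: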
import tensorflow as tf
from tensorflow.keras.applications.densenet import DenseNet201
from tensorflow.keras.layers import Dense, Flatten, Concatenate
from tensorflow.keras.activations import relu

# Main image
in1 = tf.keras.Input(shape=(256, 256, 3))

# The 4 sub-patches of the main image
patch1 = tf.keras.Input(shape=(128, 128, 3))
patch2 = tf.keras.Input(shape=(128, 128, 3))
patch3 = tf.keras.Input(shape=(128, 128, 3))
patch4 = tf.keras.Input(shape=(128, 128, 3))

# CNN
cnn = DenseNet201(include_top=False, pooling='avg')

# Output for the full 256x256 image
out1 = cnn(in1)

# Outputs for the four 128x128 patches
path_out1 = cnn(patch1)
path_out2 = cnn(patch2)
path_out3 = cnn(patch3)
path_out4 = cnn(patch4)

# Average the patch outputs
patch_out_average = tf.keras.layers.Average()([path_out1, path_out2, path_out3, path_out4])

# Combine the features
out_combined = tf.stack([out1, patch_out_average])
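My question: is there a way to make this more elegant and less manual? I don't want to write out 16 input lines by hand for the 64×64 patches. Is there a way to "split" the image into parts and return an averaged tensor, or simply to make this shorter?
Thanks.
Update (using the code from the answer below):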
import tensorflow as tf
from tensorflow.keras.applications.densenet import DenseNet201
from tensorflow.keras.layers import Dense, Flatten, Concatenate
from tensorflow.keras.activations import relu

class CreatePatches(tf.keras.layers.Layer):

    def __init__(self, patch_size, cnn):
        super(CreatePatches, self).__init__()
        self.patch_size = patch_size
        self.cnn = cnn

    def call(self, inputs):
        patches = []
        # Only works for square images (assumes inputs.shape[1] == inputs.shape[2])
        input_image_size = inputs.shape[1]
        for i in range(0, input_image_size, self.patch_size):
            for j in range(0, input_image_size, self.patch_size):
                # Slice out one patch and pass it through the shared CNN
                patches.append(self.cnn(inputs[:, i : i + self.patch_size, j : j + self.patch_size, :]))
        return patches

# Main image
in1 = tf.keras.Input(shape=(256, 256, 3))

# CNN
cnn = DenseNet201(include_top=False, pooling='avg')

# Output for the full 256x256 image
out256 = cnn(in1)

# Outputs for the four 128x128 patches
out128 = CreatePatches(patch_size=128, cnn=cnn)(in1)

# Outputs for the sixteen 64x64 patches
out64 = CreatePatches(patch_size=64, cnn=cnn)(in1)

# Average the patch outputs
out128 = tf.keras.layers.Average()(out128)
out64 = tf.keras.layers.Average()(out64)

# Combine the features
out_combined = tf.stack([out256, out128, out64], axis=1)

# Average over the three scales
out_averaged = tf.keras.layers.GlobalAveragePooling1D()(out_combined)
out_averaged
Answer:
Update (July 16, 2021)
I found this code in the Keras Vision Transformers tutorial. It implements a custom Keras layer that uses the tf.image.extract_patches function to create patches from images.
import tensorflow as tf
from tensorflow.keras import layers

class Patches(layers.Layer):

    def __init__(self, patch_size):
        super(Patches, self).__init__()
        self.patch_size = patch_size

    def call(self, images):
        batch_size = tf.shape(images)[0]
        # Non-overlapping patches: the stride equals the patch size
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, self.patch_size, self.patch_size, 1],
            strides=[1, self.patch_size, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        # Each patch is flattened into a vector of length patch_size * patch_size * channels
        patch_dims = patches.shape[-1]
        patches = tf.reshape(patches, [batch_size, -1, patch_dims])
        return patches
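To connect this back to the question, the flattened patch vectors can be reshaped into small images, folded into the batch dimension, and passed through the CNN in a single call before averaging. The following is only a sketch under stated assumptions: `PatchFeatureAverage` is a hypothetical name, not part of Keras or the tutorial, and `tiny_cnn` is a small stand-in so the example runs quickly. In the question's setting, `cnn` would be `DenseNet201(include_top=False, pooling='avg')`.

```python
import tensorflow as tf


class PatchFeatureAverage(tf.keras.layers.Layer):
    """Hypothetical helper: average CNN features over non-overlapping patches."""

    def __init__(self, patch_size, cnn, **kwargs):
        super().__init__(**kwargs)
        self.patch_size = patch_size
        self.cnn = cnn  # assumed to map (N, p, p, C) images to (N, feature_dim)

    def call(self, images):
        p = self.patch_size
        channels = images.shape[-1]
        batch_size = tf.shape(images)[0]
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, p, p, 1],
            strides=[1, p, p, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        # (batch, rows, cols, p*p*C) -> (batch * num_patches, p, p, C)
        flat = tf.reshape(patches, [-1, p, p, channels])
        features = self.cnn(flat)  # (batch * num_patches, feature_dim)
        feature_dim = features.shape[-1]
        # Regroup the patches of each image, then average over them
        features = tf.reshape(features, [batch_size, -1, feature_dim])
        return tf.reduce_mean(features, axis=1)


# Small stand-in CNN (an assumption made for this example only)
tiny_cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
])

images = tf.random.uniform((2, 64, 64, 3))
averaged = PatchFeatureAverage(patch_size=32, cnn=tiny_cnn)(images)
print(averaged.shape)  # (2, 8)
```

Compared with slicing in a Python loop, this calls the CNN once per patch size rather than once per patch, which should keep the graph smaller when there are many patches.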
Existing solution
You can create a custom Keras Layer that splits a given square image (width = height) into patches, as shown below,
import numpy as np
import tensorflow as tf

class CreatePatches(tf.keras.layers.Layer):

    def __init__(self, patch_size):
        super(CreatePatches, self).__init__()
        self.patch_size = patch_size

    def call(self, inputs):
        patches = []
        # Only works for square images (assumes inputs.shape[1] == inputs.shape[2])
        input_image_size = inputs.shape[1]
        for i in range(0, input_image_size, self.patch_size):
            for j in range(0, input_image_size, self.patch_size):
                patches.append(inputs[:, i : i + self.patch_size, j : j + self.patch_size, :])
        return patches

sample_image = np.random.rand(1, 256, 256, 3)
layer = CreatePatches(128)
layer(sample_image)
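Make sure that inputs.shape[ 1 ] is divisible by patch_size. Otherwise the patches along the last row and column will be smaller than the rest.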
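The divisibility requirement can be made explicit, and the patch layout checked outside of Keras, with a small NumPy sketch. The helper name `split_into_patches` is hypothetical, and it uses a reshape/transpose instead of the loop above, so treat it as an illustration of the same row-by-row split rather than the layer's actual implementation.

```python
import numpy as np


def split_into_patches(images, patch_size):
    """Split (batch, H, W, C) images into non-overlapping square patches.

    Returns an array of shape (batch, num_patches, patch_size, patch_size, C),
    with patches ordered row by row, like the nested loop in CreatePatches.
    """
    batch, height, width, channels = images.shape
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"Image size {height}x{width} is not divisible by patch_size={patch_size}"
        )
    rows, cols = height // patch_size, width // patch_size
    # Split each spatial axis into (number of patches, patch_size) ...
    grid = images.reshape(batch, rows, patch_size, cols, patch_size, channels)
    # ... then move the two patch-index axes next to each other.
    grid = grid.transpose(0, 1, 3, 2, 4, 5)
    return grid.reshape(batch, rows * cols, patch_size, patch_size, channels)


sample_image = np.random.rand(1, 256, 256, 3)
patches = split_into_patches(sample_image, 128)
print(patches.shape)  # (1, 4, 128, 128, 3)
```

With patch_size=64 the same call yields 16 patches per image, since the patch count is (256 // 64) ** 2.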
You can also include this layer in a Model, as shown below,
inputs = tf.keras.layers.Input(shape=(256, 256, 3))
patches = CreatePatches(patch_size=128)(inputs)
model = tf.keras.models.Model(inputs, patches)
model.summary()
The output of the above snippet,
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_3 (InputLayer)         [(None, 256, 256, 3)]     0
_________________________________________________________________
create_patches_5 (CreatePatc [(None, 128, 128, 3), (No 0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
More details on the model's outputs,
>> model.outputs
[<KerasTensor: shape=(None, 128, 128, 3) dtype=float32 (created by layer 'create_patches_5')>,
 <KerasTensor: shape=(None, 128, 128, 3) dtype=float32 (created by layer 'create_patches_5')>,
 <KerasTensor: shape=(None, 128, 128, 3) dtype=float32 (created by layer 'create_patches_5')>,
 <KerasTensor: shape=(None, 128, 128, 3) dtype=float32 (created by layer 'create_patches_5')>]