To generate Google DeepDream-style images, I am trying to modify an input image via gradient ascent on an InceptionV3 network.
Desired effect: https://github.com/google/deepdream/blob/master/dream.ipynb
(for more background see https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html)
To this end, I fine-tuned an Inception network using a transfer-learning approach, which produced the model file inceptionv3-ft.model.
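The fine-tuning itself was the usual transfer-learning recipe; roughly the sketch below (training data, paths and the exact head activation/loss are omitted or assumed here, the head only needs to match the summary that follows):

from keras.applications.inception_v3 import InceptionV3
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# Start from ImageNet weights, without the original classification head.
base_model = InceptionV3(weights='imagenet', include_top=False)

# New head: global average pooling, a 1024-unit dense layer and a single output unit
# (this corresponds to global_average_pooling2d_1 / dense_1 / dense_2 in the summary below).
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)  # assumed binary head

model = Model(inputs=base_model.input, outputs=predictions)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
# ... fit on the fine-tuning data, then:
model.save('inceptionv3-ft.model')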
model.summary()
prints the following architecture (shortened here for brevity):
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, None, None, 3 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, None, None, 3 864 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, None, None, 3 96 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, None, None, 3 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, None, None, 3 9216 activation_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, None, None, 3 96 conv2d_2[0][0]
__________________________________________________________________________________________________
activation_2 (Activation) (None, None, None, 3 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, None, None, 6 18432 activation_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, None, 6 192 conv2d_3[0][0]
__________________________________________________________________________________________________
activation_3 (Activation) (None, None, None, 6 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, None, None, 6 0 activation_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, None, None, 8 5120 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, None, None, 8 240 conv2d_4[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, None, None, 8 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, None, None, 1 138240 activation_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, None, None, 1 576 conv2d_5[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, None, None, 1 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, None, None, 1 0 activation_5[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, None, None, 6 12288 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, None, None, 6 192 conv2d_9[0][0]
__________________________________________________________________________________________________
activation_9 (Activation) (None, None, None, 6 0 batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D) (None, None, None, 4 9216 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, None, None, 9 55296 activation_9[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, None, None, 4 144 conv2d_7[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, None, None, 9 288 conv2d_10[0][0]
__________________________________________________________________________________________________
activation_7 (Activation) (None, None, None, 4 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
activation_10 (Activation) (None, None, None, 9 0 batch_normalization_10[0][0]
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, None, None, 1 0 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D) (None, None, None, 6 12288 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
(...) mixed9_1 (Concatenate) (None, None, None, 7 0 activation_88[0][0] activation_89[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, None, None, 7 0 activation_92[0][0] activation_93[0][0]
__________________________________________________________________________________________________
activation_94 (Activation) (None, None, None, 1 0 batch_normalization_94[0][0]
__________________________________________________________________________________________________
mixed10 (Concatenate) (None, None, None, 2 0 activation_86[0][0] mixed9_1[0][0] concatenate_2[0][0] activation_94[0][0]
__________________________________________________________________________________________________
global_average_pooling2d_1 (Glo (None, 2048) 0 mixed10[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 1024) 2098176 global_average_pooling2d_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 1) 1025 dense_1[0][0]
==================================================================================================
Total params: 23,901,985
Trainable params: 18,315,137
Non-trainable params: 5,586,848
____________________________________
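For reference, the mixed* concatenation layers that I target in the settings below can be listed by name, e.g. with this quick check (not part of the main script):

from keras.models import load_model

model = load_model('inceptionv3-ft.model')
# Print the candidate target layers ("mixed0" ... "mixed10") and their output shapes.
for layer in model.layers:
    if layer.name.startswith('mixed'):
        print(layer.name, layer.output_shape)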
Now, with the settings and code below, I am trying to tune and activate specific high-level layers so that complete objects emerge in the input image:
# Imports used by the snippets below.
import numpy as np
import scipy.ndimage
import scipy.misc
from keras import backend as K
from keras.models import load_model

settings = {
    'features': {
        'mixed2': 0.,
        'mixed3': 0.,
        'mixed4': 0.,
        'mixed10': 0.,  # highest-level layer
    },
}
model = load_model('inceptionv3-ft.model')

# The "dream" is the model's input tensor: the image being optimized.
dream = model.input

# Get the symbolic outputs of each "key" layer (they have unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers])

# Define the loss.
loss = K.variable(0.)
for layer_name in settings['features']:
    # Add the L2 norm of the features of a layer to the loss.
    assert layer_name in layer_dict.keys(), 'Layer ' + layer_name + ' not found in model.'
    coeff = settings['features'][layer_name]
    x = layer_dict[layer_name].output
    print(x)
    # Avoid border artifacts by only involving non-border pixels in the loss.
    scaling = K.prod(K.cast(K.shape(x), 'float32'))
    if K.image_data_format() == 'channels_first':
        loss += coeff * K.sum(K.square(x[:, :, 2: -2, 2: -2])) / scaling
    else:
        loss += coeff * K.sum(K.square(x[:, 2: -2, 2: -2, :])) / scaling

# Compute the gradients of the dream with regard to the loss.
grads = K.gradients(loss, dream)[0]
# Normalize the gradients.
grads /= K.maximum(K.mean(K.abs(grads)), K.epsilon())

# Set up a function to retrieve the value of the loss and gradients given an input image.
outputs = [loss, grads]
fetch_loss_and_grads = K.function([dream], outputs)
def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values
def resize_img(img, size):
    img = np.copy(img)
    if K.image_data_format() == 'channels_first':
        factors = (1, 1,
                   float(size[0]) / img.shape[2],
                   float(size[1]) / img.shape[3])
    else:
        factors = (1,
                   float(size[0]) / img.shape[1],
                   float(size[1]) / img.shape[2],
                   1)
    return scipy.ndimage.zoom(img, factors, order=1)
def gradient_ascent(x, iterations, step, max_loss=None):
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        print('..Loss value at', i, ':', loss_value)
        x += step * grad_values
    return x
def save_img(img, fname):
    pil_img = deprocess_image(np.copy(img))
    scipy.misc.imsave(fname, pil_img)
"""过程:
- 加载原始图像。
- 定义若干处理尺度(即图像形状),从最小到最大。
- 将原始图像调整到最小的尺度。
- 对于每个尺度,从最小的(即当前的)开始:
- 运行梯度上升
- 将图像放大到下一个尺度
- 重新注入放大时丢失的细节
- 在恢复到原始尺寸时停止。
为了获得放大过程中丢失的细节,我们只需
取原始图像,缩小它,放大它,
并将结果与(调整大小后的)原始图像进行比较。"""
# Playing with these hyperparameters will also allow you to achieve new effects.
step = 0.01          # Gradient ascent step size
num_octave = 3       # Number of scales at which to run gradient ascent
octave_scale = 1.4   # Size ratio between scales
iterations = 20      # Number of ascent steps per scale
max_loss = 10.
img = preprocess_image(base_image_path)
if K.image_data_format() == 'channels_first':
    original_shape = img.shape[2:]
else:
    original_shape = img.shape[1:3]
successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]
original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])
for shape in successive_shapes:
    print('Processing image shape', shape)
    img = resize_img(img, shape)
    img = gradient_ascent(img,
                          iterations=iterations,
                          step=step,
                          max_loss=max_loss)
    upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)
    same_size_original = resize_img(original_img, shape)
    lost_detail = same_size_original - upscaled_shrunk_original_img
    img += lost_detail
    shrunk_original_img = resize_img(original_img, shape)
save_img(img, fname=result_prefix + '.png')
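The preprocess_image and deprocess_image helpers used above are essentially the usual ones from the Keras Deep Dream example, shown here for completeness (assuming InceptionV3-style preprocessing that scales pixels to [-1, 1]):

import numpy as np
from keras import backend as K
from keras.applications import inception_v3
from keras.preprocessing.image import load_img, img_to_array

def preprocess_image(image_path):
    # Load the picture, add a batch dimension and scale pixels to [-1, 1].
    img = load_img(image_path)
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = inception_v3.preprocess_input(img)
    return img

def deprocess_image(x):
    # Undo the [-1, 1] scaling and convert back to an 8-bit RGB array.
    if K.image_data_format() == 'channels_first':
        x = x.reshape((3, x.shape[2], x.shape[3]))
        x = x.transpose((1, 2, 0))
    else:
        x = x.reshape((x.shape[1], x.shape[2], 3))
    x /= 2.
    x += 0.5
    x *= 255.
    x = np.clip(x, 0, 255).astype('uint8')
    return x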
But no matter how I adjust the settings coefficients, I only seem to be able to activate low-level features such as edges and curves, or, at best, mixed features.
Ideally, the settings should let me reach individual channels and units at every level, e.g. Layer 4c - Unit 0, but I have found no way to do this in the Keras documentation:
See here: https://distill.pub/2017/feature-visualization/appendix/googlenet/4c.html
I understand that the Caffe framework offers more flexibility here, but a system-wide installation is dependency hell.
So, within the Keras framework, or in any framework other than Caffe, how can I activate individual classes/units on this network?
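For illustration, what I would like to express is something along the lines of the sketch below, where the layer name and channel_index are placeholders rather than working code from my script:

# Hypothetical: build the loss from a single channel (unit) of one layer
# instead of the full activation of that layer.
layer_output = layer_dict['mixed4'].output   # placeholder layer name
channel_index = 0                            # placeholder unit index
if K.image_data_format() == 'channels_first':
    channel = layer_output[:, channel_index, :, :]
else:
    channel = layer_output[:, :, :, channel_index]
loss = K.mean(channel)
grads = K.gradients(loss, dream)[0]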
Answer: