Keras – Visualizing classes on a CNN network

To generate Google DeepDream-like images, I am trying to modify an input image by optimizing it with gradient ascent over an InceptionV3 network.

Desired effect: https://github.com/google/deepdream/blob/master/dream.ipynb

(For more background, see https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html.)

For this, I fine-tuned an Inception network using a transfer-learning approach and generated the model: inceptionv3-ft.model
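
For context, the fine-tuning followed the usual Keras transfer-learning recipe; below is only a rough sketch of the kind of setup that produces such a model (data, paths and training settings are omitted and purely illustrative, not my exact script), matching the GlobalAveragePooling2D / Dense(1024) / Dense(1) head visible in the summary further down:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Illustrative sketch only -- not the exact script used to produce inceptionv3-ft.model.
base = InceptionV3(weights='imagenet', include_top=False)

x = GlobalAveragePooling2D()(base.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)  # single-unit head, as in dense_2 below

model = Model(inputs=base.input, outputs=predictions)

# Freeze the convolutional base, train the new head on the target data,
# then optionally unfreeze the top Inception blocks and fine-tune with a low learning rate.
for layer in base.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
# model.fit(...) on the fine-tuning dataset, then:
# model.save('inceptionv3-ft.model')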

model.summary() prints the following architecture (shortened here for brevity):

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, None, None, 3 0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, None, None, 3 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, None, None, 3 96          conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, None, None, 3 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, None, None, 3 9216        activation_1[0][0]               
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, None, None, 3 96          conv2d_2[0][0]                   
__________________________________________________________________________________________________
activation_2 (Activation)       (None, None, None, 3 0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, None, None, 6 18432       activation_2[0][0]               
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, None, None, 6 192         conv2d_3[0][0]                   
__________________________________________________________________________________________________
activation_3 (Activation)       (None, None, None, 6 0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, None, None, 6 0           activation_3[0][0]               
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, None, None, 8 5120        max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, None, None, 8 240         conv2d_4[0][0]                   
__________________________________________________________________________________________________
activation_4 (Activation)       (None, None, None, 8 0           batch_normalization_4[0][0]      
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, None, None, 1 138240      activation_4[0][0]               
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, None, None, 1 576         conv2d_5[0][0]                   
__________________________________________________________________________________________________
activation_5 (Activation)       (None, None, None, 1 0           batch_normalization_5[0][0]      
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, None, None, 1 0           activation_5[0][0]               
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, None, None, 6 12288       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, None, None, 6 192         conv2d_9[0][0]                   
__________________________________________________________________________________________________
activation_9 (Activation)       (None, None, None, 6 0           batch_normalization_9[0][0]      
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, None, None, 4 9216        max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, None, None, 9 55296       activation_9[0][0]               
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, None, None, 4 144         conv2d_7[0][0]                   
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, None, None, 9 288         conv2d_10[0][0]                  
__________________________________________________________________________________________________
activation_7 (Activation)       (None, None, None, 4 0           batch_normalization_7[0][0]      
__________________________________________________________________________________________________
activation_10 (Activation)      (None, None, None, 9 0           batch_normalization_10[0][0]     
__________________________________________________________________________________________________
average_pooling2d_1 (AveragePoo (None, None, None, 1 0           max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, None, None, 6 12288       max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
(...)
__________________________________________________________________________________________________
mixed9_1 (Concatenate)          (None, None, None, 7 0           activation_88[0][0]              
                                                                 activation_89[0][0]              
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, None, None, 7 0           activation_92[0][0]              
                                                                 activation_93[0][0]              
__________________________________________________________________________________________________
activation_94 (Activation)      (None, None, None, 1 0           batch_normalization_94[0][0]     
__________________________________________________________________________________________________
mixed10 (Concatenate)           (None, None, None, 2 0           activation_86[0][0]              
                                                                 mixed9_1[0][0]                   
                                                                 concatenate_2[0][0]              
                                                                 activation_94[0][0]              
__________________________________________________________________________________________________
global_average_pooling2d_1 (Glo (None, 2048)         0           mixed10[0][0]                    
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1024)         2098176     global_average_pooling2d_1[0][0] 
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 1)            1025        dense_1[0][0]                    
==================================================================================================
Total params: 23,901,985
Trainable params: 18,315,137
Non-trainable params: 5,586,848
__________________________________________________________________________________________________

Now, with the following settings and code, I am trying to tune and activate specific high-level objects so that complete objects emerge on the input image:

import numpy as np
import scipy.ndimage
import scipy.misc
from keras import backend as K
from keras.models import load_model

# (preprocess_image, deprocess_image, base_image_path and result_prefix
#  are defined elsewhere in my script.)
settings = {
    'features': {
        'mixed2': 0.,
        'mixed3': 0.,
        'mixed4': 0.,
        'mixed10': 0.,  # highest-level block
    },
}
model = load_model('inceptionv3-ft.model')
dream = model.input  # input image tensor that gradient ascent will modify
# Get the symbolic outputs of each "key" layer (we gave them unique names).
layer_dict = dict([(layer.name, layer) for layer in model.layers])
# Define the loss.
loss = K.variable(0.)
for layer_name in settings['features']:
    # Add the L2 norm of the features of a layer to the loss.
    assert layer_name in layer_dict.keys(), 'Layer ' + layer_name + ' not found in model.'
    coeff = settings['features'][layer_name]
    x = layer_dict[layer_name].output
    print(x)
    # Avoid border artifacts by only involving non-border pixels in the loss.
    scaling = K.prod(K.cast(K.shape(x), 'float32'))
    if K.image_data_format() == 'channels_first':
        loss += coeff * K.sum(K.square(x[:, :, 2: -2, 2: -2])) / scaling
    else:
        loss += coeff * K.sum(K.square(x[:, 2: -2, 2: -2, :])) / scaling
# Compute the gradients of the dream with respect to the loss.
grads = K.gradients(loss, dream)[0]
# Normalize the gradients.
grads /= K.maximum(K.mean(K.abs(grads)), K.epsilon())
# Set up a function to retrieve the value of the loss and gradients given an input image.
outputs = [loss, grads]
fetch_loss_and_grads = K.function([dream], outputs)
def eval_loss_and_grads(x):
    outs = fetch_loss_and_grads([x])
    loss_value = outs[0]
    grad_values = outs[1]
    return loss_value, grad_values
def resize_img(img, size):
    img = np.copy(img)
    if K.image_data_format() == 'channels_first':
        factors = (1, 1,
                   float(size[0]) / img.shape[2],
                   float(size[1]) / img.shape[3])
    else:
        factors = (1,
                   float(size[0]) / img.shape[1],
                   float(size[1]) / img.shape[2],
                   1)
    return scipy.ndimage.zoom(img, factors, order=1)
def gradient_ascent(x, iterations, step, max_loss=None):
    for i in range(iterations):
        loss_value, grad_values = eval_loss_and_grads(x)
        if max_loss is not None and loss_value > max_loss:
            break
        print('..Loss value at', i, ':', loss_value)
        x += step * grad_values
    return x
def save_img(img, fname):
    pil_img = deprocess_image(np.copy(img))
    scipy.misc.imsave(fname, pil_img)
"""过程:
- 加载原始图像。
- 定义若干处理尺度(即图像形状),从最小到最大。
- 将原始图像调整到最小的尺度。
- 对于每个尺度,从最小的(即当前的)开始:
    - 运行梯度上升
    - 将图像放大到下一个尺度
    - 重新注入放大时丢失的细节
- 在恢复到原始尺寸时停止。
为了获得放大过程中丢失的细节,我们只需
取原始图像,缩小它,放大它,
并将结果与(调整大小后的)原始图像进行比较。"""
# 调整这些超参数也可以实现新的效果
step = 0.01  # 梯度上升步长
num_octave = 3  # 运行梯度上升的尺度数量
octave_scale = 1.4  # 尺度之间的尺寸比率
iterations = 20  # 每个尺度的上升步骤数量
max_loss = 10.
img = preprocess_image(base_image_path)
if K.image_data_format() == 'channels_first':
    original_shape = img.shape[2:]
else:
    original_shape = img.shape[1:3]
successive_shapes = [original_shape]
for i in range(1, num_octave):
    shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])
    successive_shapes.append(shape)
successive_shapes = successive_shapes[::-1]
original_img = np.copy(img)
shrunk_original_img = resize_img(img, successive_shapes[0])
for shape in successive_shapes:
    print('Processing image shape', shape)
    img = resize_img(img, shape)
    img = gradient_ascent(img,
                          iterations=iterations,
                          step=step,
                          max_loss=max_loss)
    upscaled_shrunk_original_img = resize_img(shrunk_original_img, shape)
    same_size_original = resize_img(original_img, shape)
    lost_detail = same_size_original - upscaled_shrunk_original_img
    img += lost_detail
    shrunk_original_img = resize_img(original_img, shape)
save_img(img, fname=result_prefix + '.png')

But no matter how I tweak the settings values, I only seem to activate low-level features such as edges and curves, or, at best, mixed features.

Ideally, the settings should give access to individual channels and units of each layer, e.g. Layer4c – Unit 0, but I have not found a way to do this in the Keras documentation:

See here: https://distill.pub/2017/feature-visualization/appendix/googlenet/4c.html
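
In other words, I would like to build the loss from a single channel of a named layer rather than from a whole layer. A minimal sketch of what I have in mind (reusing layer_dict, dream and K from the code above; the layer name and channel index are arbitrary examples, not something I found documented):

# Hypothetical sketch: drive the gradient ascent with one channel (unit) of a
# named layer instead of the L2 norm over all of its feature maps.
layer_name = 'mixed10'   # arbitrary example layer
channel_index = 0        # arbitrary example unit / feature map

x = layer_dict[layer_name].output
if K.image_data_format() == 'channels_first':
    channel = x[:, channel_index, :, :]
else:
    channel = x[:, :, :, channel_index]

# Maximizing the mean activation of this single channel should push the input
# image toward whatever pattern that particular unit responds to.
loss = K.mean(channel)
grads = K.gradients(loss, dream)[0]
grads /= K.maximum(K.mean(K.abs(grads)), K.epsilon())
fetch_loss_and_grads = K.function([dream], [loss, grads])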

I understand that the Caffe framework offers more flexibility here, but a system-wide installation is dependency hell.

So, how can I activate individual classes on this network within the Keras framework, or in any framework other than Caffe?
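
(To make the goal concrete: by "activating a class" I mean something like the sketch below, where the loss is taken directly from the classifier output rather than an intermediate layer. This is my own assumption of how it might be expressed in Keras, not code I have found documented; with this fine-tuned model the last Dense layer has a single sigmoid unit, so the class index is 0.)

# Hypothetical sketch: define the loss on the final classifier output so that
# gradient ascent pushes the input image toward a specific class.
class_index = 0                          # dense_2 has a single unit here
class_activation = model.output[:, class_index]

loss = K.mean(class_activation)
grads = K.gradients(loss, dream)[0]
grads /= K.maximum(K.mean(K.abs(grads)), K.epsilon())
fetch_loss_and_grads = K.function([dream], [loss, grads])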


