Incompatible tensor shapes when training an object detection model in Keras

I am trying to extend a basic classification model (https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/) into a simple object detection model for a single object.

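The classification model simply classifies handwritten digits in images where the digit fills most of the image. To build a meaningful dataset for object detection, I use the MNIST dataset as a base and transform it into a new dataset with the following steps: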

1. Increase the image canvas size from 28×28 to 100×100
2. Move the handwritten digit to a random position within the 100×100 image
3. Create the ground-truth bounding box

Figure 1: Illustration of steps 1 and 2.

Figure 2: Some of the generated ground-truth bounding boxes.

The model's output vector is inspired by the YOLO definition, but adapted for a single object:

y = [p, x, y, w, h, c0, ..., c9]

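where p = probability that an object is present, (x, y, w, h) = bounding box center, width, and height as fractions of the image size, and c0-c9 = class probabilities (one per digit).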
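To make the layout of this vector concrete, here is a minimal NumPy sketch that splits one 15-element vector back into its parts and converts the relative box to pixel corners. The helper name decode_prediction and the example values are made up for illustration; only the ordering [p, x, y, w, h, c0, ..., c9] and the 100×100 image size come from the description above.

```python
import numpy as np

def decode_prediction(y, image_size=100):
    """Split a 15-element vector [p, x, y, w, h, c0..c9] into readable parts.

    Hypothetical helper for illustration; not part of the model code.
    """
    y = np.asarray(y, dtype=float)
    p = y[0]
    # Box values are fractions of the image size, so scale them to pixels.
    cx, cy, w, h = y[1:5] * image_size
    x_min, y_min = cx - w / 2.0, cy - h / 2.0
    x_max, y_max = cx + w / 2.0, cy + h / 2.0
    # The most likely digit is the index of the largest class score.
    digit = int(np.argmax(y[5:]))
    return p, (x_min, y_min, x_max, y_max), digit

# Example: an object centered at (50, 40) px, 20 px wide, 30 px high, class 7.
example = np.concatenate(([1.0, 0.5, 0.4, 0.2, 0.3], np.eye(10)[7]))
p, box, digit = decode_prediction(example)
print(p, box, digit)
```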

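So, to turn the classification model into an object detection model, I simply replaced the final softmax layer with a fully connected layer of 15 nodes (one for each value in y) and wrote a custom loss function that compares the prediction with the ground truth.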

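However, when I try to train the model, I get the cryptic error tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [15] vs. [200], where [15] is the number of nodes in my final layer and [200] is the batch size I specified for training (I verified this by changing the values and rerunning). These two numbers should not need to match, so I guess I have missed something crucial about the tensor dimensions in the model, but I can't figure out what.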

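Note: My understanding of a batch is that it is the number of samples (images) the model processes at a time during training. So the batch size should reasonably be an even divisor of the training set size. But nothing should tie it to the number of output nodes of the model.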

Any help is appreciated.

Here is the complete code:


import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras import backend as K

def increase_image_size(im_set, new_size):
    num_images = im_set.shape[0]
    orig_size = im_set[0].shape[0]
    im_stack = np.zeros((num_images, new_size, new_size), dtype='uint8')
    # Put each MNIST digit at a random position in the new image
    for i in range(num_images):
        x0 = int(np.random.random() * (new_size - orig_size - 1))
        y0 = int(np.random.random() * (new_size - orig_size - 1))
        x1 = x0 + orig_size
        y1 = y0 + orig_size
        im_stack[i, y0:y1, x0:x1] = im_set[i]
    return im_stack

# Get bounding box annotations from the images and object labels
def get_image_annotations(X_train, y_train):
    num_images = len(X_train)
    annotations = np.zeros((num_images, 15), dtype='float')
    for i in range(num_images):
        annotations[i] = get_image_annotation(X_train[i], y_train[i])
    return annotations

def get_image_annotation(X, y):
    sz_y, sz_x = X.shape
    y_indices, x_indices = np.where(X > 0)
    y_min = max(np.min(y_indices) - 1, 0)
    y_max = min(np.max(y_indices) + 1, sz_y)
    x_min = max(np.min(x_indices) - 1, 0)
    x_max = min(np.max(x_indices) + 1, sz_x)
    bb_x = (x_min + x_max) / 2.0 / sz_x
    bb_y = (y_min + y_max) / 2.0 / sz_y
    bb_w = (x_max - x_min) / sz_x
    bb_h = (y_max - y_min) / sz_y
    classes = np.zeros(10, dtype='float')
    classes[y] = 1
    output = np.concatenate(([1, bb_x, bb_y, bb_w, bb_h], classes))
    return output

def custom_cost_function(y_true, y_pred):
    p_p = y_pred[0]
    x_p = y_pred[1]
    y_p = y_pred[2]
    w_p = y_pred[3]
    h_p = y_pred[4]
    p_t = y_true[0]
    x_t = y_true[1]
    y_t = y_true[2]
    w_t = y_true[3]
    h_t = y_true[4]
    c_pred = y_pred[5:]
    c_true = y_true[5:]
    c1 = K.sum((c_pred - c_true) * (c_pred - c_true))
    c2 = (x_p - x_t) * (x_p - x_t) + (y_p - y_t) * (y_p - y_t) \
         + (K.sqrt(w_p) - K.sqrt(w_t)) * (K.sqrt(w_p) - K.sqrt(w_t)) \
         + (K.sqrt(h_p) - K.sqrt(h_t)) * (K.sqrt(h_p) - K.sqrt(h_t))
    lambda_class = 1.0
    lambda_coord = 1.0
    return lambda_class * c1 + lambda_coord * c2

def baseline_model():
    # Create model
    model = Sequential()
    model.add(Conv2D(32, (5, 5), input_shape=(1, 100, 100), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(15, activation='linear'))
    # Compile model
    model.compile(loss=custom_cost_function, optimizer='adam', metrics=['accuracy'])
    return model

def mnist_object_detection():
    K.set_image_dim_ordering('th')
    # Fix the random seed for reproducibility
    np.random.seed(7)
    # Load data
    print("Loading data")
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    # Adjust input images
    print("Adjust input images (increasing image sizes and moving digits)")
    X_train = increase_image_size(X_train, 100)
    X_test = increase_image_size(X_test, 100)
    print("Creating annotations")
    y_train_prim = get_image_annotations(X_train, y_train)
    y_test_prim = get_image_annotations(X_test, y_test)
    print("...done")
    # Reshape to [samples][pixels][width][height]
    X_train = X_train.reshape(X_train.shape[0], 1, 100, 100).astype('float32')
    X_test = X_test.reshape(X_test.shape[0], 1, 100, 100).astype('float32')
    # Normalize inputs from 0-255 to 0-1
    X_train = X_train / 255
    X_test = X_test / 255
    # Build the model
    print("Building model")
    model = baseline_model()
    # Fit the model
    print("Training model")
    model.fit(X_train, y_train_prim, validation_data=(X_test, y_test_prim), epochs=10, batch_size=200, verbose=1)

if __name__ == '__main__':
    mnist_object_detection()

When I run it, I get the following error:


/Users/gedda/anaconda3/envs/keras-obj-det/bin/python /Users/gedda/devel/tensorflow/digit-recognition/object_detection_reduced.py
Using TensorFlow backend.
Loading data
Adjust input images (increasing image sizes and moving digits)
Creating annotations
...done
Building model
2018-11-30 13:26:34.030159: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-11-30 13:26:34.030463: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
Training model
Train on 60000 samples, validate on 10000 samples
Epoch 1/3
Traceback (most recent call last):
  File "/Users/gedda/devel/tensorflow/digit-recognition/object_detection_reduced.py", line 140, in <module>
    mnist_object_detection()
  File "/Users/gedda/devel/tensorflow/digit-recognition/object_detection_reduced.py", line 136, in mnist_object_detection
    model.fit(X_train, y_train_prim, validation_data=(X_test, y_test_prim), epochs=3, batch_size=200, verbose=1)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/Users/gedda/anaconda3/envs/keras-obj-det/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [15] vs. [200]
     [[{{node training/Adam/gradients/loss/dense_2_loss/mul_7_grad/BroadcastGradientArgs}} = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@training/Adam/gradients/loss/dense_2_loss/mul_7_grad/Reshape"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](training/Adam/gradients/loss/dense_2_loss/mul_7_grad/Shape, training/Adam/gradients/loss/dense_2_loss/mul_7_grad/Shape_1)]]
Process finished with exit code 1

Answer:

The first dimension of every tensor is the batch size.
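The shape mismatch can be reproduced outside Keras. The sketch below uses a NumPy array as a stand-in for a batch of predictions; the shapes (200 samples, 15 outputs) come from the question, while the variable names are illustrative. It shows that indexing with [0] selects one whole sample of length 15, whereas [:, 0] selects the first output for all 200 samples. A loss built from [0], [1], ... therefore has 15 entries instead of one per sample, which is presumably what the [15] vs. [200] error is reporting.

```python
import numpy as np

batch_size, num_outputs = 200, 15
# Stand-in for y_pred as the loss function sees it: (batch, outputs).
y_pred = np.zeros((batch_size, num_outputs))

first_sample = y_pred[0]      # indexes the batch axis: one whole sample
p_per_sample = y_pred[:, 0]   # indexes the output axis: p for every sample

print(first_sample.shape)
print(p_per_sample.shape)

# A 15-element result cannot be combined elementwise with anything that
# has one entry per sample (200 entries), so broadcasting fails.
try:
    first_sample * np.ones(batch_size)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)
```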

Your loss function should probably operate on the second dimension:


def custom_cost_function(y_true, y_pred):
    p_p = y_pred[:,0]
    x_p = y_pred[:,1]
    y_p = y_pred[:,2]
    w_p = y_pred[:,3]
    h_p = y_pred[:,4]
    p_t = y_true[:,0]
    x_t = y_true[:,1]
    y_t = y_true[:,2]
    w_t = y_true[:,3]
    h_t = y_true[:,4]
    c_pred = y_pred[:,5:]
    c_true = y_true[:,5:]
    ........
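For reference, here is one way the rest of the loss could look once every index is moved to the second axis. This is a NumPy sketch rather than the Keras code itself, and it rests on one assumption that goes beyond the snippet above: the class error is summed over the class axis only (axis=1), so the result has one value per sample. In the Keras version the corresponding calls would be K.sum(..., axis=-1) and K.sqrt. The function name custom_cost_numpy and the random test data are illustrative.

```python
import numpy as np

def custom_cost_numpy(y_true, y_pred, lambda_class=1.0, lambda_coord=1.0):
    """NumPy sketch of the loss with all indexing on the second axis.

    Returns one loss value per sample, shape (batch_size,).
    """
    x_p, y_p, w_p, h_p = (y_pred[:, i] for i in range(1, 5))
    x_t, y_t, w_t, h_t = (y_true[:, i] for i in range(1, 5))
    c_pred, c_true = y_pred[:, 5:], y_true[:, 5:]
    # Assumption: sum class errors over the class axis, not over the batch.
    c1 = np.sum((c_pred - c_true) ** 2, axis=1)
    # Squared error on the center, and on the square roots of width/height.
    c2 = ((x_p - x_t) ** 2 + (y_p - y_t) ** 2
          + (np.sqrt(w_p) - np.sqrt(w_t)) ** 2
          + (np.sqrt(h_p) - np.sqrt(h_t)) ** 2)
    return lambda_class * c1 + lambda_coord * c2

# Random non-negative values with the shapes from the question.
rng = np.random.RandomState(0)
y_true = rng.rand(200, 15)
y_pred = rng.rand(200, 15)
loss = custom_cost_numpy(y_true, y_pred)
print(loss.shape)
```

With this layout the loss has shape (200,), matching the batch size, and it is zero when the prediction equals the ground truth.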
