The goal of the model is to classify video input by the word spoken in the video. Each input has dimensions of 45 frames, 1 grayscale color channel, 100 pixel rows, and 150 pixel columns (45, 1, 100, 150), and each corresponding output is a one-hot encoding of one of three possible words (e.g., "yes" => [0, 0, 1]).
The following error occurs when compiling the model:
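As a minimal sketch of the one-hot encoding described above (the vocabulary order and the words other than "yes" are assumptions; only "yes" => [0, 0, 1] is given in the question):

```python
import numpy as np

# Hypothetical three-word vocabulary; "yes" at index 2 matches [0, 0, 1]
words = ["no", "maybe", "yes"]

def one_hot(word):
    # Build a zero vector and set the position of the word to 1
    vec = np.zeros(len(words), dtype=int)
    vec[words.index(word)] = 1
    return vec
```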
ValueError: Dimensions must be equal, but are 1 and 3 for 'Conv2D_94' (op: 'Conv2D') with input shapes: [?,100,150,1], [3,3,3,32].
Here is the script used to train the model:

video = Input(shape=(self.frames_per_sequence, 1, self.rows, self.columns))
cnn = InceptionV3(weights="imagenet", include_top=False)
cnn.trainable = False
encoded_frames = TimeDistributed(cnn)(video)
encoded_vid = LSTM(256)(encoded_frames)
hidden_layer = Dense(output_dim=1024, activation="relu")(encoded_vid)
outputs = Dense(output_dim=class_count, activation="softmax")(hidden_layer)
osr = Model(video, outputs)
optimizer = Nadam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-08, schedule_decay=0.004)
osr.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["categorical_accuracy"])
Answer:
According to Convolution2D in Keras, the shapes of the input and the filter should be as follows.
shape of input = [batch, in_height, in_width, in_channels]
shape of filter = [filter_height, filter_width, in_channels, out_channels]
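The compatibility rule those two shapes encode can be sketched as a plain shape check (the function name is hypothetical, used only to illustrate the rule):

```python
def conv2d_shapes_compatible(input_shape, filter_shape):
    # input:  [batch, in_height, in_width, in_channels]
    # filter: [filter_height, filter_width, in_channels, out_channels]
    # Conv2D requires the in_channels entries to match
    return input_shape[3] == filter_shape[2]

# The shapes from the error message: in_channels 1 vs. 3 -> incompatible
print(conv2d_shapes_compatible((1, 100, 150, 1), (3, 3, 3, 32)))
```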
So, the error you encountered means the following:

ValueError: Dimensions must be equal, but are 1 and 3 for 'Conv2D_94' (op: 'Conv2D') with input shapes: [?,100,150,1], [3,3,3,32].

[?,100,150,1] indicates that in_channels is 1, while [3,3,3,32] indicates that in_channels is 3. This is why you get the error "Dimensions must be equal, but are 1 and 3".
You can therefore change the filter shape to [3, 3, 1, 32].
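Note that changing the filter shape is not an option when the convolution comes from pretrained InceptionV3, whose "imagenet" weights fix the first filter at 3 input channels. An alternative sketch, assuming channels-last frames, is to replicate the single grayscale channel three times so the input matches the expected 3-channel filter:

```python
import numpy as np

# Hypothetical grayscale clip: 45 frames of 100x150 pixels, 1 channel (channels-last)
gray_clip = np.zeros((45, 100, 150, 1), dtype=np.float32)

# Repeat the single channel along the last axis to get 3 identical channels,
# matching the in_channels=3 expected by the pretrained first Conv2D filter
rgb_clip = np.repeat(gray_clip, 3, axis=-1)

print(rgb_clip.shape)
```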