First of all, I know there is a similar discussion here: https://stats.stackexchange.com/questions/352036/what-should-i-do-when-my-neural-network-doesnt-learn

Unfortunately, it did not help me. There is probably a bug in my code that I cannot find. I am trying to classify some WAV files, but the model does not learn.
First, I collect the files and save them into an array.
Second, I create two new directories, one for the training data and one for the validation data. Next, I read the WAV files, create spectrograms, and save them all to the training directory. After that, I move 20% of the data from the training directory to the validation directory.
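The directory split described above can be sketched roughly like this (the function name, paths, and the fixed seed are illustrative assumptions, not the original code):

```python
import os
import random
import shutil

def split_off_validation(train_dir, val_dir, fraction=0.2, seed=42):
    """Move a random `fraction` of the files in train_dir into val_dir."""
    os.makedirs(val_dir, exist_ok=True)
    files = sorted(os.listdir(train_dir))
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * fraction)
    for name in files[:n_val]:
        shutil.move(os.path.join(train_dir, name), os.path.join(val_dir, name))
    return n_val
```

Shuffling before taking the first 20% avoids a split that depends on the (often class-correlated) file ordering on disk.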
Note: while creating the spectrograms, I check the length of each WAV file. If it is too short (less than 2 seconds), I double it. From this spectrogram I then crop a random window and save only that crop. As a result, all images have the same height and width.
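The pad-by-doubling and random-crop step can be sketched in plain NumPy (the function name and the `(n_mels, n_frames)` array layout are assumptions for illustration; the original pipeline produced 300x300 images):

```python
import numpy as np

def random_crop_spectrogram(spec, target_width, rng=None):
    """Tile a too-short spectrogram along the time axis, then crop a random
    window of `target_width` frames so every output has the same shape.
    `spec` is assumed to have shape (n_mels, n_frames)."""
    rng = rng or np.random.default_rng(0)
    # Double the spectrogram along time until it is wide enough.
    while spec.shape[1] < target_width:
        spec = np.concatenate([spec, spec], axis=1)
    start = rng.integers(0, spec.shape[1] - target_width + 1)
    return spec[:, start:start + target_width]
```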
Then, in the next step, I load the training and validation images. Here I also normalize the data.
```python
IMG_WIDTH = 300
IMG_HEIGHT = 300
IMG_DIM = (IMG_WIDTH, IMG_HEIGHT, 3)

train_files = glob.glob(DBMEL_PATH + "*", recursive=True)
train_imgs = [img_to_array(load_img(img, target_size=IMG_DIM)) for img in train_files]
train_imgs = np.array(train_imgs) / 255  # normalizing data
train_labels = [fn.split('\\')[-1].split('.')[1].strip() for fn in train_files]

validation_files = glob.glob(DBMEL_VAL_PATH + "*", recursive=True)
validation_imgs = [img_to_array(load_img(img, target_size=IMG_DIM)) for img in validation_files]
validation_imgs = np.array(validation_imgs) / 255  # normalizing data
validation_labels = [fn.split('\\')[-1].split('.')[1].strip() for fn in validation_files]
```
I have checked the variables and printed them. I think this part works quite well. The arrays contain 80% and 20% of the total data, respectively.
```
# Train dataset shape:      (3756, 300, 300, 3)
# Validation dataset shape: (939, 300, 300, 3)
```
Next, I also implemented a one-hot encoder. So far, so good. In the next step I created empty data generators, i.e. without any data augmentation. When calling the data generators, once for the training data and once for the validation data, I pass the image arrays (train_imgs, validation_imgs) and the one-hot-encoded labels (train_labels_enc, validation_labels_enc).
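The one-hot encoding step can be sketched in plain NumPy (the function name and the example class labels are invented for illustration; the original post does not show this code):

```python
import numpy as np

def one_hot_encode(labels):
    """Map string labels to one-hot rows; returns (encoded, class_list)."""
    classes = sorted(set(labels))            # stable, sorted class order
    index = {c: i for i, c in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype=np.float32)
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return encoded, classes
```

Whatever encoder is used, the important part is that the training and validation labels are encoded with the same class-to-index mapping, otherwise the targets are silently shuffled between the two sets.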
Alright, now comes the tricky part. First, create/load a pretrained network:
```python
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.models import Model
import tensorflow.keras

input_shape = (IMG_HEIGHT, IMG_WIDTH, 3)

restnet = ResNet50(include_top=False, weights='imagenet',
                   input_shape=(IMG_HEIGHT, IMG_WIDTH, 3))
output = restnet.layers[-1].output
output = tensorflow.keras.layers.Flatten()(output)
restnet = Model(restnet.input, output)
for layer in restnet.layers:
    layer.trainable = False
```
Now, finally, it is time to create the model itself. When building the model, I use the pretrained network for transfer learning. My guess is that something must be wrong here.
```python
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, InputLayer
from tensorflow.keras.models import Sequential
from tensorflow.keras import optimizers

model = Sequential()
model.add(restnet)  # <-- transfer learning
model.add(Dense(512, activation='relu', input_dim=input_shape))  # 512 (num_classes)
model.add(Dropout(0.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(7, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
```
The model is trained as follows:
```python
history = model.fit_generator(train_generator,
                              steps_per_epoch=100,
                              epochs=100,
                              validation_data=val_generator,
                              validation_steps=10,
                              verbose=1)
```
But even after 50 epochs, the accuracy stays at around 0.15:
```
Epoch 1/100
100/100 [==============================] - 711s 7s/step - loss: 10.6419 - accuracy: 0.1530 - val_loss: 1.9416 - val_accuracy: 0.1467
Epoch 2/100
100/100 [==============================] - 733s 7s/step - loss: 1.9595 - accuracy: 0.1550 - val_loss: 1.9372 - val_accuracy: 0.1267
Epoch 3/100
100/100 [==============================] - 731s 7s/step - loss: 1.9940 - accuracy: 0.1444 - val_loss: 1.9388 - val_accuracy: 0.1400
Epoch 4/100
100/100 [==============================] - 735s 7s/step - loss: 1.9416 - accuracy: 0.1535 - val_loss: 1.9380 - val_accuracy: 0.1733
Epoch 5/100
100/100 [==============================] - 737s 7s/step - loss: 1.9394 - accuracy: 0.1656 - val_loss: 1.9345 - val_accuracy: 0.1533
Epoch 6/100
100/100 [==============================] - 741s 7s/step - loss: 1.9364 - accuracy: 0.1667 - val_loss: 1.9286 - val_accuracy: 0.1767
Epoch 7/100
100/100 [==============================] - 740s 7s/step - loss: 1.9389 - accuracy: 0.1523 - val_loss: 1.9305 - val_accuracy: 0.1400
Epoch 8/100
100/100 [==============================] - 737s 7s/step - loss: 1.9394 - accuracy: 0.1623 - val_loss: 1.9441 - val_accuracy: 0.1667
Epoch 9/100
100/100 [==============================] - 735s 7s/step - loss: 1.9391 - accuracy: 0.1582 - val_loss: 1.9458 - val_accuracy: 0.1333
Epoch 10/100
100/100 [==============================] - 734s 7s/step - loss: 1.9381 - accuracy: 0.1602 - val_loss: 1.9372 - val_accuracy: 0.1700
Epoch 11/100
100/100 [==============================] - 739s 7s/step - loss: 1.9392 - accuracy: 0.1623 - val_loss: 1.9302 - val_accuracy: 0.2167
Epoch 12/100
100/100 [==============================] - 741s 7s/step - loss: 1.9368 - accuracy: 0.1627 - val_loss: 1.9326 - val_accuracy: 0.1467
Epoch 13/100
100/100 [==============================] - 740s 7s/step - loss: 1.9381 - accuracy: 0.1513 - val_loss: 1.9312 - val_accuracy: 0.1733
Epoch 14/100
100/100 [==============================] - 736s 7s/step - loss: 1.9396 - accuracy: 0.1542 - val_loss: 1.9407 - val_accuracy: 0.1367
Epoch 15/100
100/100 [==============================] - 741s 7s/step - loss: 1.9393 - accuracy: 0.1597 - val_loss: 1.9336 - val_accuracy: 0.1333
```
Can anyone please help me find the problem?
Answer:
I solved the problem myself.

I replaced this code:
```python
model = Sequential()
model.add(restnet)  # <-- transfer learning
model.add(Dense(512, activation='relu', input_dim=input_shape))  # 512 (num_classes)
model.add(Dropout(0.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(7, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
```
with the following code:
```python
base_model = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                               include_top=False,
                                               weights="imagenet")
model = Sequential()
model.add(base_model)
model.add(tf.keras.layers.GlobalAveragePooling2D())
model.add(Dropout(0.2))
model.add(Dense(number_classes, activation="softmax"))
model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.00001),
              loss="categorical_crossentropy",
              metrics=['accuracy'])
model.summary()
```
I also discovered one more thing. Contrary to some tutorials, data augmentation is not useful when working with spectrograms: a spectrogram's axes encode time and frequency, so image-style transforms change its meaning. Without data augmentation I got a training accuracy of 0.99 and a validation accuracy of 0.72. But with data augmentation I got only 0.75 training accuracy and 0.16 validation accuracy.