I am trying to train a CNN on an image set. There are two folders, training_set and test_set, and each folder contains two classes. They look like this:
training_set/
    classA/
        img1.png
        img2.png
        ...
    classB/
        img1.png
        img2.png
        ...
test_set/
    classA/
        img1.png
        img2.png
        ...
    classB/
        img1.png
        img2.png
        ...
The code looks like this, with the training set split into training and validation sets:
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.python.client import device_lib
import numpy as np
import matplotlib.pyplot as plt

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
print(device_lib.list_local_devices())

# Set image properties
img_height = 369
img_width = 496
batch_size = 32

# Import data set from directory
train_images = tf.keras.preprocessing.image_dataset_from_directory(
    "path_to_training_set",
    labels='inferred',
    label_mode="binary",  # not sure about this one though, as the classes are not called '0' and '1'
    class_names=['classA', 'classB'],
    color_mode='rgb',
    batch_size=batch_size,
    image_size=(img_height, img_width),
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset="training")

val_images = tf.keras.preprocessing.image_dataset_from_directory(
    "path_to_training_set",
    labels='inferred',
    label_mode="binary",  # not sure about this one though, as the classes are not called '0' and '1'
    class_names=['classA', 'classB'],
    color_mode='rgb',
    batch_size=batch_size,
    image_size=(img_height, img_width),
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset="validation")
Then:
from matplotlib import pyplot

img_height = 369
img_width = 496
epochs = 25

model = tf.keras.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(img_height, img_width, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
# Since we have two classes:
model.add(layers.Dense(1, activation='sigmoid'))

# BinaryCrossentropy because there are 2 classes
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['accuracy'])

# Feed the model
# batch_size is not passed here: train_images is a tf.data.Dataset that is already batched
history = model.fit(train_images,
                    epochs=epochs,
                    verbose=1,
                    validation_data=val_images)

# Plot
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
Now that the model is trained, it shows the training and validation accuracy and loss. I tried to load my test set with the following code:
test_images = tf.keras.preprocessing.image_dataset_from_directory(
    "path_to_test_set",
    labels='inferred',
    label_mode="binary",
    class_names=['classA', 'classB'],
    color_mode='rgb',
    batch_size=batch_size,  # not really applicable as I want to use the whole set?
    image_size=(img_height, img_width),
    shuffle=True,
    seed=123,
    validation_split=None)
But is this the right way to do it? How should batch_size be handled? I thought I would evaluate the model on my test set like this:
test_loss, test_acc = model.evaluate(test_images, verbose=2)
print('\nTest accuracy:', test_acc)
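But I don't think this is enough, because I also want accuracy, precision, recall and F1 score. I am not even sure I did this part right (regarding how the test set is loaded).
So basically: how do I load my test set and compute accuracy, precision, recall and F1 score?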
Answer:
You need to iterate over the data; then you can collect the predictions and the true classes.
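predicted_probs = []
true_classes = []
for images, labels in test_images:
    # Collect per-batch outputs in lists; concatenating a (batch, 1) output onto an
    # empty 1-D array would fail because the dimensions do not match
    predicted_probs.append(model(images, training=False).numpy())
    true_classes.append(labels.numpy())
predicted_probs = np.concatenate(predicted_probs)        # shape (num_images, 1)
true_classes = np.concatenate(true_classes).reshape(-1)  # shape (num_images,)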
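On the batch_size question: the batch size only controls how many images are processed per step, so a loop like the one above still covers the whole test set whatever value is used. A minimal pure-Python sketch of that idea, assuming made-up labels and a hypothetical `batches` helper standing in for the batched dataset (neither comes from the question):

```python
def batches(items, batch_size):
    """Yield consecutive chunks of `items` (hypothetical stand-in for a batched dataset)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def accuracy_over_batches(pairs, batch_size):
    """Accumulate correct counts batch by batch, then divide by the total count."""
    correct = 0
    total = 0
    for batch in batches(pairs, batch_size):
        correct += sum(1 for true, pred in batch if true == pred)
        total += len(batch)
    return correct / total


# Made-up true and predicted labels, for illustration only
true_labels = [0, 1, 1, 0, 1, 0, 1]
pred_labels = [0, 1, 0, 0, 1, 1, 1]
pairs = list(zip(true_labels, pred_labels))

acc_small = accuracy_over_batches(pairs, batch_size=2)
acc_full = accuracy_over_batches(pairs, batch_size=len(pairs))
print(acc_small == acc_full)
```

Because the counts are accumulated over all batches before dividing, the result does not depend on the batch size, including when the last batch is smaller than the others.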
Since these are sigmoid outputs, you need to convert them to classes using a threshold, here 0.5:
predicted_classes = [1 * (x[0]>=0.5) for x in predicted_probs]
After that you can get the confusion matrix, etc.:
conf_matrix = tf.math.confusion_matrix(true_classes, predicted_classes)
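The remaining metrics asked about in the question can be derived from that matrix. The following is a minimal sketch, not part of the original answer: `binary_metrics` is a hypothetical helper name, the example counts are made up, and it assumes a 2x2 matrix laid out as `tf.math.confusion_matrix` returns it (rows are true labels, columns are predicted labels), with class 1 (here classB) treated as the positive class.

```python
def binary_metrics(conf_matrix):
    """Compute accuracy, precision, recall and F1 from a 2x2 confusion matrix.

    Assumes rows are true labels and columns are predicted labels,
    i.e. [[TN, FP], [FN, TP]], with class 1 as the positive class.
    """
    (tn, fp), (fn, tp) = [[int(v) for v in row] for row in conf_matrix]
    total = tn + fp + fn + tp
    # Guard each ratio against an empty denominator
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for illustration: 40 TN, 10 FP, 5 FN, 45 TP
example = binary_metrics([[40, 10], [5, 45]])
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With the matrix computed above, this could be called as `binary_metrics(conf_matrix.numpy())`. If the test set or the predictions might contain only one class, passing `num_classes=2` to `tf.math.confusion_matrix` keeps the result 2x2.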