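I have three dataframes for training, validating, and testing a machine learning program. They were split from a single pandas dataframe that was read from a CSV file. Here is a sample of that file: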
id,label
904797024fe2c8ebe4c12f54baf34c62c05ec1ff,1
0ad0a93569e96a95ed1e777b983452e9dbd445f9,0
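The id is the image file name without the .tif extension.
Previously, my training and validation data were in the same dataframe, but to avoid variance I split them into two separate dataframes.
This is my previous code: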
train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.1)
test_datagen = ImageDataGenerator(rescale=1./255)

train_val_path = "../input/train/"

train_generator = train_datagen.flow_from_dataframe(
    dataframe=df_train_val,
    directory=train_val_path,
    x_col="id",
    y_col="label",
    subset="training",
    target_size=(96, 96),
    batch_size=32,
    class_mode="binary",
    validate_filenames=False,
)

validation_generator = train_datagen.flow_from_dataframe(
    dataframe=df_train_val,
    directory=train_val_path,
    x_col="id",
    y_col="label",
    subset="validation",
    target_size=(96, 96),
    batch_size=32,
    class_mode="binary",
    validate_filenames=False,
)
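As you can see in the first line, ImageDataGenerator is configured with a validation split of 0.1. Now that I have already done this split myself, how do I adapt the code so that it works?
Answer: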
By default, validation_split is 0.0, which means no samples are held out for validation.
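To make the effect of validation_split and subset concrete, here is a minimal pure-Python sketch of how a fractional split selects rows. It only mimics the behaviour as far as I understand it (the validation subset is the leading fraction of the rows, the training subset is the remainder) and is not the library's actual code; the function name select_rows is hypothetical.

```python
def select_rows(rows, validation_split=0.0, subset=None):
    """Return the rows a generator would use for the given subset.

    Illustrative only: assumes the validation subset is the leading
    fraction of the rows and the training subset is the remainder.
    """
    if subset is None:
        # No subset requested: every row is used.
        return list(rows)
    if subset not in ("training", "validation"):
        raise ValueError(f"unknown subset: {subset!r}")
    # Index where the validation rows end and the training rows begin.
    cut = int(validation_split * len(rows))
    if subset == "validation":
        return list(rows[:cut])
    return list(rows[cut:])


rows = list(range(10))
print(select_rows(rows, 0.1, "training"))
print(select_rows(rows, 0.1, "validation"))
print(select_rows(rows))  # subset=None keeps all rows
```

With subset left at None, every row of the dataframe you pass in is used, which is what you want once the split has been done by hand.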
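Since you have already split the data yourself, leave validation_split out, keep subset at None, and give each generator its own dataframe. The names df_train and df_val below are placeholders for your two dataframes:

from tensorflow.keras.preprocessing.image import ImageDataGenerator  # or keras.preprocessing.image, depending on your setup

train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)  # not required

train_val_path = "../input/train/"

train_generator = train_datagen.flow_from_dataframe(
    dataframe=df_train,  # pass your training dataframe (placeholder name)
    directory=train_val_path,
    x_col="id",
    y_col="label",
    subset=None,  # because no validation split is specified
    target_size=(96, 96),
    batch_size=32,
    class_mode="binary",
    validate_filenames=False,
)

validation_generator = train_datagen.flow_from_dataframe(
    dataframe=df_val,  # pass your validation dataframe (placeholder name)
    directory=train_val_path,
    x_col="id",
    y_col="label",
    subset=None,  # because no validation split is specified
    target_size=(96, 96),
    batch_size=32,
    class_mode="binary",
    validate_filenames=False,
)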
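One thing that may still trip this up: the question says the id column omits the .tif extension, and as far as I know flow_from_dataframe expects the full file name in x_col and string labels in y_col when class_mode="binary". Here is a minimal sketch of preparing each dataframe first, assuming pandas is available. The helper name prepare_for_flow is hypothetical, and the small sample frame just reuses the two rows from the question.

```python
import pandas as pd


def prepare_for_flow(df, ext=".tif"):
    """Return a copy with full file names in 'id' and string labels.

    Hypothetical helper: assumes columns named 'id' and 'label', as in
    the CSV sample from the question.
    """
    out = df.copy()
    # Append the extension so each id matches a file on disk.
    out["id"] = out["id"].astype(str) + ext
    # Convert the labels to strings for class_mode="binary".
    out["label"] = out["label"].astype(str)
    return out


# Two rows taken from the CSV sample in the question.
sample = pd.DataFrame(
    {
        "id": [
            "904797024fe2c8ebe4c12f54baf34c62c05ec1ff",
            "0ad0a93569e96a95ed1e777b983452e9dbd445f9",
        ],
        "label": [1, 0],
    }
)
prepared = prepare_for_flow(sample)
print(prepared)
```

You would apply the same helper to the training, validation, and test dataframes before handing them to flow_from_dataframe.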