Deep learning: good performance on the training set, poor performance on the validation set

I have run into a problem, and I am having a hard time understanding why this behaviour occurs.

I tried binary image classification with a pretrained ResNet50 (Keras) model, and I also built a simple CNN. I have roughly 8,000 balanced RGB images of size 200×200, which I split into three subsets (training 70%, validation 15%, test 15%).
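A 70/15/15 stratified split like the one described above can be sketched with scikit-learn (the file names and labels below are placeholders, not the actual dataset):

```python
from sklearn.model_selection import train_test_split

# hypothetical paths/labels standing in for the ~8,000 balanced images
paths = [f"img_{i}.png" for i in range(100)]
labels = [i % 2 for i in range(100)]  # balanced binary labels

# first carve off the 70% training portion, stratified by class
X_train, X_rest, y_train, y_rest = train_test_split(
    paths, labels, train_size=0.70, stratify=labels, random_state=0)

# split the remaining 30% evenly into validation and test (15% each)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)
```

Stratifying keeps the two classes balanced in every subset, which matters when comparing training and validation accuracy.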

I built a data generator based on keras.utils.Sequence to feed my models.

My problem is that the models tend to learn on the training set but perform poorly on the validation set, for both the pretrained ResNet50 and the simple CNN. I have tried several things to solve this, without any improvement:

  • data augmentation (rotations) on the training set, enabled or disabled
  • images normalized to [0, 1]
  • with and without regularizers
  • varying the learning rate

Here is an example of the results obtained:

    Epoch 1/200
    716/716 [==============================] - 320s 447ms/step - loss: 8.6096 - acc: 0.4728 - val_loss: 8.6140 - val_acc: 0.5335

    Epoch 00001: val_loss improved from inf to 8.61396, saving model to ../models_saved/resnet_adam_best.h5
    Epoch 2/200
    716/716 [==============================] - 287s 401ms/step - loss: 8.1217 - acc: 0.5906 - val_loss: 10.9314 - val_acc: 0.4632

    Epoch 00002: val_loss did not improve from 8.61396
    Epoch 3/200
    716/716 [==============================] - 249s 348ms/step - loss: 7.5357 - acc: 0.6695 - val_loss: 11.1432 - val_acc: 0.4657

    Epoch 00003: val_loss did not improve from 8.61396
    Epoch 4/200
    716/716 [==============================] - 284s 397ms/step - loss: 7.5092 - acc: 0.6828 - val_loss: 10.0665 - val_acc: 0.5351

    Epoch 00004: val_loss did not improve from 8.61396
    Epoch 5/200
    716/716 [==============================] - 261s 365ms/step - loss: 7.0679 - acc: 0.7102 - val_loss: 4.2205 - val_acc: 0.5351

    Epoch 00005: val_loss improved from 8.61396 to 4.22050, saving model to ../models_saved/resnet_adam_best.h5
    Epoch 6/200
    716/716 [==============================] - 285s 398ms/step - loss: 6.9945 - acc: 0.7161 - val_loss: 10.2276 - val_acc: 0.5335
    ....

Here is the class used to load data into the model.

    class DataGenerator(keras.utils.Sequence):
        def __init__(self, inputs, labels, img_size, input_shape,
                     batch_size, num_classes, validation=False):
            self.inputs = inputs
            self.labels = labels
            self.img_size = img_size
            self.input_shape = input_shape
            self.batch_size = batch_size
            self.num_classes = num_classes
            self.validation = validation
            self.indexes = np.arange(len(self.inputs))
            self.inc = 0

        def __getitem__(self, index):
            """Generate one batch of data

            Parameters
            ----------
            index : the index from which the batch will be taken

            Returns
            -------
            out : a tuple that contains (inputs, associated labels)
            """
            batch_inputs = np.zeros((self.batch_size, *self.input_shape))
            batch_labels = np.zeros((self.batch_size, self.num_classes))
            # Generate data
            for i in range(self.batch_size):
                # choose random index in features
                if self.validation:
                    index = self.indexes[self.inc]
                    self.inc += 1
                    if self.inc == len(self.inputs):
                        self.inc = 0
                else:
                    index = random.randint(0, len(self.inputs) - 1)
                batch_inputs[i] = self.rgb_processing(self.inputs[index])
                batch_labels[i] = to_categorical(self.labels[index],
                                                 num_classes=self.num_classes)
            return batch_inputs, batch_labels

        def __len__(self):
            """Denotes the number of batches per epoch

            Returns
            -------
            out : number of batches per epoch
            """
            return int(np.floor(len(self.inputs) / self.batch_size))

        def rgb_processing(self, path):
            img = load_img(path)
            rgb = img.get_rgb_array()
            if not self.validation:
                if random.choice([True, False]):
                    rgb = random_rotation(rgb)
            return rgb / np.max(rgb)


    class Models:
        def __init__(self, input_shape, classes):
            self.input_shape = input_shape
            self.classes = classes

        def simpleCNN(self, optimizer):
            model = Sequential()
            model.add(Conv2D(32, kernel_size=(3, 3),
                             activation='relu',
                             input_shape=self.input_shape))
            model.add(Conv2D(64, (3, 3), activation='relu'))
            model.add(MaxPooling2D(pool_size=(2, 2)))
            model.add(Dropout(0.25))
            model.add(Flatten())
            model.add(Dense(128, activation='relu'))
            model.add(Dropout(0.5))
            model.add(Dense(len(self.classes), activation='softmax'))
            model.compile(loss=keras.losses.binary_crossentropy,
                          optimizer=optimizer,
                          metrics=['accuracy'])
            return model

        def resnet50(self, optimizer):
            model = keras.applications.resnet50.ResNet50(include_top=False,
                                                         input_shape=self.input_shape,
                                                         weights='imagenet')
            model.summary()
            model.layers.pop()
            model.summary()
            for layer in model.layers:
                layer.trainable = False
            output = Flatten()(model.output)
            # I also tried to add dropout layers here with batch normalization,
            # but it does not change the results
            output = Dense(len(self.classes), activation='softmax')(output)
            finetuned_model = Model(inputs=model.input,
                                    outputs=output)
            finetuned_model.compile(optimizer=optimizer,
                                    loss=keras.losses.binary_crossentropy,
                                    metrics=['accuracy'])
            return finetuned_model

These are called as follows:

    train_batches = DataGenerator(inputs=train.X.values,
                                  labels=train.y.values,
                                  img_size=img_size,
                                  input_shape=input_shape,
                                  batch_size=batch_size,
                                  num_classes=len(CLASSES))
    validate_batches = DataGenerator(inputs=validate.X.values,
                                     labels=validate.y.values,
                                     img_size=img_size,
                                     input_shape=input_shape,
                                     batch_size=batch_size,
                                     num_classes=len(CLASSES),
                                     validation=True)

    if model_name == "cnn":
        model = models.simpleCNN(optimizer=Adam(lr=0.0001))
    elif model_name == "resnet":
        model = models.resnet50(optimizer=Adam(lr=0.0001))

    early_stopping = EarlyStopping(patience=15)
    checkpointer = ModelCheckpoint(output_name + '_best.h5', verbose=1,
                                   save_best_only=True)

    history = model.fit_generator(train_batches,
                                  steps_per_epoch=num_train_steps,
                                  epochs=epochs,
                                  callbacks=[early_stopping, checkpointer],
                                  validation_data=validate_batches,
                                  validation_steps=num_valid_steps)

Answer:

I finally found the main factor causing this overfitting. Since I was using a pretrained model, I had set its layers to be non-trainable. So I tried making them trainable, and that seems to have solved the problem.

    for layer in model.layers:
        layer.trainable = False
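The fix is to flip that flag so the pretrained weights can adapt. A minimal sketch (here `weights=None` only keeps the example download-free; the actual code uses `weights='imagenet'`):

```python
from tensorflow import keras

# build the base model; weights=None avoids fetching pretrained weights
# for this sketch, the real setup uses weights='imagenet'
base = keras.applications.ResNet50(include_top=False,
                                   input_shape=(200, 200, 3),
                                   weights=None)

# make every layer trainable so the pretrained features can adapt
for layer in base.layers:
    layer.trainable = True
```

A common middle ground is to unfreeze only the last few blocks while keeping the earliest layers frozen, which trades adaptation capacity against the risk of overfitting a small dataset.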

My hypothesis is that my images are too different from the data the model was originally trained on.

I also added some dropout and batch normalization at the end of the ResNet model.
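A modified head along those lines might look like the sketch below. The single BatchNormalization layer, the 0.5 dropout rate, and `weights=None` are my assumptions for illustration (in practice `weights='imagenet'`, as in the original code); `categorical_crossentropy` is used here because the labels are one-hot encoded with two softmax outputs:

```python
from tensorflow import keras
from tensorflow.keras.layers import (Flatten, Dense, Dropout,
                                     BatchNormalization)
from tensorflow.keras.models import Model

# base network; weights=None keeps the sketch self-contained,
# the actual setup loads weights='imagenet'
base = keras.applications.ResNet50(include_top=False,
                                   input_shape=(200, 200, 3),
                                   weights=None)
for layer in base.layers:
    layer.trainable = True  # the fix from the answer above

x = Flatten()(base.output)
x = BatchNormalization()(x)  # normalize the flattened features
x = Dropout(0.5)(x)          # assumed rate; regularizes the classifier head
out = Dense(2, activation='softmax')(x)

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```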

