Validation and evaluation metric problems when doing transfer learning with TensorFlow

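I have run into some strange behavior while training a CNN with TensorFlow 2.0 and would appreciate help resolving it. I am doing transfer learning (training only the classification head) with the pretrained networks available in 'tensorflow.keras.applications', and have noticed the following: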

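During the first epoch, the validation metrics are always zero, no matter what I do.
After the first epoch, the training metrics improve as expected, but the validation metrics are essentially random guessing, even when the training and validation datasets are exactly the same. It is as if the model being trained is not the one used for evaluation.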

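I have tried VGG16, MobileNetV2 and ResNet50V2, and they all show the same behavior.

I can reproduce the problem on the following configurations:

Ubuntu 18.04 LTS, Nvidia RTX 2080 Ti, driver version 430.50, CUDA 10.0, TensorFlow-gpu==2.0.0
MacBook Pro, TensorFlow==2.0.0 (CPU)

Both run in Conda environments, and I installed TensorFlow with pip. In case I am doing something obviously wrong, the sample code below shows the essence of my workflow. Any help would be greatly appreciated, because I do not know how to fix this.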

import os
import tensorflow as tf

def parse_function(example_proto):
    # Feature schema of each serialized example in the TFRecord files
    image_feature_description = {
        'label': tf.io.FixedLenFeature([], tf.int64),
        'image_raw': tf.io.FixedLenFeature([], tf.string)
    }
    parsed_example = tf.io.parse_single_example(example_proto, image_feature_description)
    # Decode to a float32 RGB image
    image = tf.io.decode_image(
        parsed_example['image_raw'],
        channels=3,
        dtype=tf.float32,
        expand_animations=False
    )
    image = tf.image.per_image_standardization(image)
    # One-hot encode the label over the 24 classes
    label = tf.one_hot(parsed_example['label'], 24, dtype=tf.float32)
    return (image, label)

def load_dataset(TFRecord_dir, record_name):
    # Collect all shards named <record_name>.tfrecords-????
    record_files = tf.io.matching_files(os.path.join(TFRecord_dir, record_name + '.tfrecords-????'))
    shards = tf.data.TFRecordDataset(record_files)
    shards = shards.shuffle(tf.cast(tf.shape(record_files)[0], tf.int64))
    dataset = shards.map(map_func=parse_function)
    dataset = dataset.batch(batch_size=16, drop_remainder=True)
    dataset = dataset.prefetch(16)
    return dataset

# Pretrained backbone without its classification head, frozen for transfer learning
base_model = tf.keras.applications.ResNet50V2(
    input_shape=(224, 224, 3),
    weights='imagenet',
    include_top=False
)
base_model.trainable = False

# Only the layers after the backbone are trained
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(24, activation='softmax')
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[
        tf.keras.metrics.CategoricalAccuracy(),
        tf.keras.metrics.TopKCategoricalAccuracy(),
        tf.keras.metrics.Precision(),
        tf.keras.metrics.Recall()
    ]
)

# train_dir (the directory holding the TFRecord shards) is defined elsewhere
train_dataset = load_dataset(train_dir, 'train')

# The same dataset is deliberately used for training, validation and evaluation
model.fit(train_dataset,
          verbose=1,
          epochs=5,
          validation_data=train_dataset)
model.evaluate(train_dataset)
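
To put a number on "random guessing" for the setup above, here is a small sketch of the chance-level baselines. It assumes balanced classes and uniformly random predictions (neither is stated in the question); the 24 classes come from the code, and k=5 is the default of tf.keras.metrics.TopKCategoricalAccuracy. The helper name chance_top_k is hypothetical.

```python
from fractions import Fraction

NUM_CLASSES = 24  # matches the one_hot depth and Dense(24) in the question's code
TOP_K = 5         # default k of tf.keras.metrics.TopKCategoricalAccuracy


def chance_top_k(num_classes, k):
    """Probability that the true class is among k uniformly random distinct guesses."""
    return Fraction(min(k, num_classes), num_classes)


chance_top1 = chance_top_k(NUM_CLASSES, 1)
chance_top5 = chance_top_k(NUM_CLASSES, TOP_K)

print(f"chance categorical accuracy: {float(chance_top1):.4f}")
print(f"chance top-5 accuracy:       {float(chance_top5):.4f}")
```

Validation values hovering around these figures while the training metrics keep rising would match the symptom described.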

Answer:

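The problem stopped occurring once I started using the provided Docker image. Something must have been wrong with my installation, but I do not know what.

Also, for anyone who runs into the same problem: while debugging I found that if you normalize images with image = (image/127.5) - 1, as in the tutorial on transfer learning with a pretrained CNN, you should change it to image = tf.image.per_image_standardization(image). The former showed the same behavior even inside the Docker container: the training metrics improved, but the validation metrics stayed at chance level even when validating on the same dataset used for training.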
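
The numerical difference between the two normalizations can be sketched with plain arithmetic. This is not part of the original answer. It assumes the pixels are already floats in [0, 1], which is what tf.io.decode_image produces when called with dtype=tf.float32 as in the question's parse_function. The helper names and the sample pixel values are hypothetical, and per_image_standardize is a pure-Python analogue of the documented formula (x - mean) / max(stddev, 1/sqrt(N)), not the TensorFlow implementation. Whether this is what actually caused the behavior is not confirmed by the source.

```python
import math


def rescale_tutorial(pixels):
    """Apply (x / 127.5) - 1, a scaling designed for inputs in [0, 255]."""
    return [p / 127.5 - 1.0 for p in pixels]


def per_image_standardize(pixels):
    """Subtract the mean and divide by max(stddev, 1/sqrt(N))."""
    n = len(pixels)
    mean = sum(pixels) / n
    stddev = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    adjusted_stddev = max(stddev, 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_stddev for p in pixels]


# Hypothetical flattened image whose pixels are already in [0, 1]
pixels_01 = [0.0, 0.25, 0.5, 0.75, 1.0]

squashed = rescale_tutorial(pixels_01)
standardized = per_image_standardize(pixels_01)

# Every value lands in a very narrow band just above -1
print("tutorial scaling range:", min(squashed), max(squashed))
# Values are centered on zero with a usable spread
print("standardized range:    ", min(standardized), max(standardized))
```

Under that assumption, the tutorial scaling leaves almost no contrast between pixels, whereas per-image standardization gives zero-centered inputs regardless of the original value range.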
