Tensorflow Slim – 模型训练正常但评估时总是预测相同的结果

https://kwotsin.github.io/tech/2017/02/11/transfer-learning.html我按照上面的链接制作了一个图像分类器

训练代码:

slim = tf.contrib.slimdataset_dir = './data'log_dir = './log'checkpoint_file = './inception_resnet_v2_2016_08_30.ckpt'image_size = 299num_classes = 21vlabels_file = './labels.txt'labels = open(labels_file, 'r')labels_to_name = {}for line in labels:    label, string_name = line.split(':')    string_name = string_name[:-1]    labels_to_name[int(label)] = string_namefile_pattern = 'test_%s_*.tfrecord'items_to_descriptions = {    'image': 'A 3-channel RGB coloured product image',    'label': 'A label that from 20 labels'}num_epochs = 10batch_size = 16initial_learning_rate = 0.001learning_rate_decay_factor = 0.7num_epochs_before_decay = 4def get_split(split_name, dataset_dir, file_pattern=file_pattern, file_pattern_for_counting='products'):    if split_name not in ['train', 'validation']:        raise ValueError(            'The split_name %s is not recognized. Please input either train or validation as the split_name' % (            split_name))    file_pattern_path = os.path.join(dataset_dir, file_pattern % (split_name))    num_samples = 0    file_pattern_for_counting = file_pattern_for_counting + '_' + split_name    tfrecords_to_count = [os.path.join(dataset_dir, file) for file in os.listdir(dataset_dir) if                          file.startswith(file_pattern_for_counting)]    for tfrecord_file in tfrecords_to_count:        for record in tf.python_io.tf_record_iterator(tfrecord_file):            num_samples += 1    test = num_samples    reader = tf.TFRecordReader    keys_to_features = {        'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),        'image/format': tf.FixedLenFeature((), tf.string, default_value='jpg'),        'image/class/label': tf.FixedLenFeature(            [], tf.int64, default_value=tf.zeros([], dtype=tf.int64)),    }    items_to_handlers = {        'image': slim.tfexample_decoder.Image(),        'label': slim.tfexample_decoder.Tensor('image/class/label'),    }    decoder = slim.tfexample_decoder.TFExampleDecoder(keys_to_features, items_to_handlers)    labels_to_name_dict = labels_to_name    dataset = slim.dataset.Dataset(        data_sources=file_pattern_path,        decoder=decoder,        reader=reader,        num_readers=4,        num_samples=num_samples,        num_classes=num_classes,        labels_to_name=labels_to_name_dict,        items_to_descriptions=items_to_descriptions)    return datasetdef load_batch(dataset, batch_size, height=image_size, width=image_size, is_training=True):    '''    Loads a batch for training.    INPUTS:    - dataset(Dataset): a Dataset class object that is created from the get_split function    - batch_size(int): determines how big of a batch to train    - height(int): the height of the image to resize to during preprocessing    - width(int): the width of the image to resize to during preprocessing    - is_training(bool): to determine whether to perform a training or evaluation preprocessing    OUTPUTS:    - images(Tensor): a Tensor of the shape (batch_size, height, width, channels) that contain one batch of images    - labels(Tensor): the batch's labels with the shape (batch_size,) (requires one_hot_encoding).    '''    # First create the data_provider object    data_provider = slim.dataset_data_provider.DatasetDataProvider(        dataset,        common_queue_capacity=24 + 3 * batch_size,        common_queue_min=24)    # Obtain the raw image using the get method    raw_image, label = data_provider.get(['image', 'label'])    # Perform the correct preprocessing for this image depending if it is training or evaluating    image = inception_preprocessing.preprocess_image(raw_image, height, width, is_training)    # As for the raw images, we just do a simple reshape to batch it up    raw_image = tf.expand_dims(raw_image, 0)    raw_image = tf.image.resize_nearest_neighbor(raw_image, [height, width])    raw_image = tf.squeeze(raw_image)    # Batch up the image by enqueing the tensors internally in a FIFO queue and dequeueing many elements with tf.train.batch.    images, raw_images, labels = tf.train.batch(        [image, raw_image, label],        batch_size=batch_size,        num_threads=4,        capacity=4 * batch_size,        allow_smaller_final_batch=True)    return images, raw_images, labelsdef run():    # Create the log directory here. Must be done here otherwise import will activate this unneededly.    if not os.path.exists(log_dir):        os.mkdir(log_dir)    # ======================= TRAINING PROCESS =========================    # Now we start to construct the graph and build our model    with tf.Graph().as_default() as graph:        tf.logging.set_verbosity(tf.logging.INFO)  # Set the verbosity to INFO level        # First create the dataset and load one batch        dataset = get_split('train', dataset_dir, file_pattern=file_pattern)        images, _, labels = load_batch(dataset, batch_size=batch_size)        # Know the number steps to take before decaying the learning rate and batches per epoch        num_batches_per_epoch = int(dataset.num_samples / batch_size)        num_steps_per_epoch = num_batches_per_epoch  # Because one step is one batch processed        decay_steps = int(num_epochs_before_decay * num_steps_per_epoch)        # Create the model inference        with slim.arg_scope(inception_resnet_v2_arg_scope()):            logits, end_points = inception_resnet_v2(images, num_classes=dataset.num_classes, is_training=True)        # Define the scopes that you want to exclude for restoration        exclude = ['InceptionResnetV2/Logits', 'InceptionResnetV2/AuxLogits']        variables_to_restore = slim.get_variables_to_restore(exclude=exclude)        # Perform one-hot-encoding of the labels (Try one-hot-encoding within the load_batch function!)        one_hot_labels = slim.one_hot_encoding(labels, dataset.num_classes)        # Performs the equivalent to tf.nn.sparse_softmax_cross_entropy_with_logits but enhanced with checks        loss = tf.losses.softmax_cross_entropy(onehot_labels=one_hot_labels, logits=logits)        total_loss = tf.losses.get_total_loss()  # obtain the regularization losses as well        # Create the global step for monitoring the learning_rate and training.        global_step = get_or_create_global_step()        # Define your exponentially decaying learning rate        lr = tf.train.exponential_decay(            learning_rate=initial_learning_rate,            global_step=global_step,            decay_steps=decay_steps,            decay_rate=learning_rate_decay_factor,            staircase=True)        # Now we can define the optimizer that takes on the learning rate        optimizer = tf.train.AdamOptimizer(learning_rate=lr)        # Create the train_op.        train_op = slim.learning.create_train_op(total_loss, optimizer)        # State the metrics that you want to predict. We get a predictions that is not one_hot_encoded.        predictions = tf.argmax(end_points['Predictions'], 1)        probabilities = end_points['Predictions']        accuracy, accuracy_update = tf.contrib.metrics.streaming_accuracy(predictions, labels)        metrics_op = tf.group(accuracy_update, probabilities)        # Now finally create all the summaries you need to monitor and group them into one summary op.        tf.summary.scalar('losses/Total_Loss', total_loss)        tf.summary.scalar('accuracy', accuracy)        tf.summary.scalar('learning_rate', lr)        my_summary_op = tf.summary.merge_all()        # Now we need to create a training step function that runs both the train_op, metrics_op and updates the global_step concurrently.        def train_step(sess, train_op, global_step):            '''            Simply runs a session for the three arguments provided and gives a logging on the time elapsed for each global step            '''            # Check the time for each sess run            start_time = time.time()            total_loss, global_step_count, _ = sess.run([train_op, global_step, metrics_op])            time_elapsed = time.time() - start_time            # Run the logging to print some results            logging.info('global step %s: loss: %.4f (%.2f sec/step)', global_step_count, total_loss, time_elapsed)            return total_loss, global_step_count        # Now we create a saver function that actually restores the variables from a checkpoint file in a sess        saver = tf.train.Saver(variables_to_restore)        def restore_fn(sess):            return saver.restore(sess, checkpoint_file)        # Define your supervisor for running a managed session. Do not run the summary_op automatically or else it will consume too much memory        sv = tf.train.Supervisor(logdir=log_dir, summary_op=None, init_fn=restore_fn)        # Run the managed session        with sv.managed_session() as sess:            for step in xrange(num_steps_per_epoch * num_epochs):                # At the start of every epoch, show the vital information:                if step % num_batches_per_epoch == 0:                    logging.info('Epoch %s/%s', step / num_batches_per_epoch + 1, num_epochs)                    learning_rate_value, accuracy_value = sess.run([lr, accuracy])                    logging.info('Current Learning Rate: %s', learning_rate_value)                    logging.info('Current Streaming Accuracy: %s', accuracy_value)                    # optionally, print your logits and predictions for a sanity check that things are going fine.                    logits_value, probabilities_value, predictions_value, labels_value = sess.run(                        [logits, probabilities, predictions, labels])                    print 'logits: \n', logits_value                    print 'Probabilities: \n', probabilities_value                    print 'predictions: \n', predictions_value                    print 'Labels:\n:', labels_value                # Log the summaries every 10 step.                if step % 10 == 0:                    loss, _ = train_step(sess, train_op, sv.global_step)                    summaries = sess.run(my_summary_op)                    sv.summary_computed(sess, summaries)                # If not, simply run the training step                else:                    loss, _ = train_step(sess, train_op, sv.global_step)            # We log the final training loss and accuracy            logging.info('Final Loss: %s', loss)            logging.info('Final Accuracy: %s', sess.run(accuracy))            # Once all the training has been done, save the log files and checkpoint model            logging.info('Finished training! Saving model to disk now.')            sv.saver.save(sess, sv.save_path, global_step=sv.global_step)

这个代码看起来是有效的，我已经在一些样本数据上运行了训练，并且得到了94%的准确率

评估代码:

log_dir = './log'log_eval = './log_eval_test'dataset_dir = './data'batch_size = 10num_epochs = 1checkpoint_file = tf.train.latest_checkpoint('./')def run():    if not os.path.exists(log_eval):        os.mkdir(log_eval)    with tf.Graph().as_default() as graph:        tf.logging.set_verbosity(tf.logging.INFO)        dataset = get_split('train', dataset_dir)        images, raw_images, labels = load_batch(dataset, batch_size=batch_size, is_training=False)        num_batches_per_epoch = dataset.num_samples / batch_size        num_steps_per_epoch = num_batches_per_epoch        with slim.arg_scope(inception_resnet_v2_arg_scope()):            logits, end_points = inception_resnet_v2(images, num_classes=dataset.num_classes, is_training=False)        variables_to_restore = slim.get_variables_to_restore()        saver = tf.train.Saver(variables_to_restore)        def restore_fn(sess):            return saver.restore(sess, checkpoint_file)        predictions = tf.argmax(end_points['Predictions'], 1)        accuracy, accuracy_update = tf.contrib.metrics.streaming_accuracy(predictions, labels)        metrics_op = tf.group(accuracy_update)        global_step = get_or_create_global_step()        global_step_op = tf.assign(global_step, global_step + 1)        def eval_step(sess, metrics_op, global_step):            '''            Simply takes in a session, runs the metrics op and some logging information.            '''            start_time = time.time()            _, global_step_count, accuracy_value = sess.run([metrics_op, global_step_op, accuracy])            time_elapsed = time.time() - start_time            logging.info('Global Step %s: Streaming Accuracy: %.4f (%.2f sec/step)', global_step_count, accuracy_value,                         time_elapsed)            return accuracy_value        tf.summary.scalar('Validation_Accuracy', accuracy)        my_summary_op = tf.summary.merge_all()        sv = tf.train.Supervisor(logdir=log_eval, summary_op=None, saver=None, init_fn=restore_fn)        with sv.managed_session() as sess:            for step in xrange(num_steps_per_epoch * num_epochs):                sess.run(sv.global_step)                if step % num_batches_per_epoch == 0:                    logging.info('Epoch: %s/%s', step / num_batches_per_epoch + 1, num_epochs)                    logging.info('Current Streaming Accuracy: %.4f', sess.run(accuracy))                if step % 10 == 0:                    eval_step(sess, metrics_op=metrics_op, global_step=sv.global_step)                    summaries = sess.run(my_summary_op)                    sv.summary_computed(sess, summaries)                else:                    eval_step(sess, metrics_op=metrics_op, global_step=sv.global_step)            logging.info('Final Streaming Accuracy: %.4f', sess.run(accuracy))            raw_images, labels, predictions = sess.run([raw_images, labels, predictions])            for i in range(10):                image, label, prediction = raw_images[i], labels[i], predictions[i]                prediction_name, label_name = dataset.labels_to_name[prediction], dataset.labels_to_name[label]                text = 'Prediction: %s \n Ground Truth: %s' % (prediction_name, label_name)                img_plot = plt.imshow(image)                plt.title(text)                img_plot.axes.get_yaxis().set_ticks([])                img_plot.axes.get_xaxis().set_ticks([])                plt.show()            logging.info(                'Model evaluation has completed! Visit TensorBoard for more information regarding your evaluation.')

在训练模型并获得94%的准确率后，我尝试评估模型。在评估时，我始终得到0-1%的准确率。我调查了这个问题，发现它每次都在预测同一个类别

labels: [7, 11, 5, 1, 20, 0, 18, 1, 0, 7]predictions: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]

有谁能帮我找出我可能犯的错误吗？

EDIT:

TensorBoard训练的准确率和损失

TensorBoard评估的准确率

EDIT:

我仍然无法解决这个问题。我以为在评估脚本中恢复图的方式可能有问题，所以我尝试使用以下方法来恢复模型

saver = tf.train.import_meta_graph('/log/model.ckpt.meta')def restore_fn(sess):    return saver.restore(sess, checkpoint_file)

而不是

variables_to_restore = slim.get_variables_to_restore()    saver = tf.train.Saver(variables_to_restore)def restore_fn(sess):    return saver.restore(sess, checkpoint_file)

这需要很长时间才能开始，最终报错。我还尝试在保存器中使用V1版本的写入器（saver = tf.train.Saver(variables_to_restore, write_version=saver_pb2.SaveDef.V1)），并重新训练，结果无法加载此检查点，因为它说变量缺失。

我也尝试使用训练时相同的数据运行我的评估脚本，看看这是否会产生不同的结果，但我得到的结果还是一样的。

最后，我重新克隆了教程中的同一数据集的仓库，并运行了一个训练，结果在评估时得到0-3%的准确率，即使在训练时达到了84%。此外，我的检查点必须包含正确的信息，因为当我重新开始训练时，准确率会从上次停止的地方继续。这感觉像是我在恢复模型时没有正确执行某些操作。目前我已经走投无路了，非常希望能得到任何建议:(

回答：

我最终解决了我的问题。这听起来很奇怪，但加载模型时的is_training参数需要在训练脚本和评估脚本中都设置为False，或者都设置为True。这是由于当is_training为False时，BatchNormalization会被移除所致。

这一点可以在tensorflow/tensorflow的GitHub讨论中得到验证，链接如下：https://github.com/tensorflow/models/issues/391#issuecomment-247392028

还有在这个slim walkthrough Jupyter笔记本中也有提到：https://github.com/tensorflow/models/blob/master/slim/slim_walkthrough.ipynb enter link description here

如果你滚动到页面底部，找到标题为’Apply fine tuned model to some images’的部分，你会看到一个代码块，展示了如何重新加载一个经过微调的预训练模型。当他们加载模型时，你会看到这行代码，以及解释的注释:

# Create the model, use the default arg scope to configure the batch norm parameters.with slim.arg_scope(inception.inception_v1_arg_scope()):logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)

尽管这是Inception_v1，但原理是相同的，这表明将两者都设置为False或True是有效的，但你不能将一个设置为与另一个不同，除非你编辑slim中的inception_resnet_v2.py代码

学技术

Tensorflow Slim – 模型训练正常但评估时总是预测相同的结果

发表回复取消回复

相关文章：

Related Posts

使用LSTM在Python中预测未来值

如何在gensim的word2vec模型中查找双词组的相似性

dask_xgboost.predict 可以工作但无法显示 – 数据必须是一维的

ML Tuning – Cross Validation in Spark

如何在React JS中使用fetch从REST API获取预测

如何分析ML.NET中多类分类预测得分数组？

发表回复 取消回复

发表回复取消回复