TensorFlow Object Detection: no predictions on custom data

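I am using this repository as my base code (https://github.com/ndaidong/tf-object-detection). It is built on the TensorFlow Object Detection API, and I want to use it to detect the comment, date, like count, and rating in screenshots of reviews.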

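I processed 100 images (just to test whether this is feasible), annotated them with 4 labels (comment, date, likes, and rating), converted the XML annotations to CSV, and then generated TFRecords. I did this for both the training and the evaluation data: 100 images for training and 20 images for evaluation. Below is one of the screenshots I annotated.

[Image: example of an annotated screenshot]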
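The XML-to-CSV step described above can be sketched roughly as follows. This is a minimal illustration, assuming the annotations are in Pascal VOC format (the format labelImg writes by default); the file name, the label strings, and the CSV column order are hypothetical and may differ from what the base repository expects.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column layout; the base repository may use a different one.
CSV_HEADER = ["filename", "width", "height", "class",
              "xmin", "ymin", "xmax", "ymax"]


def voc_xml_to_rows(xml_text):
    """Return one row per <object> element in a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(CSV_HEADER)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical annotation for a 2000x500 review screenshot.
SAMPLE_XML = """<annotation>
  <filename>review_001.png</filename>
  <size><width>2000</width><height>500</height><depth>3</depth></size>
  <object><name>comment</name>
    <bndbox><xmin>40</xmin><ymin>120</ymin><xmax>1800</xmax><ymax>400</ymax></bndbox>
  </object>
  <object><name>date</name>
    <bndbox><xmin>40</xmin><ymin>30</ymin><xmax>300</xmax><ymax>80</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    print(rows_to_csv(voc_xml_to_rows(SAMPLE_XML)))
```

Printing the CSV for a few files and comparing the box coordinates against the image size is a cheap way to catch swapped or out-of-range coordinates before generating the TFRecords.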

For training, I used the following configuration:

model {
  ssd {
    num_classes: 4
    image_resizer {
      fixed_shape_resizer {
        height: 500
        width: 2000
      }
    }
    feature_extractor {
      type: "ssd_mobilenet_v2"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 3.99999989895e-05
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.0299999993294
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.999700009823
          center: true
          scale: true
          epsilon: 0.0010000000475
          train: true
        }
      }
      use_depthwise: true
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 3.99999989895e-05
            }
          }
          initializer {
            truncated_normal_initializer {
              mean: 0.0
              stddev: 0.0299999993294
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.999700009823
            center: true
            scale: true
            epsilon: 0.0010000000475
            train: true
          }
        }
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.800000011921
        kernel_size: 3
        box_code_size: 4
        apply_sigmoid_to_scores: false
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.20000000298
        max_scale: 0.949999988079
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.333299994469
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 0.300000011921
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.990000009537
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 3
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
  }
}
train_config {
  batch_size: 35
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.00400000018999
          decay_steps: 800720
          decay_factor: 0.949999988079
        }
      }
      momentum_optimizer_value: 0.899999976158
      decay: 0.899999976158
      epsilon: 1.0
    }
  }
  num_steps: 200
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "temp/data/train.record"
  }
  label_map_path: "configs/reviews/labels.pbtxt"
}
eval_config {
  num_examples: 20
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "temp/data/test.record"
  }
  label_map_path: "configs/reviews/labels.pbtxt"
  shuffle: false
  num_readers: 1
}
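The configuration points at configs/reviews/labels.pbtxt, whose contents are not shown. As a rough sketch of what a label map for num_classes: 4 could look like, the helper below renders one in the pbtxt text format. The label strings are assumptions (the actual names used during annotation are not given), and build_label_map is a hypothetical helper, not part of the API. Ids start at 1 because id 0 is reserved for the background class.

```python
def build_label_map(class_names):
    """Render a label map in pbtxt text format, with ids starting at 1."""
    items = []
    for index, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (index, name))
    return "\n".join(items)


# Assumed label strings; they must match the names written to the TFRecords.
LABELS = ["comment", "date", "likes", "rating"]

if __name__ == "__main__":
    print(build_label_map(LABELS))
```

The names in the label map need to match the class strings in the annotations exactly, and the number of items should equal num_classes.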

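As you can see, I am trying to train from scratch rather than starting from an existing model. The reason is that I am not looking for the generic objects those models were already trained on.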

I adjusted fixed_shape_resizer because the review images are roughly 2000 pixels wide and 500 pixels high.

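I trained for only 200 steps (do I need more?), because I noticed in TensorBoard that the model starts learning after a few hundred steps and the loss begins to drop significantly.

[Image: TensorBoard results]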
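To put the 200 steps in perspective, the arithmetic below uses the values from the configuration (batch_size: 35, num_steps: 200, 100 training images, and the exponential-decay settings). It is a back-of-the-envelope sketch: the decay formula is the standard continuous exponential decay, and the API may apply it in a staircase fashion instead, which would not change the conclusion here.

```python
def epochs_seen(num_steps, batch_size, num_images):
    """Number of passes over the training set after num_steps steps."""
    return num_steps * batch_size / num_images


def decayed_learning_rate(initial_lr, decay_factor, decay_steps, step):
    """Continuous exponential decay: lr * factor ** (step / decay_steps)."""
    return initial_lr * decay_factor ** (step / decay_steps)


if __name__ == "__main__":
    # Values taken from the training configuration in the question.
    print("epochs:", epochs_seen(200, 35, 100))
    print("lr at step 200:", decayed_learning_rate(0.004, 0.95, 800720, 200))
```

With these numbers the model sees each of the 100 images about 70 times, and with decay_steps at 800720 the learning rate is essentially unchanged over the whole run.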

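However, when I export/freeze the graph and try to run a prediction, nothing is predicted. Everything else appears to work. Am I doing something wrong?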
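One thing that may be worth checking when "nothing is predicted" is whether the model really returns no boxes or whether the boxes are only hidden by a score cutoff. The API's visualization helper draws only boxes at or above its min_score_thresh argument (0.5 by default), so an undertrained model can look empty. The sketch below is a hypothetical helper that ranks raw scores, assuming they have already been read from the frozen graph's detection_scores and detection_classes outputs; the numbers in the example are made up and are not the author's results.

```python
def summarize_detections(scores, classes, min_score=0.5, top_k=5):
    """Rank detections by score and report which ones pass the cutoff."""
    ranked = sorted(zip(scores, classes), reverse=True)
    kept = [(score, cls) for score, cls in ranked if score >= min_score]
    return {"kept": kept, "top": ranked[:top_k]}


if __name__ == "__main__":
    # Made-up scores and class ids, for illustration only.
    scores = [0.41, 0.12, 0.35, 0.05]
    classes = [1, 2, 1, 4]
    print(summarize_detections(scores, classes, min_score=0.5))
    print(summarize_detections(scores, classes, min_score=0.3))
```

If the top scores are all well below the cutoff, the pipeline is probably wired correctly and the model is simply not confident yet, which points back to training data and training time rather than to the export step.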

Answer:

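Using this API involves many steps, so I can't tell which one might have gone wrong. That said, if you train an object detection model from scratch, you need a very large dataset, and 100 images is a very small one. Look at the size of the datasets that pretrained models were trained on (for example ImageNet, which is used to train backbones such as ResNet): they are huge. So to get results you need either a pretrained model or a much larger dataset.

To enlarge the dataset, you can augment your existing images outside the API and adjust their annotations programmatically. When I did a similar exercise, I used about 10,000 images together with a pretrained model and image augmentation.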
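As an illustration of augmenting images outside the API while keeping the annotations in sync, here is a minimal sketch of a horizontal flip. It assumes the image is a list of pixel rows and that boxes are (xmin, ymin, xmax, ymax) tuples in pixels with an exclusive xmax; flip_horizontal is a hypothetical helper, and a real pipeline would more likely operate on arrays through an imaging library.

```python
def flip_horizontal(image, boxes):
    """Mirror an image left to right and adjust its bounding boxes.

    image: list of rows, each row a list of pixel values.
    boxes: list of (xmin, ymin, xmax, ymax) tuples in pixel coordinates.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # Mirroring swaps the roles of xmin and xmax; y coordinates are unchanged.
    flipped_boxes = [
        (width - xmax, ymin, width - xmin, ymax)
        for xmin, ymin, xmax, ymax in boxes
    ]
    return flipped_image, flipped_boxes


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
    boxes = [(0, 0, 1, 2)]
    print(flip_horizontal(image, boxes))
```

Other transforms (shifts, crops, rescaling) follow the same pattern: apply the geometric change to the pixels, apply the matching coordinate change to every box, and write the result out as a new annotated image. For screenshots that contain text, mirrored images may be less useful than shifts or crops.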

If you annotate your images with labelImg, be aware of this issue in the tool.
