I am trying to implement a Siamese network in Caffe, composed of two ImageNet networks that do not share weights. My basic goal is to feed one image to each network and ultimately compute the distance between them to determine similarity. Below is my prototxt file. My main question is: what should I set "num_output" to? I have two training classes: 0 means the pair is dissimilar, 1 means it is similar.
```
name: "Siamese_ImageNet"
layers { name: "data" type: IMAGE_DATA top: "data" top: "label" image_data_param { source: "train1.txt" batch_size: 32 new_height: 256 new_width: 256 } include: { phase: TRAIN } }
layers { name: "data" type: IMAGE_DATA top: "data" top: "label" image_data_param { source: "test1.txt" batch_size: 32 new_height: 256 new_width: 256 } include: { phase: TEST } }
layers { name: "data_p" type: IMAGE_DATA top: "data_p" top: "label_p" image_data_param { source: "train2.txt" batch_size: 32 new_height: 256 new_width: 256 } include: { phase: TRAIN } }
layers { name: "data_p" type: IMAGE_DATA top: "data_p" top: "label_p" image_data_param { source: "test2.txt" batch_size: 32 new_height: 256 new_width: 256 } include: { phase: TEST } }
layers { name: "conv1" type: CONVOLUTION bottom: "data" top: "conv1" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 96 kernel_size: 11 stride: 4 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { name: "relu1" type: RELU bottom: "conv1" top: "conv1" }
layers { name: "pool1" type: POOLING bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layers { name: "norm1" type: LRN bottom: "pool1" top: "norm1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } }
layers { name: "conv2" type: CONVOLUTION bottom: "norm1" top: "conv2" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layers { name: "relu2" type: RELU bottom: "conv2" top: "conv2" }
layers { name: "pool2" type: POOLING bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layers { name: "norm2" type: LRN bottom: "pool2" top: "norm2" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } }
layers { name: "conv3" type: CONVOLUTION bottom: "norm2" top: "conv3" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 384 pad: 1 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { name: "relu3" type: RELU bottom: "conv3" top: "conv3" }
layers { name: "conv4" type: CONVOLUTION bottom: "conv3" top: "conv4" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layers { name: "relu4" type: RELU bottom: "conv4" top: "conv4" }
layers { name: "conv5" type: CONVOLUTION bottom: "conv4" top: "conv5" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layers { name: "relu5" type: RELU bottom: "conv5" top: "conv5" }
layers { name: "pool5" type: POOLING bottom: "conv5" top: "pool5" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layers { name: "fc6" type: INNER_PRODUCT bottom: "pool5" top: "fc6" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 1 } } }
layers { name: "relu6" type: RELU bottom: "fc6" top: "fc6" }
layers { name: "drop6" type: DROPOUT bottom: "fc6" top: "fc6" dropout_param { dropout_ratio: 0.5 } }
layers { name: "fc7" type: INNER_PRODUCT bottom: "fc6" top: "fc7" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 inner_product_param { num_output: 2 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 1 } } }
layers { name: "relu7" type: RELU bottom: "fc7" top: "fc7" }
layers { name: "drop7" type: DROPOUT bottom: "fc7" top: "fc7" dropout_param { dropout_ratio: 0.5 } }
layers { name: "conv1_p" type: CONVOLUTION bottom: "data_p" top: "conv1_p" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 96 kernel_size: 11 stride: 4 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { name: "relu1_p" type: RELU bottom: "conv1_p" top: "conv1_p" }
layers { name: "pool1_p" type: POOLING bottom: "conv1_p" top: "pool1_p" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layers { name: "norm1_p" type: LRN bottom: "pool1_p" top: "norm1_p" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } }
layers { name: "conv2_p" type: CONVOLUTION bottom: "norm1_p" top: "conv2_p" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layers { name: "relu2_p" type: RELU bottom: "conv2_p" top: "conv2_p" }
layers { name: "pool2_p" type: POOLING bottom: "conv2_p" top: "pool2_p" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layers { name: "norm2_p" type: LRN bottom: "pool2_p" top: "norm2_p" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } }
layers { name: "conv3_p" type: CONVOLUTION bottom: "norm2_p" top: "conv3_p" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 384 pad: 1 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } }
layers { name: "relu3_p" type: RELU bottom: "conv3_p" top: "conv3_p" }
layers { name: "conv4_p" type: CONVOLUTION bottom: "conv3_p" top: "conv4_p" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layers { name: "relu4_p" type: RELU bottom: "conv4_p" top: "conv4_p" }
layers { name: "conv5_p" type: CONVOLUTION bottom: "conv4_p" top: "conv5_p" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 convolution_param { num_output: 256 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 } } }
layers { name: "relu5_p" type: RELU bottom: "conv5_p" top: "conv5_p" }
layers { name: "pool5_p" type: POOLING bottom: "conv5_p" top: "pool5_p" pooling_param { pool: MAX kernel_size: 3 stride: 2 } }
layers { name: "fc6_p" type: INNER_PRODUCT bottom: "pool5_p" top: "fc6_p" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 1 } } }
layers { name: "relu6_p" type: RELU bottom: "fc6_p" top: "fc6_p" }
layers { name: "drop6_p" type: DROPOUT bottom: "fc6_p" top: "fc6_p" dropout_param { dropout_ratio: 0.5 } }
layers { name: "fc7_p" type: INNER_PRODUCT bottom: "fc6_p" top: "fc7_p" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0 inner_product_param { num_output: 2 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 1 } } }
layers { name: "relu7_p" type: RELU bottom: "fc7_p" top: "fc7_p" }
layers { name: "drop7_p" type: DROPOUT bottom: "fc7_p" top: "fc7_p" dropout_param { dropout_ratio: 0.5 } }
layers { name: "loss" type: CONTRASTIVE_LOSS contrastive_loss_param { margin: 1.0 } bottom: "fc7" bottom: "fc7_p" bottom: "label" top: "loss" }
```
My training file structure (0 means dissimilar, 1 means similar):
```
train1.txt:
/aer/img1_1.jpg 0
/aer/img1_2.jpg 1
/aer/img1_3.jpg 1

train2.txt:
/tpd/img2_1.jpg 0
/tpd/img2_2.jpg 1
/tpd/img2_3.jpg 1
```
Answer:
What should I set "num_output" to?
Before deciding what to set num_output to, let's first clarify what it means. You can think of the two halves of the Siamese network, data -> fc7 and data_p -> fc7_p, as two feature extractors. Each extracts a feature vector (fc7 and fc7_p) from the image in its corresponding data layer. num_output therefore defines the dimension of the extracted feature vector.
During training, the ContrastiveLoss layer tries to minimize the distance between the two extracted feature vectors when the images are similar (label == 1) and to maximize it when they are dissimilar (label == 0). In other words, the smaller the distance between the feature vectors, the more similar the images.
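To make this concrete, here is a minimal NumPy sketch of the quantity Caffe's ContrastiveLoss layer computes (the Hadsell et al. formulation, with label 1 = similar as in this setup). This is an illustration, not the actual Caffe implementation:

```python
import numpy as np

def contrastive_loss(f1, f2, labels, margin=1.0):
    """Contrastive loss over a batch of feature-vector pairs.

    f1, f2 : (batch, dim) arrays, e.g. the fc7 and fc7_p outputs.
    labels : 1 = similar pair, 0 = dissimilar pair.
    """
    d = np.linalg.norm(f1 - f2, axis=1)  # Euclidean distance per pair
    # similar pairs are pulled together: penalty grows with distance
    similar_term = labels * d ** 2
    # dissimilar pairs are pushed apart, but only up to the margin
    dissimilar_term = (1 - labels) * np.maximum(margin - d, 0) ** 2
    return np.mean(similar_term + dissimilar_term) / 2

# identical similar pair and a far-apart dissimilar pair both incur zero loss
f1 = np.array([[1.0, 0.0], [0.0, 0.0]])
f2 = np.array([[1.0, 0.0], [5.0, 0.0]])
print(contrastive_loss(f1, f2, np.array([1, 0])))  # -> 0.0
```

Note the margin: once a dissimilar pair is farther apart than `margin`, it contributes nothing more to the loss, so training effort concentrates on pairs that are still confusable.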
So what is the optimal dimension of the feature vector, i.e. what should you set num_output to, so that it best captures the information indicating similarity? There is probably no single exact value; it depends on the encoding quality of the feature extractor (you can regard the feature vector as an encoding of the image) and on how hard it is to recognize the similarity of the images. Basically, if the network (feature extractor) is deep enough and recognizing similarity is not too hard, you can choose a relatively small num_output, e.g. 200, since a larger network encodes features better and makes them more discriminative. If not, try a larger value, e.g. 500 or 1000, or a more sophisticated network.
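For example, using a 500-dimensional feature vector would only require changing num_output in the final feature layers, fc7 and fc7_p (this is just the fc7 definition from the prototxt above with the dimension changed; the value 500 is illustrative, not a verified choice):

```
layers {
  name: "fc7" type: INNER_PRODUCT bottom: "fc6" top: "fc7"
  blobs_lr: 1 blobs_lr: 2 weight_decay: 1 weight_decay: 0
  inner_product_param {
    num_output: 500  # dimension of the extracted feature vector
    weight_filler { type: "gaussian" std: 0.005 }
    bias_filler { type: "constant" value: 1 }
  }
}
# ...and likewise for fc7_p
```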
If you want to try a MultinomialLogisticLoss layer instead of the ContrastiveLoss layer, you should first fuse the two feature vectors fc7 and fc7_p into one using a layer such as CONCAT, then feed the result into a SOFTMAX_LOSS layer, like this:
```
... # original layers
layers {
  name: "concat"
  type: CONCAT
  bottom: "fc7"
  bottom: "fc7_p"
  top: "fc_concat"  # concatenate fc7 and fc7_p along the channel axis
}
layer {
  name: "fc_cls"
  type: INNER_PRODUCT
  bottom: "fc_concat"
  top: "fc_cls"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 2  # a binary classification problem in this case
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "accuracy"
  type: ACCURACY
  bottom: "fc_cls"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
layer {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "fc_cls"
  bottom: "label"
  top: "loss"
}
```
Update:
Which method is better for comparing similarity at deployment, Contrastive Loss or Softmax Loss?
Softmax loss is simple and easy to deploy, but it only gives you a binary prediction: similar or dissimilar. The probability distribution it produces over the two classes (similar, dissimilar) is often too hard (too peaked), e.g. [0.9*, 0.0*] or [0.0*, 0.9*], and in many cases does not reflect the input's degree of similarity well.
With contrastive loss, on the other hand, you obtain a discriminative feature vector for each image. You can use that vector to compute a probability of similarity, as the CVPR 2005 paper Learning a Similarity Metric Discriminatively, with Application to Face Verification does in Section 4.1 (the key point is to compute a multivariate normal density using the feature vectors generated from images of the same subject). You can also apply a threshold to trade off the model's false positive and false negative rates, and plot an ROC curve to evaluate the model better.
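Sweeping that threshold to obtain ROC points can be sketched as follows (a minimal NumPy illustration with made-up distances; in practice the distances would come from your trained network's fc7/fc7_p outputs):

```python
import numpy as np

def roc_points(distances, labels, thresholds):
    """For each threshold t, predict 'similar' when distance < t.

    labels: 1 = genuinely similar pair, 0 = dissimilar pair.
    Returns (false_positive_rate, true_positive_rate) arrays for an ROC curve.
    """
    labels = np.asarray(labels, dtype=bool)
    distances = np.asarray(distances)
    fpr, tpr = [], []
    for t in thresholds:
        pred = distances < t
        tpr.append(np.mean(pred[labels]))    # similar pairs correctly accepted
        fpr.append(np.mean(pred[~labels]))   # dissimilar pairs wrongly accepted
    return np.array(fpr), np.array(tpr)

# toy example with well-separated distances
d = [0.1, 0.2, 0.9, 1.1]   # pair distances (hypothetical)
y = [1, 1, 0, 0]           # ground-truth similarity
fpr, tpr = roc_points(d, y, thresholds=[0.5])
print(fpr, tpr)  # -> [0.] [1.]
```

Raising the threshold accepts more pairs (higher true positive rate but also higher false positive rate); lowering it does the opposite, which is exactly the trade-off the ROC curve visualizes.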
By the way, to explore more CNN architectures for predicting similarity, you can refer to the CVPR 2015 paper Learning to Compare Image Patches via Convolutional Neural Networks.