Some layers in my network are loaded from a pretrained model. I want to fix their parameters and train only the remaining layers.
I followed the instructions on this page: I set lr_mult and decay_mult to 0, added propagate_down: false, and even set base_lr: 0 and weight_decay: 0 in the solver. However, the test loss (computed over all test images at each test) still changes very slowly at every iteration, and after several thousand iterations the accuracy drops from the pretrained model's 80% down to 0.
Here is a two-layer example in which I simply initialize the weights and set all of the parameters above to 0. I expect every layer in this example to be frozen, but once training starts the loss keeps changing…
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    scale: 0.017
    mirror: true
    crop_size: 32
    mean_value: 115
    mean_value: 126
    mean_value: 130
    color: true
    contrast: true
    brightness: true
  }
  image_data_param {
    source: "/data/zhuhao5/data/cifar100/cifar100_train_replicate.txt"
    batch_size: 64
    shuffle: true
    #pair_size: 3
  }
}
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param {
    scale: 0.017
    mirror: false
    crop_size: 32
    mean_value: 115
    mean_value: 126
    mean_value: 130
  }
  image_data_param {
    source: "/data/zhuhao5/data/cifar100/cifar100_test.txt"
    batch_size: 100
    shuffle: false
  }
}
#-------------- TEACHER --------------------
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  propagate_down: false
  top: "conv1"
  param { lr_mult: 0 decay_mult: 0 }
  convolution_param {
    num_output: 16
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler { type: "msra" }
  }
}
layer {
  name: "res2_1a_1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  propagate_down: false
  top: "res2_1a_1_bn"
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
}
layer {
  name: "res2_1a_1_scale"
  type: "Scale"
  bottom: "res2_1a_1_bn"
  propagate_down: false
  top: "res2_1a_1_bn"
  param { lr_mult: 0 decay_mult: 0 }
  scale_param { bias_term: true }
}
layer {
  name: "res2_1a_1_relu"
  type: "ReLU"
  bottom: "res2_1a_1_bn"
  propagate_down: false
  top: "res2_1a_1_bn"
}
layer {
  name: "pool_5"
  type: "Pooling"
  bottom: "res2_1a_1_bn"
  propagate_down: false
  top: "pool_5"
  pooling_param { pool: AVE global_pooling: true }
}
layer {
  name: "fc100"
  type: "InnerProduct"
  bottom: "pool_5"
  propagate_down: false
  top: "fc100"
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  inner_product_param {
    num_output: 100
    weight_filler { type: "msra" }
    bias_filler { type: "constant" value: 0 }
  }
}
#---------------------------------
layer {
  name: "tea_soft_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc100"
  bottom: "label"
  propagate_down: false
  propagate_down: false
  top: "tea_soft_loss"
  loss_weight: 0
}
##----------- ACCURACY----------------
layer {
  name: "teacher_accuracy"
  type: "Accuracy"
  bottom: "fc100"
  bottom: "label"
  top: "teacher_accuracy"
  accuracy_param { top_k: 1 }
}
Here is the solver configuration:
test_iter: 100
test_interval: 10
base_lr: 0
momentum: 0
weight_decay: 0
lr_policy: "poly"
power: 1
display: 10000
max_iter: 80000
snapshot: 5000
type: "SGD"
solver_mode: GPU
random_seed: 10086
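To narrow down which parameter blobs are actually moving, one can snapshot every blob, take a single solver step, and diff the results. This is a minimal pycaffe sketch, not part of the setup above; the path 'solver.prototxt' is illustrative:

import numpy as np
import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')  # hypothetical path to the solver shown above

# Snapshot every parameter blob before taking a step.
before = {name: [b.data.copy() for b in blobs]
          for name, blobs in solver.net.params.items()}

solver.step(1)  # one forward / backward / update pass

# A frozen layer should report 0.0 for all of its blobs.
for name, blobs in solver.net.params.items():
    changes = [float(np.abs(b.data - old).max()) for b, old in zip(blobs, before[name])]
    print(name, changes)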
And here is the log:
I0829 16:31:39.363433 14986 net.cpp:200] teacher_accuracy does not need backward computation.
I0829 16:31:39.363438 14986 net.cpp:200] tea_soft_loss does not need backward computation.
I0829 16:31:39.363442 14986 net.cpp:200] fc100_fc100_0_split does not need backward computation.
I0829 16:31:39.363446 14986 net.cpp:200] fc100 does not need backward computation.
I0829 16:31:39.363451 14986 net.cpp:200] pool_5 does not need backward computation.
I0829 16:31:39.363454 14986 net.cpp:200] res2_1a_1_relu does not need backward computation.
I0829 16:31:39.363458 14986 net.cpp:200] res2_1a_1_scale does not need backward computation.
I0829 16:31:39.363462 14986 net.cpp:200] res2_1a_1_bn does not need backward computation.
I0829 16:31:39.363466 14986 net.cpp:200] conv1 does not need backward computation.
I0829 16:31:39.363471 14986 net.cpp:200] label_data_1_split does not need backward computation.
I0829 16:31:39.363485 14986 net.cpp:200] data does not need backward computation.
I0829 16:31:39.363490 14986 net.cpp:242] This network produces output tea_soft_loss
I0829 16:31:39.363494 14986 net.cpp:242] This network produces output teacher_accuracy
I0829 16:31:39.363507 14986 net.cpp:255] Network initialization done.
I0829 16:31:39.363559 14986 solver.cpp:56] Solver scaffolding done.
I0829 16:31:39.363852 14986 caffe.cpp:248] Starting Optimization
I0829 16:31:39.363862 14986 solver.cpp:272] Solving WRN_22_12_to_WRN_18_4_v5_net
I0829 16:31:39.363865 14986 solver.cpp:273] Learning Rate Policy: poly
I0829 16:31:39.365981 14986 solver.cpp:330] Iteration 0, Testing net (#0)
I0829 16:31:39.366190 14986 blocking_queue.cpp:49] Waiting for data
I0829 16:31:39.742347 14986 solver.cpp:397] Test net output #0: tea_soft_loss = 85.9064
I0829 16:31:39.742437 14986 solver.cpp:397] Test net output #1: teacher_accuracy = 0.0113
I0829 16:31:39.749806 14986 solver.cpp:218] Iteration 0 (0 iter/s, 0.385886s/10000 iters), loss = 0
I0829 16:31:39.749862 14986 solver.cpp:237] Train net output #0: tea_soft_loss = 4.97483
I0829 16:31:39.749877 14986 solver.cpp:237] Train net output #1: teacher_accuracy = 0
I0829 16:31:39.749908 14986 sgd_solver.cpp:105] Iteration 0, lr = 0
I wonder what I am missing in Caffe's update process 🙁
Answer:
Found the reason.
The BatchNorm layer behaves differently in the TRAIN and TEST phases through its use_global_stats parameter: with use_global_stats: false (the training-phase default), the layer keeps updating its stored mean and variance during the forward pass, so these blobs change even though lr_mult: 0 blocks all gradient updates.
In my case, I should set use_global_stats: true during training as well.
Also, don't forget the Scale layer.
The modified layers should be:
layer {
  name: "res2_1a_1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "res2_1a_1_bn"
  batch_norm_param { use_global_stats: true }
}
layer {
  name: "res2_1a_1_scale"
  type: "Scale"
  bottom: "res2_1a_1_bn"
  top: "res2_1a_1_bn"
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  scale_param { bias_term: true }
}
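To verify the fix, one can check that the BatchNorm layer's stored statistics (Caffe keeps the mean, the variance, and a moving-average factor as its three parameter blobs) really stay put during training. A minimal pycaffe sketch under the same assumption about the solver path as above:

import numpy as np
import caffe

solver = caffe.SGDSolver('solver.prototxt')  # hypothetical path, as above

# BatchNorm stores mean, variance and the moving-average factor as param blobs.
bn_before = [b.data.copy() for b in solver.net.params['res2_1a_1_bn']]

solver.step(100)

# With use_global_stats: true all three blobs should be unchanged.
bn_after = solver.net.params['res2_1a_1_bn']
print([float(np.abs(b.data - old).max()) for b, old in zip(bn_after, bn_before)])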