How to prevent weights from being updated in Caffe

Some layers in my network are loaded from a pretrained model. I want to fix their parameters and train only the other layers.

Following the instructions on that page, I set lr_mult and decay_mult to 0, added propagate_down: false, and even set base_lr: 0 and weight_decay: 0 in the solver. However, the test loss (each test runs over all test images) still changes slightly every iteration, and after several thousand iterations the accuracy drops from the 80% obtained when loading the pretrained model down to 0.

Here is a two-layer example in which I only initialize the weights and set the parameters above to 0. I expect every layer to be frozen in this example, but once training starts the loss keeps changing…

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.017
    mirror: true
    crop_size: 32
    mean_value: 115
    mean_value: 126
    mean_value: 130
    color: true
    contrast: true
    brightness: true
  }
  image_data_param {
    source: "/data/zhuhao5/data/cifar100/cifar100_train_replicate.txt"
    batch_size: 64
    shuffle: true
    #pair_size: 3
  }
}
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.017
    mirror: false
    crop_size: 32
    mean_value: 115
    mean_value: 126
    mean_value: 130
  }
  image_data_param {
    source: "/data/zhuhao5/data/cifar100/cifar100_test.txt"
    batch_size: 100
    shuffle: false
  }
}
#-------------- TEACHER --------------------
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  propagate_down: false
  top: "conv1"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 16
    bias_term: false
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "msra"
    }
  }
}
layer {
  name: "res2_1a_1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  propagate_down: false
  top: "res2_1a_1_bn"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "res2_1a_1_scale"
  type: "Scale"
  bottom: "res2_1a_1_bn"
  propagate_down: false
  top: "res2_1a_1_bn"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
layer {
  name: "res2_1a_1_relu"
  type: "ReLU"
  bottom: "res2_1a_1_bn"
  propagate_down: false
  top: "res2_1a_1_bn"
}
layer {
  name: "pool_5"
  type: "Pooling"
  bottom: "res2_1a_1_bn"
  propagate_down: false
  top: "pool_5"
  pooling_param {
    pool: AVE
    global_pooling: true
  }
}
layer {
  name: "fc100"
  type: "InnerProduct"
  bottom: "pool_5"
  propagate_down: false
  top: "fc100"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  inner_product_param {
    num_output: 100
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
#---------------------------------
layer {
  name: "tea_soft_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc100"
  bottom: "label"
  propagate_down: false
  propagate_down: false
  top: "tea_soft_loss"
  loss_weight: 0
}
##----------- ACCURACY----------------
layer {
  name: "teacher_accuracy"
  type: "Accuracy"
  bottom: "fc100"
  bottom: "label"
  top: "teacher_accuracy"
  accuracy_param {
    top_k: 1
  }
}

Here is the solver configuration:

test_iter: 100
test_interval: 10
base_lr: 0
momentum: 0
weight_decay: 0
lr_policy: "poly"
power: 1
display: 10000
max_iter: 80000
snapshot: 5000
type: "SGD"
solver_mode: GPU
random_seed: 10086

And the log:

I0829 16:31:39.363433 14986 net.cpp:200] teacher_accuracy does not need backward computation.
I0829 16:31:39.363438 14986 net.cpp:200] tea_soft_loss does not need backward computation.
I0829 16:31:39.363442 14986 net.cpp:200] fc100_fc100_0_split does not need backward computation.
I0829 16:31:39.363446 14986 net.cpp:200] fc100 does not need backward computation.
I0829 16:31:39.363451 14986 net.cpp:200] pool_5 does not need backward computation.
I0829 16:31:39.363454 14986 net.cpp:200] res2_1a_1_relu does not need backward computation.
I0829 16:31:39.363458 14986 net.cpp:200] res2_1a_1_scale does not need backward computation.
I0829 16:31:39.363462 14986 net.cpp:200] res2_1a_1_bn does not need backward computation.
I0829 16:31:39.363466 14986 net.cpp:200] conv1 does not need backward computation.
I0829 16:31:39.363471 14986 net.cpp:200] label_data_1_split does not need backward computation.
I0829 16:31:39.363485 14986 net.cpp:200] data does not need backward computation.
I0829 16:31:39.363490 14986 net.cpp:242] This network produces output tea_soft_loss
I0829 16:31:39.363494 14986 net.cpp:242] This network produces output teacher_accuracy
I0829 16:31:39.363507 14986 net.cpp:255] Network initialization done.
I0829 16:31:39.363559 14986 solver.cpp:56] Solver scaffolding done.
I0829 16:31:39.363852 14986 caffe.cpp:248] Starting Optimization
I0829 16:31:39.363862 14986 solver.cpp:272] Solving WRN_22_12_to_WRN_18_4_v5_net
I0829 16:31:39.363865 14986 solver.cpp:273] Learning Rate Policy: poly
I0829 16:31:39.365981 14986 solver.cpp:330] Iteration 0, Testing net (#0)
I0829 16:31:39.366190 14986 blocking_queue.cpp:49] Waiting for data
I0829 16:31:39.742347 14986 solver.cpp:397]     Test net output #0: tea_soft_loss = 85.9064
I0829 16:31:39.742437 14986 solver.cpp:397]     Test net output #1: teacher_accuracy = 0.0113
I0829 16:31:39.749806 14986 solver.cpp:218] Iteration 0 (0 iter/s, 0.385886s/10000 iters), loss = 0
I0829 16:31:39.749862 14986 solver.cpp:237]     Train net output #0: tea_soft_loss = 4.97483
I0829 16:31:39.749877 14986 solver.cpp:237]     Train net output #1: teacher_accuracy = 0
I0829 16:31:39.749908 14986 sgd_solver.cpp:105] Iteration 0, lr = 0

I'd like to know what I am missing in Caffe's update procedure 🙁


Answer:

Found the reason.

The BatchNorm layer behaves differently in the TRAIN and TEST phases, controlled by its use_global_stats parameter. With use_global_stats: false (the training-phase default), the layer keeps re-estimating its running mean/variance blobs during every forward pass, independently of lr_mult, so the network's outputs drift even though no gradient step is applied.

In my case, I should also set use_global_stats: true during training.

Also, don't forget the Scale layer.

The modified layers should be:

layer {
  name: "res2_1a_1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "res2_1a_1_bn"
  batch_norm_param {
    use_global_stats: true
  }
}
layer {
  name: "res2_1a_1_scale"
  type: "Scale"
  bottom: "res2_1a_1_bn"
  top: "res2_1a_1_bn"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    bias_term: true
  }
}
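To confirm that the frozen layers really stay fixed, one way is to snapshot the parameter blobs, run a single solver step in pycaffe, and compare. This is only a minimal sketch, assuming a working pycaffe build; the solver.prototxt and pretrained.caffemodel paths are placeholders for your own files.

import numpy as np
import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')      # placeholder solver path
solver.net.copy_from('pretrained.caffemodel')    # placeholder weights path

# Copy every parameter blob (weights, biases, BN running stats) before the update.
before = {name: [p.data.copy() for p in params]
          for name, params in solver.net.params.items()}

solver.step(1)  # one forward/backward/update pass

# Largest absolute change per layer; frozen layers should report 0.
for name, params in solver.net.params.items():
    diff = max(np.abs(p.data - old).max()
               for p, old in zip(params, before[name]))
    print(f'{name}: max param change = {diff:.6g}')

Before the fix, this kind of check shows the BatchNorm blobs changing even with lr = 0; with use_global_stats: true they stay constant.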
