When I train my Caffe model, the loss stays high and the accuracy never improves.

Here is the log I got while training my own model:

I0510 20:53:16.677439  3591 solver.cpp:337] Iteration 0, Testing net (#0)
I0510 20:57:20.822933  3591 solver.cpp:404]     Test net output #0: accuracy = 3.78788e-05
I0510 20:57:20.823001  3591 solver.cpp:404]     Test net output #1: loss = 9.27223 (* 1 = 9.27223 loss)
I0510 20:57:21.423084  3591 solver.cpp:228] Iteration 0, loss = 9.29181
I0510 20:57:21.423110  3591 solver.cpp:244]     Train net output #0: loss = 9.29181 (* 1 = 9.29181 loss)
I0510 20:57:21.423120  3591 sgd_solver.cpp:106] Iteration 0, lr = 0.001
I0510 21:06:57.498831  3591 solver.cpp:337] Iteration 1000, Testing net (#0)
I0510 21:10:59.477396  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00186553
I0510 21:10:59.477463  3591 solver.cpp:404]     Test net output #1: loss = 8.86572 (* 1 = 8.86572 loss)
I0510 21:20:35.828510  3591 solver.cpp:337] Iteration 2000, Testing net (#0)
I0510 21:24:42.838196  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00144886
I0510 21:24:42.838245  3591 solver.cpp:404]     Test net output #1: loss = 8.83859 (* 1 = 8.83859 loss)
I0510 21:24:43.412120  3591 solver.cpp:228] Iteration 2000, loss = 8.81461
I0510 21:24:43.412145  3591 solver.cpp:244]     Train net output #0: loss = 8.81461 (* 1 = 8.81461 loss)
I0510 21:24:43.412150  3591 sgd_solver.cpp:106] Iteration 2000, lr = 0.001
I0510 21:38:50.990823  3591 solver.cpp:337] Iteration 3000, Testing net (#0)
I0510 21:42:52.918418  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00140152
I0510 21:42:52.918493  3591 solver.cpp:404]     Test net output #1: loss = 8.81789 (* 1 = 8.81789 loss)
I0510 22:00:09.519151  3591 solver.cpp:337] Iteration 4000, Testing net (#0)
I0510 22:09:13.918016  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00149621
I0510 22:09:13.918102  3591 solver.cpp:404]     Test net output #1: loss = 8.80909 (* 1 = 8.80909 loss)
I0510 22:09:15.127683  3591 solver.cpp:228] Iteration 4000, loss = 8.8597
I0510 22:09:15.127722  3591 solver.cpp:244]     Train net output #0: loss = 8.8597 (* 1 = 8.8597 loss)
I0510 22:09:15.127729  3591 sgd_solver.cpp:106] Iteration 4000, lr = 0.001
I0510 22:28:39.320019  3591 solver.cpp:337] Iteration 5000, Testing net (#0)
I0510 22:37:43.847064  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00118371
I0510 22:37:43.847173  3591 solver.cpp:404]     Test net output #1: loss = 8.80527 (* 1 = 8.80527 loss)
I0510 23:58:17.120088  3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_10000.caffemodel
I0510 23:58:17.238307  3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_10000.solverstate
I0510 23:58:17.491825  3591 solver.cpp:337] Iteration 10000, Testing net (#0)
I0511 00:02:19.412715  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00186553
I0511 00:02:19.412762  3591 solver.cpp:404]     Test net output #1: loss = 8.79114 (* 1 = 8.79114 loss)
I0511 00:02:19.986547  3591 solver.cpp:228] Iteration 10000, loss = 8.83457
I0511 00:02:19.986570  3591 solver.cpp:244]     Train net output #0: loss = 8.83457 (* 1 = 8.83457 loss)
I0511 00:02:19.986578  3591 sgd_solver.cpp:106] Iteration 10000, lr = 0.001
I0511 00:11:55.546052  3591 solver.cpp:337] Iteration 11000, Testing net (#0)
I0511 00:15:57.490486  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00164773
I0511 00:15:57.490532  3591 solver.cpp:404]     Test net output #1: loss = 8.78702 (* 1 = 8.78702 loss)
I0511 00:25:33.666496  3591 solver.cpp:337] Iteration 12000, Testing net (#0)
I0511 00:29:35.603062  3591 solver.cpp:404]     Test net output #0: accuracy = 0.0016572
I0511 00:29:35.603109  3591 solver.cpp:404]     Test net output #1: loss = 8.7848 (* 1 = 8.7848 loss)
I0511 00:29:36.177078  3591 solver.cpp:228] Iteration 12000, loss = 9.00561
I0511 00:29:36.177105  3591 solver.cpp:244]     Train net output #0: loss = 9.00561 (* 1 = 9.00561 loss)
I0511 00:29:36.177114  3591 sgd_solver.cpp:106] Iteration 12000, lr = 0.001
I0511 00:39:11.729369  3591 solver.cpp:337] Iteration 13000, Testing net (#0)
I0511 00:43:13.678067  3591 solver.cpp:404]     Test net output #0: accuracy = 0.001875
I0511 00:43:13.678113  3591 solver.cpp:404]     Test net output #1: loss = 8.78359 (* 1 = 8.78359 loss)
I0511 00:52:49.851985  3591 solver.cpp:337] Iteration 14000, Testing net (#0)
I0511 00:56:51.767343  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00154356
I0511 00:56:51.767390  3591 solver.cpp:404]     Test net output #1: loss = 8.77998 (* 1 = 8.77998 loss)
I0511 00:56:52.341564  3591 solver.cpp:228] Iteration 14000, loss = 8.83385
I0511 00:56:52.341591  3591 solver.cpp:244]     Train net output #0: loss = 8.83385 (* 1 = 8.83385 loss)
I0511 00:56:52.341598  3591 sgd_solver.cpp:106] Iteration 14000, lr = 0.001
I0511 02:14:38.224290  3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_20000.caffemodel
I0511 02:14:38.735008  3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_20000.solverstate
I0511 02:14:38.805809  3591 solver.cpp:337] Iteration 20000, Testing net (#0)
I0511 02:18:40.681993  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00179924
I0511 02:18:40.682086  3591 solver.cpp:404]     Test net output #1: loss = 8.78129 (* 1 = 8.78129 loss)
I0511 02:18:41.255969  3591 solver.cpp:228] Iteration 20000, loss = 8.82502
I0511 02:18:41.255995  3591 solver.cpp:244]     Train net output #0: loss = 8.82502 (* 1 = 8.82502 loss)
I0511 02:18:41.256001  3591 sgd_solver.cpp:106] Iteration 20000, lr = 0.001
I0511 04:30:58.924096  3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_30000.caffemodel
I0511 04:31:00.742739  3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_30000.solverstate
I0511 04:31:01.151980  3591 solver.cpp:337] Iteration 30000, Testing net (#0)
I0511 04:35:03.075263  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00186553
I0511 04:35:03.075307  3591 solver.cpp:404]     Test net output #1: loss = 8.77867 (* 1 = 8.77867 loss)
I0511 04:35:03.649479  3591 solver.cpp:228] Iteration 30000, loss = 8.82915
I0511 04:35:03.649507  3591 solver.cpp:244]     Train net output #0: loss = 8.82915 (* 1 = 8.82915 loss)
I0511 04:35:03.649513  3591 sgd_solver.cpp:106] Iteration 30000, lr = 0.001
I0511 07:55:36.848265  3591 solver.cpp:337] Iteration 45000, Testing net (#0)
I0511 07:59:38.834043  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00179924
I0511 07:59:38.834095  3591 solver.cpp:404]     Test net output #1: loss = 8.77432 (* 1 = 8.77432 loss)
I0511 09:03:48.141854  3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_50000.caffemodel
I0511 09:03:49.736464  3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_50000.solverstate
I0511 09:03:49.797582  3591 solver.cpp:337] Iteration 50000, Testing net (#0)
I0511 09:07:51.777150  3591 solver.cpp:404]     Test net output #0: accuracy = 0.001875
I0511 09:07:51.777207  3591 solver.cpp:404]     Test net output #1: loss = 8.77058 (* 1 = 8.77058 loss)
I0511 09:07:52.351323  3591 solver.cpp:228] Iteration 50000, loss = 9.11435
I0511 09:07:52.351351  3591 solver.cpp:244]     Train net output #0: loss = 9.11435 (* 1 = 9.11435 loss)
I0511 09:07:52.351357  3591 sgd_solver.cpp:106] Iteration 50000, lr = 0.001
I0511 09:17:28.188742  3591 solver.cpp:337] Iteration 51000, Testing net (#0)
I0511 09:21:30.200623  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00186553
I0511 09:21:30.200716  3591 solver.cpp:404]     Test net output #1: loss = 8.77026 (* 1 = 8.77026 loss)
I0511 09:31:06.596501  3591 solver.cpp:337] Iteration 52000, Testing net (#0)
I0511 09:35:08.580215  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00182765
I0511 09:35:08.580313  3591 solver.cpp:404]     Test net output #1: loss = 8.76917 (* 1 = 8.76917 loss)
I0511 09:35:09.154428  3591 solver.cpp:228] Iteration 52000, loss = 8.89758
I0511 09:35:09.154453  3591 solver.cpp:244]     Train net output #0: loss = 8.89758 (* 1 = 8.89758 loss)
I0511 09:35:09.154459  3591 sgd_solver.cpp:106] Iteration 52000, lr = 0.001
I0511 09:44:44.906309  3591 solver.cpp:337] Iteration 53000, Testing net (#0)
I0511 09:48:46.866353  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00185606
I0511 09:48:46.866430  3591 solver.cpp:404]     Test net output #1: loss = 8.7708 (* 1 = 8.7708 loss)
I0511 09:58:23.097244  3591 solver.cpp:337] Iteration 54000, Testing net (#0)
I0511 10:02:25.056555  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00192235
I0511 10:02:25.056605  3591 solver.cpp:404]     Test net output #1: loss = 8.76884 (* 1 = 8.76884 loss)
I0511 10:02:25.630312  3591 solver.cpp:228] Iteration 54000, loss = 8.90552
I0511 10:02:25.630337  3591 solver.cpp:244]     Train net output #0: loss = 8.90552 (* 1 = 8.90552 loss)
I0511 10:02:25.630342  3591 sgd_solver.cpp:106] Iteration 54000, lr = 0.001
I0511 14:44:51.563555  3591 solver.cpp:337] Iteration 75000, Testing net (#0)
I0511 14:48:53.573640  3591 solver.cpp:404]     Test net output #0: accuracy = 0.0016572
I0511 14:48:53.573724  3591 solver.cpp:404]     Test net output #1: loss = 8.76967 (* 1 = 8.76967 loss)
I0511 14:58:30.080453  3591 solver.cpp:337] Iteration 76000, Testing net (#0)
I0511 15:02:32.076011  3591 solver.cpp:404]     Test net output #0: accuracy = 0.001875
I0511 15:02:32.076077  3591 solver.cpp:404]     Test net output #1: loss = 8.7695 (* 1 = 8.7695 loss)
I0511 15:02:32.650342  3591 solver.cpp:228] Iteration 76000, loss = 9.0084
I0511 15:02:32.650367  3591 solver.cpp:244]     Train net output #0: loss = 9.0084 (* 1 = 9.0084 loss)
I0511 15:02:32.650373  3591 sgd_solver.cpp:106] Iteration 76000, lr = 0.001
I0511 15:12:08.597450  3591 solver.cpp:337] Iteration 77000, Testing net (#0)
I0511 15:16:10.636613  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00181818
I0511 15:16:10.636693  3591 solver.cpp:404]     Test net output #1: loss = 8.76889 (* 1 = 8.76889 loss)
I0511 15:25:47.167667  3591 solver.cpp:337] Iteration 78000, Testing net (#0)
I0511 15:29:49.204596  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00185606
I0511 15:29:49.204649  3591 solver.cpp:404]     Test net output #1: loss = 8.77059 (* 1 = 8.77059 loss)
I0511 15:29:49.779094  3591 solver.cpp:228] Iteration 78000, loss = 8.73139
I0511 15:29:49.779119  3591 solver.cpp:244]     Train net output #0: loss = 8.73139 (* 1 = 8.73139 loss)
I0511 15:29:49.779124  3591 sgd_solver.cpp:106] Iteration 78000, lr = 0.001
I0511 15:39:25.730358  3591 solver.cpp:337] Iteration 79000, Testing net (#0)
I0511 15:43:27.756417  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00192235
I0511 15:43:27.756485  3591 solver.cpp:404]     Test net output #1: loss = 8.76846 (* 1 = 8.76846 loss)
I0511 15:53:04.419961  3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_80000.caffemodel
I0511 15:53:06.138357  3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_80000.solverstate
I0511 15:53:06.519551  3591 solver.cpp:337] Iteration 80000, Testing net (#0)
I0511 15:57:08.719681  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00164773
I0511 15:57:08.719737  3591 solver.cpp:404]     Test net output #1: loss = 8.77126 (* 1 = 8.77126 loss)
I0511 15:57:09.294163  3591 solver.cpp:228] Iteration 80000, loss = 8.56576
I0511 15:57:09.294188  3591 solver.cpp:244]     Train net output #0: loss = 8.56576 (* 1 = 8.56576 loss)
I0511 15:57:09.294193  3591 sgd_solver.cpp:106] Iteration 80000, lr = 0.001
I0511 17:01:19.190099  3591 solver.cpp:337] Iteration 85000, Testing net (#0)
I0511 17:05:21.148668  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00185606
I0511 17:05:21.148733  3591 solver.cpp:404]     Test net output #1: loss = 8.77196 (* 1 = 8.77196 loss)
I0511 17:14:57.670343  3591 solver.cpp:337] Iteration 86000, Testing net (#0)
I0511 17:18:59.659850  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00181818
I0511 17:18:59.659907  3591 solver.cpp:404]     Test net output #1: loss = 8.77126 (* 1 = 8.77126 loss)
I0511 17:19:00.234335  3591 solver.cpp:228] Iteration 86000, loss = 8.72875
I0511 17:19:00.234359  3591 solver.cpp:244]     Train net output #0: loss = 8.72875 (* 1 = 8.72875 loss)
I0511 17:19:00.234364  3591 sgd_solver.cpp:106] Iteration 86000, lr = 0.001
I0511 17:28:36.196920  3591 solver.cpp:337] Iteration 87000, Testing net (#0)
I0511 17:32:38.181174  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00181818
I0511 17:32:38.181231  3591 solver.cpp:404]     Test net output #1: loss = 8.771 (* 1 = 8.771 loss)
I0511 17:42:14.658293  3591 solver.cpp:337] Iteration 88000, Testing net (#0)
I0511 17:46:16.614358  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00188447
I0511 17:46:16.614415  3591 solver.cpp:404]     Test net output #1: loss = 8.76964 (* 1 = 8.76964 loss)
I0511 17:46:17.188212  3591 solver.cpp:228] Iteration 88000, loss = 8.80409
I0511 17:46:17.188233  3591 solver.cpp:244]     Train net output #0: loss = 8.80409 (* 1 = 8.80409 loss)
I0511 17:46:17.188240  3591 sgd_solver.cpp:106] Iteration 88000, lr = 0.001
I0511 17:55:53.358322  3591 solver.cpp:337] Iteration 89000, Testing net (#0)
I0511 17:59:55.305763  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00186553
I0511 17:59:55.305868  3591 solver.cpp:404]     Test net output #1: loss = 8.76909 (* 1 = 8.76909 loss)
I0511 18:09:31.658655  3591 solver.cpp:454] Snapshotting to binary proto file /home/wang/caffe-master/examples/NN2_iter_90000.caffemodel
I0511 18:09:33.138741  3591 sgd_solver.cpp:273] Snapshotting solver state to binary proto file /home/wang/caffe-master/examples/NN2_iter_90000.solverstate
I0511 18:09:33.691995  3591 solver.cpp:337] Iteration 90000, Testing net (#0)
I0511 18:13:35.626065  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00168561
I0511 18:13:35.626148  3591 solver.cpp:404]     Test net output #1: loss = 8.76973 (* 1 = 8.76973 loss)
I0511 18:13:36.200448  3591 solver.cpp:228] Iteration 90000, loss = 8.97326
I0511 18:13:36.200469  3591 solver.cpp:244]     Train net output #0: loss = 8.97326 (* 1 = 8.97326 loss)
I0511 18:13:36.200474  3591 sgd_solver.cpp:106] Iteration 90000, lr = 0.001
I0511 19:31:23.715662  3591 solver.cpp:337] Iteration 96000, Testing net (#0)
I0511 19:35:25.677780  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00188447
I0511 19:35:25.677836  3591 solver.cpp:404]     Test net output #1: loss = 8.7695 (* 1 = 8.7695 loss)
I0511 19:35:26.251850  3591 solver.cpp:228] Iteration 96000, loss = 8.74232
I0511 19:35:26.251875  3591 solver.cpp:244]     Train net output #0: loss = 8.74232 (* 1 = 8.74232 loss)
I0511 19:35:26.251880  3591 sgd_solver.cpp:106] Iteration 96000, lr = 0.001
I0511 19:45:02.057610  3591 solver.cpp:337] Iteration 97000, Testing net (#0)
I0511 19:49:04.029269  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00188447
I0511 19:49:04.029357  3591 solver.cpp:404]     Test net output #1: loss = 8.77655 (* 1 = 8.77655 loss)
I0511 19:58:40.265120  3591 solver.cpp:337] Iteration 98000, Testing net (#0)
I0511 20:02:42.182787  3591 solver.cpp:404]     Test net output #0: accuracy = 0.00183712
I0511 20:02:42.182859  3591 solver.cpp:404]     Test net output #1: loss = 8.77069 (* 1 = 8.77069 loss)
I0511 20:02:42.756922  3591 solver.cpp:228] Iteration 98000, loss = 8.61745
I0511 20:02:42.756944  3591 solver.cpp:244]     Train net output #0: loss = 8.61745 (* 1 = 8.61745 loss)

Because of the character limit I had to delete some lines from the log, but that does not matter: as you can see, there is no difference between "Iteration 98000" and "Iteration 0". I am very confused by this.

Here is my model architecture:

name: "NN2"layer {  name: "data"  type: "Data"  top: "data"  top: "label"  include {    phase: TRAIN  }  transform_param {    mirror: true    mean_file :"/home/jiayi-wei/caffe/examples/NN2/image_train_mean.binaryproto"    data_param {    source: "/home/jiayi-wei/caffe/examples/NN2/img_train_lmdb"    batch_size: 30    backend: LMDB  }}layer {  name: "data"  type: "Data"  top: "data"  top: "label"  include {    phase: TEST  }  transform_param {    mirror: false    mean_file :"/home/jiayi-wei/caffe/examples/NN2/image_train_mean.binaryproto"  data_param {    source: "/home/jiayi-wei/caffe/examples/NN2/img_val_lmdb"    batch_size: 11    backend: LMDB  }}#第一层layer {  name: "conv11"  type: "Convolution"  bottom: "data"  top: "conv11"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 64    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu11"   type: "ReLU"  bottom: "conv11"  top: "conv11"}layer {  name: "conv12"  type: "Convolution"  bottom: "conv11"  top: "conv12"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 128    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu12"   type: "ReLU"  bottom: "conv12"  top: "conv12"}layer {  name: "pool1"  type: "Pooling"  bottom: "conv12"  top: "pool1"  pooling_param {    pool: MAX    kernel_size: 2    stride: 2  }}#第二层layer {  name: "conv21"  type: "Convolution"  bottom: "pool1"  top: "conv21"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 64    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu21"   type: "ReLU"  bottom: "conv21"  top: "conv21"}layer {  name: "conv22"  type: "Convolution"  bottom: "conv21"  top: "conv22"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 128    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu22"   type: "ReLU"  bottom: "conv22"  top: "conv22"}layer {  name: "pool2"  type: "Pooling"  bottom: "conv22"  top: "pool2"  pooling_param {    pool: MAX    kernel_size: 2    stride: 2  }}#第三层layer {  name: "conv31"  type: "Convolution"  bottom: "pool2"  top: "conv31"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 128    pad:1    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu31"   type: "ReLU"  bottom: "conv31"  top: "conv31"}layer {  name: "conv32"  type: "Convolution"  bottom: "conv31"  top: "conv32"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 128    pad:1    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu32"   type: "ReLU"  bottom: "conv32"  top: "conv32"}layer {  name: "pool3"  
type: "Pooling"  bottom: "conv32"  top: "pool3"  pooling_param {    pool: MAX    pad:1    kernel_size: 2    stride: 2  }}#第四层layer {  name: "conv41"  type: "Convolution"  bottom: "pool3"  top: "conv41"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 256    pad:1    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu41"   type: "ReLU"  bottom: "conv41"  top: "conv41"}layer {  name: "conv42"  type: "Convolution"  bottom: "conv41"  top: "conv42"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 256    pad:1    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu42"   type: "ReLU"  bottom: "conv42"  top: "conv42"}layer {  name: "conv43"  type: "Convolution"  bottom: "conv42"  top: "conv43"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 256    pad:1    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu43"   type: "ReLU"  bottom: "conv43"  top: "conv43"}layer {  name: "pool4"  type: "Pooling"  bottom: "conv43"  top: "pool4"  pooling_param {    pool: MAX    kernel_size: 2    stride: 2  }}#第五层layer {  name: "conv51"  type: "Convolution"  bottom: "pool4"  top: "conv51"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 256    pad:1    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu51"   type: "ReLU"  bottom: "conv51"  top: "conv51"}layer {  name: "conv52"  type: "Convolution"  bottom: "conv51"  top: "conv52"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 256    pad:1    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "relu52"   type: "ReLU"  bottom: "conv52"  top: "conv52"}layer {  name: "conv53"  type: "Convolution"  bottom: "conv52"  top: "conv53"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  convolution_param {    num_output: 256    pad:1    kernel_size: 3    stride: 1    weight_filler {      type: "gaussian"      std: 0.01    }    bias_filler {      type: "constant"      value: 0    }  }}layer {  name: "pool5"  type: "Pooling"  bottom: "conv53"  top: "pool5"  pooling_param {    pool: AVE    pad:1    kernel_size: 2    stride: 2  }}#Dropout和全连接层layer {  name: "dropout"  type: "Dropout"  bottom: "pool5"  top: "pool5"  dropout_param {    dropout_ratio: 0.5  }}layer {  name: "fc6"  type: "InnerProduct"  bottom: "pool5"  top: "fc6"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    decay_mult: 0  }  inner_product_param {    num_output:1000    weight_filler {      type: "gaussian"      std: 0.005    }    bias_filler {      type: "constant"      value: 1    }  }}layer {  name: "fc7"  type: "InnerProduct"  bottom: "fc6"  top: "fc7"  param {    lr_mult: 1    decay_mult: 1  }  param {    lr_mult: 2    
decay_mult: 0  }  inner_product_param {    num_output:10575    weight_filler {      type: "gaussian"      std: 0.005    }    bias_filler {      type: "constant"      value: 1    }  }}layer {  name: "accuracy"  type: "Accuracy"  bottom: "fc7"  bottom: "label"  top: "accuracy"  include {    phase: TEST  }}layer {  name: "SoftMax"  type: "SoftmaxWithLoss"  bottom: "fc7"  bottom: "label"  top: "SoftMax"}

Here is my solver configuration. I have already changed base_lr to 0.001:

net: "train_val.prototxt"test_iter: 10000test_interval: 1000base_lr: 0.001lr_policy: "step"gamma: 0.1stepsize: 100000display: 20max_iter: 450000momentum: 0.9weight_decay: 0.0005snapshot: 10000snapshot_prefix: "/home/jiayi-wei/caffe/examples/NN2"solver_mode: GPU

I have tried changing some of the parameters, and I have also tried removing one "conv" layer from the blocks that contain three "conv" layers. However, the result always stays the same as in the log above.

Please tell me how I can solve this problem. Thanks!


Answer:

From your log, it seems that your model keeps predicting one constant label during training; in other words, training is not converging. Two details in the log support this: the initial test loss, 9.27223, is almost exactly ln(10575) ≈ 9.27, the loss of a uniform prediction over your 10575 classes, and after 98000 iterations it has only drifted down to about 8.77 while the accuracy stays essentially at zero. A quick sketch for confirming the collapse directly follows this paragraph; after that, I suggest the following checks.
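To make the "constant label" diagnosis concrete, here is a minimal pycaffe sketch. It assumes pycaffe is importable and that one of the snapshots from your log (e.g. NN2_iter_90000.caffemodel) sits next to train_val.prototxt; the blob name "fc7" comes from your architecture:

import numpy as np
import caffe

caffe.set_mode_gpu()
# Load the net in TEST phase; file names are assumptions based on the log above.
net = caffe.Net('train_val.prototxt', 'NN2_iter_90000.caffemodel', caffe.TEST)

counts = np.zeros(10575, dtype=np.int64)   # one bin per class
for _ in range(50):                        # 50 test batches of 11 images each
    net.forward()
    # 'fc7' holds the pre-softmax class scores in this architecture
    preds = net.blobs['fc7'].data.argmax(axis=1)
    for p in preds:
        counts[p] += 1

# If training has collapsed to a constant prediction, one or two bins dominate.
top5 = counts.argsort()[::-1][:5]
print(list(zip(top5.tolist(), counts[top5].tolist())))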

  1. Check the labels when you convert the training/validation data to lmdb (a sketch for doing this follows the list). Also, in your CNN architecture the Dropout layer is better placed after the fully connected layer "fc6", not after the pooling layer "pool5".
  2. I don't know how you sample your training data. In principle, if you use only the softmax cost (multinomial cross-entropy loss), you should shuffle the training data when preparing the training/validation lmdb, and use a suitably large batch size during training, e.g. 256 (see the second sketch below).
  3. Your learning rate (base_lr) may be too large; you could lower it further from 0.001 to 0.0001. However, I notice that the CASIA WebFace baseline (http://arxiv.org/abs/1411.7923) used a learning rate of 0.01, and your input scale, activation function, and model depth and width are similar to its, so the learning rate is unlikely to be the cause. (You should still check whether the weight initialization method has a large effect; the third sketch below looks at this.)
  4. Try a smaller convolution kernel size. Sometimes this helps reduce the information loss caused by misalignment between the kernels and their corresponding input feature maps.
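For check 1, a minimal sketch, assuming the lmdb Python package and the caffe protobuf bindings are installed, that reads the first few records of the training lmdb so you can eyeball the stored labels (they should span 0..10574 and match your original file list):

import lmdb
from caffe.proto import caffe_pb2

# Path taken from the data layer in the prototxt above.
env = lmdb.open('/home/jiayi-wei/caffe/examples/NN2/img_train_lmdb', readonly=True)
with env.begin() as txn:
    for i, (key, value) in enumerate(txn.cursor()):
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)
        print(key, 'label =', datum.label,
              'shape =', (datum.channels, datum.height, datum.width))
        if i >= 9:   # first ten records are enough for a sanity check
            break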
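For check 2, shuffling is easiest when the lmdb is created: Caffe's convert_imageset tool accepts a --shuffle flag. If you build the list file yourself, a sketch like the following (train.txt is a hypothetical "path label" listing, one image per line) randomizes the order first:

import random

with open('train.txt') as f:      # hypothetical listing consumed by convert_imageset
    lines = f.readlines()
random.shuffle(lines)             # mix classes so consecutive batches are varied
with open('train_shuffled.txt', 'w') as f:
    f.writelines(lines)
# Then rebuild the lmdb, e.g.:
#   convert_imageset --shuffle --backend=lmdb IMG_ROOT/ train_shuffled.txt img_train_lmdb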
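For the parenthetical in check 3, a hedged sketch: with gaussian std = 0.01 fillers in a net this deep, activations can shrink toward zero layer by layer, which stalls learning just as effectively as a wrong learning rate. One forward pass over a freshly initialized net (same path assumptions as above) shows whether that is happening:

import caffe

net = caffe.Net('train_val.prototxt', caffe.TRAIN)   # freshly initialized weights
net.forward()                                        # one training batch
for name, blob in net.blobs.items():
    # A std collapsing toward 0 in the later conv blocks suggests the
    # initialization, not the learning rate, is the bottleneck.
    print('%-8s mean=%+.6f std=%.6f' % (name, blob.data.mean(), blob.data.std()))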

By the way, you are training a classification task with 10575 classes but only about 40 training samples per class, so to some extent the training data is insufficient. Therefore, as in the baseline work, it is better to add a contrastive cost alongside the softmax cost to strengthen the model's ability to distinguish identical and different samples (a hedged sketch of the wiring follows).
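As a loudly-hypothetical sketch of that wiring: Caffe ships a ContrastiveLoss layer (used in its bundled siamese example) that takes two feature blobs and a same/different indicator. The NetSpec snippet below only generates the loss fragment; the DummyData blobs are placeholders for the two weight-sharing branches and the pair label that you would have to add to the real train_val.prototxt yourself:

import caffe
from caffe import layers as L

n = caffe.NetSpec()
# Placeholders: in the real net these would be the fc6 outputs of two
# weight-sharing branches plus a 0/1 same-identity indicator from the data.
n.feat = L.DummyData(shape=[dict(dim=[30, 1000])])
n.feat_p = L.DummyData(shape=[dict(dim=[30, 1000])])
n.sim = L.DummyData(shape=[dict(dim=[30, 1])])
n.contrastive = L.ContrastiveLoss(n.feat, n.feat_p, n.sim,
                                  contrastive_loss_param=dict(margin=1.0))
print(n.to_proto())   # emits the prototxt fragment for these layers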

Reference: Sun Y, Chen Y, Wang X, et al. Deep learning face representation by joint identification-verification. Advances in Neural Information Processing Systems. 2014: 1988-1996.
