Caffe regression loss not converging

I am doing regression in Caffe. My dataset consists of 400 RGB images of size 128×128, and the labels are floating-point numbers in the range (-1, 1). The only transformation I apply to the dataset is normalization (dividing each RGB pixel value by 255). But the loss shows no sign of converging at all.
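
For reference, my preprocessing amounts to roughly the following (a simplified sketch, not my exact script; the arrays here are placeholders, and Caffe's HDF5Data layer reads float32 data in N x C x H x W order):

import h5py
import numpy as np

# Placeholders standing in for the real dataset: 400 RGB images of size
# 128x128 and one float label per image in (-1, 1).
images = np.random.randint(0, 256, size=(400, 128, 128, 3), dtype=np.uint8)
labels = np.random.uniform(-1.0, 1.0, size=(400, 1)).astype(np.float32)

# Normalize to [0, 1] and reorder to N x C x H x W, the blob layout Caffe expects.
data = images.astype(np.float32).transpose(0, 3, 1, 2) / 255.0

with h5py.File("train.h5", "w") as f:
    f.create_dataset("data", data=data)
    f.create_dataset("label", data=labels)

# The HDF5Data layer's "source" is a text file listing HDF5 file paths.
with open("train_hdf5file.txt", "w") as f:
    f.write("train.h5\n")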

What could be causing this? Can anyone give me some advice?

Here is my training log:

Training..
Using solver: solver_hdf5.prototxt
I0929 21:50:21.657784 13779 caffe.cpp:112] Use CPU.
I0929 21:50:21.658033 13779 caffe.cpp:174] Starting Optimization
I0929 21:50:21.658107 13779 solver.cpp:34] Initializing solver from parameters:
test_iter: 100
test_interval: 500
base_lr: 0.0001
display: 25
max_iter: 10000
lr_policy: "inv"
gamma: 0.0001
power: 0.75
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "lenet_hdf5"
solver_mode: CPU
net: "train_test_hdf5.prototxt"
I0929 21:50:21.658143 13779 solver.cpp:75] Creating training net from net file: train_test_hdf5.prototxt
I0929 21:50:21.658567 13779 net.cpp:334] The NetState phase (0) differed from the phase (1) specified by a rule in layer data
I0929 21:50:21.658709 13779 net.cpp:46] Initializing net from parameters:
name: "MSE regression"
state {
  phase: TRAIN
}
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "train_hdf5file.txt"
    batch_size: 64
    shuffle: true
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "dropout1"
  type: "Dropout"
  bottom: "pool1"
  top: "pool1"
  dropout_param {
    dropout_ratio: 0.1
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "fc1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "dropout2"
  type: "Dropout"
  bottom: "fc1"
  top: "fc1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "fc1"
  top: "fc2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc2"
  bottom: "label"
  top: "loss"
}
I0929 21:50:21.658833 13779 layer_factory.hpp:74] Creating layer data
I0929 21:50:21.658859 13779 net.cpp:96] Creating Layer data
I0929 21:50:21.658871 13779 net.cpp:415] data -> data
I0929 21:50:21.658902 13779 net.cpp:415] data -> label
I0929 21:50:21.658926 13779 net.cpp:160] Setting up data
I0929 21:50:21.658936 13779 hdf5_data_layer.cpp:80] Loading list of HDF5 filenames from: train_hdf5file.txt
I0929 21:50:21.659220 13779 hdf5_data_layer.cpp:94] Number of HDF5 files: 1
I0929 21:50:21.920578 13779 net.cpp:167] Top shape: 64 3 128 128 (3145728)
I0929 21:50:21.920656 13779 net.cpp:167] Top shape: 64 1 (64)
I0929 21:50:21.920686 13779 layer_factory.hpp:74] Creating layer conv1
I0929 21:50:21.920740 13779 net.cpp:96] Creating Layer conv1
I0929 21:50:21.920774 13779 net.cpp:459] conv1 <- data
I0929 21:50:21.920825 13779 net.cpp:415] conv1 -> conv1
I0929 21:50:21.920877 13779 net.cpp:160] Setting up conv1
I0929 21:50:21.921985 13779 net.cpp:167] Top shape: 64 20 124 124 (19681280)
I0929 21:50:21.922050 13779 layer_factory.hpp:74] Creating layer relu1
I0929 21:50:21.922085 13779 net.cpp:96] Creating Layer relu1
I0929 21:50:21.922108 13779 net.cpp:459] relu1 <- conv1
I0929 21:50:21.922137 13779 net.cpp:404] relu1 -> conv1 (in-place)
I0929 21:50:21.922185 13779 net.cpp:160] Setting up relu1
I0929 21:50:21.922227 13779 net.cpp:167] Top shape: 64 20 124 124 (19681280)
I0929 21:50:21.922250 13779 layer_factory.hpp:74] Creating layer pool1
I0929 21:50:21.922277 13779 net.cpp:96] Creating Layer pool1
I0929 21:50:21.922298 13779 net.cpp:459] pool1 <- conv1
I0929 21:50:21.922323 13779 net.cpp:415] pool1 -> pool1
I0929 21:50:21.922418 13779 net.cpp:160] Setting up pool1
I0929 21:50:21.922472 13779 net.cpp:167] Top shape: 64 20 62 62 (4920320)
I0929 21:50:21.922495 13779 layer_factory.hpp:74] Creating layer dropout1
I0929 21:50:21.922534 13779 net.cpp:96] Creating Layer dropout1
I0929 21:50:21.922555 13779 net.cpp:459] dropout1 <- pool1
I0929 21:50:21.922582 13779 net.cpp:404] dropout1 -> pool1 (in-place)
I0929 21:50:21.922613 13779 net.cpp:160] Setting up dropout1
I0929 21:50:21.922652 13779 net.cpp:167] Top shape: 64 20 62 62 (4920320)
I0929 21:50:21.922672 13779 layer_factory.hpp:74] Creating layer fc1
I0929 21:50:21.922709 13779 net.cpp:96] Creating Layer fc1
I0929 21:50:21.922729 13779 net.cpp:459] fc1 <- pool1
I0929 21:50:21.922757 13779 net.cpp:415] fc1 -> fc1
I0929 21:50:21.922801 13779 net.cpp:160] Setting up fc1
I0929 21:50:22.301134 13779 net.cpp:167] Top shape: 64 500 (32000)
I0929 21:50:22.301193 13779 layer_factory.hpp:74] Creating layer dropout2
I0929 21:50:22.301210 13779 net.cpp:96] Creating Layer dropout2
I0929 21:50:22.301218 13779 net.cpp:459] dropout2 <- fc1
I0929 21:50:22.301232 13779 net.cpp:404] dropout2 -> fc1 (in-place)
I0929 21:50:22.301244 13779 net.cpp:160] Setting up dropout2
I0929 21:50:22.301254 13779 net.cpp:167] Top shape: 64 500 (32000)
I0929 21:50:22.301259 13779 layer_factory.hpp:74] Creating layer fc2
I0929 21:50:22.301270 13779 net.cpp:96] Creating Layer fc2
I0929 21:50:22.301275 13779 net.cpp:459] fc2 <- fc1
I0929 21:50:22.301285 13779 net.cpp:415] fc2 -> fc2
I0929 21:50:22.301295 13779 net.cpp:160] Setting up fc2
I0929 21:50:22.301317 13779 net.cpp:167] Top shape: 64 1 (64)
I0929 21:50:22.301328 13779 layer_factory.hpp:74] Creating layer loss
I0929 21:50:22.301338 13779 net.cpp:96] Creating Layer loss
I0929 21:50:22.301343 13779 net.cpp:459] loss <- fc2
I0929 21:50:22.301350 13779 net.cpp:459] loss <- label
I0929 21:50:22.301360 13779 net.cpp:415] loss -> loss
I0929 21:50:22.301374 13779 net.cpp:160] Setting up loss
I0929 21:50:22.301385 13779 net.cpp:167] Top shape: (1)
I0929 21:50:22.301391 13779 net.cpp:169]     with loss weight 1
I0929 21:50:22.301419 13779 net.cpp:239] loss needs backward computation.
I0929 21:50:22.301425 13779 net.cpp:239] fc2 needs backward computation.
I0929 21:50:22.301430 13779 net.cpp:239] dropout2 needs backward computation.
I0929 21:50:22.301436 13779 net.cpp:239] fc1 needs backward computation.
I0929 21:50:22.301441 13779 net.cpp:239] dropout1 needs backward computation.
I0929 21:50:22.301446 13779 net.cpp:239] pool1 needs backward computation.
I0929 21:50:22.301452 13779 net.cpp:239] relu1 needs backward computation.
I0929 21:50:22.301457 13779 net.cpp:239] conv1 needs backward computation.
I0929 21:50:22.301463 13779 net.cpp:241] data does not need backward computation.
I0929 21:50:22.301468 13779 net.cpp:282] This network produces output loss
I0929 21:50:22.301482 13779 net.cpp:531] Collecting Learning Rate and Weight Decay.
I0929 21:50:22.301491 13779 net.cpp:294] Network initialization done.
I0929 21:50:22.301496 13779 net.cpp:295] Memory required for data: 209652228
I0929 21:50:22.301908 13779 solver.cpp:159] Creating test net (#0) specified by net file: train_test_hdf5.prototxt
I0929 21:50:22.301935 13779 net.cpp:334] The NetState phase (1) differed from the phase (0) specified by a rule in layer data
I0929 21:50:22.302028 13779 net.cpp:46] Initializing net from parameters:
name: "MSE regression"
state {
  phase: TEST
}
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  hdf5_data_param {
    source: "test_hdf5file.txt"
    batch_size: 30
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "dropout1"
  type: "Dropout"
  bottom: "pool1"
  top: "pool1"
  dropout_param {
    dropout_ratio: 0.1
  }
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "pool1"
  top: "fc1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "dropout2"
  type: "Dropout"
  bottom: "fc1"
  top: "fc1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "fc1"
  top: "fc2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "fc2"
  bottom: "label"
  top: "loss"
}
I0929 21:50:22.302146 13779 layer_factory.hpp:74] Creating layer data
I0929 21:50:22.302158 13779 net.cpp:96] Creating Layer data
I0929 21:50:22.302165 13779 net.cpp:415] data -> data
I0929 21:50:22.302176 13779 net.cpp:415] data -> label
I0929 21:50:22.302186 13779 net.cpp:160] Setting up data
I0929 21:50:22.302191 13779 hdf5_data_layer.cpp:80] Loading list of HDF5 filenames from: test_hdf5file.txt
I0929 21:50:22.302305 13779 hdf5_data_layer.cpp:94] Number of HDF5 files: 1
I0929 21:50:22.434798 13779 net.cpp:167] Top shape: 30 3 128 128 (1474560)
I0929 21:50:22.434849 13779 net.cpp:167] Top shape: 30 1 (30)
I0929 21:50:22.434864 13779 layer_factory.hpp:74] Creating layer conv1
I0929 21:50:22.434895 13779 net.cpp:96] Creating Layer conv1
I0929 21:50:22.434914 13779 net.cpp:459] conv1 <- data
I0929 21:50:22.434944 13779 net.cpp:415] conv1 -> conv1
I0929 21:50:22.434996 13779 net.cpp:160] Setting up conv1
I0929 21:50:22.435084 13779 net.cpp:167] Top shape: 30 20 124 124 (9225600)
I0929 21:50:22.435119 13779 layer_factory.hpp:74] Creating layer relu1
I0929 21:50:22.435205 13779 net.cpp:96] Creating Layer relu1
I0929 21:50:22.435237 13779 net.cpp:459] relu1 <- conv1
I0929 21:50:22.435292 13779 net.cpp:404] relu1 -> conv1 (in-place)
I0929 21:50:22.435328 13779 net.cpp:160] Setting up relu1
I0929 21:50:22.435371 13779 net.cpp:167] Top shape: 30 20 124 124 (9225600)
I0929 21:50:22.435400 13779 layer_factory.hpp:74] Creating layer pool1
I0929 21:50:22.435443 13779 net.cpp:96] Creating Layer pool1
I0929 21:50:22.435470 13779 net.cpp:459] pool1 <- conv1
I0929 21:50:22.435511 13779 net.cpp:415] pool1 -> pool1
I0929 21:50:22.435550 13779 net.cpp:160] Setting up pool1
I0929 21:50:22.435597 13779 net.cpp:167] Top shape: 30 20 62 62 (2306400)
I0929 21:50:22.435626 13779 layer_factory.hpp:74] Creating layer dropout1
I0929 21:50:22.435669 13779 net.cpp:96] Creating Layer dropout1
I0929 21:50:22.435698 13779 net.cpp:459] dropout1 <- pool1
I0929 21:50:22.435739 13779 net.cpp:404] dropout1 -> pool1 (in-place)
I0929 21:50:22.435780 13779 net.cpp:160] Setting up dropout1
I0929 21:50:22.435823 13779 net.cpp:167] Top shape: 30 20 62 62 (2306400)
I0929 21:50:22.435853 13779 layer_factory.hpp:74] Creating layer fc1
I0929 21:50:22.435899 13779 net.cpp:96] Creating Layer fc1
I0929 21:50:22.435926 13779 net.cpp:459] fc1 <- pool1
I0929 21:50:22.435971 13779 net.cpp:415] fc1 -> fc1
I0929 21:50:22.436018 13779 net.cpp:160] Setting up fc1
I0929 21:50:22.816076 13779 net.cpp:167] Top shape: 30 500 (15000)
I0929 21:50:22.816138 13779 layer_factory.hpp:74] Creating layer dropout2
I0929 21:50:22.816154 13779 net.cpp:96] Creating Layer dropout2
I0929 21:50:22.816160 13779 net.cpp:459] dropout2 <- fc1
I0929 21:50:22.816170 13779 net.cpp:404] dropout2 -> fc1 (in-place)
I0929 21:50:22.816182 13779 net.cpp:160] Setting up dropout2
I0929 21:50:22.816192 13779 net.cpp:167] Top shape: 30 500 (15000)
I0929 21:50:22.816197 13779 layer_factory.hpp:74] Creating layer fc2
I0929 21:50:22.816208 13779 net.cpp:96] Creating Layer fc2
I0929 21:50:22.816249 13779 net.cpp:459] fc2 <- fc1
I0929 21:50:22.816262 13779 net.cpp:415] fc2 -> fc2
I0929 21:50:22.816277 13779 net.cpp:160] Setting up fc2
I0929 21:50:22.816301 13779 net.cpp:167] Top shape: 30 1 (30)
I0929 21:50:22.816316 13779 layer_factory.hpp:74] Creating layer loss
I0929 21:50:22.816329 13779 net.cpp:96] Creating Layer loss
I0929 21:50:22.816337 13779 net.cpp:459] loss <- fc2
I0929 21:50:22.816347 13779 net.cpp:459] loss <- label
I0929 21:50:22.816359 13779 net.cpp:415] loss -> loss
I0929 21:50:22.816370 13779 net.cpp:160] Setting up loss
I0929 21:50:22.816381 13779 net.cpp:167] Top shape: (1)
I0929 21:50:22.816388 13779 net.cpp:169]     with loss weight 1
I0929 21:50:22.816407 13779 net.cpp:239] loss needs backward computation.
I0929 21:50:22.816416 13779 net.cpp:239] fc2 needs backward computation.
I0929 21:50:22.816426 13779 net.cpp:239] dropout2 needs backward computation.
I0929 21:50:22.816433 13779 net.cpp:239] fc1 needs backward computation.
I0929 21:50:22.816442 13779 net.cpp:239] dropout1 needs backward computation.
I0929 21:50:22.816452 13779 net.cpp:239] pool1 needs backward computation.
I0929 21:50:22.816460 13779 net.cpp:239] relu1 needs backward computation.
I0929 21:50:22.816468 13779 net.cpp:239] conv1 needs backward computation.
I0929 21:50:22.816478 13779 net.cpp:241] data does not need backward computation.
I0929 21:50:22.816486 13779 net.cpp:282] This network produces output loss
I0929 21:50:22.816500 13779 net.cpp:531] Collecting Learning Rate and Weight Decay.
I0929 21:50:22.816510 13779 net.cpp:294] Network initialization done.
I0929 21:50:22.816517 13779 net.cpp:295] Memory required for data: 98274484
I0929 21:50:22.816565 13779 solver.cpp:47] Solver scaffolding done.
I0929 21:50:22.816587 13779 solver.cpp:363] Solving MSE regression
I0929 21:50:22.816596 13779 solver.cpp:364] Learning Rate Policy: inv
I0929 21:50:22.870337 13779 solver.cpp:424] Iteration 0, Testing net (#0)

[Screenshots: training loss at the beginning of training, and after some time]

Update (after the reply from @[name hidden])

Training plots after changing the data:

[Screenshots: Train1 update, Train2 update]


Answer:

It looks like the model is learning: the loss is going down. The problem is clearly in your data, though. Before any learning has taken place (iteration 0), the loss is already 0.0006, which is an extremely small loss for a random model. So your data looks very strange.

Check your dependent variable: are the values really distributed uniformly between -1 and 1, or are 99% of them "0" with only a handful of other values? The method itself is fine; you need to analyze your data more. Make sure it actually covers the [-1, 1] interval reasonably evenly. Once you have fixed that, there will be more small things to tune, but this is the biggest problem right now: the error you get with a random model is far too small, so the issue is in the data, not in the algorithm/method/parameters.

To speed things up, you could also raise the learning rate from the current 0.0001, but as said: fix the data first.
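
For scale: Caffe's EuclideanLoss computes 1/(2N) · Σ‖prediction − label‖², so if the labels were really spread uniformly over [-1, 1] (variance 1/3) and a random model predicted values near 0, you would expect an initial loss around (1/3)/2 ≈ 0.17, not 0.0006. Below is a minimal sketch of the check I mean, assuming the files listed in train_hdf5file.txt store the targets in a dataset named "label", as your prototxt suggests:

import h5py
import numpy as np

# Collect every label from the HDF5 files named in the training source list.
labels = []
with open("train_hdf5file.txt") as f:
    for path in f.read().split():
        with h5py.File(path, "r") as h5:
            labels.append(np.asarray(h5["label"]).ravel())
labels = np.concatenate(labels)

print("count:   ", labels.size)
print("min/max: ", labels.min(), labels.max())
print("mean/std:", labels.mean(), labels.std())

# Histogram over [-1, 1]: a healthy regression target spreads its mass
# across the bins; a spike in a single bin explains a near-zero initial loss.
hist, edges = np.histogram(labels, bins=20, range=(-1.0, 1.0))
for count, lo, hi in zip(hist, edges[:-1], edges[1:]):
    print(f"[{lo:+.2f}, {hi:+.2f}): {count}")

If the histogram puts nearly all of its mass in one bin, that degenerate label distribution is the culprit.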
