I am training with bvlc_reference_caffenet, running training and testing at the same time. Here is a sample of the log my network produces during training:
I0430 11:49:08.408740 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:21.221074 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:34.038710 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:46.816813 23343 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:49:56.630870 23334 solver.cpp:397] Test net output #0: accuracy = 0.932502
I0430 11:49:56.630940 23334 solver.cpp:397] Test net output #1: loss = 0.388662 (* 1 = 0.388662 loss)
I0430 11:49:57.218236 23334 solver.cpp:218] Iteration 71000 (0.319361 iter/s, 62.625s/20 iters), loss = 0.00146191
I0430 11:49:57.218300 23334 solver.cpp:237] Train net output #0: loss = 0.00146191 (* 1 = 0.00146191 loss)
I0430 11:49:57.218308 23334 sgd_solver.cpp:105] Iteration 71000, lr = 0.001
I0430 11:50:09.168726 23334 solver.cpp:218] Iteration 71020 (1.67357 iter/s, 11.9505s/20 iters), loss = 0.000806865
I0430 11:50:09.168778 23334 solver.cpp:237] Train net output #0: loss = 0.000806868 (* 1 = 0.000806868 loss)
I0430 11:50:09.168787 23334 sgd_solver.cpp:105] Iteration 71020, lr = 0.001
I0430 11:50:21.127496 23334 solver.cpp:218] Iteration 71040 (1.67241 iter/s, 11.9588s/20 iters), loss = 0.000182312
I0430 11:50:21.127539 23334 solver.cpp:237] Train net output #0: loss = 0.000182314 (* 1 = 0.000182314 loss)
I0430 11:50:21.127562 23334 sgd_solver.cpp:105] Iteration 71040, lr = 0.001
I0430 11:50:33.248086 23334 solver.cpp:218] Iteration 71060 (1.65009 iter/s, 12.1206s/20 iters), loss = 0.000428604
I0430 11:50:33.248260 23334 solver.cpp:237] Train net output #0: loss = 0.000428607 (* 1 = 0.000428607 loss)
I0430 11:50:33.248272 23334 sgd_solver.cpp:105] Iteration 71060, lr = 0.001
I0430 11:50:45.518955 23334 solver.cpp:218] Iteration 71080 (1.62989 iter/s, 12.2707s/20 iters), loss = 0.00108446
I0430 11:50:45.519006 23334 solver.cpp:237] Train net output #0: loss = 0.00108447 (* 1 = 0.00108447 loss)
I0430 11:50:45.519011 23334 sgd_solver.cpp:105] Iteration 71080, lr = 0.001
I0430 11:50:51.287315 23341 data_layer.cpp:73] Restarting data prefetching from start.
I0430 11:50:57.851781 23334 solver.cpp:218] Iteration 71100 (1.62169 iter/s, 12.3328s/20 iters), loss = 0.00150949
I0430 11:50:57.851828 23334 solver.cpp:237] Train net output #0: loss = 0.0015095 (* 1 = 0.0015095 loss)
I0430 11:50:57.851837 23334 sgd_solver.cpp:105] Iteration 71100, lr = 0.001
I0430 11:51:09.912184 23334 solver.cpp:218] Iteration 71120 (1.65832 iter/s, 12.0604s/20 iters), loss = 0.00239335
I0430 11:51:09.912330 23334 solver.cpp:237] Train net output #0: loss = 0.00239335 (* 1 = 0.00239335 loss)
I0430 11:51:09.912340 23334 sgd_solver.cpp:105] Iteration 71120, lr = 0.001
I0430 11:51:21.968586 23334 solver.cpp:218] Iteration 71140 (1.65888 iter/s, 12.0563s/20 iters), loss = 0.00161807
I0430 11:51:21.968646 23334 solver.cpp:237] Train net output #0: loss = 0.00161808 (* 1 = 0.00161808 loss)
I0430 11:51:21.968654 23334 sgd_solver.cpp:105] Iteration 71140, lr = 0.001
What confuses me is the loss. I had planned to stop training once the loss dropped below 0.0001, but there are two losses: the training loss and the test loss. The training loss appears to be settling around 0.0001, yet the test loss is as high as 0.388, far above my threshold. Which loss should I use to decide when to stop training?
Answer:
Such a large gap between training and test performance is a sign that you may be overfitting your data.
The purpose of a validation set is to make sure you are not overfitting. You should use the performance on the validation set to decide whether to stop training or keep going.
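As a concrete illustration, here is a minimal sketch of monitoring the test-phase (validation) loss with pycaffe and stopping once it stops improving. The solver prototxt path, the loss blob name ("loss"), and the interval/patience values are assumptions, not taken from your setup; adjust them to match your own solver and network definitions.

import numpy as np
import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')  # hypothetical path to your solver definition

test_interval = 1000      # evaluate every N training iterations (assumed value)
test_iters = 100          # number of test-net batches to average per evaluation
patience = 5              # stop after this many evaluations without improvement
best_val_loss = np.inf
bad_evals = 0

for it in range(100000):
    solver.step(1)  # run one training iteration

    if it % test_interval == 0:
        # Average the validation loss over several test-net batches.
        val_loss = np.mean([solver.test_nets[0].forward()['loss']
                            for _ in range(test_iters)])
        print('iter', it, 'validation loss', val_loss)

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            bad_evals = 0
            solver.net.save('best.caffemodel')  # keep the best weights seen so far
        else:
            bad_evals += 1
            if bad_evals >= patience:
                print('Validation loss stopped improving; stopping training.')
                break

The idea is the same if you keep using the caffe command-line tool instead of pycaffe: watch the "Test net output ... loss" lines in the log rather than the training loss, and stop (or pick the snapshot) at the iteration where that validation loss is lowest.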