I am training a deep neural network on a regression task with target values between -1.0 and 1.0, using a learning rate of 0.001 and 19200/4800 training/validation samples. I got the following log, which I am somewhat suspicious about:
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
cropping2d_1 (Cropping2D)        (None, 138, 320, 3)   0           cropping2d_input_1[0][0]
lambda_1 (Lambda)                (None, 66, 200, 3)    0           cropping2d_1[0][0]
lambda_2 (Lambda)                (None, 66, 200, 3)    0           lambda_1[0][0]
convolution2d_1 (Convolution2D)  (None, 31, 98, 24)    1824        lambda_2[0][0]
spatialdropout2d_1 (SpatialDropo (None, 31, 98, 24)    0           convolution2d_1[0][0]
convolution2d_2 (Convolution2D)  (None, 14, 47, 36)    21636       spatialdropout2d_1[0][0]
spatialdropout2d_2 (SpatialDropo (None, 14, 47, 36)    0           convolution2d_2[0][0]
convolution2d_3 (Convolution2D)  (None, 5, 22, 48)     43248       spatialdropout2d_2[0][0]
spatialdropout2d_3 (SpatialDropo (None, 5, 22, 48)     0           convolution2d_3[0][0]
convolution2d_4 (Convolution2D)  (None, 3, 20, 64)     27712       spatialdropout2d_3[0][0]
spatialdropout2d_4 (SpatialDropo (None, 3, 20, 64)     0           convolution2d_4[0][0]
convolution2d_5 (Convolution2D)  (None, 1, 18, 64)     36928       spatialdropout2d_4[0][0]
spatialdropout2d_5 (SpatialDropo (None, 1, 18, 64)     0           convolution2d_5[0][0]
flatten_1 (Flatten)              (None, 1152)          0           spatialdropout2d_5[0][0]
dropout_1 (Dropout)              (None, 1152)          0           flatten_1[0][0]
activation_1 (Activation)        (None, 1152)          0           dropout_1[0][0]
dense_1 (Dense)                  (None, 100)           115300      activation_1[0][0]
dropout_2 (Dropout)              (None, 100)           0           dense_1[0][0]
dense_2 (Dense)                  (None, 50)            5050        dropout_2[0][0]
dense_3 (Dense)                  (None, 10)            510         dense_2[0][0]
dropout_3 (Dropout)              (None, 10)            0           dense_3[0][0]
dense_4 (Dense)                  (None, 1)             11          dropout_3[0][0]
====================================================================================================
Total params: 252,219
Trainable params: 252,219
Non-trainable params: 0
====================================================================================================

Epoch 1/5
19200/19200 [==============================] - 795s - loss: 0.0292 - val_loss: 0.0128
Epoch 2/5
19200/19200 [==============================] - 754s - loss: 0.0169 - val_loss: 0.0120
Epoch 3/5
19200/19200 [==============================] - 753s - loss: 0.0161 - val_loss: 0.0114
Epoch 4/5
19200/19200 [==============================] - 723s - loss: 0.0154 - val_loss: 0.0100
Epoch 5/5
19200/19200 [==============================] - 1597s - loss: 0.0151 - val_loss: 0.0098
Both training and validation loss are decreasing, which looks like good news at first glance. But why is the training loss already so low in the very first epoch? And how can the validation loss be even lower than the training loss? Does this point to a systematic error in my model or training setup?
Answer:
Actually, a validation loss lower than the training loss is not a rare phenomenon. It can happen, for example, when every example in the validation set is well "covered" by examples in the training set and your network has learned the actual structure of the dataset.
This often happens when the structure of your data is not very complex. In fact, the surprisingly low loss after the first epoch may itself be a clue that this is the case.
Regarding the loss value being "too small": you did not say what your loss function is, but since your task is regression, I would guess it is MSE. In that case, a mean squared error at the level of 0.01 means the typical (root-mean-square) distance between the true and predicted values is 0.1, which is 5% of the diameter of your target range [-1, 1]. So, is that error really so small?
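As a quick sanity check on this arithmetic (the 0.01 is the reported validation loss; everything else follows from it):

```python
import math

mse = 0.01                 # roughly the reported val_loss
rmse = math.sqrt(mse)      # typical distance between y_true and y_pred
diameter = 1.0 - (-1.0)    # width of the target range [-1, 1]

print(rmse)                # → 0.1
print(rmse / diameter)     # → 0.05, i.e. 5% of the range
```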
You also did not say how many batches are processed in one epoch. Perhaps, if your data structure is not that complex and your batch size is small, one epoch is simply enough to learn your data well.
To check whether your model is training properly, I recommend plotting a correlation plot, with y_pred on, say, the X axis and y_true on the Y axis. Then you will really see how well your model has been trained.
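A minimal sketch of such a plot. Here `y_true` and `y_pred` are synthetic stand-ins; in practice you would use your validation targets and `model.predict(X_val).ravel()`:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; switch to an interactive one locally
import matplotlib.pyplot as plt

# Hypothetical stand-ins simulating a model with MSE around 0.01
rng = np.random.default_rng(0)
y_true = rng.uniform(-1.0, 1.0, size=4800)
y_pred = y_true + rng.normal(0.0, 0.1, size=4800)

plt.scatter(y_pred, y_true, s=2, alpha=0.3)
plt.plot([-1, 1], [-1, 1], color="red")  # perfect-prediction diagonal
plt.xlabel("y_pred")
plt.ylabel("y_true")
plt.savefig("correlation_plot.png")

# The Pearson correlation summarizes the same picture as a single number
print(np.corrcoef(y_pred, y_true)[0, 1])
```

A well-trained model shows points hugging the diagonal; systematic errors (e.g. predictions collapsed near 0, or a bias at the range edges) are immediately visible here even when the scalar loss looks fine.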
EDIT: As Neil mentioned, there can be more reasons for a small validation error, such as the cases not being well separated. I would also add that, since 5 epochs take no more than 90 minutes in total, it might be best to check the model's results using a classic cross-validation scheme, e.g. 5-fold. That would make sure that your model really does perform well on your dataset.
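A sketch of the suggested 5-fold scheme using scikit-learn's `KFold`. A `LinearRegression` on toy data stands in for rebuilding and retraining the CNN on each fold (the important part is that the model is created from scratch inside the loop):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression  # stand-in for the CNN

# Hypothetical toy data in place of the real images and targets
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + rng.normal(0.0, 0.1, size=500)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_mse = []
for train_idx, val_idx in kf.split(X):
    model = LinearRegression()  # replace with a freshly built, compiled network
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    fold_mse.append(float(np.mean((pred - y[val_idx]) ** 2)))

# Mean and spread across folds show whether one lucky train/validation
# split explains the suspiciously low validation loss
print(np.mean(fold_mse), np.std(fold_mse))
```

If the per-fold losses are all close to the 0.01 you observed, the low validation loss is a property of the data, not of one fortunate split.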