I'm training a convolutional neural network on 10,000 grayscale images. The network has six convolutional layers, one fully connected layer, and an output layer.
When I start training, the loss is very high but drops steadily; however, my accuracy starts at 1.0, drops along with it, and then oscillates between 72% and 30%, occasionally rising again. On top of that, when I run acc.eval({x: test_images, y: test_labels}) on unseen images, the accuracy is only about 16%.
Also, I have 6 classes, all of them one-hot encoded.
I suspect I'm making a mistake when comparing the predicted outputs, but I can't find the bug in my code…
Here is my code:
pred = convolutional_network(x)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred))
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)

prediction = tf.nn.softmax(pred)
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
acc = tf.reduce_mean(tf.cast(correct, 'float'))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize all variables
    saver = tf.train.Saver()
    time_full_start = time.clock()
    print("RUNNING SESSION...")

    for epoch in range(num_epochs):
        train_batch_x = []
        train_batch_y = []
        epoch_loss = 0
        i = 0
        while i < len(images):
            start = i
            end = i + batch_size
            train_batch_x = images[start:end]
            train_batch_y = labels[start:end]
            op, ac, loss_value = sess.run([train_op, acc, loss],
                                          feed_dict={x: train_batch_x, y: train_batch_y})
            epoch_loss += loss_value
            i += batch_size
        print('Epoch : ', epoch + 1, ' of ', num_epochs, ' - Loss for epoch: ', epoch_loss, ' Accuracy: ', ac)

    time_full_end = time.clock()
    print('Full time elapse:', time_full_end - time_full_start)

    print('Accuracy:', acc.eval({x: test_images, y: test_labels}))

    save_path = saver.save(sess, MODEL_PATH)
    print("Model saved in file: ", save_path)
And here is the output:
Epoch : 1 of 100 - Loss for epoch: 8.94737603121e+13 Accuracy: 1.0
Epoch : 2 of 100 - Loss for epoch: 212052447727.0 Accuracy: 1.0
Epoch : 3 of 100 - Loss for epoch: 75150603462.2 Accuracy: 1.0
Epoch : 4 of 100 - Loss for epoch: 68164116617.4 Accuracy: 1.0
Epoch : 5 of 100 - Loss for epoch: 18505190718.8 Accuracy: 0.99
Epoch : 6 of 100 - Loss for epoch: 11373286689.0 Accuracy: 0.96
Epoch : 7 of 100 - Loss for epoch: 3129798657.75 Accuracy: 0.07
Epoch : 8 of 100 - Loss for epoch: 374790121.375 Accuracy: 0.58
Epoch : 9 of 100 - Loss for epoch: 105383792.938 Accuracy: 0.72
Epoch : 10 of 100 - Loss for epoch: 49705202.4844 Accuracy: 0.66
Epoch : 11 of 100 - Loss for epoch: 30214170.7909 Accuracy: 0.36
Epoch : 12 of 100 - Loss for epoch: 18653020.5084 Accuracy: 0.82
Epoch : 13 of 100 - Loss for epoch: 14793638.35 Accuracy: 0.39
Epoch : 14 of 100 - Loss for epoch: 10196079.7003 Accuracy: 0.73
Epoch : 15 of 100 - Loss for epoch: 6727522.37319 Accuracy: 0.47
Epoch : 16 of 100 - Loss for epoch: 4593769.05838 Accuracy: 0.68
Epoch : 17 of 100 - Loss for epoch: 3669332.09406 Accuracy: 0.44
Epoch : 18 of 100 - Loss for epoch: 2850924.81662 Accuracy: 0.59
Epoch : 19 of 100 - Loss for epoch: 1780678.12892 Accuracy: 0.51
Epoch : 20 of 100 - Loss for epoch: 1855037.40652 Accuracy: 0.61
Epoch : 21 of 100 - Loss for epoch: 1012934.52827 Accuracy: 0.53
Epoch : 22 of 100 - Loss for epoch: 649319.432669 Accuracy: 0.55
Epoch : 23 of 100 - Loss for epoch: 841660.786938 Accuracy: 0.57
Epoch : 24 of 100 - Loss for epoch: 490148.861691 Accuracy: 0.55
Epoch : 25 of 100 - Loss for epoch: 397315.021568 Accuracy: 0.5
......................
Epoch : 99 of 100 - Loss for epoch: 4412.61703086 Accuracy: 0.57
Epoch : 100 of 100 - Loss for epoch: 4530.96991658 Accuracy: 0.62
Full time elapse: 794.5787720000001
Test Accuracy: 0.158095
I've tried various learning rates and network sizes, but I can't seem to get it to work. Any help would be greatly appreciated.
Answer:
Note that my answer also draws on reviewing and debugging the full code, which is not visible in the question. Still, I think the issues below are general enough to be worth going through if you hit a similar problem – you may well find your solution here!
Extremely high loss values can mean that you are not converting the input images from int8 down to small float32 values (in fact, he did do this), and that you are using neither batch normalization nor regularization (in fact, both were missing). Furthermore, in this code:
prediction = tf.nn.softmax(pred)
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
computing the softmax values is completely unnecessary: softmax is a strictly monotonic function, so it only rescales the predictions. The largest value in pred will also be the largest value in prediction, and you get the same result with
correct = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
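As a quick sanity check, here is a small numpy sketch (independent of TensorFlow) showing that argmax is invariant under softmax:

```python
import numpy as np

def softmax(z):
    # subtract the row max before exponentiating, for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 5.0, 1.0],
                   [0.1, 0.2, 9.0]])

# softmax only rescales each row; the position of the maximum is unchanged
print(np.argmax(logits, axis=1))           # [1 2]
print(np.argmax(softmax(logits), axis=1))  # [1 2]
```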
Given that your network operates on astronomically large values, tf.nn.softmax() can inadvertently drive all values down to zero while exponentiating and dividing by the sum, after which tf.argmax() simply picks class 0 until the numbers come down a bit. In addition, you are not accumulating ac:
op , ac, loss_value = sess.run([train_op, acc, loss], feed_dict={x: train_batch_x, y: train_batch_y})
so the "epoch accuracy" you print is nothing of the sort: it is just the accuracy of the last batch. If your images are ordered by class and you do not shuffle the batches, you may end up with class-0 images at the end of every epoch. That would explain why you see 100% accuracy in the first few epochs, until the huge values come down a bit and softmax no longer zeroes everything out. (This turned out to be exactly the case.)
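A proper epoch accuracy weights each batch by its size. A minimal plain-Python sketch (batch_accs and batch_sizes are hypothetical per-batch results you would collect inside the training loop):

```python
# size-weighted mean accuracy over all batches, instead of keeping
# only the last batch's value
def epoch_accuracy(batch_accs, batch_sizes):
    correct = sum(a * n for a, n in zip(batch_accs, batch_sizes))
    return correct / sum(batch_sizes)

# e.g. three batches of 100, 100, and 50 images:
print(epoch_accuracy([0.5, 1.0, 0.0], [100, 100, 50]))  # 0.6
```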
Even with the above fixed, the network still wasn't learning anything. It turned out that when he added shuffling, the images and labels were shuffled independently of each other, which naturally yields a constant 1/6 accuracy.
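To shuffle images and labels consistently, draw a single index permutation and apply it to both arrays. A numpy sketch (the toy images and labels below stand in for the arrays from the question):

```python
import numpy as np

# toy stand-ins: "image" k is the row [2k, 2k+1] and its label is k
images = np.arange(12).reshape(6, 2)
labels = np.arange(6)

# ONE shared permutation, used to index both arrays, so every image
# keeps its own label after shuffling
perm = np.random.permutation(len(images))
images, labels = images[perm], labels[perm]

# each shuffled image still carries its original label
assert all(img[0] // 2 == lab for img, lab in zip(images, labels))
```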
With everything fixed, the network was able to learn this task to 98% accuracy within 100 epochs:
Epoch: 100/100 loss: 6.20184610883 total loss: 25.4021390676 acc: 97.976191%
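For reference, the input-scaling fix mentioned at the top of the answer (converting uint8 pixels to small float32 values) can be sketched as follows; the array shapes here are made up for illustration:

```python
import numpy as np

# hypothetical batch of uint8 grayscale images with values in [0, 255]
images_u8 = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# scale to float32 in [0, 1] so the logits stay in a sane range
images_f32 = images_u8.astype(np.float32) / 255.0

print(images_f32.dtype, float(images_f32.min()), float(images_f32.max()))
```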