I'm working on assignment 3 of the Udacity Deep Learning course. I have a working neural network with one hidden layer, but when I add a second hidden layer the loss becomes `nan`.
Here is the graph code:
```python
num_nodes_layer_1 = 1024
num_nodes_layer_2 = 128
num_inputs = 28 * 28
num_labels = 10
batch_size = 128

graph = tf.Graph()
with graph.as_default():
    # Input data. For the training data, we use a placeholder that will be
    # fed a training minibatch at run time.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, num_inputs))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables
    # Hidden layer 1
    hidden_weights_1 = tf.Variable(tf.truncated_normal([num_inputs, num_nodes_layer_1]))
    hidden_biases_1 = tf.Variable(tf.zeros([num_nodes_layer_1]))

    # Hidden layer 2
    hidden_weights_2 = tf.Variable(tf.truncated_normal([num_nodes_layer_1, num_nodes_layer_2]))
    hidden_biases_2 = tf.Variable(tf.zeros([num_nodes_layer_2]))

    # Linear layer
    weights = tf.Variable(tf.truncated_normal([num_nodes_layer_2, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # Training computation.
    y1 = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights_1) + hidden_biases_1)
    y2 = tf.nn.relu(tf.matmul(y1, hidden_weights_2) + hidden_biases_2)
    logits = tf.matmul(y2, weights) + biases

    # Compute the loss
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf_train_labels, logits=logits))

    # Optimizer.
    # We will use gradient descent to find the minimum of this loss.
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test data.
    # These are not part of training; they are just here to report accuracy
    # as we train.
    train_prediction = tf.nn.softmax(logits)

    y1_valid = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights_1) + hidden_biases_1)
    y2_valid = tf.nn.relu(tf.matmul(y1_valid, hidden_weights_2) + hidden_biases_2)
    valid_prediction = tf.nn.softmax(tf.matmul(y2_valid, weights) + biases)

    y1_test = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights_1) + hidden_biases_1)
    y2_test = tf.nn.relu(tf.matmul(y1_test, hidden_weights_2) + hidden_biases_2)
    test_prediction = tf.nn.softmax(tf.matmul(y2_test, weights) + biases)
```
It throws no error, but after the first step the loss prints as `nan` and the network doesn't learn:
```
Initialized
Minibatch loss at step 0: 2133.468750
Minibatch accuracy: 8.6%
Validation accuracy: 10.0%
Minibatch loss at step 400: nan
Minibatch accuracy: 9.4%
Validation accuracy: 10.0%
Minibatch loss at step 800: nan
Minibatch accuracy: 11.7%
Validation accuracy: 10.0%
Minibatch loss at step 1200: nan
Minibatch accuracy: 4.7%
Validation accuracy: 10.0%
Minibatch loss at step 1600: nan
Minibatch accuracy: 7.8%
Validation accuracy: 10.0%
Minibatch loss at step 2000: nan
Minibatch accuracy: 6.2%
Validation accuracy: 10.0%
Test accuracy: 10.0%
```
When I remove the second layer, it trains and I get an accuracy of about 85%. With a second layer, I'd expect the accuracy to be somewhere between 80% and 90%.
Am I using the wrong optimizer, or am I missing something simple?
Here is the session code:
```python
num_steps = 2001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print("Initialized")
    for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The keys of the dictionary are the placeholder nodes of the graph to
        # be fed, and the values are the numpy arrays to feed them with.
        feed_dict = {
            tf_train_dataset: batch_data,
            tf_train_labels: batch_labels,
        }
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 400 == 0):
            print("Minibatch loss at step %d: %f" % (step, l))
            print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
            print("Validation accuracy: %.1f%%" % accuracy(valid_prediction.eval(), valid_labels))
    acc = accuracy(test_prediction.eval(), test_labels)
    print("Test accuracy: %.1f%%" % acc)
```
Answer:
Your learning rate of `0.5` is too high. Set it to `0.05` and it converges.
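Concretely, the only change is the constant passed to the optimizer in the graph code:

```python
# A 0.5 step makes the updates large enough that the logits blow up and the
# softmax cross-entropy turns into nan; 0.05 keeps the updates stable.
optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
```

With the lower rate, the minibatch loss comes down over training: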
```
Minibatch loss at step 0: 1506.469238
Minibatch loss at step 400: 7796.088867
Minibatch loss at step 800: 9893.363281
Minibatch loss at step 1200: 5089.553711
Minibatch loss at step 1600: 6148.481445
Minibatch loss at step 2000: 5257.598145
Minibatch loss at step 2400: 1716.116455
Minibatch loss at step 2800: 1600.826538
Minibatch loss at step 3200: 941.884766
Minibatch loss at step 3600: 1033.936768
Minibatch loss at step 4000: 1808.775757
Minibatch loss at step 4400: 113.909866
Minibatch loss at step 4800: 49.800560
Minibatch loss at step 5200: 20.392700
Minibatch loss at step 5600: 6.253595
Minibatch loss at step 6000: 4.372780
Minibatch loss at step 6400: 6.862935
Minibatch loss at step 6800: 6.951239
Minibatch loss at step 7200: 3.528607
Minibatch loss at step 7600: 2.968611
Minibatch loss at step 8000: 3.164592
...
Minibatch loss at step 19200: 2.141401
```
In addition, a few suggestions:
- `tf_train_dataset` and `tf_train_labels` should be `tf.placeholder`s of shape `[None, 784]` for the data (and `[None, 10]` for the labels). The `None` dimension lets you vary the batch size during training instead of being restricted to a fixed number like `128`.
- Instead of making `tf_valid_dataset` and `tf_test_dataset` `tf.constant`s, feed your validation and test sets directly through the corresponding `feed_dict`; that removes the extra ops at the end of the graph that exist only to compute validation and test accuracy.
- I'd suggest sampling a fresh batch of validation and test data each time you check validation/test accuracy, rather than reusing the same batch every time. (See the sketch after this list.)
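Putting all three suggestions together, here is a minimal sketch. It reuses the hyperparameters, dataset arrays, and the `accuracy` helper from the question; the `sample_batch` helper and the validation batch size of `512` are hypothetical, introduced only for illustration:

```python
import numpy as np
import tensorflow as tf

def sample_batch(data, labels, size):
    # Hypothetical helper: draw a random batch of `size` examples.
    idx = np.random.choice(data.shape[0], size, replace=False)
    return data[idx], labels[idx]

graph = tf.Graph()
with graph.as_default():
    # A None batch dimension lets the same placeholders carry training
    # minibatches, validation batches, and test batches alike.
    tf_dataset = tf.placeholder(tf.float32, shape=[None, num_inputs])
    tf_labels = tf.placeholder(tf.float32, shape=[None, num_labels])

    hidden_weights_1 = tf.Variable(tf.truncated_normal([num_inputs, num_nodes_layer_1]))
    hidden_biases_1 = tf.Variable(tf.zeros([num_nodes_layer_1]))
    hidden_weights_2 = tf.Variable(tf.truncated_normal([num_nodes_layer_1, num_nodes_layer_2]))
    hidden_biases_2 = tf.Variable(tf.zeros([num_nodes_layer_2]))
    weights = tf.Variable(tf.truncated_normal([num_nodes_layer_2, num_labels]))
    biases = tf.Variable(tf.zeros([num_labels]))

    # One forward pass serves training, validation, and test alike, so the
    # duplicated y1_valid / y1_test ops disappear from the graph.
    y1 = tf.nn.relu(tf.matmul(tf_dataset, hidden_weights_1) + hidden_biases_1)
    y2 = tf.nn.relu(tf.matmul(y1, hidden_weights_2) + hidden_biases_2)
    logits = tf.matmul(y2, weights) + biases

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf_labels, logits=logits))
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
    prediction = tf.nn.softmax(logits)

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    for step in range(num_steps):
        batch_data, batch_labels = sample_batch(train_dataset, train_labels, batch_size)
        _, l = session.run(
            [optimizer, loss],
            feed_dict={tf_dataset: batch_data, tf_labels: batch_labels})
        if step % 400 == 0:
            # A freshly sampled validation batch, fed through the same
            # placeholder instead of being baked into the graph as a constant.
            valid_data, valid_batch_labels = sample_batch(valid_dataset, valid_labels, 512)
            preds = session.run(prediction, feed_dict={tf_dataset: valid_data})
            print("Step %d: loss %f, validation accuracy %.1f%%"
                  % (step, l, accuracy(preds, valid_batch_labels)))
```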