TensorFlow – Does it make sense to have both L2 regularization and dropout in the network?

I am currently going through the ANN part of the Udacity Deep Learning course.

I successfully built and trained the network, with L2 regularization applied to all weights and biases. Now I am trying out dropout on the hidden layer to improve generalization. My question is: does it make sense to apply both L2 regularization and dropout to the hidden layer? And if so, how should it be done properly?

When applying dropout we effectively switch off half of the hidden layer's activations and double the output of the remaining neurons. With L2 we compute the L2 norm over all the hidden weights. But I am not sure how the L2 penalty should be computed once dropout is in use. Since we switch off some of the activations, shouldn't the weights that are currently "not in use" be removed from the L2 calculation? Any references on this would be helpful; I have not been able to find any.
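For reference, the "switch off half and double the rest" behavior described above is exactly what TensorFlow's tf.nn.dropout implements (so-called inverted dropout): each element is zeroed with probability 1 - keep_prob and the survivors are scaled by 1/keep_prob, so the expected activation stays the same and nothing special is needed at test time. A minimal standalone illustration (the tensor values here are just an example, using the TF 1.x-style API as in the code below):

import tensorflow as tf

#minimal illustration of inverted dropout: with keep_prob = 0.5 each
#activation is zeroed with probability 0.5 and the survivors are
#scaled by 1/0.5 = 2, so the expected total activation is unchanged
x = tf.ones([4])                 #four activations, all 1.0
dropped = tf.nn.dropout(x, 0.5)  #keep_prob = 0.5

with tf.Session() as sess:
    print(sess.run(dropped))     #e.g. [2. 0. 2. 2.]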

In case you are interested, the code for my ANN with L2 regularization is below:

#for NeuralNetwork model code is below
#We will use SGD for training to save our time. Code is from Assignment 2
#beta is the new parameter - controls level of regularization. Default is 0.01
#but feel free to play with it
#notice, we introduce L2 for both biases and weights of all layers
beta = 0.01

#building tensorflow graph
graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  #now let's build our new hidden layer
  #that's how many hidden neurons we want
  num_hidden_neurons = 1024
  #its weights
  hidden_weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_hidden_neurons]))
  hidden_biases = tf.Variable(tf.zeros([num_hidden_neurons]))

  #now the layer itself. It multiplies data by weights, adds biases
  #and takes ReLU over result
  hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)

  #time to go for output linear layer
  #out weights connect hidden neurons to output labels
  #biases are added to output labels
  out_weights = tf.Variable(
    tf.truncated_normal([num_hidden_neurons, num_labels]))
  out_biases = tf.Variable(tf.zeros([num_labels]))

  #compute output
  out_layer = tf.matmul(hidden_layer, out_weights) + out_biases

  #our real output is a softmax of prior result
  #and we also compute its cross-entropy to get our loss
  #Notice - we introduce our L2 here
  loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    out_layer, tf_train_labels) +
    beta*tf.nn.l2_loss(hidden_weights) +
    beta*tf.nn.l2_loss(hidden_biases) +
    beta*tf.nn.l2_loss(out_weights) +
    beta*tf.nn.l2_loss(out_biases)))

  #now we just minimize this loss to actually train the network
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

  #nice, now let's calculate the predictions on each dataset for evaluating the
  #performance so far
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(out_layer)
  valid_relu = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
  valid_prediction = tf.nn.softmax(tf.matmul(valid_relu, out_weights) + out_biases)
  test_relu = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights) + hidden_biases)
  test_prediction = tf.nn.softmax(tf.matmul(test_relu, out_weights) + out_biases)

#now is the actual training on the ANN we built
#we will run it for some number of steps and evaluate the progress after
#every 500 steps

#number of steps we will train our ANN
num_steps = 3001

#actual training
with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
      print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Answer:

After some effort I managed to solve this and introduce both L2 and dropout into my network. Adding dropout to the same network (with L2 already in place) gave me a slight improvement. I am still not sure whether it is really worth introducing both L2 and dropout, but at least it works and slightly improves the results.
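Only the description of the answer survived here, so below is a minimal sketch of the relevant change rather than the answer's original code. It reuses the variable names from the question's graph, and the keep probability of 0.5 is an assumption, matching the "half of the activations" described in the question:

keep_prob = 0.5  #assumed value; tune as needed

#hidden layer exactly as before
hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)
#apply dropout to the hidden activations at training time;
#tf.nn.dropout already rescales the surviving units by 1/keep_prob
hidden_layer_drop = tf.nn.dropout(hidden_layer, keep_prob)

#the training-time logits use the dropped-out activations
out_layer = tf.matmul(hidden_layer_drop, out_weights) + out_biases

#the L2 terms are unchanged: they are computed over the full weight
#tensors regardless of which activations dropout zeroed out this step
loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
  out_layer, tf_train_labels) +
  beta*tf.nn.l2_loss(hidden_weights) +
  beta*tf.nn.l2_loss(hidden_biases) +
  beta*tf.nn.l2_loss(out_weights) +
  beta*tf.nn.l2_loss(out_biases)))

#the validation/test paths (valid_relu, test_relu) stay exactly as in the
#question, since dropout is applied only during training

Note that this also addresses the question above: in common practice the L2 penalty still runs over the full weight tensors and is not masked to the units that survived dropout on a given step.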
