TensorFlow – Does it make sense to have both L2 regularization and dropout in the network?

I am currently going through the ANN part of the Udacity Deep Learning course.

I successfully built and trained the network, with L2 regularization applied to all weights and biases. Now I am trying out dropout on the hidden layer to improve generalization. My question is: does it make sense to apply both L2 regularization and dropout to the hidden layer? And if so, how should it be done properly?

When applying dropout we effectively switch off half of the hidden layer's activations and double the output of the remaining neurons. With L2 we compute the L2 norm over all the hidden weights. But I am not sure how the L2 penalty should be computed once dropout is in use. Since we switch off some of the activations, shouldn't the weights that are currently "not in use" be removed from the L2 calculation? Any references on this would be helpful; I have not been able to find any.
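For reference, the "switch off half and double the rest" behavior described above is exactly what TensorFlow's tf.nn.dropout implements (so-called inverted dropout): each element is zeroed with probability 1 - keep_prob and the survivors are scaled by 1/keep_prob, so the expected activation stays the same and nothing special is needed at test time. A minimal standalone illustration (the tensor values here are just an example, using the TF 1.x-style API as in the code below):

import tensorflow as tf

#minimal illustration of inverted dropout: with keep_prob = 0.5 each
#activation is zeroed with probability 0.5 and the survivors are
#scaled by 1/0.5 = 2, so the expected total activation is unchanged
x = tf.ones([4])                 #four activations, all 1.0
dropped = tf.nn.dropout(x, 0.5)  #keep_prob = 0.5

with tf.Session() as sess:
    print(sess.run(dropped))     #e.g. [2. 0. 2. 2.]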

In case you are interested, the code for my ANN with L2 regularization is below:

#for NeuralNetwork model code is below
#We will use SGD for training to save our time. Code is from Assignment 2
#beta is the new parameter - controls level of regularization. Default is 0.01
#but feel free to play with it
#notice, we introduce L2 for both biases and weights of all layers
beta = 0.01

#building tensorflow graph
graph = tf.Graph()
with graph.as_default():

  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  #now let's build our new hidden layer
  #that's how many hidden neurons we want
  num_hidden_neurons = 1024
  #its weights
  hidden_weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_hidden_neurons]))
  hidden_biases = tf.Variable(tf.zeros([num_hidden_neurons]))

  #now the layer itself. It multiplies data by weights, adds biases
  #and takes ReLU over result
  hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)

  #time to go for output linear layer
  #out weights connect hidden neurons to output labels
  #biases are added to output labels
  out_weights = tf.Variable(
    tf.truncated_normal([num_hidden_neurons, num_labels]))
  out_biases = tf.Variable(tf.zeros([num_labels]))

  #compute output
  out_layer = tf.matmul(hidden_layer, out_weights) + out_biases

  #our real output is a softmax of prior result
  #and we also compute its cross-entropy to get our loss
  #Notice - we introduce our L2 here
  loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    out_layer, tf_train_labels) +
    beta*tf.nn.l2_loss(hidden_weights) +
    beta*tf.nn.l2_loss(hidden_biases) +
    beta*tf.nn.l2_loss(out_weights) +
    beta*tf.nn.l2_loss(out_biases)))

  #now we just minimize this loss to actually train the network
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

  #nice, now let's calculate the predictions on each dataset for evaluating the
  #performance so far
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(out_layer)
  valid_relu = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
  valid_prediction = tf.nn.softmax(tf.matmul(valid_relu, out_weights) + out_biases)
  test_relu = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights) + hidden_biases)
  test_prediction = tf.nn.softmax(tf.matmul(test_relu, out_weights) + out_biases)

#now is the actual training on the ANN we built
#we will run it for some number of steps and evaluate the progress after
#every 500 steps

#number of steps we will train our ANN
num_steps = 3001

#actual training
with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
      print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

Answer:

After some effort I managed to solve this and introduce both L2 and dropout into my network. Adding dropout to the same network (with L2 already in place) gave me a slight improvement. I am still not sure whether it is really worth introducing both L2 and dropout, but at least it works and slightly improves the results.
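Only the description of the answer survived here, so below is a minimal sketch of the relevant change rather than the answer's original code. It reuses the variable names from the question's graph, and the keep probability of 0.5 is an assumption, matching the "half of the activations" described in the question:

keep_prob = 0.5  #assumed value; tune as needed

#hidden layer exactly as before
hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)
#apply dropout to the hidden activations at training time;
#tf.nn.dropout already rescales the surviving units by 1/keep_prob
hidden_layer_drop = tf.nn.dropout(hidden_layer, keep_prob)

#the training-time logits use the dropped-out activations
out_layer = tf.matmul(hidden_layer_drop, out_weights) + out_biases

#the L2 terms are unchanged: they are computed over the full weight
#tensors regardless of which activations dropout zeroed out this step
loss = (tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
  out_layer, tf_train_labels) +
  beta*tf.nn.l2_loss(hidden_weights) +
  beta*tf.nn.l2_loss(hidden_biases) +
  beta*tf.nn.l2_loss(out_weights) +
  beta*tf.nn.l2_loss(out_biases)))

#the validation/test paths (valid_relu, test_relu) stay exactly as in the
#question, since dropout is applied only during training

Note that this also addresses the question above: in common practice the L2 penalty still runs over the full weight tensors and is not masked to the units that survived dropout on a given step.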
