Neural network: the mysterious ReLU

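I'm building a programming language detector, i.e. a classifier of code snippets, as part of a bigger project. My baseline model is very simple: tokenize the input, encode each snippet as a bag of words (or, in this case, a bag of tokens), and train a simple neural network on top of these features.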

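The input to the network is a fixed-length array of counters for the most distinctive tokens, such as "def", "self", "function", "->", "const", "#include", etc., which are extracted automatically from the corpus. The idea is that these tokens are quite specific to individual programming languages, so even this naive approach should reach a high accuracy score.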

Input:
  def   1
  for   2
  in    2
  True  1
  ):    3
  ,:    1
  ...
Output: python
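The encoding described above can be sketched roughly as follows. This is only an illustration, not the code from the post: the vocabulary `VOCAB`, the tokenizer regex `TOKEN_RE`, and the `encode` helper are all assumptions made up for the example, whereas the real vocabulary is extracted automatically from the corpus.

```python
import re
from collections import Counter

# Hypothetical vocabulary (assumption); the post derives its tokens from the corpus.
VOCAB = ["def", "self", "function", "->", "const", "#include"]

# Simplistic tokenizer (assumption): "#include", "->", identifiers,
# or any other single non-whitespace character.
TOKEN_RE = re.compile(r"#include|->|[A-Za-z_]\w*|\S")


def encode(snippet, vocab=VOCAB):
    """Return a fixed-length list with one token count per vocabulary entry."""
    counts = Counter(TOKEN_RE.findall(snippet))
    # Counter returns 0 for tokens that never occurred in the snippet.
    return [float(counts[token]) for token in vocab]


python_features = encode("def area(self):\n    return self.w * self.h\n")
c_features = encode("#include <stdio.h>\nconst int x;\n")
print(python_features)
print(c_features)
```

Each snippet maps to a vector of the same length regardless of its size, which is what lets it feed a dense layer with a fixed `vocab_size` input.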

Setup

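I reached 99% accuracy very quickly and took it as a sign that everything worked as expected. Here is the model (the complete runnable script is here):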

import tensorflow as tf  # TensorFlow 1.x API

# vocab_size and classes are defined elsewhere in the full script

# Placeholders
x = tf.placeholder(shape=[None, vocab_size], dtype=tf.float32, name='x')
y = tf.placeholder(shape=[None], dtype=tf.int32, name='y')
training = tf.placeholder_with_default(False, shape=[], name='training')

# One hidden layer with dropout
reg = tf.contrib.layers.l2_regularizer(0.01)
hidden1 = tf.layers.dense(x, units=96, kernel_regularizer=reg,
                          activation=tf.nn.elu, name='hidden1')
dropout1 = tf.layers.dropout(hidden1, rate=0.2, training=training, name='dropout1')

# Output layer
logits = tf.layers.dense(dropout1, units=classes, kernel_regularizer=reg,
                         activation=tf.nn.relu, name='logits')

# Cross-entropy loss
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y))

# Misc reports: accuracy, correctly/incorrectly classified samples, etc.
correct_predicted = tf.nn.in_top_k(logits, y, 1, name='in-top-k')
prediction = tf.argmax(logits, axis=1)
wrong_predicted = tf.logical_not(correct_predicted, name='not-in-top-k')
x_misclassified = tf.boolean_mask(x, wrong_predicted, name='misclassified')
accuracy = tf.reduce_mean(tf.cast(correct_predicted, tf.float32), name='accuracy')

The output was very encouraging:

iteration=5  loss=2.580  train-acc=0.34277
iteration=10  loss=2.029  train-acc=0.69434
iteration=15  loss=2.054  train-acc=0.92383
iteration=20  loss=1.934  train-acc=0.98926
iteration=25  loss=1.942  train-acc=0.99609
Files.VAL mean accuracy = 0.99121             <-- after just 1 epoch!
iteration=30  loss=1.943  train-acc=0.99414
iteration=35  loss=1.947  train-acc=0.99512
iteration=40  loss=1.946  train-acc=0.99707
iteration=45  loss=1.946  train-acc=0.99609
iteration=50  loss=1.944  train-acc=0.99902
iteration=55  loss=1.946  train-acc=0.99902
Files.VAL mean accuracy = 0.99414

Test accuracy was also close to 1.0. Everything looked perfect.

The mysterious ReLU

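But then I noticed that I had put activation=tf.nn.relu into the final dense layer (logits), which is clearly a bug: there is no reason to discard negative scores before softmax, because they indicate classes with low probability. Clamping them at zero only makes those classes artificially more probable, which is wrong. Removing it should only make the model more robust and more confident in the correct class.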
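To make that reasoning concrete, here is a small standalone sketch in plain Python (not TensorFlow) that compares softmax over raw scores with softmax over ReLU-clamped scores. The values in `raw_logits` are made up for illustration, and the `softmax` and `relu` helpers are hypothetical stand-ins for what the framework computes.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relu(scores):
    """Clamp negative scores to zero, as activation=tf.nn.relu does."""
    return [max(0.0, s) for s in scores]


# Made-up scores (assumption): one confident class and two strongly rejected ones.
raw_logits = [3.0, -2.0, -5.0]

plain = softmax(raw_logits)
clamped = softmax(relu(raw_logits))

print("without ReLU:", [round(p, 4) for p in plain])
print("with ReLU:   ", [round(p, 4) for p in clamped])
```

With these numbers the top class gets roughly 0.99 without the clamp and roughly 0.91 with it, and the two rejected classes become indistinguishable from each other after clamping. That matches the intuition stated above; it does not by itself explain the training behaviour reported next.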

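That's what I thought. So I replaced it with activation=None, ran the model again, and something surprising happened: the performance did not improve. At all. In fact, it degraded significantly: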

iteration=5  loss=5.236  train-acc=0.16602
iteration=10  loss=4.068  train-acc=0.18750
iteration=15  loss=3.110  train-acc=0.37402
iteration=20  loss=5.149  train-acc=0.14844
iteration=25  loss=2.880  train-acc=0.18262
Files.VAL mean accuracy = 0.28711
iteration=30  loss=3.136  train-acc=0.25781
iteration=35  loss=2.916  train-acc=0.22852
iteration=40  loss=2.156  train-acc=0.39062
iteration=45  loss=1.777  train-acc=0.45312
iteration=50  loss=2.726  train-acc=0.33105
Files.VAL mean accuracy = 0.29362

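Accuracy did improve with training, but never went above 91-92%. I switched the activation back and forth several times and varied different parameters (layer size, dropout, regularizer, extra layers, anything), but the result was always the same: the "wrong" model hits 99% immediately, while the "right" model barely reaches 90% after 50 epochs. According to TensorBoard, the weight distributions do not differ much: the gradients did not vanish and both models learned normally.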

How is this possible? How can the final ReLU make the model so much better? Especially when this ReLU is a bug?
