I have implemented a fairly simple deep neural network for multi-label classification. An overview of the model is shown below (biases omitted to simplify the visualization):
In other words, it is a three-layer deep neural network built from ReLU units, with sigmoid output units.
The loss is sigmoid cross-entropy, and Adam is used as the optimizer.
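Concretely (my own expansion, not copied from the TensorFlow docs), tf.losses.sigmoid_cross_entropy treats each of the C classes as an independent binary problem and, with the default reduction and unit weights, averages over samples and classes:

L = -\frac{1}{N C}\sum_{i=1}^{N}\sum_{c=1}^{C}\Big[ y_{ic}\log\sigma(z_{ic}) + (1 - y_{ic})\log\big(1 - \sigma(z_{ic})\big) \Big]

where z_{ic} are the logits and y_{ic} \in \{0, 1\} are the multi-label targets.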
When I train this network without dropout, I get the following results:
#Placeholders
x = tf.placeholder(tf.float32, [None, num_features], name='x')
y = tf.placeholder(tf.float32, [None, num_classes], name='y')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')

#Layer1
WRelu1 = tf.Variable(tf.truncated_normal([num_features, num_features], stddev=1.0), dtype=tf.float32, name='wrelu1')
bRelu1 = tf.Variable(tf.zeros([num_features]), dtype=tf.float32, name='brelu1')
layer1 = tf.add(tf.matmul(x, WRelu1), bRelu1, name='layer1')
relu1 = tf.nn.relu(layer1, name='relu1')

#Layer2
WRelu2 = tf.Variable(tf.truncated_normal([num_features, num_features], stddev=1.0), dtype=tf.float32, name='wrelu2')
bRelu2 = tf.Variable(tf.zeros([num_features]), dtype=tf.float32, name='brelu2')
layer2 = tf.add(tf.matmul(relu1, WRelu2), bRelu2, name='layer2')
relu2 = tf.nn.relu(layer2, name='relu2')

#Layer3
WRelu3 = tf.Variable(tf.truncated_normal([num_features, num_features], stddev=1.0), dtype=tf.float32, name='wrelu3')
bRelu3 = tf.Variable(tf.zeros([num_features]), dtype=tf.float32, name='brelu3')
layer3 = tf.add(tf.matmul(relu2, WRelu3), bRelu3, name='layer3')
relu3 = tf.nn.relu(layer3, name='relu3')

#Out layer
Wout = tf.Variable(tf.truncated_normal([num_features, num_classes], stddev=1.0), dtype=tf.float32, name='wout')
bout = tf.Variable(tf.zeros([num_classes]), dtype=tf.float32, name='bout')
logits = tf.add(tf.matmul(relu3, Wout), bout, name='logits')

#Predictions
logits_sigmoid = tf.nn.sigmoid(logits, name='logits_sigmoid')

#Cost & Optimizer
cost = tf.losses.sigmoid_cross_entropy(y, logits)
optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(cost)
Evaluation on the test data:
ROC AUC - micro average: 0.6474180196222774
ROC AUC - macro average: 0.6261438437099212
Precision - micro average: 0.5112489722699753
Precision - macro average: 0.48922193879411413
Precision - weighted average: 0.5131092162035961
Recall - micro average: 0.584640369246549
Recall - macro average: 0.55746897003228
Recall - weighted average: 0.584640369246549
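For reference, a minimal sketch of how these numbers could be computed with scikit-learn (my assumption; the original evaluation code is not shown, and y_true / y_prob below are dummy stand-ins for the real test labels and sigmoid outputs):

import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score

# Dummy stand-ins with the same shape convention as the real data: [num_samples, num_classes]
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_prob = np.array([[0.9, 0.2, 0.7], [0.3, 0.8, 0.1], [0.6, 0.4, 0.2], [0.1, 0.3, 0.9]])
y_pred = (y_prob >= 0.5).astype(int)  # threshold the sigmoid outputs at 0.5

for avg in ("micro", "macro"):
    print("ROC AUC -", avg, roc_auc_score(y_true, y_prob, average=avg))
for avg in ("micro", "macro", "weighted"):
    print("Precision -", avg, precision_score(y_true, y_pred, average=avg))
    print("Recall -", avg, recall_score(y_true, y_pred, average=avg))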
When I train the same network with dropout layers added, I get the following results:
#Placeholders
x = tf.placeholder(tf.float32, [None, num_features], name='x')
y = tf.placeholder(tf.float32, [None, num_classes], name='y')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')

#Layer1
WRelu1 = tf.Variable(tf.truncated_normal([num_features, num_features], stddev=1.0), dtype=tf.float32, name='wrelu1')
bRelu1 = tf.Variable(tf.zeros([num_features]), dtype=tf.float32, name='brelu1')
layer1 = tf.add(tf.matmul(x, WRelu1), bRelu1, name='layer1')
relu1 = tf.nn.relu(layer1, name='relu1')
#DROPOUT
relu1 = tf.nn.dropout(relu1, keep_prob=keep_prob, name='relu1drop')

#Layer2
WRelu2 = tf.Variable(tf.truncated_normal([num_features, num_features], stddev=1.0), dtype=tf.float32, name='wrelu2')
bRelu2 = tf.Variable(tf.zeros([num_features]), dtype=tf.float32, name='brelu2')
layer2 = tf.add(tf.matmul(relu1, WRelu2), bRelu2, name='layer2')
relu2 = tf.nn.relu(layer2, name='relu2')
#DROPOUT
relu2 = tf.nn.dropout(relu2, keep_prob=keep_prob, name='relu2drop')

#Layer3
WRelu3 = tf.Variable(tf.truncated_normal([num_features, num_features], stddev=1.0), dtype=tf.float32, name='wrelu3')
bRelu3 = tf.Variable(tf.zeros([num_features]), dtype=tf.float32, name='brelu3')
layer3 = tf.add(tf.matmul(relu2, WRelu3), bRelu3, name='layer3')
relu3 = tf.nn.relu(layer3, name='relu3')
#DROPOUT
relu3 = tf.nn.dropout(relu3, keep_prob=keep_prob, name='relu3drop')

#Out layer
Wout = tf.Variable(tf.truncated_normal([num_features, num_classes], stddev=1.0), dtype=tf.float32, name='wout')
bout = tf.Variable(tf.zeros([num_classes]), dtype=tf.float32, name='bout')
logits = tf.add(tf.matmul(relu3, Wout), bout, name='logits')

#Predictions
logits_sigmoid = tf.nn.sigmoid(logits, name='logits_sigmoid')

#Cost & Optimizer
cost = tf.losses.sigmoid_cross_entropy(y, logits)
optimizer = tf.train.AdamOptimizer(LEARNING_RATE).minimize(cost)
Evaluation on the test data:
ROC AUC - micro average: 0.5
ROC AUC - macro average: 0.5
Precision - micro average: 0.34146163499985405
Precision - macro average: 0.34146163499985405
Precision - weighted average: 0.3712475781926326
Recall - micro average: 1.0
Recall - macro average: 1.0
Recall - weighted average: 1.0
As the recall values of the dropout version show, the network's output is saturated at 1: every class is predicted positive for every sample.
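To make precise what "always positive" implies for the metrics, here is a tiny illustration with made-up numbers (not from my model) showing that a constant all-positive prediction yields recall 1.0, precision equal to the fraction of positive labels, and ROC AUC 0.5:

import numpy as np
from sklearn.metrics import roc_auc_score, precision_score, recall_score

y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])     # arbitrary multi-label ground truth
y_prob = np.ones_like(y_true, dtype=float)              # sigmoid output saturated at 1 everywhere
y_pred = (y_prob >= 0.5).astype(int)                    # so every class is predicted positive

print(recall_score(y_true, y_pred, average="micro"))     # 1.0 - every true positive is "found"
print(precision_score(y_true, y_pred, average="micro"))  # 0.5 - the fraction of labels that are positive
print(roc_auc_score(y_true, y_prob, average="micro"))    # 0.5 - constant scores cannot rank anything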
Granted, this is not an easy problem, but after applying dropout I would expect results at least similar to the no-dropout version, not worse, and certainly not this saturated output.
Why is this happening, and how can I avoid this behavior? Do you see anything strange or wrong in the code?
Hyperparameters:
Dropout keep_prob: 0.5 during training / 1.0 at inference (see the feed_dict sketch after this list)
Epochs: 500
Learning rate: 0.0001
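The training loop itself is not shown above; as a sketch of how keep_prob is switched (train_data, train_labels and test_data are hypothetical names, everything else refers to the graph code above):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(500):
        # training step: dropout active
        sess.run(optimizer, feed_dict={x: train_data, y: train_labels, keep_prob: 0.5})
    # inference on the test set: dropout disabled
    test_probs = sess.run(logits_sigmoid, feed_dict={x: test_data, keep_prob: 1.0})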
Dataset info:
Number of samples: over 22,000
Number of classes: 6
Thanks!
Answer: