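I wrote the following multilayer perceptron model in TensorFlow, but it fails to train. Accuracy stays at around 9%, which is equivalent to random guessing, and the cross-entropy stays at around 2.56 and barely changes.
The model architecture is as follows: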
import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.truncated_normal)

def create_model(fingerprint_input, model_settings, is_training):
    if is_training:
        # Created because the training framework expects it; not applied to any layer below.
        dropout_prob = tf.placeholder(tf.float32, name='dropout_prob')
    fingerprint_size = model_settings['fingerprint_size']
    label_count = model_settings['label_count']

    # Weights: three hidden layers of 128 units plus the output layer.
    weights_1 = tf.Variable(tf.truncated_normal([fingerprint_size, 128], stddev=0.001))
    weights_2 = tf.Variable(tf.truncated_normal([128, 128], stddev=0.001))
    weights_3 = tf.Variable(tf.truncated_normal([128, 128], stddev=0.001))
    weights_out = tf.Variable(tf.truncated_normal([128, label_count], stddev=0.001))

    # Biases, all initialized to zero.
    bias_1 = tf.Variable(tf.zeros([128]))
    bias_2 = tf.Variable(tf.zeros([128]))
    bias_3 = tf.Variable(tf.zeros([128]))
    bias_out = tf.Variable(tf.zeros([label_count]))

    # Forward pass: affine transform followed by ReLU in each hidden layer.
    layer_1 = tf.matmul(fingerprint_input, weights_1) + bias_1
    layer_1 = tf.nn.relu(layer_1)
    layer_2 = tf.matmul(layer_1, weights_2) + bias_2
    layer_2 = tf.nn.relu(layer_2)
    layer_3 = tf.matmul(layer_2, weights_3) + bias_3
    layer_3 = tf.nn.relu(layer_3)
    logits = tf.matmul(layer_3, weights_out) + bias_out

    if is_training:
        return logits, dropout_prob
    else:
        return logits
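It takes an input of size fingerprint_size and produces label_count outputs. It has three hidden layers with 128 neurons each. I followed the TensorFlow example for the speech dataset, which provides the framework for everything else. According to the documentation, all I need to do is plug in my own neural network architecture: my code has to define these parameters and return the logits.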
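When I train another, predefined architecture with the same inputs and outputs, the network trains. This architecture, however, does not. Here is one of the predefined architectures: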
def create_single_fc_model(fingerprint_input, model_settings, is_training):
    if is_training:
        dropout_prob = tf.placeholder(tf.float32, name='dropout_prob')
    fingerprint_size = model_settings['fingerprint_size']
    label_count = model_settings['label_count']
    weights = tf.Variable(
        tf.truncated_normal([fingerprint_size, label_count], stddev=0.001))
    bias = tf.Variable(tf.zeros([label_count]))
    logits = tf.matmul(fingerprint_input, weights) + bias
    if is_training:
        return logits, dropout_prob
    else:
        return logits
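The learning rate is 0.001 for the first 15000 steps and 0.0001 for the last 3000 steps. These are the default settings. I also tried 0.01 and 0.001, with the same result. I believe the problem lies in the implementation above.
Any ideas?
Thanks in advance!
Answer: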
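You are probably running into the vanishing gradient problem. Your variables are initialized to very small values (controlled by the stddev parameter). That works with a single layer, but with several stacked layers it causes the gradients to vanish during backpropagation. Try increasing the standard deviation of the randomly initialized weight variables, for example: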
weights_n = tf.Variable(tf.truncated_normal([a, b], stddev=0.1))
and initialize the biases with a non-zero value, for example:
bias_n = tf.Variable(tf.constant(0.1, shape=[b]))
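To get a feel for why the small stddev hurts a deep network more than a single layer, here is a rough pure-Python sketch (not TensorFlow code, and not taken from the speech example). It pushes one random input through a ReLU network shaped like the one in the question and reports how large the logits are at initialization. The input size of 256, the label count of 12, the helper name mean_abs_logit, and the use of a plain Gaussian instead of a truncated normal are all assumptions made for illustration.

```python
import random

def mean_abs_logit(stddev, sizes=(256, 128, 128, 128, 12), seed=0):
    """Forward one random input through a ReLU MLP with zero biases and
    Gaussian weights of the given stddev; return the mean |logit|."""
    rng = random.Random(seed)
    # Hypothetical input: one feature vector with values in [0, 1).
    activations = [rng.random() for _ in range(sizes[0])]
    last = len(sizes) - 2  # index of the output layer
    for layer, n_out in enumerate(sizes[1:]):
        outputs = []
        for _ in range(n_out):
            # One unit: weighted sum of the previous layer (bias is zero).
            total = sum(a * rng.gauss(0.0, stddev) for a in activations)
            # ReLU on hidden layers only; the output layer stays linear.
            outputs.append(total if layer == last else max(0.0, total))
        activations = outputs
    return sum(abs(v) for v in activations) / len(activations)

small = mean_abs_logit(0.001)  # initialization used in the question
large = mean_abs_logit(0.1)    # initialization suggested in the answer
print("stddev=0.001 -> mean |logit| = %.3e" % small)
print("stddev=0.1   -> mean |logit| = %.3e" % large)
```

With zero biases, multiplying every weight by 100 multiplies the signal by 100 at each of the four weight layers, so the two numbers should differ by a factor of roughly 100^4. Logits that close to zero give a nearly uniform softmax, and the gradient reaching each weight matrix is scaled by the same tiny activations and weights, which is consistent with the loss barely moving.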