How do I update weights using TensorFlow eager execution?

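So I tried out TensorFlow's eager execution, and my implementation was not successful. I used tf.GradientTape, and although the program runs, none of the weights shows any visible update. I have seen sample algorithms and tutorials that use optimizer.apply_gradients() to update all the variables, but I assume I am not using it correctly.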

import tensorflow as tf
import tensorflow.contrib.eager as tfe

# Enable eager execution
tf.enable_eager_execution()

# Set the hyperparameters
LEARNING_RATE = 20
TRAINING_ITERATIONS = 3

# Set all the labels
LABELS = tf.constant(tf.random_normal([3, 1]))
# print(LABELS)

# Stub statement for the input
init = tf.Variable(tf.random_normal([3, 1]))

# Declare and initialize all the weights
weight1 = tfe.Variable(tf.random_normal([2, 3]))
bias1 = tfe.Variable(tf.random_normal([2, 1]))
weight2 = tfe.Variable(tf.random_normal([3, 2]))
bias2 = tfe.Variable(tf.random_normal([3, 1]))
weight3 = tfe.Variable(tf.random_normal([2, 3]))
bias3 = tfe.Variable(tf.random_normal([2, 1]))
weight4 = tfe.Variable(tf.random_normal([3, 2]))
bias4 = tfe.Variable(tf.random_normal([3, 1]))
weight5 = tfe.Variable(tf.random_normal([3, 3]))
bias5 = tfe.Variable(tf.random_normal([3, 1]))

VARIABLES = [weight1, bias1, weight2, bias2, weight3, bias3, weight4, bias4, weight5, bias5]


def thanouseEyes(input):  # Neural network model, a.k.a. Thanouse's Eyes
    layerResult = tf.nn.relu(tf.matmul(weight1, input) + bias1)
    input = layerResult
    layerResult = tf.nn.relu(tf.matmul(weight2, input) + bias2)
    input = layerResult
    layerResult = tf.nn.relu(tf.matmul(weight3, input) + bias3)
    input = layerResult
    layerResult = tf.nn.relu(tf.matmul(weight4, input) + bias4)
    input = layerResult
    layerResult = tf.nn.softmax(tf.matmul(weight5, input) + bias5)
    return layerResult


# Start training and update the variables
optimizer = tf.train.AdamOptimizer(LEARNING_RATE)

with tf.GradientTape(persistent=True) as tape:  # Gradient computation
    for i in range(TRAINING_ITERATIONS):
        COST = tf.reduce_sum(LABELS - thanouseEyes(init))
        GRADIENTS = tape.gradient(COST, VARIABLES)
        optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))
        print(weight1)

Answer:

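Your use of the optimizer looks fine. However, the computation defined by thanouseEyes() always returns [1., 1., 1.] regardless of the values of the variables, so the gradients are always 0 and the variables are never updated (print(thanouseEyes(init)) and print(GRADIENTS) should demonstrate this).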

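Digging in a bit more, tf.nn.softmax is applied to x = tf.matmul(weight5, input) + bias5, which has shape [3, 1]. So tf.nn.softmax(x) effectively computes [softmax(x[0]), softmax(x[1]), softmax(x[2])], because tf.nn.softmax is applied over the last axis of its input by default. x[0], x[1], and x[2] are each vectors with a single element, so softmax(x[i]) is always 1.0.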
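To make the axis issue concrete, here is a minimal pure-Python sketch that imitates softmax over the last axis versus the first axis of a [3, 1] input. It is not TensorFlow code: the helper names softmax_last_axis and softmax_first_axis and the values in x are made up for illustration, and x merely stands in for tf.matmul(weight5, input) + bias5.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_last_axis(matrix):
    """Softmax over each row separately (hypothetical helper, mimics axis=-1)."""
    return [softmax(row) for row in matrix]


def softmax_first_axis(matrix):
    """Softmax down each column (hypothetical helper, mimics axis=0)."""
    columns = [list(col) for col in zip(*matrix)]
    normalized = [softmax(col) for col in columns]
    return [list(row) for row in zip(*normalized)]


# Stand-in for the last layer's pre-activation: shape [3, 1], arbitrary values.
x = [[2.0], [-1.0], [0.5]]

last = softmax_last_axis(x)
first = softmax_first_axis(x)

print(last)   # every row holds a single element, so every entry is 1.0
print(first)  # three distinct probabilities that sum to 1
```

If you want to keep the column-vector layout, one possible fix (an assumption about what the network is meant to do) is to normalize over the first axis instead, for example by passing axis=0 to tf.nn.softmax, so the output actually depends on the weights.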

Hope that helps.

A few extra points that are unrelated to your question but may be of interest:

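As of TensorFlow 1.11, you don't need the tf.contrib.eager module in your program. Replace every tfe with tf (i.e., tf.Variable instead of tfe.Variable) and you will get the same result.
Computation performed within the context of a GradientTape is "recorded", i.e., the tape holds on to intermediate tensors so that gradients can be computed later. Wrapping the whole training loop in a single persistent tape therefore keeps the intermediates of every iteration alive. Long story short, you want to move the GradientTape inside the loop body: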


for i in range(TRAINING_ITERATIONS):
    with tf.GradientTape() as tape:
        COST = tf.reduce_sum(LABELS - thanouseEyes(init))
    GRADIENTS = tape.gradient(COST, VARIABLES)
    optimizer.apply_gradients(zip(GRADIENTS, VARIABLES))
