I get the error below when trying to start distributed TensorFlow. My code looks like this:
    sv = tf.train.Supervisor(is_chief=(task_index == 0),
                             logdir="/tmp/train_logs",
                             init_op=init_op,
                             summary_op=summary_op,
                             saver=saver,
                             global_step=global_step,
                             save_model_secs=600)

    with sv.managed_session(server.target) as sess:
        step = 0
        while not sv.should_stop() and step < nnc.steps:
            mini_batches = random_mini_batches(x_train, y_train, mini_batch_size)
            for mini_batch in mini_batches:
                (batch_x, batch_y) = mini_batch
                _, step = sess.run([train_op, global_step],
                                   feed_dict={x: batch_x, y: batch_y})
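When the error occurs, it fails inside the random_mini_batches function, and I don't understand how or why. random_mini_batches is written in pure Python and numpy and has nothing to do with TensorFlow. x_train and y_train have not been used anywhere before this point.

This is the error I get: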
      File "/Users/curr_user/PycharmProjects/curr_project/src/nn.py", line 36, in random_mini_batches
        num_complete_minibatches = int(math.floor(m / mini_batch_size))  # number of mini batches of size mini_batch_size
      File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 880, in r_binary_op_wrapper
        x = ops.convert_to_tensor(x, dtype=y.dtype.base_dtype, name="x")
      File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 611, in convert_to_tensor
        as_ref=False)
      File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 676, in internal_convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
      File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 121, in _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
      File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 106, in constant
        attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
      File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2582, in create_op
        self._check_not_finalized()
      File "/Users/curr_user/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2290, in _check_not_finalized
        raise RuntimeError("Graph is finalized and cannot be modified.")
Any help would be greatly appreciated. Thanks!

Answer:
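It isn't shown in your question, but I suspect mini_batch_size is a constant tensor. Although random_mini_batches is written in pure Python and numpy, TensorFlow overloads many operators for tensors, so this line

    num_complete_minibatches = int(math.floor(m / mini_batch_size))

actually invokes the tensor's overloaded division operator. Because the tensor is the right-hand operand, Python dispatches to the reflected form (__rdiv__ on Python 2.7), which is the r_binary_op_wrapper frame in your traceback. That wrapper tries to convert m to a tensor as well, which means adding a new constant node to the graph. But tf.train.Supervisor() finalizes the graph, so no new nodes can be created, and the conversion fails.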
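To illustrate the dispatch described above without needing TensorFlow, here is a toy sketch. FakeTensor is a hypothetical stand-in written for this example only; it is not TensorFlow code, and it merely mimics a finalized graph by raising the same message from its reflected division hook.

```python
class FakeTensor(object):
    """Hypothetical stand-in for a tensor; NOT TensorFlow code."""

    # Pretend the graph has already been finalized.
    graph_finalized = True

    def __init__(self, value):
        self.value = value

    def __rtruediv__(self, other):
        # Python calls this for `other / self` when `other` (a plain int here)
        # does not know how to divide by a FakeTensor.
        if FakeTensor.graph_finalized:
            raise RuntimeError("Graph is finalized and cannot be modified.")
        return FakeTensor(other / self.value)

    # Python 2 spelling of the same reflected-division hook.
    __rdiv__ = __rtruediv__


m = 10  # a plain Python int, like `m` in random_mini_batches
try:
    m / FakeTensor(4)
    outcome = "no error"
except RuntimeError as err:
    outcome = str(err)

print(outcome)
```

The division is written in ordinary Python, yet the right-hand operand's class decides what happens, which is how a "pure numpy" function can end up trying to add nodes to a graph.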
The solution is to make mini_batch_size an ordinary Python constant and to make sure no tensors are used inside random_mini_batches.
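As a rough sketch of that fix: the question does not show the body of random_mini_batches, so the version below is an assumed reconstruction, not the asker's code. It assumes examples are stored along axis 0, adds a hypothetical seed parameter for reproducibility, and rejects anything that is not a plain integer so that a stray tensor fails immediately with a clear message.

```python
import numbers

import numpy as np


def random_mini_batches(x, y, mini_batch_size, seed=None):
    """Shuffle (x, y) together and split them into mini-batches.

    Assumption: examples are laid out along axis 0 of both arrays.
    """
    # Guard: a tf.Tensor (or any non-integer) is rejected here, before any
    # arithmetic could trigger an overloaded operator.
    if not isinstance(mini_batch_size, numbers.Integral):
        raise TypeError("mini_batch_size must be a plain integer, got %r"
                        % type(mini_batch_size))

    m = x.shape[0]  # number of examples
    permutation = np.random.RandomState(seed).permutation(m)
    x_shuffled, y_shuffled = x[permutation], y[permutation]

    # Integer division on plain ints: no graph nodes are involved.
    num_complete_minibatches = m // mini_batch_size

    mini_batches = []
    for k in range(num_complete_minibatches):
        start = k * mini_batch_size
        end = start + mini_batch_size
        mini_batches.append((x_shuffled[start:end], y_shuffled[start:end]))

    # Leftover examples form a final, smaller batch.
    if m % mini_batch_size != 0:
        start = num_complete_minibatches * mini_batch_size
        mini_batches.append((x_shuffled[start:], y_shuffled[start:]))

    return mini_batches


mini_batch_size = 4  # plain Python int, not tf.constant(4)
x_train = np.arange(20).reshape(10, 2)
y_train = np.arange(10)
batches = random_mini_batches(x_train, y_train, mini_batch_size, seed=0)
print([bx.shape[0] for bx, _ in batches])
```

If the batch size has to live in the graph as well, keeping a separate plain-int copy for the Python-side batching avoids mixing the two.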