Multi-GPU does not seem to work on TensorFlow 1.0

I am using TensorFlow 1.0 and have written a small program to measure performance. I have a simple model, shown below:

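    import numpy as np
    import tensorflow as tf

    def model(example_batch):
        # Two dense layers: 128 inputs -> 64 hidden units (ReLU) -> 2 outputs.
        h1 = tf.layers.dense(inputs=example_batch, units=64, activation=tf.nn.relu)
        h2 = tf.layers.dense(inputs=h1, units=2)
        return h2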

And a simple function that runs the simulation:

    def testPerformanceFromMemory(model, iter=1000, num_cores=2):
        example_batch = tf.placeholder(np.float32, shape=(64, 128))
        # Build one copy of the model per GPU.
        for core in range(num_cores):
            with tf.device('/gpu:%d' % core):
                prediction = model(example_batch)
        init_op = tf.global_variables_initializer()
        sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
        sess.run(init_op)
        tf.train.start_queue_runners(sess=sess)
        input_array = np.random.random((64, 128))
        for step in range(iter):
            myprediction = sess.run(prediction, feed_dict={example_batch: input_array})

If I run the Python script and then run nvidia-smi, I can see that GPU0 shows high utilization while the script is running, but GPU1 stays at 0%.

I have read https://www.tensorflow.org/tutorials/using_gpu and https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py, but I cannot figure out why my example does not run on multiple GPUs.

P.S. If I download the cifar10 example from the TensorFlow repository, it does run in multi-GPU mode.

Edit: As @[name hidden] pointed out, I was overwriting prediction on every iteration, so I am posting the correct approach here:

    def testPerformanceFromMemory(model, iter=1000, num_cores=2):
        example_batch = tf.placeholder(np.float32, shape=(64, 128))
        # Keep the output of every tower instead of only the last one.
        prediction = []
        for core in range(num_cores):
            with tf.device('/gpu:%d' % core):
                prediction.append([model(example_batch)])
        init_op = tf.global_variables_initializer()
        sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
        sess.run(init_op)
        tf.train.start_queue_runners(sess=sess)
        input_array = np.random.random((64, 128))
        for step in range(iter):
            # Fetching the whole list runs the subgraph of every tower.
            myprediction = sess.run(prediction, feed_dict={example_batch: input_array})

Answer:

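Looking at your program, you create several parallel subgraphs (usually called "towers") on different GPU devices, but you overwrite the prediction tensor on every iteration of the first for loop: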

    for core in range(num_cores):
        with tf.device('/gpu:%d' % core):
            prediction = model(example_batch)

    # ...

    for step in range(iter):
        myprediction = sess.run(prediction, feed_dict={example_batch: input_array})

As a result, when you call sess.run(prediction, ...), you only run the subgraph that was created in the last iteration of the first for loop, and that subgraph runs on a single GPU.
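The rebinding described above is ordinary Python behavior, so it can be sketched without TensorFlow. In the snippet below, build_tower is a hypothetical stand-in for model() that just returns a label for the device it would be placed on; it is not part of any library. The first loop mirrors the question, the second mirrors the list-based version from the edit.

```python
# Hypothetical stand-in for model(): returns a label naming the device
# the "tower" would be built on. No TensorFlow is involved here.
def build_tower(device):
    return "tower on " + device

devices = ["/gpu:0", "/gpu:1"]

# Pattern from the question: the name is rebound on every iteration,
# so only the tower from the last iteration is still referenced.
for device in devices:
    prediction = build_tower(device)

# Pattern from the edit: every tower is kept in a list, so all of them
# can be handed to a single fetch call afterwards.
predictions = []
for device in devices:
    predictions.append(build_tower(device))
```

In the real program the same idea applies to tensors: sess.run accepts a list of fetches, so passing the list of per-tower outputs asks TensorFlow to evaluate every tower rather than only the last one built.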
