Tensorflow 在被 Ray 工作进程调用时无法检测到 GPU

当我尝试使用以下代码示例结合 Ray 使用 Tensorflow 时，Tensorflow 在被 “远程” 工作进程调用时无法检测到我的机器上的 GPU，但当在 “本地” 调用时却能找到 GPU。我用引号标注 “远程” 和 “本地”，因为所有操作都在我的桌面上运行，我的桌面有两个 GPU，并且运行的是 Ubuntu 16.04，我使用 tensorflow-gpu Anaconda 包安装了 Tensorflow。

local_network 似乎负责在日志中生成这些消息：

2018-01-26 17:24:33.149634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Quadro M5000, pci bus id: 0000:03:00.0)2018-01-26 17:24:33.149642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Quadro M5000, pci bus id: 0000:04:00.0)

而 remote_network 似乎负责生成这条消息：

2018-01-26 17:24:34.309270: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_NO_DEVICE

为什么 Tensorflow 在一种情况下能检测到 GPU，而在另一种情况下却不能呢？

import tensorflow as tfimport numpy as npimport rayray.init()BATCH_SIZE = 100NUM_BATCHES = 1NUM_ITERS = 201class Network(object):    def __init__(self, x, y):        # Seed TensorFlow to make the script deterministic.        tf.set_random_seed(0)        # Define the inputs.        x_data = tf.constant(x, dtype=tf.float32)        y_data = tf.constant(y, dtype=tf.float32)        # Define the weights and computation.        w = tf.Variable(tf.random_uniform([1], -1.0, 1.0))        b = tf.Variable(tf.zeros([1]))        y = w * x_data + b        # Define the loss.        self.loss = tf.reduce_mean(tf.square(y - y_data))        optimizer = tf.train.GradientDescentOptimizer(0.5)        self.grads = optimizer.compute_gradients(self.loss)        self.train = optimizer.apply_gradients(self.grads)        # Define the weight initializer and session.        init = tf.global_variables_initializer()        self.sess = tf.Session()        # Additional code for setting and getting the weights        self.variables = ray.experimental.TensorFlowVariables(self.loss, self.sess)        # Return all of the data needed to use the network.        self.sess.run(init)    # Define a remote function that trains the network for one step and returns the    # new weights.    def step(self, weights):        # Set the weights in the network.        self.variables.set_weights(weights)        # Do one step of training. We only need the actual gradients so we filter over the list.        actual_grads = self.sess.run([grad[0] for grad in self.grads])        return actual_grads    def get_weights(self):        return self.variables.get_weights()# Define a remote function for generating fake data.@ray.remote(num_return_vals=2)def generate_fake_x_y_data(num_data, seed=0):    # Seed numpy to make the script deterministic.    np.random.seed(seed)    x = np.random.rand(num_data)    y = x * 0.1 + 0.3    return x, y# Generate some training data.batch_ids = [generate_fake_x_y_data.remote(BATCH_SIZE, seed=i) for i in range(NUM_BATCHES)]x_ids = [x_id for x_id, y_id in batch_ids]y_ids = [y_id for x_id, y_id in batch_ids]# Generate some test data.x_test, y_test = ray.get(generate_fake_x_y_data.remote(BATCH_SIZE, seed=NUM_BATCHES))# Create actors to store the networks.remote_network = ray.remote(Network)actor_list = [remote_network.remote(x_ids[i], y_ids[i]) for i in range(NUM_BATCHES)]local_network = Network(x_test, y_test)# Get initial weights of local network.weights = local_network.get_weights()# Do some steps of training.for iteration in range(NUM_ITERS):    # Put the weights in the object store. This is optional. We could instead pass    # the variable weights directly into step.remote, in which case it would be    # placed in the object store under the hood. However, in that case multiple    # copies of the weights would be put in the object store, so this approach is    # more efficient.    weights_id = ray.put(weights)    # Call the remote function multiple times in parallel.    gradients_ids = [actor.step.remote(weights_id) for actor in actor_list]    # Get all of the weights.    gradients_list = ray.get(gradients_ids)    # Take the mean of the different gradients. Each element of gradients_list is a list    # of gradients, and we want to take the mean of each one.    mean_grads = [sum([gradients[i] for gradients in gradients_list]) / len(gradients_list) for i in range(len(gradients_list[0]))]    feed_dict = {grad[0]: mean_grad for (grad, mean_grad) in zip(local_network.grads, mean_grads)}    local_network.sess.run(local_network.train, feed_dict=feed_dict)    weights = local_network.get_weights()    # Print the current weights. They should converge to roughly to the values 0.1    # and 0.3 used in generate_fake_x_y_data.    if iteration % 20 == 0:        print("Iteration {}: weights are {}".format(iteration, weights))

回答：

GPU 被 ray.remote 装饰器本身切断。从其源代码来看：

def remote(*args, **kwargs):    ...    num_cpus = kwargs["num_cpus"] if "num_cpus" in kwargs else 1    num_gpus = kwargs["num_gpus"] if "num_gpus" in kwargs else 0  # !!!    ...

所以以下调用实际上设置了 num_gpus=0：

remote_network = ray.remote(Network)

Ray API 有点奇怪，你不能简单地说 ray.remote(Network, num_gpus=2)（尽管这正是你想要的）。这是我所做的，在我的机器上似乎有效：

ray.init(num_gpus=2)...@ray.remote(num_gpus=2)class RemoteNetwork(Network):    passactor_list = [RemoteNetwork.remote(x_ids[i],y_ids[i]) for i in range(NUM_BATCHES)]

学技术

Tensorflow 在被 Ray 工作进程调用时无法检测到 GPU

发表回复取消回复

相关文章：

Related Posts

使用LSTM在Python中预测未来值

如何在gensim的word2vec模型中查找双词组的相似性

dask_xgboost.predict 可以工作但无法显示 – 数据必须是一维的

ML Tuning – Cross Validation in Spark

如何在React JS中使用fetch从REST API获取预测

如何分析ML.NET中多类分类预测得分数组？

发表回复 取消回复

发表回复取消回复