I am currently trying to run a convolutional neural network using Keras with the TensorFlow backend, following a deep learning course on Udemy. However, it runs extremely slowly: each epoch takes roughly 1,000 seconds, while the instructor's machine takes about 60 seconds (and he is running on a CPU, by the way).
The CNN is a simple image-recognition network that classifies images as either cats or dogs. The training and test data together consist of 10,000 images, which take up 237 MB on my SSD.
When I run the CNN in a Python shell, I get the following output:
Epoch 1/25
2017-05-28 13:23:03.967337: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.967574: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968153: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968329: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968576: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:04.505726: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:28:00.0
Total memory: 8.00GiB
Free memory: 6.68GiB
2017-05-28 13:23:04.505944: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0
2017-05-28 13:23:04.506637: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0:   Y
2017-05-28 13:23:04.506895: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:28:00.0)
2684/8000 [=========>....................] - ETA: 845s - loss: 0.5011 - acc: 0.7427
This output seems to indicate that TensorFlow is running the computations on the GPU. However, when I check nvidia-smi, I get the following output:
$ nvidia-smi
Sun May 28 13:25:46 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 376.53                 Driver Version: 376.53                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070   WDDM  | 0000:28:00.0      On |                  N/A |
|  0%   49C    P2    36W / 166W |   7240MiB /  8192MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7676  C+G   ...ost_cw5n1h2txyewy\ShellExperienceHost.exe     N/A |
|    0      8580  C+G   Insufficient Permissions                         N/A |
|    0      9704  C+G   ...x86)\Google\Chrome\Application\chrome.exe     N/A |
|    0     10532    C   ...\Anaconda3\envs\tensorflow-gpu\python.exe     N/A |
|    0     11384  C+G   Insufficient Permissions                         N/A |
|    0     12896  C+G   C:\Windows\explorer.exe                          N/A |
|    0     13868  C+G   Insufficient Permissions                         N/A |
|    0     14068  C+G   Insufficient Permissions                         N/A |
|    0     14568  C+G   Insufficient Permissions                         N/A |
|    0     15260  C+G   ...osoftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe     N/A |
|    0     16912  C+G   ...am Files (x86)\Dropbox\Client\Dropbox.exe     N/A |
|    0     18196  C+G   ...I\AppData\Local\hyper\app-1.3.3\Hyper.exe     N/A |
|    0     18228  C+G   ...oftEdge_8wekyb3d8bbwe\MicrosoftEdgeCP.exe     N/A |
|    0     20032  C+G   ...indows.Cortana_cw5n1h2txyewy\SearchUI.exe     N/A |
+-----------------------------------------------------------------------------+
Note that every process is shown as using both CPU and GPU (Type C+G), while the TensorFlow process is the only one listed as using the CPU alone (Type C).
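As an additional sanity check independent of nvidia-smi, TensorFlow itself can be asked which devices it registered; here is a minimal sketch, assuming the TensorFlow 1.x GPU build used here (device_lib is TensorFlow's internal device-listing helper):

# Ask the TensorFlow 1.x runtime which devices it registered;
# a "/gpu:0" entry confirms the GTX 1070 was picked up.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.device_type)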
Is there any reasonable explanation for this? I have been trying to solve this problem for an entire week now, without any progress.
I am running Windows 10 Pro with an Asus Nvidia GTX 1070, 24 GB of RAM, and an Intel Xeon X5670 CPU @ 2.93 GHz. I created my Anaconda environment with the following commands:
conda create -n tensorflow-gpu python=3.5 anaconda
source activate tensorflow-gpu
conda install theano
conda install mingw libpython
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl
pip install keras
conda update --all
I also installed the CUDA Toolkit and cuDNN, and added their respective folders to my %PATH%.
Any help would be greatly appreciated.
[EDIT]
In case there is anything wrong with the code, here it is:
# Import the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

# Define the CNN
classifier = Sequential()

# Convolution layer 1
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Convolution layer 2
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Flatten + MLP
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))

classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fit the CNN to the images
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)
Answer:
This has nothing to do with your machine; I discussed the problem in a post on Udemy. Everyone seems to have the same issue and wonders how it can take only 20 minutes on the instructor's machine. The answer is simple: the instructor posted different source code than what he shows in the video!
Check the documentation for steps_per_epoch:

steps_per_epoch: Total number of steps (batches of samples) to yield from the generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples of your dataset divided by the batch size.
Currently, you are processing 8000 * 32 = 256,000 images in a single epoch. That is the number of samples you work through in every epoch, which makes no sense at all given that your dataset consists of only 10,000 images (20,000 with augmentation).
If you watch the video, you will see that the instructor uses samples_per_epoch instead, which means 32x less data gets processed. Problem solved.
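Translated into the steps_per_epoch API, the instructor's per-epoch workload corresponds to the sample counts divided by the batch size. Here is a minimal sketch of the corrected call, assuming the question's 8000/2000 sample counts and the batch_size = 32 set in flow_from_directory:

# steps_per_epoch counts batches, not samples, so divide the
# sample counts by the batch size of 32 used in flow_from_directory
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000 // 32,    # 250 batches per epoch
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000 // 32)   # 62 validation batches

Depending on the Keras version, steps_per_epoch = len(training_set) may perform the same division automatically, since the generator's length is its number of batches.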