Not sure whether tensorflow-gpu is actually using the GPU

I am currently trying to run a convolutional neural network using Keras on the TensorFlow backend, following a Deep Learning course on Udemy. However, it runs extremely slowly: each epoch takes around 1,000 seconds, while the instructor's machine needs only about 60 seconds (he runs it on the CPU, by the way).

The CNN is a simple image-recognition network that classifies an image as either a cat or a dog. The training and test data consist of 10,000 images in total, which take up 237 MB on my SSD.

When I run the CNN in a Python shell, I get the following output:

Epoch 1/25
2017-05-28 13:23:03.967337: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.967574: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968153: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968329: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968576: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:04.505726: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:28:00.0
Total memory: 8.00GiB
Free memory: 6.68GiB
2017-05-28 13:23:04.505944: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0
2017-05-28 13:23:04.506637: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0:   Y
2017-05-28 13:23:04.506895: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:28:00.0)
2684/8000 [=========>....................] - ETA: 845s - loss: 0.5011 - acc: 0.7427

This should indicate that TensorFlow is using the GPU for its computations. However, when I check nvidia-smi, I get the following output:

$ nvidia-smi
Sun May 28 13:25:46 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 376.53                 Driver Version: 376.53                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070   WDDM  | 0000:28:00.0      On |                  N/A |
|  0%   49C    P2    36W / 166W |   7240MiB /  8192MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7676  C+G   ...ost_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    0      8580  C+G   Insufficient Permissions                     N/A      |
|    0      9704  C+G   ...x86)\Google\Chrome\Application\chrome.exe N/A      |
|    0     10532    C   ...\Anaconda3\envs\tensorflow-gpu\python.exe N/A      |
|    0     11384  C+G   Insufficient Permissions                     N/A      |
|    0     12896  C+G   C:\Windows\explorer.exe                      N/A      |
|    0     13868  C+G   Insufficient Permissions                     N/A      |
|    0     14068  C+G   Insufficient Permissions                     N/A      |
|    0     14568  C+G   Insufficient Permissions                     N/A      |
|    0     15260  C+G   ...osoftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe N/A      |
|    0     16912  C+G   ...am Files (x86)\Dropbox\Client\Dropbox.exe N/A      |
|    0     18196  C+G   ...I\AppData\Local\hyper\app-1.3.3\Hyper.exe N/A      |
|    0     18228  C+G   ...oftEdge_8wekyb3d8bbwe\MicrosoftEdgeCP.exe N/A      |
|    0     20032  C+G   ...indows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
+-----------------------------------------------------------------------------+

Note how every process is listed as using both the CPU and the GPU (Type C+G), while the TensorFlow process is the only one listed as using the CPU alone (Type C).

Is there any reasonable explanation for this? I have been trying to solve this problem for an entire week now and have gotten nowhere.
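
For reference, here is a minimal device-placement check (a sketch using the TF 1.x session API, which matches the tensorflow_gpu-1.1.0 wheel from my environment below):

import tensorflow as tf

# log_device_placement=True makes TF 1.x print which device each op runs on;
# ops placed on the GPU are reported as "/gpu:0".
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0, 3.0], name='a')
    b = tf.constant([4.0, 5.0, 6.0], name='b')
    print(sess.run(a + b))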

I am on Windows 10 Pro with an Asus Nvidia GTX 1070, 24 GB of RAM, and an Intel Xeon X5670 CPU @ 2.93 GHz. I created my Anaconda environment with the following commands:

conda create -n tensorflow-gpu python=3.5 anaconda
source activate tensorflow-gpu
conda install theano
conda install mingw libpython
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl
pip install keras
conda update --all

I also installed the CUDA Toolkit and cuDNN and added their respective folders to my %PATH%.

Any help would be greatly appreciated.

[EDIT]

In case there is anything wrong with the code, here it is:

# Import the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

# Define the CNN
classifier = Sequential()

# Convolution layer 1
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Convolution layer 2
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Flatten + MLP
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))

classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fit the CNN to the images
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)

Answer:

This has nothing to do with your machine. I discussed the problem in a post on Udemy: everyone seems to have the same issue and wonders why it takes only 20 minutes on the instructor's machine. The answer is simple: the source code the instructor published is not the same as the one he shows in the videos!

Have a look at the documentation for steps_per_epoch:

steps_per_epoch: Total number of batches of samples to yield from the generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples of your dataset divided by the batch size.

Right now you are processing 8000 * 32 = 256,000 images in a single epoch. That is the number of samples you run through every epoch, which makes no sense at all given that your dataset contains only 10,000 images (20,000 with augmentation).

If you watch the video, you'll see that the instructor uses samples_per_epoch, which means 32 times less data gets processed. Problem solved.
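
A minimal sketch of the corrected call (assuming the 8,000-training/2,000-test split that your original steps_per_epoch and validation_steps values suggest, with batch size 32):

batch_size = 32

classifier.fit_generator(training_set,
                         steps_per_epoch = 8000 // batch_size,   # 250 batches = one pass over the training images
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000 // batch_size)  # 62 batches = roughly one pass over the test images

With that change an epoch touches each training image once instead of 32 times, which accounts for the roughly 32x blow-up in epoch time you are seeing.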
