I am currently trying to run a convolutional neural network using Keras with the TensorFlow backend, following a deep learning course on Udemy. However, it runs extremely slowly: each epoch takes roughly 1,000 seconds, while the instructor's machine takes about 60 seconds (and he is running on a CPU, by the way).
The CNN is a simple image-recognition network that classifies images as either cats or dogs. The training and test data together consist of 10,000 images, which take up 237 MB on my SSD.
When I run the CNN in a Python shell, I get the following output:
Epoch 1/25
2017-05-28 13:23:03.967337: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.967574: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968153: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968329: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:03.968576: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-28 13:23:04.505726: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:28:00.0
Total memory: 8.00GiB
Free memory: 6.68GiB
2017-05-28 13:23:04.505944: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0
2017-05-28 13:23:04.506637: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0:   Y
2017-05-28 13:23:04.506895: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:28:00.0)
2684/8000 [=========>....................] - ETA: 845s - loss: 0.5011 - acc: 0.7427
This output seems to indicate that TensorFlow is running the computations on the GPU. However, when I check nvidia-smi, I get the following output:
$ nvidia-smi
Sun May 28 13:25:46 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 376.53                 Driver Version: 376.53                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070   WDDM  | 0000:28:00.0      On |                  N/A |
|  0%   49C    P2    36W / 166W |   7240MiB /  8192MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7676  C+G   ...ost_cw5n1h2txyewy\ShellExperienceHost.exe     N/A |
|    0      8580  C+G   Insufficient Permissions                         N/A |
|    0      9704  C+G   ...x86)\Google\Chrome\Application\chrome.exe     N/A |
|    0     10532    C   ...\Anaconda3\envs\tensorflow-gpu\python.exe     N/A |
|    0     11384  C+G   Insufficient Permissions                         N/A |
|    0     12896  C+G   C:\Windows\explorer.exe                          N/A |
|    0     13868  C+G   Insufficient Permissions                         N/A |
|    0     14068  C+G   Insufficient Permissions                         N/A |
|    0     14568  C+G   Insufficient Permissions                         N/A |
|    0     15260  C+G   ...osoftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe     N/A |
|    0     16912  C+G   ...am Files (x86)\Dropbox\Client\Dropbox.exe     N/A |
|    0     18196  C+G   ...I\AppData\Local\hyper\app-1.3.3\Hyper.exe     N/A |
|    0     18228  C+G   ...oftEdge_8wekyb3d8bbwe\MicrosoftEdgeCP.exe     N/A |
|    0     20032  C+G   ...indows.Cortana_cw5n1h2txyewy\SearchUI.exe     N/A |
+-----------------------------------------------------------------------------+
Note that every process is shown as using both CPU and GPU (Type C+G), while the TensorFlow process is the only one listed as using the CPU alone (Type C).
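As an additional sanity check independent of nvidia-smi, TensorFlow itself can be asked which devices it registered; here is a minimal sketch, assuming the TensorFlow 1.x GPU build used here (device_lib is TensorFlow's internal device-listing helper):

# Ask the TensorFlow 1.x runtime which devices it registered;
# a "/gpu:0" entry confirms the GTX 1070 was picked up.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.device_type)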
Is there any reasonable explanation for this? I have been trying to solve this problem for an entire week now, without any progress.
I am running Windows 10 Pro with an Asus Nvidia GTX 1070, 24 GB of RAM, and an Intel Xeon X5670 CPU @ 2.93 GHz. I created my Anaconda environment with the following commands:
conda create -n tensorflow-gpu python=3.5 anaconda
source activate tensorflow-gpu
conda install theano
conda install mingw libpython
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl
pip install keras
conda update --all
I also installed the CUDA Toolkit and cuDNN, and added their respective folders to my %PATH%.
Any help would be greatly appreciated.
[EDIT]
In case there is anything wrong with the code, here it is:
# Import the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense

# Define the CNN
classifier = Sequential()

# Convolution layer 1
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Convolution layer 2
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Flatten + MLP
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))

classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fit the CNN to the images
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)

training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')

classifier.fit_generator(training_set,
                         steps_per_epoch = 8000,
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000)
Answer:
This has nothing to do with your machine; I discussed the problem in a post on Udemy. Everyone seems to have the same issue and wonders how it can take only 20 minutes on the instructor's machine. The answer is simple: the instructor posted different source code than what he shows in the video!
Check the documentation for steps_per_epoch:

steps_per_epoch: Total number of steps (batches of samples) to yield from the generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples of your dataset divided by the batch size.
Currently, you are processing 8000 * 32 = 256,000 images in a single epoch. That is the number of samples you work through in every epoch, which makes no sense at all given that your dataset consists of only 10,000 images (20,000 with augmentation).
If you watch the video, you will see that the instructor uses samples_per_epoch instead, which means 32x less data gets processed. Problem solved.
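Translated into the steps_per_epoch API, the instructor's per-epoch workload corresponds to the sample counts divided by the batch size. Here is a minimal sketch of the corrected call, assuming the question's 8000/2000 sample counts and the batch_size = 32 set in flow_from_directory:

# steps_per_epoch counts batches, not samples, so divide the
# sample counts by the batch size of 32 used in flow_from_directory
classifier.fit_generator(training_set,
                         steps_per_epoch = 8000 // 32,    # 250 batches per epoch
                         epochs = 25,
                         validation_data = test_set,
                         validation_steps = 2000 // 32)   # 62 validation batches

Depending on the Keras version, steps_per_epoch = len(training_set) may perform the same division automatically, since the generator's length is its number of batches.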