I am trying to do transfer learning with the pretrained YOLO model from the Coursera Deep Learning Specialization. The YOLO model performs object detection and recognition, and I want to add a few extra layers on top of it so that it can also recognize the gender of the detected person.
So I have m images, and I am trying to pass them through the existing YOLO model to obtain its outputs, which I then want to use as the training set for the newly added layers. This is where the problem occurs: when I try to pass all m examples in a single call, I get an error.
Below are all the steps I took and the output I got:
Import the libraries
import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
from scipy.misc import imread
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
import keras
from keras import backend as K
from keras.preprocessing import image
from keras.layers import Input, Lambda, Conv2D, Dense
from keras.models import load_model, Model, Sequential
from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body

%matplotlib inline
Import the dataset:
2155 images of shape (608, 608, 3)
train = pd.read_csv("datset.csv", sep=';')
train_img = []
for i in range(len(train)):
    (img, train_img_data) = preprocess_image('path_dataset' + train['ImageURL'][i], model_image_size=(608, 608))
    train_img.append(train_img_data)
train_img = np.array(train_img)
train_img = train_img.reshape(2155, 608, 608, 3)
Verify the dataset dimensions

print('Shape of train_img: ', train_img.shape)
print('Shape of the first element of train_img: ', train_img[0].shape)
print('First element of train_img reshaped: ', train_img[0].reshape(1, 608, 608, 3).shape)
Output of the dataset dimensions

Shape of train_img:  (2155, 608, 608, 3)
Shape of the first element of train_img:  (608, 608, 3)
First element of train_img reshaped:  (1, 608, 608, 3)
Import the YOLO model
yolo_model = load_model("model_data/yolo.h5")
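Not one of the original steps, but for reference (and relevant to the note about the input placeholder further down), the input shape that the loaded Keras model expects can be checked directly:

# Quick sanity check (optional): inspect the input placeholder of the loaded model.
print(yolo_model.input_shape)   # should print (None, 608, 608, 3) for this model
yolo_model.summary()            # full layer listing, ending in the YOLO output tensor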
Feed train_img into the YOLO model to get the outputs, which will then be used as the training set for the added layers.
sess = K.get_session()
output = sess.run([yolo_model.output],
                  feed_dict={yolo_model.input: train_img, K.learning_phase(): 0})
The error I get:
---------------------------------------------------------------------------
ResourceExhaustedError                    Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1360     try:
-> 1361       return fn(*args)
   1362     except errors.OpError as e:

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1339       return tf_session.TF_Run(session, options, feed_dict, fetch_list,
-> 1340                                target_list, status, run_metadata)
   1341 

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    515             compat.as_text(c_api.TF_Message(self.status.status)),
--> 516             c_api.TF_GetCode(self.status.status))
    517     # Delete the underlying status object from memory otherwise it stays alive

ResourceExhaustedError: OOM when allocating tensor with shape[2155,608,608,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
	 [[Node: conv2d_1/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_input_1_0_1, conv2d_1/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-14-067537a70066> in <module>()
      1 sess = K.get_session()
----> 2 output=sess.run([yolo_model.output], feed_dict={yolo_model.input: train_img , K.learning_phase(): 0})

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    904       result = self._run(None, fetches, feed_dict, options_ptr,
--> 905                          run_metadata_ptr)

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1136       results = self._do_run(handle, final_targets, final_fetches,
-> 1137                              feed_dict_tensor, options, run_metadata)

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1354       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1355                            options, run_metadata)

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1372       except KeyError:
   1373         pass
-> 1374       raise type(e)(node_def, op, message)

ResourceExhaustedError: OOM when allocating tensor with shape[2155,608,608,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
	 [[Node: conv2d_1/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_input_1_0_1, conv2d_1/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'conv2d_1/convolution', defined at:
  File "<ipython-input-11-c868ea7b7486>", line 7, in <module>
    yolo_model = load_model("model_data/yolo.h5")
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\models.py", line 243, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\layers\convolutional.py", line 168, in call
    dilation_rate=self.dilation_rate)
  File "C:\ProgramData\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 3335, in conv2d
    data_format=tf_data_format)
  File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 717, in conv2d
    data_format=data_format, dilations=dilations, name=name)

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2155,608,608,32] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
	 [[Node: conv2d_1/convolution = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_input_1_0_1, conv2d_1/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Note:
The input placeholder has size (None, 608, 608, 3), so there should be no problem feeding it a dataset of size (2155, 608, 608, 3) (this is the part I don't understand). Also, if I feed the network a single example of size (1, 608, 608, 3), I don't get the error! I could iterate over all the elements of the dataset and feed the network 2155 times (one (1, 608, 608, 3) example per call), but that is time-consuming and not the best approach.
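For reference, the one-image-at-a-time workaround mentioned above would look roughly like this (a sketch reusing the sess, yolo_model, and train_img defined earlier):

# Slow workaround sketch: feed one (1, 608, 608, 3) image per sess.run call.
outputs = []
for i in range(train_img.shape[0]):
    single = train_img[i].reshape(1, 608, 608, 3)
    out = sess.run(yolo_model.output,
                   feed_dict={yolo_model.input: single, K.learning_phase(): 0})
    outputs.append(out)
outputs = np.concatenate(outputs, axis=0)  # one output row per image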
By the way, I thought the None in the placeholder was there precisely so that I could send m training examples through at once.
From this output I really cannot work out what the error is. I would appreciate your help in solving this.
Answer:
The error tells you that this is a resource-exhaustion error, so memory is almost certainly the problem: the traceback shows the op is placed on /device:CPU:0, so you are running on the CPU and exhausting your system RAM (the None batch dimension in the placeholder says nothing about how much memory a given batch will need). A tensor with 1*608*608*3 elements is far smaller than one with 2155*608*608*3 elements, which is why a single example works while the full dataset does not. The solution is simple: run the data through the model in smaller batches (a smaller batch_size).
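A minimal sketch of that fix, reusing the sess, yolo_model, and train_img from the question (the batch size of 16 is an arbitrary starting point to tune to the memory you have available):

batch_size = 16  # pick whatever fits comfortably in RAM
outputs = []
for start in range(0, train_img.shape[0], batch_size):
    batch = train_img[start:start + batch_size]
    out = sess.run(yolo_model.output,
                   feed_dict={yolo_model.input: batch, K.learning_phase(): 0})
    outputs.append(out)
outputs = np.concatenate(outputs, axis=0)  # stacked YOLO outputs, one row per input image

Equivalently, since yolo_model is a Keras model, yolo_model.predict(train_img, batch_size=16) performs the same batched forward pass for you.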