Why do I still get a memory allocation error with a batch size of 1?

I am (still) trying to implement a simple U-Net using Keras with the TensorFlow 2.0 backend.

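My templates and masks are 1536×1536 RGB images (the masks are black and white). According to this article, it is possible to estimate the amount of memory required.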

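My model crashes with a memory allocation error while handling the tensor [1,16,1536,1536]. Using the formula from the article above, I calculated the memory this tensor needs: 1 * 16 * 1536 * 1536 * 4 bytes = 150,994,944 bytes = 144 MiB. I have a GTX 1080 Ti with roughly 9 GB of memory available to TensorFlow. What is wrong? Am I missing something?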
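The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `tensor_bytes` is made up for illustration, and it assumes 4 bytes per element (float32), which matches the "type float" reported in the error message.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the raw size in bytes of a dense tensor with the given shape.

    Assumes float32 (4 bytes per element) unless told otherwise.
    """
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape taken from the OOM message: [batch, channels, height, width]
size = tensor_bytes([1, 16, 1536, 1536])
print(size)           # 150994944
print(size / 2**20)   # 144.0 (MiB)
```

The result agrees with the allocator message "trying to allocate 144.00MiB (rounded to 150994944)". Note that this is the size of one tensor only; during training many such tensors are alive at the same time.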

Here is an almost complete traceback:

2020-03-02 15:59:13.841967: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-03-02 15:59:16.083234: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-03-02 15:59:16.087240: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-03-02 15:59:16.210856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:16.210988: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:16.211429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:16.947775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 15:59:16.947868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-03-02 15:59:16.947922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-03-02 15:59:16.948594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
2020-03-02 15:59:16.994676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:16.994849: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:16.995291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:16.995793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:16.995908: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:16.996301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:16.996406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 15:59:16.996491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-03-02 15:59:16.996541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-03-02 15:59:16.996942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
2020-03-02 15:59:18.191834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 15:59:18.191964: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 15:59:18.192383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 15:59:18.192499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 15:59:18.192591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-03-02 15:59:18.192644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-03-02 15:59:18.193053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
Epoch 1/100
2020-03-02 15:59:18.421211: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-02 15:59:19.577897: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 512.00M (536870912 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.616600: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 460.80M (483183872 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.638395: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-03-02 15:59:19.644478: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.644601: W tensorflow/core/common_runtime/bfc_allocator.cc:305] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
2020-03-02 15:59:19.653644: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.653767: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 259.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-03-02 15:59:19.865828: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:19.874844: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:29.884662: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:29.893593: I tensorflow/stream_executor/cuda/cuda_driver.cc:830] failed to allocate 1.00G (1073741824 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-03-02 15:59:29.893792: W tensorflow/core/common_runtime/bfc_allocator.cc:419] Allocator (GPU_0_bfc) ran out of memory trying to allocate 144.00MiB (rounded to 150994944).  Current allocation summary follows.
2020-03-02 15:59:29.919126: I tensorflow/core/common_runtime/bfc_allocator.cc:923] total_region_allocated_bytes_: 1054574080 memory_limit_: 9210949796 available bytes: 8156375716 curr_region_allocation_bytes_: 1073741824
2020-03-02 15:59:29.919304: I tensorflow/core/common_runtime/bfc_allocator.cc:929] Stats:
Limit:                  9210949796
InUse:                  1010432000
MaxInUse:               1010432000
NumAllocs:                     594
MaxAllocSize:            283870720
2020-03-02 15:59:29.919520: W tensorflow/core/common_runtime/bfc_allocator.cc:424] *****__****************xxxxxxxxxx***************xxxxxxxxxx******************************xxxxxxxxxxxx
2020-03-02 15:59:29.919696: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at conv_ops.cc:947 : Resource exhausted: OOM when allocating tensor with shape[1,16,1536,1536] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "E:/Explorium/python/unet_trainer.py", line 82, in <module>
    results = model.fit_generator(train_generator, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH, validation_data=val_generator, validation_steps=VALIDATION_STEPS, callbacks=callbacks)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1297, in fit_generator
    steps_name='steps_per_epoch')
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_generator.py", line 265, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 973, in train_on_batch
    class_weight=class_weight, reset_metrics=reset_metrics)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 264, in train_on_batch
    output_loss_metrics=model._output_loss_metrics)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 311, in train_on_batch
    output_loss_metrics=output_loss_metrics))
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 252, in _process_single_batch
    training=training))
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 127, in _model_loss
    outs = model(inputs, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call
    convert_kwargs_to_constants=base_layer_utils.call_context().saving)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph
    output_tensors = layer(computed_tensors, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py", line 197, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 1134, in __call__
    return self.conv_op(inp, filter)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 639, in __call__
    return self.call(inp, filter)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 238, in __call__
    name=self.name)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 2010, in conv2d
    name=name)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1031, in conv2d
    data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1130, in conv2d_eager_fallback
    ctx=_ctx, name=name)
  File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,16,1536,1536] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Conv2D]

Process finished with exit code 1

Here is my model:

...

Answer:

The problem you are running into is the size of your images.

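The problem is not the size of the model, as others stated in the comments, but the input size of your images, which requires more GPU memory to process.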
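To illustrate why the input size dominates, here is a rough sketch that sums the feature maps of a U-Net style encoder at two input sizes. Everything about the architecture here is an assumption made for illustration: the filter counts `(16, 32, 64, 128, 256)`, two convolution outputs per level, 2×2 pooling between levels, and float32 activations. The question's actual model is not shown, and real training also needs memory for the decoder, gradients, and cuDNN workspace, so treat the numbers as a lower bound on the trend, not a measurement.

```python
def feature_map_bytes(height, width, channels, batch=1, bytes_per_element=4):
    """Size in bytes of one float32 feature map of the given dimensions."""
    return batch * channels * height * width * bytes_per_element


def encoder_activation_bytes(size, filters=(16, 32, 64, 128, 256)):
    """Sum the conv outputs of a hypothetical U-Net encoder for a square input.

    Assumes two convolution outputs per level and a 2x2 pooling step
    (halving height and width) between levels.
    """
    total = 0
    side = size
    for channels in filters:
        total += 2 * feature_map_bytes(side, side, channels)
        side //= 2
    return total


full = encoder_activation_bytes(1536)
half = encoder_activation_bytes(768)
print(full / 2**20)   # MiB at 1536 x 1536
print(half / 2**20)   # MiB at 768 x 768
print(full / half)    # ratio between the two
```

Under these assumptions the activation footprint scales with the number of pixels, so halving both width and height cuts it to a quarter, while the weights of the model stay the same size.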

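In your case, the solution is to halve the dimensions of your images. Scale the width and the height by the same factor to preserve the aspect ratio, so that the network can learn on the smaller images without losing too much information or introducing distortion.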
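A small helper for choosing the reduced size might look like the sketch below. The function name `scaled_size` and the `multiple` parameter are hypothetical; snapping to a multiple of 16 is an added assumption (it keeps the size divisible through four 2×2 pooling steps) and is not something the answer prescribes. The same target size should then be used for both the images and their masks.

```python
def scaled_size(width, height, factor=0.5, multiple=16):
    """Scale width and height by the same factor, snapped to a multiple.

    Using one factor for both sides keeps the aspect ratio (up to the
    rounding introduced by snapping each side to `multiple`).
    """
    def snap(value):
        snapped = int(round(value * factor / multiple)) * multiple
        return max(multiple, snapped)

    return snap(width), snap(height)


print(scaled_size(1536, 1536))               # (768, 768)
print(scaled_size(1536, 1536, factor=1 / 3)) # (512, 512)
```

For non-square inputs, each side is snapped independently, so the aspect ratio can shift slightly; choosing a factor that already yields multiples of 16 avoids that.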

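You will be able to train on your GTX 1080 at 768×768 with a batch size of 1 (I have a GTX 1080 Ti, and I tested several segmentation networks with several input sizes). If that does not work because other processes (such as YouTube or similar) are occupying your GPU memory, reducing the size to 512×512 will certainly work, although with a batch size of 1 even 768×768 should fit.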
