How to fix "ResourceExhaustedError: OOM when allocating tensor"

I want to create a model with multiple inputs, so I tried to build one like this.

# Define the two sets of inputs
inputA = Input(shape=(32,64,1))
inputB = Input(shape=(32,1024))

# CNN
x = layers.Conv2D(32, kernel_size = (3, 3), activation = 'relu')(inputA)
x = layers.Conv2D(32, (3,3), activation='relu')(x)
x = layers.MaxPooling2D(pool_size=(2,2))(x)
x = layers.Dropout(0.2)(x)
x = layers.Flatten()(x)
x = layers.Dense(500, activation = 'relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(500, activation='relu')(x)
x = Model(inputs=inputA, outputs=x)

# DNN
y = layers.Flatten()(inputB)
y = Dense(64, activation="relu")(y)
y = Dense(250, activation="relu")(y)
y = Dense(500, activation="relu")(y)
y = Model(inputs=inputB, outputs=y)

# Combine the outputs of the two models
combined = concatenate([x.output, y.output])

# Combined outputs
z = Dense(300, activation="relu")(combined)
z = Dense(100, activation="relu")(combined)
z = Dense(1, activation="softmax")(combined)

model = Model(inputs=[x.input, y.input], outputs=z)
model.summary()

opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = opt,
    metrics = ['accuracy'])

The model summary is as follows:

But when I try to train this model,

history = model.fit([trainimage, train_product_embd], train_label,
    validation_data=([validimage, valid_product_embd], valid_label),
    epochs=10, steps_per_epoch=100, validation_steps=10)

the following error occurs:

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-18-2b79f16d63c0> in <module>()
----> 1 history = model.fit([trainimage, train_product_embd],train_label, validation_data=([validimage,valid_product_embd],valid_label), epochs=10, steps_per_epoch=100, validation_steps=10)

4 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in __call__(self, *args, **kwargs)
   1470         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1471                                                self._handle, args,
-> 1472                                                run_metadata_ptr)
   1473         if run_metadata:
   1474           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[800000,32,30,62] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node conv2d_1/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[metrics/acc/Mean_1/_185]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[800000,32,30,62] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node conv2d_1/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Thanks for reading, and I hope someone can help me 🙂


Answer:

OOM stands for "out of memory". Your GPU has run out of memory, so it cannot allocate this tensor. There are a few things you can try:

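  • Decrease the number of units in your Dense layers and filters in your Conv2D layers
  • Use a smaller batch_size (or increase steps_per_epoch and validation_steps)
  • Use grayscale images (you can use tf.image.rgb_to_grayscale)
  • Reduce the number of layers
  • Use MaxPooling2D layers after convolutional layers
  • Reduce the size of your images (you can use tf.image.resize for that)
  • Use a smaller float precision for your input, namely np.float32
  • If you're using a pre-trained model, freeze the first layers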
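As a minimal sketch of the batch-size suggestion: the helper below (train_in_batches is a hypothetical name, not part of Keras) assumes that model is an already compiled Keras model and that the inputs are in-memory NumPy arrays like the ones in the question. It passes batch_size to model.fit and leaves out steps_per_epoch and validation_steps, so each training step should only send batch_size samples to the GPU. The default of 32 is an illustrative value, not something the question specifies.

```python
def train_in_batches(model, train_inputs, train_labels,
                     valid_inputs, valid_labels,
                     batch_size=32, epochs=10):
    """Fit a compiled Keras model on in-memory arrays in mini-batches.

    steps_per_epoch and validation_steps are deliberately not passed:
    with array inputs, batch_size alone sets how many samples are used
    per step, and Keras derives the number of steps from the data size.
    """
    return model.fit(
        train_inputs,
        train_labels,
        validation_data=(valid_inputs, valid_labels),
        batch_size=batch_size,
        epochs=epochs,
    )


# Usage with the variable names from the question (assumed to exist):
# history = train_in_batches(
#     model,
#     [trainimage, train_product_embd], train_label,
#     [validimage, valid_product_embd], valid_label,
#     batch_size=32,
# )
```

If this still runs out of memory, lowering batch_size further is the cheapest thing to try before changing the architecture.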

There is more useful information in the error message:

OOM when allocating tensor with shape[800000,32,30,62]

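That is a weird shape. If you're working with images, you should normally have 3 or 1 channels. On top of that, it looks like you are passing your entire dataset at once; you should pass it in batches instead.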
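To put a number on it, here is a rough estimate of how much memory that one tensor needs. It assumes 4 bytes per element (the error reports "type float", i.e. float32) and reads the shape as (batch, filters, height, width): 32 would then be the filter count of the first Conv2D, and 30 x 62 is what a 3 x 3 convolution without padding leaves of a 32 x 64 input. The batch of 32 used for comparison is an illustrative assumption, and tensor_bytes is a hypothetical helper name.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the memory needed for one dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


full = tensor_bytes((800000, 32, 30, 62))  # shape from the error message
batch = tensor_bytes((32, 32, 30, 62))     # same activation for a batch of 32

print(f"whole dataset: {full / 1e9:.1f} GB")   # whole dataset: 190.5 GB
print(f"batch of 32:   {batch / 1e6:.1f} MB")  # batch of 32:   7.6 MB
```

About 190 GB for a single activation is far beyond any single GPU, which is why the allocation fails on the first convolution, while the same layer on a batch of 32 needs only a few megabytes.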
