### Keras, MemoryError – data = data.astype("float") / 255.0: unable to allocate 309 MiB for an array with shape (13165, 32, 32, 3)

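I am currently working with the Smiles dataset and trying to use deep learning to detect whether a smile is positive or negative. The machine I am using is a Raspberry Pi 3, running Python 3.7 (not 2.7).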

My training set contains 13165 images in total, and I want to store them in a single array. However, memory cannot be allocated for an array with shape (13165, 32, 32, 3).

Here is the source code (shallownet_smile.py):

    from sklearn.preprocessing import LabelBinarizer
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report
    from pyimagesearch.preprocessing import ImageToArrayPreprocessor
    from pyimagesearch.preprocessing import SimplePreprocessor
    from pyimagesearch.datasets import SimpleDatasetLoader
    from pyimagesearch.nn.conv.shallownet import ShallowNet
    from keras.optimizers import SGD
    from imutils import paths
    import matplotlib.pyplot as plt
    import numpy as np
    import argparse

    ap = argparse.ArgumentParser()
    ap.add_argument("-d", "--dataset", required=True, help="path to input dataset")
    args = vars(ap.parse_args())

    # grab the list of images we'll be describing
    print("[INFO] loading images...")
    imagePaths = list(paths.list_images(args["dataset"]))

    sp = SimplePreprocessor(32, 32)
    iap = ImageToArrayPreprocessor()

    sdl = SimpleDatasetLoader(preprocessors=[sp, iap])
    (data, labels) = sdl.load(imagePaths, verbose=1)

    # convert values to between 0-1
    data = data.astype("float") / 255.0

    # partition our data into training and test sets
    (trainX, testX, trainY, testY) = train_test_split(data, labels, test_size=0.25,
        random_state=42)

    # convert the labels from integers to vectors
    trainY = LabelBinarizer().fit_transform(trainY)
    testY = LabelBinarizer().fit_transform(testY)

    # initialize the optimizer and model
    print("[INFO] compiling model...")
    # initialize stochastic gradient descent with learning rate of 0.005
    opt = SGD(lr=0.005)
    model = ShallowNet.build(width=32, height=32, depth=3, classes=2)
    model.compile(loss="categorical_crossentropy", optimizer=opt,
        metrics=["accuracy"])

    # train the network
    print("[INFO] training network...")
    H = model.fit(trainX, trainY, validation_data=(testX, testY), batch_size=32,
        epochs=100, verbose=1)

    print("[INFO] evaluating network...")
    predictions = model.predict(testX, batch_size=32)
    print(classification_report(
        testY.argmax(axis=1),
        predictions.argmax(axis=1),
        target_names=["positive", "negative"]))

    plt.style.use("ggplot")
    plt.figure()
    plt.plot(np.arange(0, 100), H.history["loss"], label="train_loss")
    plt.plot(np.arange(0, 100), H.history["val_loss"], label="val_loss")
    plt.plot(np.arange(0, 100), H.history["acc"], label="train_acc")
    plt.plot(np.arange(0, 100), H.history["val_acc"], label="val_acc")
    plt.title("Training Loss and Accuracy")
    plt.xlabel("Epoch #")
    plt.ylabel("Loss/Accuracy")
    plt.legend()
    plt.show()

Assume the dataset is in my current directory. This is the command I run and the error I get:

    python3 shallownet_smile.py -d=datasets/Smiles

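The error message is the one quoted in the title: a MemoryError raised at data = data.astype("float") / 255.0, because 309 MiB could not be allocated for an array with shape (13165, 32, 32, 3).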

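I am still confused about what went wrong. I would really appreciate it if someone experienced in deep learning or machine learning could explain what I am doing wrong.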

Thank you for your help and attention.


Answer:

First, your system has very little memory (a Raspberry Pi 3 has 1 GB of RAM), so try using smaller images.

The error comes mainly from this line: data = data.astype("float") / 255.0

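The reason is that data is already a NumPy array held in memory (uint8, if the images are stored as loaded), and astype("float") allocates a second, full-size copy on top of it. Note that "float" maps to float64 in NumPy, not float32: 13165 × 32 × 32 × 3 elements at 8 bytes each is about 309 MiB, which matches the figure in the error message.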
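The arithmetic behind that figure can be checked with a short sketch. The helper name array_mib is hypothetical, and the per-element sizes are the standard NumPy item sizes for these dtypes; nothing here comes from the original answer.

```python
# Bytes per element for the dtypes discussed above (standard NumPy item sizes).
ITEMSIZE = {"uint8": 1, "float32": 4, "float64": 8}


def array_mib(shape, dtype):
    """Return the size in MiB of a dense array with the given shape and dtype."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * ITEMSIZE[dtype] / 2**20


shape = (13165, 32, 32, 3)
for dtype in ("uint8", "float32", "float64"):
    print(f"{dtype:8s} {array_mib(shape, dtype):7.1f} MiB")
```

For this shape, float64 comes out at roughly 309 MiB, against about 154 MiB for float32 and about 39 MiB for uint8, so choosing float32 explicitly already halves the allocation.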

I will modify parts of SimpleDatasetLoader so that you can train.

Go to the module behind from pyimagesearch.datasets import SimpleDatasetLoader. It should be at pyimagesearch/datasets/simpledatasetloader.py (sample code: https://github.com/whydna/Deep-Learning-For-Computer-Vision/blob/master/pyimagesearch/datasets/simpledatasetloader.py).

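Replace the contents of that .py file with my code below and adjust the value of max_image (reduce it unless your available memory can handle it). Also delete the line data = data.astype("float") / 255.0 from your script, because the function now returns an already preprocessed array.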

    # import the necessary packages
    import numpy as np
    import cv2
    import os

    max_image = 1000

    class SimpleDatasetLoader:
        def __init__(self, preprocessors=None):
            # store the image preprocessor
            self.preprocessors = preprocessors

            # if the preprocessors are None, initialize them as an
            # empty list
            if self.preprocessors is None:
                self.preprocessors = []

        def load(self, imagePaths, verbose=-1):
            # initialize the list of features and labels
            data = []
            labels = []
            cnt = 0

            # loop over the input images
            for (i, imagePath) in enumerate(imagePaths):
                if cnt >= max_image:
                    break

                # load the image and extract the class label assuming
                # that our path has the following format:
                # /path/to/dataset/{class}/{image}.jpg
                image = cv2.imread(imagePath)
                label = imagePath.split(os.path.sep)[-2]

                # check to see if our preprocessors are not None
                if self.preprocessors is not None:
                    # loop over the preprocessors and apply each to
                    # the image
                    for p in self.preprocessors:
                        image = p.preprocess(image)

                # treat our processed image as a "feature vector"
                # by updating the data list followed by the labels
                data.append(image)
                labels.append(label)

                # show an update every `verbose` images
                cnt += 1
                if verbose > 0 and i > 0 and (i + 1) % verbose == 0:
                    print("[INFO] processed {}/{}".format(i + 1,
                        len(imagePaths)))

            # return a tuple of the data and labels
            return (np.array(data, dtype='float32')/255., np.array(labels))

If you still run into memory problems, reduce batch_size here:

    H = model.fit(trainX, trainY, validation_data=(testX, testY), batch_size=4,
        epochs=100, verbose=1)
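If the goal is to keep every image rather than cap the count at max_image, one option is to convert to float32 in slices, so that a full-size float64 copy is never created. This is a sketch and not part of the original answer: the function name to_float32_scaled and the chunk parameter are hypothetical, and it assumes data is an integer-valued image array such as the one returned by the unmodified loader. The source array and the float32 result still have to fit in memory at the same time.

```python
import numpy as np


def to_float32_scaled(data, chunk=1024):
    """Scale pixel values to [0, 1] as float32, converting `chunk` images at a time."""
    # Destination array: float32 needs half the memory of a float64 copy.
    out = np.empty(data.shape, dtype=np.float32)
    for start in range(0, len(data), chunk):
        stop = start + chunk
        # Only this slice exists as a temporary; the division stays in float32.
        out[start:stop] = data[start:stop].astype(np.float32) / np.float32(255.0)
    return out


# Small synthetic stand-in for the image array (hypothetical data).
sample = np.random.randint(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
scaled = to_float32_scaled(sample, chunk=4)
print(scaled.dtype, scaled.shape)
```

In the training script, this call would take the place of data = data.astype("float") / 255.0, with the chunk size chosen to suit the memory that is free on the device.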
