Am I using LMDB incorrectly? It says environment mapsize limit reached after the first insert

I am trying to create an LMDB database for my Caffe machine-learning project, but on the very first attempt to insert a data point, LMDB raises an error saying the environment mapsize limit has been reached.

Here is the code that tries to populate the database:

```python
import numpy as np
from PIL import Image
import os
import lmdb
import random
# my data structure for holding image/label pairs
from serialization import DataPoint

class LoadImages(object):
    def __init__(self, image_data_path):
        self.image_data_path = image_data_path
        self.dirlist = os.listdir(image_data_path)
        # find the number of images that are to be read from disk
        # in this case there are 370 images.
        num = len(self.dirlist)
        # shuffle the list of image files so that they are read in a random order
        random.shuffle(self.dirlist)
        map_size = num*10
        j = 0
        # load images from disk
        for image_filename in os.listdir(image_data_path):
            # check that every image belongs to either category _D_ or _P_
            assert (image_filename[:3] == '_D_' or image_filename[:3] == '_P_'), "ERROR: unknown category"
            # set up the LMDB database object
            env = lmdb.open('image_lmdb', map_size=map_size)
            with env.begin(write=True) as txn:
                # iterate over (shuffled) list of image files
                for image_filename in self.dirlist:
                    print "Loading " + str(j) + "th image from disk - percentage complete:  " + str((float(j)/num) * 100) + " %"
                    # open the image
                    with open(str(image_data_path + "/" + image_filename), 'rb') as f:
                        image = Image.open(f)
                        npimage = np.asarray(image, dtype=np.float64)
                    # discard alpha channel, if necessary
                    if npimage.shape[2] == 4:
                        npimage = npimage[:,:,:3]
                        print image_filename + " had its alpha channel removed."
                    # get category
                    if image_filename[:3] == '_D_':
                        category = 0
                    elif image_filename[:3] == '_P_':
                        category = 1
                    # wrap image data and label into a serializable data structure
                    datapoint = DataPoint(npimage, category)
                    serialized_datapoint = datapoint.serialize()
                    # a database key
                    str_id = '{:08}'.format(j)
                    # put the data point in the LMDB
                    txn.put(str_id.encode('ascii'), serialized_datapoint)
                    j += 1
```

I also wrote a small data structure to hold an image and its label and to serialize them, which the code above uses:

```python
import numpy as np

class DataPoint(object):
    def __init__(self, image=None, label=None, dtype=np.float64):
        self.image = image
        if self.image is not None:
            self.image = self.image.astype(dtype)
        self.label = label

    def serialize(self):
        image_string = self.image.tobytes()
        label_string = chr(self.label)
        datum_string = label_string + image_string
        return datum_string

    def deserialize(self, string):
        image_string = string[1:]
        label_string = string[:1]
        image = np.fromstring(image_string, dtype=np.float64)
        label = ord(label_string)
        return DataPoint(image, label)
```

Here is the error:

```
/usr/bin/python2.7 /home/hal9000/PycharmProjects/Caffe_Experiments_0.6/gather_images.py
Loading 0th image from disk - percentage complete:  0.0 %
Traceback (most recent call last):
  File "/home/hal9000/PycharmProjects/Caffe_Experiments_0.6/gather_images.py", line 69, in <module>
    g = LoadImages(path)
  File "/home/hal9000/PycharmProjects/Caffe_Experiments_0.6/gather_images.py", line 62, in __init__
    txn.put(str_id.encode('ascii'), serialized_datapoint)
lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached
```

Answer:

`map_size` is the maximum size, in bytes, of the entire database, including metadata. It looks like you set it from the expected number of records (`num*10` = 3700 bytes for 370 images), so even the first serialized image blows past the limit.

You need to increase this value substantially. Since on most 64-bit systems the database file only grows as data is actually written, it is common to set `map_size` to a generous upper bound rather than a tight estimate.
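As a rough illustration, the map can be sized from the payload in bytes instead of the record count. This is a minimal sketch; the 500x375 image dimensions are an assumption for the example, not taken from the question:

```python
# Size map_size in BYTES, not records.
# Assumed dimensions (hypothetical): 370 RGB images of 500x375 float64 pixels.
num_images = 370
bytes_per_image = 500 * 375 * 3 * 8   # H * W * channels * sizeof(float64)
record_overhead = 1 + 8               # 1 label byte + 8-character ASCII key

# Pad generously (10x here): LMDB also stores B-tree pages and metadata,
# and an oversized map is cheap because the file grows sparsely.
map_size = num_images * (bytes_per_image + record_overhead) * 10

# Then: env = lmdb.open('image_lmdb', map_size=map_size)
print(map_size)
```

If you would rather not estimate up front, py-lmdb also provides `Environment.set_mapsize()` to enlarge the map later, and `MapFullError` can be caught to trigger that resize.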

