I am trying to create an LMDB database for my Caffe machine learning project. However, on the very first attempt to insert a data point, LMDB throws an error saying that the environment map size is full.
Here is the code that attempts to populate the database:
```python
import numpy as np
from PIL import Image
import os
import lmdb
import random

# my data structure for holding image/label pairs
from serialization import DataPoint


class LoadImages(object):
    def __init__(self, image_data_path):
        self.image_data_path = image_data_path
        self.dirlist = os.listdir(image_data_path)

        # find the number of images that are to be read from disk
        # in this case there are 370 images.
        num = len(self.dirlist)

        # shuffle the list of image files so that they are read in a random order
        random.shuffle(self.dirlist)

        map_size = num * 10

        j = 0

        # check that every image belongs to either category _D_ or _P_
        for image_filename in os.listdir(image_data_path):
            assert (image_filename[:3] == '_D_' or image_filename[:3] == '_P_'), "ERROR: unknown category"

        # set up the LMDB database object
        env = lmdb.open('image_lmdb', map_size=map_size)
        with env.begin(write=True) as txn:
            # iterate over (shuffled) list of image files
            for image_filename in self.dirlist:
                print "Loading " + str(j) + "th image from disk - percentage complete: " + str((float(j)/num) * 100) + " %"
                # open the image
                with open(str(image_data_path + "/" + image_filename), 'rb') as f:
                    image = Image.open(f)
                    npimage = np.asarray(image, dtype=np.float64)

                    # discard alpha channel, if necessary
                    if npimage.shape[2] == 4:
                        npimage = npimage[:, :, :3]
                        print image_filename + " had its alpha channel removed."

                    # get category
                    if image_filename[:3] == '_D_':
                        category = 0
                    elif image_filename[:3] == '_P_':
                        category = 1

                    # wrap image data and label into a serializable data structure
                    datapoint = DataPoint(npimage, category)
                    serialized_datapoint = datapoint.serialize()

                    # a database key
                    str_id = '{:08}'.format(j)

                    # put the data point in the LMDB
                    txn.put(str_id.encode('ascii'), serialized_datapoint)
                    j += 1
```
I also created a small data structure to hold the image and label and serialize it, which the code above uses:
```python
import numpy as np


class DataPoint(object):
    def __init__(self, image=None, label=None, dtype=np.float64):
        self.image = image
        if self.image is not None:
            self.image = self.image.astype(dtype)
        self.label = label

    def serialize(self):
        image_string = self.image.tobytes()
        label_string = chr(self.label)
        datum_string = label_string + image_string
        return datum_string

    def deserialize(self, string):
        image_string = string[1:]
        label_string = string[:1]
        image = np.fromstring(image_string, dtype=np.float64)
        label = ord(label_string)
        return DataPoint(image, label)
```
Here is the error message:
```
/usr/bin/python2.7 /home/hal9000/PycharmProjects/Caffe_Experiments_0.6/gather_images.py
Loading 0th image from disk - percentage complete: 0.0 %
Traceback (most recent call last):
  File "/home/hal9000/PycharmProjects/Caffe_Experiments_0.6/gather_images.py", line 69, in <module>
    g = LoadImages(path)
  File "/home/hal9000/PycharmProjects/Caffe_Experiments_0.6/gather_images.py", line 62, in __init__
    txn.put(str_id.encode('ascii'), serialized_datapoint)
lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached
```
Answer:
The map size is the maximum size of the entire database in *bytes*, including metadata – it looks like you set it from the expected number of records (`map_size = num * 10` gives only 3700 bytes), so even the first `put` of a serialized image overflows it.

You need to increase this value substantially.
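As a rough sketch of sizing the map: the 370 images and float64 storage come from your question, but the 640×480 RGB resolution below is a made-up placeholder – substitute your real dimensions:

```python
# sizing the LMDB map: map_size is a budget in *bytes*, not a record count
num_images = 370                       # number of records (from the question)
height, width, channels = 480, 640, 3  # hypothetical resolution - use your own
bytes_per_pixel = 8                    # pixels are stored as float64

# each serialized DataPoint is one label byte plus the raw pixel bytes
record_size = 1 + height * width * channels * bytes_per_pixel

# allow 10x headroom for keys, B-tree pages, and metadata; on Linux the
# memory map is a sparse file, so over-allocating costs no disk space up front
map_size = num_images * record_size * 10

# env = lmdb.open('image_lmdb', map_size=map_size)
```

Because the map is sparse, it is common to simply pass something generous like `map_size=10**9` (1 GB); LMDB only raises `MDB_MAP_FULL` when a write would actually push the database past `map_size`.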