I'm fairly new to both machine learning and Python. I want my code to predict the objects in images, mainly cars. The script runs smoothly when I start it, but after it has processed about 20 pictures the system grinds to a halt because of a memory leak. I need the script to run over my whole database, which contains far more than 20 pictures.
To find the leak, I used pympler's SummaryTracker to see which object types are taking up the most memory.
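Roughly, the way I'm using the tracker is: take a baseline snapshot when the script starts and call print_diff() at the end of every iteration, which reports the object types whose count or size grew since the previous call. A stripped-down sketch of that pattern (the loop body here is only a placeholder for the real per-image work):

from pympler.tracker import SummaryTracker

tracker = SummaryTracker()            # baseline snapshot of tracked objects

for image in ["a.jpg", "b.jpg"]:      # placeholder for the real image loop
    data = [0] * 100000               # anything allocated here shows up in the diff
    tracker.print_diff()              # object types whose count/size grew since the last call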
Here is the script I'm running to predict the objects in each picture:
from imageai.Prediction import ImagePrediction
import os
import urllib.request
import mysql.connector
from pympler.tracker import SummaryTracker

tracker = SummaryTracker()

mydb = mysql.connector.connect(
    host="localhost",
    user="phpmyadmin",
    passwd="anshu",
    database="python_test"
)

counter = 0
mycursor = mydb.cursor()
sql = "SELECT id, image_url FROM `used_cars` " \
      "WHERE is_processed = '0' AND image_url IS NOT NULL LIMIT 1"
mycursor.execute(sql)
result = mycursor.fetchall()


def dl_img(url, filepath, filename):
    fullpath = filepath + filename
    urllib.request.urlretrieve(url, fullpath)


for eachfile in result:
    id = eachfile[0]
    print(id)
    filename = "image.jpg"
    url = eachfile[1]
    filepath = "/home/priyanshu/PycharmProjects/untitled/images/"
    print(filename)
    print(url)
    print(filepath)
    dl_img(url, filepath, filename)

    execution_path = "/home/priyanshu/PycharmProjects/untitled/images/"
    prediction = ImagePrediction()
    prediction.setModelTypeAsResNet()
    prediction.setModelPath(os.path.join(
        execution_path,
        "/home/priyanshu/Downloads/resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
    prediction.loadModel()
    predictions, probabilities = prediction.predictImage(
        os.path.join(execution_path, "image.jpg"), result_count=1)

    for eachPrediction, eachProbability in zip(predictions, probabilities):
        per = 0.00
        label = ""
        print(eachPrediction, " : ", eachProbability)
        label = eachPrediction
        per = eachProbability

    print("Label: " + label)
    print("Per:" + str(per))
    counter = counter + 1
    print("Picture Number: " + str(counter))
    sql1 = "UPDATE used_cars SET is_processed = '1' WHERE id = '%s'" % id
    sql2 = "INSERT into label (used_car_image_id, object_label, percentage) " \
           "VALUE ('%s', '%s', '%s') " % (id, label, per)
    print("done")
    mycursor.execute(sql1)
    mycursor.execute(sql2)
    mydb.commit()
    tracker.print_diff()
This is the output I get for a single picture; after a few more iterations it consumes all of the RAM. What should I change to stop the memory leak?
seat_belt : 12.617655098438263
Label: seat_belt
Per:12.617655098438263
Picture Number: 1
done
                                                              types | objects | total size
                                                      <class 'tuple |  130920 |   11.98 MB
                                                       <class 'dict |   24002 |    6.82 MB
                                                       <class 'list |   56597 |    5.75 MB
                                                        <class 'int |  175920 |    4.70 MB
                                                        <class 'str |   26047 |    1.92 MB
                                                        <class 'set |     740 |  464.38 KB
                     <class 'tensorflow.python.framework.ops.Tensor |    6515 |  356.29 KB
       <class 'tensorflow.python.framework.ops.Operation._InputList |    6097 |  333.43 KB
                  <class 'tensorflow.python.framework.ops.Operation |    6097 |  333.43 KB
                                               <class 'SwigPyObject |    6098 |  285.84 KB
     <class 'tensorflow.python.pywrap_tensorflow_internal.TF_Output |    4656 |  254.62 KB
<class 'tensorflow.python.framework.traceable_stack.TraceableObject |    3309 |  180.96 KB
         <class 'tensorflow.python.framework.tensor_shape.Dimension |    1767 |   96.63 KB
     <class 'tensorflow.python.framework.tensor_shape.TensorShapeV1 |    1298 |   70.98 KB
                                                    <class 'weakref |     807 |   63.05 KB
Answer:
The problem is that the model is loaded inside the for loop, once for every picture. Move the model setup out of the loop so it is not re-initialized on each iteration and does not keep claiming memory on top of what the program already holds; the steadily growing TensorFlow Operation and Tensor counts in the pympler diff are consistent with a new graph being built on every loadModel() call. The code should work like this ->
execution_path = "/home/priyanshu/PycharmProjects/untitled/images/"

# Load the model once, before the loop, instead of once per picture.
prediction = ImagePrediction()
prediction.setModelTypeAsResNet()
prediction.setModelPath(os.path.join(
    execution_path,
    "/home/priyanshu/Downloads/resnet50_weights_tf_dim_ordering_tf_kernels.h5"))
prediction.loadModel()

for eachfile in result:
    id = eachfile[0]
    print(id)
    filename = "image.jpg"
    url = eachfile[1]
    filepath = "/home/priyanshu/PycharmProjects/untitled/images/"
    print(filename)
    print(url)
    print(filepath)
    dl_img(url, filepath, filename)

    # Reuse the already-loaded predictor for every downloaded image.
    predictions, probabilities = prediction.predictImage(
        os.path.join(execution_path, "image.jpg"), result_count=1)

    for eachPrediction, eachProbability in zip(predictions, probabilities):
        per = 0.00
        label = ""
        print(eachPrediction, " : ", eachProbability)
        label = eachPrediction
        per = eachProbability

    print("Label: " + label)
    print("Per:" + str(per))
    counter = counter + 1
    print("Picture Number: " + str(counter))
    sql1 = "UPDATE used_cars SET is_processed = '1' WHERE id = '%s'" % id
    sql2 = "INSERT into label (used_car_image_id, object_label, percentage) " \
           "VALUE ('%s', '%s', '%s') " % (id, label, per)
    print("done")
    mycursor.execute(sql1)
    mycursor.execute(sql2)
    mydb.commit()
    tracker.print_diff()
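Stripped of the download and database bookkeeping, the essential change is just: build and load the predictor once, then reuse the same object for every image. A minimal sketch of that shape (the image list below is a placeholder; the paths are taken from the question):

import os
from imageai.Prediction import ImagePrediction

model_path = "/home/priyanshu/Downloads/resnet50_weights_tf_dim_ordering_tf_kernels.h5"
image_dir = "/home/priyanshu/PycharmProjects/untitled/images/"

# One-time setup: the network and its weights are created here, and only here.
predictor = ImagePrediction()
predictor.setModelTypeAsResNet()
predictor.setModelPath(model_path)
predictor.loadModel()

# Reuse the same loaded predictor for every image, so memory stays roughly flat.
for image_name in ["image.jpg"]:      # placeholder list of images to process
    predictions, probabilities = predictor.predictImage(
        os.path.join(image_dir, image_name), result_count=1)
    for label, per in zip(predictions, probabilities):
        print(label, ":", per)

The general rule is that anything expensive to construct and unchanged between iterations (the model, the database connection, the cursor) belongs outside the loop.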