I am implementing YOLOv3 for multi-class object detection.
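YOLO is a single-stage detector: it predicts candidate boxes over a grid in one pass and treats the highest-confidence candidates as its predictions. You can read more about it here.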
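For this particular task I followed this Murtaza tutorial, which walked me through it from scratch.
Because the network architecture is complex and takes hours to train, I prefer transfer learning, that is, using a pretrained network and weights (parameters). You can find both links here:
Architecture configuration: cfg
Network parameters (weights): weights
I used yolov3 tiny here because I need a higher frame rate for video, but the results are not as promising as those shown in the tutorial, and I don't know where I went wrong. Even after switching the network cfg and weights files to the original yolov3 (320), I cannot get correct results. I do get all 5 spatial values, the coordinates and confidence [cx, cy, w, h, confidence], but the probabilities for all 80 classes are still a zero vector [0.0, 0.0, 0.0, ..., 0.0]. Even when I change the video source and pick another video, the result is still a zero vector, whereas in the tutorial this works fine.
Implementation code: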
# YOLO Algorithm
# Network weights and configuration files
yolov3_tiny_cfg='/root/Downloads/ML TASK/yolov3-tiny.cfg' # configuration file
yolov3_tiny_weights='/root/Downloads/ML TASK/yolov3-tiny.weights' # weights
coco_names='/root/Downloads/ML TASK/coco.names' # coco classes
# for the general yolo 320 architecture
# put paths to directory
yolov3_cfg='/root/Downloads/ML TASK/yolov3.cfg'
yolov3_weights='/root/Downloads/ML TASK/yolov3.weights'
# Test videos
Test_video_1='/root/Downloads/ML TASK/mn.mp4'
Test_video_2='/root/Downloads/ML TASK/bg.mp4'
# Dependencies
import cv2
import numpy as np
# Dataset classes:
# there are around 80 classes in the coco dataset, so writing them manually would not be the right choice;
# instead we read them from a file named coco.names stored on the drive
# getting list of classes
classes=[] # empty list initialization
with open(coco_names,'r')as f:
    classes=f.read().splitlines()
# viewing the multiclass list, around 80 classes in the coco dataset
# Loading yolov3 using the configuration file and weights
network=cv2.dnn.readNetFromDarknet(yolov3_cfg,yolov3_weights)
network.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
# to use the OpenCV CPU as backend
network.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
# NOTE: The network cannot be fed the image directly; we first have to preprocess it to match
# the input shape of the network and also the type, i.e. a blob, which generally refers to a
# mathematical form of binary images like a bitmap
Width,Height=320,320 # square image, so the network grid should be n*n, equal in both dimensions
Confidence_Threshould=0.5 # minimum probability for claiming the prediction
NMS_Threshould=0.3
cap=cv2.VideoCapture('game.mp4')
fps = cap.get(cv2.CAP_PROP_FPS)
timestamps = [cap.get(cv2.CAP_PROP_POS_MSEC)]
# function to find objects on the captured video stream
def findObjects(outputs,image):
    h,w,c=image.shape
    bound_box=[] # for feeding through function
    classIds=[]
    confidence=[]
    for output in outputs: # getting o/p from 2 layers (v3 tiny), 3 if using yolov3 320
        for detection in output:
            scores=detection[5:] # slice off the first five values because we are going to use them for the bounding box
            classId=np.argmax(scores)
            confs=scores[classIds]
            # filtering objects: keep one as a final prediction only when it exceeds the minimum confidence threshold
            if confs > Confidence_Threshould:
                w,h=int(detection[2]*Width),int(detection[3]*Height) # to convert % into pixels
                x,y=int((detection[0]*Width)-(w/2)),int((detection[1]*Height)-(h/2))
                bound_box.append([x,y,w,h])
                classIds.append(classId)
                confidence.append(float(confs))
    print(len(bound_box))
    # to reduce the number of boxes on the frame we use NMS; it gives the indices of the boxes to keep
    indices=cv2.dnn.NMSBoxes(bound_box,confs,Confidence_Threshould,NMS_Threshould)
    for i in indices:
        i=i[0]
        box=bound_box[i]
        x,y,w,h=box[0],box[1],box[2],box[3]
        cv2.rectangle(image,(x,y),(x+w,y+h),(255,0,0),2)
        cv2.puttext(image,f'{classes[classIds[i]]}{int(confidence[i]*100)}%', (x,y-10),cv2.FONT_HERSHEY_PLAIN,0.6,(0,255,0),2)
    cv2.puttext(image,f'FPS:{fps}',(0,150),cv2.FONT_HERSHEY_PLAIN,0.6,(0,255,0),2)
    cv2.puttext(image,f'TIMESTAMPS:{timestamps}',(150,0),cv2.FONT_HERSHEY_PLAIN,0.6,(0,255,0),2)
while True:
    success,image=cap.read()
    # converting the image into a blob for network I/O processing
    try:
        blob=cv2.dnn.blobFromImage(image,1/255,(Width,Height),[0,0,0],crop=False)
    except:
        continue
    # I/P
    network.setInput(blob) # setting input
    # O/P
    # The general YOLO architecture produces three outputs from the respective layers and
    # summarizes the max confidence to decide the final predictions,
    # but here the network has only 2 outputs because we are using the tiny version for higher frame rates.
    # In order to get the outputs we have to know the names of the respective layers,
    # i.e. not the names directly but their indexes (starting from 1, not zero), obtained with getUnconnectedOutLayers
    layers_names=network.getLayerNames()
    #print(network.getUnconnectedOutLayers()) # 36th and 48th indexes
    # looping because we are traversing multiple values of OutLayers
    outputNames=[layers_names[i[0]-1]for i in network.getUnconnectedOutLayers()] # -1 because the indexes start from one, not zero
    #print(outputNames) # for v3 tiny, 16 and 23 are the layer names
    # forwarding the image through the network
    outputs=network.forward(outputNames)
    # finding objects
    # print(outputs[0].shape)=>(300,85) 300=>no. of boxes 85=>[cx,cy,width,height,confidence,probability of 80 classes]
    # using cx,cy,w,h we are going to determine the bounding box
    # print(outputs[1].shape)=>(1200,85) 1200 boxes; this shape is in m*n form, i.e. a matrix
    # where 1200 rows of boxes map to the 85-value vector explained above
    #print(outputs[0][0])
    findObjects(outputs,image)
    cv2.imshow('Window',image)
    if cv2.waitKey(15) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
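Answer:
Your code has several problems.
- You must use the h and w taken from the image, not the default Width and Height that you used to convert the image into the YOLOv3 blob.
Change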
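w,h=int(detection[2]*Width),int(detection[3]*Height)
x,y=int((detection[0]*Width)-(w/2)),int((detection[1]*Height)-(h/2))
to
box_w,box_h = int(detection[2]*w) , int(detection[3]*h)
x,y = int((detection[0]*w)-box_w/2) , int((detection[1]*h)-box_h/2)
Here w and h are the frame dimensions from h,w,c=image.shape. The box size gets its own names (box_w, box_h) so that it does not overwrite them, and the box is then appended as [x,y,box_w,box_h].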
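- You mixed up confs and confidence, which causes confusion. For example, confs=scores[classIds] indexes with the list classIds rather than the single classId, and cv2.dnn.NMSBoxes is given confs where the confidence list belongs. You can compare against the Murtaza tutorial to check, but that takes some time.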
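The two points above can be sketched together as a small decoding helper. This is a minimal illustration rather than the tutorial's code: it uses plain Python lists so it runs without OpenCV, the function name `decode_detections` and its parameters are hypothetical, and it assumes each row is laid out as [cx, cy, w, h, objectness, class scores...] with coordinates normalized to the frame.

```python
def decode_detections(rows, frame_w, frame_h, conf_threshold=0.5):
    """Turn normalized YOLO-style rows into pixel boxes.

    Each row is assumed to be [cx, cy, w, h, objectness, score_0, ..., score_n]
    with cx, cy, w, h expressed as fractions of the frame size.
    """
    boxes, class_ids, confidences = [], [], []
    for row in rows:
        scores = row[5:]
        # Index of the best class (what np.argmax(scores) returns on an array).
        class_id = max(range(len(scores)), key=lambda k: scores[k])
        # One float per detection: index with class_id, not the class_ids list.
        conf = scores[class_id]
        if conf > conf_threshold:
            # Scale by the frame size, not by the blob size (e.g. 320x320).
            box_w = int(row[2] * frame_w)
            box_h = int(row[3] * frame_h)
            x = int(row[0] * frame_w - box_w / 2)
            y = int(row[1] * frame_h - box_h / 2)
            boxes.append([x, y, box_w, box_h])
            class_ids.append(class_id)
            confidences.append(float(conf))
    return boxes, class_ids, confidences


# Two made-up rows with three classes: one confident detection, one empty cell.
rows = [
    [0.5, 0.5, 0.25, 0.5, 0.9, 0.0, 0.8, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.2, 0.0, 0.0, 0.0],
]
boxes, class_ids, confidences = decode_detections(rows, frame_w=640, frame_h=480)
print(boxes, class_ids, confidences)
```

Rows whose class scores are all zero, like the second one, are simply skipped by the threshold. Seeing many such rows in the raw output is expected, since most grid cells contain no object.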
There may also be a few small errors that I missed.
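One thing worth checking in the question's code is the non-maximum suppression step: it needs one score per box, so `cv2.dnn.NMSBoxes` should receive the whole `confidence` list rather than a single value. To show why, here is a plain-Python sketch of greedy NMS. It is only an illustration of the idea, not OpenCV's implementation, and the helper names `iou` and `greedy_nms` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold, iou_threshold):
    """Return indices of boxes to keep, highest score first."""
    # Drop low-scoring boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up boxes: the second overlaps the first, the fourth scores too low.
boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 10, 10], [52, 52, 10, 10]]
scores = [0.9, 0.8, 0.7, 0.4]
kept = greedy_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.3)
print(kept)
```

The result is a flat list of indices. With the real `cv2.dnn.NMSBoxes`, the shape of the returned indices has differed between OpenCV versions (nested in some, flat in others), so `i=i[0]` may fail; flattening first, for example with `np.array(indices).flatten()`, is one way to handle both.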
Final solution:
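To save you time, here is a corrected, working version of the code for your project.
Note 1: I slightly changed how the coco.names labels are loaded; your method did not work well on my MacBook Pro.
Note 2: In my code, you need to change the file paths back to the ones in your original code.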
yolov3_cfg='/root/Downloads/ML TASK/yolov3.cfg'
yolov3_weights='/root/Downloads/ML TASK/yolov3.weights'