The official demo shows that we can use the show_result(img, result, out_file='result.jpg') API to draw the results on an image.
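
    from mmdet.apis import init_detector, inference_detector

    model = init_detector('configs/any-config.py', 'checkpoints/any-checkpoints.pth', device='cpu')
    results = inference_detector(model, 'some_pic.png')
    # out_file must be passed by keyword; the third positional argument is score_thr
    model.show_result('some_pic.png', results, out_file='some_pic_results.png')
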
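In the debugger I found that results is a tuple containing list[][]. How should we get the coordinates/shapes out of it?

Is there a more detailed description of this format, or a direct API that converts results into a friendlier JSON (for example, the COCO dataset format)?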
Answer:

OK, I combined several methods and ended up with something usable. 🤣
If you know a better way, please let me know.

convert_polygon:

    # This method combines:
    #   mmdetection.mmdet.models.detectors.base.BaseDetector.show_result
    #   open-mmlab\Lib\site-packages\mmdet\core\visualization\image.py
    #     imshow_det_bboxes, draw_bboxes, draw_labels, draw_masks
    def convert_polygon(
            result,
            score_thr=0.3,
    ):
        from matplotlib.patches import Polygon
        import numpy as np
        import torch
        import cv2
        import mmcv  # needed for mmcv.concat_list below
        from mmdet.core.mask.structures import bitmap_to_polygon

        ms_bbox_result, ms_segm_result = result
        if isinstance(ms_bbox_result, dict):
            result = (ms_bbox_result['ensemble'],
                      ms_segm_result['ensemble'])

        if isinstance(result, tuple):
            bbox_result, segm_result = result
            if isinstance(segm_result, tuple):
                segm_result = segm_result[0]  # ms rcnn
        else:
            bbox_result, segm_result = result, None

        # bbox_result holds one array per class; flatten it and build
        # a matching array of class indices
        bboxes = np.vstack(bbox_result)
        labels = [
            np.full(bbox.shape[0], i, dtype=np.int32)
            for i, bbox in enumerate(bbox_result)
        ]
        labels = np.concatenate(labels)

        # gather segmentation masks ("draw segmentation masks" in show_result)
        segms = segm_result
        if segm_result is not None and len(labels) > 0:  # non-empty
            segms = mmcv.concat_list(segm_result)
            if isinstance(segms[0], torch.Tensor):
                segms = torch.stack(segms, dim=0).detach().cpu().numpy()
            else:
                segms = np.stack(segms, axis=0)

        assert bboxes is None or bboxes.ndim == 2, \
            f' bboxes ndim should be 2, but its ndim is {bboxes.ndim}.'
        assert labels.ndim == 1, \
            f' labels ndim should be 1, but its ndim is {labels.ndim}.'
        assert bboxes is None or bboxes.shape[1] == 4 or bboxes.shape[1] == 5, \
            f' bboxes.shape[1] should be 4 or 5, but it is {bboxes.shape[1]}.'
        assert bboxes is None or bboxes.shape[0] <= labels.shape[0], \
            'labels.shape[0] should not be less than bboxes.shape[0].'
        assert segms is None or segms.shape[0] == labels.shape[0], \
            'segms.shape[0] and labels.shape[0] should have the same length.'
        assert segms is not None or bboxes is not None, \
            'segms and bboxes should not be None at the same time.'

        # keep only instances whose confidence exceeds score_thr
        if score_thr > 0:
            assert bboxes is not None and bboxes.shape[1] == 5
            scores = bboxes[:, -1]
            inds = scores > score_thr
            bboxes = bboxes[inds, :]
            labels = labels[inds]
            if segms is not None:
                segms = segms[inds, ...]

        num_bboxes = 0
        ret_label = None
        ret_bbox = None
        ret_polygon = None
        ret_area = None
        ret_position = None
        ret_mask = None

        if bboxes is not None:
            num_bboxes = bboxes.shape[0]
            ret_bbox = bboxes
            ret_polygon = []
            for bbox in bboxes:
                # four corners of the box, truncated to integers
                bbox_int = bbox.astype(np.int32)
                poly = [[bbox_int[0], bbox_int[1]], [bbox_int[0], bbox_int[3]],
                        [bbox_int[2], bbox_int[3]], [bbox_int[2], bbox_int[1]]]
                np_poly = np.array(poly).reshape((4, 2))
                ret_polygon.append(Polygon(np_poly))
            ret_label = labels[:num_bboxes]

        if segms is not None:
            ret_mask = []
            for mask in segms:
                # one Polygon per contour of this instance mask
                temp_mask = []
                contours, _ = bitmap_to_polygon(mask)
                temp_mask += [Polygon(c) for c in contours]
                ret_mask.append(temp_mask)

            # only reached when there are more masks than boxes
            if num_bboxes < segms.shape[0]:
                segms = segms[num_bboxes:]
                areas = []
                positions = []
                for mask in segms:
                    _, _, stats, centroids = cv2.connectedComponentsWithStats(
                        mask.astype(np.uint8), connectivity=8)
                    largest_id = np.argmax(stats[1:, -1]) + 1
                    positions.append(centroids[largest_id])
                    areas.append(stats[largest_id, -1])
                areas = np.stack(areas, axis=0)
                ret_area = areas
                ret_position = positions

        return {'labels': ret_label, 'bboxes': ret_bbox, 'polygons': ret_polygon,
                'areas': ret_area, 'positions': ret_position, 'masks': ret_mask}

The key part of this code:

    from mmdet.core.mask.structures import bitmap_to_polygon

    ret_mask = []
    for mask in segms:
        temp_mask = []
        contours, _ = bitmap_to_polygon(mask)
        temp_mask += [Polygon(c) for c in contours]
        ret_mask.append(temp_mask)

Test code:

    model = init_detector(config_file, checkpoint_file, device='cpu')
    results = inference_detector(model, test_pic_file)
    poly = convert_polygon(results)

After converting poly to JSON, the format looks like this:

    {
        "labels": [1, 1, 2, ...],
        "bboxes": [
            [499.54632568359375, 0.0, 599.1744384765625, 332.5544128417969, 0.9999723434448242],
            ...
        ],
        "polygons": [
            [
                [499.0, 0.0],
                [499.0, 332.0],
                [599.0, 332.0],
                [599.0, 0.0],
                [499.0, 0.0]
            ],
            ...
        ],
        "areas": null,
        "positions": null,
        "masks": [
            [
                [
                    [510.0, 0.0],
                    [509.0, 1.0],
                    [508.0, 1.0],
                    ...
                ],
                ...
            ],
            ...
        ]
    }

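The conversion to JSON itself is not shown above. A minimal sketch of one way to do it follows, assuming the dict holds NumPy arrays and matplotlib Polygon objects as returned by convert_polygon. ResultEncoder and result_to_json are hypothetical names introduced here for illustration. The encoder is duck-typed: anything with get_xy() is treated as a Polygon, and anything with tolist() as a NumPy array or scalar. Polygon.get_xy() returns the closed outline, which would explain why the first vertex is repeated at the end of each polygon in the sample output.

```python
import json


class ResultEncoder(json.JSONEncoder):
    """Hypothetical helper: serialize the dict returned by convert_polygon."""

    def default(self, obj):
        # matplotlib.patches.Polygon exposes its vertices through get_xy()
        if hasattr(obj, "get_xy"):
            return obj.get_xy().tolist()
        # NumPy arrays and NumPy scalars both provide tolist()
        if hasattr(obj, "tolist"):
            return obj.tolist()
        # anything else falls back to the standard TypeError
        return super().default(obj)


def result_to_json(poly, **kwargs):
    """Dump a convert_polygon-style dict to a JSON string."""
    return json.dumps(poly, cls=ResultEncoder, **kwargs)
```

With this in place, result_to_json(poly, indent=4) would produce a string in the shape shown above; None values come out as null.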
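Some fields are easy to guess:

- labels is the class ID of each instance.
- The first four numbers of each entry in bboxes are the rectangle's top-left x, top-left y, bottom-right x, bottom-right y. The last number is the confidence score of that instance.
- polygons contains the same coordinates as above, written out as the corners of the box and truncated to integers.
- No idea about areas and positions, because they were always null in my tests. (In the code above they are only filled in when there are more masks than boxes.)
- masks contains the arrays of outline coordinates for each instance. If the instance has no holes, there is only one array.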
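Since the question asks about COCO, here is a rough sketch of how the fields described above could be mapped onto COCO-style records. It assumes the dict has already been turned into plain Python lists (for example by a JSON round trip). to_coco_results and the category_ids lookup are hypothetical names for illustration, and the mapping from model label index to dataset category ID depends on your dataset. COCO stores boxes as [x, y, width, height], so the bottom-right corner is converted to a size. The flat polygon lists follow the layout used in COCO annotation files; COCO result files for segmentation typically expect RLE masks instead, so treat the segmentation field as an assumption to adapt.

```python
def to_coco_results(poly, image_id, category_ids=None):
    """Turn convert_polygon-style output into COCO-style records.

    poly: dict with 'labels', 'bboxes' (x1, y1, x2, y2, score) and
          optionally 'masks' as nested lists of [x, y] points.
    category_ids: hypothetical lookup from model label index to dataset
          category ID; the label index is used as-is when omitted.
    """
    records = []
    masks = poly.get("masks") or [None] * len(poly["labels"])
    for label, bbox, contours in zip(poly["labels"], poly["bboxes"], masks):
        x1, y1, x2, y2, score = bbox
        record = {
            "image_id": image_id,
            "category_id": category_ids[label] if category_ids else label,
            # COCO boxes are [x, y, width, height]
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": score,
        }
        if contours is not None:
            # flatten [[x, y], ...] into [x1, y1, x2, y2, ...] per contour
            record["segmentation"] = [
                [coord for point in contour for coord in point]
                for contour in contours
            ]
        records.append(record)
    return records
```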
Update 2023-07-31:

Recently I looked into MMDetection again and found that its API has changed a lot. The most important change is that in MMDetection3, the return type of inference_detector became DetDataSample.
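For MMDetection3, a rough sketch of pulling the same kind of fields out of the new return type might look like the following. It assumes the returned object has a pred_instances attribute carrying tensor-like bboxes, labels and scores that support tolist(); the exact fields may differ between 3.x releases and model types. data_sample_to_dict is a hypothetical helper name, not part of MMDetection.

```python
def data_sample_to_dict(data_sample, score_thr=0.3):
    """Collect labels and boxes from a DetDataSample-like object.

    Assumption: data_sample.pred_instances exposes tensor-like
    `bboxes` (N x 4, x1 y1 x2 y2), `labels` (N) and `scores` (N).
    """
    pred = data_sample.pred_instances
    bboxes = pred.bboxes.tolist()
    labels = pred.labels.tolist()
    scores = pred.scores.tolist()
    # keep only instances above the confidence threshold
    keep = [i for i, score in enumerate(scores) if score > score_thr]
    return {
        "labels": [labels[i] for i in keep],
        # append the score so each row matches the old 5-number layout
        "bboxes": [bboxes[i] + [scores[i]] for i in keep],
    }
```

The output uses the same labels and bboxes layout as the JSON shown earlier, so code written against that format should need few changes.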
Any new updates will be pushed to this GitHub repo.