How do I convert MMDetection results and get the coordinates/shapes?

The official demo shows that we can use the show_result(img, result, out_file='result.jpg') API to draw the results on an image.

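    from mmdet.apis import init_detector, inference_detector

    model = init_detector('configs/any-config.py', 'checkpoints/any-checkpoints.pth', device='cpu')
    results = inference_detector(model, 'some_pic.png')
    # Pass the output path by keyword: the third positional argument is score_thr.
    model.show_result('some_pic.png', results, out_file='some_pic_results.png')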

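In the debugger I found that results is a tuple containing list[][]. How should we get the coordinates/shapes out of it?

Is there a more detailed description of this format, or a direct API that converts results into a more usable JSON (for example, the COCO dataset format)?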

Answer:

OK, I combined several approaches and ended up with something that works. 🤣
If you have a better way, please let me know.

convert_polygon:

    # This function combines logic from:
    #   mmdetection.mmdet.models.detectors.base.BaseDetector.show_result
    #   open-mmlab\Lib\site-packages\mmdet\core\visualization\image.py
    #   (imshow_det_bboxes, draw_bboxes, draw_labels, draw_masks)
    def convert_polygon(
            result,
            score_thr=0.3,
    ):
        from matplotlib.patches import Polygon
        import numpy as np
        import torch
        import cv2
        import mmcv  # provides concat_list, used below
        from mmdet.core.mask.structures import bitmap_to_polygon

        # Some detectors wrap their outputs in dicts with an 'ensemble' entry.
        # Only unpack when result really is a (bbox, segm) tuple.
        if isinstance(result, tuple):
            ms_bbox_result, ms_segm_result = result
            if isinstance(ms_bbox_result, dict):
                result = (ms_bbox_result['ensemble'],
                          ms_segm_result['ensemble'])

        if isinstance(result, tuple):
            bbox_result, segm_result = result
            if isinstance(segm_result, tuple):
                segm_result = segm_result[0]  # ms rcnn
        else:
            bbox_result, segm_result = result, None

        # Stack the per-class arrays and build one label per box.
        bboxes = np.vstack(bbox_result)
        labels = [
            np.full(bbox.shape[0], i, dtype=np.int32)
            for i, bbox in enumerate(bbox_result)
        ]
        labels = np.concatenate(labels)

        # Collect the segmentation masks.
        segms = segm_result
        if segm_result is not None and len(labels) > 0:  # non-empty
            segms = mmcv.concat_list(segm_result)
            if isinstance(segms[0], torch.Tensor):
                segms = torch.stack(segms, dim=0).detach().cpu().numpy()
            else:
                segms = np.stack(segms, axis=0)

        assert bboxes is None or bboxes.ndim == 2, \
            f' bboxes ndim should be 2, but its ndim is {bboxes.ndim}.'
        assert labels.ndim == 1, \
            f' labels ndim should be 1, but its ndim is {labels.ndim}.'
        assert bboxes is None or bboxes.shape[1] == 4 or bboxes.shape[1] == 5, \
            f' bboxes.shape[1] should be 4 or 5, but it is {bboxes.shape[1]}.'
        assert bboxes is None or bboxes.shape[0] <= labels.shape[0], \
            'labels.shape[0] should not be less than bboxes.shape[0].'
        assert segms is None or segms.shape[0] == labels.shape[0], \
            'segms.shape[0] and labels.shape[0] should have the same length.'
        assert segms is not None or bboxes is not None, \
            'segms and bboxes should not be None at the same time.'

        # Keep only the instances whose score exceeds the threshold.
        if score_thr > 0:
            assert bboxes is not None and bboxes.shape[1] == 5
            scores = bboxes[:, -1]
            inds = scores > score_thr
            bboxes = bboxes[inds, :]
            labels = labels[inds]
            if segms is not None:
                segms = segms[inds, ...]

        num_bboxes = 0
        ret_label = None
        ret_bbox = None
        ret_polygon = None
        ret_area = None
        ret_position = None
        ret_mask = None

        if bboxes is not None:
            num_bboxes = bboxes.shape[0]
            ret_bbox = bboxes
            ret_polygon = []
            for i, bbox in enumerate(bboxes):
                # Turn the two box corners into a four-point rectangle.
                bbox_int = bbox.astype(np.int32)
                poly = [[bbox_int[0], bbox_int[1]], [bbox_int[0], bbox_int[3]],
                        [bbox_int[2], bbox_int[3]], [bbox_int[2], bbox_int[1]]]
                np_poly = np.array(poly).reshape((4, 2))
                ret_polygon.append(Polygon(np_poly))
            ret_label = labels[:num_bboxes]

        if segms is not None:
            ret_mask = []
            for i, mask in enumerate(segms):
                # One Polygon per contour of the binary mask.
                temp_mask = []
                contours, _ = bitmap_to_polygon(mask)
                temp_mask += [Polygon(c) for c in contours]
                ret_mask.append(temp_mask)

            # Only reached when there are more masks than boxes.
            if num_bboxes < segms.shape[0]:
                segms = segms[num_bboxes:]
                areas = []
                positions = []
                for mask in segms:
                    _, _, stats, centroids = cv2.connectedComponentsWithStats(
                        mask.astype(np.uint8), connectivity=8)
                    largest_id = np.argmax(stats[1:, -1]) + 1
                    positions.append(centroids[largest_id])
                    areas.append(stats[largest_id, -1])
                areas = np.stack(areas, axis=0)
                ret_area = areas
                ret_position = positions

        return {'labels': ret_label,
                'bboxes': ret_bbox,
                'polygons': ret_polygon,
                'areas': ret_area,
                'positions': ret_position,
                'masks': ret_mask}

The key part of this code:

    from mmdet.core.mask.structures import bitmap_to_polygon

    ret_mask = []
    for i, mask in enumerate(segms):
        temp_mask = []
        contours, _ = bitmap_to_polygon(mask)
        temp_mask += [Polygon(c) for c in contours]
        ret_mask.append(temp_mask)

Test code:

    model = init_detector(config_file, checkpoint_file, device='cpu')
    results = inference_detector(model, test_pic_file)
    poly = convert_polygon(results)

After converting poly to JSON, the format looks like this:

    {
        "labels": [1, 1, 2, ...],
        "bboxes": [
            [499.54632568359375, 0.0, 599.1744384765625, 332.5544128417969, 0.9999723434448242],
            ...
        ],
        "polygons": [
            [ [499.0, 0.0], [499.0, 332.0], [599.0, 332.0], [599.0, 0.0], [499.0, 0.0] ],
            ...
        ],
        "areas": null,
        "positions": null,
        "masks": [
            [
                [
                    [510.0, 0.0],
                    [509.0, 1.0],
                    [508.0, 1.0],
                    ...
                ],
                ...
            ],
            ...
        ]
    }
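The conversion to JSON itself is not shown above. A minimal sketch of one way to do it, assuming the dict returned by convert_polygon holds NumPy arrays and matplotlib Polygon patches: to_jsonable and poly_to_json are hypothetical helper names (not part of MMDetection), and they rely only on the get_xy() method of matplotlib patches and the tolist() method of NumPy arrays and scalars.

```python
import json


def to_jsonable(obj):
    """Recursively turn arrays and Polygon patches into plain Python types."""
    if hasattr(obj, 'get_xy'):      # e.g. matplotlib.patches.Polygon
        return to_jsonable(obj.get_xy())
    if hasattr(obj, 'tolist'):      # e.g. numpy.ndarray or a NumPy scalar
        return obj.tolist()
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item) for item in obj]
    return obj                      # None, int, float, str, ...


def poly_to_json(poly):
    """Serialize the dict returned by convert_polygon to a JSON string."""
    return json.dumps(to_jsonable(poly), indent=4)
```

Calling poly_to_json(poly) on the result of the test code should then give a string with the same keys as the dict. Because matplotlib closes polygons by default, each point list ends by repeating its first point.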

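Some of the fields are easy to guess.

labels holds the class ID of each instance
The first four numbers in each bboxes entry are the top-left x, top-left y, bottom-right x and bottom-right y of the rectangular bounding box. The last number is the confidence score of that instance
polygons contains the same corner coordinates as above, truncated to integers
No clue about areas and positions, because they were always null in my tests; in the code they are only filled in when there are more masks than boxes
masks contains the contour coordinate arrays of each instance. If the instance is a single region with no holes, there is only one array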
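Since the question mentions the COCO format, here is a minimal sketch of mapping these fields to COCO-style result entries. It assumes the dict has already been converted to plain lists (as in the JSON above). to_coco_annotations and the image_id parameter are hypothetical names, and using the label index directly as category_id is an assumption: real COCO category ids need a dataset-specific mapping.

```python
def to_coco_annotations(poly, image_id=0):
    """Build COCO-style result entries from plain-list labels/bboxes/masks."""
    masks = poly.get('masks') or [None] * len(poly['labels'])
    annotations = []
    for label, bbox, contours in zip(poly['labels'], poly['bboxes'], masks):
        x1, y1, x2, y2, score = bbox
        entry = {
            'image_id': image_id,
            # Assumption: the label index doubles as the category id.
            'category_id': int(label),
            # COCO boxes are [x, y, width, height], not two corners.
            'bbox': [x1, y1, x2 - x1, y2 - y1],
            'score': score,
        }
        if contours is not None:
            # COCO polygons are flat lists: [x1, y1, x2, y2, ...].
            entry['segmentation'] = [
                [coord for point in contour for coord in point]
                for contour in contours
            ]
        annotations.append(entry)
    return annotations
```

One caveat with this sketch: COCO polygon segmentations cannot mark a contour as a hole, so an instance with holes would come out as filled regions.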

Update 2023-07-31:

I looked into MMDetection again recently and found that its API has changed a lot. The most important change is that in MMDetection 3, the return type of inference_detector is now DetDataSample.
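As far as I can tell, a DetDataSample keeps its predictions in pred_instances, with bboxes (one [x1, y1, x2, y2] row per instance), scores and labels as separate tensors, and masks for instance segmentation models. A minimal sketch of reading them back into the old five-number box layout: extract_instances is a hypothetical helper, and it only assumes those three fields expose tolist(), as PyTorch tensors do.

```python
def extract_instances(data_sample, score_thr=0.3):
    """Collect boxes, scores and labels above score_thr as plain lists."""
    pred = data_sample.pred_instances
    bboxes = pred.bboxes.tolist()   # each box: [x1, y1, x2, y2]
    scores = pred.scores.tolist()
    labels = pred.labels.tolist()
    # Indices of the instances whose score passes the threshold.
    kept = [i for i, score in enumerate(scores) if score > score_thr]
    return {
        'labels': [labels[i] for i in kept],
        # Append the score so each entry matches the old 5-number layout.
        'bboxes': [bboxes[i] + [scores[i]] for i in kept],
    }
```

With MMDetection 3 installed, this would be called on the object returned by inference_detector for a single image.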

Any new updates will be pushed to this GitHub repository.
