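I ran into a problem while trying to train a Detectron2 model on a COCO dataset. The dataset appears to load correctly, but when I train the model with DefaultTrainer I get the following error: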
TypeError: Caught TypeError in DataLoader worker process 1.
Here is my setup:
import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

# TOTAL_NUM_IMAGES = 10531

cfg = get_cfg()
cfg.OUTPUT_DIR = os.path.join('./output')
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
# single_iteration = cfg.SOLVER.IMS_PER_BATCH
# iterations_for_one_epoch = TOTAL_NUM_IMAGES / single_iteration
# cfg.SOLVER.MAX_ITER = int(iterations_for_one_epoch) * 20
cfg.SOLVER.STEPS = []  # do not decay learning rate
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (person). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
After a few iterations, I get this error:
[01/06 15:14:00 d2.utils.events]: eta: 11:25:20 iter: 125 total_loss: 0.9023 loss_cls: 0.1827 loss_box_reg: 0.1385 loss_mask: 0.5601 loss_rpn_cls: 0.009945 loss_rpn_loc: 0.0023 time: 0.5232 data_time: 0.3085 lr: 3.1219e-05 max_mem: 3271M
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-8c48e6e17647> in <module>()
     26 trainer = DefaultTrainer(cfg)
     27 trainer.resume_or_load(resume=False)
---> 28 trainer.train()

8 frames
/usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
    432             # instantiate since we don't know how to
    433             raise RuntimeError(msg) from None
--> 434         raise exception
    435
    436

TypeError: Caught TypeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/common.py", line 201, in __iter__
    yield self.dataset[idx]
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/common.py", line 90, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/usr/local/lib/python3.7/dist-packages/detectron2/utils/serialize.py", line 26, in __call__
    return self._obj(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/dataset_mapper.py", line 189, in __call__
    self._transform_annotations(dataset_dict, transforms, image_shape)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/dataset_mapper.py", line 128, in _transform_annotations
    for obj in dataset_dict.pop("annotations")
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/dataset_mapper.py", line 129, in <listcomp>
    if obj.get("iscrowd", 0) == 0
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/detection_utils.py", line 297, in transform_instance_annotations
    p.reshape(-1) for p in transforms.apply_polygons(polygons)
  File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 297, in <lambda>
    return lambda x: self._apply(x, name)
  File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 291, in _apply
    x = getattr(t, meth)(x)
  File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 150, in apply_polygons
    return [self.apply_coords(p) for p in polygons]
  File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 150, in <listcomp>
    return [self.apply_coords(p) for p in polygons]
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/transforms/transform.py", line 150, in apply_coords
    coords[:, 0] = coords[:, 0] * (self.new_w * 1.0 / self.w)
TypeError: can't multiply sequence by non-int of type 'float'
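Answer:
It turned out that some of the ids in "annotations" were written in scientific notation, which made those ids floats. Converting them to integers solved the problem.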
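
The answer does not show how the conversion was done, so here is a minimal sketch of one way to do it. It assumes the annotations are stored in standard COCO JSON with "id", "image_id" and "category_id" fields; the helper name normalize_ids and the sample data are hypothetical, made up for illustration.

```python
import json

# Assumption: these are the id fields that may have been written as floats.
ID_KEYS = ("id", "image_id", "category_id")


def normalize_ids(coco: dict) -> dict:
    """Cast whole-number float ids (e.g. 1e2 parsed as 100.0) to int, in place."""
    for section in ("images", "annotations", "categories"):
        for record in coco.get(section, []):
            for key in ID_KEYS:
                value = record.get(key)
                if isinstance(value, float):
                    if not value.is_integer():
                        # Refuse to silently truncate something like 12.5.
                        raise ValueError(f"{section}.{key}={value!r} is not a whole number")
                    record[key] = int(value)
    return coco


# Hypothetical sample: the json module parses numbers with an exponent as floats.
sample = json.loads(
    '{"images": [{"id": 1.0531e4}],'
    ' "annotations": [{"id": 1e2, "image_id": 1.0531e4, "category_id": 1}]}'
)
fixed = normalize_ids(sample)
print(fixed["annotations"][0])
```

For a real dataset, the same function could be applied to the dict returned by json.load on the annotation file, and the result written back with json.dump before registering the dataset.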