0. Problems encountered during training
./tools/dist_train.sh
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable). (prepare_for_backward at /pytorch/torch/csrc/distributed/c10d/reducer.cpp:518)
Solution:
vim mmdet/apis/train.py
    # put model on gpus
    if distributed:
        find_unused_parameters = True  # cfg.get('find_unused_parameters', False)
        # Sets the `find_unused_parameters` parameter in
        # torch.nn.parallel.DistributedDataParallel
        model = MMDistributedDataParallel(
            model.cuda(),
            device_ids=[torch.cuda.current_device()],
            broadcast_buffers=False,
            find_unused_parameters=find_unused_parameters)
    else:
        model = MMDataParallel(
            model.cuda(cfg.gpu_ids[0]), device_ids=cfg.gpu_ids)
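This is the fix usually suggested online, but it has problems of its own.
Single-GPU users are advised to run python train.py directly instead.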
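The commented-out original line reads the flag with cfg.get('find_unused_parameters', False), so the same effect can probably be had without editing mmdet/apis/train.py at all. A minimal sketch, assuming that line is restored and that the key is added at the top level of whichever config file you train with:

```python
# Added at the top level of the training config file.
# Assumption: mmdet/apis/train.py still contains the original line
#   find_unused_parameters = cfg.get('find_unused_parameters', False)
# so this key is picked up when distributed training starts.
find_unused_parameters = True
```

Keeping the switch in the config leaves the library code untouched and makes it easy to turn the flag off again for models that do not need it.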
1. Changing the number of classes
vim configs/ms_rcnn/ms_rcnn_r50_fpn_1x_coco.py
vim configs/_base_/models/mask_rcnn_r50_fpn.py
In the new version, changing the number of classes requires editing both of the files above.
I have only one class, so num_classes=1
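As a rough guide to where num_classes appears, here is a sketch of the entries to edit. It assumes the MMDetection 2.x layout, in which the box and mask heads sit under roi_head in the base model file and the mask IoU head is defined in the ms_rcnn config. Only the changed keys are shown, and the name ms_rcnn_model is a stand-in used here to keep the two files apart (in the real file the dict is also called model).

```python
# One foreground class. Assumption: as in MMDetection 2.x,
# the background class is not counted in num_classes.
NUM_CLASSES = 1

# configs/_base_/models/mask_rcnn_r50_fpn.py (excerpt, other keys omitted)
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=NUM_CLASSES),
        mask_head=dict(num_classes=NUM_CLASSES)))

# configs/ms_rcnn/ms_rcnn_r50_fpn_1x_coco.py (excerpt, other keys omitted)
# "ms_rcnn_model" is a hypothetical name for this sketch only.
ms_rcnn_model = dict(
    roi_head=dict(
        mask_iou_head=dict(num_classes=NUM_CLASSES)))
```

If your copy of the configs is laid out differently, searching both files for num_classes should turn up every place that needs the new value.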